10/09 2024
Source: BohuFN
Recently, the large model industry has reignited the "price war," with Alibaba Cloud announcing significant price cuts across multiple commercial offerings under its Tongyi Qianwen platform. As early as May this year, the large model industry had already witnessed a round of price cuts approaching 90%. With the "Hundred Models War" underway, it is clear that no more than five general-purpose large models will ultimately emerge victorious.
ByteDance, a relatively new entrant among internet giants, was not an early mover in large models. While other giants had already unveiled their large models, ByteDance's AI assistant "Doubao" arrived somewhat belatedly. Nevertheless, fueled by the company's relentless push, "Doubao" has become the most widely used native AI application in China.
ByteDance has recently made two significant moves. First, at its AI Innovation Tour, it unveiled video generation, music, and simultaneous interpretation models, covering the full range of modalities, including language, speech, images, and video. Second, ByteDance is exploring the development of its own AI hardware, with its first product potentially being smart earbuds.
From the price war on the B-end of large models to application innovation on the C-end, and finally to the traffic battle in the large model ecosystem, ByteDance has not missed a single "possibility" in the large model industry. As the large model race enters its second half, what "good cards" does ByteDance still hold?
01 ByteDance's Late Entry into the Video Generation Race
In June this year, Kuaishou's self-developed video generation large model "Keling" was officially launched; in August, ByteDance's text-to-video application "Jimeng" followed suit.
In September, ByteDance's Volcano Engine unveiled two large models, Doubao Video Generation - PixelDance and Doubao Video Generation - Seaweed, which are currently undergoing limited beta testing within Jimeng AI.
However, judging by user feedback so far, "Jimeng" performs better with certain specific settings and prompts, but in most cases the motion and lighting in "Keling" videos appear more natural. The two also differ clearly in style: "Jimeng" excels at animation, while "Keling" leans toward a cinematic look.
It is difficult to say which style is superior, but the differences are not solely attributable to the technology behind the large models. They are also closely related to the strategic layouts of ByteDance and Kuaishou.
On the one hand, Kuaishou has a first-mover advantage in video generation models. Although both "Keling" and "Jimeng" were inspired by Sora and launched relatively close in time, they occupy different positions within their respective companies.
"Keling" originated from a tool developed by Kuaishou in October 2023 for generating GIF stickers from static images, which was subsequently elevated to a strategic group project by Kuaishou's Chairman Cheng Yixiao and received full support. In contrast, when Kuaishou launched "Keling," ByteDance was preoccupied with the large model price war, with its primary competitors being Alibaba, Tencent, and Baidu at the time.
In May this year, ByteDance took the lead in announcing significant price cuts for its Doubao general model, with the input price dropping to a minimum of 0.0008 yuan per thousand tokens, claiming to have set a new low in the large model industry. Subsequently, Alibaba, Baidu, and Tencent followed suit, intensifying the price war in the large model industry.
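To put the quoted input price in perspective, the per-thousand-token figure can be converted into a cost per million tokens. The snippet below is only a rough sketch: it assumes a flat rate with no tiering, discounts, or output-token charges, and the function name `input_cost_yuan` is made up for illustration.

```python
# Input price quoted in the article: 0.0008 yuan per thousand tokens.
PRICE_PER_THOUSAND_TOKENS_YUAN = 0.0008


def input_cost_yuan(tokens: int,
                    price_per_thousand: float = PRICE_PER_THOUSAND_TOKENS_YUAN) -> float:
    """Hypothetical helper: cost in yuan of `tokens` input tokens at a flat rate."""
    return tokens / 1000 * price_per_thousand


# Cost of one million input tokens at the quoted price.
per_million = input_cost_yuan(1_000_000)

# How many input tokens one yuan buys at the quoted price.
tokens_per_yuan = 1000 / PRICE_PER_THOUSAND_TOKENS_YUAN

print(f"Cost per million input tokens: {per_million:.2f} yuan")
print(f"Input tokens per yuan: {tokens_per_yuan:,.0f}")
```

At this rate, one million input tokens cost less than one yuan, which helps explain why rivals had to respond so quickly.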
The giants announced their price cuts in quick succession. According to The Market, after the initial price cut, sales personnel from Volcano Engine actively reached out to customers and promoted their products, suggesting that competing for the B-end market was ByteDance's top strategic priority at the time.
Moreover, Doubao was gaining momentum. According to QuestMobile data, Doubao had 27.5 million monthly active users in June 2024, ranking first among large language model applications in China. In contrast, other AI applications within ByteDance's app layer, such as Maoxiang and Xinghui, had less significant presences, making it clear which was the strategic focus.
On the other hand, the two companies have adopted different strategies for their video generation models. ByteDance has positioned "Jimeng" as a standalone mobile application, separate from its video editing tool Jianying. In contrast, Kuaishou has integrated "Keling" directly into its video editing platform Kuaiying, leading to differences in user convenience and, consequently, user accumulation and video generation volumes.
According to Kuaishou Senior Vice President Gai Kun, over 2.6 million people have used Kuaishou's video generation large model, Keling AI, and have collectively generated over 27 million videos.
ByteDance has not yet disclosed user data for "Jimeng," but the app ranks 33rd in the "Photography & Video" download rankings on the Apple App Store, while "Kuaiying" ranks 11th. As for video quality, according to blogger Lanxi, Kuaishou's Keling performed best in a double-blind comparison with mainstream competitors in Meta's AI video paper, even outperforming the yet-to-be-released Sora.
However, ByteDance appears unhurried: "Keling" has undergone nine iterations in just three months, while ByteDance's Volcano Engine has only recently introduced two new video generation models.
ByteDance's calm demeanor may stem from the fact that, given the current computational power and financial resources of technology companies, launching a large model is not as challenging as one might think. The key lies in having high-quality data scenarios and sufficient differentiation.
From this perspective, both Kuaishou and ByteDance, as short video platforms, share the same advantage of video data in the text-to-video race. Furthermore, ByteDance boasts a larger user base and more untapped application scenarios for short videos. Therefore, driving AI ecosystem construction and expansion is ByteDance's top priority.
02 ByteDance's AI+Hardware Strategy to Capture Traffic
Recently, ByteDance has also ventured into exploring AI hardware. According to LatePost, ByteDance is exploring the integration of large models with hardware, with its first product potentially being smart earbuds.
As early as May this year, 36Kr reported that ByteDance was accelerating its exploration of AI hardware, with one product line focused on smart earbuds. Prior to this, ByteDance had already acquired the earbud brand Oladance.
ByteDance is no stranger to hardware exploration. Back in 2018, it acquired the Smartisan team and some patent licenses from Hammer Technology, subsequently launching Smartisan phones, TNT displays, and speakers.
However, faced with fierce competition in the office hardware market, ByteDance shifted its focus to the education hardware sector in 2020, launching the "Dali Education" brand and releasing products such as smart study lights, educational tablets, and electronic dictionaries.
Despite ByteDance's plan to invest tens of billions of yuan annually in the education industry, its blind trend-following and the impact of the "double reduction" policy prevented it from achieving the success it had hoped for. Currently, only the smart study light remains on the Dali Education official website.
In 2021, ByteDance acquired PICO, the leading VR manufacturer in China, for 9 billion yuan and invested tens of billions more in R&D, marketing, and operations. However, its hardware ambitions were once again dashed as PICO underwent multiple rounds of layoffs last year, leaving only a small hardware team intact.
Despite these setbacks, ByteDance remains undeterred. Its Doubao large model has already collaborated with numerous hardware manufacturers. At the 2024 Spring Volcano Engine FORCE Conference in May, it showcased three AI hardware collaborations: a robot dog, a learning machine, and a learning robot.
In terms of smart terminals, Honor and OPPO have announced collaborations with the Doubao large model. In the smart car alliance, Doubao has also forged deep partnerships with automakers such as Geely, Great Wall, NIO, and GAC Motor.
In fact, the "software and hardware synergy" has undergone several iterations in the development of the internet industry, including PCs, smartphones, wearable devices, and smart homes. In the era of the Internet of Everything, hardware serves as the carrier for software implementation and the gateway for user traffic to reach the ecosystem. The development path of AI hardware is essentially a translation of this synergy to the software side.
As such, ByteDance is not alone in targeting AI hardware. It is reported that Meituan is developing an AI service called "Qiaoyu" and has partnered with the children's wearable device manufacturer "Xiaotiancai." iFLYTEK has released three AI earbuds to strengthen its AI office offerings. Established players like Baidu and Huawei, which have long been present in the smart terminal market, are also actively building their AI hardware ecosystems.
Currently, there are some similarities in the AI hardware layouts of major model vendors, with education, office, and daily life remaining the primary application scenarios. However, for these vendors, having an entry point to connect with the physical world is crucial for forming a closed loop from content to traffic, applications, and hardware, which serves as the foundation for AI ecosystem development.
Nevertheless, this development path is not foolproof. ByteDance's past failures in hardware indicate that while hardware may serve as a "container," not everything can be stuffed into it.
On the one hand, hardware development often has its own pace and is more vulnerable to market maturity challenges. It is difficult to accelerate a hardware product's maturity through software businesses alone. Taking PICO as an example, while VR headsets have significant appeal, issues such as inconvenience and discomfort hinder their widespread adoption.
On the other hand, hardware serves a functional purpose. Given the current parity in large model applications, even with significant hardware subsidies, users may not be willing to pay solely for "a particular software," meaning that large model enterprises must differentiate their applications sufficiently to make the AI+hardware model viable.
Therefore, while hardware serves as a traffic carrier, it is more than just a "carrier." Hardware must provide more convenient access points for large models and more user-friendly interaction forms, aligning with the essence of "software and hardware integration."
03 ByteDance Competes for AI Ecosystem Influence
While it remains to be seen whether ByteDance can make further strides in the AI hardware sector, it is clear that the company's ambitions extend beyond this. Currently, it is also vying for AI ecosystem influence, going head-to-head with giants like Alibaba and Baidu.
In addition to enhancing large model capabilities, refining AI applications, and introducing AI hardware, ByteDance has also launched the intelligent agent development platform "Kouzi" and the AI programming assistant "Doubao MarsCode."
In the B-end market, Tan Dai, President of Volcano Engine, stated that the Doubao large model has undergone real-world validation in over 50 internal businesses and achieved deep collaboration with over 30 external industry enterprises. Since its launch in July this year, the average daily token usage per enterprise customer has grown 22-fold.
Although ByteDance is not a traditional "BAT" giant, and its Doubao large model arrived months later than those of other giants, it is now laying out its AI ecosystem at its own pace, with its own unique strengths.
Firstly, thanks to ByteDance's rich business scenario accumulation, it is better positioned to refine large model applications. Its business scenarios encompass short videos, social media, online education, e-commerce, and many other fields, providing massive data and diverse application scenarios for the development and training of the Doubao large model.
In fact, ByteDance's strategy in the large model sector differs slightly from that of other giants, placing greater emphasis on the C-end experience. It prefers to refine C-end products first and then expand into the B-end market once the model's capabilities become competitive.
This may also be related to ByteDance's layout in the C-end scenario. After all, its large models and AI products ultimately prioritize serving its flagship traffic-generating apps like Douyin and Toutiao, which has accelerated its progress in the multimodal large model sector.
Secondly, traffic is another of ByteDance's strengths. If the construction of an AI ecosystem requires the combined injection of traffic from both creators and users, ByteDance's AI ecosystem clearly has an advantage in terms of usage scenarios and traffic introduction.
According to a report by Unique Capital, in July this year, ByteDance's CapCut and Doubao surpassed OpenAI's ChatGPT in AI app downloads, claiming the top spot globally.
With ByteDance's flagship products like Douyin and Toutiao serving as significant traffic gateways, its vast user base and precise data analysis capabilities enable it to further enhance the user experience of large models and drive the development of multimodal large models.
Recently, in addition to unveiling three new models – video generation, music, and simultaneous interpretation – Volcano Engine has also comprehensively upgraded its general language model, text-to-image model, and speech model.
However, ByteDance's abundant traffic support comes at a cost. Industry insiders reveal that in early June alone, over 100 million yuan was invested in advertising for the Doubao large model. Moreover, in the advertising battle for large models, Douyin fully favored its own large model, indicating that ByteDance is also using its advertising revenue to drive user growth for Doubao.
While "traffic" is undoubtedly a unique "good card" for ByteDance, the "burn money for growth" strategy is unsustainable. The key to ByteDance's AI ecosystem development will lie in its ability to quickly convert its market gains into viable business models after land-grabbing. Therefore, when Volcano Engine launched two Doubao video generation models, Tan Dai emphasized the importance of considering commercialization from the outset.
Finally, ByteDance is accelerating its efforts in the "cloud services" market. For enterprises delving into the B-end service market, cloud services are undoubtedly one of the most critical sectors in the internet industry. According to Canalys, China's cloud infrastructure market is projected to reach $85 billion by 2026, with a five-year compound annual growth rate of 25%.
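The Canalys projection can be sanity-checked with the standard compound growth formula. The sketch below assumes the five-year window ends in 2026 and that growth compounds annually at exactly 25%; the implied starting market size is derived from those assumptions and is not a figure reported in the source.

```python
def implied_base(target: float, cagr: float, years: int) -> float:
    """Starting value that grows to `target` after `years` at a constant `cagr`."""
    return target / (1 + cagr) ** years


TARGET_BILLION_USD = 85.0  # projected 2026 market size quoted in the article
CAGR = 0.25                # five-year compound annual growth rate quoted in the article
YEARS = 5                  # assumption: the window is the five years ending in 2026

# Total growth multiple over the window, and the base it implies.
growth_multiple = (1 + CAGR) ** YEARS
base = implied_base(TARGET_BILLION_USD, CAGR, YEARS)

print(f"Growth multiple over {YEARS} years: {growth_multiple:.2f}x")
print(f"Implied starting market size: ${base:.1f} billion")
```

Under these assumptions the market roughly triples over the period, which gives a sense of how much new demand later entrants such as ByteDance could compete for.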
However, in the B-end market, ByteDance faces stiff competition from mainstream cloud vendors like Alibaba Cloud, Tencent Cloud, and Huawei Cloud, with these three alone accounting for over half of the market share.
Moreover, Alibaba, Tencent, Huawei, and others have already found their respective niche markets in which they excel, such as Alibaba's retail industry and Tencent's entertainment and financial industries. It is not easy for ByteDance to gain a foothold in these areas.
Therefore, ByteDance's current situation of being "strong on the C-end but weak on the B-end" will also hinder its rapid expansion in the AI ecosystem. Compared to Alibaba, Tencent, and other major companies, it is difficult for ByteDance to rely on its existing business areas to form a scale effect of large model applications in the commercial field.
Perhaps for this reason, ByteDance has expanded into different vertical tracks such as education and office through hardware in recent years, hoping to find new breakthroughs.
However, if ByteDance wants to continue to "create miracles through effort" and break the entrenched perceptions that the industry and customers hold about the ecosystems of major companies, it is not enough to merely be a "pragmatist." ByteDance needs to find its own application track, demonstrate competitive advantages, and become a more professional large model solution provider if it is to overtake its rivals on the curve.
In the current large model market, whether it is the B-end or the C-end, each track is crowded with competitors. Although the "traffic" strategy is powerful, it is not omnipotent. Returning to product application and ecosystem construction, the key lies in enabling vertical industry developers and users to obtain more down-to-earth products and services at lower costs and lower thresholds.