07/05 2024
Kuaishou unveils its text-to-video model: how much of a lead can product-based innovation give it?
@SciTechNews Original
"I even think it outperforms Sora. I believe that this product, within the scope of my usage, is the best in the world today." In front of the camera, Fu Sheng, the chairman of Cheetah Mobile, could hardly hide his excitement. The product he was describing, which in his view outperforms the various domestic and international text-to-video models, is Kuaishou's Keling video generation large model.
On June 6, Kuaishou released its AI video model, and on the same day many industry insiders like Fu Sheng received invitation codes for the first wave of testing. Judging from industry feedback and the sample videos Keling has generated, it appears to be very close to Sora. For the first version, which generates five-second videos, industry feedback singled out both the technical route and the quality of the training data. What excites the industry even more is that Keling seems to simulate physical laws very realistically: the plausibility of movement, other physical characteristics, and even its ability to combine concepts imaginatively all hold up well.
However, some industry insiders pointed out that Sora was announced in February this year and its training may have been completed by the end of last year, so Keling had a few more months of training time. Access to more training computing power is another advantage for Keling.
Subsequently, on June 21, the Keling model introduced new features such as image-to-video and video continuation, enriching the product's functionality.
Keling's performance exceeded industry expectations partly because, before its announcement, most AIGC practitioners knew nothing about it. Kuaishou's general-purpose large language model, Kuaiyi, has not made a dazzling showing among domestic large models, and it is less well known than the many newcomers and incumbents, such as Doubao, Hunyuan, Wenxin Yiyan, and Moonshot AI, that have invested heavily in marketing and R&D.
Against this backdrop, Keling's emergence seems to have overturned industry perceptions and pointed to a new path from technology to product. However, there is still a significant gap between an impressive product and widespread adoption and successful commercialization. Whether Kuaishou can rely on Keling to overtake and pull away from its peers remains to be seen, as it currently faces many risks and challenges.
Part.1
Kuaishou's Unconventional Approach
"It can be said that Keling is definitely a very complex project requiring heavy resource investment and collaboration from multiple teams. It is definitely not something that a single genius can come up with casually." After declining several questions about Keling's parameters and performance at the 2024 Beijing Zhiyuan Conference, Wan Pengfei, the head of Kuaishou's Visual Generation and Interaction Center, gave this more practical answer.
In the domestic AI race, few areas manage to attract consumer attention. The focus is on the large language models left standing after the "Hundred Model War", with Moonshot AI a typical example. The company went from obscurity to a $3 billion valuation in less than a year, but on the product side, apart from Kimi and its long-text reading capability, consumers have little sense of what it offers.
Meanwhile, both investors and entrepreneurs seem conflicted. Realists, represented by Zhu Xiaohu, the managing partner of Jinshajiang Venture Capital, have consistently been fairly pessimistic. For example, Zhu Xiaohu believes that this generation of large model startups is worse off than previous-generation AI companies like SenseTime: there is no technological difference between companies, and every generation of technology requires new investment, with the scale of investment growing exponentially. According to media reports, after OpenAI released GPT-4, Zhu Xiaohu posted a new judgment on his WeChat Moments: "Model companies that are not deeply tied to big tech companies have basically been eliminated."
It is precisely in this complex environment that product startups face growing difficulties. Under pressure to commercialize, the major model makers have joined the price war one after another, and even though Kimi's valuation has reached $3 billion, it too is trying to monetize through methods such as tipping.
Cautious capital, combined with big tech companies' FOMO ("fear of missing out"), has set the tone for today's AI startups. Seen in this light, Keling's success is all the more valuable.
In the text-to-video race, industry insiders commented that Kuaishou's ability to produce China's first impressive "quasi-Sora" text-to-video large model is related to the platform's deep cultivation of video content. However, Douyin has more video data, stronger computing power, and greater investment in AI, so why hasn't it come up with a similar large model? Morgan Stanley's research report also shows that Keling currently outperforms the earlier video generation models from Douyin and Tencent. In terms of duration, Keling can generate videos up to 2 minutes long, while Douyin's Jimeng currently only supports videos up to 3 seconds long, and Tencent's Hunyuan large model can generate 16-second videos.
Ultimately, large models are a contest of basic research and development, and that is where the effort must go. ByteDance's strategy is to drive R&D from the application side, but its AIGC team as a whole is quite chaotic, with few true technical experts, and consumer-application thinking dominates R&D across its AI platform, which may not be the right direction.
In practice, Keling's result owes less to advanced technology than to Kuaishou's successful strategic positioning in this race. In the words of Cheetah's Fu Sheng, "Keling's success further proves that Sora is not a technological breakthrough, but a product-based innovation." Keling's unconventional approach undoubtedly offers the industry new inspiration, but whether it has already pulled away from the domestic AI giants in the text-to-video race remains to be seen and needs more practical evidence.
Part.2
Can it become the next hit?
Although Keling has already achieved quite a bit, it may still have a long way to go before it becomes the next killer app in the AI race.
First, it will take time for Keling to reach large-scale use. According to the latest application page, more than 410,000 people have applied for access. According to sources close to Kuaishou, although Keling already has more than 100,000 users, the current trial still cannot meet market demand, and even people inside Kuaishou find it difficult to get access. In addition, the conclusions drawn so far are based on videos from internal testing, which means that Keling's model capabilities may be overestimated.
Meanwhile, Keling remains something of a mystery within the industry. Just as Wan Pengfei, the head of Kuaishou's Visual Generation and Interaction Center, has been vague about its parameters, outsiders are curious about how much computing power Keling uses, where that power comes from, and whether there is enough inference capacity for a large-scale public test.
It is unclear whether Keling rolled out features such as image-to-video and video continuation in quick succession in order to keep up interest in the product. Testing shows that the overall results are fairly flat, generation times are unstable, and differentiation is not very obvious, so these features still need further optimization.
On computing power, the issue the industry cares about most, some practitioners estimate that Kuaishou has rented a large number of high-end GPUs from both Tencent Cloud and Alibaba Cloud, and that Kuaishou itself owns several thousand more. Taken together, these accounts suggest that Keling's training computing power comes from multiple sources.
In addition, according to estimates from multiple sources, a text-to-video large model consumes at least 1 million tokens to generate a 1-minute 1080P video, so the demand for inference computing power is far greater than for text-to-text models. However, L40 inference capacity is difficult to procure, and Kuaishou may run into bottlenecks, which means that a full-scale public test of Keling may still be a long way off.
On multiple short video platforms, and even on many overseas social media platforms, Keling has almost become synonymous with "China's Sora". Overseas bloggers are struggling to obtain invitation codes, while domestic reviewers claim that it is already free, usable, and practical, but such claims still seem to be some distance from the current reality.
To some extent, Keling is a product whose strategic significance far outweighs its practical significance. Its technological leadership is undeniable, and it sets an example for the industry. However, truly widespread use may still be a long way off.
Part.3
The "New Hope" for "Laotie"?
In Keling's external promotion, its advantages are simple and clear: first, it is actually usable; second, the quality of the generated video is good. It can generate large, physically plausible movements and simulate the characteristics of the physical world, and it can also produce videos at up to 1080p resolution and up to 2 minutes long (30fps), while letting users freely adjust the aspect ratio.
Based on this, outside observers have begun to imagine how Keling might be commercialized. Some industry insiders summarized that in the media and advertising industries, Keling can be used to quickly generate advertising videos, news reports, and the like, which can not only significantly improve production efficiency but also optimize content through data analysis. In education and training, Keling can help teachers make teaching videos and even generate virtual teaching scenarios, giving students an immersive learning experience. In entertainment and social media, the personalized video generation tools Keling provides to social platforms and content creators will greatly enrich platform content.
Multiple securities firms and research institutions are also optimistic about Keling. Guotai Junan Internet Media Research stated that the Keling large model has built an efficient, large-scale automated data pipeline covering massive video mining, multi-dimensional tagging and screening, video description enhancement, and data-driven quality assessment, placing it at the forefront of domestic video generation large models.
However, these lofty expectations seem to fall short when confronted with reality. According to sources close to Kuaishou, Keling currently has no commercialization plans and does not offer APIs externally. From an investment perspective, this means that Keling has not contributed much to Kuaishou's results in recent quarters. Kuaishou's recent performance in the secondary market likewise suggests that Keling has not helped the company much.
In his speech, Wan Pengfei said of Keling's future, "The threshold for video creation has been lowered and the ROI of the results has been greatly improved, and the boundaries between video creators and consumers are gradually blurring. More and more consumers are becoming creators, which is very valuable for the prosperity of the video creation ecosystem." From this, one can speculate that Kuaishou's vision for Keling leans towards empowering more creators within its own ecosystem.
From another perspective, Kuaishou is currently under considerable pressure in both advertising and e-commerce, and its growth is being challenged by the big tech companies. If Keling can, as the Kuaishou executive said, lower the threshold, improve ROI, and thereby turn more users from consumers into content producers, it undoubtedly has tremendous appeal.
In summary, Keling seems to have shown domestic practitioners and a wider audience Kuaishou's efforts and hopes in a new field, but looking at the overall picture, it is unlikely to lift revenue in the short term.