December 27, 2024
Kuaishou's Keling is merely the next target for Douyin's Jimeng.
Written by Zhao Weiwei | Lan Dong Business
Zhang Nan, who stepped down as CEO of Douyin earlier this year, has by all appearances had a fulfilling 2024.
In the amphitheater on the fourth floor of the Central Academy of Fine Arts' School of Design, Zhang Nan, a former art student, sat at the center, smiling as she posed for a group photo with a group of college students. She was one of the judges. The event was a showcase for an AI interaction innovation competition organized by Jimeng, the AI product under Jianying that Zhang Nan now oversees and that is seen as a potential game-changer on the scale of Douyin.
A few days later, Zhang Nan publicly described Jimeng as a camera for the "imaginary world," and Douyin as a camera for the "real world."
Building an imaginary world calls for innovation, and innovation often comes from young minds. Douyin's success has been closely tied to college students, whom its early operations team made a priority. Many of its classic challenges went viral thanks to the enthusiastic participation of these young creators. But unlike Douyin's early days, when the team had little money and relied on emotional appeal, the Jimeng AI Interaction Innovation Competition now offers a first prize of 100,000 yuan in cash.
The winner, Zhao Chunxiang, is not a student but a young independent developer who first gained attention with a diet-tracking product called 'Stomach Book.' His winning entry is an AI video generation tool with a UI/UX for precisely controlling camera shots. In the two-minute demo, a classic scene from the film 'Cinema Paradiso' is imported, and the user then generates a video with AI-created effects such as zooming in and out, close-ups, and flowers blooming outside the window.
Three months before Jimeng launched, Kuaishou's Keling had already become the first large video generation model introduced in China. Kuaishou did not overlook college students either: Keling co-organized an AI creation competition with the China Academy of Art and other institutions, and the three first-prize winners, in the life, advertising, and free-expression categories, each received 36,666 yuan.
Sora opened the door to AI modeling of the real world, while Jimeng and Keling are following the playbook of Douyin and Kuaishou, spending heavily in the belief that sheer force of investment can work miracles.
A bigger competition looms in 2025. A research summary circulating online about ByteDance's AI video generation products states, "ByteDance hopes to utilize AI capabilities within its ecosystem and believes that next year, each ecosystem will form a closed competitive loop. By May 1st next year, the Kouzi intelligent agent platform, Doubao, Douyin, and B-end capabilities will form a connected ecological network, with more manifestations and usage scenarios for text-to-video generation."
In 2025, Douyin's Jimeng may go head-to-head with Kuaishou's Keling in several markets, including e-commerce advertising and short dramas.
'High opening, low momentum' (a strong start that fades) versus 'low opening, high momentum' (a slow start that keeps building) is currently the most significant difference between Douyin's Jimeng and Kuaishou's Keling.
According to the latest QuestMobile data, Jimeng sparked heated discussion on Douyin on its launch day, while Keling AI's popularity on Kuaishou built steadily until it peaked. The notable difference is that engagement with Jimeng-related content peaked early and then declined, whereas engagement with Keling-related content rose gradually over the month.
This reflects a mix of factors, including promotion strategy, user experience, and market competition, but one of the most direct reasons may be that Jimeng launched late and its user experience fell short of expectations. After extended use, users can easily tell how the quality and stability of its output compare with those of similar products. Despite heavy pre-release buzz, Jimeng remains less popular than Keling.
That does not mean Jimeng is behind across the board. Heavy users of AI products say that getting the most out of domestic AI tools means not relying on any single one. In text-to-video work in particular, users often start with Jimeng for text-to-image generation and then switch to Keling for image-to-video generation, because "Jimeng's AI-generated images are relatively superior."
The ByteDance research summary also points to a significant gap between Jimeng and Keling. Jimeng has between 200,000 and 220,000 daily active users, 70% of whom are individuals or small MCN studios, with relatively few large enterprises. It has about 25,000 paying users, whose average monthly subscription is around 50 yuan. Keling, by contrast, served more than 5 million users over the same period, with over 2 million cumulative paying users and cumulative payments of approximately ten million yuan.
The authenticity of such research summaries is hard to verify. ByteDance has also issued risk warnings to investors about the hyped 'Doubao concept stocks' in the secondary market, to help them avoid unnecessary losses.
Kuaishou Keling's 'low opening, high momentum' trajectory owes partly to the more stable capabilities of its underlying video generation model and the first-mover advantage that gave it. Another factor is successful marketing. As 'Lan Dong Business' noted in the article 'Kuaishou Keling Puts Pressure on Douyin Jianying,' Kuaishou created buzz for Keling overseas by having foreign tech influencers test Keling-generated content, and that overseas popularity in turn boosted its profile at home.
Six months on, Keling remains far more popular overseas than Jimeng, with 67 times as many followers on the social media platform X.
On the same day that Zhang Nan appeared at the Volcano Engine Conference to announce Jimeng's latest updates, Kuaishou upgraded the Keling model, claiming a 195% improvement in overall performance over the previous 1.5 version in internal evaluations. A month earlier, on the third-quarter earnings call, founder Cheng Yixiao had also expressed optimism about Keling, saying that Keling AI's monthly commercial revenue had exceeded ten million yuan and that he was confident of rapid revenue growth next year.
Jimeng, which started strong but has since lost momentum, aims to offer a new way to create and experience content. According to the research summary, Jimeng has no clear commercial revenue target for next year; instead, it aims to establish a business model. "Profitability will come later." Next year, Jimeng will focus on putting the product into practice, for example through collaborations with media and film production.
Douyin was not the first short-video platform, but it overtook Kuaishou in 2018 to lead the short-video market. Zhang Nan once attributed its rise to four key factors: full-screen HD, music, special-effects filters, and personalized recommendation algorithms.
Now, with Jimeng facing off against Keling, can ByteDance repeat the story of Douyin overtaking Kuaishou?
Jimeng is currently just one of ByteDance's products at the application layer of multimodal large models. It belongs to Douyin's Jianying team and is supported by Volcano Engine, ByteDance's cloud service. On Volcano Engine's model marketplace, ByteDance offers 20 large model products spanning text, speech, vision, and other modalities, and the Volcano Ark platform also hosts models from Moonshot AI and Zhipu AI.
AI could become ByteDance's next core business pillar. By contrast, Kuaishou Magnetic Engine's official website showcases few, if any, commercial applications of large models.
ByteDance's aggressive push into large models was already on display this year with its consumer (C-end) product Doubao. In September, the mobile data research firm Sensor Tower released a global AI app report showing that ChatGPT was the most downloaded AI app worldwide from January to August, with Google's Gemini ranking fourth and ByteDance's Doubao fifth, the only Chinese product on the list.
That ranking owes much to Douyin's abundant traffic and heavy advertising and investment support. Doubao and Kimi have competed fiercely in the advertising market this year. According to AppGrowing, an advertising intelligence platform, Doubao Smart Assistant's ad spending was nearly 18 million yuan across April and May and soared to 124 million yuan in early June, while Douyin restricted advertising for large model products, including Kimi, on its platform.
"Jimeng's marketing budget will start increasing in December, and in the first quarter of next year, especially around the Spring Festival, it will reach hundreds of millions of yuan," the ByteDance research summary states. Beyond marketing, ByteDance's chip reserves should not be underestimated. According to the Financial Times, ByteDance has purchased about 230,000 NVIDIA chips, making it the largest Chinese buyer of NVIDIA AI chips. The Information also reported in September that ByteDance had ordered more than 200,000 NVIDIA H20s this year.
Given Doubao's leading position in China's large model market, the focus going forward will be on how Douyin and Doubao work together with Jimeng. That also means Keling, Kuaishou's standout product, will face a siege from ByteDance's large models.
In September, ByteDance released two video generation models, PixelDance and Seaweed, aimed squarely at OpenAI's Sora. Jimeng AI is already integrated with Doubao, and one of the models behind Jimeng is the more capable PixelDance. According to the official introduction, it can generate high-quality 1080p videos up to two minutes long and excels at depicting complex movements and interactions between objects.
At present, Douyin and Kuaishou are converging on the same main use cases for AI-generated video. Besides charging consumer (C-end) users, the business (B-end) scenarios include serving film and TV production and post-production, such as short dramas, and serving advertising and e-commerce content marketing, such as generating varied visuals for product displays.
At the Volcano Engine Conference, Zhang Nan showed two AI short films made by Jimeng creators. One was the science fiction short drama 'Awakening,' released in July, which received over 400,000 likes on Douyin in a single day. Around the same time, Kuaishou's Keling produced 'Mountain and Sea Mirror of Riding the Waves.' Both were experimental productions.
For now, however, AI-generated video plays only a supporting role in film and television production and is still produced on a small scale. To handle large-scale film and TV post-production, Jimeng and Keling are both developing along lines similar to Sora's DiT (Diffusion Transformer) architecture, which combines a diffusion model with the Transformer architecture for image and video generation. Both still have a long way to go, and commercialization remains premature.
After OpenAI opened Sora to the public, many of the videos it generated fell short of expectations. Meanwhile, Google's recently released video generator Veo 2 has outperformed Sora in a series of tests.
In a widely circulated tomato-slicing test, Google's Veo 2 has the knife slice cleanly through the tomato while avoiding the fingers, whereas the knife in Sora's video cuts straight through the hand. Sora once again became the subject of ridicule, and industry insiders concluded that Sora leans toward motion while Veo 2 prioritizes physical accuracy.
Some AI industry insiders attribute Google's lead over Sora not only to its having identified Sora's weakness in physical accuracy but also to its use of YouTube to train its AI models.
ByteDance's technical team was well aware of Sora's weakness in physical accuracy. In November, the Doubao large model team published a paper titled 'How Far is Video Generation from World Model: A Physical Law Perspective,' which explores whether video generation models can observe how objects relate to one another and extract stable physical laws from those observations.
"Visual ambiguity can lead to significant errors in fine-grained physical modeling, and relying solely on video representation is not sufficient for accurate physical modeling," the paper argues, concluding that video generation models still face substantial challenges before they can become accurate world models.
The two authors behind this research are both very young: one was born after 1995, the other after 2000. Just as Jimeng and Keling rely on young art students to build an imaginary world, the technical foundations of this AI-built imaginary world are also being laid by young minds. The two authors spent eight months searching for a path toward world models.
Identifying the bottleneck took eight months; breaking through it may take even longer.
When will Jimeng truly become the next Douyin? According to the research summary, ByteDance will pursue three main AI paths next year: first, the Doubao family ecosystem; second, full AI integration into products such as Douyin; and third, multimodal models and world models, including Jimeng. The multimodal path is the priority, receiving "unlimited support and investment, because it is an important node for transformation, and larger losses are acceptable."
Google's win over Sora signals that the myth of OpenAI's models is about to be broken, and Kuaishou's Keling is merely the next target for Douyin's Jimeng.