12/18/2024
Written by | Hao Xin
Edited by | Wu Xianzhi
Taken together, recent news about AI video tells a thought-provoking story.
The arrival of Sora did not set off an explosion of popularity; instead, it drew complaints and dissatisfaction. For competitors such as Conch, this news brings a sigh of relief.
According to November data from AI Product Rankings, growth in visits to the major AI video products has slowed sharply and in some cases turned negative. Conch's visits grew 39.33%, while Keling's website visits fell 29.99% and Pika's fell 48.19%. The previous month's growth, by contrast, was astonishing: in October, Conch's visits rose 2,772.92% and Pika's rose 787.65%.
Sources say ByteDance has raised the internal priority of its Jimeng product, aiming to build an 'AI version of Douyin' through a new approach. At the same time, the company plans to devote more resources to multimodal product forms and to optimize its large visual generation models for Jimeng's needs.
The AI video sector currently faces several challenges. First-mover advantage is not evident, the market is still in its infancy, and rankings reshuffle frequently, so OpenAI has been unable to overwhelm the field from its position of technical superiority. No player has yet built an absolute technical moat or a differentiated niche, so user loyalty is weak: each launch or new release produces a brief 'highlight moment' and then traffic declines. Furthermore, the target audience remains unclear, and only the top 10% of users are currently using these tools.
An industry insider told Photon Planet, "We're still in the 'model as product' stage. As with general-purpose AI assistants, the main problem with AI video generation tools is that their use cases are too generic. Ordinary consumers have no established habit of using them, and generating a video is like drawing a lottery ticket."
Even ByteDance's Jimeng and Kuaishou's Keling are currently positioned as 'general-purpose tools' and have not been integrated into their companies' broader product ecosystems. ByteDance's and Kuaishou's moderate level of investment has inadvertently opened a window of opportunity for startups. However, as Jimeng's priority rises, new variables may emerge.
Integrating Jimeng, Jianying, and Douyin could pave the way for a rapid move toward an AI version of Douyin. Douyin's vast existing user base and consumer (C-end) market are out of reach for most startups. And even before this, the seeds of AI had already been planted in Douyin.
Not 'Sora,' but 'Jianying'
Sora first drew attention for its innovative DiT (Diffusion Transformer) model technology. Now even OpenAI has shipped a complete product. In this latest update, Sora's underlying model capabilities are not particularly impressive, but it has built a comprehensive AI video production workflow that gives users substantial editing control.
AI video is still in an exploratory phase, and two possible differentiation paths are emerging for Sora-like products: going professional with an AI version of 'Premiere,' or going mass-market with an AI version of 'Jianying.'
Photon Planet understands that although most companies run a 'Super Creator Program' that invites professional creators to test advanced features and showcase the model's capabilities, the features they ultimately release follow the principles of simplicity and speed. Jimeng and Keling in particular are designed to lower the barrier to entry, so that novice users can type a prompt and immediately generate a short video.
Judging from this, an AI version of 'Jianying' may be the direction most players are pursuing.
The latest data show that Jianying and CapCut grew revenue by more than 100% in 2024, with combined revenue approaching RMB 10 billion. In addition, Jianying and CapCut together have surpassed 800 million monthly active users worldwide.
An often-overlooked fact is that ByteDance's Jianying has never been a standalone product; its strength lies in the combination of 'Douyin/TikTok + Jianying/CapCut.' With the arrival of Jimeng, a closed loop forms in which Jimeng generates content, Jianying edits it, and Douyin consumes it, which underscores how much weight ByteDance places on this ecosystem.
General-purpose AI video tools are at a crossroads: startups are building on their strongest features and looking for further ways to differentiate. Some features may attract specific user groups; for example, organizing video generation by theme, such as realistic or fantasy styles, can reinforce how users perceive and label the product.
ByteDance has options too. Doubao represents an intermediate state, and Jimeng may not be the final product form, but both can serve as tools and entry points for attracting users. The repeated A/B testing along the way not only supplies data for training the underlying models but also incubates more AI products.
'Conchs' Dress Up Douyin
At the high end, AI video is seeking recognition from traditional, authoritative institutions; at the grassroots level, it is spreading much faster than expected.
In the second half of this year, AI videos spread virally across platforms such as Douyin, Kuaishou, and Xiaohongshu in several waves of popularity. A 'Venom' effect went viral on PixVerse, and the AI video 'Maling Fighting,' made to vent viewers' frustration while watching a variety show, spread rapidly; its watermark revealed that it was made with Conch. After the re-release of 'Harry Potter,' AI creations such as 'Wizard Cat' and 'Rapping Kitten' went viral both in China and abroad.
Users readily embrace the novelty and impact of AI, and some are even willing to pay for it, which has, for now, sustained a group of middlemen profiting from information arbitrage on Xiaohongshu and Xianyu.
The strongest competitor is Douyin. AI video creation tools, including PixVerse and Conch, have enriched Douyin's content. Popular AI effects and templates that prove themselves in the market soon appear on Jianying, where they are easier to use and can be published to Douyin with one click, so users never learn where they originated. For instance, the overseas hit 'Rapping Kitten' was remade by netizens as 'May God Bless Cat Food and Freeze-Dried Food,' and the corresponding Jianying template once topped the popularity chart.
For now, AI has found an intermediate form on Douyin: AI effects. Materials and templates such as 'AI Pets Dancing Like They Ate Mushrooms' and 'Everything Can Be an AI Wool Curl' remain highly popular. As of December 9, nearly 3 million people had used the 'Everything Can Be a Wool Curl' template.
Clearly, even before Jimeng is deeply integrated with Douyin, Jianying has already taken on some AI video functions. As a result, ordinary users form their impression of AI video mainly through large platforms such as Douyin and Kuaishou.
AI applications delivered as mini-programs face the same dilemma. Meituan's 'Miaoshua AI,' Jieyue's Soul Extractor, and AI Music Dubber offer momentary entertainment but struggle to sustain long-term use and spread. Users may simply forget to open these mini-programs, whereas Douyin and Jianying provide ready entry points and help build user habits.
Perhaps aware of these disadvantages, AI tools such as Conch and PixVerse have expanded overseas. At the same time, Photon Planet has recently noticed that Conch is also stepping up its investment in consumer marketing and user acquisition.
The saying 'AI was the first to make Bilibili money' keeps proving true. Following its campaign for the Conch AI Assistant, MiniMax has launched a new round of advertising for Conch's video platform. We found that some Bilibili film and TV uploaders have taken on Conch advertisements, and exaggerated selling points such as 'superior image understanding ability' and 'instruction following ability' appear in their satirical videos, where they look out of place.
MiniMax's approach has merit. Parody and meme culture are in Bilibili's DNA, which suits the kind of content current AI video models produce. Some of the earliest 'Maling Fighting AI Synthesis' and 'Maling Fighting Parody' videos originated on Bilibili. Judging from the comments, viewers were not put off; instead, they felt the AI format better expressed the emotions they had while watching variety shows.
Besides offering fertile ground, Bilibili's uploaders are also potential paying users. When making videos, some uploaders often lack footage and fall back on hand-drawn or animated substitutes; AI generation can make them more efficient.
Of course, copyright and regulatory issues may arise. Recently, the National Radio and Television Administration issued a notice ordering the cleanup of short videos that use AI to 'modify' film and TV dramas, and requiring strict enforcement of content review for generative AI.
Evolution of AI Video
Kuaishou's evolution from a GIF tool to short videos and then to a full-fledged short-video community platform offers lessons for today's AI video tools.
In the AIGC wave, text-to-image and text-to-video generation were among the first technologies to become usable. Early AI video relied on text-to-image and image-to-image generation, stitching many static frames into a few seconds of animation. Later came one-step text-to-video and image-to-video generation, roughly mirroring the path from GIFs to short videos.
The next stage is likely to involve a transition from general-purpose to vertical applications and from tools to platforms and ecosystems.
In fact, some startups have already recognized the limits of being a single 'tool' and hope to become a community of creators and content. Results so far have been underwhelming, however, and most players remain focused on building new features; a mature community takes time and a large accumulated user base. As noted earlier, even when users skip Jimeng in favor of startup products, large numbers of videos carrying 'Conch' and 'PixVerse' watermarks still end up on Douyin, Kuaishou, and Bilibili.
Nonetheless, this has not stopped new players from imagining what AI-native video tools might eventually look like, since even ByteDance itself is 'starting from scratch' in search of an AI version of Douyin.
Suno, which has evolved from an AI music production tool into an AI music application, offers a case of blending traditional products with AI. Opening Suno feels familiar: described in one sentence, it is 'NetEase Cloud Music' plus 'Douyin.' This familiar product form lowers the barrier to entry, and drawing on long-established habits, users quickly grasp its dual nature as music software and social platform.
AI shows up in the supply of content: all music on Suno is AI-generated, and the community content that once lived inside the tool naturally becomes music charts. The Douyin-style design completely removes the barrier between tool and mass-market application, letting even ordinary users create theme music by casually snapping a photo or uploading images and videos.
AI video is even more viral and shareable than music. The 'intermediate state' mentioned earlier in connection with ByteDance may have another interpretation: today's AI applications can be purely AI-native, traditional applications transformed by AI, or 'the most familiar stranger.'