04/10 2026

Source | Bohu Finance (bohuFN)
“Have you ever saved a fox on a snowy mountain?” The AI short film “Rescuing a Fox on the Snowy Mountain,” styled after Shaw Brothers martial arts films, recently went viral across the internet, and its rise coincided with the release of Seedance 2.0.
In February this year, ByteDance launched the AI video generation model Seedance 2.0, which quickly went viral for its multi-shot storytelling, higher image fidelity, and ability to produce cinematic-quality videos in 60 seconds. Many film and television professionals even praised it as “director-level AI.”
However, the flip side of this “ease of use” is that massive user demand has overwhelmed the servers. Many content creators report waiting times of 8-10 hours for Seedance 2.0.
Sora, on the other side of the ocean, is equally troubled by computing power constraints. This product, once hailed as a “milestone,” was suddenly shut down on the eve of OpenAI's IPO. Sora's excessive consumption of computing power without achieving profitability is believed to be the trigger for its shutdown.
The pitfalls Sora encountered have not gone unnoticed by domestic AI video generation models. Recently, Volcano Engine announced that the Seedance 2.0 API is open for public beta applications from enterprise users, accelerating its push for profitability on the B-side.
In 2026, as Token usage surges, the AI video sector is also mired in disputes over Token utilization. From the arms race in parameters to the fierce battle for commercial monetization, domestic video large models are also learning to “keep accounts.”
01 The “Cost Account” Doesn't Add Up
Since its release, the Seedance 2.0 video generation large model has remained highly popular. With its simple creative approach and realistic visual effects, it has become the tool of choice for many content creators and short drama teams.
Tan Dai, President of Volcano Engine, stated that producing a high-quality animated drama used to cost over 10,000 yuan per minute, but now, with Seedance 2.0, the cost can be reduced by 4,000-5,000 yuan per minute.
However, content creators soon found themselves trapped in the nightmare of “queuing.” On social media, many users complain about queues of up to 100,000 people. Generating a 15-second video can take several hours of waiting, and there's no guarantee of success.

Being “good to use” and being “available to use” sit at opposite ends of the scale, and ByteDance has had to choose. Some creators note that the “queuing” situation for Seedance 2.0 has recently improved, but at the cost of a simultaneous drop in video quality, a fairly serious case of “diminished intelligence.”
According to China Entrepreneur, a source close to Volcano Engine revealed that Seedance is reallocating computing power weights. By reducing the computing power allocated to individual tasks and the precision of model operations, it aims to accommodate more simultaneous users.
For example, the fast version of Seedance 2.0 can still clearly present subject structures and basic camera movements. However, when faced with multi-person interactions or detailed texture requirements, it is more prone to physical distortions, making it better suited to simple, single-subject tasks.
The impact of the transition from the standard version to the fast version may be minimal for ordinary users. However, for short drama and animated drama professionals, what used to require generating two or three videos to find one usable piece of content now requires seven or eight attempts.
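The retry arithmetic in the paragraph above can be sketched roughly as follows. This is a minimal illustration, not a description of any actual billing: it assumes every attempt is billed, takes the midpoints of the quoted ranges (two or three attempts, seven or eight attempts), and uses hypothetical function names.

```python
# Rough sketch: how many attempts a usable clip takes, and how much cheaper
# a fast-version attempt would need to be to break even.
# Assumption (not from the source): every attempt is billed, usable or not.

def attempts_midpoint(low: int, high: int) -> float:
    """Midpoint of a quoted range such as 'two or three attempts'."""
    return (low + high) / 2


def cost_per_usable_clip(attempts_needed: float, cost_per_attempt: float) -> float:
    """Expected spend to obtain one usable clip."""
    return attempts_needed * cost_per_attempt


standard_attempts = attempts_midpoint(2, 3)  # standard version, per the text
fast_attempts = attempts_midpoint(7, 8)      # fast version, per the text

# How many times more attempts the fast version needs per usable clip.
attempt_multiplier = fast_attempts / standard_attempts

# A fast attempt must cost at most this fraction of a standard attempt
# for the cost per usable clip to stay the same.
break_even_price_ratio = standard_attempts / fast_attempts

print(f"Attempts per usable clip: {standard_attempts} -> {fast_attempts}")
print(f"Attempt multiplier: {attempt_multiplier:.1f}x")
print(f"Break-even price ratio: {break_even_price_ratio:.2f}")
```

Under these assumptions, a professional team would need roughly three times as many generations per usable clip, so the fast version only saves money if each attempt costs about a third of a standard one or less.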
Affordability always comes at a price, and cost is an unavoidable issue.
According to AIPress in November 2025, generating a 10-second video with Sora costs approximately $1.30. Yet on mainstream third-party platforms, the current price for generating a 10-second Sora video can be as low as about $0.10 per use. On the official channel, a ChatGPT subscription with Sora functionality starts at just $20 per month.
Analysis firm SemiAnalysis estimates that if 3 million daily active users each generate one video per day, Sora's daily cost would reach $3.9 million. Adding in GPU leasing, electricity expenses, inference costs, etc., the daily operational cost would be around $15 million.
Calculated annually, Sora would burn through more than $5 billion, equivalent to a quarter of OpenAI's annual revenue. Such a business model indeed struggles to be associated with “profitability.”
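The estimate above can be reproduced with simple arithmetic. The sketch below uses only the figures quoted in the text (the AIPress per-video cost and the SemiAnalysis usage scenario). The function names are illustrative, and the 365-day year is an assumption.

```python
# Back-of-the-envelope check of the Sora cost figures quoted in the text.

COST_PER_VIDEO_USD = 1.3          # approx. cost of one 10-second video
DAILY_ACTIVE_USERS = 3_000_000    # scenario quoted in the text
VIDEOS_PER_USER_PER_DAY = 1
FULL_DAILY_COST_USD = 15_000_000  # quoted total daily operational cost


def daily_generation_cost(users: int, videos_per_user: int, cost_per_video: float) -> float:
    """Daily spend on generation alone."""
    return users * videos_per_user * cost_per_video


def annual_cost(daily_cost: float, days: int = 365) -> float:
    """Annualized cost, assuming the daily figure holds all year."""
    return daily_cost * days


generation = daily_generation_cost(
    DAILY_ACTIVE_USERS, VIDEOS_PER_USER_PER_DAY, COST_PER_VIDEO_USD
)
yearly = annual_cost(FULL_DAILY_COST_USD)

print(f"Daily generation cost: ${generation / 1e6:.1f} million")
print(f"Annual operational cost: ${yearly / 1e9:.3f} billion")
```

The first figure matches the $3.9 million daily cost, and the second, roughly $5.5 billion, is consistent with “more than $5 billion” a year.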
Despite heavy cost pressures, large model companies dare not slacken in the “technology race,” because falling behind in a contest where the state of the art is refreshed every month means being forgotten by the market.
However, this “technology-first” mindset has trapped the industry in a vicious cycle of “increasing competition, increasing losses.”
From the outset, Sora aimed to build a “physical world simulator,” but this grand vision clashed with the commercial need for “affordability and ease of use.” Coupled with persistent copyright issues, the costs of video generation models have become increasingly uncontrollable.
According to Appfigures data, Sora's cumulative revenue over its entire lifecycle is just $2.1 million, far from sufficient to support OpenAI's technological ambitions.
Beyond “showing off,” “survival” is the more critical issue.
02 “Crossing the River” by Feeling for Sora's Stones
With Sora's exit, several questions linger for the entire AI video sector:
How can copyright issues be resolved? Where is the commercial value of AI-generated videos? And how can the “more users, higher costs” death spiral be avoided?
Sora didn't have time to answer these questions, but Seedance and Kling are already feeling their way forward.
Kling is moving faster. From the outset, Kuaishou has clearly focused on the content ecosystem, targeting a broad user base and achieving a closed-loop process from technical capabilities to content participation and then to paid conversion.
Currently, Kling's revenue streams mainly consist of two paths:
One is centered around P-side users (professional creators), high-paying clients with demanding video quality requirements. The other is B-side revenue, including marketing agencies like BlueFocus, hardware manufacturers like Vivo and Lenovo, and gaming companies like 37 Interactive Entertainment, with over 20,000 enterprise clients.

Kling positions itself as a widely usable “productivity engine.” It offers API access to both individual developers and large enterprises without discrimination, allowing ecosystem partners to handle scenario implementation and commercial monetization independently. Meanwhile, Kling adjusts its cost structure through tiered pricing to better disperse policy and copyright risks in single markets.
For ordinary creators, the entry threshold for Kling's Gold Membership is just 66 yuan per month, comparable to competitors. However, for professional advertisers and film and television companies, the annual fee for the Black Gold Membership reaches 11,079 yuan, far exceeding Jimeng's 5,199 yuan per year.


(Top: Kuaishou membership pricing; Bottom: Jimeng membership pricing)
If Kling follows an “open to all” path, Seedance adopts a “selectively open” strategy. Leveraging ByteDance's content distribution advantages, it focuses on integrating the upstream and downstream segments of the film and television industry chain.
ByteDance owns film and television production tools like Jimeng, CapCut, and Lark AI, as well as content platforms like Douyin, Hongguo, and Tomato Novels. The former possesses comprehensive model capabilities, while the latter provides IP reserves and distribution channels, forming a commercial closed loop from AI content production to traffic monetization.

On this basis, Seedance has further established deep collaborations with leading film and television production companies like Huace Film & TV, Linmon Media, and China Literature, exchanging technological empowerment for joint development rights to high-quality IP.
However, with the rapid explosion of AI-driven film and television, copyright disputes and insufficient computing power have gradually surfaced. ByteDance has had to adjust its strategy and actively screen clients.
Recently, Volcano Engine announced that Seedance 2.0 will open for public beta testing among enterprise users, further accelerating commercialization. However, the usage threshold for ordinary creators has also been raised.
According to Volcano Engine's announced pricing, requests that include video input cost 28 yuan per million tokens, while requests without video input cost 46 yuan per million tokens. Generating a 15-second video requires approximately 300,000 tokens, which works out to roughly 1 yuan per second. According to Guoyuan Securities, API prices for mainstream domestic and international models generally range from 0.2 to 1 yuan per second.
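The per-second figure follows from the token prices. The sketch below is a rough check that uses the quoted prices and the approximate 300,000-token estimate. Applying that same token count to both pricing tiers is an assumption made here for comparison, and the function names are illustrative.

```python
# Rough check of the per-second cost implied by the quoted token pricing.

PRICE_WITH_VIDEO_INPUT = 28.0     # yuan per million tokens
PRICE_WITHOUT_VIDEO_INPUT = 46.0  # yuan per million tokens
TOKENS_PER_CLIP = 300_000         # approx. tokens for a 15-second video
CLIP_SECONDS = 15


def clip_cost_yuan(tokens: int, price_per_million: float) -> float:
    """Cost of one clip at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million


def cost_per_second(tokens: int, price_per_million: float, seconds: int) -> float:
    """Cost per second of generated video."""
    return clip_cost_yuan(tokens, price_per_million) / seconds


for label, price in [
    ("with video input", PRICE_WITH_VIDEO_INPUT),
    ("without video input", PRICE_WITHOUT_VIDEO_INPUT),
]:
    total = clip_cost_yuan(TOKENS_PER_CLIP, price)
    per_second = cost_per_second(TOKENS_PER_CLIP, price, CLIP_SECONDS)
    print(f"{label}: {total:.1f} yuan per clip, {per_second:.2f} yuan per second")
```

At 46 yuan per million tokens, a 15-second clip comes to about 13.8 yuan, or roughly 0.92 yuan per second, which sits near the top of the 0.2 to 1 yuan range cited for mainstream models.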

Additionally, enterprise clients are required to sign a minimum-commitment agreement, pay a 1 million yuan deposit, and assume copyright risks themselves. Users who do not sign such an agreement cannot access benefits such as high concurrency, real human face authorization, or custom virtual avatar libraries.

While Kling accelerates its openness and Seedance narrows its focus, their goals align—the current value of AI video models lies not in how many users they can attract but in how much incremental value they can bring to the entire ecosystem.
03 Has the Profitability Turning Point Arrived?
Having steered around the pitfalls Sora encountered, domestic AI video models are beginning to show signs of commercial traction.
As of January this year, Kuaishou's Kling has exceeded $300 million in ARR, with revenue expected to double this year. In 2025, MiniMax's AI-native product revenue grew from $21.805 million to $53.075 million, driven by the continued promotion of products like Hailuo AI.
However, a wide gap remains between “commercialization” and “profitability.”
On the one hand, a high ARR doesn't necessarily mean profitability. It is well known that video generation is a major consumer of Tokens, even more so than some Agents performing simple tasks.
Video generation model companies at home and abroad have not yet disclosed specific cost data. However, given ByteDance's capital investment in the tens of millions and Kuaishou's capital expenditure of approximately 26 billion yuan expected in 2026, spending on this scale cannot be covered by ARR in the billions.
Kuaishou's CFO Jin Bing revealed during last year's second-quarter earnings call that Kling AI has achieved positive gross margins at the inference computing power level. However, when considering capital expenditures and training investments, video generation large models are still a long way from profitability.
At a recent Volcano Engine roadshow in Wuhan, when asked about profitability and commercialization goals, Volcano Engine President Tan Dai responded, “We haven't made a three-year profitability plan yet.”
In the short term, balancing the computing power costs and commercial revenue of video generation large models seems difficult. During the “market capture” phase, expanding the ecosystem is the top priority, as the ecosystem's scale will determine the speed of large model commercialization.
On the other hand, while B-side willingness to pay exists, customer unit prices are not high enough.
Currently, video generation large models primarily focus on commercializing enterprises with strong “willingness to pay” for AI videos, such as short dramas, animated dramas, advertising, and e-commerce.
However, it's important to note that these enterprises primarily use large models for “cost reduction and efficiency enhancement.” If large model companies significantly raise usage thresholds or if more suitable large model products emerge, many small and medium-sized enterprises may be deterred.
According to multiple media reports, the leading short drama companies that were among the first to adopt Seedance 2.0 top up their accounts by 2-3 million yuan per month. When the cost-effectiveness of the offering diminishes, the appeal of large models naturally declines.
Therefore, the current competition among video generation large models is no longer purely about technical capabilities. Technical capabilities determine the product's floor, but to raise the ceiling, differentiation and the commercial ecosystem behind the product matter even more.
For example, vertical players like Runway, Aishi, and Shengshu find it difficult to compete with major companies on R&D investment and ecosystem resources. So while optimizing model capabilities, they also strive to make their products function more like production tools for the film and television industry.
Aishi's newly launched model, PixVerse V6, can generate complex special effects scenes and special shots with simple prompts. Shengshu has released a general world model strategy to improve the stability and authenticity of long shots. Vertical players are attempting to carve out their ecological niches in the industry with differentiated capabilities.

ByteDance has its approach, Kuaishou has its rhythm, and vertical players have their survival space. Before the industry's profitability turning point truly arrives, “showing off” is no longer the only answer in the AI video sector.
Only by transforming technological advantages into differentiated capabilities, workflow efficiency, and verifiable revenue models, and by seeking commercial value in real-world application scenarios, can Chinese large model companies forge a differentiated path and emerge as survivors in this money-burning competition.
AI video may not be a “sexy” business, but its imaginative potential remains highly attractive.