Sora gets nicknamed "futures" as China's homegrown Sora rivals play a game of "turn left, turn right"

08/29/2024

Produced by | Entrepreneurs Frontline

Art Editor | Li Yufei

Auditor | Song Wen

ChatGPT set off a new wave of AI, and Sora stirred up another wave of interest in large text-to-video models. Today, AI video generation has entered a stage of commercial competition, with each player showcasing its own strengths.

Notably, two mainstream business models have emerged in the race to commercialize, and both are drawing heated discussion. Kuaishou's Keling focuses on consumers, while Xinyi Technology focuses on businesses, and each has shown remarkable momentum in its field.

That raises a question: what business logic and distinctive advantages lie behind each? On closer inspection, one conclusion stands out. In this race, "all roads lead to Rome."

1. In the race for commercialization, those who cannot deliver have no future

Sora has dug a hole in AI video generation, and now it has to be filled. "Burning money" is a problem no player in this field can ignore.

Sora's arrival overshadowed established text-to-video companies such as Runway and rising startups such as Pika. Even Elon Musk, a long-time critic of OpenAI, publicly conceded defeat, posting "gg humans."

Yet to this day, Sora remains closed to the public. Its output is more impressive than almost any competitor's, but because it has not been commercialized, some mock it as a "futures" product: something promised for later delivery rather than available now. In that sense, Sora has dug a deep hole for the whole industry.

OpenAI did not set out to create this situation. Behind the glamour lie two problems: the risk of blowing past its budget, and uncertainty over whether a viable commercial path exists.

While Sora has stayed closed, OpenAI has been burning through money at an alarming rate. Recent reports indicate that OpenAI could lose as much as $5 billion this year and may run through its cash reserves within the next 12 months. It urgently needs new financing to keep going.

(Image from Shutterstock, based on VRF agreement)

This spending mainly reflects OpenAI's large language model business, driven by fierce competition among free models and ever-larger training runs. Text-to-video generation demands far more computing power than text generation, so it requires even more money.

Even a player as formidable as OpenAI has yet to find an effective commercial path for text-to-video generation. The challenge facing everyone else is therefore clear: revenue is not covering enormous costs.

AI-generated video that cannot be commercialized has no future. This harsh reality is increasingly recognized by players in the field.

Business history bears this out. Consider smart homes, another industry whose time eventually came. In 1990, Microsoft co-founder Bill Gates began building a "House of the Future," which took seven years to complete. It embodied his vision of smart home living, but at the time the outside world was unimpressed.

(Image from Shutterstock, based on VRF agreement)

In 1995, Gates published the best-selling book "The Road Ahead," which set out his vision for smart homes. Over the following three decades, through the iterations of Smart Home 1.0, 2.0, and 3.0, that vision finally became reality. Along the way, countless companies failed, and those that survived did so by commercializing.

At the end of last year, a court declared Ayla Networks bankrupt. Once hailed as a pioneer of IoT cloud platforms, it never managed to commercialize. By contrast, giants such as Huawei, Xiaomi, and Haier have thrived by commercializing.

When Sora debuted, it was hailed as ushering in the "ChatGPT era of video." Without commercialization, however, that era cannot truly be said to have arrived.

2. China's AI video industry plays a game of "turn left, turn right"

In today's crowded AI video landscape, nearly every player except Sora is racing to commercialize. Since June this year in particular, Chinese players have turned more "pragmatic" and put commercialization first.

The outcome is still uncertain. Even so, some players have positioned themselves strategically and begun to emerge as leaders that set the trend for the industry.

Two mainstream business models have emerged in China's AI video sector. The first, represented by Kuaishou's Keling, targets consumers (the C-end). The second, led by Xinyi Technology, targets business clients (the B-end).

Let's start with Keling. Launched by Kuaishou in June this year, Keling is a Chinese-developed large model for AI video generation, billed as a rival to Sora. As soon as it was released, it opened for public testing inside Kuaishou's video editing app, Kuaishou Video Editor.

Keling uses a consumer-focused business model. In text generation, free access is the norm, but Keling charges users and offers monthly, quarterly, semi-annual, and annual subscription plans. Subscribers choose among three membership tiers. On the monthly plan, prices of $10, $37, and $92 buy 660, 3,000, and 8,000 "inspiration points," respectively. That is enough to generate 66, 300, or 800 high-performance 5-second videos.
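The tier figures above imply a flat rate of 10 inspiration points per 5-second video. Moving up a tier therefore lowers only the price of each video, not the points each video consumes. The short Python sketch below works through that arithmetic using only the monthly figures quoted in this article. The helper name `per_video` is hypothetical, and the dollar amounts are taken as given rather than checked against Kuaishou's current price list.

```python
# Monthly Keling membership tiers as quoted in this article:
# (price in USD, inspiration points, high-performance 5-second videos)
TIERS = [
    (10, 660, 66),
    (37, 3000, 300),
    (92, 8000, 800),
]


def per_video(price: float, points: int, videos: int) -> tuple[float, float]:
    """Hypothetical helper: return (points per video, USD per video) for one tier."""
    return points / videos, price / videos


for price, points, videos in TIERS:
    points_each, cost_each = per_video(price, points, videos)
    print(f"${price}/month: {points_each:g} points per video, "
          f"about ${cost_each:.3f} per video")
```

On these figures, the per-video cost falls from about 15 cents on the cheapest tier to about 11.5 cents on the top tier. That amounts to a modest bulk discount rather than a different pricing unit.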

The model has drawn a large number of early adopters. Public data show that by July 30, more than one million people had applied to test Keling. That same day, Kuaishou announced it was opening Keling's AI beta to users worldwide, a significant breakthrough in the consumer market.

Meanwhile, Xinyi Technology's business-facing video model has also been advancing rapidly.

Many players are vying for consumers, but few target business clients, because the barriers to adopting large models in enterprise settings are high. Even Sora, the star of the field, has not yet moved toward B-end deployment.

So, why did Xinyi Technology choose this path?

Last July, Xinyi Technology launched the Xinyi Video Large Model, which it bills as China's first large generative AI model dedicated to video. In January this year, the model was registered under the "Interim Measures for the Management of Generative Artificial Intelligence Services." It became the first video AI large model in China to obtain such registration.

On July 5, at the 2024 World Artificial Intelligence Conference, Xinyi Technology unveiled version 2.0 of its large model. The upgrade can automatically adjust video pacing, shot transitions, and more, giving users greater control over the output. It can also generate longer, higher-quality videos with realistic simulation of scenes and character movements, producing more vivid and natural content.

(Image from Xinyi Technology's official Weibo account)

Around the Xinyi Video Large Model, the company has also built several products and services. These include AI-native application platforms such as "One Frame Per Second Creation," digital human platforms such as "Xinyi Digital Human," and other AI offerings.

Guided by the belief that "products should not just live in the lab," Xinyi Technology has deployed solutions across many industries. They include tourism, finance and insurance, media, marketing, publishing, government affairs, education, automotive, and healthcare. Figures released in early July show that platforms built on the Xinyi Video Large Model have more than 3 million users and generate 120,000 minutes of video every day.

Take finance and insurance as an example. Insurance agents have traditionally come from widely varying backgrounds with uneven knowledge, which limited how efficiently they could serve clients. "People from different industries can't possibly create content every day, yet communication is key. Our goal is to use our large model to help agents raise both productivity and quality within limited time," Mao Muzi, Vice President of Xinyi Technology, told "Interface News – Entrepreneurs Frontline."

He added, "Agents used to produce only one video a day, but with our product they can create five, 50, or even 100 while maintaining quality." In his view, efficiency is decisive in a competitive market. "Insurance agents, for instance, need to offer suitable insurance options promptly based on each client's needs. In the past, this depended on the agent's own knowledge and experience. Now, with the large model, human interpretation is replaced by model interpretation and generation."

(Image from Shutterstock, based on VRF agreement)

In marketing, the efficiency gains are even more pronounced. Marketers must calculate ROI and pick out high-potential customers from large pools of leads, a task that matters to every company. In the past, this work was inefficient and relied on manual screening online and door-to-door visits offline. With the large model, marketers can segment and tag leads, then apply different strategies and materials to each group. The result is more efficient communication with clients.

"With AIGC tools, a time-consuming task becomes simple. Users make one decision, and the large model handles the rest," said Mao Muzi. AIGC also significantly reduces production costs. "In theory, the large model can generate diverse content to meet all kinds of needs," he added.

Consider building a personal brand ("IP") for corporate executives. Executives typically struggle to carve out time and keep their content consistent, since their focus is on strategy and operations. "AIGC can help. Executives or founders only need to approve the content, and the team handles the rest, so the IP can be built quickly," Mao explained.

Clearly, business users choose the Xinyi Video Large Model mainly for the convenience and efficiency it brings to video processing, analysis, and application.

According to Mao Muzi, Xinyi Technology's business scope keeps expanding. In finance and insurance, it covers insurance consulting and digital human insurance advisors. In digital government, it covers digital hotlines and consultation desks. In media, it covers digital hosts and more efficient news production.

How, then, does Xinyi Technology handle the challenges of the B-end market?

"Our strength lies in segmenting and analyzing source material, especially in working out the logic of a video script. Other companies offer simple content processing. Our edits have a clear storyline and narrative logic, which makes them suitable for a wider range of use cases," Mao explained.

Thanks to their notable commercial progress, both Kuaishou's Keling and Xinyi's video model have become industry leaders. They target different markets, the C-end and the B-end, and so embody the "turn left, turn right" dynamic. Though they head in opposite directions, each has become the benchmark for its own business model.

3. All roads lead to Rome; the future is promising

The smart home Bill Gates envisioned took 30 years to go from dream to reality. Gates's latest assessment of the AI revolution is that widespread adoption of AI won't happen overnight, but it won't take a decade either.

Pioneers willing to take risks and experiment, such as Kuaishou's Keling and Xinyi Technology, are accelerating the commercialization of AI video.

Both players aim at "unlocking AIGC video productivity," and their divergent business models ("turn left, turn right") have not held either of them back. What does each bring to the table, and what have they done right?

(Image from Shutterstock, based on VRF agreement)

Text generation has seen cutthroat competition, whereas in AI video many major players have held back. Keling and Xinyi Technology, however, seized the opportunity and went all in.

Once their strategic direction was set, building quality products became the real foundation for both companies. Every player building a video generation model must prioritize four core elements: model design, data security, computational efficiency, and the expansion of model capabilities.

Keling draws on Kuaishou's vast user base and on the platform's pressing need to upgrade its existing businesses with AI. In the AI era, Kuaishou has to prioritize expanding its territory and building a moat.

Xinyi Technology is just over two years old, yet it has technology, data, and resource advantages rare among startups. It grew out of Yixia Tech, a veteran of the video industry. From Yixia Tech, it has inherited extensive video resources, an experienced team, and operational know-how.

(Image from One Frame Per Second Creation's official Weibo account)

Its founder, Han Kun, is a well-known figure in the video industry. He has led products and companies including Ku6, Miaopai, Xiaokaxiu, and Yizhibo. Under his leadership, they carried through the PC video era, the short-video era, and the mobile internet era, consistently staying ahead of industry trends.

In the new era of video AIGC, Xinyi Technology has drawn on years of accumulated experience. Through a low-key, pragmatic approach, it has become a B-end leader in AI video.

(Image from One Frame Per Second Creation's official Weibo account)

OpenAI describes Sora as a promising path toward building general-purpose simulators of the physical world and an important milestone on the way to AGI.

In essence, Keling and Xinyi Technology represent the two mainstream business models, and each, in its own way, is helping to fill the hole Sora dug.

Beneath their apparent divergence ("turn left, turn right"), both are working to bring the seemingly distant vision of "building a general simulator of the physical world" closer to reality. Past business cases offer a reminder: technology that stays in the lab has no future, and only broad commercialization can secure a promising outlook.

Whether they serve the C-end or the B-end, Keling and Xinyi Technology will ultimately arrive at the same destination and open the "Pandora's box" of the AI era.

*Note: The header image of this article comes from Shutuku, used under the VRF agreement.

Statement: The copyright of this article belongs to the original author. It is reprinted solely to share information more widely. If the author attribution is incorrect, please contact us immediately so we can correct or delete it. Thank you.