Image courtesy of Visual China
Blue Whale News, December 23 (Reporter: Wu Jingjing) - Following a series of setbacks, including delisting, the once-prominent autonomous driving company TuSimple has embarked on a new entrepreneurial trajectory.
In August 2024, the company first hinted at a shift towards AIGC. Four months later, on December 17, TuSimple unveiled its new brand, CreateAI, and introduced its first large video generation model, "Ruyi".
Currently, the Ruyi-Mini-7B version is officially available on Hugging Face, allowing users to download and utilize it. It is understood that "Ruyi" is specifically designed to run on consumer-grade graphics cards (like the RTX 4090).
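For readers who want to try the open-weight release, below is a minimal sketch of fetching the model files from Hugging Face; the repository ID "IamCreateAI/Ruyi-Mini-7B" and the local directory are assumptions for illustration, and the official model card should be consulted for the actual ID and recommended inference code.

```python
# Minimal sketch: download the Ruyi-Mini-7B weights from Hugging Face.
# The repo ID "IamCreateAI/Ruyi-Mini-7B" and target folder are assumptions
# for illustration; check the official model card for the exact ID and
# the recommended way to run inference.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="IamCreateAI/Ruyi-Mini-7B",  # assumed repository ID
    local_dir="./Ruyi-Mini-7B",          # where to store the weights locally
)
print(f"Model files downloaded to: {local_dir}")
```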
Is the transition from autonomous driving to AIGC a seamless one?
Why did the company pivot from autonomous driving to the entirely different realm of visual large models?
In an exclusive interview with Blue Whale News, TuSimple's technical leader explained that the pivot was primarily driven by the company's strategic transformation and business development needs. On one hand, TuSimple had already accumulated experience in algorithms, computing power, and data within the AI sphere through its autonomous driving work. On the other hand, co-founder Chen Mo's resources in the gaming industry offered a path to putting the technology into practical use quickly.
Chen Mo also mentioned in a previous media interview that TuSimple was seeking a swifter path to "revive" the company using existing resources. With the continuous advancement and disclosure of visual model technology, AIGC emerged as the most viable option in terms of commercial potential and technical feasibility.
According to Blue Whale News, the TuSimple team working on visual models was previously the same team that developed autonomous driving technology, allowing for the reuse of some technical expertise.
TuSimple's technical leader elaborated that both AI visual models and autonomous driving rely on the "troika" of algorithms, computing power, and data to propel technological development. Both also heavily depend on extensive data for training and optimization. Video generation technology is quite similar to the perception module in autonomous driving, as both are primarily data-driven with relatively short research and development paths and clear technical foundations.
In his view, autonomous driving technology encompasses multiple algorithm modules such as perception, localization, planning, and control, spanning multiple domains including software systems, hardware design, and vehicle structure design. In contrast, the research and development path for video model technology is shorter, with a narrower technical scope primarily focused on data processing and model training.
"The significance of data even surpasses that of algorithms," emphasized the technical leader of TuSimple. He mentioned that TuSimple had accumulated substantial experience in data annotation within the autonomous driving field, with a self-built annotation team and platform, and a comprehensive data processing workflow. "These experiences and tools can be directly applied to the data preparation stage of video models, saving considerable time and cost."
Nevertheless, in many aspects of visual model technology and effects, TuSimple still needs to explore from scratch. Currently, the company's video models prioritize five key indicators: generation quality, consistency, controllability, ease of use, and cost.
TuSimple's technical leader told Blue Whale News that the primary objective is model generation quality, ensuring that the generated video content meets high standards in visuals, movements, and details. "The company adopts a spiral development strategy, gradually enhancing model controllability, ease of use, and cost-effectiveness while maintaining generation quality and consistency."
TuSimple Chooses a Third Path: Producing Content Rather Than Monetizing Models
Currently, the field of visual models is witnessing continuous advancements. On December 9 (local time), OpenAI officially released its latest large-scale video generation model, Sora-Turbo, capable of generating new video content based on text, image, or video input. Additionally, in the domestic market, both large tech giants like ByteDance and Kuaishou, as well as startups like Pika, Aishi Technology, and Shengshu Technology, are actively pushing forward with technological and product iterations.
Is TuSimple entering the fiercely competitive visual large model sector to grab a share of the pie?
From the company's current business progress and interviews, the answer is no. TuSimple's technical leader told Blue Whale News that a more accurate description would be that TuSimple aims to become a content company rather than a large model technology company, differentiating itself from platforms like Kuaishou and startups like Pika.
Currently, there are essentially two business models for visual large models on the market: one targets C-end users, providing paid video generation tools or services for creators to produce content; the other targets B-end companies in the film, entertainment, and gaming industries, helping them reduce costs and increase efficiency.
TuSimple's technical leader explained to Blue Whale News that if positioned as a pure video model company, there are obvious challenges for both C-end and B-end markets:
On one hand, for C-end users, the target audience for video generation tools is professional creators rather than the general public. The pricing model and profit prospects are unclear, and video models require significant computing power, leading to high operational costs. In the domestic market, it is difficult to attract paying users and reach profitability in the short term.
On the other hand, purely B-end technology empowerment faces significant hurdles: it is difficult for a technology company to deeply understand the needs of specific scenarios, integrate its technology into actual production processes, and at the same time control content quality and style.
Unlike many video-model companies focused on pursuing general-purpose technology, TuSimple has chosen a different, third path: open-sourcing its model technology outright, forgoing model monetization, and acquiring classic IPs to produce content itself.
According to Blue Whale News, TuSimple currently has dedicated anime and gaming teams already developing new projects.
"We aspire to build an AI-driven video content creation company, establishing an end-to-end video content generation chain. Ultimately, we aim to attract users and realize commercial value through high-quality content," said TuSimple's technical leader. "Technology is merely a tool; the ultimate goal is to provide content to users."
Currently, TuSimple has ventured into the anime and gaming sectors. Its new brand, CreateAI, has secured the official license for the renowned martial arts IP "The Legend of the Condor Heroes" and will develop a large-scale open-world RPG. In August 2024, the company also announced a collaboration with Shanghai Three-Body Animation Co., Ltd. to jointly develop the first animated feature film and video game in the "Three-Body Problem" series. It is reported that the company will also launch SLG (strategy game) tools and titles in December.
"We now possess the top IPs of 'The Legend of the Condor Heroes' and 'The Three-Body Problem,' and our goal is to achieve $1 billion in revenue by 2027," Chen Mo stated in a recent interview regarding TuSimple's future direction in AIGC.