The Shot Fired by GTC: Li Auto's Ambitions Beyond Intelligent Driving Upgrades

03/25 2026

After allocating half of its R&D budget to AI, Li Auto revealed its first major move at GTC.

Original content from AutoPix (ID: autopix)

On March 17, Li Auto unveiled its next-generation intelligent driving foundation model, MindVLA-o1, at NVIDIA GTC 2026.

A day later, Li Xiang held a dialogue on Bilibili with Zhan Kun, who leads the company's base model work, to further explain the logic behind the model. Through this series of announcements, Li Auto conveyed a message that extends beyond a technological upgrade in intelligent driving.

Li Auto aims to create an "AI for the physical world." Viewed in the context of Li Auto's moves over the past year, the significance of that goal becomes clearer.

Over the past year, Li Auto has been reorganizing its structure while continuously investing in key capabilities such as chips, operating systems, base models, and drive-by-wire chassis. Li Xiang has mentioned AI and embodied AI with increasing frequency.

At GTC, Li Auto added a technical framework to this narrative: 3D ViT, multimodal reasoning, predictive latent world models, and a unified VLA architecture—a complete AI framework for embodied AI.

Li Auto is no longer just explaining how a car becomes smarter but is instead articulating why an automaker should connect intelligent driving with embodied AI.

One Model, Two Machines

The true significance of MindVLA-o1 lies not in its being a new model released by Li Auto but in its attempt to answer a more fundamental question than "how to upgrade intelligent driving": Why has progress in AI for the physical world been slow?

Li Auto's answer is clear. Today, most physical AI systems, whether for intelligent driving or robotics, essentially rely on "learning the world from 2D videos." Models are primarily trained on 2D images and videos, acquiring semantic recognition and associations rather than a true understanding of three-dimensional space.

Li Xiang compares this to the "0-6 years problem" in human development. Most people can drive a car with only an elementary school education because they have already completed foundational spatial training through real perception in three-dimensional space during their first six years. Current AI, however, "trains intensively for tasks it will perform as an adult without ever resolving the spatial training needed during its '0-6 years.'"

A system may recognize pedestrians, vehicles, and intersections ahead, but it may not truly understand the relative spatial relationships of these objects or possess genuine predictive capabilities for future changes.

This judgment matters because it places many of the industry's efforts over the past few years within a more fundamental framework. Whether with BEV or OCC, the industry has been striving to "supplement" the three-dimensional world for machines, but Li Auto argues that these solutions remain inadequate. BEV flattens the scene, while OCC expresses spatial occupancy but lacks sufficient semantic information.

These approaches hold engineering value but have not yet reached the true physical world.

MindVLA-o1 aims to bridge this gap. 3D ViT does not simply reprocess 2D information but attempts to directly reconstruct a unified representation of space, position, point clouds, semantics, and pixels within video streams. Combined with geometric cues provided by LiDAR, it forms a visual encoding method closer to the "real world."

If 3D ViT addresses "how to see," then multimodal reasoning tackles "how to think." It no longer merely identifies what is happening currently but simulates the future within a latent world model, preemptively deducing potential state changes in the next few seconds.

For intelligent driving, this means the model no longer simply reacts to the current frame but begins to possess a form of "mental rehearsal" capability. Li Auto refers to this as Generative Multimodal Thinking, essentially integrating language understanding, spatial understanding, and future prediction into a single reasoning framework.

Taking it a step further, MindVLA-o1's goal is not to create a stronger perception model or a stronger trajectory model but to truly unify Vision, Language, and Action.

This means its scope extends beyond vehicles. The same VLA model and data system can control both vehicles and robots. Intelligent driving is merely the starting point, not because it is unimportant, but because it is currently the most mature and most easily scalable physical AI scenario.

Vehicles offer sufficiently complex environments, high-frequency data, clear control objectives, and a more mature mass-production pathway than humanoid robots. For any company aiming to enter the physical world AI space, vehicles serve as an entry point and a realistic training ground.

Li Auto is constructing an AI foundation capable of transcending product forms. Cars, robots, and AI glasses will share a single perception and decision-making system.

While it previously focused on catching up in intelligent driving, Li Auto now aims to demonstrate that it is building a much larger platform.

MindVLA-o1 did not emerge out of thin air. It is the first tangible outcome of Li Auto's intense transformations over the past year. With one model serving two machines, the technological boundary between intelligent driving and embodied AI becomes blurred.

This marks the starting point for Li Auto's strategic expansion.

After GTC, Li Auto Redefines Automotive Intelligence

Li Auto's emphasis on "autonomous driving being just the beginning" is no coincidence. For many automakers, 2025 was the year when the automotive industry's ceiling became increasingly clear, while the imaginative potential of AI grew ever larger.

On the morning of January 26, 2026, Li Auto held an online company-wide meeting where Li Xiang spoke for nearly two hours. He barely mentioned the automotive business, focusing almost entirely on AI: "2026 is the last year for any company aspiring to become an AI leader to board the train."

Shortly after, at the end of January, Li Auto initiated an organizational restructuring. The R&D system was no longer divided by traditional software/hardware functions but was reorganized along "human anatomical" lines—chips as the heart, datasets as the lungs, the operating system as the nervous system, perception and models as the brain, software agents responsible for skills, and hardware agents handling energy, driving, and control.

▍From left: Li Xiang, Zhan Kun

Li Xiang repeatedly emphasizes that "the ultimate form of the automobile is a robot," attempting to reposition the automobile within the AI industrial chain. Vehicles and robots, intelligent driving and embodied AI, are not two distinct worlds.

In perception, 3D vision, LiDAR, and spatiotemporal modeling are universal; in decision-making, language, planning, world models, and reinforcement learning are highly reusable; in execution, while drive-by-wire steering, braking, and chassis control differ in form from robotic actuators, they all evolve toward "enabling models to directly drive the body."

This neatly ties together Li Auto's seemingly over-engineered investments of the past few years: self-developed chips, Xinghuan OS, base models, drive-by-wire chassis, data engines, world models, and reinforcement learning infrastructure. If these were solely for today's advanced driver-assistance systems, they would indeed seem excessive, costly, and overly long-term.

However, if the goal shifts to "building an AI system for the physical world," these investments become coherent.

Internally, Li Auto divides this system into MindData, MindVLA-o1, MindSim, and RL Infra, attempting to create a closed loop connecting data, models, simulated worlds, and continuous evolutionary capabilities.

The underlying intent is clear: to unify chips, data, operating systems, models, software agents, and execution control into a single system rather than continuing with the traditional automaker approach of "multiple subsystems advancing in parallel."

This path also holds practical appeal. Compared to directly pursuing humanoid robots, vehicles are easier to commercialize first and serve as a continuous source of real-world data collection. They are both products and training grounds.

Thus, Li Auto's motivation for going all-in on AI stems not merely from Li Xiang's belief in AI or the storytelling appeal of GTC announcements. A more plausible reason is that the automotive industry is compelling leading companies to explore boundaries beyond vehicles, with embodied AI representing the most natural and imaginative extension.

Selling Cars and Pursuing AI: A Dual Battle

Not everyone within Li Auto understands this strategy. Some employees wonder why the company doesn't concentrate solely on selling cars.

Li Xiang's response is direct. In the Bilibili video, he compares the automotive industry's barriers to those in the smartphone industry. Every smartphone manufacturer can build high-quality hardware and software, but the most transformative changes came from chips and operating systems—like Apple, which established its barriers through the A-series chips and iOS, barriers that Samsung could not overcome simply by working harder on smartphones.

The immediate benefit of these adjustments is efficiency.

After restructuring, the training efficiency of Li Auto's intelligent driving models has improved from bi-weekly iterations to daily ones. While this figure may not directly equate to product competitiveness, it demonstrates Li Auto's attempt to shift an automaker's R&D rhythm toward that of an AI company.

In the Bilibili video, Li Xiang discusses time allocation.

He says that 70% of his time is still spent on automobiles—reviewing designs, evaluating features, experiencing handling, and assessing NVH—often occupying half a day. Meanwhile, AI-related meetings "might take only half an hour to an hour."

The reason he seems to talk extensively about AI externally is that "evolution requires repeated communication and explanation." Users easily grasp improvements like range increasing from 200 km to 400 km but struggle to understand "why drive-by-wire and active suspension are essential for embodied AI vehicles."

Li Xiang's explanation is clear, but it still suggests that Li Auto is embarking on a new marathon, pre-building technological stacks for a direction that may only yield clear product returns in several years.

At the vehicle level, Li Auto must find a path to translate this heavy technological framework into product experiences that users can perceive, are willing to pay for, and that differentiate it from competitors.

▍All-new L9 Livis

The all-new L9 will serve as the first test case for this validation. It is set to launch in the second quarter of this year, and Li Auto is positioning it highly. The L9 represents not just a generation change but the first systematic implementation of a "flagship SUV for the embodied AI era," integrating the 3D ViT perception system, self-developed Mach 100 chip, drive-by-wire chassis, active suspension, and more.

This is why Li Auto's current position is delicate. On one hand, it seeks to shed its old label as a manufacturer of "only range-extended family SUVs" and ascend to a higher tier as a technology company. On the other hand, it must continue selling cars well, maintaining sales and profits to provide a realistic foundation for its grand AI narrative.

It is also why Li Xiang talks about silicon-based lifeforms alongside store partnerships, the L9 generation change, volume scaling for pure electric vehicles, and channel efficiency. For Li Auto, these two endeavors are not separate. AI defines the upper limits of the next stage, while car sales and organizational efficiency determine how quickly it can reach that stage.

Li Xiang believes that "2026 is the final boarding window." Once chips, models, data, and hardware form a closed-loop effect in the hands of a few companies, the cost for latecomers to catch up will rise exponentially. Just as few attempt to build a smartphone operating system from scratch today, once the window closes, competitive possibilities will vanish.

Whether this endeavor will succeed remains unknown today. However, one thing is increasingly clear: after GTC, Li Auto has become a company that aims to be "next-generation."

This article is original content from AutoPix (autopix) and is not authorized for reproduction.

