Li Auto's Further Evolution: 'Full-Stack' as the Only Path to Surpass Tesla

06/19 2026

In the AI era, humans need to give AI a bigger stage.

“Today's smart cars are not intelligent; they are driven by functions.” Li Xiang, founder and CEO of Li Auto, opened the press conference with this surprising statement.

Li Xiang's logic is simple. The industry has long defined the smart car by a trifecta: software-defined hardware, real-time connectivity, and continuous upgrades. In his view, this amounts to little more than a pile of callable functions, not genuine intelligence.

To illustrate the point, Li Xiang pointed to earlier intelligent driving systems. On safety, they “let go” when they meet a problem they cannot solve and wait for a human to take over. On efficiency, they drive too slowly in complex areas, which frustrates passengers. On capability, they are essentially combinations of driving instructions and cannot truly reason about driving strategy.

Li Xiang's definition of a truly intelligent vehicle combines four roles: an electric vehicle, a professional driver, an AI computer, and a life assistant. Such a vehicle should protect human safety, complete tasks independently, and be more efficient than humans.

To understand the details of Li Auto's intelligent upgrades, Guangzhui Intelligence interviewed Li Auto's CTO Xie Yan and Zhan Kun, head of Li Auto's foundation model. From technical details to practical deployment, we saw Li Auto once again stepping into a new frontier of intelligence.

Cabin and Intelligent Driving Evolution: Can Li Auto's Vehicle AI Surpass Humans?

Li Auto's intelligent upgrades are mainly reflected in two aspects: Mach VLA (intelligent driving) and Li Xiang Tongxue Agent (intelligent cabin).

In the cabin, the Agent can break down a “big task,” call tools on its own, and then combine the results into a single answer. On-site, Li Auto demonstrated an extremely complex voice command: the user asked for a route covering the Eight Great Sights of Yanjing (Yanjing being an old name for Beijing).

At first glance, the task seems to require the AI only to “connect the dots” between eight scenic spots, but the actual output was genuinely impressive. During its own search, the AI noticed that some sights were temporarily closed and that others could not fit into the available time, so it produced a more feasible tour plan.

This demonstrates the Agent's ability to “put itself in the user's shoes,” whereas traditional AI cabins would not consider whether a plan is actually feasible.
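The decompose, call tools, then filter behavior described above can be sketched in a few lines. This is a minimal illustration, not Li Auto's implementation: the lookup tool, the SiteInfo fields, the site names, and the time budget are all hypothetical stand-ins for whatever search and map services the real Agent calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SiteInfo:
    name: str
    is_open: bool       # hypothetical "opening status" tool result
    visit_minutes: int  # hypothetical "typical visit time" tool result


def plan_tour(sites: list[str],
              lookup: Callable[[str], SiteInfo],
              budget_minutes: int) -> tuple[list[str], list[str]]:
    """Split the big request into one tool call per site, then keep only
    sites that are open and still fit in the time budget (greedy, in the
    order the user listed them)."""
    plan: list[str] = []
    skipped: list[str] = []
    used = 0
    for site in sites:
        info = lookup(site)  # one sub-task / tool call per site
        if not info.is_open:
            skipped.append(f"{site}: temporarily closed")
        elif used + info.visit_minutes > budget_minutes:
            skipped.append(f"{site}: not enough time")
        else:
            plan.append(site)
            used += info.visit_minutes
    return plan, skipped


# Hypothetical tool backed by a fixed table instead of a live service.
FAKE_DB = {
    "Site A": SiteInfo("Site A", True, 90),
    "Site B": SiteInfo("Site B", False, 60),
    "Site C": SiteInfo("Site C", True, 120),
    "Site D": SiteInfo("Site D", True, 200),
}

plan, skipped = plan_tour(list(FAKE_DB), FAKE_DB.__getitem__, budget_minutes=300)
print(plan)     # ['Site A', 'Site C']
print(skipped)  # ['Site B: temporarily closed', 'Site D: not enough time']
```

A greedy pass in the user's order is the simplest possible policy; a real planner would presumably also reorder stops by travel time and opening hours.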

Li Auto then showed a demonstration closer to everyday family use: the user told the system about several locations, stated the order in which to visit them in a jumbled sequence, and finally named the destination. The vehicle correctly understood the request and planned the navigation route.

What makes this demonstration striking is that the cabin can not only understand spoken input as long as a “rapid-fire” monologue but also work out how the instructions in each of its three stages relate to one another.
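One way to read the jumbled-order demo is as a set of “visit X before Y” constraints plus a fixed final stop, which a topological sort turns into a route order. The sketch below assumes the speech has already been parsed into such pairs (the parsing is the hard part and is not shown); the stop names are hypothetical, and it uses Python's standard graphlib module (3.9+).

```python
from graphlib import TopologicalSorter


def order_stops(stops: list[str],
                before_pairs: list[tuple[str, str]],
                final_stop: str) -> list[str]:
    """Return a visiting order that satisfies every (earlier, later) pair
    and puts final_stop last. Contradictory constraints make graphlib
    raise CycleError."""
    # Map each stop to the set of stops that must be visited before it.
    graph: dict[str, set[str]] = {s: set() for s in stops}
    for earlier, later in before_pairs:
        graph.setdefault(later, set()).add(earlier)
        graph.setdefault(earlier, set())
    # The designated final stop depends on every other stop.
    graph.setdefault(final_stop, set()).update(
        s for s in graph if s != final_stop)
    return list(TopologicalSorter(graph).static_order())


# Spoken out of order: "pharmacy before school pickup; bakery first; end at home."
route = order_stops(
    ["school", "bakery", "pharmacy"],
    [("pharmacy", "school"), ("bakery", "pharmacy")],
    final_stop="home",
)
print(route)  # ['bakery', 'pharmacy', 'school', 'home']
```

When the constraints leave some stops unordered relative to each other, static_order may return any valid order, so a real system would break ties with travel distance.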

In the intelligent driving segment, Li Auto first reviewed its past “achievements.” As of June 14, its driver assistance system had helped avoid safety risks 17,273,307 times, including 55,671 major risks.

Safety depends on how quickly and how well the driving system reacts. Zhan Kun played several videos on-site, most showing sudden events such as pedestrians stepping out unexpectedly. The most dramatic involved a vehicle overturned on the highway. The car was traveling at 120 km/h, and only a light was visible on the vehicle's display, so it was impossible to tell what lay ahead or how far away it was. Even so, Li Auto's system correctly triggered AEB (automatic emergency braking), slowed down, and then steered around the obstacle without human intervention.
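Emergency braking decisions are commonly framed in terms of time to collision (TTC): the gap to the obstacle divided by the closing speed. The article does not say how Li Auto's AEB decides to fire, so the sketch below is the generic textbook rule, with a hypothetical 2-second threshold and an assumed stationary obstacle like the overturned vehicle in the video.

```python
def time_to_collision(gap_m: float, ego_speed_mps: float,
                      obstacle_speed_mps: float = 0.0) -> float:
    """Seconds until impact if neither vehicle changes speed."""
    closing_speed = ego_speed_mps - obstacle_speed_mps
    if closing_speed <= 0:
        return float("inf")  # not closing in, so no collision course
    return gap_m / closing_speed


def should_trigger_aeb(gap_m: float, ego_speed_mps: float,
                       obstacle_speed_mps: float = 0.0,
                       ttc_threshold_s: float = 2.0) -> bool:
    """Hypothetical rule: brake when TTC drops below the threshold."""
    return time_to_collision(gap_m, ego_speed_mps,
                             obstacle_speed_mps) < ttc_threshold_s


speed = 120 / 3.6  # 120 km/h in m/s (about 33.3)
print(should_trigger_aeb(50.0, speed))   # True  (TTC is about 1.5 s)
print(should_trigger_aeb(100.0, speed))  # False (TTC is about 3.0 s)
```

Any perception and inference delay is spent before this check can even run, which is why the system's end-to-end latency matters so much at highway speed.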

Behind this reaction speed and adaptability lies the optimization of the entire intelligent driving technology architecture.

According to Li Auto, the new-generation Mach VLA architecture cuts visual input latency by 47%, model inference latency by 43%, chassis response latency by 38%, operating system scheduling latency by 28%, and overall end-to-end latency by 40%. Technically, Li Auto has merged the traditional modular perception-prediction-planning pipeline into a single native multimodal MoE (mixture-of-experts) large model, and it has replaced traditional perception schemes with 3D ViT. Put simply, data passes through fewer computational stages on its way from the sensors to the controls, so the system reacts faster.
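For readers unfamiliar with MoE, the core idea is that a learned gate picks a few “expert” sub-networks for each input and blends their outputs, so only part of the model runs each time. The sketch below is the generic top-k routing pattern in plain Python; Li Auto has not disclosed its expert count, gating scheme, or how the multimodal inputs are tokenized, so the toy experts and gate weights here are hypothetical.

```python
import math
from typing import Callable

Vector = list[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, gate_weights: list[Vector],
                experts: list[Callable[[Vector], Vector]],
                k: int = 2) -> tuple[Vector, list[int]]:
    """Route x to the k experts with the highest gate scores and return
    the gate-weighted sum of their outputs plus the chosen expert ids."""
    # One gate logit per expert: dot product of its gate row with x.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    mix = softmax([logits[i] for i in top])  # renormalize over chosen experts
    out = [0.0] * len(x)
    for weight, i in zip(mix, top):
        y = experts[i](x)  # only the selected experts are evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, top


# Three toy "experts" that just scale the input by 1, 2 and 3.
experts = [lambda v, s=s: [s * a for a in v] for s in (1.0, 2.0, 3.0)]
gates = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]]
out, chosen = moe_forward([1.0, 0.0], gates, experts, k=2)
print(chosen)  # [1, 2]
```

In a trained model the gate rows and experts are learned neural layers; the routing logic, however, has this same shape.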

Li Auto says the end-to-end reaction latency of the Mach VLA system is now 0.28 seconds, compared with an average human reaction time of 0.45 seconds and a limit of 0.25 seconds for F1 drivers. At 120 km/h (about 33.3 m/s), the 0.17-second advantage over an average driver means the car starts braking roughly 6 meters earlier.
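The 6-meter figure is easy to check: the distance covered during a reaction delay is simply speed multiplied by time. The sketch below reproduces it and applies the same arithmetic to the 0.2-second December target. The last lines additionally assume that the 40% end-to-end reduction quoted earlier applies to this same 0.28-second metric, an inference the article does not state outright.

```python
def distance_during_delay_m(speed_kmh: float, delay_s: float) -> float:
    """Meters traveled before braking begins, at constant speed."""
    return speed_kmh / 3.6 * delay_s  # km/h -> m/s, then times seconds


SPEED_KMH = 120
HUMAN_S, MACH_VLA_S, DEC_TARGET_S = 0.45, 0.28, 0.20

gain_now = distance_during_delay_m(SPEED_KMH, HUMAN_S - MACH_VLA_S)
gain_target = distance_during_delay_m(SPEED_KMH, HUMAN_S - DEC_TARGET_S)
print(f"{gain_now:.2f} m")     # -> 5.67 m
print(f"{gain_target:.2f} m")  # -> 8.33 m

# Assumption: 0.28 s is the result of the 40% end-to-end cut.
implied_previous_s = MACH_VLA_S / (1 - 0.40)
print(f"{implied_previous_s:.2f} s")  # -> 0.47 s
```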

So, when will the new vehicle intelligence be available?

Li Auto expects that in July, overall intelligent driving efficiency will rise by 30% and the Travel Guide Agent will officially launch. In September, human-like reverse parking will arrive, the vehicle will be able to open garage doors on its own, and the Agent will connect to phones and computers to operate Feishu and WeChat. In December, safety and efficiency will surpass those of human drivers, with reaction latency cut to 0.2 seconds, faster than an F1 driver.

To achieve this plan, Li Auto needs to build a new intelligent foundation encompassing chips, models, and operating systems.

To Build Better AI, a Larger Technical Stage is Needed

Li Auto believes the problem with the previous generation of intelligent vehicles lies not in any single module but in an underlying architecture that was never designed with AI in mind.

To unleash AI's full potential, Li Auto believes chips, algorithms, and the overall intelligent architecture must all be upgraded at once.

Chips: Removing the von Neumann “translation layer.”

Four years ago, Li Auto decided to develop its own chips. Li Auto's CTO Xie Yan and Li Xiang reached a consensus: “Self-development is not to prove ourselves but to truly solve problems.”

Starting from scratch, Li Auto abandoned its reliance on all traditional architectures.

At the press conference, Li Auto finally disclosed the specific details of the Mach M100 chip:

In terms of hardware, the Mach M100 Ultra is built on a 5nm automotive-grade process and delivers 1280 TOPS of computing power on a single chip. To accelerate AI-native algorithms, Li Auto adopted a data flow architecture and redesigned the chip's internals, achieving an effective compute utilization of over 82% for the Mach M100.
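Peak TOPS and utilization combine multiplicatively, so the two figures above imply a floor on the compute the chip actually delivers. The check below assumes the 82% utilization applies to the Ultra's 1280 TOPS peak (the article quotes them for “Mach M100 Ultra” and “Mach M100” respectively) and treats “over 82%” as exactly 82%, which makes the result a lower bound.

```python
PEAK_TOPS = 1280         # Mach M100 Ultra single-chip peak, per Li Auto
MIN_UTILIZATION = 0.82   # "over 82%" taken at its floor (assumption)

effective_tops = PEAK_TOPS * MIN_UTILIZATION  # work actually delivered
print(f"at least {effective_tops:.0f} effective TOPS")  # -> at least 1050 effective TOPS
```

This is why utilization matters as much as the headline number: a chip with a lower peak but higher utilization can deliver more real work than a nominally larger one.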

Xie Yan explained that the classic chip architecture is the von Neumann architecture, which devotes 30% of its transistors to cache coherence, instruction reordering, and branch prediction. These features exist essentially to make programming easier for humans. The data flow architecture Li Auto adopted returns to the essence of computing: a compiler explicitly orchestrates data movement and timing, so AI is computed in a way that suits AI.

Discussing the differences between the two architectures, Xie Yan said, “The von Neumann architecture is designed to adapt to human thinking by abstracting computing into sequential instructions, allowing humans to reason step by step. Data flow, on the other hand, involves large-scale concurrency, with multiple data streams progressing simultaneously. It not only advances in time but also requires spatial layout, known as temporal-spatial compilation. Additionally, without instruction sequencing, it primarily uses a consumer-producer model, with a large number of consumers and producers operating simultaneously, requiring a completely different compilation framework.”
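The firing rule Xie Yan describes, where an operation runs as soon as its producers have delivered its inputs rather than when a program counter reaches it, can be simulated in software. This is a conceptual sketch only: it ignores the spatial placement and compile-time scheduling he mentions, and the graph and node names are hypothetical. Each “wave” lists nodes whose inputs were all ready at the same moment, that is, work that dataflow hardware could execute concurrently.

```python
from typing import Callable

# node name -> (operation, names of the producers it consumes)
Graph = dict[str, tuple[Callable[..., float], list[str]]]


def run_dataflow(graph: Graph, sources: dict[str, float]):
    """Fire every node whose inputs are available; repeat until done.
    Returns all computed values and the waves of nodes that fired together."""
    values = dict(sources)
    pending = dict(graph)
    waves: list[list[str]] = []
    while pending:
        # Readiness is decided by data availability, not instruction order.
        ready = [n for n, (_, ins) in pending.items()
                 if all(i in values for i in ins)]
        if not ready:
            raise ValueError("cycle or missing input in the graph")
        for name in ready:
            op, ins = pending.pop(name)
            values[name] = op(*(values[i] for i in ins))
        waves.append(sorted(ready))
    return values, waves


# Listed out of order on purpose: execution order comes from the data.
graph: Graph = {
    "c": (lambda a, b: a - b, ["a", "b"]),
    "a": (lambda x, y: x * y, ["x", "y"]),
    "b": (lambda x, y: x + y, ["x", "y"]),
}
values, waves = run_dataflow(graph, {"x": 2.0, "y": 3.0})
print(waves)        # [['a', 'b'], ['c']]
print(values["c"])  # 1.0
```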

In short, with this chip Li Auto has “completely figured out” how AI should be computed.

Earlier chip architectures relied on a central processor to break large tasks down and then dispatch them, trying to do everything. The data flow architecture, by contrast, is a production line naturally suited to the work, giving a specialized tool a “dimensionality reduction strike” in efficiency over general-purpose ones.

On this shift in thinking, Xie Yan also quoted Jack Dennis, a pioneer of data flow computing: “Today the division of labor in computing is too fine. Hardware people don't understand software, chip people don't understand compilers, and software people don't know how hardware works. When you see both hardware and software at once, you have a complete picture of computing. When you see everything, you can create greater innovations.”

Perception: From BEV to 3D ViT, AI Truly “Understands” the Physical World

At the visual perception level, Li Auto has gone a step beyond the industry's mainstream approach.

Traditional intelligent driving systems mostly use BEV (Bird's Eye View) to represent the environment, but Zhan Kun pointed out its fundamental flaw: “The problem with using BEV is that if I haven't defined what a ditch or a pit is, the downstream decision-making lacks this information.”
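Zhan Kun's point about undefined categories can be shown with a toy example: if the downstream planner only receives a grid of labels drawn from a fixed class list, anything outside that list simply disappears. This is a deliberately simplified illustration, not a model of any production BEV network (real BEV features are learned and continuous); the class list and grid size are hypothetical.

```python
KNOWN_CLASSES = {"car", "pedestrian", "curb"}  # hypothetical fixed label set


def rasterize_bev(detections: list[tuple[str, tuple[int, int]]],
                  size: int = 4) -> list[list[str]]:
    """Write each detection's label into a top-down grid cell."""
    grid = [["free"] * size for _ in range(size)]
    for label, (row, col) in detections:
        if label in KNOWN_CLASSES:
            grid[row][col] = label
        # Labels outside KNOWN_CLASSES are silently dropped.
    return grid


grid = rasterize_bev([("car", (0, 1)), ("ditch", (2, 2))])
print(grid[0][1])  # car
print(grid[2][2])  # free  (the planner never learns the ditch exists)
```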

In other words, earlier technical solutions let the car see its surroundings but never reached the level of “understanding” them.

Li Auto's solution is 3D ViT (3D Vision Transformer), a method for modeling three-dimensional space. It feeds the driving system richer layers of information, so the system understands not only the 3D structure of the environment but also the attributes, textures, and types of 3D objects. Because the modeling is built on vision, it can also see colors.
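Li Auto has not published 3D ViT's internals. As a rough picture of the general idea, a standard ViT splits an image into patches and turns each into a token; the sketch below applies the same step to a voxel grid, assuming (hypothetically) that the scene has already been voxelized with a small feature vector per voxel, such as color channels plus occupancy. In a real model each token would then pass through a learned linear projection, receive a position embedding, and enter a transformer, none of which is shown.

```python
Voxel = list[float]  # per-voxel features, e.g. [r, g, b, occupancy]


def patchify_3d(volume: list[list[list[Voxel]]], p: int) -> list[list[float]]:
    """Cut a D x H x W voxel grid into non-overlapping p x p x p cubes and
    flatten each cube's features into one token vector."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    if depth % p or height % p or width % p:
        raise ValueError("grid dimensions must be divisible by the patch size")
    tokens = []
    for z in range(0, depth, p):
        for y in range(0, height, p):
            for x in range(0, width, p):
                token: list[float] = []
                for dz in range(p):
                    for dy in range(p):
                        for dx in range(p):
                            token.extend(volume[z + dz][y + dy][x + dx])
                tokens.append(token)
    return tokens


# 4 x 4 x 4 grid with 4 features per voxel, cut into 2 x 2 x 2 patches.
scene = [[[[float(z), float(y), float(x), 1.0] for x in range(4)]
          for y in range(4)] for z in range(4)]
tokens = patchify_3d(scene, p=2)
print(len(tokens), len(tokens[0]))  # 8 32
```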

The evolution of this technical solution essentially represents humanity “letting go of preconceptions”: less human-driven planning and more AI-native architectures and computations.

Zhan Kun explained, “The previously well-known 'The Bitter Lesson' refers to the idea that machines, without any human priors and trained solely on data, will perform better than humans. The latest concept is the 'Vision Bitter Lesson,' which is about how to judge the quality of your visual representation: by whether you can take corresponding actions—if you can bypass this ditch, it proves you understand the ditch. By constructing a very good three-dimensional spatial representation standard, we allow downstream actions to truly understand, thus fully demonstrating visual capabilities.”

Computing Architecture: Disagreeing with 'Cabin-Driving Integration,' 'Cabin' and 'Driving' Require Their Own Expertise

In 2026, cabin-driving integration has become the latest trend in the intelligent vehicle industry.

Cabin-driving integration offers two major advantages. First, cost: merging the intelligent cabin chip and the intelligent driving chip into one eliminates the cost of a chip. Second, a unified central architecture lets AI flow seamlessly between the two systems, so intelligent driving can be controlled directly by voice command.

However, Li Auto has its own judgment on this trend.

Xie Yan stated, “The cabin and driving are two independent systems. Especially when upgrading from L3 to L4, intelligent driving requires a system with higher certainty, with dedicated memory and computing resources. At this point, the significance of integration diminishes greatly because resources cannot be switched in real-time; otherwise, it would reduce certainty.”

Li Auto believes that there needs to be an AI computing center in the vehicle.

Xie Yan explained, “Just as when you run OpenClaw on a laptop, the AI computing doesn't happen on the laptop but on a Token Provider Server. Similarly, a vehicle has a Token Server.” This design is highly efficient and isolates different tasks from one another. The resources intelligent driving depends on, such as memory and bandwidth, can be guaranteed against interference from other tasks, and that guarantee is only achievable through joint software and hardware design.
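The isolation argument can be illustrated by comparing a shared resource pool with fixed per-domain reservations. Li Auto has not described how its Token Server allocates memory or bandwidth; the class names, the 60/40 split, and the abstract “units” below are hypothetical, and static partitioning is just one common way to obtain this kind of determinism.

```python
class SharedPool:
    """All domains draw from one pool: first come, first served."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0

    def request(self, domain: str, amount: int) -> bool:
        # The domain is ignored: nothing protects one task from another.
        if self.used + amount > self.capacity:
            return False
        self.used += amount
        return True


class PartitionedServer:
    """Each domain owns a fixed reservation nobody else can touch."""

    def __init__(self, reservations: dict[str, int]) -> None:
        self.capacity = dict(reservations)
        self.used = {name: 0 for name in reservations}

    def request(self, domain: str, amount: int) -> bool:
        if self.used[domain] + amount > self.capacity[domain]:
            return False
        self.used[domain] += amount
        return True


# A cabin burst of 80 units, then driving asks for its 60.
shared = SharedPool(100)
shared.request("cabin", 80)
print(shared.request("driving", 60))  # False: the cabin starved driving

split = PartitionedServer({"driving": 60, "cabin": 40})
print(split.request("cabin", 80))     # False: cabin is capped at 40
print(split.request("driving", 60))   # True: driving's share is intact
```

The trade-off is utilization: reserved capacity sits idle when its owner is quiet, which is the price of the certainty Xie Yan describes.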

Li Auto's AI research, then, has reached a stage where humans “let go” further and give AI more room to “roam.” This approach mirrors the broader era of large AI models (for example, the way scaling laws have upended carefully hand-crafted models).

Taking a step forward, Li Auto's intelligent research has once again ventured into uncharted territory.

'Full-Stack' as the Only Path to Surpass Tesla

Venturing into uncharted territory is not about blindly moving forward but about seeing who is leading and choosing one's own path.

After the interview, Guangzhui Intelligence came away with a clear impression: among the many Chinese intelligent vehicle makers that benchmark themselves against Tesla, there is finally one that truly understands where Tesla's strength lies.

On the press conference stage, Li Xiang said candidly, “Tesla is truly powerful, and the pressure is really immense.” But he quickly added, “Only by truly aiming for the highest standard can we have a chance to surpass Tesla.”

After testing Tesla's FSD V14.3, Zhan Kun believes that Tesla has established barriers in two aspects:

First is its solid foundational experience.

“It provides a strong sense of security, excellent efficiency, and good comfort. These are its basic skills,” Zhan Kun said. “I may not take difficult roads, but these basic skills can reach this level.”

Second is its unique capabilities in various details.

“Tesla yields to special vehicles, has precise perception in extremely narrow passages, and can recognize traffic police directions. These capabilities are very strong.”

Asked whether Li Auto can catch up with Tesla's FSD, Zhan Kun was confident: “For foundational experience, we need a very good evaluation system. We hope to start with our own testing and product teams and work with users and the media to figure out how to evaluate our model: how to balance its sense of security, comfort, and efficiency. There are many ways to do this, and we are confident we can catch up to the level of FSD V14. And since our chip's performance has not yet been fully unleashed, we can be more efficient and react faster.”

After gaining a deeper understanding of the technology, Li Auto also sees that a “full-stack” approach is the only path to surpass Tesla.

The prerequisite for catching up is to first build barriers that others cannot quickly overcome. Zhan Kun said bluntly, “Only a full-stack approach can establish a true moat. What matters is whether your computing power, chips, and infrastructure are fully unified under your own control. If your only advantage is in algorithms, talent moves so quickly between China and the US that your advantage is easily replicated. If you are full-stack, migrating is costly and difficult.”

Yet behind Li Auto's pursuit of full-stack in-house development lie hardships beyond imagination.

“Are you willing to put your energy into the hard work that builds a moat? What counts as hard work? Meticulously cleaning data, for example. It involves countless details that may not seem glamorous, but it is these small details that together form a moat,” said Zhan Kun.

Conclusion

Every wave of AI has revealed the same truth: new kings cannot emerge from old architectures.

Venturing into the uncharted territory of intelligence requires more than just slogans; it necessitates a new underlying architecture, a different approach to work, and a new target to pursue.

Perhaps the book “The Art of Parking” that Li Xiang held at this year's Livis Day was both a self-deprecating Easter egg and a metaphor for the technical route: once true intelligence arrives, tasks that once required human skill will become increasingly “artistic.”

In the AI era, humanity needs to provide AI with a broader stage.

Solemnly declare: the copyright of this article belongs to the original author. The reprinted article is only for the purpose of spreading more information. If the author's information is marked incorrectly, please contact us immediately to modify or delete it. Thank you.
