Li Auto's VLA: The 'Super Brain' Revolutionizing Autonomous Driving with Large AI Models

03/24/2025

Introduction

Welcome to Driverless Cars Are Coming (WeChat Official Account: Driverless Cars Are Coming), where we dig into the biggest advances in autonomous driving technology. Today's topic is Li Auto's latest breakthrough: VLA.

This isn't just another advancement; it's akin to equipping cars with a 'super brain', propelling autonomous driving into a new era.


I. Hell-Level Challenge: Navigating China's Autonomous Driving 'College Entrance Exam'

When Elon Musk acknowledged on a 2024 earnings call that 'FSD faces bus lane challenges in China', the industry chuckled: the 'Silicon Valley Iron Man' had finally run into the unique complexities of Chinese roads.

In China, autonomous driving isn't just about overcoming obstacles; it's about surmounting hell-level challenges.

Bus lanes, which have stumped Tesla, are the 'grand finale' of Chinese road conditions: they combine ground markings, overhead signs and dynamic LED screens, with more than 30 text rules that vary from city to city, such as 'restricted from 7-9 AM and 5-7 PM' and 'bus priority only'.

Even more perplexing are temporary bus lanes marked only with blue dashed lines and tiny text, which confuse even local drivers.
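To see why such rules are hard to hard-code, here is a minimal sketch that encodes one city's bus lane restriction, such as 'restricted from 7-9 AM and 5-7 PM', as a list of time windows and checks whether a private car may use the lane at a given time. The `BusLaneRule` class, its fields and the example windows are hypothetical illustrations, not part of any Li Auto or map-provider API.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class BusLaneRule:
    """Hypothetical encoding of one city's bus lane restriction."""
    city: str
    # Half-open windows [start, end) during which only buses may use the lane
    restricted_windows: list[tuple[time, time]]
    bus_priority_only: bool = False  # True: the lane is bus-only at all times

    def private_car_allowed(self, now: time) -> bool:
        if self.bus_priority_only:
            return False
        return not any(start <= now < end for start, end in self.restricted_windows)


# 'Restricted from 7-9 AM and 5-7 PM', as quoted in the text
rule = BusLaneRule(
    city="example city",
    restricted_windows=[(time(7, 0), time(9, 0)), (time(17, 0), time(19, 0))],
)

print(rule.private_car_allowed(time(8, 30)))  # False: inside the morning window
print(rule.private_car_allowed(time(12, 0)))  # True: outside both windows
```

Even this toy version needs a separate rule object for every city and lane, which hints at why hand-coding 30-plus local variants, plus temporary lanes, scales poorly.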

But this is merely the beginning:

Tidal (reversible) lanes switch direction every day, and their peak-hour layouts look like Tetris blocks;

Left-turn waiting zones demand precise timing, much like the reverse-parking test in the driving exam;

ETC toll stations use as many as 15 different sign combinations, and some close their manual lanes at night, leaving only handwritten 'Please use ETC' notices taped to the windows.

Even more daunting is how quickly the rules change: one major city added 87 variable lanes last year, an average of 1.67 new rules per week; and during road construction in a smaller city, one intersection went through 'straight → left turn', 'left turn → no entry' and 'back to the original layout' on three consecutive days.
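The weekly average follows from the annual count, treating each new variable lane as one new rule and assuming a 52-week year:

```latex
\frac{87\ \text{new variable lanes}}{52\ \text{weeks}} \approx 1.67\ \text{per week}
```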

Data speaks volumes:

Li Auto's field tests found that solutions relying on high-definition maps lost 42% of their map freshness within three months;

Tesla FSD's misjudgment rate in left-turn waiting zones in China reached 37%, 4.6 times its rate in North America;

One new vehicle model needed an average of 2.3 takeovers to get through intersections with dynamic lanes.
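Read as a simple ratio (the China rate being 4.6 times the North American rate), the two Tesla figures imply a North American misjudgment rate of roughly 8%. This is derived arithmetic, not a figure reported in the source:

```latex
r_{\text{NA}} = \frac{r_{\text{CN}}}{4.6} = \frac{37\%}{4.6} \approx 8.0\%
```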

"This isn't just a technical hurdle; it's a comprehensive test of comprehension, logic, and game theory," remarked Li Auto's autonomous driving engineers.

While Musk's team is still struggling to 'read Chinese', Chinese players have already moved on to a higher-level battlefield.

II. Real-World Training of Large AI Models: Teaching Cars to 'Think Fast' and 'Think Slow'

At NVIDIA GTC 2024, Li Auto presented a counterintuitive insight: the 'dual-system theory' of human thinking, a fast, intuitive system paired with a slow, deliberate one, holds the key to deciphering Chinese road conditions.

The fast system operates like a reflex:

A single Orin-X chip runs an end-to-end model;

It maps inputs from 8 cameras plus lidar directly to driving trajectories;

Processing speed hits 200 frames/second, four times faster than a human blink.

The slow system mimics an experienced driver's brain:

A 2.2-billion-parameter vision-language model (VLM) analyzes road conditions in real time;

It understands complex semantics such as 'tidal lanes in use 200 meters ahead';

It automatically switches to chain-of-thought (CoT) reasoning in special scenarios.
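The division of labor above can be pictured as two loops running at different rates. In this minimal sketch, assuming a fast planner that runs every frame and a slow model queried once every 20 frames, the slow system's latest guidance is cached and reused by the fast system until new guidance arrives. All function names, the `Guidance` format and the 1-in-20 ratio are hypothetical stand-ins, not Li Auto's implementation; a real vehicle would run the two systems concurrently on dedicated hardware rather than in one loop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Guidance:
    """High-level advice produced by the slow system (hypothetical format)."""
    note: str
    target_speed_kmh: float


def slow_system(frame_id: int) -> Optional[Guidance]:
    """Stand-in for the VLM. A real model would reason over camera frames;
    here a 'special scene' is simply faked at frame 400."""
    if frame_id == 400:
        return Guidance("tidal lane in use 200 m ahead", target_speed_kmh=30.0)
    return None


def fast_system(guidance: Optional[Guidance]) -> float:
    """Stand-in for the end-to-end planner. It returns a target speed rather
    than a full trajectory to keep the sketch small."""
    default_speed_kmh = 50.0
    return guidance.target_speed_kmh if guidance is not None else default_speed_kmh


def run(num_frames: int, slow_every: int = 20) -> list[float]:
    """Run the fast system every frame and the slow system every `slow_every` frames.

    The latest slow-system guidance is cached and keeps steering the fast
    system until newer guidance replaces it.
    """
    latest: Optional[Guidance] = None
    speeds: list[float] = []
    for frame_id in range(num_frames):
        if frame_id % slow_every == 0:
            new = slow_system(frame_id)
            if new is not None:
                latest = new
        speeds.append(fast_system(latest))
    return speeds


speeds = run(num_frames=800)
print(speeds[399], speeds[400], speeds[799])  # 50.0 30.0 30.0
```

Caching the slow system's output is what lets the fast loop keep its frame rate: it never waits for the VLM, it just uses the most recent advice available.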

A real-world example of dual-system synergy:

When a test vehicle in Qingdao ran into snow cover, temporary construction and tidal lanes all at once, the VLM completed its reasoning in 0.8 seconds:

1. Read the blurry '7-9 AM east to west' text amid the glare off the snow;

2. Weighed contradictory data from Gaode (Amap) and Baidu Maps and chose to follow the real-time signage;

3. Instructed the end-to-end model to carry out the high-risk maneuver of driving in the opposite-direction lane for 200 meters.
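The second step, preferring live signage over conflicting map data, can be made concrete with a simple rule-based stand-in. The article does not say how the VLM weighs its sources (it reasons over them with a learned model), so the source labels, the priority table and the confidence threshold below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LaneDirectionReport:
    source: str        # hypothetical labels: "roadside_sign", "map_a", "map_b"
    direction: str     # e.g. "east_to_west"
    confidence: float  # 0.0 to 1.0


# Hypothetical priority: live perception of signage outranks any map snapshot
SOURCE_PRIORITY = {"roadside_sign": 0, "map_a": 1, "map_b": 1}


def resolve_direction(reports: list[LaneDirectionReport], min_conf: float = 0.5) -> str:
    """Pick the direction from the highest-priority source that is confident enough.

    Ties within a priority level are broken by higher confidence.
    """
    usable = [r for r in reports if r.confidence >= min_conf]
    if not usable:
        return "unknown"
    best = min(usable, key=lambda r: (SOURCE_PRIORITY.get(r.source, 99), -r.confidence))
    return best.direction


reports = [
    LaneDirectionReport("map_a", "west_to_east", 0.9),
    LaneDirectionReport("map_b", "east_to_west", 0.8),
    LaneDirectionReport("roadside_sign", "east_to_west", 0.6),  # blurry text in snow
]
print(resolve_direction(reports))  # east_to_west
```

If the sign is too blurry to pass the confidence threshold, the sketch falls back to the most confident map, which is one crude way to express "trust live signage, but only when it is legible".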

Even more impressive is the evolution fueled by data:

As training data grew from 1 million to 10 million video clips:

The average distance per intervention (MPI) soared from 15 kilometers to 107 kilometers;

The pass rate for tidal lanes jumped from 58% to 92%.
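MPI is simply distance driven divided by the number of human takeovers. The helper names below are hypothetical; the sketch computes MPI and the improvement factors implied by the quoted figures: roughly 7.1x for MPI against a 10x increase in training data, and about 1.6x (34 percentage points) for the tidal-lane pass rate.

```python
def distance_per_intervention(total_km: float, interventions: int) -> float:
    """Average distance driven between human takeovers (the article's MPI, in km)."""
    if interventions == 0:
        return float("inf")  # no takeovers recorded over this distance
    return total_km / interventions


def improvement_factor(before: float, after: float) -> float:
    """How many times larger the new value is than the old one."""
    return after / before


# Hypothetical test drive: 1,070 km with 10 takeovers
print(distance_per_intervention(1070.0, 10))  # 107.0

# Figures quoted in the text
print(round(improvement_factor(15, 107), 2))  # 7.13
print(round(improvement_factor(58, 92), 2))   # 1.59
```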

But the true game-changer is the world model: this cloud-based 'metaverse' generates 32,000 extreme scenarios every day, allowing the autonomous driving system to 'die' 100,000 times in the virtual world.

In one stress test, the system faced a formidable combination of heavy rain, hail, a road collapse and an ambulance driving against traffic within 24 hours, and still finished with zero interventions.

III. The VLA Physical Agent Awakens: Cars Begin 'Thinking Autonomously'

In 2025, Li Auto unveiled its masterstroke: the VLA (Vision-Language-Action) large model.

This isn't a mere technological upgrade; it's a qualitative leap, transforming cars from 'tools' into 'intelligent agents'.

Six core technological breakthroughs:

1. 3D Gaussian spatial encoder: self-supervised training lets the model inherently understand 'lane lines covered by snow' and 'obstacles in heavy rain'.

2. MoE (mixture-of-experts) sparse architecture: fits 53 billion parameters onto the Orin-X chip while keeping inference at 10 Hz.

3. Hybrid attention mechanism: CoT reasoning takes just 23 ms, five times faster than a human driver deliberating over a lane change.

4. Diffusion-model trajectory generation: predicts 6-second trajectories with only 2 sampling steps, raising the success rate in interactive (game-theoretic) scenarios by 41%.

5. 3D scene generation engine: speeds up scene reconstruction sevenfold, generating the entire road network of a provincial capital in one hour.

6. Value alignment: trained on 4.5 million takeover clips, teaching the system the traffic-safety maxim 'better to wait 3 minutes than to rush for 1 second'.
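The MoE point in item 2 is what lets a very large parameter count coexist with a tight per-frame budget: a gating network picks only a few experts for each input, so only a fraction of the parameters does work at any moment. The sketch below shows generic top-k gating in plain Python. The expert count, k=2, the linear gate and the toy 'experts' are arbitrary assumptions, and nothing here reflects Li Auto's actual architecture or how it is deployed on Orin-X.

```python
import math
import random
from typing import Callable

Expert = Callable[[list[float]], list[float]]


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_moe(x: list[float], gate_w: list[list[float]], experts: list[Expert],
              k: int = 2) -> tuple[list[float], list[int]]:
    """Route x to the k best-scoring experts and mix their outputs.

    Only the chosen experts are evaluated, so per-input compute scales with k,
    not with the total number of experts (and hence total parameters).
    """
    # Linear gate: one score per expert
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_w]
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])  # renormalize over the chosen experts
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top


# Toy setup: 8 experts, each just scales its input by a different constant
random.seed(0)
num_experts, dim = 8, 4
gate_w = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_experts)]
experts: list[Expert] = [lambda v, s=i + 1: [s * vi for vi in v] for i in range(num_experts)]

y, chosen = top_k_moe([0.5, -1.0, 0.25, 2.0], gate_w, experts, k=2)
print("experts used:", chosen, "of", num_experts)
print("output:", [round(v, 3) for v in y])
```

In real MoE layers the experts are full neural sub-networks and the gate is trained jointly with them; the routing logic, score, select top-k and mix, is the same idea.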

Product-side 'soul moments':

Parking mode: In a field test at a Beijing mall, the vehicle autonomously navigated a three-level underground garage, spotted a 'hidden parking space behind a pillar' and completed a reverse-parking maneuver with a 'difficulty rating' of 9.8.

Dialect understanding: When a user said 'go straight to the end and turn left' in Sichuan dialect, the system carried out the command flawlessly.

Extreme weather response: In a -35℃ blizzard in Harbin, VLA worked out the actual lane directions from traces of snow-melting agent.

Most impressive of all, the system's thinking is made visible:

The user interface shows real-time notes such as 'watching delivery e-bikes at right rear' and 'calculating overtaking success rate'.

In construction zones, the system explains by voice: 'Will use the non-motorized lane for 50 meters; confirmed no pedestrians.'

Before an ETC lane, it proactively asks: 'Insufficient balance detected. Switch to the manual lane?'

After a hands-on test, one tech blogger exclaimed: 'This isn't autonomous driving; it's like having a veteran AI driver behind the wheel.'

IV. The 'Cambrian Explosion' in the Trillion-Dollar Market

While other automakers compete on map-free NOA (Navigate on Autopilot), Li Auto has opened a new frontier: the Pandora's box of Physical AI (physical agents).

Industry disruption is underway:

Tests by a logistics company revealed a 73% increase in autonomous cargo pickup success rates for VLA-equipped trucks.

A collaboration with a robotics company produced a 'home service mode' that lets vehicles autonomously take children to and from school.

In a Xiong'an New Area pilot project, the VLA fleet achieved 'coordinated passage without traffic lights', boosting intersection passage efficiency by 210%.

The far-reaching impact lies in the data flywheel:

400,000 mass-produced vehicles contribute 4.5 million kilometers of real road condition data daily.

The cloud-based world model adds 800TB of training data each month.

Models now iterate on an hourly scale: after one rainstorm warning, the system updated its strategy for driving through standing water within two hours.
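Dividing the two fleet figures gives the implied average daily contribution per vehicle, assuming every one of the 400,000 vehicles uploads data each day (if only part of the fleet contributes, the per-vehicle figure would be higher):

```latex
\frac{4.5 \times 10^{6}\ \text{km/day}}{4 \times 10^{5}\ \text{vehicles}} = 11.25\ \text{km per vehicle per day}
```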

"This isn't a single technology breakthrough; it's a reconstruction of the entire mobility ecosystem," noted a securities firm analyst in the latest research report, emphasizing that VLA not only revolutionizes the driving experience but also potentially catalyzes trillion-dollar new markets like 'mobile intelligent spaces' and 'autonomous driving service subscriptions'.

V. Epilogue: When Cars Learn to 'Think'

Looking back at the autonomous driving journey:

In 2015, the industry debated 'whether to use lidar'.

In 2020, the focus shifted to 'how to get rid of high-definition maps'.

In 2025, Li Auto provided the ultimate answer with VLA: allowing cars to truly understand the physical world.

At a conference, Li Xiang posed a thought-provoking question: 'If cars can think autonomously, do we still need steering wheels?'

While there's no definitive answer, one thing is clear: when vehicles start recognizing 'Starbucks' not just as three Chinese characters but as a coffee-scented destination, and when the system understands 'snowy roads' as a dangerous scenario that calls for slow driving, we are standing on the cusp of a singularity in intelligent mobility.

In summary, Driverless Cars Are Coming (WeChat Official Account: Driverless Cars Are Coming) believes that before long, when children ask 'why did cars need people to drive them?', we will explain this era of autonomous driving, full of struggles and breakthroughs, the way we now explain 'why carriages needed drivers'.

And at this very moment, Chinese engineers are writing new answers in the physical world with large AI models.

Dear readers, what do you think?

References: Zhi Xing Xing Auto, "VLA: A Crucial Step Towards Autonomous Driving Physical Agents | Full Text of Li Auto's Jia Peng's GTC 2025 Speech".

Statement: The copyright of this article belongs to the original author. It is reprinted solely to share information more widely. If the author information is marked incorrectly, please contact us immediately so we can correct or delete it. Thank you.