In-Depth Analysis of XPENG's 2026 New Product Launch: Physical AI - VLA Intelligent Driving, VLM Cabin, and a Myriad of SKUs

January 16, 2026

Recently, XPENG hosted its 2026 Global New Product Launch event, unveiling four updated models: the P7+, G7, G6, and G9. Alongside these, the company introduced two powertrain systems (including extended-range options) and three intelligent driving configurations: Max, Ultra SE, and Ultra.

Considering the numerous configurations, such as battery range options, calculating the total number of SKUs (Stock Keeping Units) becomes a formidable task, potentially overwhelming car buyers with choices.
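To see how quickly the SKU count balloons, here is a back-of-the-envelope sketch. The model, powertrain, and driving-tier counts come from the launch; the battery and color counts are illustrative assumptions, not official figures:

```python
# Back-of-the-envelope SKU estimate for XPENG's 2026 lineup.
# Battery and color counts are illustrative assumptions, not official figures.
models = ["P7+", "G7", "G6", "G9"]               # 4 updated models (from the launch)
powertrains = ["BEV", "EREV"]                    # 2 powertrain systems
driving_configs = ["Max", "Ultra SE", "Ultra"]   # 3 intelligent-driving tiers
battery_options = 2                              # assumed range variants
exterior_colors = 5                              # assumed color palette

skus = len(models) * len(powertrains) * len(driving_configs) * battery_options
print(f"Trim-level SKUs (before colors): {skus}")
print(f"Including {exterior_colors} colors: {skus * exterior_colors}")
```

Even with conservative assumptions, the multiplication yields dozens of trim-level SKUs and hundreds of orderable variants, which is exactly the combinatorial burden the development and supply chain teams must absorb.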

However, my primary concern extends beyond consumer confusion to the challenges faced by XPENG's product development engineers and supply chain teams. The complexity involved in developing these vehicles, managing the supply chain, and ensuring quality control poses significant hurdles. Both product technology and suppliers are under immense pressure, making it challenging to maintain high standards.

Thus, it appears that XPENG's product management strategies have seen little evolution since the G9 era. Its success in 2025 still hinges on the dividends from its Mona model and Volkswagen's outsourcing business. For a detailed analysis of XPENG's 2025 product sales, please refer to our previous article, "Two Graphs to Understand XPENG's Vehicle Sales and 2026 Development."

Nevertheless, XPENG showcases a remarkable aptitude for product and technology innovation in the automotive sector. This article summarizes XPENG's intelligent strategies and product technology evolution, drawing insights from He Xiaopeng's presentation at the 2026 XPENG Global New Product Launch: Physical AI.

During the launch, He Xiaopeng declared, "In the past decade, XPENG has been synonymous with intelligent electric vehicles; in the new decade, XPENG represents Physical AI and globalization."

What exactly is 'Physical AI,' and why would an automaker position itself at the forefront of this emerging field? This signifies more than just a marketing upgrade; it represents a technological leap from 'digital survival' to 'physical world interaction.'

I. What is 'Physical AI'?

If generative AI (like GPT) can be likened to 'dreaming in the cloud,' then Physical AI is 'working with a body.'

In XPENG's vision, Physical AI transcends mere digital signal processing, such as text and images, and gains the ability to perceive, reason, decide, and manipulate physical entities. It faces unique challenges in the physical world—friction, sudden weather changes, unpredictable human behaviors (e.g., a construction worker waving on the roadside), and an extremely low tolerance for errors.

XPENG's Physical AI strategy aims to create an intelligent agent capable of understanding complex physical laws and interacting in real-time. Currently, this agent is embodied in vehicles (P7+, G7, etc.), with plans to expand to Robotaxi, flying cars, and even humanoid robots.

II. VLA 2.0: Transcending 'Mapping' to Perceive the World Like Humans

The cornerstone of XPENG's Physical AI ecosystem is the second-generation VLA (Vision Language Action) large model. XPENG claims this represents a 'species evolution' in autonomous driving technology.

1. What does this evolution entail?

Prior to VLA 2.0, even so-called 'end-to-end' technologies heavily relied on rule-based code as a fallback. He Xiaopeng described them as 'patchwork monsters'—efficient but limited in generalization.

XPENG asserts that its VLA 2.0 is a true end-to-end large model, bypassing the L2-level VLA 1.0 and supporting upgrades to L4 autonomous driving. Its core logic has undergone a qualitative transformation:

  • No reliance on 'God's Eye': Traditional L4 systems (like Waymo) depend on high-definition map scanning, avoiding unmapped areas. VLA 2.0, like humans, drives by observing road conditions in real-time.
  • No reliance on 'God's Hand': Traditional L4 systems often require remote intervention by cloud safety operators in challenging situations. VLA 2.0 possesses independent causal reasoning capabilities.

However, XPENG has not provided a clear definition of VLA 1.0. Here, I venture to define VLA 1.0 as a combination of VLM (Vision Language Model) and Action: instructions extracted from user language are fed into a one-stage or two-stage end-to-end module that outputs action commands.
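Under that reading, a VLA 1.0 pipeline might be sketched as follows. Every class, function, and threshold here is a hypothetical illustration of the two-stage "VLM + Action" idea, not XPENG's actual architecture:

```python
from dataclasses import dataclass

# Hypothetical sketch of a two-stage "VLM + Action" pipeline (the VLA 1.0
# reading proposed above). No interface here reflects XPENG's real code.

@dataclass
class Instruction:
    maneuver: str   # e.g. "lane_change_left"
    urgency: float  # 0.0 .. 1.0

def vlm_stage(camera_frames, user_utterance):
    """Stage 1: a vision-language model turns pixels plus language into a
    symbolic instruction (placeholder keyword logic)."""
    if "overtake" in user_utterance:
        return Instruction("lane_change_left", 0.7)
    return Instruction("keep_lane", 0.1)

def action_stage(instruction):
    """Stage 2: an end-to-end planner maps the instruction to a control
    command (steering, throttle) -- here just a stub."""
    steering = -0.2 if instruction.maneuver == "lane_change_left" else 0.0
    return {"steering": steering, "throttle": 0.3}

cmd = action_stage(vlm_stage(camera_frames=[], user_utterance="overtake the truck"))
print(cmd)
```

The point of the sketch is the seam between the two stages: VLA 2.0, as XPENG describes it, would collapse that seam into a single end-to-end model rather than passing a symbolic instruction between separately trained components.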

Yet, the specifics of VLA 2.0 remain unclear. Perhaps XPENG intends to convey that this is the true VLA, though whether true VLA can be implemented in the automotive industry within the next one or two years remains uncertain.

2. The 'Turing'-Powered Aesthetic of Brute Force

To drive such a sophisticated system capable of understanding the physical world, ordinary computing power is insufficient. XPENG's solution is a trifecta of 'large model + large computing power + big data':

  • Large Model: Enormous parameter count, specifically trained for the appearances and motions of the physical world, rather than just language models.
  • Big Data: Not just repetitive mileage but 100 million abnormal scenario clips. This equates to a human driver learning all possible misfortunes and extreme road conditions over 6,500 years.
  • Large Computing Power: XPENG's deepest moat—self-developed Turing AI chip.
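The "6,500 years" comparison can be sanity-checked with the article's own numbers. The per-day rate below is derived from XPENG's figures, not an official statistic:

```python
# Sanity check: 100 million abnormal-scenario clips claimed to equal
# 6,500 years of human driving experience. The implied per-day rate is
# derived arithmetic, not an official XPENG figure.
clips = 100_000_000
years = 6_500
scenarios_per_day = clips / (years * 365)
print(f"Implied abnormal scenarios per driving day: {scenarios_per_day:.1f}")
```

The comparison implies a human driver encountering roughly 42 abnormal scenarios every single day for 6,500 years, which underscores that the claim is about rare-event coverage, not raw mileage.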

XPENG has effectively integrated the three pillars of AI—algorithm, data, and computing power. However, the realm of high-computing-power chips is complex, with chip computing power and bandwidth often shrouded in secrecy.

III. VLM and AIOS 6.0: Not Just Assistants, But 'Butlers'

If VLA handles 'how to drive,' then VLM (Vision Language Model) handles 'how to understand you.'

In XPENG's Physical AI architecture, the intelligent cabin is no longer a passive command executor but an active perception brain with 7 billion parameters, deployed locally (to protect privacy).

'Proactive Service' is the hallmark of this AI generation, echoing the vision we described in our earlier article, 'Intelligent Cabin Series I: What Is It?' Previously, you had to say 'Turn on the air conditioning' when cold; now, VLM senses your state and the ambient temperature and proactively says, 'It's a bit cold inside; I've adjusted the temperature for you.' Previously, on encountering a road closure, navigation might simply instruct you to turn around; now, VLM understands roadblocks and police gestures and proactively tells you, 'The road ahead is closed; I've rerouted for you.'
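The behavioral shift from reactive to proactive can be sketched as a perceive-decide-act loop. The state fields and thresholds below are illustrative assumptions, not XPENG's actual VLM logic:

```python
# Minimal sketch of "proactive service": a cabin agent that acts on sensed
# state instead of waiting for a voice command. All state fields and
# thresholds are illustrative assumptions, not XPENG's VLM behavior.

def proactive_cabin_step(state):
    """Return the proactive actions for one perception cycle."""
    actions = []
    if state["cabin_temp_c"] < 18 and not state["hvac_on"]:
        actions.append("raise cabin temperature and notify occupant")
    if state["road_ahead"] == "closed":
        actions.append("reroute and explain the closure")
    return actions

actions = proactive_cabin_step(
    {"cabin_temp_c": 16, "hvac_on": False, "road_ahead": "closed"}
)
print(actions)
```

The design point is the inversion of control: the agent polls perception continuously and volunteers actions, rather than idling until the user issues a command.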

This is the projection of Physical AI within the cabin: it possesses the ability to observe and interpret, integrating digital twins with real-world emotional interactions.

IV. Conclusion: The Moment from 'Horse-Drawn Carriage' to 'Automobile'

At the launch, He Xiaopeng drew a historically resonant analogy: the era of full autonomous driving in 2026 resembles the transition from horse-drawn carriages to automobiles or from feature phones to smartphones.

XPENG's Physical AI strategy is not merely about equipping cars with faster computers but reimagining cars through AI.

  • It has eyes (cameras + VLA) to understand the driving environment, such as a construction worker's wave;
  • It has a brain (Turing chip + VLM) to comprehend cabin user needs, like your fatigue and desires;
  • It has a robust body (P7+, G7, Robotaxi) to take you anywhere you desire.

In 2026, we may stand at the threshold of an era where AI is no longer content with chatting on screens but begins to turn the steering wheel, entering our physical world. For XPENG, this is not just a product launch but a grand experiment on 'how machines can live like humans.'

Undoubtedly, from the perspective of AI-era automotive development, XPENG's product technology direction is correct. However, XPENG's product management and mindset seem problematic when applied to the physical realm of automobiles. After all, a proliferation of products and configurations compounds complexity in development and the supply chain, and ultimately in user experience and quality control.

Thus, XPENG faces both crises and opportunities. It remains the same XPENG, but now everyone is catching up. XPENG must accelerate its intelligent technology implementation.
