Xiaopeng's humanoid robot shows double the capability in a single bowl of jelly

11/14 2025 483

Author | Mao Xinru

A humanoid robot strutting down a catwalk has successfully captured global attention.

This robot is IRON, which made its debut at Xiaopeng Technology Day on November 5th.

IRON's unveiling was nothing short of dramatic. Faced with accusations that the robot was merely a person in a costume, Xiaopeng founder He Xiaopeng felt both helpless and disheartened, having to repeatedly refute rumors and showcase the robot's internal structure to prove its authenticity.

Meanwhile, across the ocean, Elon Musk not only publicly praised IRON on social media, stating that its design was impressive, but also predicted that China and Tesla would dominate the entire robot market in the future.

Perhaps He Xiaopeng, disheartened while refuting the rumors, was overjoyed once he had made his case, since in doing so he showed the public what Chinese manufacturing can currently do.

While proving himself, He Xiaopeng lamented that he never expected to end up like the character in 'Let the Bullets Fly' who cuts open his own stomach to prove that he ate only one bowl of jelly.

Yet this single bowl of jelly reveals a double helping of capability, in both software and hardware.

Finding the Balance Point of Personification

Some say that the first Turing test for embodied intelligence was completed by Xiaopeng's robot IRON. Engineers had to 'dissect' the robot to prove that it was a machine rather than a person.

The reason for the public's suspicion was simply that IRON looked too human.

At bottom, what the dissection put on display is an attempt to build a machine that resembles a human. There are two common approaches to achieving this goal:

Replicating muscle kinematics and touch through internal bionic actuation, such as artificial muscles.

Reducing the uncanny valley effect by wrapping the robot in a soft exterior to make it more 'approachable.'

The most typical implementations of these two routes are the Protoclone robot developed by Clone in Poland and the NEO robot developed by 1X in Norway.

Protoclone uses artificial muscles instead of motor drives, achieving 164 degrees of freedom in the upper body of the humanoid robot. However, artificial muscles demand advanced materials and hydraulic systems, and the current design has not fully optimized either.

Furthermore, fully replicating human tendon morphology does make the robot highly human-like, but it also produces a strong uncanny valley effect and a poor user experience.

In contrast, 1X chose to wrap the robot in soft fabrics like nylon to convey a sense of gentleness, emphasizing safety and approachability. Figure AI adopted a similar fabric-wrapped approach this year for its third-generation robot, Figure 03.

Xiaopeng, however, chose a more interesting combination of these two paths. The robot's interior uses a motor-driven + elastomer-connected scheme, while the exterior is wrapped in soft fabric, enabling the robot to achieve a balance of rigidity and flexibility, with both a stable body and a soft touch.

A crucial material in this scheme is the elastomer, which plays the role of human fascia in IRON.

This high-performance material is both highly elastic and lightweight: it quickly recovers its shape after being loaded, and its deformation is highly reversible. It is 3D-printed into a lattice structure that simulates the contraction, cushioning, and dynamic response of human muscles.

Materials like this are used in the elbow joints of UBTECH's Walker S2, the arms, chest armor, and legs of Zhiyuan's Lingxi X2, and the shock-absorbing muscles at the joints of Figure 02.

Of course, no matter how soft the exterior is, what truly determines whether it looks human is whether the robot can move like a human.

IRON's catwalk was the biggest 'trigger' for public suspicion. Behind this seemingly simple gait lies another upgrade to Xiaopeng's mechanical architecture.

Compared to the previous generation of IRON, this generation has seen its degrees of freedom increase significantly, from 62 to 82, and it now has a human-like spine.

In previous humanoid robot designs, the torso was often a rigid structure, leading to stiff upper body movements and difficult center-of-gravity control.

The human-like spine, however, allows the robot to more naturally distribute its center of gravity when walking, turning, and bending, not only making it look more human but also enabling smoother and more stable movements.

Specifically, the bionic spine has 5 degrees of freedom, utilizing linear actuators and ball joint universal joints, combined with hip joints that use the same ARF series structure as Optimus, allowing IRON to perform more natural movements such as abduction, rotation, and bending.

In addition to the major upgrade in torso design, Xiaopeng has also updated the dexterous hand, increasing its degrees of freedom from 15 to 22, close to those of a human hand.

Furthermore, Xiaopeng has independently developed a 16mm harmonic joint, currently the smallest in the industry, to achieve a balance between performance and size for the dexterous hand.

The size of IRON's dexterous hand is almost identical to that of a human hand, adopting the most mature linkage transmission scheme in the industry, where the harmonic reducer acts as the tendon of the joint, determining whether the movements are precise and smooth.

In the automotive sector, hardware drives software; in robotics the logic is reversed, and software drives hardware. In other words, Xiaopeng upgraded IRON's hardware to accommodate a smarter software brain.

This approach is shared by leading players both domestically and internationally, such as Figure AI and StarMotion Era, who follow a 'model algorithm-hardware parameter' collaborative development mode to avoid the adaptation losses of a generic model + generic hardware.

A More Human-like Intelligent System

If hardware shapes the robot's body, then software and intelligent systems are the key to giving it life.

He Xiaopeng stated at the Technology Day that the new generation of IRON is the first robot to be equipped with Xiaopeng's first-generation physical world large model.

By combining three high-level capabilities, VLT (Vision-Language-Task), VLA (Vision-Language-Action), and VLM (Vision-Language Model), seven systems deliver three forms of high-level intelligence: dialogue, walking, and interaction.

Breaking down this combination, VLT, VLA, and VLM can each form an independent system, corresponding to the brain, interaction, and cerebellum, respectively. They can be combined in pairs to handle tasks based on their type and difficulty, with the three models integrated to form a systematic brain-cerebellum architecture.
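One way to read the 'seven systems' figure, and this is an interpretation rather than something Xiaopeng has spelled out, is as the number of non-empty combinations of the three models: three used alone, three pairs, and one full stack. A minimal sketch of that count:

```python
from itertools import combinations

MODELS = ("VLT", "VLA", "VLM")

def model_combinations(models=MODELS):
    """Return every non-empty combination of the given models, smallest first."""
    combos = []
    for size in range(1, len(models) + 1):
        combos.extend(combinations(models, size))
    return combos

if __name__ == "__main__":
    # Prints three singles, three pairs, and the full three-model stack.
    for combo in model_combinations():
        print(" + ".join(combo))
```

Three models give 3 + 3 + 1 = 7 combinations, which matches the number quoted at the event.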

Among them, the VLT large model is a brand-new large model developed specifically for robots, regarded as the core engine for autonomous robot action, enabling deep thinking and autonomous decision-making.

Xiaopeng's goal is a high-level humanoid robot that not only resembles humans in appearance but also mimics the human brain in intelligence. Although its functions are described in terms of the human brain and cerebellum, this architecture is in essence the fast and slow brain system commonly discussed in the robot industry.

Models like Figure's Helix, StarMotion Era's ERA-42, StellarChart's G0, and Stardust Intelligence's DuoCore all adopt the fast and slow brain system.

This system is derived from the fast and slow thinking theory, which posits that the human brain has two thinking systems: fast thinking and slow thinking. Introducing this theory into VLA model design aims to resolve the conflict between speed and intelligence.

In the VLA model, the fast brain is usually handled by a lightweight, independent policy network responsible for generating real-time, fluid movements. The slow brain is typically handled by a massive, pre-trained VLM responsible for advanced scene understanding and task planning.
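The division of labor can be sketched as a two-rate control loop. This is a toy illustration, not any vendor's implementation: `slow_brain`, `fast_brain`, the `Plan` type, and every number below are hypothetical stand-ins for a large VLM planner and a lightweight policy network.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    """Hypothetical output of the slow brain: a target position to move toward."""
    target: float

def slow_brain(observation: float) -> Plan:
    """Stand-in for a large VLM planner: decides where to go (here, always 1.0)."""
    return Plan(target=1.0)

def fast_brain(observation: float, plan: Plan, gain: float = 0.5) -> float:
    """Stand-in for a lightweight policy: a proportional step toward the target."""
    return gain * (plan.target - observation)

def run(ticks: int = 20, slow_every: int = 10):
    """Run the fast policy every tick and the slow planner every `slow_every` ticks."""
    position = 0.0
    plan = slow_brain(position)  # initial plan before the loop starts
    slow_calls = 1
    for tick in range(1, ticks + 1):
        if tick % slow_every == 0:
            plan = slow_brain(position)  # infrequent, expensive replanning
            slow_calls += 1
        position += fast_brain(position, plan)  # frequent, cheap control step
    return position, slow_calls
```

Note that in this sketch the fast policy receives only the plan handed down to it, never the planner's internal knowledge.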

Although the dual-system design significantly enhances the robot's execution efficiency, building the fast brain as a completely new module, independent of the slow brain, prevents it from directly accessing and using the vast amount of pre-trained knowledge within the slow brain.

This results in the fast brain acting like a soldier who only follows orders without understanding the deeper meaning behind them.

This differs from human thinking to some extent, as human reactions are not strictly segmented; actions often carry understanding and habitual reflection.

Therefore, Xiaopeng has more clearly delineated VLA and VLM in the architectural design while introducing VLT to discern task difficulty, providing the system with different levels of response speed.

Meanwhile, Xiaopeng has reused the latest second-generation VLA from the automotive side on IRON. The new generation of VLA eliminates the language translation step found in traditional VLA models, achieving end-to-end output from visual signals to action commands, reducing information loss and improving response speed.

Specifically, the second-generation VLA reuses the vision (V) signals, so that V and language (L) signals jointly shape the action decision (A). This retains L's reasoning capability while avoiding the information lost in translating V into L when L alone serves as the decision-making representation.
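The information-loss argument can be shown with a deliberately tiny example. Everything here is assumed for illustration (the distance reading, the 'near'/'far' vocabulary, the braking values); it is not Xiaopeng's model, only a sketch of why a language-only bottleneck discards detail that a joint V + L decision can still use.

```python
def describe(distance_m: float) -> str:
    """Lossy V-to-L step: compress a continuous reading into a coarse word."""
    return "near" if distance_m < 10.0 else "far"

def act_from_language(word: str) -> float:
    """L-only decision: brake strength depends on the word alone."""
    return 0.8 if word == "near" else 0.1

def act_from_vision_and_language(distance_m: float, word: str) -> float:
    """Joint decision: the word sets a baseline, the raw signal refines it."""
    baseline = act_from_language(word)
    if word == "near":
        # A closer obstacle means stronger braking, capped at 1.0.
        return min(1.0, baseline + 0.2 * (10.0 - distance_m) / 10.0)
    return baseline

if __name__ == "__main__":
    for d in (2.0, 8.0):
        word = describe(d)
        print(d, word, act_from_language(word), act_from_vision_and_language(d, word))
```

Obstacles at 2 m and 8 m are both described as 'near', so the language-only path brakes identically for both, while the joint path still distinguishes them.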

This architecture is very similar to the one shared by Tesla at ICCV 2025.

Tesla uses V-related multimodal signals both to generate L as an intermediate representation and to produce higher-dimensional signals such as panoptic segmentation and 3D Gaussian representations. These multimodal perception signals, together with L's natural-language explanations, jointly determine the output action.

Members of Xiaopeng's autonomous driving team have also stated that Xiaopeng's second-generation VLA is both a VLA model and a world model, using its VLA data to train the world model.

Tesla is doing the same, combining the world model with its intelligent driving model by inputting the next world state predicted by the world model into the intelligent driving system for further evaluation and training.

As companies that manufacture both vehicles and robots, Xiaopeng and Tesla share a high degree of similarity in their technological paths.

This is not only reflected in the intelligent driving systems and self-developed chips for vehicles but also in the reuse of the model + chip technology system for robot products, maximizing technological synergy.

Two Major Industry Challenges: Dexterous Hands and Application Scenarios

Despite the significant upgrades in software and hardware brought by Xiaopeng's robot this time, it still faces two major challenges in the development of the humanoid robot industry: dexterous hands and the question of where humanoid robots can be applied.

Although the new generation of IRON's dexterous hand has achieved human-like degrees of freedom and developed a smaller reducer joint, this dexterous hand still faces significant issues in terms of cost and reliability.

Firstly, in terms of cost, Mi Liangchuan, the head of Xiaopeng's robot division, stated that the current cost of the dexterous hand accounts for 60% of the total robot cost, which is not only far above the industry average of 25-35% but also a long way from the industry's ideal target of 5-10%.
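To get a feel for the gap between these percentages, the shares can be converted into a required cost reduction. The sketch below assumes, purely for illustration, that the cost of the rest of the robot stays fixed while only the hand gets cheaper; the function name is hypothetical.

```python
def required_hand_cost_reduction(current_share: float, target_share: float) -> float:
    """Factor by which the hand's cost must fall to reach target_share.

    Assumes the cost of everything other than the hand is unchanged.
    """
    current_ratio = current_share / (1.0 - current_share)  # hand cost / rest cost
    target_ratio = target_share / (1.0 - target_share)
    return current_ratio / target_ratio

if __name__ == "__main__":
    for target in (0.35, 0.25, 0.10, 0.05):
        factor = required_hand_cost_reduction(0.60, target)
        print(f"60% -> {target:.0%}: hand cost must fall about {factor:.1f}x")
```

Under that assumption, moving from a 60% share to the 25-35% industry band means cutting the hand's cost by roughly 2.8x to 4.5x, and reaching the 5-10% target means a cut of roughly 13.5x to 28.5x.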

He Xiaopeng also mentioned that the cost of one hand is much higher than the annual cost of hiring a worker, and the dexterous hand breaks down within a month of being placed in a factory, indicating that high cost does not equate to high reliability.

The technological dilemma behind this is that high degrees of freedom mean more motors, reducers, and sensors, as well as more potential failure points.
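A simple series-reliability model makes the trade-off concrete. It assumes each degree of freedom is an independent joint with the same chance of surviving a given period, and that the hand fails if any one joint fails; the 98% per-joint figure is a made-up value for illustration, not measured data.

```python
def survival_probability(per_joint_survival: float, joints: int) -> float:
    """Probability that every joint survives, assuming independent, identical joints."""
    return per_joint_survival ** joints

if __name__ == "__main__":
    p = 0.98  # hypothetical per-joint survival probability over one period
    for dof in (15, 22):
        print(f"{dof} DoF: {survival_probability(p, dof):.3f}")
```

With that assumed 98% per joint, a 15-DoF hand survives the period about 74% of the time and a 22-DoF hand about 64%, so each added joint compounds the risk even when individual parts are fairly reliable.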

Of course, this is a common issue across the industry, which wants human-like high degrees of freedom at low cost and expects the product to be durable as well.

Many industry insiders have stated that the lifespan of most third-party dexterous hand products is between 1 and 3 months, with some lasting only 7 days.

A deeper look reveals that the average maintenance cycle for robot dexterous hands is too short, pointing to an industry-wide pain point: technologies such as flexible joints, micro harmonic drives, cable management, and thermal control are still immature.

If the dexterous hand represents a hard technological bottleneck, then application scenarios represent a soft commercialization challenge.

From discussing robots entering factories to perform assembly tasks last year to shifting towards the 'three guidance' scenarios of 'guiding tours, guiding purchases, and guiding patrols' this year, Xiaopeng's adjustment in the direction of robot applications reflects a deepened understanding of the industry's practical dilemmas.

He Xiaopeng clearly stated that through practical verification, they have found that under the current technological level, humanoid robots are neither suitable for entering factories to undertake repetitive manufacturing tasks nor for entering households to handle complex housework.

This candor actually reflects the challenges faced by the entire industry.

At present, humanoid robots on the market are still primarily entertainment-oriented, and most products are small to medium in size. Full-size humanoid robots are rarely deployed in real-world environments; the only one currently in factory use is UBTECH's Walker S2.

Narrowing the focus, consider a side-by-side comparison of the three automakers developing their own robots: GAC, Xiaopeng, and Tesla.

GAC's GoMate was originally planned for deployment in factories but was later found unsuitable and is now applied in security and inspection.

Xiaopeng's IRON was initially planned for factory use but has now shifted to three service scenarios.

Tesla has not yet produced a mature Optimus robot for deployment. If the issue of dexterous hands cannot be resolved, it may also be applied in offline stores.

The change in planned application scenarios is just the first challenge. The second is determining the application value of humanoid robots when they are deployed.

Looking back, Chinese service robots are already common in areas such as shopping guidance and patrols, with wheeled robots from companies like Yunji and Qilang in widespread use.

Do humanoid robots offer unique value in these scenarios? Currently, apart from the interactive experience of being more human-like, they have not formed strong competitive barriers.

Taking offline store scenarios as an example, if the robot only serves to explain products, perhaps a salesperson could provide a more comprehensive explanation than a humanoid robot. If it is to accompany customers in experiencing vehicles, the robot's performance requirements are higher, such as opening car doors and sitting inside to explain, which require flexible joints and dexterous hands.

Even in scenarios like museums, where the environment is relatively simple and spacious, it is a challenge to guarantee human-machine safety while keeping the interaction comfortable.

If the robot frequently adjusts its route to avoid viewers, or requires viewers to actively avoid it, or if there are glitches during interaction, it will be difficult to achieve the desired service effect.

The debut of Xiaopeng's IRON serves as a mirror for the industry, reflecting both the new exploratory achievements in global technology and the industry's current challenges.

Currently, the global humanoid robot industry is at a stage of converging technological routes, rapidly decreasing costs, and expanding application scenarios. The year 2025 is even seen as the first year of mass production, with over 10,000 humanoid robots expected to be produced and shipped globally.

At this critical juncture, with a year of explosive growth in technology and mass production, the public's psychological threshold has been continuously raised. Failing to be eye-catching will result in a loss of traffic and attention. Behind the heated discussions, the imperfections of the technology also need to be viewed rationally.

However, the technological development of humanoid robots has never been an isolated technological competition but rather a collaborative evolution of the global industry. The exploration of the new generation of IRON embodies a path that draws on global advanced experience while combining its own strengths for localized innovation.

For all embodied intelligence players, there is actually no need to deliberately predict industry turning points. What is more important is to focus on accumulating technology and be fully prepared for the arrival of the turning point.

Imperfect exploration is precisely the necessary path to reaching the turning point.

