05/22 2025
Produced by ZhiNeng Technology
Humanoid robots are gradually transitioning from avant-garde laboratories to industrialization, becoming a pivotal technological emblem of the fusion between artificial intelligence and intelligent manufacturing.
The introduction of Tesla's Optimus hardware solution sparked rapid development in the global humanoid robot industry. Chinese enterprises have actively engaged at various levels, including body manufacturing, core components, and autonomous algorithms, enriching the industrial ecosystem. Policy incentives, industry-university-research collaboration, and the diversification of the supply chain collectively propel this technological sector into a phase of large-scale development.
However, numerous technical challenges remain in achieving human-like motion control, visual perception, and hand-eye coordination. The journey for humanoid robots to evolve from 'usable' to 'user-friendly' is still lengthy and arduous. We delve into hardware evolution, functional realization, and practical applications, identifying the key contradictions and development trends at the current stage.
01
Technological Evolution and Rapid Formation of the Industrial Ecosystem
The push toward industrialization became visible once Tesla's Optimus hardware design was made public. Its combination of 14 rotary actuators and 14 linear actuators set a benchmark for industry standardization and raised public and industry expectations for mass production.
Driven by this benchmark, humanoid robot development accelerated globally, with China exhibiting a particularly rapid growth trajectory.
Domestic manufacturers differ widely in morphological design. Configurations range from the typical bipedal layout with both linear and rotary joints, to bipedal layouts with purely rotary joints, to bipedal bodies combined with a wheeled chassis, and even fully wheeled platforms. This diversity points to evolution along multiple paths at once.
Behind this technological and structural diversity lies China's robust industrial chain, supporting rapid trial-and-error and evolution across different technological directions.
On the policy side, the focus is on key products, core technologies, and typical application scenarios, with the stated aim of reaching a world-advanced level in comprehensive strength by 2027. Industrial funds, the construction of innovation centers, and other measures supplement this goal by optimizing resource allocation. National-level innovation centers have been established in various regions, serving as a bridge between the industrial chain, research institutions, and the policy system.
The landscape of body manufacturers is no longer confined to traditional robot companies; technology firms, automakers, AI startups, and others have also entered the market.
From Unitree's G1 and H1 series, to XPeng's plan to mass-produce industrial-grade humanoid robots in 2026, to the progress of Zhiyuan Robot's Expedition series, humanoid robots are shifting from research prototypes built to 'show off strength' toward products that 'seek implementation' in working scenarios.
The component supply system supporting the entire ecosystem is evolving in tandem. Parts that previously had high technical thresholds, such as planetary roller screws and coreless motors, have gradually reached mass-production capability after continuous process optimization.
Standard industrial products like servo motors and harmonic reducers have been incorporated into the logic of large-scale supply, continuously optimized to meet the lightweight, high-precision, and high-frequency response requirements of humanoid robots. Capital continues to flow into the core component manufacturing sector, laying a solid foundation for subsequent industry expansion.
02
Core Functions: Technological Challenges and Partial Breakthroughs
As the humanoid robot industry gains momentum, the technological challenges of truly enabling robots to 'move, see clearly, and grasp accurately' remain formidable.
Taking the coordination of 'cerebrum and cerebellum' functions as an example, current motion control and generalization abilities are still nascent. Humanoid robots inherently possess complex structures, high centers of mass, and small support areas, making them less stable than quadrupedal structures in dynamic balance control.
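The stability point can be illustrated with a static check: a robot is statically stable only while the ground projection of its center of mass stays inside the support polygon formed by its contact points. The sketch below is a minimal illustration, not a real balance controller. The function name and the footprint dimensions are hypothetical values chosen for the example, and dynamic effects are ignored.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def stability_margin(com_xy: Point, polygon: List[Point]) -> float:
    """Distance (m) from the CoM ground projection to the nearest edge of a
    convex support polygon listed counter-clockwise.
    Positive: inside (statically stable). Negative: outside."""
    margin = float("inf")
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        ex, ey = x2 - x1, y2 - y1
        length = (ex * ex + ey * ey) ** 0.5
        # 2D cross product: signed distance to the left of the edge,
        # which is the interior side for a counter-clockwise polygon.
        d = (ex * (com_xy[1] - y1) - ey * (com_xy[0] - x1)) / length
        margin = min(margin, d)
    return margin

# Hypothetical footprints in meters (assumed values, not measured data).
BIPED_SINGLE_FOOT = [(0.0, 0.0), (0.25, 0.0), (0.25, 0.10), (0.0, 0.10)]
QUADRUPED_STANCE = [(0.0, 0.0), (0.6, 0.0), (0.6, 0.3), (0.0, 0.3)]

if __name__ == "__main__":
    # The same 8 cm sideways CoM shift, applied to both platforms.
    print(stability_margin((0.125, 0.05 + 0.08), BIPED_SINGLE_FOOT))
    print(stability_margin((0.3, 0.15 + 0.08), QUADRUPED_STANCE))
```

With these assumed footprints, the same sideways shift pushes the biped's center of mass outside its single-foot support area while the quadruped keeps a positive margin, which is the intuition behind the harder balance problem.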
In terms of control methods, both Model Predictive Control (MPC) and Whole-Body Control (WBC) face issues of high modeling complexity and substantial computational power consumption.
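To make the compute cost concrete, here is a toy receding-horizon controller for a one-dimensional point mass. It is a deliberately naive stand-in for MPC: it enumerates every sequence of three hypothetical discrete accelerations over the horizon, whereas practical MPC solves a structured optimization problem. The model, cost weights, and function names are all assumptions made for illustration. Even so, the sketch shows the core pattern (predict with a model, score candidates, apply only the first action, repeat) and how the work grows with horizon length.

```python
from itertools import product

CONTROLS = (-1.0, 0.0, 1.0)  # hypothetical discrete accelerations (m/s^2)

def rollout_cost(x, v, controls, dt=0.1):
    """Simulate a unit point mass and accumulate a quadratic cost that
    penalizes distance from the origin and, more weakly, velocity."""
    cost = 0.0
    for u in controls:
        v += u * dt
        x += v * dt
        cost += x * x + 0.1 * v * v
    return cost

def mpc_step(x, v, horizon=4):
    """Return the first action of the cheapest control sequence and the
    number of candidate sequences that had to be evaluated."""
    best_u, best_cost, evaluated = None, float("inf"), 0
    for seq in product(CONTROLS, repeat=horizon):
        evaluated += 1
        cost = rollout_cost(x, v, seq)
        if cost < best_cost:
            best_cost, best_u = cost, seq[0]
    return best_u, evaluated

if __name__ == "__main__":
    for h in (2, 4, 6, 8):
        u, n = mpc_step(1.0, 0.0, horizon=h)
        print(f"horizon={h}: first action {u}, {n} sequences evaluated")
```

In this toy the candidate count is 3 raised to the horizon length for a single axis. A humanoid has dozens of coupled joints and must re-solve the problem many times per second, which is why modeling effort and compute budget dominate the discussion of MPC and WBC.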
Learning-based methods such as reinforcement learning have produced impressive experimental results, including complex movements like dancing and kicking. However, they are limited by weak generalization across environments, reward functions that are hard to design, and poor fault tolerance, so their reliability and stability in real-world scenarios remain unsatisfactory.
Imitation learning, while assisting in generalization control, also faces issues with the quality of data sources. Teleoperation data has poor reusability, motion capture is constrained by equipment layout and data volume, and video data presents parsing challenges.
Large models are emerging as another line of exploration for robot intelligence. Vision-language models (VLMs) use general model capabilities to cope with task complexity, and the vision-language-action (VLA) architecture attempts to carry that knowledge through to the generation of motion commands.
However, due to the extremely high demand for hardware resources and long response delays of large models, they currently remain at the cognitive level of task decomposition and path planning. True motion control still relies on the high-speed feedback mechanism of the cerebellum.
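The division of labor described here can be sketched as a two-rate loop: a slow 'cerebrum' layer that occasionally issues a new target, and a fast 'cerebellum' layer that closes the feedback loop at every tick. The following is a minimal sketch under stated assumptions. The single joint is modeled as a unit mass, the waypoint list stands in for the output of a large model, and the rates and gains are hypothetical.

```python
def run_two_rate_loop(waypoints, total_time=2.0, fast_dt=0.002,
                      slow_period=0.5, kp=100.0, kd=20.0):
    """Simulate one joint driven by a slow planner and a fast PD loop.

    Returns the final position, the number of planner calls, and the
    number of fast control ticks."""
    steps_per_plan = round(slow_period / fast_dt)
    n_steps = round(total_time / fast_dt)
    x, v = 0.0, 0.0
    target = x
    plan_calls = 0
    for k in range(n_steps):
        if k % steps_per_plan == 0:
            # Slow layer: pick the next waypoint, holding the last one
            # once the list is exhausted.
            target = waypoints[min(plan_calls, len(waypoints) - 1)]
            plan_calls += 1
        # Fast layer: PD feedback on a unit mass, semi-implicit Euler step.
        u = kp * (target - x) - kd * v
        v += u * fast_dt
        x += v * fast_dt
    return x, plan_calls, n_steps

if __name__ == "__main__":
    final_x, plans, ticks = run_two_rate_loop([0.2, 0.5, 0.3])
    print(f"final position {final_x:.4f} after {plans} plans, {ticks} ticks")
```

In this sketch the planner runs only a handful of times while the feedback loop runs a thousand times, so a slow, resource-hungry model can sit in the outer loop without sitting in the path of the high-rate control.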
As the 'eyes' of humanoid robots, the visual system is a key weakness: its stability and its ability to handle varied conditions in complex environments both fall short. 3D vision offers many approaches, including multi-view stereo vision, dToF, structured light, and LiDAR, but each has its pros and cons, making it difficult to balance cost, accuracy, environmental adaptability, and safety.
For instance, structured light excels at indoor close-range imaging but performs poorly outdoors and is bulky; LiDAR offers high accuracy but is expensive; monocular vision combined with deep learning is low-cost but limited by occlusion and textureless surfaces.
Relatively mature solutions have emerged in the service robot field. Orbbec, with a high market share in the 3D vision sensor market, offers products tailored to various robot types and optimized for different scenarios.
Grasping and manipulation are core capabilities for robots to perform tasks, making dexterous hands another technological high ground. Currently, various manufacturers are conducting in-depth and differentiated explorations in hand structure, control strategies, and sensing capabilities. Manufacturers like Lingxin Chouxiu and Leadshine Intelligence have launched products with multiple degrees of freedom and adaptability to different task types, but they still grapple with the stability of generalized grasping actions.
Hand-eye coordination is pivotal in solving this dilemma. Relying solely on vision or touch for precise operations has limited success rates. Combining visual recognition with tactile feedback to achieve dynamic adjustments has become the mainstream approach for enhancing upper limb manipulation capabilities. Leadshine Intelligence's teleoperation training and Pacinian's imitation strategy are both attempts to transfer human operational experience to robots, exploring more possibilities for enhancing generalization ability.
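The idea of combining a visual estimate with tactile correction can be sketched as a simple loop: vision supplies an initial grip force, and a slip signal from touch drives small corrections until the object holds. This is a hypothetical illustration, not any manufacturer's method. The Coulomb friction model with two contact surfaces, the step size, and the force limit are all assumptions.

```python
def grasp_with_feedback(vision_force_guess, object_weight, friction,
                        step=0.5, max_force=20.0):
    """Adjust grip force (N) until a simulated tactile sensor stops
    reporting slip.

    Returns (final_force, adjustments, success)."""
    force = vision_force_guess  # initial estimate derived from vision
    adjustments = 0
    # Assumed slip model: two contact surfaces with Coulomb friction,
    # so the object slips while 2 * mu * N is below its weight.
    while 2.0 * friction * force < object_weight:
        if force >= max_force:
            return force, adjustments, False  # cannot hold within limits
        force = min(force + step, max_force)  # tactile-driven correction
        adjustments += 1
    return force, adjustments, True

if __name__ == "__main__":
    # The visual guess is too low; touch feedback closes the gap.
    print(grasp_with_feedback(3.0, object_weight=5.0, friction=0.5))
    # → (5.0, 4, True)
```

Neither signal suffices alone in this toy: the visual guess by itself would drop the object, and touch without a starting estimate would have to search from zero.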
Summary
Making humanoid robots practical will take more patience and accumulated experience. As they stand on the cusp of 'stepping out of the laboratory,' body manufacturing, control systems, core components, and application scenarios are all evolving at once, giving the industry unprecedented vitality.
Achieving truly human-like intelligent behavior, mobility, and manipulation capabilities in humanoid robots will take time. From the distributed control of the cerebrum and cerebellum systems to multi-modal perception and hand-eye coordination strategies, each technological module breakthrough demands significant resources and time.
Finding a harmonious balance between the commercialization process and technological refinement is a challenge that every practitioner must currently address.