05/15 2025
Produced by ZhiNeng Technology
Following the explosion of large AI models, humanity stands on the brink of another technological revolution—humanoid robots.
Morgan Stanley's latest research report projects that by 2050 the global humanoid robot market will generate annual revenue of $4.7 trillion, with cumulative deployment of 1 billion units. That revenue would be nearly double that of today's global automotive industry, signaling that "physical AI" will be the most disruptive industry wave of the coming decades.
In this transformative wave, China not only has the potential to become the largest application market but may also achieve significant advancements in core components, autonomous operating systems, and engineering integration capabilities, leveraging its comprehensive manufacturing system, engineering talent pool, and cost control expertise.
Building on Morgan Stanley's report, this article traces the technology roadmap and the logic of industrialization, and examines whether China can emerge as the leading player in this "physical AI revolution".
01
Engineering Starting Point:
Why Humanoid Robots?
Throughout the history of robot development, the choice of "form" has always revolved around balancing functionality and cost. Compared to wheeled or tracked robots, the humanoid form offers unparalleled adaptability to human environments, such as stairs, doorknobs, tools, chairs, and cockpits, without the need to alter the environment.
This adaptability hints at the potential for "plug-and-play" functionality once humanoid robots integrate general intelligence through VLA (Vision-Language-Action) models.
01
Engineering Challenge: Perception-Comprehension-Execution Closed Loop
The core challenge of humanoid robots isn't whether they can have legs and arms but how to establish a complete "perception-comprehension-execution" closed loop.
This requires:
◎ A perception module that integrates multimodal sensors (RGB-D cameras, IMUs, ToF lasers, tactile sensors, etc.);
◎ A decision model built on VLA models, using architectures such as Transformers for semantic understanding and task planning;
◎ Motion control that relies on high-precision motors, lead screws, and reducers, combined with closed-loop control algorithms, to achieve sub-millimeter motion execution;
◎ Real-time feedback through a highly reliable RTOS or ROS 2 framework that keeps control latency at the millisecond level.
From an engineering implementation perspective, the industry is currently in a phase of technological advancement (2025-2035), focused on breaking through coupling bottlenecks among subsystems and evolving towards highly integrated, low-power, low-latency, and high-redundancy fault-tolerant structures.
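To make the "perception-comprehension-execution" loop concrete, here is a minimal sketch for a single simulated joint. Everything in it is an illustrative assumption rather than a real robot stack: the `JointState` type, the proportional gain `kp`, the idealized actuator, and the 1 ms loop period are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JointState:
    angle: float     # joint angle in radians
    velocity: float  # joint angular velocity in rad/s


def perceive(state: JointState) -> float:
    """Perception: read the joint angle (stand-in for encoder/IMU/vision fusion)."""
    return state.angle


def decide(measured: float, target: float, kp: float = 8.0) -> float:
    """Decision: proportional law mapping tracking error to a velocity command."""
    return kp * (target - measured)


def act(state: JointState, velocity_cmd: float, dt: float) -> JointState:
    """Execution: idealized actuator that tracks the commanded velocity exactly."""
    return JointState(angle=state.angle + velocity_cmd * dt, velocity=velocity_cmd)


def run_closed_loop(target: float, steps: int = 2000, dt: float = 0.001) -> JointState:
    """Run perceive -> decide -> act at a fixed period (assumed 1 ms here)."""
    state = JointState(angle=0.0, velocity=0.0)
    for _ in range(steps):
        measured = perceive(state)
        cmd = decide(measured, target)
        state = act(state, cmd, dt)
    return state


final = run_closed_loop(target=1.0)
print(f"final angle: {final.angle:.4f} rad")
```

In a real system each of the three functions would be a subsystem in its own right; the point of the sketch is only that the loop closes through measurement, so errors in execution are corrected on the next cycle.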
02
Cost Inflection Point: Transition from Experimental Platforms to Consumer Products
Today, the BOM (Bill of Materials) cost of a humanoid robot ranges from $50,000 to $200,000, driven mainly by flexible actuation systems and the integration of intelligent models. As the following trends mature, the industry is expected to reach an inflection point around 2035:
◎ VLA models move from custom training to transfer learning and edge deployment;
◎ The share of domestically produced parts rises above 80%;
◎ The service life of components such as motors, lead screws, and harmonic reducers passes critical thresholds (>20,000 hours);
◎ Main control chips shift from x86/GPU platforms to dedicated ARM+NPU SoCs.
This will enable humanoid robots to reach the TCO (Total Cost of Ownership) inflection point in commercial scenarios (manufacturing/logistics/customer service), entering a period of mass deployment (2035-2045), followed by household penetration after 2045.
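The TCO argument can be illustrated with a rough back-of-the-envelope calculation. Only the $50,000 to $200,000 BOM range and the 20,000-hour lifespan come from the text; the utilization, maintenance rate, energy cost, relative productivity, and the labor rate used for comparison are hypothetical placeholders, and the purchase price is assumed equal to the BOM cost.

```python
def cost_per_human_equivalent_hour(
    purchase_price: float,
    service_hours: float = 20_000,   # lifespan threshold quoted in the text
    hours_per_year: float = 5_000,   # ASSUMPTION: annual operating hours
    maintenance_rate: float = 0.10,  # ASSUMPTION: yearly maintenance as a share of price
    energy_per_hour: float = 0.10,   # ASSUMPTION: electricity cost in USD per hour
    productivity: float = 0.3,       # ASSUMPTION: robot output relative to one worker
) -> float:
    """Robot cost in USD per hour of human-equivalent work."""
    years = service_hours / hours_per_year
    lifetime_cost = purchase_price * (1 + maintenance_rate * years)
    robot_hour = lifetime_cost / service_hours + energy_per_hour
    return robot_hour / productivity


LABOR_COST_PER_HOUR = 25.0  # ASSUMPTION: fully loaded labor cost in USD

for price in (200_000, 50_000):  # BOM range from the text, used as purchase price
    cost = cost_per_human_equivalent_hour(price)
    verdict = "below" if cost < LABOR_COST_PER_HOUR else "above"
    print(f"${price:,}: ${cost:.2f}/h, {verdict} the assumed labor cost")
```

Under these placeholder numbers the high end of the BOM range sits above the assumed labor cost and the low end sits below it, which is the kind of crossover the inflection point refers to. Different assumptions would move the crossover considerably.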
02
Value Chain Deconstruction:
Who Controls the 'Brain', and Who Occupies the 'Manufacturing High Ground'?
Morgan Stanley divides the humanoid robot industry into three parts: the 'brain' model, the 'body' hardware, and the integrated control system. Each layer involves engineering trade-offs among cost, performance, and versatility.
01
Robot Model: The Evolution of Generalization from VLM to VLA
Today's VLA (Vision-Language-Action) models borrow architectural ideas from large language models (LLMs) but impose more complex engineering requirements. These models must integrate 3D visual input, linguistic semantic understanding, and multi-objective path planning to support intelligent behavior in real environments.
Data collection heavily relies on "real physical interaction," with costs far exceeding traditional text or image data acquisition. Crucially, the model must adapt to hardware feedback mechanisms to achieve "perception-decision-action" closed-loop control within 100 milliseconds, demanding high real-time performance and system coordination.
From an engineering standpoint, multiple challenges remain:
◎ A lack of standardized physical interaction datasets (e.g., grasping, walking, object classification) limits model generalization;
◎ Algorithm optimization must consider edge deployment, adapting to hardware platforms with limited computing power (e.g., edge NPUs or embedded GPUs);
◎ As tasks evolve from static (e.g., handling, patrolling) to dynamic interactions (e.g., dialogue, collaboration), there's an urgent need to build a system architecture supporting real-time reasoning and multi-threaded perception.
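The 100-millisecond bound on the "perception-decision-action" loop can be treated as a latency budget split across stages. The sketch below sums per-stage latencies and reports the slack and the slowest stage. The stage names and timings are hypothetical values chosen for illustration, not measurements from any real system.

```python
BUDGET_MS = 100.0  # closed-loop bound quoted in the text


def budget_report(stage_ms: dict[str, float], budget_ms: float = BUDGET_MS):
    """Return total latency, remaining slack, and the slowest stage."""
    total = sum(stage_ms.values())
    slack = budget_ms - total
    bottleneck = max(stage_ms, key=stage_ms.get)
    return total, slack, bottleneck


# ASSUMPTION: illustrative per-stage latencies in milliseconds.
stages = {
    "perception": 25.0,
    "vla_inference": 55.0,
    "planning": 10.0,
    "actuation": 8.0,
}

total, slack, bottleneck = budget_report(stages)
status = "within" if slack >= 0 else "over"
print(f"total {total:.0f} ms, {status} budget, slack {slack:.0f} ms")
print(f"slowest stage: {bottleneck}")
```

A breakdown like this makes it clear why edge deployment matters: if model inference dominates the budget, a slower embedded processor can push the whole loop past the bound even when every other stage is fast.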
Chinese companies such as UBTech, Fourier Intelligence, and Xiaomi have begun training their own VLA models and optimizing edge inference on domestic chips from Cambricon, Horizon Robotics, Huawei Ascend, and others, building an initial technology base. Compared with leading international efforts such as NVIDIA's GR00T and Google's Gemini Robotics, however, a generational gap remains. The core problems are the slow construction of unified cross-modal datasets and immature engineering optimization toolchains; both will be key breakthrough directions in the next stage.
02
Integrated Manufacturing System: Soft-Hard Synergy is the Core Barrier
Robots are not merely a "pile of hardware"; true competitiveness lies in system-level deep integration and engineering capabilities. Leading robot companies are building engineering moats through multi-dimensional strategies encompassing mechanism design, motion control, power supply and thermal management, redundancy, and safety mechanisms.
◎ In mechanism design, lightweighting and rigidity matching are crucial, with increasing use of high-modulus composites and carbon fiber structures;
◎ Motion control trends towards whole-body control, requiring IMUs, sole sensors, and vision systems for precise state estimation;
◎ Power supply and thermal management emphasize high-power density and low-heat design to avoid motion instability due to temperature rise;
◎ Redundancy and safety mechanisms address abnormal joint locking, environmental perception obstacle avoidance, and semantic-level permission control, ensuring operational stability and safety.
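As one small illustration of the state estimation mentioned under motion control, the sketch below fuses gyroscope and accelerometer readings with a complementary filter to estimate body pitch. This is a common textbook technique chosen for illustration, not a method attributed to any company named here. The axis convention, the blend factor `alpha`, and the simulated lean angle and gyro bias are all assumptions.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def complementary_filter(pitch_prev, gyro_rate, accel_x, accel_z, dt, alpha=0.98):
    """Blend gyro integration (smooth, drifts) with accelerometer tilt (drift-free)."""
    pitch_gyro = pitch_prev + gyro_rate * dt
    # ASSUMPTION: axis convention in which pitch = atan2(accel_x, accel_z) at rest.
    pitch_accel = math.atan2(accel_x, accel_z)
    return alpha * pitch_gyro + (1.0 - alpha) * pitch_accel


TRUE_PITCH = 0.10  # rad, robot standing still at a constant lean (hypothetical)
GYRO_BIAS = 0.01   # rad/s, constant gyro bias (hypothetical)
DT = 0.01          # s, sample period (hypothetical)

fused = TRUE_PITCH
gyro_only = TRUE_PITCH
for _ in range(1000):  # 10 simulated seconds
    gyro_rate = 0.0 + GYRO_BIAS            # true rate is zero; sensor reports bias
    accel_x = G * math.sin(TRUE_PITCH)     # gravity components seen by the IMU
    accel_z = G * math.cos(TRUE_PITCH)
    fused = complementary_filter(fused, gyro_rate, accel_x, accel_z, DT)
    gyro_only += gyro_rate * DT            # naive integration accumulates the bias

print(f"fused error:     {abs(fused - TRUE_PITCH):.4f} rad")
print(f"gyro-only error: {abs(gyro_only - TRUE_PITCH):.4f} rad")
```

The gyro-only estimate drifts steadily because the bias is integrated, while the fused estimate stays bounded. A whole-body controller would use a far richer estimator that also draws on sole sensors and vision, but the same idea of combining sensors with complementary error characteristics applies.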
Leading international players such as Tesla (Optimus), Apptronik, and Agility Robotics commonly adopt modular design to make their systems easier to iterate on and adapt to new scenarios.
In contrast, Chinese manufacturers must address their longstanding weakness in motion control systems, focusing on high-bandwidth, interference-resistant, fast-response closed-loop control. This will enable real-time coordination between software and hardware and breakthroughs from the hardware level up to the system level.
03
Core Components: China's 'Counterattack Battlefield'
Among the core components of a humanoid robot, reducers, lead screw motors, and force/torque sensors have the greatest influence on overall machine performance.
◎ Reducers: Harmonic reducers excel in precision and compactness but have limited lifespans. Planetary roller reducers offer longer lifespans and higher rigidity, making them suitable for heavy-load scenarios. Newer elastic transmission solutions balance light weight and integration and are becoming cost-effective alternatives.
◎ Lead screw motors: The trend is towards hollow structural designs, which ease wiring and heat dissipation. Better electromagnetic interference suppression improves system stability and supports tighter integration of the electric drive system.
◎ Force/torque sensors: Most adopt a six-axis strain gauge structure and need a resolution below 1 N to sense fine manipulation.
Morgan Stanley estimates that a robot built on China's supply chain has a BOM cost of approximately $46,000, about one-third of that of European and American solutions (around $130,000). Chinese manufacturers like Top, Hengli, Leadshine, and Sunward are leading the industrial chain upgrade through "engineering cost reduction + precision enhancement."
To accelerate commercialization, the engineering path can be advanced on three fronts:
◎ First, optimize mechanical design and power consumption matching using system simulation platforms like Simulink and Gazebo;
◎ Second, promote integrated reducer solutions that co-package reducers, motors, and sensor modules to improve integration and reliability;
◎ Finally, introduce self-alignment error compensation algorithms to address motion precision challenges in multi-joint redundant systems, thereby improving robot performance and mass production feasibility.
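As a quick check on the Morgan Stanley cost comparison quoted in this section, the arithmetic can be written out directly. The two BOM figures come from the text; the variable names are illustrative.

```python
CHINA_BOM_USD = 46_000    # BOM estimate using China's supply chain (from the text)
WESTERN_BOM_USD = 130_000 # BOM estimate for European and American solutions (from the text)

ratio = CHINA_BOM_USD / WESTERN_BOM_USD
savings = WESTERN_BOM_USD - CHINA_BOM_USD

print(f"cost ratio: {ratio:.1%}")
print(f"savings per unit: ${savings:,}")
```

The ratio comes out at roughly 35%, consistent with the "about one-third" description, and the gap is $84,000 per unit.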
From Engineering Breakthroughs to Industrial Transitions, What Role Should China Play?
The core of the humanoid robot industry isn't "whether a robot can be built" but "whether reliable, safe, and economical humanoid robots can be mass-produced with an industrialization logic."
This requires state-enterprise coordination in three dimensions:
◎ Clarify technological route choices: Make strategic decisions between power systems (hydraulic vs. electric), control frameworks (centralized vs. distributed), and operating systems (general-purpose vs. proprietary);
◎ Data and simulation-driven R&D systems: Build cross-enterprise "physical interaction data platforms" and "high-fidelity simulation libraries" to support VLA model and control strategy training;
◎ Industrial chain coordination: Promote standardization and modular design of components, accelerate the formation of an industrial ecology akin to a "humanoid robot general platform," lower entry barriers, and accelerate scenario implementation.
In summary, humanoid robots represent not just a consumer electronics iteration but a profound reconstruction of industrial and social infrastructure. In the future, whoever controls the trinity of capabilities—"brain algorithms + body manufacturing + system control"—will dominate the next generation of general artificial intelligence. For China, this is not just a technological battle but also a test of engineering system capabilities. In this "physical AI" revolution, which may span 25 years or longer, Chinese engineers will emerge as the true decisive force.