05/06 2026

As China's '15th Five-Year Plan' sets sail, global embodied AI, in the form of 'physical entities + intelligence', is diving into the deep waters of large-scale commercial applications. From industrial settings to commercial services, and from healthcare and wellness to home services, it has become a core engine driving the development of new quality productive forces.
However, the four key elements of embodied AI (ontology, data, models, and applications) are tightly coupled and interdependent, the industry's technological roadmap has yet to converge, and significant disagreement remains within each element.
On April 28, 2026, the 3rd China Embodied AI and Humanoid Robot Industry Conference opened grandly at the Zhongguancun National Independent Innovation Demonstration Zone Exhibition and Trading Center. With the theme of 'Competing in the Trillion-Dollar Embodied Humanoid Track · Reshaping the New Era of Future Industries,' it brought together over a thousand top experts from government, industry, research, academia, and finance, injecting new ideas into the industry's development.
From Models to Data: Breaking Through the Bottleneck of Physical Intuition
2026 marks the first year of commercial deployment for embodied AI. As it moves from laboratory demos to real-world scenarios, a core paradox has become increasingly prominent: models possess 'vision' but lack 'intuition,' and this gap is emerging as the biggest obstacle to the industry's transition from technical validation to large-scale commercialization.
Faced with the complexity of the physical world, the old approach of simply stacking parameters and data is showing signs of fatigue. At the conference, industry players such as Qianxun Intelligence, Dawn & Dusk Technology, CAS Fifth Era, Raybot Technology, and Beijing Humanoid are breaking through via model architecture, attempting to 'squeeze' the value of every piece of information under conditions of data scarcity.
CAS Fifth Era introduced the human brain's spatial cognition mechanism, upgrading the model's internal 'one-dimensional semantic vectors' to 'three-dimensional heatmaps.' 'This transformation aims to preserve key structural information such as an object's volume and position, solving the 'spatial blindness' issue caused by traditional VLA models compressing spatial information,' explained a young chief scientist from CAS Fifth Era and a researcher at the Institute of Automation, Chinese Academy of Sciences. The core logic is to enable the model to possess spatial reasoning abilities like humans through architectural design, rather than relying solely on massive data fitting.
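To make the idea concrete, here is a minimal, purely illustrative Python sketch of the difference between a 1-D semantic vector and a 3-D heatmap representation: rendering an object's position and volume into a voxel grid preserves spatial structure that a flat embedding discards. This is not CAS Fifth Era's actual architecture; the function name, grid sizes, and Gaussian falloff are all assumptions for illustration.

```python
import numpy as np

def object_heatmap(center, size, grid=(16, 16, 16), extent=1.0, sigma=0.1):
    """Render one object's position and volume as a 3D Gaussian heatmap.

    Unlike a 1-D semantic vector, the voxel grid keeps *where* the
    object is and *how much space* it occupies.
    """
    axes = [np.linspace(-extent, extent, g) for g in grid]
    zz, yy, xx = np.meshgrid(*axes, indexing="ij")
    # Distance from each voxel to the object's surface (axis-aligned box).
    d = np.stack([np.maximum(np.abs(c - p) - s / 2, 0.0)
                  for c, p, s in zip([zz, yy, xx], center, size)])
    dist = np.sqrt((d ** 2).sum(axis=0))
    return np.exp(-(dist ** 2) / (2 * sigma ** 2))

# A 0.4-unit cube at the origin lights up a whole blob of voxels,
# preserving its volume; a point target would light up far fewer.
hm = object_heatmap(center=(0.0, 0.0, 0.0), size=(0.4, 0.4, 0.4))
print(hm.shape)  # (16, 16, 16)
print((hm > 0.5).sum())
```

A downstream policy reading this grid can reason about free space and collisions directly, which is the kind of structural information a compressed semantic vector loses.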
Dawn & Dusk Technology stepped out of the trap of pixel-level generation, turning to representations centered on 'entity targets' and proposing the GCWM target-causal large model. The co-founder & CTO of Dawn & Dusk Technology explained that, through multi-worldline causal search, the model no longer predicts merely the next frame but the changes in target states caused by different action sequences. He stated directly: 'This shift from pixel streams to causal chains is a crucial step in endowing models with physical intuition.'
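The underlying idea can be sketched in a few lines of Python: instead of rendering pixels, roll candidate action sequences ('worldlines') through a dynamics model over target states and keep the sequence whose predicted end state is closest to the goal. The toy dynamics, action set, and search procedure below are assumptions for illustration, not the GCWM implementation.

```python
import itertools
import numpy as np

def rollout(state, actions, step=0.1):
    """Toy dynamics: each action nudges the target's (x, y) state."""
    moves = {"left": (-step, 0), "right": (step, 0),
             "up": (0, step), "down": (0, -step)}
    s = np.array(state, dtype=float)
    for a in actions:
        s += moves[a]
    return s

def best_action_sequence(state, goal, horizon=3):
    """Search candidate 'worldlines': compare predicted *target states*,
    not rendered pixels, and keep the sequence closest to the goal."""
    candidates = itertools.product(["left", "right", "up", "down"],
                                   repeat=horizon)
    return min(candidates,
               key=lambda seq: np.linalg.norm(rollout(state, seq) - goal))

plan = best_action_sequence(state=(0.0, 0.0), goal=(0.2, 0.1))
print(plan)  # ('right', 'right', 'up')
```

Because the search compares compact target states rather than generated frames, evaluating many counterfactual action sequences stays cheap, which is the practical appeal of a causal-chain representation.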
Sun Rongyi, Director and Vice President of Qianxun Intelligence, shared practical experience in verifying the Scaling Law curve for embodied models: with 1 million hours of pre-training data as a foundation, the fine-tuning data requirement for complex tasks was compressed from 'months of collection' to '20-30 hours.' This demonstrates the 'magnifying glass' effect of high-quality base models on data, but it presupposes a solid underlying data infrastructure.
Moreover, the commercialization progress of embodied AI robots depends on the evolution of model architecture, while the model's engineering capabilities are ultimately limited by data quality. It can be said that the thickness of the data infrastructure directly determines the depth of commercialization.
Currently, many deployed cases remain stuck in low-value scenarios like simple grasping and palletizing. These scenarios can often be replaced at low cost by traditional automation or human labor, making the ROI (Return on Investment) untenable. The real commercial value lies in complex, flexible long-duration tasks (such as industrial assembly and home care), which precisely require the most scarce high-quality physical data support.
The industry generally faces the dilemma of 'abundant 2D data but scarce physical data.' During the conference, Daimeng Robotics, MetaVision, Tashan Technology, Lingi Technology, National-Local Co-construction, and Raybot Technology all presented their views based on their commercialization practices.
Chen Pu, CTO of Lingi Light · MetaVision, pointed out that the current mainstream training data is mostly 2D video after frame extraction. However, robots, as 3D entities, trained solely on 2D data struggle to understand depth information and mechanical interactions, leading to frequent errors in fine operations like grasping and pushing. It's like only teaching robots to 'recognize pictures by looking' but not teaching them 'tactile sense.'
In scenarios like industrial quality inspection and commercial services, touch is key to ensuring success rates and safety. Models without tactile data are highly prone to failure ('rolling over,' in industry parlance) in real physical interactions.
Zhang Dong, Partner and Chief Commercial Officer of Daimeng Robotics, emphasized that the lack of tactile feedback prevents robots from perceiving the risk of 'crushing an egg.' Daimeng Robotics' '3D Strategy—Hardware Generates Data, Data Feeds Models, Models Optimize Hardware,' centered on touch, aims to fill this critical gap. Among them, the recently released Daimon-Infinity, the world's largest full-modal physical world dataset containing tactile information, enables models to learn 'priorities and urgencies.'
There's still a long physical journey ahead to go from 'having vision' to 'having intuition.' The ultimate goal of embodied AI is not just a model competition but a dual-wheel drive of 'data infrastructure + model architecture': more efficiently acquiring high-quality physical interaction data (visual + tactile + mechanical) and using cognitively enhanced model architectures to transform it into true physical intuition.
The Commercialization Tipping Point of 'Hands' and 'Nerves'
Currently, as embodied AI moves from laboratories to factories and homes, dexterous hands and force sensors are no longer just nice-to-have accessories but core bottlenecks determining whether robots can 'work' and 'interact safely.'
From the technological releases and deployment practices of leading companies like Lingxin Dexterous Hands, InTime Robotics, BrainCo, StarEra, Yuequan Bionics, CAS Silicon Era, Tashan Technology, and Chaowei Sensing, breakthroughs in the core modules across three dimensions—structural bionics, tactile perception, and cost control—are pushing embodied AI toward a commercialization tipping point.
From the conference floor, it was clear that the dexterous hand industry has long faced trade-offs among performance, stability, and cost, essentially a tension between 'AI training friendliness' and 'industrial-grade reliability.'
BrainCo, drawing on its accumulation in brain-computer interfaces and neuroprosthetics, emphasizes the closed control loop behind human grip-force adjustment: 'feedforward + feedback' (visual anticipation plus tactile correction). Zhang Zhige, General Manager of BrainCo's Embodied AI Systems Department, introduced the company's Revo3 U21 series, which features 21 degrees of freedom, fully direct-drive actuation, and full-palm tactile integration; its back-drivable structural design can effectively buffer external impacts.
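The 'feedforward + feedback' loop described above can be sketched as a toy Python controller: a visually estimated weight sets the initial grip force (feedforward), and each tactile slip reading bumps the force upward (feedback). The function name, friction coefficient, margins, and gains are all hypothetical values chosen for illustration; they do not describe BrainCo's controller.

```python
def grip_force(weight_estimate, slip_signals, mu=0.5, margin=1.2, gain=0.8):
    """Feedforward + feedback grip control, as a toy loop.

    Feedforward: pick an initial force from the visually estimated
    weight (enough friction to hold it, plus a safety margin).
    Feedback: each tactile slip reading raises the grip force.
    """
    # Feedforward term: force needed so friction balances gravity.
    force = margin * weight_estimate * 9.81 / mu
    history = [force]
    for slip in slip_signals:
        # Feedback term: raise grip in proportion to detected slip.
        force += gain * slip
        history.append(force)
    return history

# An underestimated weight makes the object slip at first;
# tactile feedback then corrects the grip over a few readings.
trace = grip_force(weight_estimate=0.1, slip_signals=[2.0, 1.0, 0.0])
print([round(f, 2) for f in trace])
```

The feedforward term gets the hand close to the right force before contact, so the feedback term only has to make small corrections, which is why the combination is faster and gentler than feedback alone.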
She also noted that this 'human-like' structural design aims to let large models map human hand movements more directly, shortening training cycles and effectively serving the early training of embodied large models as well as high-end research needs.
StarEra, starting from the data foundation of AI training, focuses on reliability and impact resistance, emphasizing the key role of dual encoders (one at the output end, one at the motor end) in eliminating backlash and improving control precision. Wang Letian, Vice President of Products at StarEra, noted that the XHAND1 series achieves a linear force-control relationship through a fully direct-drive joint solution, allowing reinforcement learning algorithms to converge more efficiently; it reaches 'pencil-lead insertion'-level precision and has enabled rapid repeat orders and large-scale deployment in high-demand scenarios like logistics and assembly.
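Why a linear force-control relationship helps learning can be illustrated with a small Python comparison: in a direct-drive joint, output force is roughly proportional to motor current, while a geared joint with backlash has a dead zone around zero where small commands produce no force at all. The torque constants, gear ratio, and dead-band width below are hypothetical, not XHAND1 specifications.

```python
def direct_drive_force(current, kt=0.8):
    """Direct drive: output force is roughly linear in motor current."""
    return kt * current

def geared_force(current, kt=0.8, ratio=10.0, deadband=0.3):
    """Geared joint with backlash: a dead zone around zero current,
    so small commands are swallowed before any force appears."""
    if abs(current) <= deadband:
        return 0.0
    sign = 1.0 if current > 0 else -1.0
    return kt * ratio * (current - sign * deadband)

# The linear map gives a learning algorithm a smooth, invertible
# command-to-force relationship; the dead zone does not.
print(direct_drive_force(0.2))  # ~0.16
print(geared_force(0.2))        # 0.0 (swallowed by backlash)
```

A reinforcement learning policy exploring near zero force gets consistent gradient-like feedback from the linear map, whereas the dead zone makes fine-force regions effectively unobservable, which slows convergence.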
Notably, StarEra has already secured a 50 million yuan single order in logistics scenarios, proving the value of dexterous hands in real industrial tasks like handling and assembly.
As the only company in China to achieve full-stack self-research of commercial five-fingered dexterous hands, Lingxin Dexterous Hands has significantly reduced unit costs through modular design and large-scale production. The co-founder & CTO of Lingxin Dexterous Hands revealed that the company currently has a monthly production capacity of over 4,000 units, plans to exceed 10,000 per month, and will bring prices down to consumer-grade levels through innovations like plastic joint modules, targeting future C-end service robots.
This 'volume-for-price' strategy demonstrates the driving force of large-scale production in pushing costs down, a necessary path for dexterous hands to evolve from research devices to universal components.
Additionally, if dexterous hands are the 'hands' of robots, then force sensors are their 'nerve endings.' Only when sensors can perceive micronewton-level force changes in real-time can robots truly achieve 'soft contact' rather than 'rigid collisions.' During the conference, representatives from companies like Ubot Robotics, Xinghui Sensing, and Kunwei Intelligence shared breakthrough paths for sensing in embodied AI robots.
Among them, Shen Xinxing, co-founder of Xinghui Sensing, pointed out that traditional industrial sensors struggle to meet the demands of humanoid robots for lightweight design, anti-interference, and large measurement ranges. The company's ultra-thin six-axis force sensors and joint torque sensors, relying on the quality control capabilities of the automotive supply chain, are attempting to resolve the contradiction between high prices and customized needs. This also confirms the survival logic of component companies balancing 'price-for-volume' and 'technology-for-success.'
As Wang Letian said in his speech, 'A dexterous hand that can train models well is a good hand.' The data generation capabilities of hardware directly determine the evolution speed of AI models. The evolution of dexterous hands and force sensors is no longer an isolated hardware upgrade but the physical foundation for the co-evolution of the embodied AI 'brain' (AI models) and 'cerebellum' (motion control).
When robots possess both dexterous and reliable 'hands' and sensitive 'tactile nerves,' the tipping point for their entry into factories, homes, and even exponential growth is no longer far away.
From 'Point Breakthroughs' to 'Flywheel-Driven' in Embodied AI
Moreover, as embodied AI moves from laboratories to industrialization, it faces a core paradox: technological roadmaps have not converged, but commercialization is imminent.
At the conference, the practices of three companies and institutions (BAAI, Leju Robotics, and Songyan Dynamics) outlined a breakthrough path from 'ontology foundations' to 'open-source ecosystems' to 'commercial closed loops': rather than betting on a single technological roadmap, they are building an iterative flywheel that accommodates diverse explorations.
What does the 'body' of embodied AI look like? Currently, it's still a 'non-consensus' stage. As Yao Guocai, Head of Embodied Infra & Data at BAAI and Associate Researcher at the National Key Laboratory of Multimedia Information Processing, Peking University, said at the conference, 'Embodied AI is currently between 'dawn and morning,' with technological roadmaps far from converged.' This divergence is vividly reflected in the practices of many companies.
Leju Robotics (Industrial Pragmatists): The usage cost of robots must be lower than human labor costs—this is a hard metric for industrial deployment. Leju Robotics adheres to a full-sized humanoid route, focusing on non-standard scenarios like industrial depalletizing and handling. Its value lies in verifying the robustness of bipedal robots in real production lines, providing an ontology model for 'human-robot collaboration.' It also addresses recruitment difficulties in assembly lines, having jointly built a production line with Dongfang Precision that can produce one robot every 30 minutes.
Songyan Dynamics (Consumer Lightweights): Taking the opposite approach, it launched 'Xiaobumi,' a 'consumer-grade companion' priced at 10,000 yuan and standing less than 1 meter tall. Through self-developed core components and lightweight design, it downgrades robots from 'professional equipment' to 'consumer partners,' targeting family companionship and education. Through a 10,000 yuan price point and emotional design (e.g., one-click return and safety features), it addresses the 'usability' and 'desirability' issues of C-end products.
BAAI (Universal Base Platforms): Instead of aligning with a specific ontology, it builds an Infra platform supporting multi-ontology access. This 'de-ontologized' approach aims to retain flexibility for future form evolutions. Its commercialization lies not in directly selling robots but in empowering ontology manufacturers through a one-stop platform. This 'water-selling' model offers higher risk resistance during periods of technological uncertainty.
Additionally, on the 'high-quality data' front, there was a consensus on leveraging open-source ecosystems to solve the data flywheel dilemma. BAAI released a multi-ontology dataset with 4.66 million downloads and innovatively proposed a 'trajectory quality evaluation language.' Its value lies in establishing a unified 'measurement standard' for fragmented embodied data, reducing industry trial-and-error costs.
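The article does not spell out how BAAI's 'trajectory quality evaluation language' works, but the idea of a unified measurement standard for fragmented demonstration data can be sketched in Python: score every trajectory against the same simple rules (success, smoothness, no stalled frames) so datasets from different ontologies become comparable. The criteria, weights, and thresholds below are hypothetical stand-ins.

```python
import numpy as np

def trajectory_quality(positions, success, max_jerk=5.0, dt=0.1):
    """Score a demonstration trajectory with simple, unified rules.

    Hypothetical criteria standing in for a shared evaluation standard:
    - failed demonstrations score 0;
    - jerky motion (large third derivative) is penalized;
    - stalled trajectories (duplicate frames) are penalized.
    """
    p = np.asarray(positions, dtype=float)
    if not success or len(p) < 4:
        return 0.0
    jerk = np.diff(p, n=3, axis=0) / dt ** 3
    jerk_score = max(0.0, 1.0 - np.abs(jerk).max() / max_jerk)
    steps = np.linalg.norm(np.diff(p, axis=0), axis=-1)
    motion_score = (steps > 1e-6).mean()  # fraction of non-stalled steps
    return round(0.5 * jerk_score + 0.5 * motion_score, 3)

smooth = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0], [0.4, 0.0]]
print(trajectory_quality(smooth, success=True))   # constant velocity: 1.0
print(trajectory_quality(smooth, success=False))  # failed demo: 0.0
```

Once every contributed trajectory carries a score computed the same way, downstream teams can filter or weight training data without re-inspecting each source, which is what makes a shared metric cheaper than per-dataset curation.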
Leju Robotics deployed nine domestic training grounds, producing 25 million real-machine data points annually, and open-sourced the OpenLettuce dataset. It transforms real industrial scenarios into reusable data assets, driving algorithm iteration in the real physical world.
Open-sourcing is not charity but industrial-grade 'crowdsourced R&D.' Only by breaking down data silos can models evolve from 'watching videos' to 'understanding physics.'
Finally, from the speeches of multiple ontology companies at the conference, it's clear that industry values ROI, consumers value experience, and research values upper limits. Future ontology competition will focus more on large-scale commercialization. There are no absolute rights or wrongs in ontology design, only scenario suitability. Industry needs 'strongmen,' while homes need 'butlers.' Diversification is an inevitable feature of the industry's early stages, and the commercialization of embodied AI must be 'scenario-driven,' not 'technology-driven.'
Whale's Commentary
The 'singularity moment' for embodied AI robots is closer than you think. However, without breakthroughs in physical common-sense reasoning and safe generalization capabilities, the industry will linger in B-end project-based stages, failing to ignite true large-scale markets.
The future winners will not be the companies that create the 'perfect robot' earliest but those that can fastest build a closed-loop flywheel of 'data-model-scenario.' The ultimate destiny of embodied AI will belong to long-termists who understand both technological iteration and commercial essence.
*Editor's Disclaimer: Original content is hard-earned; please respect the author. For reprints, please contact us.