01/26 2025
Building a humanoid robot may very well be the hottest trend in 2025.
Since the fourth quarter of last year, with the deepening application of large models, the concept of embodied intelligence has gained significant traction, marked by a notable shift of many autonomous driving veterans into this burgeoning field.
A few influential examples: Dr. Yu Yinan, a founding member and former vice president of listed autonomous driving chipmaker Horizon Robotics, where he also served as president of the software platform product line, teamed up with Zhao Zhelun, former director of intelligent driving products at Li Auto, to establish Vitality Dynamics, a startup specializing in embodied intelligence.
Li Zhenyu, senior vice president of Baidu Group and assistant to the CEO, and former head of Baidu's Intelligent Driving Group (IDG), joined Tashi Zhihang, another startup in the embodied intelligence sector.
Liu Fang, former technical leader and mass production lead of Xiaomi's autonomous driving products, founded Amio Robotics, a company dedicated to embodied intelligence...
With one hand in building cars and the other in creating humanoids, the entrepreneurial journey of autonomous driving veterans knows no bounds.
01 Automakers Also Want a Slice of the Pie
In fact, beyond the few former autonomous driving practitioners mentioned above who have recently ventured into embodied intelligence, many heavyweights in the autonomous driving industry have long been involved in this field.
Zhang Li, former COO of WeRide, left the company in June 2023 and, less than half a year later, joined humanoid robot company ZhuJi Power as co-founder and COO, overseeing its global strategic planning, channel expansion, project implementation, marketing, and government relations.
Gao Jiyang, who held key technical positions at Waymo and Momenta, founded StarSeer AI, an embodied intelligence company, in May 2023, and serves as CEO. Shortly after its establishment, the company received tens of millions of dollars in investment from top investment institutions including IDG Capital and BV Baidu Ventures.
In April 2023, Guo Yandong, who served as chief scientist at XPeng Motors and OPPO, founded Shenzhen AI Square Technology Co., Ltd., serving as founder and CEO. The company's self-developed multimodal large model, AI2R Brain, enables Alpha Bot robots to achieve high-precision perception and natural interaction in diverse scenarios.
Additionally, there are other autonomous driving elites who have transitioned to the embodied intelligence sector.
Sun Zhaozhi: Executive at Luobo Intelligence, former head of product design for XPeng Robotics, and product leader in Didi Chuxing's car manufacturing project.
Wang Fan: Former CTO of Zongmu Technology, later joined Beidou Smart Union Technology Co., Ltd. as CTO.
Wang Rongxing: Vice President of Product Engineering and Research and Development at Momenta. After leaving, he ventured into large models and founded Wanren AI in June 2023.
Chen Junbo: Head of unmanned vehicle algorithms at Cainiao Network's "E.T. Logistics Lab". After leaving Alibaba, he co-founded "Youlu Robotics" with former Alibaba Robotics CEO Gu Zulin and others, entering the outdoor cleaning robot field.
Xiao Jun: Former Vice President of JD.com Group and President of JD Logistics X Business Unit. He chose to venture into the field of warehouse robots, founding Tianxiaxian Zhichuang Robotics, which received 200 million yuan in investment from Zero2IPO Ventures, Matrix Partners China, and Redpoint China.
An interesting phenomenon is that as autonomous driving veterans flock to embodied intelligence for entrepreneurial ventures, automakers are also stepping up their game, collectively deploying embodied intelligence.
On December 26, 2024, GAC Group released its self-developed third-generation embodied intelligent humanoid robot, GoMate, marking a significant step in automakers' push into embodied intelligence.
GoMate is a full-size wheeled-legged humanoid robot with 38 degrees of freedom. Dr. Zhang Aimin, head of GAC Group's robot R&D team, said that GAC plans to reach mass production of self-developed components by 2025, first demonstrating and deploying complete machines on the production lines and in the industrial parks of GAC Motor, AION, and other OEM plants. By 2026, small-batch production of complete machines is to be realized, gradually expanding to large-scale mass production.
Earlier, during the 2024 Li Auto AI Talk, Li Xiang, CEO of Li Auto, was asked whether the company would build humanoid robots and replied that the probability is definitely 100%. Li Xiang also emphasized that Li Auto is an artificial intelligence company focused not on making cars intelligent, but on giving artificial intelligence the form of a car, aiming to bring AI into every household.
At XPeng Motors' "AI Technology Day" on November 6, XPeng Motors also unveiled a new generation of humanoid robot, "Iron Allen". This humanoid robot has entered the production training of the XPeng P7+ model at XPeng Motors' Guangzhou factory and will focus on scenarios such as factories and offline stores in the future.
In addition to direct participation, some automakers are deploying embodied intelligence through partnerships or investments.
Chery Automobile has jointly developed the humanoid robot Mornine with AI company Aimoga, entering the humanoid robot field. SAIC Venture Capital participated in the A3 round of strategic financing for humanoid robot hardware maker Zhiyuan Robotics, bringing SAIC Group into the humanoid robot field through investment.
02 The Same Technological Core
The seamless integration of car manufacturing and humanoid robot creation largely stems from the homology of their technology, which is based on perceiving and interacting with the environment, calculating external information, and guiding machine movement. Essentially, both require intelligent brains, with only the physical manifestations differing.
Andrej Karpathy, former director of AI at Tesla who led its Autopilot team, also stated in a recent interview that cars are robots: transferring technology from cars to the humanoid robot Optimus involves minimal work, with the tools readily available. It is simply a matter of reconfiguring the system from a car to a robot; essentially, it is the same.
Although the technologies are homologous, in practical applications, autonomous driving and embodied intelligence face different types and levels of data processing requirements and computing power.
In terms of computing scale, autonomous driving involves more high-speed scenarios and is closely related to safety, requiring high-computing-power chips to support real-time computation. In contrast, intelligent robots, at this stage, usually operate in relatively low-speed environments, with correspondingly lower computing demands.
Huang Chang, co-founder and CTO of Horizon Robotics, once explained the basic computing power requirements in the autonomous driving industry: L2 level requires about 10 TOPS; L2+ requires tens of TOPS; L3 requires more than 100 TOPS; L4 requires 1000 TOPS, and to fully meet L4 requirements, even thousands of TOPS are needed.
For large robot models, however, consider a single NVIDIA A100, which can run inference on a 7B-parameter model: its peak performance reaches 1,248 TOPS (INT8, with sparsity). Establishing a cloud-edge-end collaborative computing environment is therefore crucial for ensuring the perception, computing, and processing capabilities of various robots.
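To put the figures above side by side, here is an illustrative sketch. The level thresholds and the A100 peak are the estimates quoted above; the "L2+" value is a midpoint assumed purely for illustration.

```python
# Side-by-side view of the compute figures quoted above (article's estimates;
# the "L2+" value is an assumed midpoint of "tens of TOPS").
AUTONOMY_TOPS = {
    "L2": 10,     # ~10 TOPS
    "L2+": 50,    # "tens of TOPS" (assumed midpoint)
    "L3": 100,    # ">100 TOPS"
    "L4": 1000,   # 1000 TOPS; several thousand to fully meet L4
}
A100_PEAK_TOPS = 1248  # single NVIDIA A100 running a 7B model, per the article

for level, tops in AUTONOMY_TOPS.items():
    ratio = A100_PEAK_TOPS / tops
    print(f"{level:4s} needs ~{tops:5d} TOPS -> one A100 is {ratio:.1f}x that")

# The 1,248-TOPS figure for 7B-model inference already exceeds the baseline
# L4 driving requirement, which is the argument for cloud-edge-end
# collaboration rather than on-board compute alone.
```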
In terms of motion control, autonomous driving only needs to control motion in a 2D plane, whereas for humanoid robots even the grasping motion of the hands alone must mimic more than 20 biological joints, posing a far higher challenge for computation and control.
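The gap can be made concrete with a toy count of the quantities being controlled. The joint count is the one quoted in this article for Tesla's latest dexterous hand; the two-command car model and the per-joint state breakdown are assumptions for illustration only.

```python
# Toy comparison of control-space sizes (illustrative assumptions).
# A car is steered in the plane: steering angle + longitudinal acceleration.
CAR_CONTROL_DIMS = 2

# A dexterous hand alone mimics 20+ biological joints; Tesla's latest hand
# has 22 degrees of freedom, per the article.
HAND_JOINTS = 22
STATE_PER_JOINT = 2  # position + velocity, a typical minimal state per joint

hand_state_dims = HAND_JOINTS * STATE_PER_JOINT
print(f"car control dims: {CAR_CONTROL_DIMS}")   # 2
print(f"hand state dims:  {hand_state_dims}")    # 44
```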
Furthermore, the data faced by autonomous driving is relatively "standardized", mostly involving lane lines, traffic lights, and other content operating within certain rules. Robots, on the other hand, work in complex and varied scenarios, leading to a corresponding increase and diversification in data types, which also puts certain requirements on computing power.
Take hotel service robots, which are already deployed at scale, as an example. Engineers have enhanced the robots' map-building capabilities, applied VSLAM for machine recognition, marked no-entry lines, and strengthened obstacle-avoidance algorithms and sensors, so that robots can avoid conflicts with pedestrians as much as possible and recognize obstacles on the ground more reliably.
Even so, at least three types of environmental change in hotel scenarios can still disrupt robot operation.
In the physical environment, changes to the floor caused by carpet cleaning, conference equipment, or decoration placement may cause recognition failures and lead the robot to misjudge its path.
In the social environment, the robot may clash with guests over priority when boarding a shared elevator.
In the digital environment, the hotel's networking setup may cause the robot to lose communication in elevators or around corners.
Musk once said that Tesla's existing FSD technology can, to some extent, reuse algorithms with robot modules in the future, but it is still only in the "underlying modules". "Higher-level computing and scenarios still require special development for the robot platform."
Li Xiang also said, "If we can't even solve L4 autonomous driving in cars, how can we solve more complex problems? A car is a contactless robot operating on standardized roads, with standardized signs and participants who are all trained in traffic rules. I think this is already the simplest kind of robot. If we can't realize it in cars, other artificial intelligence robots will remain very limited."
03 Reducing the Price
In the competition over embodied intelligence, commercialization is a challenge shared by all entrants. At Tesla's annual shareholders' meeting last year, Musk addressed humanoid robot startups for the first time: "Prototypes are easy, but mass production is difficult, even impossible." The reasons lie not only in software algorithms and hardware sensors but also in production and engineering capability. Put bluntly, the finished product has to carry a price that users will accept.
Let's first consider software algorithms.
From the perspective of underlying algorithm models, the software of embodied intelligence can be divided into the brain and the cerebellum.
The brain is responsible for perceiving and simulating human thinking and decision-making processes, while the cerebellum mimics the complex motor control of organisms.
In brain algorithms, environmental perception and understanding, as in autonomous driving, are primarily based on machine vision, which is relatively mature. Currently, many embodied intelligence startups rely on the multimodal large models of technology giants.
For example, Figure AI's Figure 02 and 1X Technologies' EVE and NEO are all connected to OpenAI's end-to-end vision-language model (VLM).
Leju Robotics' "Kuafu" is connected to Huawei's Pangu embodied intelligence large model.
UBTech's Walker S is connected to Baidu's ERNIE Bot large model.
Xingdong Jiyuan's "Xiaoxing" series is connected to Alibaba's Tongyi Qianwen and Zhipu AI's Qingyan large models.
Zhiyuan Robotics pairs its self-developed multimodal general-purpose model with iFLYTEK's Spark large model, and so on.
These large models from technology giants perform well in basic capabilities and each has its own strengths. In this regard, startups generally have similar starting points with little difference.
However, in terms of the cerebellum algorithm for motion control, each company displays different technical paths.
The most typical case is the development of control algorithms for humanoid robots, which has passed through several stages: model-based control (LIPM+ZMP), dynamic-model and optimal control (MPC+WBC), and imitation learning plus reinforcement learning (IL+RL).
The current mainstream is the MPC+WBC solution; IL+RL is widely regarded as the control method of the future, but after numerous attempts by startups it has hit a technical bottleneck that will be difficult to break through in the short term. This is also the main reason many humanoid robot hardware companies have launched wheeled bionic robots instead of bipedal humanoid robots.
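The earliest, model-based stage can be illustrated concretely. Under the linear inverted pendulum model (LIPM), the zero-moment point (ZMP) along one axis is p = x − (z_c/g)·ẍ, where x is the center-of-mass position, ẍ its acceleration, and z_c its constant height; the robot stays balanced while p remains inside the foot's support region. A minimal sketch with illustrative numbers, not any company's implementation:

```python
# Minimal ZMP check under the linear inverted pendulum model (LIPM):
# p = x - (z_c / g) * x_ddot, with the center of mass at constant height z_c.
# The robot does not tip while p stays inside the support polygon (the foot).
G = 9.81  # gravitational acceleration, m/s^2

def zmp(x_com: float, x_com_acc: float, z_c: float) -> float:
    """Zero-moment point along one horizontal axis, in meters."""
    return x_com - (z_c / G) * x_com_acc

def is_stable(x_com, x_com_acc, z_c, foot_min, foot_max):
    """True if the ZMP lies within the foot's support interval."""
    p = zmp(x_com, x_com_acc, z_c)
    return foot_min <= p <= foot_max

# CoM directly over the foot center with no acceleration -> ZMP at the CoM.
print(zmp(0.0, 0.0, 0.8))   # 0.0
# Decelerating hard pushes the ZMP forward of the CoM.
print(zmp(0.0, -2.0, 0.8))  # positive value
```

Controllers in this stage plan CoM trajectories so the resulting ZMP never leaves the support polygon; MPC+WBC generalizes the idea with full dynamics, and IL+RL learns the behavior instead of deriving it.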
The hardware aspect mainly includes chips, sensors, and "hands" and "feet" for executing motion control operations.
Currently, computing chips account for only a small fraction of a humanoid robot's total cost. Hu Chunxu, head of the developer ecosystem at D-Robotics (Digua Robotics), a chip and solution provider spun off from Horizon Robotics, told media that the share of chip costs does not vary much across robot types, roughly 7% to 10%. For a humanoid robot at an average price of 500,000 yuan, the cost of the chip layer does not exceed 10,000 yuan. As the costs of other components such as motors fall, however, the chip share is expected to rise.
In contrast, the "hands" for motion control and execution are much more expensive.
Since the hands need to perform actions such as grasping, placing, pushing, and pulling, the requirements for operation precision are also high, and the degrees of freedom for upper limb operations naturally increase accordingly.
Industry professionals have said that, excluding the hands, a humanoid robot may have around 27 degrees of freedom. At the end of November last year, however, Tesla demonstrated its Optimus humanoid robot catching and placing a tennis ball with fluid movements; the dexterous hand it used has 22 degrees of freedom, 11 more than the previous generation.
Amid the industry's consensus on prioritizing high degrees of freedom, developing dexterous robotic hands is akin to building an entirely new robot. This is a significant factor contributing to the persistent high price of such "hands."
Xi Yue, co-founder of Xingdong Jiyuan, once introduced their products by stating that "hands constitute approximately 1/5 to 1/4 of the total cost of a humanoid robot." This is mainly due to the steep price of current tactile sensors, which can exceed the cost of an entire "hand" without tactile sensors. A professional noted, "The tactile sensors used on a single hand may cost several thousand yuan, and mass application will only become feasible when the cost of these sensors represents about 10% of the total cost of the hand."
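Working through the cost figures above makes the threshold concrete. All inputs are the estimates quoted in this article; the 500,000-yuan price is the average mentioned earlier, used here only for illustration.

```python
# Back-of-the-envelope cost shares from the figures quoted above
# (all inputs are the article's estimates, used illustratively).
robot_price = 500_000  # yuan, average humanoid robot price per the article

# Hands: roughly 1/5 to 1/4 of the whole robot's cost.
hand_cost_low = robot_price / 5    # 100,000 yuan
hand_cost_high = robot_price / 4   # 125,000 yuan

# Mass application is said to become feasible once tactile sensors fall to
# about 10% of the hand's total cost.
sensor_target_low = 0.10 * hand_cost_low
sensor_target_high = 0.10 * hand_cost_high

print(hand_cost_low, hand_cost_high)          # 100000.0 125000.0
print(sensor_target_low, sensor_target_high)  # 10000.0 12500.0
```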
From the above analysis, it is evident that the cost trajectory of robotic technology mirrors that of driverless cars. Initially priced in the hundreds of thousands or even millions, the cost of Robotaxi has gradually decreased to around 200,000 yuan, with unmanned delivery vehicles costing less than 100,000 yuan. This rapid decrease underscores the acceleration of autonomous driving's popularization and penetration. For startups focused on embodied intelligence, reducing the price of robots will be crucial for gaining a competitive edge in the future market.