Embodied AI is booming, but large-scale implementation will take time

09/24/2024

Author | Chen Wen

Source | Insightful Research Institute

"The next wave of artificial intelligence is embodied AI, which can understand, reason, and interact with the physical world," NVIDIA CEO Jen-Hsun Huang predicted at the ITF World 2023 Semiconductor Conference last year. That prediction is becoming a reality.

In May, humanoid robots dominated the International Conference on Robotics and Automation (ICRA 2024) held in Yokohama, Japan.

In July, the "Eighteen Arhats" lineup of humanoid robots exhibited at the World Artificial Intelligence Conference (WAIC 2024) in Shanghai was the star of the show.

In August, at the World Robot Conference, which concluded in Beijing, humanoid robots were undeniably the center of attention. Officials stated that this was the largest gathering of humanoid robots to date, with over half of the attendees congregating at the booths of humanoid robot companies.

Clearly, humanoid robots, the most important physical manifestation of embodied AI, are following large language models to the forefront of the AI stage.

The robots on display were versatile, capable of writing, doing laundry, and handling a wide range of household chores. They could also practice Wing Chun kung fu, serve as boxing training partners, and even function as personal bodyguards.

The scene was lively, but looking deeper, how close are embodied AI and humanoid robots to standing on their own two feet?

01 Large Language Models Drive the Popularity of Robots

Before delving into our discussion, let's clarify what embodied AI is.

The term embodied AI (Embodied Artificial Intelligence, EAI) has two key components: the "embodiment" and the "agent." Its defining characteristics are perception, decision-making, a physical entity, and interaction with the environment. Put simply, embodied AI can perceive and understand its surroundings and execute specific tasks in the physical environment.
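The perceive, decide, and act cycle described above can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions: the one-dimensional "world," the `Observation` type, the function names, and the 1.0 m safety distance are all hypothetical and do not describe any real robot's software.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    obstacle_distance_m: float  # hypothetical sensor reading


def perceive(world: dict) -> Observation:
    """Perception: turn raw world state into an observation."""
    return Observation(world["obstacle_at_m"] - world["robot_at_m"])


def decide(obs: Observation, safe_distance_m: float = 1.0) -> str:
    """Decision-making: choose an action from the observation."""
    if obs.obstacle_distance_m > safe_distance_m:
        return "move_forward"
    return "stop"


def act(world: dict, action: str, step_m: float = 0.5) -> None:
    """Environmental interaction: the action changes the physical state."""
    if action == "move_forward":
        world["robot_at_m"] += step_m


def run_agent(world: dict, max_steps: int = 20) -> list[str]:
    """Repeat perceive -> decide -> act until the agent decides to stop."""
    actions = []
    for _ in range(max_steps):
        action = decide(perceive(world))
        actions.append(action)
        if action == "stop":
            break
        act(world, action)
    return actions


# Toy scenario (assumed values): robot at 0 m, obstacle at 3 m.
world = {"robot_at_m": 0.0, "obstacle_at_m": 3.0}
actions = run_agent(world)
print(actions, world["robot_at_m"])
```

In this toy run the agent advances until the obstacle is no more than the safety distance away and then stops. A real embodied system replaces each function with far richer perception, planning, and motor control.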

Over 60 years ago, Omron, a pioneer in automation technology, proposed that "machines should handle tasks that machines can do, freeing humans to engage in creative activities." However, due to immature technology, this vision remained elusive.

Since 2022, embodied AI has entered a new phase of development, fueled by the rise of large language models.

On August 2, startup Figure AI unveiled Figure 02, a humanoid robot supported by an AI model developed in collaboration with OpenAI, which enhances its real-time conversational capabilities and commonsense reasoning.

Clearly, large language models provide the technical foundation for humanoid robots to achieve superior perception, decision-making, and interaction abilities, opening up immense possibilities for both the robot's "brain" (perception and decision-making) and its "cerebellum" (motor control).

Furthermore, humanoid robot hardware technology has made significant strides. Tesla's Optimus Gen2, released at the end of 2023, features self-developed actuators and hinged foot connections with force sensors, enabling a 30% increase in walking speed and improved balance over Gen1.

Crucially, Tesla's launch of Optimus propelled the company's share price higher for 11 consecutive trading days, boosting Elon Musk's net worth by approximately $67 billion. This further validated the market's outlook on the commercialization of humanoid robots, refocusing the tech community's attention on embodied AI.

This is evident in three main areas.

Firstly, governments worldwide are guiding policy development. China's short-term policy goals focus on achieving technological breakthroughs in core components, while long-term goals emphasize industrial application and ecosystem development. Overseas policies prioritize breakthroughs in cutting-edge technologies and the implementation of key scenarios.

[Figure: Summary of Key Policies on Chinese Humanoid Robots. Source: various government websites, CICC Research Department]

Secondly, a diverse range of players is entering the field, intensifying competition. Established robot companies such as UBTECH, Boston Dynamics, and Fourier Intelligence compete with emerging startups such as Zhiyuan Robotics, Galaxy Robotics, Xingdongjiyuan, and Zhuji Power. They are joined by two additional groups: technology giants such as iFLYTEK, Baidu, Tencent, and Google, which leverage their algorithmic advantages in perception and cognition; and cross-industry players such as XPeng, Xiaomi, Dreame, and Tesla, which often have well-defined application scenarios and shared industrial chains.

Thirdly, capital is showing significant interest in the embodied AI sector, driven by the pursuit of returns on investment. In 2023, nine domestic humanoid robot companies raised over RMB 1.9 billion in cumulative funding. In the first half of this year, 13 such companies raised over RMB 2.5 billion. Galaxy Robotics, founded just a year ago, secured over RMB 700 million in Series A funding in June at a valuation of several billion RMB, making it the largest Series A round of the year. Deals tagged "Advanced Manufacturing - Robotics" numbered 135 funding rounds as of early August this year.
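To put the funding totals in perspective, the quick calculation below derives a rough average per company. It is only an illustrative sketch: the article reports the totals as "over" the stated amounts, so the results are lower bounds, and the function and variable names are assumptions introduced here.

```python
def average_per_company(total_rmb: float, companies: int) -> float:
    """Average funding per company, given a total and a company count."""
    return total_rmb / companies


# Figures quoted in the article; each total is reported as "over" this amount.
funding = {
    "2023 (full year)": (1.9e9, 9),
    "2024 (first half)": (2.5e9, 13),
}

averages = {
    period: average_per_company(total, count)
    for period, (total, count) in funding.items()
}

for period, avg in averages.items():
    print(f"{period}: at least RMB {avg / 1e6:.0f} million per company")

# Ratio of first-half 2024 funding to full-year 2023 funding.
growth_ratio = funding["2024 (first half)"][0] / funding["2023 (full year)"][0]
print(f"H1 2024 total is about {growth_ratio:.2f}x the 2023 total")
```

On these numbers, the first half of 2024 already exceeds all of 2023 in total funding, while the average per company is slightly lower because the money is spread across more companies.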

02 The Hands and Feet of Robots: Still Undecided

Despite the buzz in the industry, humanoid robots face numerous challenges in research and development.

Foremost among these is that technical routes have yet to converge. Wang Tianmiao, Honorary Dean of the Beihang University Robotics Institute and Dean of the Zhongguancun Zhiyou Research Institute, stated at the 2024 World Robot Conference that humanoid robots face two primary difficulties. On the software side, general-purpose and vertical, domain-specific large language models suited to robots are still in their early stages. On the hardware side, dexterous hands pose both technical and cost challenges.

"Software" here refers to the ability to break a complex task down into many subtasks that fit together seamlessly in the physical world, which requires large language models to support human-robot interaction. In other words, robots must be able to generalize: a household service robot, for example, should autonomously prioritize and complete tasks such as cleaning, cooking, and organizing without explicit commands.

The industry generally agrees that there are no hardware barriers for humanoid robots. While there are significant differences in mechanical performance such as mobility and load-bearing capacity among different manufacturers, these gaps are not insurmountable and will eventually be bridged by time and cost. Ultimately, the generalization capabilities based on software determine a humanoid robot's performance, making it adaptable to various task scenarios and truly "usable."

Beyond generalization, there is also no consensus on the physical form and end-effector choices of humanoid robots, i.e., their hands and feet. The industry is divided on whether robots should have bipedal or non-bipedal mobility. While bipedal robots align with the embodied AI concept, they lag behind wheeled robots in practicality, stability, and development costs under current technological conditions. Each approach has its advocates: some argue for the long-term significance of bipedal algorithms, while others prioritize the applicability and scalability of non-bipedal solutions.

The choices for robot hands are even more diverse. Some companies opt for full-fledged five-fingered hands, while others start with two- or three-finger grippers. The diversity in technical routes stems from manufacturers' desire to define the technology before a unified standard emerges.

More challenging than the choice of hands and feet is data collection. Improving robots' software generalization requires training data, which often has to be captured from human behavior; because people must take part, collection is slow and labor-intensive. Zhiyuan Robotics, for example, plans to establish a sampling factory with around 100 robots and 150 workers by the end of September, aiming for each worker to produce 1,000 data points per day. However, the feasibility of this data collection model remains to be seen.
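The scale of such a sampling factory can be estimated with simple arithmetic. The sketch below uses the worker count and per-worker target quoted above; the one-million-sample target is purely hypothetical, chosen only to show how collection time scales, and the function names are assumptions.

```python
import math

WORKERS = 150                 # planned headcount quoted in the article
SAMPLES_PER_WORKER_DAY = 1_000  # per-worker daily target quoted in the article


def daily_throughput(workers: int, per_worker: int) -> int:
    """Total data points collected per day if every worker hits the target."""
    return workers * per_worker


def days_to_reach(target_samples: int, per_day: int) -> int:
    """Whole days needed to collect at least `target_samples` data points."""
    return math.ceil(target_samples / per_day)


per_day = daily_throughput(WORKERS, SAMPLES_PER_WORKER_DAY)
print(f"Daily throughput: {per_day:,} data points")

# Hypothetical dataset size, not a figure from the article.
hypothetical_target = 1_000_000
print(f"Days to reach {hypothetical_target:,}: "
      f"{days_to_reach(hypothetical_target, per_day)}")
```

If every worker hit the target, the factory would produce 150,000 data points a day. Whether that volume, and the quality of each sample, is enough for training is exactly the open feasibility question.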

03 How Do Humanoid Robots Make Money?

Notably, many humanoid robot companies have entered small-scale mass production, with some pricing their products below RMB 100,000. For instance, Unitree Robotics' G1 humanoid robot sparked heated debate with its RMB 99,000 price tag upon launch in May. At the 2024 World Robot Conference, Unitree announced the mass production version of G1, designed for large-scale manufacturing.

Zhiyuan Robotics plans to commence mass production of its bipedal humanoid robot in October, aiming for monthly output of 100 units and annual shipments of around 200. It also expects to ship around 100 wheeled robots. EX Robotics CEO Li Boyang revealed that the company has achieved profitability through mass production and aims to produce around 500 units this year, with shipments increasing further next year. Tesla has also announced plans to produce small batches of humanoid robots next year, deploying over 1,000 units in its factories to assist with work.

While there are many positive developments, humanoid robots are still far from commercial viability. Wang Tianmiao notes that robots at current prices, whether RMB 150,000, 100,000, or lower, serve primarily as research platforms, akin to the drive-by-wire chassis in the autonomous driving industry. Humanoid robots are largely consumed within the industry itself, with peers purchasing them for research and development.

Industry insiders identify three types of opportunities for embodied AI, mirroring those in autonomous driving. Firstly, early entrants can position themselves for the long term by developing local capabilities for humanoid robots, similar to L4 autonomous driving. Secondly, there are opportunities in specific scenarios, akin to autonomous driving in mines, enclosed parks, and street cleaning; the industry is still exploring these, but humanoid robots have significant potential here. Lastly, there are opportunities upstream and downstream in the industrial chain: sometimes selling shovels is more profitable than mining for gold. Upstream opportunities include smart computing centers, compute chips, and edge models, while downstream opportunities encompass sensors, joint modules, and other components analogous to radars and smart cockpits in autonomous vehicles.

With these considerations, the development path for the humanoid robot industry is clear. Opinions vary on humanoid robots' share of the smart robot market: some optimists believe it could exceed 60%, while others peg it at around 30%, citing the versatility of other robot types such as arm, wheeled, and tracked robots. Ultimately, the specific form humanoid robots take will depend on application scenarios, customer needs, and customers' willingness to pay for service costs and product features. Technological innovation and development will determine their success.
