The Ideal and Reality of Humanoid Robots

07/11 2024

Kai-Fu Lee (Li Kaifu) once mentioned the "con artists are back" curve that circulates in the AI community. People constantly judge machines by whether they possess human intelligence. The cycle always starts with amazement at AI's stunning performance in certain fields, then gives way to a growing awareness of its limitations at the time, and ends in a sharp gap between expectation and reality.

Recently, humanoid embodied intelligent robots made a concentrated appearance at the World Artificial Intelligence Conference (WAIC). On site, the mood was a complex mix in which "humans are doomed" coexisted with "con artists are back."

Specifically, those who believe that "humans are doomed" are mostly ordinary spectators who feel impressed but do not fully understand, while those who remain calm or even skeptical about humanoid robots are mostly insiders in the AI and robotics fields.

For example, Fu Sheng, Chairman and CEO of Cheetah Mobile and Chairman of OrionStar, stated, "Robots have erupted in this year's exhibition halls, but in daily life, we haven't seen them being used on a large scale anywhere. The industrial explosion in the robotics industry is still far from coming... Time will prove that skepticism towards humanoid robots is justified."

Which of these two mindsets truly represents the truth of the humanoid robot industry?

In fact, there is no absolute truth. Different mindsets arise from different criteria for judgment. The general public, practitioners, and technical experts each have their own "scoring sheet for the humanoid robot in my heart," with varying evaluation scales.

On three criteria (humanoid form, large models, and embodiment), there are significant gaps between what the public expects, what media publicity promises, and what the industry has actually achieved. These gaps make up the current ideal and reality of humanoid robots.

The Ideal and Reality of "Transformers"

"Why aren't they moving? What's the point of plugging them in if they're not performing?"

"It looks cool when they're plugged in and lit up."

The most eye-catching attraction at this year's WAIC was the "Eighteen Guardians" in the central exhibition hall. With 18 humanoid robots standing side by side, almost every visitor gathered in front of the booth to take photos. I overheard the conversation above right next to the booth.

In the ideal of the general public, humanoid robots are like Transformers or mecha warriors, walking steadily and quickly, moving flexibly, and performing tasks effortlessly, whether it's working in factories, caring for the elderly, or delivering packages.

However, in reality, the humanoid robots at WAIC spent most of their time on the exhibition stands, performing simple hand movements like holding apples or carrying cups at specific times. Tesla's robot even remained motionless inside a glass display case. Compared to the roaming robot dogs, humanoid robots appeared much more "introverted."

This reveals that whether "bipedal walking" is necessary has become the biggest gap in perception between the public and practitioners when it comes to humanoid robots.

In general, bipedal humanoid robots are the "crown of robotics" and the ultimate direction, representing the "Transformers" that the public expects.

However, getting there takes at least three steps: bipedal walking, executing complex tasks, and large-scale commercialization.

Currently, even the first step, "bipedal walking," is not yet mature, either technologically or commercially.

On the one hand, stable walking requires the system to have extremely high robustness.

When facing various abnormal situations and inputs, robots must still quickly adjust their postures through the motion control module to maintain normal operation.

Improving system robustness depends on accumulating high-quality data from real interactions between robots and both humans and the physical world. If the robot encounters a problem that did not appear in the training environment, it may behave abnormally or "crash," which slows system development.

On the other hand, "bipedal walking" is not a commercial necessity.

For example, both Tesla and Figure have announced plans to deploy humanoid robots in factories to perform tasks such as battery sorting. However, in reality, over 80% of manufacturing operations rarely involve the lower limbs, relying mainly on the upper limbs, particularly the hands. A simple system focused on the upper limbs reduces control difficulty, lowers investment costs, and is easier to mass-produce, as only the most critical function (the hands) needs to be replicated. Adding lower limbs and a torso significantly increases control difficulty, energy consumption, and costs.

Therefore, the robot forms that can scale today are mostly single-purpose, minimalist ones like robot dogs and robotic arms. The "Transformers" that meet public expectations will only bring significant industrial effects after overcoming many hurdles.

For the next several years, we should expect to see humanoid robots more often on exhibition stands and in display cases than in close interaction with people.

The Ideal and Reality of Large Models

"There are so many robots here, I feel like the theme has gone off-track," a practitioner in the computing field said to me.

The proliferation of intelligent robots at the AI conference is fundamentally due to large models opening new doors to embodied intelligence solutions.

Traditional AI systems, limited by a lack of prior knowledge, struggle with comprehension and generalization abilities, making it difficult for robots to possess basic common sense judgment abilities like humans. This severely restricts the development of advanced embodied intelligence. When robots perform tasks, human engineers often need to break down complex instructions into a series of simplified, programmatic steps, which the robots (such as robotic arms) then execute one by one. Clearly, this level of "intelligence" is not very high, and humans still need to do a lot of coding and development work.
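As a rough illustration of this hand-written approach, the Python sketch below hard-codes every step for one instruction. The `Step` type, the action names, and `fetch_apple_program` are all hypothetical and invented for illustration; they do not correspond to any real robot API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str  # hypothetical primitive name
    target: str  # hypothetical object or location label


def fetch_apple_program() -> list[Step]:
    """Hand-written plan: an engineer enumerates every step in advance."""
    return [
        Step("move_to", "fridge"),
        Step("open", "fridge_door"),
        Step("grasp", "apple"),
        Step("close", "fridge_door"),
        Step("move_to", "user"),
        Step("release", "apple"),
    ]


def execute(program: list[Step]) -> list[str]:
    """Stand-in executor: walks the steps in order and returns a log."""
    log = []
    for i, step in enumerate(program, start=1):
        log.append(f"{i}. {step.action}({step.target})")
    return log


if __name__ == "__main__":
    for line in execute(fetch_apple_program()):
        print(line)
```

Any new instruction, or even a different kitchen layout, would require an engineer to write another such program by hand.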

Ideally, large models would bring revolutionary changes to the "intelligence level" of humanoid robots.

Compared to traditional machine learning methods, large models possess stronger generalization abilities, providing new solutions for a wide range of tasks for humanoid robots, such as complex task analysis, smooth and continuous dialogue, and zero-shot reasoning.

For example, on being told "I'm hungry," a humanoid robot would automatically analyze the underlying need behind the statement and break it down into specific executable actions, such as observing the physical environment and retrieving an apple from the fridge for you, without a human having to break down the instruction.
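The idealized flow can be sketched in Python as follows, under loud assumptions: `stub_model` is a canned stand-in for a large model (it performs no real inference), and the `SKILLS` set and step format are hypothetical. The sketch only shows the shape of the pipeline: an utterance plus a scene description go in, and a validated list of executable steps comes out.

```python
from typing import Callable

# Hypothetical primitives that the robot's controller is assumed to support.
SKILLS = {"look_around", "move_to", "open", "grasp", "close", "hand_over"}


def stub_model(utterance: str, scene: list[str]) -> list[str]:
    """Stand-in for a large model: returns a canned plan, not real inference."""
    if "hungry" in utterance.lower() and "fridge" in scene:
        return [
            "look_around",
            "move_to fridge",
            "open fridge",
            "grasp apple",
            "close fridge",
            "move_to user",
            "hand_over apple",
        ]
    return []


def plan(
    utterance: str,
    scene: list[str],
    model: Callable[[str, list[str]], list[str]] = stub_model,
) -> list[tuple[str, str]]:
    """Turn the model's text plan into (skill, argument) pairs, rejecting unknown skills."""
    steps = []
    for line in model(utterance, scene):
        skill, _, arg = line.partition(" ")
        if skill not in SKILLS:
            raise ValueError(f"model proposed unsupported skill: {skill!r}")
        steps.append((skill, arg))
    return steps


if __name__ == "__main__":
    for skill, arg in plan("I'm hungry", ["fridge", "table"]):
        print(skill, arg)
```

Validating each proposed step against a fixed skill set is one possible design choice here: it keeps the language side from requesting actions the hardware cannot perform.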

However, in reality, the changes brought by large models to humanoid robots still remain at the primary level of "natural language interaction."

Currently, most humanoid robots have gained little more than a ChatGPT-like "mouth." While this integration can provide a more natural and vivid interactive experience, it merely upgrades existing voice interaction rather than achieving a disruptive breakthrough in "end-to-end" task execution.

Why haven't large models quickly delivered highly automated operation without human intervention?

Fundamentally, robotics is a highly complex discipline involving precision machinery, automatic control, electrical engineering, and computer science, and its end product is a highly complex intelligent mechatronic system.

The shift from supervised machine learning to large language models is a technological breakthrough in computing, one that can contribute to interaction, planning, decision-making, and other aspects. However, for humanoid robots to evolve further from mechanization to high automation, they also need technical and resource support from perception technology, drive and transmission technology, and 10G networks.

The Ideal and Reality of the Rise of Domestic Robots

"American companies are responsible for hyping up concepts, while Chinese companies are responsible for bringing robots to the ground, commercializing them, driving down prices, and making robot freedom accessible to everyone."

At this year's WAIC, domestic humanoid robots performed significantly better than those from overseas companies. Both Tesla's and Google's robot displays were quite dull. In contrast, domestic humanoid robots not only appeared in large numbers and on a large scale but also demonstrated commercial capabilities in many specific scenarios, such as cooking robots, telecommunications robots, and household companion robots.

So, does this mean that domestic humanoid robot manufacturers will rise quickly?

While we certainly hope this day will come soon, the reality is still uncertain at present.

In terms of data, technology giants like Tesla and Google have accumulated years of experience in the autonomous driving field, feeding sufficient spatial data to their models to solve learning problems for humanoid robots in complex spaces, thereby facilitating better iterative learning. However, at WAIC, we saw that most domestic humanoid robot manufacturers have relatively isolated business areas. AI companies with extensive data accumulation, such as Baidu and SenseTime, focus more on intelligent robots in automotive forms. This means that solving the data problem for humanoid robots relies on ecosystem-based, industrialized, and collaborative solutions.

At the algorithmic level, domestic multi-modal large models with GPT-4-like capabilities are still relatively scarce, which significantly limits humanoid robots' ability to recognize maps and complex scenes from visual, audio, and other multi-dimensional data. Overseas academia and industry are already working systematically on multi-modal large models. For example, OpenAI built an embodied AI model for Figure 01 based on GPT-4, Google launched PaLM-E, a multi-modal embodied vision-language model, and the University of California, Berkeley, introduced LM-Nav. Together, these efforts are gradually integrating the hardware body, the motor "cerebellum," and the decision-making "brain." Domestic foundation models still have some catching up to do.

Developing the domestic humanoid robot industry is a difficult but correct path. On this path, we want neither "con artists are back" nor "humans are doomed." History tells us that technological development inevitably goes through five stages: rise, peak, trough, climb, and stabilization.

To avoid falling into a trough and sustain development, the humanoid robot industry must continually correct its coordinates between ideal and reality, delivering practical value at every stage of development.
