We Forgot Again: Humanoid Robots ≠ AI

04/11 2025

"Have you seen 'Ex Machina'? Is that deceptive Ava really AI?"

My friend's sudden question gave me pause. In the movie, Ava's every glance and line sends shivers down the spine. She is so human-like that viewers take it for granted that she is the ultimate form of AI.

But in reality, Ava and what we now call AI are two entirely different things. Her abilities belong to a concept called embodied intelligence.

Embodied intelligence is not a more advanced form of AI but a completely different technological path: it does not rely on the scaling laws of large language models. Instead, it interacts with the environment through a physical body, learning about the world like an infant.

However, thanks to the popularity of science fiction films, it has been mistaken for AI for many years.

Today, let's set aside stereotypes and chat about this:

Why is embodied intelligence different from AI? And how far are we from a real "Ava"?

The image of robots in films and TV series like 'Ex Machina' has subtly reinforced the perception that "embodied intelligence = AI".

But in reality, this is a misunderstanding.

Embodied intelligence is not the same as AI.

To understand what embodied intelligence is, we must first distinguish between embodied and disembodied concepts.

Embodied refers to existence or cognition that can only be realized through the interaction between a physical body and its environment; it emphasizes the foundational role of bodily experience, as in humans. Disembodied refers to existence or cognition that does not depend on a physical carrier; it emphasizes the autonomy of abstract forms, as in software and algorithms.

Traditional AI, such as ChatGPT, is disembodied intelligence. It can carry out abstract, symbolic reasoning without any physical entity and can run on any terminal. Large language models learn a network of associations between words from massive amounts of text and use it to map an input to its most probable output. However, intelligence developed in such a virtual environment lacks perception of the physical world. It struggles to understand the mechanical control required to pick up a water cup or how to avoid an obstacle that suddenly appears.
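To make the idea of "vocabulary correlations" and "probability mappings" concrete, here is a minimal sketch of a toy bigram model in Python. It is far simpler than a real large language model and is not how ChatGPT is built; the corpus and the function names `train_bigram` and `next_word_probs` are invented purely for illustration. It only shows the basic principle: count which words follow which, then turn the counts into probabilities.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each word follows another in a tiny toy corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, word):
    """Turn raw follow-counts into a probability distribution."""
    followers = counts[word]
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}

# Hypothetical toy corpus, invented for illustration only.
corpus = [
    "the robot picks up the cup",
    "the robot drops the cup",
    "the robot picks up the apple",
]
model = train_bigram(corpus)
probs = next_word_probs(model, "robot")

# The model "knows" that "picks" usually follows "robot",
# but it has no notion of what picking up a cup physically involves.
most_likely = max(probs, key=probs.get)
```

The model can say which word is likely to come next, yet nothing in it represents weight, grip, or obstacles, which is the gap the article describes.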

Embodied intelligence emphasizes interaction with the physical world, anchoring the cognitive abilities of intelligent agents in specific bodies, akin to a "union of spirit and flesh". This body must possess clear boundaries and self-awareness: first, it must be unique and capable of self-movement and manipulation; second, it must be able to interact with the environment, accumulating experience and learning patterns from it. This embodied learning mechanism sets the evolution path of embodied intelligence apart from purely data-driven AI.

And humanoid robots, as representatives of embodied intelligence, are even further removed from AI.

First, humanoid robots must possess a physical body capable of interacting with the real world. This is not as simple as adding a shell but involves establishing a complete perception-action loop.
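A perception-action loop can be sketched in a few lines of Python. The example below is a deliberately simplified, one-dimensional illustration, not a real robot controller: the `world` dictionary, the proportional `gain`, and the function names are all assumptions made for this sketch. The point is the cycle itself: sense the environment, decide on a command, act, and then sense again, because the action has changed the world.

```python
def sense(world):
    """Perception: measure the signed distance from hand to target."""
    return world["target"] - world["hand"]

def decide(error, gain=0.5, max_step=0.2):
    """Decision: a proportional command, clipped to a maximum step size."""
    step = gain * error
    return max(-max_step, min(max_step, step))

def act(world, step):
    """Action: move the hand, which changes what is sensed next."""
    world["hand"] += step

def run_loop(world, tolerance=0.01, max_iters=100):
    """Repeat sense -> decide -> act until the hand is close to the target."""
    for i in range(max_iters):
        error = sense(world)
        if abs(error) < tolerance:
            return i  # number of actions that were needed
        act(world, decide(error))
    return max_iters

# Hypothetical 1-D world: hand starts at 0.0, target sits at 1.0.
world = {"hand": 0.0, "target": 1.0}
steps = run_loop(world)

# If the target moves, the same loop adapts without being reprogrammed.
world["target"] = 0.4
more_steps = run_loop(world)
```

Because every action feeds back into the next measurement, the loop copes with a moving target; a shell bolted onto a fixed script would not.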

Second, this body must be able to move. To truly bring a robot's body to life, three major challenges must be overcome: precise grasping control, such as picking up chopsticks or peas; dynamic balance, such as walking on uneven surfaces like hills and stairs; and multitask coordination, such as walking while carrying a tray.

Moreover, they need multimodal senses to interact with the world. Embodied intelligence requires building a richer sensory system than AI, not only able to see (computer vision) but also to hear (sound source localization), touch (force feedback), and even smell (chemical sensing).

Finally, they need a brain smarter than a large language model, because brains built on scaling laws struggle with causality. The brain of a humanoid robot should follow the world-model path: learning from interactions with the real world and moving beyond probabilities toward underlying patterns.

In summary, a humanoid robot is a complex intelligent agent integrating multiple cutting-edge technologies. But is such a complex intelligent agent really just one step away from us, as online marketing suggests?

Even after combing through the demos from the top humanoid robot companies, all we see is a large humanoid figure that dances and flips constantly yet sometimes cannot even hold an apple steadily. Battery life is usually under 5 hours, so it often runs out of power, and complex movements require manual remote control, much like a child's remote-controlled car. But a child's remote-controlled car costs only a few hundred yuan, while a robot costs hundreds of thousands.

It has to be said that this is a concept focused more on showmanship than on practicality.

After all, the humanoid robot we imagine is a perfect "lover" like the one in "Cyborg She": equally capable with the pen and the sword, as beautiful as Ayase Haruka, and occasionally showing a charming contrast.

The ideal is rosy, but reality is harsh. AI development is stuck in the rut of its established technical methods. We are enthusiastic about large language models and reinforcement learning but overlook the fact that they only teach AI to imitate, not to understand the real, complex physical world.

And this is precisely the hurdle humanoid robots must clear in order to develop.

Recently, Turing Award winner Yann LeCun argued at the Paris AI Summit that to pursue human-level AI we must tear down and rebuild, abandoning three things: large language models that only calculate probabilities, contrastive learning that resembles a jigsaw puzzle, and reinforcement learning that rewards and punishes AI like training a dog.

The brain of a humanoid robot needs technologies that can interact with the environment, such as world models and planning algorithms; the body of a humanoid robot requires biomimetic joints that support movement, sensor fusion systems that represent senses, motion control algorithms similar to the cerebellum, and high-energy-density batteries as its heart.

However, these technologies are stuck in research bottlenecks, similar to large language models before 2020.

One reason is that they are too difficult; the other is that they cost too much.

For example, humans' ability to use tools is built on a proprioceptive system formed through millions of years of evolution. When using a screwdriver, humans can adjust their grip in real time through touch and predict changes in torque during rotation. For a robot to perform the same operation, it must overcome three major challenges: force control accurate to about 0.1 newton, high-sensitivity tactile resolution, and coordinated motion planning that avoids self-collision.

Even the most advanced robots today struggle to pick up the same apple in different scenarios. Almost all demos released by technology companies show smooth tables without clutter or spacious, well-lit laboratory environments. If there's an extra pear nearby or the apple accidentally rolls onto the floor, they may be at a loss.

As for cost, the research and development costs of experimental humanoid robots such as Boston Dynamics' Atlas generally exceed US$2 million. Tesla's self-developed main control chip, the "brain", costs US$32,000 each; a dexterous hand costs US$12,000; the sensor system that serves as the senses totals US$110,000; and the linear joints that support movement cost approximately US$150,000 in total. Even if mass production is achieved, substantial funds will still be needed for later maintenance and repair, because a humanoid robot has around 30 joints that can stiffen or jam after repeated training sessions and wear and tear.
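The component figures quoted above can be added up with a short Python sketch. The numbers are the ones given in the text; treating them as a single bill of materials, and the option of counting two hands, are assumptions made here for illustration, since the text does not say how many of each part one robot carries.

```python
# Component figures quoted in the text (US$).
COMPONENT_COSTS_USD = {
    "main_control_chip": 32_000,
    "dexterous_hand": 12_000,   # per hand, as quoted
    "sensor_system": 110_000,
    "linear_joints": 150_000,
}

def bill_of_materials(costs, hands=1):
    """Sum the quoted parts; `hands` is an assumption, not a sourced figure."""
    return sum(costs.values()) + (hands - 1) * costs["dexterous_hand"]

def cost_shares(costs):
    """Share of each component in the one-of-each total."""
    total = sum(costs.values())
    return {name: value / total for name, value in costs.items()}

one_hand_total = bill_of_materials(COMPONENT_COSTS_USD)           # 304,000
two_hand_total = bill_of_materials(COMPONENT_COSTS_USD, hands=2)  # 316,000
shares = cost_shares(COMPONENT_COSTS_USD)
biggest = max(shares, key=shares.get)  # the linear joints dominate
```

On these figures the listed parts alone come to roughly US$300,000 before assembly, software, or maintenance, with the linear joints accounting for about half.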

It is not hard to see that humanoid robot development amounts to building humanoids for their own sake. From an industrial perspective, bipedal movement is unstable and far less efficient than a wheeled chassis combined with robotic arms; moreover, the technology is immature and there is no clear development path, so it is far from a lucrative business.

Since the cost-effectiveness is so low, where does the idea of humanoid robots come from, and why is it so popular?

Humanoid robots are a concept hyped up by capital and a spectacle for fundraising.

From January to October 2024, there were 69 financing events in the global humanoid robot industry, totaling over RMB 11 billion. Among them, 56 occurred in China, totaling over RMB 5 billion, with some leading companies raising over RMB 1 billion in a single round.

However, the current financing boom stems from capital speculation rather than technological maturity, and some companies have inflated valuations. To attract funding, companies continuously market impractical functions such as fighting and flipping on social media, leading to severe homogeneity, high prices, poor capabilities, and difficulties in implementation. Essentially, they are talking about the future without grounding it in technology.

As sobriety returns, the capital frenzy is beginning to subside.

Some companies have fallen. CloudMinds, once valued at US$3 billion, has been reported to have unpaid wages, layoffs, and a cash flow crisis after its funding chain broke. UBTech, known as the first humanoid robot stock, has lost over RMB 5 billion in five years, and its market value has evaporated by over HK$100 billion.

Some investors have pulled out. Zhu Xiaohu has been exiting humanoid robot companies in batches, including Xinghai Map and Songyan Power.

This is not an isolated case. This capital-hyped concept has been through at least three rounds of disillusionment in history.

In the 1970s, Waseda University's WABOT-1 stood on two feet for the first time, but it moved slowly and consumed enormous amounts of energy, and it was only ever displayed in laboratories.

In the 1990s, there was a service robot bubble. Honda's ASIMO, costing up to US$2 million, could only serve tea and water, and all related projects were terminated in 2018.

In the 2010s came the myth of social robots. SoftBank's Pepper was discontinued in 2023 because of its mechanical dialogue, high price, and high failure rate.

In the 2020s, the explosion of AI once again awakened the capital frenzy for humanoid robots. But this still cannot conceal the fact that Boston Dynamics has changed hands multiple times, passing from Google to SoftBank to Hyundai. Funding for its Atlas robot has also declined because of the gap between its showy videos and actual deployment.

Looking back at history, the development of humanoid robots has always been trapped in a cycle: it always begins with stunning laboratory demos attracting capital speculation, leading to valuation bubbles, and ultimately resulting in a collective withdrawal of funds due to commercialization failures.

Countless companies and investors have repeatedly played out the story from hope to disillusionment. There are three main reasons why this cycle repeats itself:

First, the cash burn is too intense, yet no closed business loop has formed: there is no market, and companies rely too heavily on investment. Building a humanoid robot often costs millions, dozens of times more than a factory robotic arm. No matter how much money investors pour in, willing customers are hard to find. Honda's ASIMO, for example, cost US$300 million to develop yet could only perform tasks like serving tea and water in science museums.

Second, current technology is simply insufficient to support a boom in humanoid robots and is still groping in the dark. Current VLA (Vision-Language-Action) models still have an error rate of up to 40% in dynamic environments, far from the level needed for autonomous decision-making, and high power consumption means that most humanoid robots can work continuously for less than 5 hours, far below industrial requirements.

Finally, the direction is off course: entertainment value is strong, but usefulness as a tool is weak. To attract investment, companies keep putting on shows like dancing while neglecting practicality. Robots that cannot lighten the human workload or improve overall efficiency in vertical scenarios have only inflated the bubble further.

Talking about business without grounding it in technology is just an illusion. It can almost be asserted that even if this wave of humanoid robots does not burst, it will fall silent just as earlier waves did. After all, no one is willing to spend hundreds of thousands on a large machine that can only dance. On the bright side, we are still far from an "Ava" who can toy with human emotions.
