The Rise of Physical AI: Some New Reflections


05/18 2026

Physical AI represents the ultimate form of AI development, requiring not only an understanding of human instructions but also all the laws governing the physical world.

Recently, a term called 'Physical AI' has been generating a lot of buzz in the industry.

Jensen Huang actually used the term repeatedly in his keynote at CES in Las Vegas early last year, but only this year has 'Physical AI' truly taken off.

So, what exactly is 'Physical AI'?

A few days ago, I saw a video of a robot watering plants. The robot first walked to the faucet, turned on the valve to fill a watering can, then turned and walked to the flowerpot, adjusted its angle, and evenly poured water without bumping the rim of the pot or spilling any water.

For a machine to understand 'carrying a cup of water,' it needs to know that the cup is cylindrical, calculate how much force to apply to hold it without slipping or breaking, understand that water is a liquid and may spill if shaken, and adjust its arm angle in real time while walking to counteract body movements.
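The 'how much force' part can be made concrete with a simple friction model. The sketch below is an illustration only, assuming a two-contact pinch grasp and Coulomb friction; the function names, the safety factor, and the crush limit are hypothetical and do not come from any particular robot.

```python
def min_grip_force(mass_kg, friction_coeff, accel_ms2=0.0, g=9.81, contacts=2):
    """Smallest normal force per contact (newtons) that prevents slipping.

    Each contact can carry at most friction_coeff * N of load, so
    contacts * friction_coeff * N must cover mass * (g + upward acceleration).
    """
    if friction_coeff <= 0 or contacts <= 0:
        raise ValueError("friction_coeff and contacts must be positive")
    return mass_kg * (g + accel_ms2) / (contacts * friction_coeff)


def pick_grip_force(mass_kg, friction_coeff, crush_limit_n,
                    accel_ms2=0.0, safety=1.5):
    """Return a grip force with a safety margin, or None if it would crush the cup."""
    needed = safety * min_grip_force(mass_kg, friction_coeff, accel_ms2)
    if needed > crush_limit_n:
        return None  # no force is both slip-free and safe for the cup
    return needed


if __name__ == "__main__":
    # A 0.3 kg cup of water, assumed friction coefficient 0.5, assumed crush limit 10 N.
    print(pick_grip_force(0.3, 0.5, crush_limit_n=10.0))
```

The window between 'too loose' and 'too tight' narrows as the robot accelerates while walking, which is why the grip has to be adjusted continuously rather than set once.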

Humans can do these things intuitively by the age of three. But for AI, this represents a huge leap. Over the past decade, AI has learned to see, hear, speak, and draw, but it has remained confined to the screen. What Physical AI aims to do is place this intelligent brain into a body capable of moving, jumping, grasping, and releasing in the real world.

Simply put, Physical AI enables AI to understand and interact with the physical world. It's no longer just about processing text and images but making correct movements in environments where gravity, friction, and inertia come into play.

One fact rarely discussed in China is that the term 'Physical AI' did not originate in the PR department of a chip giant. The concept was first introduced in a 2020 paper in Nature Machine Intelligence, which gave Physical AI its first systematic definition:

A class of physical entity systems capable of performing tasks typically associated with intelligent organisms, with the core being the deep integration of physical laws into artificial intelligence systems, enabling machines to no longer be 'physically blind' and to complete closed loops from perception to action.

Six years separate that academic starting gun in 2020 from the industry-wide takeover in 2026. During this time, sensor costs have dropped by several orders of magnitude, edge AI computing power has moved from theory to engineering, and the reliability and mass production capabilities of robot bodies have quietly reached a tipping point. These are the hidden drivers propelling Physical AI from academic papers to production lines.

From Demonstrations to Real Work

If 2023 was the year large language models taught AI to chat, then 2026 can be summed up in one word for Physical AI: work.

The changes are visible to the naked eye.

Around this time last year, robotics companies showcased their capabilities through demo videos: set up the scene, rehearse repeatedly, and film in one continuous take. The results were visually impressive, but viewers had no way of knowing how many attempts each one took.

This year, the approach is entirely different. On a 3C production line in Nanchang, ZhiYuan Robotics put robots into a real factory, had them work continuously for hours, and broadcast the entire process live. There was no preset script and no restricted scenario, just the same production line human workers face daily. Hundreds of thousands of people watched online.

A month later, ZhiYuan announced in Hong Kong that it had achieved mass production of 10,000 humanoid robots. Moving from a single prototype in the lab to 10,000 units on a factory production line represents a fundamental shift.

ZhiYuan's approach is interesting. Most robotics startups focus on a specific area—some on the robot body, some on large models, some on dexterous hands. ZhiYuan chose a different path: developing a full stack while simultaneously investing in four directions—body manufacturing, AI models, dexterous operations, and data collection. They've also invested in over 60 companies upstream and downstream in the supply chain.

The trade-offs of this approach are clear. The parent company employs over a thousand people, with the number expected to rise further by the end of this year. Annual salaries alone amount to billions of yuan. This path is expensive, but once it pays off, the barriers will be the highest.

Deng Taihua, the founder of ZhiYuan, proposed an analytical framework called the 'XYZ Curve.' He said the development of embodied intelligence has three stages: X is the exploration phase, where everyone is still playing with demos; Y is the deployment growth phase, where robots begin to work on real production lines; Z is the final stage of intelligent emergence.

He characterizes 2026 as 'the first year of deployment, officially transitioning from "capable of movement" to "capable of work."' The two phrases differ by a single word, but that word marks the coming-of-age ceremony for the entire industry.

Overseas, the pace is no slower.

Figure AI, an American humanoid robotics company, is a major player in this field. Last September, it completed a funding round of over $1 billion at a valuation of $39 billion, making it the highest-valued humanoid robotics company globally at the time.

A month later, it released its new product, Figure 03, which stands 1.68 meters tall, weighs around 60 kilograms, and was shown watering plants, serving food, and folding clothes. Founder Brett Adcock made a point of adding on social media that all actions were completed autonomously by the robot, without remote control.

Notably, Figure made a significant strategic shift by terminating its collaboration with OpenAI and fully transitioning to its self-developed neural network system, Helix.

This system mimics human cognition with a three-layer structure: the bottom layer manages balance and instinctive reactions, the middle layer translates brain commands into motor control at 200 times per second, and the top layer serves as the logical brain, responsible for understanding scenes and making decisions. This 'instinct-reflex-thinking' three-layer architecture is quite clever, effectively equipping the robot with a nervous system that won't crash.
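A layered design like this amounts to several control loops running at different rates. The following is a minimal sketch of that idea, not Helix's actual implementation: only the 200 Hz motor-control rate comes from the description above, while the 1000 Hz reflex rate, the 10 Hz planning rate, and all names are assumptions made for illustration.

```python
def run_layers(duration_s, base_hz=1000, motor_hz=200, planner_hz=10):
    """Simulate three nested loops on one clock and count how often each runs.

    base_hz    -- assumed rate of the bottom 'instinct' layer (balance, reflexes)
    motor_hz   -- middle layer: turns the current goal into motor commands
    planner_hz -- assumed rate of the top 'thinking' layer (scene understanding)
    """
    if base_hz % motor_hz or base_hz % planner_hz:
        raise ValueError("slower rates must divide the base rate evenly")

    motor_every = base_hz // motor_hz      # base ticks between motor updates
    planner_every = base_hz // planner_hz  # base ticks between planner updates
    counts = {"reflex": 0, "motor": 0, "planner": 0}
    goal = None
    command = None

    for tick in range(int(duration_s * base_hz)):
        if tick % planner_every == 0:
            goal = f"goal decided at tick {tick}"  # slow: decide what to do
            counts["planner"] += 1
        if tick % motor_every == 0:
            command = f"track {goal}"              # medium: turn goal into commands
            counts["motor"] += 1
        # Fast: the reflex layer runs every tick using the latest command,
        # so balance is maintained even while the planner is still thinking.
        _ = command
        counts["reflex"] += 1
    return counts


if __name__ == "__main__":
    print(run_layers(1.0))
```

The point of the structure is that a slow or stalled top layer never blocks the faster layers below it, which always act on the most recent goal they received.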

Another noteworthy development is NVIDIA's announcement at the GTC conference this year that it has formed deep collaborations with the world's four major industrial robotics giants—ABB, KUKA, Yaskawa, and Fanuc. Over 2 million industrial robots already installed on production lines globally will now be able to undergo virtual debugging and AI training through NVIDIA's simulation platform.

These four companies combined account for more than half of the global industrial robotics market. Over the next decade, these robots will undergo an upgrade from 'traditional programming' to 'AI-driven' systems. The software platform that can integrate into this process will essentially gain control over the 'operating system' layer of the next generation of industrial automation. NVIDIA clearly doesn't want to miss this opportunity.

Cross-Industry Supply Chain Surge

Another interesting phenomenon is the large-scale influx of automotive supply chain companies into the Physical AI field.

At this year's Beijing Auto Show, established automotive suppliers like Aptiv, Valeo, Horizon Robotics, and Qianxun SI showed off robotics-related solutions. Many industry insiders recognized that embodied intelligence perception shares similarities with automotive intelligent driving perception—automotive solutions can be directly applied to humanoid robots.

Upon closer inspection, this makes sense. An automotive intelligent driving system is essentially the perception-decision-execution closed loop of a mobile robot, and its visual perception, path planning, and real-time control modules share much of their technical architecture with traditional industrial robots and humanoid robots.

The cameras, radars, drive-by-wire chassis, and real-time operating systems that automotive suppliers already have can be adapted for robotics with minimal changes. In this sense, the hundreds of billions of yuan invested in automotive intelligence over the past decade are now flowing into the Physical AI field through 'technology spillover.'

This may explain why Chinese robotics companies have been able to rush into mass production so quickly. Manufacturing capabilities and supply chain management don't emerge out of thin air—much of it is readily available. Many component suppliers that have been honing their skills on automotive production lines for over a decade are now entering a new battlefield.

There are clear overseas examples as well. Tesla is accelerating the rollout of Optimus, its first-generation humanoid robot. During its Q1 2026 earnings call, Tesla explicitly announced that the company is transitioning toward 'a future centered on AI, autonomous robotaxis, and humanoid robots,' with the first-generation robot production line set to reach a capacity of 1 million units and to replace the existing Model S and Model X production lines.

While 1 million units may seem exaggerated in today's context, Tesla's logic is clear: it aims to replicate its mass production capabilities and supply chain management experience from automotive manufacturing directly into the humanoid robotics field.

Musk doesn't want just a 'robot capable of movement' but a 'mass-produced tool' capable of working alongside humans in factories. If this path succeeds, its impact on manufacturing automation will be no less than that of the Model 3 on the gasoline car market.

Why World Models Suddenly Became Viable This Year

After discussing the major industry moves, let's zoom in one layer deeper. What is the technological foundation of this Physical AI competition?

In one sentence: the engineering breakthroughs in world models. I believe this is the most crucial point for understanding this wave.

The concept of 'world models' isn't new—it was proposed in 2018 with a simple core idea: enable AI to develop an internal understanding of how the physical world operates so it can predict outcomes like 'what will happen if I push this cup.' However, this remained largely theoretical—too computationally intensive, with unstable generation quality and no real-time interaction capabilities.
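To make the 'what will happen if I push this cup' idea concrete, here is a deliberately tiny sketch. It assumes a one-dimensional cup position and a linear response to pushes, fits that response from observed transitions, and then 'imagines' a sequence of pushes without touching the real cup. Real world models are large neural networks trained on video; the linear form and all names below are assumptions for illustration only.

```python
def fit_push_model(transitions):
    """Least-squares fit of k in: next_position = position + k * push.

    transitions -- list of (position, push, next_position) observations.
    """
    num = sum(push * (nxt - pos) for pos, push, nxt in transitions)
    den = sum(push * push for _, push, _ in transitions)
    if den == 0:
        raise ValueError("need at least one non-zero push to fit the model")
    return num / den


def imagine(position, pushes, k):
    """Roll the learned model forward to predict the cup's trajectory."""
    trajectory = [position]
    for push in pushes:
        position = position + k * push
        trajectory.append(position)
    return trajectory


def falls_off(trajectory, table_edge):
    """Predict whether the cup crosses the table edge at any imagined step."""
    return any(p > table_edge for p in trajectory)


if __name__ == "__main__":
    observed = [(0.0, 1.0, 0.5), (0.5, 2.0, 1.5), (1.5, -1.0, 1.0)]
    k = fit_push_model(observed)
    plan = imagine(0.0, [1.0, 1.0], k)
    print(k, plan, falls_off(plan, table_edge=0.8))
```

The useful property is the last step: the robot can reject a plan that knocks the cup off the table before executing it, because the consequence was predicted internally.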

The turning point came in the past year. NVIDIA introduced a series of models called Cosmos, whose core capability is generating physically plausible action data from text or images.

For example, to train a robot to move boxes in various weather conditions, you don't need to film videos in real rain, snow, or midnight factory settings. By setting parameters in a simulated environment, Cosmos can directly generate massive amounts of highly realistic training data covering extreme scenarios.

Early this year, Ant Group's Lingbo team open-sourced LingBot-World, a framework designed specifically for interactive world models. It can generate nearly 10 minutes of continuous, stable video, with end-to-end interaction latency on the order of seconds. Users can control virtual characters in real time with a keyboard and mouse, and the model reflects scene changes as they happen. The significance lies in moving world models from 'offline rendering' to 'online interaction,' boosting training efficiency by an order of magnitude.

The startup Jijia Shijie released the GigaWorld-1 platform, positioned as a 'digital sandbox' for the physical world. A month later, Alibaba's ABot-PhysWorld surpassed it on a benchmark called WorldArena, claiming the top spot in the overall rankings. Competition is advancing by the month.

The importance of these open-source projects lies not in their parameter counts but in transforming a 'giants-only' game into a tool accessible to small teams. When enough people are building the wheels, more vehicles will actually run.

The reason world models have become essential in the Physical AI era is that they answer the long-standing question: how can robots learn the complex laws of the physical world cost-effectively and efficiently?

Acquiring real-world training data is extremely expensive and inherently biased: it is hard to capture every edge scenario, such as factory night shifts during blizzards, emergency responses to power outages in logistics warehouses, or sudden human interventions on production lines. Synthetic data can cover them. By manipulating scene parameters with prompts in simulated environments, researchers can generate large-scale training videos covering extreme conditions within hours, a process that would take months or even years through traditional real-world data collection.
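The 'manipulating scene parameters' step is often called domain randomization. The sketch below shows only that sampling step, under stated assumptions: the parameter names, ranges, and probabilities are hypothetical, and the output is a list of scenario descriptions that a simulator or generative model would then render. It does not call Cosmos or any other real platform.

```python
import random

# Hypothetical scene parameters; a real pipeline would define many more.
WEATHER = ["clear", "rain", "snow", "blizzard"]


def sample_scene(rng):
    """Draw one randomized scenario description."""
    return {
        "weather": rng.choice(WEATHER),
        "light_lux": rng.uniform(1.0, 1000.0),      # from near-dark to bright
        "power_outage": rng.random() < 0.1,         # rare event, oversampled
        "human_intervention": rng.random() < 0.2,   # someone steps into the line
    }


def build_dataset(n, seed=0):
    """Generate n scenario descriptions reproducibly from a seed."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]


if __name__ == "__main__":
    scenes = build_dataset(1000, seed=42)
    rare = [s for s in scenes if s["weather"] == "blizzard" and s["power_outage"]]
    print(len(scenes), "scenes,", len(rare), "with blizzard plus power outage")
```

Rare combinations that might take years to encounter on a real factory floor appear in the synthetic set simply because their probabilities were set by the researcher rather than by the world.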

The leverage from this breakthrough likely exceeds that of any single algorithmic improvement.

A Paradigm Shift

The breakthrough in world models is just one part of the evolution in Physical AI's technology stack. Changes in underlying technologies are driving a restructuring of the entire robotics industry.

Traditional robots use a 'perception-planning-control' three-stage approach: sensors perceive the environment, engineers write rules for path planning, and finally, actions are executed. This works fine in structured environments like factory assembly lines but falls short in complex scenarios—machines can only follow preset scripts and get stuck when encountering unfamiliar situations.

Physical AI takes a different path: 'perception-reasoning-execution.' After perception, instead of following human-written rules, trained neural networks reason independently about what actions to take and then execute them. The fundamental difference is that the former relies on 'engineers thinking for the machine,' while the latter enables 'the machine to understand the physical world itself.'
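The contrast between the two approaches can be sketched in a few lines. In the illustration below, the rule table stands for 'engineers thinking for the machine,' and a nearest-neighbor lookup over demonstrated examples stands in for a trained neural network. That substitution is an assumption made to keep the example small; the feature encoding and all names are hypothetical.

```python
# Traditional approach: engineers enumerate situations and responses in advance.
RULES = {"box_on_belt": "pick", "belt_empty": "wait"}


def rule_based_step(situation):
    """Follow the preset script; anything unlisted stops the machine."""
    return RULES.get(situation, "halt")


def learned_step(observation, examples):
    """Choose the action of the most similar demonstrated observation.

    observation -- feature tuple, here (object_present, tilt)
    examples    -- list of (feature tuple, action) pairs from demonstrations
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, action = min(examples, key=lambda ex: squared_distance(ex[0], observation))
    return action


if __name__ == "__main__":
    demos = [((1.0, 0.0), "pick"), ((0.0, 0.0), "wait")]
    # A tilted box was never written into the rules or shown in a demo.
    print(rule_based_step("box_tilted_on_belt"))  # rule table has no entry
    print(learned_step((0.9, 0.3), demos))        # generalizes from the closest demo
```

The rule-based controller stops on the unfamiliar case, while the learned one maps it to the nearest thing it has seen. A real policy network generalizes in a far richer way, but the division of labor is the same.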

The International Organization for Robotics Standards released a technology roadmap this year, predicting that within the next three years, 80% of new robot models will adopt this new architecture, with traditional three-stage approaches gradually fading from the mainstream. This isn't minor tweaking—it's a complete paradigm shift.

As one industry expert aptly summarized: Physical AI represents the ultimate form of AI development because it requires not only understanding human instructions but also all the laws governing the physical world.

Jensen Huang said the 'ChatGPT moment' for robot development has arrived. In my view, the 'ChatGPT moment' for Physical AI differs fundamentally from that of language models. The language model moment let ordinary people worldwide use AI for the first time. The Physical AI moment marks the first time AI truly begins to work.

This field is now at a unique stage: the direction is locked in and the concept is recognized, but the competitive landscape remains unsettled.

On one hand, demonstrating capabilities and achieving mass production are entirely different skill sets. While a prototype may work once, producing 10,000 units that perform reliably in real-world scenarios tests manufacturing consistency, supply chain resilience, scenario generalization, and maintenance systems—factors largely unrelated to AI algorithms but capable of eliminating many players. On the other hand, the high cost, long cycles, and narrow coverage of real-world data collection almost guarantee that Physical AI's large-scale training will heavily rely on synthetic data.

Meanwhile, industries seemingly unrelated to 'AI'—such as automotive supply chains, traditional industrial automation, and consumer electronics contract manufacturing—are accelerating their entry into Physical AI through technology spillover. Their manufacturing capabilities, supply chain management experience, and scenario resources may be the key variables determining Physical AI's implementation speed.

My intuition is that the Physical AI wave might follow the pattern of the 2023 AI wave triggered by ChatGPT, in which infrastructure providers rather than model companies ultimately captured the most value.

NVIDIA's strategic moves suggest it is betting on this direction, but the story isn't over yet. 2026 marks the first year of deployment, and industrial competition has only just begun. When we look back in three years, which names are still at the table and which have exited may surprise most people.

This article is original to Xinmou

