Why Did Physical AI Suddenly Go Viral for Its 'Brick-Moving' Capabilities?

06/01 2026

Author: Chen Wen

Source: Insight New Research Society

Since the dawn of 2026, a fresh buzzword has taken the AI community by storm: 'Physical AI.'

At the CES exhibition earlier this year, Jensen Huang repeatedly emphasized that 'the next wave of AI will be AI operating in the physical world.' More recently, Justin Sun boldly claimed, 'The dividends of virtual AI have been fully realized; Physical AI represents the most significant opportunity in the next three years.'

On the industrial front, Figure AI, a star company, captured the internet's attention with a five-day continuous live stream of robotic sorting operations. Meanwhile, China's ZhiYuan Robotics announced the rollout of its 10,000th general-purpose embodied robot...

Statements from industry leaders and tangible advancements in embodied intelligence have shifted the industry's focus toward the grand narrative of transitioning from virtual intelligence to physical execution. However, many still ponder: Is 'Physical AI' an inevitable turning point in technological development, or merely a cleverly repackaged concept?

Before addressing this question, let's dissect this somewhat technical term.

Taken literally, Physical AI refers to artificial intelligence that is deeply integrated with the physical world. Look closer and the contrast becomes clear: virtual AI handles 'thinking and communication,' whereas Physical AI must 'perceive and act.' It is no longer an intelligent agent confined to a screen; it enables machines to perceive, understand, and execute complex operations in the real physical world.

In simpler terms, Physical AI gives autonomous machines such as robots and self-driving cars the ability to sense their surroundings and act on them. Wang Xiang, an executive member of the China Computer Federation, systematically elaborated on this concept at the third China International Supply Chain Expo, stating, 'Physical AI means AI systems possess closed-loop capabilities of "perception-reasoning-action-feedback" in the real world.'
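The 'perception-reasoning-action-feedback' loop in that definition can be pictured with a minimal sketch. Everything here is a hypothetical toy, not any vendor's control stack: `GripperWorld`, its 80% actuator efficiency, and the step limit are assumptions chosen only to show why the loop has to re-measure after every action.

```python
from dataclasses import dataclass


@dataclass
class GripperWorld:
    """Hypothetical 1-D world: a gripper must reach a target position."""
    position: float = 0.0
    target: float = 10.0

    def sense(self) -> float:
        # Perception: observe the signed distance to the target.
        return self.target - self.position

    def apply(self, step: float) -> None:
        # Action: the actuator is imperfect, so only 80% of the command lands.
        self.position += 0.8 * step


def decide(error: float, max_step: float = 3.0) -> float:
    # Reasoning: move toward the target, capped at max_step per cycle.
    return max(-max_step, min(max_step, error))


def run_closed_loop(world: GripperWorld, tolerance: float = 0.05,
                    max_cycles: int = 50) -> int:
    """Repeat perceive -> reason -> act until the measured error is small."""
    for cycle in range(1, max_cycles + 1):
        error = world.sense()        # perception (also feedback on the last action)
        if abs(error) <= tolerance:
            return cycle
        world.apply(decide(error))   # reasoning, then action
    return max_cycles


world = GripperWorld()
cycles = run_closed_loop(world)
print(f"stopped after {cycles} cycles at position {world.position:.3f}")
```

Because each command only partly lands, a fixed script of moves would stop short of the target. The loop closes the gap by sensing again after every action, which is the role the 'feedback' step plays in the quoted definition.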

To put it plainly, traditional AI could 'chat,' while Physical AI can 'get things done.' Taking AI out of the ChatGPT dialog box and into real factories, warehouses, and households is the challenge Physical AI aims to tackle.

This distinction is particularly evident in the developments of two prominent robotics companies this year.

One is the U.S.-based Figure AI, which used a five-day live stream to demonstrate that 'robots can truly work.' Starting on May 14, the live stream featured three Figure 03 humanoid robots taking turns sorting express packages on a production line. Their tasks included detecting barcodes, grasping packages, reorienting them, and placing them on a conveyor belt with the barcode facing down.

During the live stream, one robot worked continuously for over 33 hours, handling more than 40,000 packages. Founder Brett Adcock stated that the robots operated in 'fully autonomous mode' using the company's latest Helix 02 model.
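To put those numbers in perspective, a quick back-of-the-envelope calculation gives the implied average pace. This is only a rough sketch: both figures are reported as lower bounds ('over 33 hours,' 'more than 40,000 packages'), so the result is approximate.

```python
hours_worked = 33            # reported as "over 33 hours"
packages_handled = 40_000    # reported as "more than 40,000 packages"

packages_per_hour = packages_handled / hours_worked
seconds_per_package = hours_worked * 3600 / packages_handled

print(f"{packages_per_hour:.0f} packages per hour")       # 1212 packages per hour
print(f"{seconds_per_package:.2f} seconds per package")   # 2.97 seconds per package
```

On these figures the robot averaged roughly one package every three seconds over the whole shift.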

The significance of Figure AI's live stream lay not only in showcasing its technical capabilities but also in demonstrating to the world through real-time footage that Physical AI technology has surpassed the 'laboratory demonstration' stage. A company broadcasting live footage of robots working continuously on a production line for days without major issues is, in itself, a powerful technological declaration.

China's ZhiYuan Robotics conducted a similar live stream, deploying its ZhiYuan Elf G2 on a tablet production line (MMIT - Multimedia Integration) at the Longcheer Technology Industrial Park in Nanchang to work alongside humans. Live testing data showed that the robot operated continuously for eight hours without major anomalies, achieving an overall operational success rate of over 99.5%. It completed a single process in just 18-20 seconds, handling 310 products per hour, with one robot capable of undertaking the workload of two processes.

Going a step further than Figure AI, ZhiYuan Robotics officially announced in March that it had achieved the delivery of 10,000 units of the world's first general-purpose embodied intelligent robot, accomplishing this milestone in just over three months, from December 2025 to March 2026.

Beyond delivery volume, ZhiYuan Robotics revealed plans to reach RMB 10 billion in revenue by 2027. Measured against the track records of cutting-edge industries such as new energy, autonomous driving, and semiconductors, a company less than two years old reaching mass production and delivery at the 10,000-unit level while setting a RMB 10 billion revenue target is remarkable in the hard technology sector.

These two companies have proven with solid data and real-world scenarios that Physical AI no longer relies on remote control or pre-set scripts to 'perform' but possesses the capability to autonomously complete complex tasks in real environments.

More critically, ZhiYuan's achievement of surpassing the 10,000-unit delivery threshold and tying mass production capabilities to existing orders indicates a turning point in this field from 'technical validation' to 'commercial realization.' In other words, the 'feasibility' of Physical AI is no longer in question; the real competition has entered the deep waters of 'usability' and 'economic viability.'

This raises the question: Why did Physical AI suddenly explode this year? Looking back, besides genuine commercialization demand, a series of technological breakthroughs have been the biggest driving force.

First, large language models (LLMs) have brought 'understanding capabilities' to robots. Traditional robots rely on deterministic code and rule-based programming, akin to engineers pre-writing a 'script' that robots follow strictly for every action. This model has a significant flaw: even slight changes in the robot's working environment require rewriting the code, making it less robust and difficult to cross the commercialization threshold.

However, once Google began combining LLMs with robotic physical execution, releasing embodied multimodal large models such as PaLM-E and RT-2 in 2023, robots gained the ability to break complex tasks down into steps automatically and carry them out from natural language instructions. LLMs thus completed the leap from 'dialogue understanding' to 'physical execution.'

At CES 2026, Jensen Huang pointed out the essence of this technological evolution: Physical AI represents a transfer of underlying control. When Physical AI passes the technological evolution threshold, control shifts from human-written deterministic code to neural networks with generalization capabilities that understand physical laws.

At this point, robots no longer merely 'execute code' but possess the ability to 'understand instructions and self-plan actions.'

If LLMs solved the problem of 'understanding,' world models addressed the challenge of 'acting in the physical world.' The core of world models is to enable AI to develop an internal understanding of how the physical world operates.

NVIDIA's release of Cosmos, its world foundation model platform for Physical AI, at last year's CES marked a significant milestone. The platform's core capability is generating physically plausible motion data from text or images, which developers can use to accelerate the development of Physical AI for intelligent vehicles, robots, and video analytics AI agents.

According to NVIDIA, Cosmos is trained on over 20 million hours of real-world data, significantly reducing the difficulty of simulation and model training. With world models, AI systems can conduct massive simulated exercises in virtual environments before transferring them to the real physical world.
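The idea of rehearsing in simulation before acting can be shown with a toy sketch. This is not the Cosmos API: the hand-written function `imagine_final_position` is a hypothetical stand-in for a learned world model, and the friction value and candidate push speeds are assumptions picked for illustration.

```python
def imagine_final_position(push_speed: float, friction: float = 0.5,
                           dt: float = 0.1) -> float:
    """Hypothetical 'world model': predict where a pushed box comes to rest."""
    position, speed = 0.0, push_speed
    while speed > 0:
        position += speed * dt      # the box slides forward
        speed -= friction * dt      # friction slows it down
    return position


def plan_push(target: float, candidates: list[float]) -> float:
    """Try every candidate push in imagination; keep the one ending closest."""
    return min(candidates,
               key=lambda s: abs(imagine_final_position(s) - target))


candidates = [0.5 + 0.1 * i for i in range(16)]   # push speeds 0.5 .. 2.0
best = plan_push(target=2.0, candidates=candidates)
print(f"chosen push speed: {best:.1f}")
print(f"predicted rest position: {imagine_final_position(best):.2f}")
```

All sixteen pushes are evaluated without moving a real box; only the chosen one would be sent to hardware. A real world model learns its predictions from data rather than from a hand-coded friction rule, but the plan-in-imagination pattern is the same.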

The ultimate capability of robots is not merely 'seeing' or 'hearing' but 'doing correctly.' The emergence of Vision-Language-Action (VLA) models enables robots to simultaneously process visual input, language understanding, and motion control, achieving a closed loop of 'seeing is doing.'

In September of last year, DeepMind released Gemini Robotics 1.5, a new generation of multimodal embodied intelligence large model, claiming it to be the world's first thinking model optimized for embodied reasoning. NVIDIA introduced Isaac GR00T N1.6, an open-source model designed specifically for humanoid robots, enabling whole-body control.

Meanwhile, the Beijing Humanoid Robot Innovation Center open-sourced the Embodied Cerebellum Large Model XR-1, the first model in China compliant with national standards for embodied intelligence. Trained on over one million data points, it can perform complex dual-arm operations such as picking and placing, pushing and pulling, and rotating.

At this point, Physical AI has 'gathered' the foundational technical capabilities necessary for implementation. LLMs enable machines to 'understand' human intentions, world models allow machines to 'anticipate' physical consequences, and VLA bridges the last mile from 'understanding' to 'doing correctly.' Together, these three elements equip robots with the foundational capability to autonomously execute tasks in open environments for the first time.

Of course, dexterous manipulation remains a bottleneck, with many unresolved issues in the fine control of dual arms and hands. In other words, while Physical AI has secured an 'entry ticket' to 'work in factories,' moving on to 'serving tea and water in households' requires a qualitative leap from 'rough movements' to 'fine-grained operations.'

Understanding the past and present of Physical AI is crucial. Now, the embodied intelligence industry must confront the question: What core dimensions will the next phase of competition revolve around?

Autonomous driving could not bypass the 'data war,' and embodied intelligence, which follows a similar logic, cannot sidestep it either. As a rule, whoever holds the higher-quality training data holds the advantage.

In the industry, NVIDIA has taken the lead in building barriers around world models with Cosmos: a model trained on over 20 million hours of real-world data is difficult to replicate quickly. ZhiYuan, meanwhile, has mass-produced and deployed 10,000 robots, which gives it the ability to collect feedback-driven data from real-world operation, widely regarded in the industry as a data moat.

It should be noted that the data competition in Physical AI is not merely about quantity; it requires synthetic and real data to work together.

Relying solely on real data faces scalability challenges and hardware wear-and-tear costs, while excessive dependence on synthetic data creates a simulation-to-reality (sim2real) transfer gap. The 'cross-data-source learning' solution from the Beijing Humanoid Robot Innovation Center is a product of this approach, enabling robots to train using vast amounts of human video data, significantly reducing training costs while improving efficiency.

Thus, it becomes clear: whoever can truly establish a complete closed loop of 'synthetic data training-real data fine-tuning-real-world scenario feedback' will occupy the high ground in this competition.
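A toy sketch makes this loop concrete. It is an illustration under stated assumptions, not any company's training pipeline: the one-parameter model, the gains `SIM_GAIN` and `REAL_GAIN`, and the sample sizes are all hypothetical, chosen so that the simulator is cheap but slightly wrong while real data is accurate but scarce.

```python
def fit_slope(samples, slope=0.0, lr=0.001, epochs=20):
    """Gradient descent on squared error for the model y = slope * x."""
    for _ in range(epochs):
        for x, y in samples:
            slope -= lr * 2 * (slope * x - y) * x
    return slope


def mean_abs_error(slope, samples):
    return sum(abs(slope * x - y) for x, y in samples) / len(samples)


# Hypothetical gains: the simulator's physics is slightly off (sim2real gap).
SIM_GAIN, REAL_GAIN = 2.0, 2.3

synthetic = [(i / 10, SIM_GAIN * i / 10) for i in range(1, 101)]  # plentiful
real_train = [(x, REAL_GAIN * x) for x in (1.0, 4.0, 7.0)]        # scarce
real_test = [(x, REAL_GAIN * x) for x in (2.0, 5.0, 9.0)]         # field feedback

pretrained = fit_slope(synthetic)                                  # step 1
fine_tuned = fit_slope(real_train, slope=pretrained, epochs=50)    # step 2

# Step 3: score both models against real-world feedback.
print(f"sim-only error on real data: {mean_abs_error(pretrained, real_test):.3f}")
print(f"after real fine-tuning:      {mean_abs_error(fine_tuned, real_test):.3f}")
```

The model trained only on synthetic data inherits the simulator's bias, while three real samples are enough to correct it because pretraining has already done most of the work. That division of labor is the point of combining the two data sources.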

After resolving the data issue, efficiently integrating Physical AI with Virtual AI becomes crucial for Physical AI to advance further.

A direction often overlooked in current discussions about Physical AI is that Physical AI and Virtual AI are not mutually exclusive. From a technical architecture perspective, a complete Physical AI system can be roughly divided into three layers: the perception layer (sensors, visual recognition) at the bottom, the cognitive decision-making layer (AI reasoning) in the middle, and the action execution layer (mechanical control) at the top.

Virtual AI primarily handles the middle layer, while Physical AI needs to bridge the complete chain from perception to execution.

NVIDIA's full-stack solution of 'chips + models + tools' embodies this approach, with the Jetson Thor edge computing platform providing computational power, the GR00T model delivering intelligence, and the Isaac platform offering development toolchains. Following this solution, whoever can achieve deep integration of hardware and software in the future will not only complete the closed loop of Physical AI from 'brain' to 'limbs' but also establish their technological moat.

Finally, there is the commercialization process of Physical AI. Three years ago, capital's imagination of the robotics field stemmed from 'technological vision.' Now, the capital market has adopted more pragmatic evaluation criteria: delivery capabilities.

Media statistics show that in 2025, total financing in China's embodied intelligence sector reached RMB 73.5 billion across 744 financing events, with more than RMB 37 billion added so far in 2026, bringing the cumulative total past RMB 110 billion. However, beneath this flourishing landscape, a noticeable structural shift in capital flow has occurred.

In May 2026, Tianji Intelligence completed a RMB 1 billion Series B financing, with its core bargaining chip being over 10,000 orders in hand for Q1, covering 45 robotics companies.

CAS Fifth Epoch raised hundreds of millions of yuan in Series A financing around the same time, disclosing that it had secured overseas orders worth hundreds of millions of yuan.

In the financing rounds of Vita Dynamics and Luming Robotics, industrial investors such as SAIC Shangqi Capital and Mitsubishi Electric entered the fray, aiming to tie production line capacity to robot delivery capabilities.

In contrast, Cartwheel Robotics, a U.S.-based humanoid robot startup that had a technological vision but lacked orders to back it up, declared bankruptcy in March 2026.

These positive and negative cases indicate that capital no longer pays for flashy demos but only for genuine mass production and delivery capabilities.

The sudden popularity of Physical AI may seem abrupt, but it is a natural progression.

Certainly, there are those within the industry who contend that the term 'Physical AI' is largely a rebranding effort by the capital market, and at its core, it remains a natural progression of embodied intelligence and robotics technology. Nevertheless, it cannot be denied that the ascent of Physical AI unmistakably signifies the AI industry's shift from 'virtual intelligence' to 'tangible implementation,' marking an irreversible historical trajectory.

In the most recent competitive landscape, Figure AI demonstrated its prowess on a global scale through live streaming, ZhiYuan Robotics fortified its industrial position through large-scale production and delivery, and NVIDIA constructed a platform ecosystem with Cosmos and GR00T... The pressing questions now are: Which company will emerge as the OpenAI of the Physical AI realm? And in which application scenario will we first witness its 'ChatGPT moment'?
