06/09 2026
Over the past year, the embodied AI sector has reached a basic consensus: training robot models that work reliably in the physical world requires large-scale real-robot data, measured in millions of hours.
But once that consensus was reached, the real disagreements began.
The industry is currently exploring several data-collection pathways: real-robot collection, teleoperation, and deployment in real production. Each rests on a distinct technical approach, cost structure, and commercial vision.
On June 5, Force Robotics announced that it had completed its merger with logistics robotics company Atomix through an equity acquisition and had closed a new round of financing. Leading Chinese large-model companies, including Zhipu, StepFun, SenseTime, and Alibaba, participated, with additional backing from core industry players such as Huaqin and SAIC Hengxu.
The deal is more than a corporate merger. It is a strategic bet on a particular data-collection route for embodied AI: embedding model capabilities in real business workflows so that robots can build data flywheels on actual production lines.
In just over a year since its founding, Force Robotics has built a closed loop spanning models, frameworks, evaluation, and application scenarios. Its embodied-native large model, DM0, leads in real-robot evaluations, and the company has accumulated extensive industrial experience in logistics. Atomix, for its part, runs a business network that handles more than 600,000 shipments a day across more than 20 countries and serves nearly 100 brands.
The merger matters because of the view of embodied AI behind it: high-quality real-robot data is often not simply "collected" in the lab but "generated" under the pressure of real business operations. Order fluctuations, long-tail products, occlusions, failed grasps, exception handling, system integration, and continuous operation: this real-world complexity is the most valuable resource for teaching robots to generalize.


Real-Robot Data Gains Consensus, Yet Collection Pathways Diverge
As embodied AI has matured, the industry has converged on a basic understanding: a robot's capabilities are largely bounded by how much real-world data it has been exposed to.
The real debate is over how that data should be acquired.
The first pathway is real-robot collection.
This means deploying robots in real-world settings to perform tasks while recording the entire process, for instance picking items in warehouses, moving bins, or sorting parcels.
Some leading Chinese proponents of real-robot collection champion a philosophy of diversified data. They argue against over-reliance on "clean data" and instead favor open-ended, goal-driven, diverse collection that lets robots learn from messier, more continuous, and more realistic experience.
This route has its own challenges, however: difficult cold starts, costly failures, and noisy data. A robot whose initial capabilities are inadequate cannot simply be deployed at scale in real-world settings.
The second pathway is teleoperation.
In teleoperation, humans control robots remotely through input devices, and the robots reproduce the operators' movements to complete tasks. Common setups include VR controllers, master-slave robotic arms, exoskeleton gloves, motion-capture rigs, and force-feedback devices.
Teleoperation offers controllable data quality and is well suited to cold starts, particularly for collecting data on fine manipulation, dual-arm coordination, and multi-step tasks. When a robot cannot yet perform a task on its own, having a human guide it is the most direct way to train it.
Teleoperation also has inherent limits. It requires operators, equipment, facilities, and robot maintenance, and each operator can work only so many hours a day, so reaching millions of hours of real-robot data through human labor alone is difficult. Moreover, teleoperation data essentially teaches robots "how a human would do it," which may not be the best way for a robot to do it.
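A back-of-the-envelope calculation makes the labor constraint concrete. The per-operator figures below (6 productive hours a day, 250 working days a year) are hypothetical assumptions chosen for illustration, not numbers from any company; the sketch also ignores downtime, failed sessions, and robot maintenance, all of which would push the total higher.

```python
import math

def operator_years(target_hours: float, hours_per_day: float, days_per_year: int) -> float:
    """Operator-years of teleoperation needed to log `target_hours` of robot data."""
    hours_per_operator_year = hours_per_day * days_per_year
    return target_hours / hours_per_operator_year

# Hypothetical assumptions: 6 productive hours per operator per day, 250 working days a year.
TARGET_HOURS = 1_000_000
needed = operator_years(TARGET_HOURS, hours_per_day=6, days_per_year=250)
print(f"{needed:.0f} operator-years")                 # 667 operator-years
print(f"{math.ceil(needed)} operators for one year")  # 667 operators for one year
```

Under these assumptions, a single million hours already means a team of several hundred full-time operators for a year, before any of the supporting equipment or venues are counted.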
The third pathway is deployment in real business operations.
What sets this pathway apart is that it does not start by designing collection tasks around training needs. Instead, robots are placed into real business scenarios, and data is generated naturally as they operate continuously.

The merger between Force Robotics and Atomix can be viewed through this lens. Atomix provides AI-native flexible warehousing solutions for logistics and warehousing, integrating storage, transport, and sorting through the collaboration of multiple robot types. According to public information, Atomix has completed more than 500 project deliveries for more than 60 brand clients, including industry leaders such as Uniqlo, Mixue Ice Cream & Tea, and CATL. Its pallet four-way shuttle business has also built up engineering delivery experience on large projects, with a single project scheduling more than 600 robots.
This is more than a measure of business scale. From the standpoint of embodied AI data, it is an entry point that can continuously generate real tasks and real feedback.

High-Quality Real-Robot Data Is Generated by Business Operations, Not Collected
In embodied AI, high-quality data is not collected specifically for training; it is generated under the pressure of real operating conditions. This is where the real business deployment pathway truly diverges from the others.
Until now, the industry has focused mainly on how to collect more real-robot data, giving rise to approaches such as teleoperation, low-cost data factories, and diversified collection. All of these address the problem of data supply.
The real business pathway asks a different question: how can robots first work effectively in business scenarios and generate data naturally along the way? The logic has three layers.
1. Picking: A High-Frequency, Verifiable, Closed-Loop Atomic Task as the Entry Point
Logistics is well suited as an early entry point for embodied AI because it is full of high-frequency, repetitive tasks with clearly defined outcomes. The most typical example is picking.

Picking chains together a sequence of capabilities: the robot must identify objects, estimate their positions and orientations, and then grasp, move, and place them, adjusting and recovering along the way from failed grasps, drops, occlusions, and spatial constraints. It therefore exercises the core training units of embodied AI: visual understanding, spatial localization, end-effector control, object interaction, failure recovery, and sustained execution.
Picking also has several properties that make it well suited to model training: it is high-frequency, its results are clear and verifiable, and the skills it teaches transfer to other tasks.
According to public information, Atomix holds the largest and most authentic source of logistics picking data in China's embodied AI sector. For Force Robotics, this data is the "real fuel" for continuous model iteration.
2. Real Business Data: Abundant, with Outcome Feedback and Recovery Paths Built In
Data from real production lines is more complex and more valuable. It records not only what robots get right but also how they make mistakes, who takes over, how the system handles exceptions, how tasks are recovered, and whether the order is ultimately fulfilled.
In other words, real business data inherently contains three types of training value:
Success paths, which show the model how to complete a task;
Failure samples, which show the model where problems are likely to occur;
Recovery processes, which show the model how to carry on after an error.
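To illustrate how one stream of operational logs can yield all three kinds of sample, here is a minimal sketch that sorts logged episodes into success, failure, and recovery buckets. The `Step` and `Episode` schema and the labeling rule are hypothetical assumptions for illustration, not Force Robotics' or Atomix's actual data format; a real pipeline would also record human takeovers and exception codes.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Step:
    action: str  # hypothetical action label, e.g. "grasp" or "place"
    ok: bool     # did this step succeed?

@dataclass
class Episode:
    episode_id: str
    steps: List[Step] = field(default_factory=list)
    fulfilled: bool = False  # was the order ultimately completed?

def label_episode(ep: Episode) -> str:
    """Classify one logged episode as 'success', 'recovery', or 'failure'."""
    had_error = any(not step.ok for step in ep.steps)
    if ep.fulfilled and not had_error:
        return "success"   # clean success path
    if ep.fulfilled:
        return "recovery"  # something went wrong, but the task was still completed
    return "failure"       # the task was never fulfilled

def split_dataset(episodes: List[Episode]) -> Dict[str, List[str]]:
    """Group episode IDs by the kind of training value they carry."""
    buckets: Dict[str, List[str]] = {"success": [], "failure": [], "recovery": []}
    for ep in episodes:
        buckets[label_episode(ep)].append(ep.episode_id)
    return buckets

episodes = [
    Episode("e1", [Step("grasp", True), Step("place", True)], fulfilled=True),
    Episode("e2", [Step("grasp", False), Step("grasp", True), Step("place", True)], fulfilled=True),
    Episode("e3", [Step("grasp", False)], fulfilled=False),
]
print(split_dataset(episodes))
# {'success': ['e1'], 'failure': ['e3'], 'recovery': ['e2']}
```

The point of the sketch is that none of these buckets requires a separate collection effort: the same logged operations produce all three, which is the argument the real business pathway rests on.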
3. Massive Order Volumes Drive Robots to Evolve from Single-Point to Systemic Capabilities
In the lab, a robot completing a single grasp, sort, or transport can serve as a demonstration. In real business settings, that is far from enough.
Only real order flow can continuously generate real tasks; only continuous tasks can produce continuous data; and only continuous data can support sustained model iteration. This is the fundamental difference between the data-factory pathway and the real business pathway.
Tang Wenbin, founder and CEO of Force Robotics, has said that scaling embodied large models requires a shift from manual collection to industrial scenarios. The remark reflects a broader change in the industry: embodied AI data cannot keep growing indefinitely on lab collection and teleoperation labor alone, and must ultimately come from real industrial processes.
Of course, this pathway is not without challenges.
Logistics tasks are indeed high-frequency, essential, and verifiable, which makes logistics a good early entry point for embodied AI. But moving from logistics picking to more open-ended settings such as homes, supermarkets, elderly care, and service robotics raises challenges in task complexity, environmental openness, cost structure, and safety liability.
Still, it is a more practical industrial path: first close the data loop in high-frequency real-world scenarios, then extend model capabilities to more tasks and settings.

Embodied AI Competes on the Deep Integration of Models and Scenarios
The China Academy of Information and Communications Technology (CAICT), in its "Embodied AI Development Report 2025," describes embodied AI as a new technological paradigm that emphasizes the deep integration of "data-driven" artificial intelligence with "scenario-driven" robotics. In other words, the true closed loop of embodied AI lies in the deep integration of scenarios and models.
From an industrial perspective, the merger between Force Robotics and Atomix also represents a reorganization of closed-loop capabilities:
Force Robotics provides model and infrastructure capabilities:
DM0 is the world's first embodied-native large model. With only 2.4 billion parameters, it nonetheless ranks first globally in RoboChallenge real-robot evaluations. Its advantage lies not only in performance but in its design for the real physical world: multi-source data pre-training, multi-task cross-robot pre-training, and fine manipulation in complex tasks enable the model to learn more general physical regularities from different data, tasks, and robot bodies.

Dexbotic is an open-source development framework for embodied AI that supports mixed training on multi-source data, cross-robot adaptation, and seamless combination of imitation learning and reinforcement learning. According to public information, it has been used by universities including Tsinghua, Peking University, Princeton, and Imperial College, and by companies such as Tencent and Beijing Humanoid, and has attracted thousands of developers.
RoboChallenge is a real-robot evaluation platform co-initiated by Force Robotics and Hugging Face. Nearly 20 embodied AI companies have joined in operating it, and it has served more than 80,000 real-robot tests worldwide.
Atomix brings logistics scenarios, project delivery capabilities, and sustained order flow, answering the questions of "where robots work and how they keep working."
This integration of models and scenarios will be further reflected in "Flyda," an upcoming logistics robot system. According to disclosed information, Flyda will target mixed-fleet scenarios involving multiple robot types and will showcase DM0's cross-robot deployment, three-level sorting, and multi-type robot collaboration.
With the merger complete, Force Robotics has become more global in its organization. It is building a "Beijing + five global locations" structure: Beijing is the hub for model R&D and core technology; Hong Kong serves as global headquarters; Singapore hosts Atomix's global commercial headquarters and covers the Southeast Asian market; subsidiaries in Japan and South Korea stay close to key clients such as Fast Retailing; and a U.S. office is scheduled to open in June 2026.
In July, Force Robotics will also release its next-generation DM model, general-purpose robot bodies, and next-generation application infrastructure. The company has also launched a "Global 100 Talent Recruitment Plan," opening core positions in embodied AI algorithms, robot learning, large-model training, and engineering deployment to meet demand from global clients and markets.
Conclusion
Many industry insiders have dubbed 2026 the "Year of Embodied Data."
As consensus grows around the need for real-robot data at the scale of millions of hours, the decisive competition in embodied AI is over where the data comes from, what it costs, and whether the supply can be sustained. In the long run, the move from manual collection to industrial scenarios is the inevitable path for real-robot data.
The eventual winners in the embodied AI race will not simply be the companies with the strongest standalone models, but the players that combine real-scenario entry points, scalable data engineering, embodied-native models, and closed commercial loops.
This article is an original work by "Intelligent Evolution Theory."
"Intelligent Evolution Theory" focuses on in-depth commentary in cloud computing and intelligent technology. We aim to interpret technological trends and provide insights into technology-driven business transformations through accessible language.