06/02 2026
On June 1, I-Drive announced that its fully autonomous physical AI systems have logged more than 160 million kilometers of cumulative commercial operation. Drawing on the extensive data amassed from these diverse and complex real-world scenarios, the company also officially launched TransWorld, its Physical AI World Model.
TransWorld is built on multimodal heterogeneous data, centers on fine-grained physical interaction information, and targets cross-scenario and cross-form generalization. It establishes a closed-loop, self-evolving "Physical AI Flywheel" through three-fold heterogeneous data collection, four-stage data refinement and distillation, and a five-tier pyramid model architecture. This marks I-Drive's entry into a new era of data-driven, model-evolving, and globally deployable physical AI.

Triple Multimodal Heterogeneous Datasets Form the Bedrock of Physical AI
"Algorithms and computing power provide the framework for physical AI, but its intellectual potential is ultimately determined by the 'real-world density' of the data," stated Zhang Dezhao, Chairman and CEO of I-Drive. "Computing power can be rapidly scaled with financial investment, and algorithms can be advanced through talent acquisition. However, high-quality real-world data cannot be rushed—it must be cultivated in authentic scenarios and accumulated gradually over time. This makes data the most time-intensive and critical asset among the three key components."
Guided by this strategic vision, I-Drive has gathered 160 million kilometers of commercial operation data over its 11-year history by deploying systems across varied scenarios, including extreme operations, intelligent cleaning, and smart mobility. These datasets are heterogeneous in three respects: sensing dimensions, terminal forms, and scenarios. Together, these enable deep engagement with the physical world.

The first dimension is multidimensional heterogeneity. To equip physical AI with comprehensive perception, I-Drive uses a diverse sensor array for full-dimensional data fusion. Beyond LiDAR, cameras, and IMUs, actuators such as brush motors capture physical parameters including ground friction coefficients, contact torques, and material resistance, yielding fine-grained mechanical interaction data such as contact feedback. This real-world physical feedback addresses the long-tail interaction blind spots that simulation environments often miss, providing the most valuable and authentic "superfuel" for deep physical AI training.
The second dimension is morphological heterogeneity. I-Drive's data spans multiple terminal forms, including the Snail Little White (Wo Xiaobai) AI cleaning robot, extreme operation agents, AI rovers, and Robobuses, enabling natural cross-form validation and generalization. Obstacle avoidance strategies learned in cleaning scenarios can be applied to navigation in narrow passages for inspection robots; rainy-day operation plans developed in sanitation can guide shuttle vehicles in adverse weather.
The third dimension is scenario heterogeneity. The data covers a wide range of real-world environments, from industrial warehousing and transportation hubs to commercial buildings, hospitals, campuses, parks, scenic areas, and even extreme wild environments without GPS signals. This combination of full-scenario coverage, deep penetration, and complete physical interaction patterns gives the algorithm models rich real-world training material, helping them balance generalization with real-world execution performance across diverse scenario combinations.

Four-Stage Refinement and Distillation: Crafting Superfuel for the World Model
Zhang Dezhao emphasized, "Beyond data scale, the competition in general intelligence hinges on data value. Data detached from real physical interactions and complex scenario experiences is merely lifeless numbers. The competitive edge of physical AI lies in extracting the most nuanced physical laws and common-sense knowledge from massive mileage."
I-Drive refined its 160 million kilometers of data through a four-stage process, producing four levels of data:

The foundational level consists of de-identified multimodal data that, after compliance processing, supports self-supervised pre-training and physical law learning for the world model.
The second level consists of refined, structured scenario data. Through fine-grained labeling and clustering, this data serves as environmental templates for simulation training, building a robust and extensible foundational scenario library for the model.
The third level accumulates interaction and game-theoretic data, capturing dynamic human-machine interaction characteristics across low-, medium-, and high-speed domains. This forms the optimal corpus for training physical AI's non-verbal intent prediction and advanced decision-making capabilities.
The fourth level is the "golden set" of high-quality annotated data. It undergoes dual human-machine review that transforms abstract environments into structured slices with fine-grained physical attributes. This gives the AI a framework for understanding physical common sense. During reinforcement learning it also serves as the basis for reward evaluation, clearly defining optimal and suboptimal behaviors and guiding the model's deliberation and trial-and-error.
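As a rough illustration of how a "golden set" slice could act as a reward evaluation function, the sketch below ranks behaviors for one annotated scenario and maps the rank to a reward. This is not I-Drive's implementation: the tier names, the `GoldenSlice` structure, the example scenario, and the rank-based scoring rule are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Tuple


class DataTier(IntEnum):
    """The four data levels described above (identifier names are illustrative)."""
    DEIDENTIFIED_MULTIMODAL = 1  # self-supervised pre-training
    STRUCTURED_SCENARIO = 2      # environment templates for simulation
    INTERACTION_GAME = 3         # intent prediction and decision-making
    GOLDEN_SET = 4               # reward evaluation in reinforcement learning


@dataclass(frozen=True)
class GoldenSlice:
    """Hypothetical annotated slice: a scenario plus reviewed behaviors, best first."""
    scenario: str
    ranked_behaviors: Tuple[str, ...]
    tier: DataTier = DataTier.GOLDEN_SET


def reward(golden: GoldenSlice, behavior: str) -> float:
    """Assumed scoring rule: best behavior scores 1.0, lower ranks score less,
    and behaviors absent from the annotation score 0.0."""
    ranking = golden.ranked_behaviors
    if behavior not in ranking:
        return 0.0
    return 1.0 - ranking.index(behavior) / len(ranking)


# Made-up example: a cleaning robot meeting a pedestrian in a narrow aisle.
aisle = GoldenSlice(
    scenario="pedestrian_in_narrow_aisle",
    ranked_behaviors=("yield_and_wait", "slow_and_pass", "reroute"),
)
print(reward(aisle, "yield_and_wait"))  # highest reward
print(reward(aisle, "push_through"))    # not annotated, so no reward
```

A ranking is used here only because it gives a direct way to separate optimal from suboptimal behaviors; a real system would more likely score continuous physical attributes.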

Five-Tier Pyramid Architecture: Driving the Evolution of Physical AI
Built on 160 million kilometers of high-value data, I-Drive developed TransWorld, a world model with a five-tier pyramid architecture. From multimodal perception at the bottom layer to cross-form generalization at the top, TransWorld forms a complete closed-loop, self-evolving system.

The L1 Multimodal Perception Foundation Layer condenses multimodal data into a structured unified environmental representation, establishing physical spacetime consistency to capture the rich details of the physical world comprehensively and precisely.
The L2 Physical Common Sense Internalization Layer extracts the laws of motion mechanics, such as friction and inertia, along with object permanence, from massive perceptual data. It constructs a "physical common sense brain" that enables the model to understand the causal relationships between machine actions and physical feedback, giving it the ability to predict future states.
The L3 Physical World Simulation Layer constructs a virtual simulation environment, breaking through the scale limitations and reuse constraints of real data to supply high-quality, diverse samples for model training.
The L4 Reinforcement Learning Training Layer uses real data to correct simulation biases in the model, driving intelligent agents of different forms from "learning to think in simulation" toward "acting robustly in reality."
The L5 Generalization and Reasoning Layer breaks through the generalization bottleneck of transferring foundational physical cognition to different intelligent agents. With just a small number of adaptation samples, it achieves "learn once, deploy across multiple forms."
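The idea behind L4, using real data to correct simulation bias, can be illustrated with the simplest possible correction: a constant offset estimated from paired simulated and real measurements. This is a minimal sketch under that assumption. The function names and the friction values are made up for illustration, and nothing here describes how TransWorld actually performs the correction.

```python
from statistics import mean
from typing import Sequence


def estimate_bias(simulated: Sequence[float], real: Sequence[float]) -> float:
    """Mean residual (real minus simulated) over paired measurements."""
    if len(simulated) != len(real) or not simulated:
        raise ValueError("need non-empty, equally sized paired samples")
    return mean(r - s for s, r in zip(simulated, real))


def correct(simulated_value: float, bias: float) -> float:
    """Shift a simulated value by the estimated bias."""
    return simulated_value + bias


# Made-up friction coefficients: the simulator reads slightly high here.
sim_friction = [0.50, 0.60, 0.70]
real_friction = [0.45, 0.55, 0.65]

bias = estimate_bias(sim_friction, real_friction)
print(round(bias, 3))                 # estimated offset
print(round(correct(0.80, bias), 3))  # corrected simulator output
```

A constant offset is chosen only to keep the example short; real sim-to-real gaps typically depend on state and would need a learned, state-dependent correction.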
From absorbing multimodal nutrients at L1 to empowering diverse intelligent agents at L5, every successful cross-form and cross-scenario deployment of physical AI injects a continuous stream of real-world interaction data into the system. This data flows downward to reinforce foundational cognition and upward to refine reasoning strategies, forming a closed-loop, self-evolving "Physical AI Flywheel." This self-driving force, unconstrained by specific hardware forms and capable of continuous evolution in real, hybrid physical worlds, is the engine propelling I-Drive to lead physical AI across technological singularities and reconstruct productivity.
I-Drive will leverage TransWorld as its super foundation, relying on its continuously evolving physical AI brain, to build a globally leading mobile intelligent agent platform and drive deep integration between digital intelligence and the physical world.