06/16 2026
468

In the latter half of the global world model competition, Xingyuan Intelligence has already secured its place.
The world model is rapidly emerging as the next frontier in artificial intelligence.
Recently, AI luminaries Li Feifei and Yann LeCun have both placed their bets on world models. Earlier, tech giants like NVIDIA, Google, and OpenAI have also identified 'world models' as a promising pathway to next-generation intelligence. At the 2026 Zhiyuan Conference on June 16, Wang Zhongyuan, President of the Zhiyuan Research Institute, underscored the significance of world models and delineated four distinct technical routes, sparking widespread discussions on the topic.
During the conference, Xingyuan Intelligence, an embodied AI company nurtured by the Zhiyuan Research Institute, officially launched the world's first Embodied Interaction World Model—ω-EVA (Omega-EVA), marking its formal entry into the validation phase of transitioning from the digital to the physical realm.
This launch is more than just a product introduction; for the Xingyuan Intelligence technical team, it represents a robust response to the challenge of 'how to implement world models in embodied AI' and a gateway to the realm of 'physical AGI'.
However, securing a place in the competition is not just a positive signal for embarking on the world model journey; it also signifies that the adventure has just begun.
01. A Ticket to the Future
The term 'world model' is currently the hottest topic in the AI community, yet it lacks a unified definition.
Under the Zhiyuan Research Institute's framework, Wang Zhongyuan categorizes the technical routes of world models into four types: the VLA route centered on language, the video generation route centered on pixels, the simulation route centered on 3D structures, and the JEPA route centered on visual representations.
The Embodied Interaction Model ω-EVA released by Xingyuan Intelligence takes a different approach: it focuses on latent space modeling centered on 'interaction' and 'action,' emphasizing how world models can genuinely integrate into a robot's real-time decision-making process, transforming them from training aids into active feedback modules in action decision-making.
Specifically, the ω-EVA model is characterized by its innovative closed-loop logic of 'Envision, Verify, Act.' By learning the dynamic relationships between 'current state—action—future state,' the model captures moving objects, interaction targets, and key state changes in latent space, applying this capability to action generation, internal envisioning, and action correction.
In simpler terms, Xingyuan Intelligence prioritizes the causal relationship between robot actions and environmental changes, enabling robots to predict how their actions will alter the physical world's state, rather than merely relying on the temporal correlation of visual frames to generate future scenes.
At the Zhiyuan Conference exhibition area, Xingyuan Intelligence used a randomly scrambled Klotski puzzle board to demonstrate this capability intuitively.

Figure: Xingyuan Intelligence Booth
According to Xingyuan Intelligence staff, the robot restoring the Klotski puzzle board doesn't just recognize the pieces; it first 'imagines' the global changes after moving the sliders in its mind before executing the actions, optimizing the action plan based on the deduction results, and ultimately completing the restoration.
This is not a preset program but real-time reasoning by the model about physical constraints and causal relationships. In Sun Zhenguo's view, co-CEO of Xingyuan Intelligence and head of the Embodied Interaction World Model Research Center at the Zhiyuan Research Institute, 'world models should not just predict the future during training but should genuinely participate in action generation.'
He further elaborated using the Klotski puzzle as an example. Although it appears to be a logical reasoning problem, for robots, it is closer to a continuous decision-making process: each move alters the current situation and affects the feasibility of subsequent paths. Robots need to understand not only 'what they see now' but also 'what will happen next if they move this way?'—this is precisely the question that Xingyuan Intelligence's Embodied Interaction World Model ω-EVA attempts to answer.

Figure: Sun Zhenguo, Co-founder of Xingyuan Intelligence
For embodied AI companies, real-world tasks do not neatly fit into fixed time windows; they resemble a series of naturally connected events—reaching out, touching, grasping, moving, and putting down—with each key change corresponding to a natural joint in the actions.
Currently, most world models are still confined to 'offline prediction' or 'environmental simulation,' serving as auxiliary functions. When performing tasks, robots often passively execute instructions and cannot predict the chain reactions of their actions, leading to insufficient operational stability in complex physical environments.
This is precisely the problem Xingyuan Intelligence aims to solve.
In terms of technical routes, Xingyuan Intelligence adopts a collaborative 'embodied cerebrum and cerebellum' scheme in its technical architecture, aligning with the 'fast-slow system' approach of embodied AI stars like 1X and Figure AI.
Current world models include future video generation routes and latent space representation learning routes such as JEPA. Xingyuan Intelligence's ω-EVA also adopts latent space modeling but further emphasizes real-time interaction between action candidates and imagined consequences, allowing future prediction results to directly participate in action correction.
To a certain extent, this edge deployment addresses the pain point of embodied AI—high latency. Liu Dong, CEO of Xingyuan Intelligence, made an analogy: 'If a robot carries more than a dozen sensors and transmits several gigabytes of data per second to the cloud, by the time the instructions return a few seconds later, the robot may have already collided with an obstacle.'
He said, 'This is also the advantage of ω-EVA. It redefines the role of world models in robot systems, allowing prediction results to genuinely feed back into action generation. This can compensate for the technical shortcomings in the field of embodied AI and will also empower various types of robots in industrial operations, services, healthcare, and other categories, accelerating the large-scale deployment of general-purpose embodied AI technology.'
02. Opportunities and Challenges
Despite being a newcomer to the field, having entered only last year and founded just ten months ago, Xingyuan Intelligence, an embodied brain business startup incubated by the Zhiyuan Research Institute, has raised a total of 1 billion yuan in funding through a full-stack 'software-hardware integration and edge deployment' route and has made early strides in commercialization by securing orders.
In terms of business strategy, Xingyuan Intelligence has a clear vision, explicitly stating that it does not manufacture robot bodies but serves as a 'shovel seller,' providing software-hardware integrated embodied brain solutions. Founder and CEO Liu Dong revealed that China's manufacturing industry does not lack hardware capabilities; what is truly scarce is a 'brain' that can adapt across different bodies and scenarios.
As embodied AI transitions from single-point actions to complex task execution, the role of the 'brain' becomes increasingly vital, with world models emerging as a key capability within industry consensus.

Figure: Xingyuan Intelligence Robot Serving Coffee On-Site
Despite securing a place in the world model competition, the challenges faced by Xingyuan Intelligence and the entire industry cannot be overlooked.
On the one hand, the 'route debate' over technical paths persists.
Although ω-EVA has demonstrated the possibility of an 'interaction closed loop,' currently, existing routes represented by VLA (Visual-Language-Action) models still dominate the mainstream market.
Wang Zhongyuan previously admitted, 'VLA is the present; world models are the future.' In this regard, Xingyuan Intelligence does not deny the importance of language understanding but remains grounded in the VLM model. Although VLA models have achieved good deployment results in specific scenarios such as industrial sorting and service robots, their limitations are evident, including poor generalization, lack of physical common sense, and insufficient active exploration capabilities.
For Xingyuan Intelligence, the ambition of ω-EVA lies in bridging the gap between 'prediction' and 'action.' How to make the market accept this hybrid paradigm of 'VLA + world model' still requires educational efforts.
On the other hand, there is the 'scarcity dilemma' of real physical data.
Building a true world model requires full-modal data covering force, touch, depth, and 3D point clouds. Not only is it challenging to collect data from different scenarios, but it is even more difficult to capture 'effective data' from this data.
Sun Zhenguo pointed out that VLA relies on high-quality successful trajectory data, with 8 hours of collection yielding only 3 hours of effective data; while world models can utilize failed trajectories, increasing efficiency to 6-7 hours. However, this still does not fundamentally solve the scarcity of long-tail scenario data.
In addition, large-scale commercial deployment still requires exploration.
Although Xingyuan Intelligence has secured more than 70% of the market share among leading robot body manufacturers and claims to have raised over 1 billion yuan in funding, Liu Dong still calmly compares the current state of embodied AI to 'autonomous driving in 2015 and 2016'—everyone is aiming for L4 and L5, but true L2 has not yet been mass-deployed.

Figure: Robots on the Assembly Line
For Xingyuan Intelligence and the entire industry, securing a place is just the beginning of the journey.
Especially at present, world model technology is still in its early stages of development, and the industry has not yet reached a complete consensus. 'Technological innovation precedes products and systems. We now need to explore various technical paths to promote the breakthrough of world models,' Wang Zhongyuan admitted. 'But ultimately, a system or product for a specific scenario is needed to prove that the technical goals we repeatedly emphasize today, such as physical verifiability, long-term sequencing, and causal logic inference, can truly be applied in various scenarios.'
From the laboratory to the real world, Xingyuan Intelligence has demonstrated a complete set of system capabilities running through perception, understanding, decision-making, execution, and feedback through its embodied brain and world model, showcasing its ability to transition from model capabilities to scenario execution.
However, ultimately entering end-user devices, systems, and real tasks, the capabilities of world models still need to be tested by the market. After all, for the current embodied AI industry, the competition focus is not just on model development and technological breakthroughs; stable and continuously deliverable deployment capabilities in real-world scenarios are the key points.