12/01 2025
Google DeepMind has announced SIMA 2, which turns the agent from a simple instruction follower into an interactive gaming companion. The change is driven by the capabilities of the Gemini model.
SIMA 2 still carries out natural-language instructions inside virtual worlds, but it can now also reason about its goals, converse with users, and improve its own skills over time.
According to DeepMind, this is a significant step toward artificial general intelligence (AGI), with important implications for robotics and for the development of embodied AI.
The first version of SIMA learned more than 600 language-following skills across a range of commercial video games. Like a human player, it perceived these environments only through the images on the screen and acted on them through a virtual keyboard and mouse.
With a Gemini model at its core, SIMA 2 goes further: it does not just respond to instructions but reasons about them.
To train SIMA 2, the researchers used a mix of human demonstration videos annotated with language labels and additional labels generated by Gemini. As a result, SIMA 2 can tell users what it intends to do and explain the steps it is taking to reach its goals.
SIMA 2 can also transfer concepts it has learned from one setting to another, a step toward the kind of broad generalization that characterizes human cognition. This ability helps it perform tasks much closer to the level of human players than its predecessor did.
One of SIMA 2's most notable new features is self-improvement. During training, it takes on progressively harder tasks through trial and error, guided by feedback from Gemini.
After an initial phase of learning from human demonstrations, SIMA 2 can move on to new games and learn entirely through its own play. This lets it improve its skills in previously unseen virtual worlds without any additional human-generated data. The experience it gathers this way can then be used to train the next, more capable generation of agents.
SIMA 2 is designed to work across many different game environments, which make valuable testbeds for general intelligence: they let an agent acquire new skills, practice complex reasoning, and keep learning through self-directed play.
The research also points to clear limitations. Agents still struggle with very long, complex tasks that require extended multi-step reasoning and checking whether a goal has actually been achieved. SIMA 2 in particular works with a limited context window, a trade-off made to keep its interactions low-latency. Precise low-level control through keyboard and mouse, and robust visual understanding of complex 3D scenes, also remain open challenges for the field as a whole.
SIMA 2's results suggest that an agent trained extensively on diverse data from many virtual worlds, and built on Gemini's reasoning capabilities, can combine the abilities of many specialized systems into a single general-purpose agent.
SIMA 2 is also relevant to robotics. Skills such as navigation, tool use, and collaborative task execution are fundamental building blocks for future AI assistants that act intelligently in the physical world.
References:
https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/