Google DeepMind Unveils Gemini Robotics 1.5: Ushering AI Agents into the Real World

November 17, 2025

Google DeepMind recently announced two new robotics models, Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, marking a significant step forward in robotic intelligence and adaptability.

Together, the two models bring agentic capabilities into the physical world, dividing the work as follows:

Gemini Robotics 1.5 – This vision-language-action (VLA) model converts visual input and instructions into motor commands for a robot. Before executing, it thinks: it generates an internal reasoning sequence that helps the robot assess and complete complex tasks more accurately. It also supports learning across embodiments, so skills acquired on one robot transfer to others, speeding up the acquisition of new behaviors.

Gemini Robotics-ER 1.5 – This vision-language model (VLM) specializes in reasoning about the physical world, natively calling digital tools, and devising detailed multi-step plans to complete a task. It currently achieves state-of-the-art results on spatial-understanding benchmarks.
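
For developers, Gemini Robotics-ER 1.5 is reachable through the standard Gemini API. Below is a minimal sketch of a spatial-reasoning query, assuming the google-genai Python SDK and the preview model name used on the developer blog ("gemini-robotics-er-1.5-preview"); the image file and prompt are illustrative.

```python
# Minimal sketch: query Gemini Robotics-ER 1.5 with an image plus a question.
# The model name is the preview identifier from the developer blog; verify it
# against the current Gemini API docs.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

with open("workbench.jpg", "rb") as f:  # illustrative local image
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Which objects on the bench are needed to assemble the shelf, "
        "and in what order should they be picked up?",
    ],
)
print(response.text)
```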

Most everyday tasks require contextual understanding and multiple steps to complete, which remains a substantial challenge for today's robots.

Earlier this year, Google brought Gemini's multimodal understanding to robotics, enabling robots to perceive, plan, and carry out complex tasks. The newly launched models extend that capability further.

Gemini Robotics-ER 1.5 acts as the system's high-level brain, orchestrating a robot's activities. It excels at planning and logical decision-making in physical environments, offers strong spatial understanding, supports natural-language interaction, estimates a robot's probability of success and its task progress, and can natively call tools such as Google Search to look up information or invoke any third-party user-defined function.
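
As a hedged illustration of that tool calling, the sketch below passes a user-defined Python function to the Gemini API, which the google-genai SDK supports through automatic function calling. The get_gripper_state helper and the preview model name are assumptions for illustration, not part of the announcement.

```python
# Sketch: let the model call a user-defined function while planning.
from google import genai
from google.genai import types

def get_gripper_state(robot_id: str) -> dict:
    """Hypothetical user-defined function the model may call mid-plan."""
    return {"robot_id": robot_id, "gripper": "open", "payload_grams": 0}

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed preview model name
    contents="Check whether robot arm-02 is holding anything before "
             "planning the next pick.",
    config=types.GenerateContentConfig(
        # The SDK wraps Python callables as function declarations and
        # executes them automatically when the model requests them.
        tools=[get_gripper_state],
    ),
)
print(response.text)
```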

Gemini Robotics-ER 1.5 then issues detailed natural-language instructions for each step of a task, and Gemini Robotics 1.5 uses its visual and language understanding to execute the corresponding actions directly. Gemini Robotics 1.5 also helps the robot reason about its own actions, which improves its handling of semantically complex tasks, and it can explain its thinking process in natural language, making its decisions more transparent.
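
That division of labor might look like the loop below. This is purely illustrative: Gemini Robotics-ER 1.5 is prompted for the next step in natural language, while the hand-off to Gemini Robotics 1.5 is a stand-in, since the VLA model is not exposed through a public API; every helper name here is hypothetical.

```python
# Illustrative orchestrator/executor loop: ER 1.5 plans, the VLA executes.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

def execute_with_vla(step: str) -> bool:
    """Placeholder for handing a step to the on-robot VLA model."""
    print(f"[VLA] executing: {step}")
    return True  # pretend the step succeeded

task = "Sort the laundry on the table into lights and darks."
history: list[str] = []

for _ in range(10):  # cap the number of planning rounds
    prompt = (
        f"Task: {task}\n"
        f"Completed steps: {history or 'none'}\n"
        "Reply with only the next short instruction for the robot, "
        "or DONE if the task is finished."
    )
    step = client.models.generate_content(
        model="gemini-robotics-er-1.5-preview",  # assumed preview name
        contents=prompt,
    ).text.strip()
    if step == "DONE":
        break
    if execute_with_vla(step):
        history.append(step)
```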

Both Gemini Robotics 1.5 and Gemini Robotics-ER 1.5 are built on the Gemini family of foundation models and fine-tuned on different datasets to specialize in their respective roles. Used together, they improve robots' ability to generalize, handle longer tasks, and adapt to more varied environments.

The team rigorously evaluated Gemini Robotics-ER 1.5 across 15 academic benchmarks, including Embodied Reasoning Question Answering (ERQA) and Point-Bench, assessing the model's proficiency in pointing, image question answering, and video question answering.

The results show that Gemini Robotics-ER 1.5 achieved state-of-the-art performance across these 15 academic embodied-reasoning benchmarks, outperforming GPT-5 and GPT-5 mini.

Gemini Robotics-ER 1.5 showcases capabilities such as object detection and state estimation, segmentation masking, pointing, trajectory prediction, and task progress estimation and success detection.
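
Pointing, for example, can be exercised directly through the API by asking for 2D points in a structured format. The JSON convention below, with [y, x] coordinates normalized to 0-1000, follows Google's published spatial-understanding examples; treat it as an assumption and validate against what the model actually returns.

```python
# Sketch: ask the model to point at named objects in an image.
import json
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

with open("kitchen.jpg", "rb") as f:  # illustrative local image
    image = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed preview name
    contents=[
        image,
        'Point to the mug and the sponge. Answer as JSON: '
        '[{"point": [y, x], "label": <name>}] with coordinates '
        "normalized to 0-1000.",
    ],
)
# In practice the reply may be wrapped in markdown fences; strip as needed.
points = json.loads(response.text)
print(points)
```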

Gemini Robotics 1.5 does more than translate instructions or plans into motion: it thinks before acting. It generates internal sequences of reasoning and analysis in natural language in order to carry out tasks that require multiple steps or deeper semantic understanding.

During this thinking process, the vision-language-action model can choose to break a longer task into simpler, shorter segments that the robot can execute reliably. Thinking also improves the model's ability to generalize and to solve novel tasks, making it more robust to changes in its environment.
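
Gemini Robotics 1.5's on-robot thinking is not something developers can invoke directly, but the announcement notes that Gemini Robotics-ER 1.5 exposes a flexible thinking budget, trading deliberation depth for latency. A sketch follows, assuming the google-genai ThinkingConfig applies to this preview model as it does to Gemini 2.5 models.

```python
# Sketch: cap the model's deliberation via a thinking budget.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed preview name
    contents="Plan the steps to clear the desk, starting with fragile items.",
    config=types.GenerateContentConfig(
        # Larger budgets allow deeper multi-step reasoning; smaller budgets
        # favor fast responses for latency-sensitive control loops.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)
print(response.text)
```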

Gemini Robotics 1.5 also demonstrates strong cross-embodiment transfer. Motions learned on one robot can be carried over to another without specializing the model for each new embodiment, which accelerates the learning of new behaviors and helps make robots smarter and more useful.

With these agentic capabilities, Gemini Robotics 1.5 moves past models that merely respond to commands, toward systems that can genuinely reason, plan, actively use tools, and generalize.

Gemini Robotics-ER 1.5 is available to developers via the Gemini API in Google AI Studio, while Gemini Robotics 1.5 is currently available to select partners. For more on building with the next generation of physical agents, see the developer blog:

https://developers.googleblog.com/en/building-the-next-generation-of-physical-agents-with-gemini-robotics-er-15/

Reference: https://deepmind.google/discover/blog/gemini-robotics-15-brings-ai-agents-into-the-physical-world/
