11/19 2025
The notion of creating a world with a single sentence is evolving from a digital metaphor into a tangible physical reality. When AI can not only answer queries like “how to make a cup of coffee” but also directly operate machines to produce a freshly brewed, perfectly tempered cup, we are on the cusp of a new era.
This week, AI vendors have been showing off their capabilities in quick succession. Alibaba's “QianWen” app is officially positioned as a “personal AI assistant capable of chatting and handling tasks”. Ant Group's “LingGuang” focuses on “generating small applications within 30 seconds using natural language”. Yesterday, Google's Gemini 3 sent shockwaves through the industry with its overwhelming multimodal and agent capabilities.
These three products all converge on the same core point: the competition in AI is shifting from the “art of conversation” to the “ability to get things done”. It is moving from manipulating the bits of the digital world to intervening in the atoms of the physical world.
Three-Level Leap in AI Execution
“Help me book an economy-class ticket for the earliest flight to Beijing next Monday and pay with the corporate account.” Tasks that once required opening an app, clicking through multiple screens, and entering data by hand are now becoming routine for AI to execute on command.
Alibaba regards the “QianWen” project as the “future battle of the AI era”, with ambitions far beyond those of a mere chatbot. According to “Emerging Intelligence”, the QianWen app plans to integrate deeply with all aspects of life, including maps, food delivery, ticket booking, office work, learning, shopping, health, and more. Its core evolution lies in “Agentic AI”, a new paradigm capable of understanding intentions, planning steps, and autonomously executing tasks.

“Alibaba plans to gradually add Agentic AI functions to QianWen over the next few months, supporting natural-language shopping on platforms like Taobao and Tmall,” a person close to Alibaba revealed. This means that user instructions will shift from “show me down jackets” to “buy my family a long down jacket suitable for minus ten degrees Celsius, lightweight and warm, within a budget of 1,500 yuan”. AI will then automatically complete the entire process of searching, filtering, comparing prices, selecting sizes, and placing the order.
Ant Group's “LingGuang” demonstrates another, more lightweight approach to interacting with the physical world. Its “Flash Apps” feature allows users to generate small interactive applications within 30 seconds using natural language. One user was amazed, saying, “When I asked LingGuang to create a 'life timer', the app it generated was not only visually appealing but also allowed me to intuitively grasp the time I've lived and the time remaining. This sense of awe came from transforming an abstract concept into a perceivable, interactive experience.”
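The search, filter, compare, and order steps described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Product` and `Intent` types, the `CATALOG` data, and the `plan_and_execute` function are invented for illustration and do not reflect QianWen's or Taobao's actual interfaces. It also assumes the natural-language request has already been parsed into structured constraints.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price_yuan: float
    min_temp_c: int   # lowest temperature the jacket is rated for
    length: str       # "long" or "short"
    sizes: tuple      # sizes currently in stock


@dataclass
class Intent:
    """Structured form of the user's request (assumed already parsed)."""
    max_price_yuan: float
    target_temp_c: int
    length: str
    size: str


# Hypothetical catalog standing in for a real product search result.
CATALOG = [
    Product("Alpine Long Parka", 1399, -15, "long", ("S", "M", "L")),
    Product("City Short Puffer", 699, -5, "short", ("M", "L")),
    Product("Summit Long Coat", 1899, -20, "long", ("M",)),
    Product("Budget Long Jacket", 999, -5, "long", ("M", "L")),
]


def plan_and_execute(intent, catalog):
    # Steps 1-2: search and filter by every constraint in the request.
    candidates = [
        p for p in catalog
        if p.length == intent.length
        and p.min_temp_c <= intent.target_temp_c   # rated at least as cold
        and p.price_yuan <= intent.max_price_yuan
        and intent.size in p.sizes
    ]
    if not candidates:
        return None  # nothing matches; a real agent would ask the user
    # Step 3: compare prices and keep the cheapest match.
    best = min(candidates, key=lambda p: p.price_yuan)
    # Step 4: "place" the order (simulated; no real payment happens here).
    return {"item": best.name, "size": intent.size, "total_yuan": best.price_yuan}


order = plan_and_execute(Intent(1500, -10, "long", "M"), CATALOG)
print(order)
```

In a real agent, the hard parts are the ones this sketch skips: turning free-form language into the `Intent` constraints, and asking the user for confirmation before money changes hands.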

Google's release of Gemini 3 has taken this execution capability to new heights. Its powerful screen-understanding ability (with a ScreenSpot-Pro score of 72.7%) represents a key technological breakthrough. It means that AI can now “understand” any software interface and operate it like a human, without relying on dedicated APIs. From operating complex professional software to guiding you through phone settings, Gemini 3 showcases its potential as a “universal operator”.

These three products clearly outline the leap in AI execution: from passive Q&A to proactive task planning, from information integration to physical transactions, and from virtual assistants to operators of the physical world.
Key Breakthroughs in Multimodal Understanding and Tool Invocation
AI's ability to step out of virtual chat boxes is rooted in technological breakthroughs in its “eyes” and “hands”, namely, multimodal understanding and tool invocation capabilities.
Google's Gemini 3 demonstrates overwhelming advantages across multiple benchmarks. It scored 37.5% on “Humanity's Last Exam”, which covers professional and academic fields, and achieved 23.4% on the demanding MathArena Apex test, significantly outperforming other models. More crucially, its exceptional performance on device-operation benchmarks lays the foundation for AI to operate physical devices like phones and computers.

Alibaba's Qwen series models also provide a solid technological foundation. Qwen3-Max scored 69.6 in the SWE-Bench evaluation, which tests coding abilities, and achieved a breakthrough 74.8 in the Tau2-Bench test, which requires deep reasoning, surpassing top international models like Claude Opus 4. Notably, Qwen3-Max won the inaugural real-money investment competition for large AI models with a 22.32% return. This suggests that AI can not only handle structured tasks but also make effective decisions in complex, uncertain environments.
Breakthroughs in visual capabilities represent the “final stride” in connecting to the physical world. The “image search” function of the QianWen app can directly identify real-world items and jump to shopping pages, while LingGuang's “LingGuang Vision” can recognize objects in real time through the camera and provide relevant information. When AI can “see” the world as we do, it can truly understand instructions like “grab that book on the table for me”.
“In the past, AI was 'blind', relying solely on our descriptions to understand the world. Now, it has gained 'vision' and can link what it sees with knowledge graphs,” noted an industry analyst. “Multimodal understanding has brought AI down from the 'text universe' to the 'physical earth'.”
From Single App to Gateway for All Aspects of Life
Technology sets the lower bound, while the ecosystem determines the upper bound. Whether AI applications can truly integrate into the physical world largely depends on the breadth and depth of their underlying ecosystems.
Alibaba is advancing its AI strategy in a more coordinated manner. “The AI technological revolution allows Alibaba's diverse products and services to generate greater synergies,” an internal source noted. The QianWen APP aims to become a “super intelligent hub” connecting Alibaba's ecosystem, including Taobao, Tmall, Alipay, Gaode Maps, Ele.me, and Fliggy. In the future, users might simply tell QianWen, “I want to visit Shanghai Disneyland this weekend,” and it will automatically plan the itinerary, book flights, hotels, and tickets, creating a seamless experience loop.
Ant Group's LingGuang is rooted in Alipay's rich financial and local life scenarios. Although it hasn't fully integrated Alipay's core APIs yet, the potential is immense. Once it does, users could complete complex financial operations like transfers, wealth management, bill payments, and credit loans through simple natural-language dialogues, compressing multi-step processes into a single conversation.
Google leverages its global product matrix to provide Gemini with a broader testing ground. From Search to Gmail, from Google Calendar to Maps, Gemini's agent capabilities can permeate every aspect of users' digital lives and indirectly influence physical-world behaviors. For example, it can automatically add flight itineraries from emails to the calendar and remind users of departure times.
This competition in ecosystem integration is essentially a contest of “scenario density”. Whoever provides AI with richer, higher-frequency physical-world interaction scenarios will enable their AI to learn and evolve faster. One heavy user who compared the products in depth noted, “QianWen clearly understands Chinese users better in shopping and local life scenarios, while Gemini excels in handling complex information and global task planning.”
As more life scenarios are integrated, AI will no longer be just an assistant answering questions but a true partner capable of handling practical matters. The challenge facing all players is the same: how to ensure that AI better understands and serves this complex, uncertain physical world while maintaining safety and reliability. The curtain has just risen on this competition, and its victor may well define the human-computer interaction standards and user experience paradigms for the next decade.