AI Phones Hit a Pivotal Moment: Between Doubao and Qianwen, Gemini Chooses a Unique Approach

03/17 2026

The Gemini phone has officially made its debut.

At Samsung's Galaxy S26 launch event late last month, Samsung and Google jointly announced Gemini-powered Screen Automation for the Galaxy S26.

In essence, Gemini can operate apps directly on the phone screen: it opens apps, recognizes on-screen elements, taps, swipes, and enters text, carrying out a series of UI actions before handing the final confirmation step to the user.

Image Source: Samsung

Yes, this sounds much like the Doubao AI assistant on the Nubia M153 (affectionately known as the "Doubao Phone" in the tech community). Both can carry out "agent" tasks on the user's behalf, such as ordering food, booking rides, or shopping online, from a single voice command.

Judging by the feedback from overseas media and forums, this feature has finally been introduced in the latest beta update.

However, we've also observed that Google hasn't simply replicated Doubao's approach. While both rely on GUI-based agents for technical implementation, Gemini creates a local virtual sandbox within Android and limits the first batch of apps it is allowed to operate to a select few.

This approach clearly sets Google apart from Chinese manufacturers. In fact, compared with ByteDance's Doubao and Alibaba's Qianwen, Google has chosen a path that appears both bold and cautious.

Empowering an AI Operating System, Not Overhauling the Phone

At first glance, Gemini's "Screen Automation" might seem like just another iteration of the "Doubao AI assistant." It can also order food, book rides, and place orders for you, resembling an AI agent that manages your phone.

But delving deeper reveals a fundamental difference in Google's solution.

The underlying logic of Doubao is straightforward: AI interprets screen pixels, identifies buttons and input fields like the human eye, and simulates finger taps. Its primary advantage is universality—theoretically, it can operate any app since the AI only interacts with the screen.

Gemini, however, takes a more "cautious" stance. When performing tasks, Gemini doesn't directly manipulate apps on your home screen. Instead, it launches a local virtual sandbox within Android, enabling the AI to run the target app in this isolated environment.

The entire process is transparent, and users can terminate tasks at any time or take over operations mid-task.

Essentially, Gemini's "Screen Automation" is positioned not as an all-powerful agent that can freely control your phone but as an automation feature tightly constrained by the system.

Google is also deliberately limiting which apps support automation in the initial phase. For now, the supported categories are mainly ride-hailing, food delivery, and dining, and the supported apps are Lyft, Uber, Grubhub, DoorDash, Uber Eats, and Starbucks.

User access is restricted as well. For now the feature is available only in beta on the Samsung Galaxy S26 series, and Google plans to extend support to the Pixel 10 series. Usage is also capped by subscription tier: free Gemini users get 5 uses per day, Plus members 12, Pro members 20, and Ultra members 120.
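The tiered limits above amount to a per-day quota table. Here is a minimal sketch of how such a cap could be tracked, assuming a simple in-memory counter. The tier numbers come from the figures quoted above; `DAILY_LIMITS`, `QuotaTracker`, and its methods are hypothetical names used for illustration, not Google's implementation.

```python
from collections import defaultdict
from datetime import date

# Daily Screen Automation limits per Gemini tier, as quoted in the article.
DAILY_LIMITS = {"free": 5, "plus": 12, "pro": 20, "ultra": 120}


class QuotaTracker:
    """Hypothetical per-user daily counter (illustrative only)."""

    def __init__(self):
        # (user_id, day) -> number of automation runs used that day
        self._used = defaultdict(int)

    def try_consume(self, user_id: str, tier: str, day: date) -> bool:
        """Record one run and return True if the user still has quota."""
        key = (user_id, day)
        if self._used[key] >= DAILY_LIMITS[tier]:
            return False
        self._used[key] += 1
        return True

    def remaining(self, user_id: str, tier: str, day: date) -> int:
        """Runs left for this user on the given day."""
        return DAILY_LIMITS[tier] - self._used[(user_id, day)]
```

Keying the counter by calendar day means the allowance resets automatically when the date changes, with no separate reset job.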

This reflects both computational limitations and user concerns about AI "meddling with their phones," particularly in Western markets. Thus, Google has implemented permission isolation, mandatory manual confirmation for critical steps, and real-time AI operation interruption.
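Two of these safeguards, manual confirmation for critical steps and real-time interruption, can be pictured as checks inside the agent's execution loop. The sketch below is only an assumption about the general shape of such a loop: `Step`, `run_task`, and the `confirm` and `interrupted` callbacks are hypothetical names, and permission isolation is not modeled at all.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    description: str
    critical: bool = False  # e.g. payment or order submission


def run_task(
    steps: List[Step],
    confirm: Callable[[Step], bool],
    interrupted: Callable[[], bool],
) -> List[str]:
    """Execute steps in order and return a log of what happened.

    `confirm` stands in for the user's manual approval and `interrupted`
    for the user's stop button; both are hypothetical callbacks.
    """
    log = []
    for step in steps:
        # The user can stop the task before any step runs.
        if interrupted():
            log.append("stopped by user")
            break
        # Critical steps run only after explicit approval.
        if step.critical and not confirm(step):
            log.append(f"declined: {step.description}")
            break
        log.append(f"done: {step.description}")
    return log
```

With steps such as "open app", "add pizza to cart", and a critical "submit order", the first two run unattended while the last waits on `confirm`.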

But ultimately, this is just a transitional phase. Google's ambitions extend far beyond enabling Gemini to operate a few specific apps.

Many have noticed Gemini's GUI capabilities but overlooked a system-level transformation occurring within Android.

Just before the Samsung Galaxy S26 series launch, Google officially published a blog post titled "Intelligent Operating Systems: Enhancing AI Agents for Android Apps" and introduced a new application capability interface system—AppFunctions. This allows apps to proactively declare their AI-callable functions to the system.

For instance, a food delivery app can inform the system that it supports restaurant search, item addition, and order submission. When a user instructs Gemini, "Order me a pizza," the AI doesn't necessarily need to navigate through the interface step-by-step—it can directly invoke these capabilities to complete the task.
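To make the idea concrete, here is a toy model of apps declaring callable functions to a system-side registry. It is a sketch of the concept only and does not use or mirror the real AppFunctions API: the `registry`, the `declare` decorator, `invoke`, and the `food_app` functions are all hypothetical.

```python
from typing import Callable, Dict, Tuple

# (app name, function name) -> callable. A toy stand-in for the system-side
# index of functions that apps have declared as AI-callable.
Registry = Dict[Tuple[str, str], Callable[..., str]]
registry: Registry = {}


def declare(app: str, name: str):
    """Decorator an app would use to expose a function (hypothetical)."""
    def wrap(fn):
        registry[(app, name)] = fn
        return fn
    return wrap


@declare("food_app", "search_restaurants")
def search_restaurants(query: str) -> str:
    return f"restaurants matching '{query}'"


@declare("food_app", "add_item")
def add_item(item: str) -> str:
    return f"added {item} to cart"


@declare("food_app", "submit_order")
def submit_order() -> str:
    return "order submitted"


def invoke(app: str, name: str, **kwargs) -> str:
    """What the assistant does instead of tapping through the UI."""
    return registry[(app, name)](**kwargs)
```

A request like "Order me a pizza" would then reduce to calls such as `invoke("food_app", "add_item", item="pizza")` followed by `invoke("food_app", "submit_order")`, with no screen navigation involved.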

If we view this mechanism as AI "function calling," the picture becomes clearer. In Google's design, AI agents have two paths to execute tasks: one is directly invoking app capabilities through system interfaces, and the other is GUI automation via screen recognition.

The former offers greater efficiency and stability; the latter ensures compatibility with apps that haven't adapted to the new interfaces.

This means Gemini's future device automation capabilities won't rely solely on "AI reading the screen to operate the phone" but will adopt a hybrid architecture combining system APIs and GUI.
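The hybrid design boils down to a dispatch rule: use a declared system interface when one exists, otherwise fall back to operating the screen. A minimal sketch under that assumption follows; `API_CAPABILITIES`, `gui_automation`, and `execute` are hypothetical names, and both paths are stubbed with strings.

```python
from typing import Callable, Dict, Optional

# Hypothetical table of capabilities that apps expose through system interfaces.
API_CAPABILITIES: Dict[str, Callable[[str], str]] = {
    "order_food": lambda request: f"API call: {request}",
}


def gui_automation(request: str) -> str:
    """Placeholder for the screen-recognition path."""
    return f"GUI automation: {request}"


def execute(capability: str, request: str) -> str:
    """Prefer the system interface; fall back to GUI automation."""
    handler: Optional[Callable[[str], str]] = API_CAPABILITIES.get(capability)
    if handler is not None:
        return handler(request)
    return gui_automation(request)
```

Here `execute("order_food", "pizza")` takes the API path, while a capability no app has declared, such as `execute("book_ride", "airport")`, drops to the GUI path.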

AppFunctions application example. Image Source: Leitech

This difference may seem technical, but the underlying product logic is straightforward. Where Doubao has AI use the phone the way a human does, Google wants AI to coordinate apps the way an operating system does.

When AI merely interprets screen pixels, it remains external to the system, only mimicking human operation logic. But once integrated into the OS, AI can directly orchestrate capabilities across apps.

From this perspective, Gemini Screen Automation's true objective may not be ordering food or booking rides. What Google truly aims to build is a new Android operating logic and ecosystem. This also explains, to some extent, why Google is collaborating with Qualcomm to promote "Android PCs" (not Chromebooks).

It also clarifies why Gemini's approach seems both bold and cautious.

The bold aspect lies in its attempt to make AI Android's central scheduling hub; the cautious aspect is that Google doesn't plan to let AI freely take over the entire phone but instead advances this change incrementally through system interfaces, permission controls, and app whitelists.

Compared to the vision of a "universal AI agent," this route is slower and more restrained. But for an operating system with billions of devices, Google may not have much room for bold trial and error.

Doubao Goes Left, Qianwen Goes Right, Gemini Takes the Middle Path

Compared with Google's mobile strategy, the Doubao AI assistant, unveiled late last year, chose the simplest and boldest path: making AI use the phone like a human.

In this setup, AI interprets screen pixels, identifies buttons, input fields, and page structures, then simulates finger taps to complete operations. Whether ordering food, comparing prices, or making payments, the AI executes step-by-step on the phone interface.
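This step-by-step behavior is essentially an observe, choose, tap loop. The sketch below simulates it on a fake screen graph; it assumes nothing about Doubao's actual implementation, and `SCREENS`, `observe`, `choose`, and `run` are hypothetical stand-ins (a real agent would use a vision model where `observe` returns a hard-coded list).

```python
from typing import Dict, List, Optional

# A fake "screen": each screen name maps element labels to the next screen.
SCREENS: Dict[str, Dict[str, str]] = {
    "home": {"Food app": "menu"},
    "menu": {"Pizza": "cart"},
    "cart": {"Place order": "done"},
}


def observe(screen: str) -> List[str]:
    """Stand-in for the vision model: list the tappable labels on screen."""
    return list(SCREENS.get(screen, {}))


def choose(labels: List[str], plan: List[str]) -> Optional[str]:
    """Pick the first visible label that the plan still needs."""
    for label in labels:
        if label in plan:
            return label
    return None


def run(plan: List[str], start: str = "home", max_steps: int = 10) -> List[str]:
    """Observe, choose, tap; repeat until nothing in the plan is visible."""
    screen, taps = start, []
    remaining = list(plan)
    for _ in range(max_steps):
        target = choose(observe(screen), remaining)
        if target is None:
            break
        taps.append(target)
        remaining.remove(target)
        screen = SCREENS[screen][target]  # simulated tap
    return taps
```

Because the loop only needs whatever `observe` reports, it works on any screen without cooperation from the app, which is exactly the universality described here.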

This approach's primary advantage is universality. Since the AI only interacts with the screen, it doesn't require any app's interface support or platform authorization. Theoretically, it can perform any operation a human can.

This is why many feel Doubao resembles a "true AI phone" upon first use.

Image Source: Doubao

But the problems are equally apparent. When AI can interpret the entire screen and operate all apps, permission and security issues become inevitable. Meanwhile, many internet platforms resist such automation since it bypasses their own entry points and recommendation systems.

In short, Doubao's route is technically direct but inherently conflicts with the app ecosystem.

In contrast, Alibaba's Qianwen takes a different approach, leveraging Alibaba's service ecosystem to make AI a central scheduling hub. In this system, a user's voice command is broken down into specific tasks, which then invoke services like Taobao, Alipay, Gaode, and Fliggy to complete them.

For example, searching for products, placing orders, or planning routes directly calls real business capabilities instead of simulating interface operations. Since all operations occur within the ecosystem, AI doesn't need to bypass app permissions or trigger platform risk controls, and direct service interface calls often yield higher efficiency.

Image Source: Leitech

But the limitation is equally clear: ecosystem boundaries. Qianwen can only schedule services within Alibaba's ecosystem, and once a user's request involves another platform, its capabilities drop off sharply.
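The boundary problem is easy to see if the scheduler is modeled as a lookup from task type to service. This is a sketch under that assumption, not Alibaba's architecture: the service names are those mentioned above, while the task labels, `ECOSYSTEM`, and `route` are hypothetical.

```python
from typing import Dict, List, Tuple

# Services named in the article, keyed by the kind of task each would handle.
# The task labels are illustrative.
ECOSYSTEM: Dict[str, str] = {
    "shop": "Taobao",
    "pay": "Alipay",
    "navigate": "Gaode",
    "travel": "Fliggy",
}


def route(tasks: List[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split tasks into (task, service) pairs and tasks nobody can serve."""
    routed, unsupported = [], []
    for task in tasks:
        service = ECOSYSTEM.get(task)
        if service is None:
            unsupported.append(task)
        else:
            routed.append((task, service))
    return routed, unsupported
```

A command that decomposes into `["shop", "pay"]` routes cleanly, but any task with no in-ecosystem service lands in the unsupported list, and the scheduler has nothing to call.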

From this perspective, Doubao and Qianwen represent two typical AI agent paths. The former seeks to let AI take over the phone itself, pursuing universality; the latter integrates ecosystems to let AI manage service processes, pursuing business depth.

Google's Gemini, to some extent, stands between them. At this stage, Gemini retains GUI automation capabilities, meaning it can also operate apps by recognizing interfaces like Doubao when necessary. However, Google has also introduced new application capability interfaces within Android, allowing apps to proactively expose AI-callable functions to the system.

If apps support these interfaces, Gemini doesn't need to navigate through interfaces step-by-step but can directly invoke app capabilities to complete tasks. In other words, Google's solution is a hybrid path:

System interfaces take precedence, with GUI automation as a fallback.

In the short term, this approach offers neither Doubao's wow factor nor Qianwen's ability to plug into a mature ecosystem quickly. Its advantage is that it avoids direct conflict with the app ecosystem while retaining sufficient universality.

Final Thoughts

Zooming out, it's not hard to understand why these three paths have diverged.

ByteDance has neither an operating system nor a local services ecosystem, so it must let AI take over the phone directly. Alibaba owns a vast service network, so it has AI schedule its own business services. Google, however, owns Android, an OS running on billions of devices.

Thus, Gemini's goal from the start wasn't to create a stronger phone assistant but to integrate AI into the system, transforming Android from an "app-running platform" into an "intelligent app-scheduling system." From this perspective, Gemini's restraint isn't merely caution but an inevitable choice for a platform-level company.


Source: Leitech

