When AI Tries to Take Over Your Phone: The Battle for 'Entry Point Definition' Between Doubao II and Competitors

04/14 2026

Yiyan Business Observer

The competitive focus of AI phones will escalate to the reconstruction of 'interaction.'

I. Introduction: From 'Overpriced Engineering Prototypes' to a Mass-Market Answer

In December 2025, the Nubia M153, running the 'Doubao Mobile Assistant Technology Preview Edition,' quietly launched. Despite modest specs and a limited run of 30,000 units, it sold out quickly and even fetched tens of thousands of yuan on the secondhand market, thanks to its 'magical' demonstration of AI operating the phone 'like a human' and automatically completing cross-app tasks such as ordering food and publishing content. This 'engineering prototype,' with its system-level AI agent capabilities, posed a sharp question to the industry: when AI no longer just enhances single functions but takes over entire operational workflows, does the very essence of a smartphone change?

Recently, ZTE confirmed during an earnings briefing that its second-generation Doubao AI phone, developed with ByteDance, will be released in the mid-to-late second quarter of 2026. The interaction paradigm that once sparked shock and controversy is about to face rigorous testing in the mass market. 2026 is widely seen as the breakthrough year in which AI phones move from concept to widespread adoption, with the industry entering a phase of deep restructuring. According to IDC, AI phone shipments in China are expected to reach 147 million units in 2026, accounting for 53% of the overall smartphone market. Doubao II aims to lead this 'AI-for-all' wave and define what 'true intelligence' means.

II. Analyzing Doubao's Approach: The Risks and Vision of Radical Generalism

The core logic of Doubao phones can be summarized as 'vision-driven perception, simulated interaction, and direct fulfillment of intent.' Unlike the 'feature plugin' model prevalent in mainstream AI phones (e.g., AI photo editing, AI summarization), Doubao's approach has a large model recognize UI elements from screen pixels, then simulate taps, text input, and other operations to chain together cross-app task flows. In the ideal case, a user simply says, 'Arrange a weekend trip to Hangzhou,' and the AI automatically checks flights in travel apps, compares prices across platforms, creates a calendar event, and even generates an itinerary.
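The perceive, plan, and act cycle described above can be illustrated with a toy sketch. It is not Doubao's implementation: the `FakePhone` class, the `UIElement` and `Action` types, and the label-matching planner are all hypothetical stand-ins, with a dictionary of screens playing the role of the vision model and a simple lookup playing the role of the large model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class UIElement:
    label: str  # text a vision model might read off the screen (assumed)
    x: int
    y: int


@dataclass
class Action:
    kind: str  # only "tap" is used in this sketch
    x: int
    y: int


def plan_next_action(steps: List[str], done: int,
                     elements: List[UIElement]) -> Optional[Action]:
    """Stand-in planner: tap the element whose label matches the next step."""
    if done >= len(steps):
        return None
    for el in elements:
        if el.label == steps[done]:
            return Action("tap", el.x, el.y)
    return None  # the wanted element is not on screen


def run_agent(steps: List[str],
              perceive: Callable[[], List[UIElement]],
              act: Callable[[Action], None],
              max_steps: int = 20) -> Tuple[bool, List[Action]]:
    """Loop: look at the screen, choose an action, simulate it."""
    done, log = 0, []
    for _ in range(max_steps):
        if done == len(steps):
            break
        action = plan_next_action(steps, done, perceive())
        if action is None:
            break  # stuck: the expected UI element never appeared
        act(action)
        log.append(action)
        done += 1
    return done == len(steps), log


class FakePhone:
    """Hypothetical three-screen phone used only to exercise the loop."""

    def __init__(self) -> None:
        self.screen = "home"
        self.screens = {
            "home": [UIElement("Food app", 10, 10)],
            "food": [UIElement("Order again", 20, 40)],
            "checkout": [UIElement("Confirm", 30, 90)],
            "done": [],
        }
        self.transitions = {
            ("home", (10, 10)): "food",
            ("food", (20, 40)): "checkout",
            ("checkout", (30, 90)): "done",
        }

    def perceive(self) -> List[UIElement]:
        return self.screens[self.screen]

    def act(self, action: Action) -> None:
        key = (self.screen, (action.x, action.y))
        self.screen = self.transitions.get(key, self.screen)


phone = FakePhone()
ok, log = run_agent(["Food app", "Order again", "Confirm"],
                    phone.perceive, phone.act)
```

Even in this toy form, the loop shows why the approach is general (it needs only pixels and taps, not app APIs) and why it is fragile: if a label changes or a popup covers the expected element, the planner returns nothing and the task stalls.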

The revolutionary advantage of this path lies in its 'ultimate generality.' It doesn't rely on app developers providing dedicated interfaces and can theoretically operate any Android app—making it the most thorough technical solution to break 'app silos' and enable seamless service flow. This contrasts sharply with Alibaba's QianWen, which emphasizes 'closed-loop ecosystems,' and Google's Gemini, which focuses on 'secure, controllable interactions.'

However, its disruptiveness comes with three major risks:

1. Ecosystem Conflict Risk: Direct simulation of operations encroaches on app platforms' control over interaction entry points and user data. The first-gen product quickly faced bans from mainstream apps, revealing fundamental commercial conflicts.

2. Experience Stability Challenges: Computer vision-based automation is highly susceptible to frequent UI changes, network delays, and popup interruptions, making task execution chains fragile. A vast engineering gap remains before it can be deemed 'reliable.'

3. Security and Trust Dilemmas: Granting AI equivalent user privileges introduces risks of accidental touches and privacy breaches during payments, authorizations, and other critical operations.

Thus, the mission of the second-gen product is to evolve from 'showcasing tech' to 'practical utility,' hinging on whether it can achieve a breakthrough balance across these three risks.

III. 2026 Industry Competition Landscape: A Tripartite Divide with Diverging Paths

When Doubao II enters the 2026 market, it won't face a blue ocean but a mature battleground dominated by giants with diverging strategies. The competition has transcended superficial hardware specs, delving into interaction philosophies and ecosystem strategies, forming three distinct camps:

1. Full-Stack Integration: Defining Underlying Rules

These players control the full stack, from chips and operating systems to AI frameworks. Google, leveraging its Android ecosystem dominance, designed a 'virtual sandbox' path for Gemini: AI runs apps in an isolated system environment, with operations visible and interruptible by users. This preserves automation potential while prioritizing security and controllability, aiming to establish open, standardized new interaction protocols as the system provider. Apple and Huawei, relying on closed ecosystems with integrated hardware and software, achieve the deepest, most efficient fusion of AI capabilities and system services. Their strength lies in holistic experience and optimization—they are the rule-setters.

2. Vertical Scenario Focus: Building Killer Features

Most mainstream Android vendors choose this path. They either develop large models in-house or partner closely with model providers, channeling AI capabilities into core scenarios such as photography enhancement, office productivity, and entertainment to create standout features that consumers can clearly perceive. For example, Samsung's Galaxy AI strengthens real-time call translation, while Xiaomi's HyperOS focuses on imaging intelligence. Their strategy is to maintain top-tier hardware performance while using AI to deliver significant 'value-added' experiences and compete for the mass-market base. This is the noisiest, most crowded track.

3. Ecosystem Empowerment and Focus: Reshaping Service Entry Points

This camp is not pursuing full-stack integration for now; instead, it reshapes service access through its own distinctive AI capabilities.

Alibaba (QianWen) takes an 'ecosystem aggregation' route. Its large model is deeply integrated into proprietary super-apps like Taobao, Alipay, Gaode, and Fliggy, acting as an efficient internal service orchestration hub. User instructions are decomposed and directly invoke APIs across business lines, ensuring smooth, stable, and compliant experiences. Its strength lies in high efficiency within commercial closed loops, but its capabilities are clearly bounded by its proprietary ecosystem.
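By way of contrast with screen simulation, the API-orchestration pattern described above might look roughly like the sketch below. The service names (`travel.search_flights`, `maps.route`), the registry, and the handlers are invented for illustration and do not correspond to any real Alibaba interface.

```python
from typing import Callable, Dict, List, Tuple

# A plan is an ordered list of (service name, parameters) pairs.
ServiceCall = Tuple[str, dict]

# Hypothetical registry of first-party APIs; the names are assumptions.
REGISTRY: Dict[str, Callable[..., str]] = {
    "travel.search_flights": lambda destination: f"flights to {destination}",
    "maps.route": lambda destination: f"route to {destination}",
}


def orchestrate(plan: List[ServiceCall],
                registry: Dict[str, Callable[..., str]]) -> List[str]:
    """Invoke each planned service through its API, in order."""
    results = []
    for name, params in plan:
        handler = registry.get(name)
        if handler is None:
            # Outside the ecosystem there is simply nothing to call.
            raise LookupError(f"no API registered for {name!r}")
        results.append(handler(**params))
    return results


inside = orchestrate(
    [("travel.search_flights", {"destination": "Hangzhou"}),
     ("maps.route", {"destination": "Hangzhou"})],
    REGISTRY,
)
```

Calls inside the registry are direct and stable, with no screen parsing involved. A request for a service that is not registered fails immediately, which is the boundary the paragraph above describes.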

ByteDance (Doubao), in contrast, chose an 'ecosystem-agnostic' route. Through partnerships with hardware players like ZTE, it deeply embeds the Doubao large model as a system-level capability, aiming to become a unified intelligent agent transcending all app boundaries. It doesn't settle for service closed loops but challenges the entire mobile internet's interaction paradigm.

The mainstream view holds that the 2026 competition is essentially a battle for 'entry point definition.' Will super-apps remain the primary gateways, or will system-level AI evolve into the unified entry point? Doubao II is the most radical embodiment of the latter ideal.

IV. Doubao II's Breakthrough Points: From 'Protocols' to 'Trust'

To stand out in this complex landscape, Doubao II cannot rely solely on visionary ideas; it must resolve the core contradictions exposed by the first-gen product and transition from a 'tech miracle' to a 'daily dependency.' Its success hinges on four keys:

1. Ecosystem Breakthrough: From 'Confrontation' to 'Protocols'

The first-gen's 'bans' forced Doubao to rethink relations with mainstream apps. Industry sources suggest the second-gen may have reached 'agreements' with some top apps, including Alibaba's ecosystem, to open limited permissions in high-frequency scenarios like ride-hailing and food delivery. This would mark a major commercial pivot, proving that 'system-level AI agency' need not be disruptive but can co-create new experiences with app partners. The depth and breadth of cooperation directly determine its practical value.

2. Experience Revolution: Stability and Scenario Depth

The second-gen must prove its automation isn't just smooth in demos but maintains high success rates in users' complex daily phone environments. This requires massive improvements in underlying visual models and deep system-level optimization by ZTE. Simultaneously, it must identify 'perception-gap' benchmark scenarios (e.g., complex multi-app travel planning, platform-wide price comparisons) where it far outperforms existing AI assistants, creating viral word-of-mouth.

3. Building 'Trustworthy Automation'

While granting AI high privileges, Doubao II must introduce unprecedented transparency and controllability. Gemini's 'sandbox visualization' has set a benchmark. Doubao II may need similar visual tracking of operation progress, confirmation prompts at critical steps, and finer-grained, tiered permission management to address deep-seated user anxieties about security, privacy, and 'loss of control.' Technological radicalism must be matched by equal or greater security commitments.
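One way to picture tiered permissions with critical-step confirmation is the minimal sketch below. The risk tiers, the action names, and the `gate` function are assumptions made for illustration, not a description of how Doubao II or Gemini actually works.

```python
from enum import IntEnum
from typing import Callable, List, Tuple


class Risk(IntEnum):
    LOW = 1     # e.g. scrolling or reading the screen
    MEDIUM = 2  # e.g. filling in a form
    HIGH = 3    # e.g. payments and authorizations


# Hypothetical mapping from agent actions to risk tiers.
RISK_BY_ACTION = {
    "scroll": Risk.LOW,
    "fill_form": Risk.MEDIUM,
    "pay": Risk.HIGH,
    "grant_permission": Risk.HIGH,
}

AuditEntry = Tuple[str, str, str]  # (action, tier, outcome)


def gate(action: str,
         confirm: Callable[[str], bool],
         audit_log: List[AuditEntry]) -> bool:
    """Allow low and medium tiers; require user confirmation for high."""
    # Unknown actions are treated as high risk (fail closed).
    risk = RISK_BY_ACTION.get(action, Risk.HIGH)
    if risk is Risk.HIGH and not confirm(action):
        audit_log.append((action, risk.name, "blocked"))
        return False
    audit_log.append((action, risk.name, "allowed"))
    return True


audit: List[AuditEntry] = []
user_declines = lambda action: False
user_approves = lambda action: True

scroll_ok = gate("scroll", user_declines, audit)
pay_declined = gate("pay", user_declines, audit)
pay_approved = gate("pay", user_approves, audit)
```

The audit list doubles as the visible operation trail: every step the agent attempts is recorded with its tier and outcome, so the user can see what was done and what was stopped.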

4. Proving Cooperation Model Efficacy

ByteDance and ZTE's division of labor, 'AI brain + hardware carrier,' challenges the traditional full-stack model. The second-gen must prove that this split enables faster AI iteration and tighter software-hardware co-optimization, and ultimately delivers experiences on par with or surpassing those of the full-stack giants. Meanwhile, ZTE's parallel development of its in-house intelligent agent platform 'Co-Claw' signals that this cooperation is open and non-exclusive. Doubao's success could encourage more 'software-hardware decoupled' alliances, potentially reshaping the industry's division of labor.

V. Conclusion: Reshaping Human-Machine Relations, Not Just Phones

In the long run, the three paths won't remain permanently parallel but will converge through competition.

Historically, each smartphone evolution has been a revolution in human-computer interaction interfaces: from physical buttons to touchscreens, from command lines to graphical interfaces. Today, the AI phone revolution may be more profound than the shift from keyboards to touch—it seeks to change not 'how to operate' but 'whether operation is needed at all.'

The 'agent-based operation' vision carried by Doubao II ultimately aims not to detach humans from phones but to liberate us from tedious, repetitive digital labor, focusing cognitive resources on decision-making, creativity, and emotional connections. The competition's outcome won't be decided by spec sheets or benchmarks but by hundreds of millions of users voting with their most primal feeling: Does it make me feel lighter, more efficient, and freer?

In Q2 2026, Doubao II will submit its answer. Regardless of its market success, it has already, alongside Google's sandbox and Alibaba's closed loop, pushed the smartphone industry to a new height of contemplating the 'soul of interaction.' The AI phone story is no longer about 'adding intelligence' but about 'reconstructing interaction.' The show has just begun.
