04/16 2026

Yiyan Business Observer
Competition in AI phones is shifting toward the reconstruction of 'interaction' itself.
In December 2025, a device named the Nubia M153 'Doubao Mobile Assistant Technology Preview Edition' quietly entered the market. Despite modest specifications and a limited run of 30,000 units, it sold out quickly and even fetched tens of thousands of yuan on the secondhand market, because it demonstrated AI 'operating the phone like a human' by automating cross-app tasks such as ordering food and publishing content. This 'engineering prototype,' with its system-level AI agent capabilities, posed a sharp question to the industry: When AI no longer merely enhances individual functions but can take over entire operational workflows, does the essence of the smartphone change?
ZTE recently confirmed at its earnings briefing that its second-generation Doubao AI phone, developed with ByteDance, will be released in the mid-to-late second quarter of 2026. The interaction paradigm that once sparked shock and controversy is therefore about to face rigorous testing in the mass market. The year 2026 is widely seen as the point at which AI phones move from concept to widespread adoption, with the industry entering a phase of deep restructuring. According to IDC, AI phone shipments in China are expected to reach 147 million units in 2026, accounting for 53% of the overall smartphone market. Riding this wave of 'universal AI adoption,' the second-generation Doubao phone aims to be the vanguard that defines 'true intelligence.'
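As a quick sanity check on the IDC figures quoted above, the two numbers together imply a total market size. The sketch below only does that arithmetic; the function name is hypothetical, and the derived totals are implied values rather than figures IDC is quoted as stating here. Rounding in the 53% share could shift them slightly.

```python
def implied_market(ai_units_m: float, ai_share: float) -> tuple:
    """Return (implied total shipments, implied non-AI shipments) in millions.

    Derived purely from the quoted AI shipment count and AI share;
    these are not figures stated in the article itself.
    """
    total_m = ai_units_m / ai_share
    non_ai_m = total_m - ai_units_m
    return total_m, non_ai_m


# Quoted inputs: 147 million AI phones at a 53% share of the market.
total_m, non_ai_m = implied_market(147, 0.53)
print(f"implied total: {total_m:.0f}M, implied non-AI: {non_ai_m:.0f}M")
```

Under these inputs the implied overall market is roughly 277 million units, leaving about 130 million non-AI phones.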
The core logic of Doubao phones can be summarized as 'vision-driven, simulated interaction, and direct intention fulfillment.' Unlike the prevailing 'functional plugin' model of mainstream AI phones (e.g., AI photo editing, AI summarization), the Doubao approach enables large models to identify screen pixels, understand UI elements, and simulate clicks and inputs to orchestrate cross-app task flows. Its ideal state: A user simply says, 'Arrange a weekend trip to Hangzhou,' and the AI automatically checks flights via travel apps, compares prices on platforms, creates a calendar itinerary, and even generates a travel guide.
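The 'identify, understand, simulate' cycle described above can be pictured as a simple perceive-plan-act loop. The following is a minimal sketch under heavy assumptions, not Doubao's actual implementation: `FakePhone`, `plan_next`, and `run_agent` are hypothetical names, the 'vision model' is replaced by a lookup of on-screen labels, and the 'large model' planner simply replays a fixed list of steps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UIElement:
    label: str
    x: int
    y: int


class FakePhone:
    """Hypothetical stand-in for a device: each screen lists tappable labels."""

    def __init__(self) -> None:
        self.screen = "home"
        self.taps: List[str] = []
        self._screens = {
            "home": ["Food app", "Calendar"],
            "food": ["Noodles", "Back"],
            "cart": ["Place order"],
            "confirmed": [],
        }
        self._transitions = {
            ("home", "Food app"): "food",
            ("food", "Noodles"): "cart",
            ("cart", "Place order"): "confirmed",
        }

    def perceive(self) -> List[UIElement]:
        # A real agent would run a vision model over a screenshot here.
        labels = self._screens[self.screen]
        return [UIElement(label, 0, i * 100) for i, label in enumerate(labels)]

    def tap(self, element: UIElement) -> None:
        # Simulated click: record it and move to the next screen, if any.
        self.taps.append(element.label)
        key = (self.screen, element.label)
        self.screen = self._transitions.get(key, self.screen)


def plan_next(goal_steps: List[str], completed: int) -> str:
    """Hypothetical planner; returns the next label to tap, or '' when done."""
    return goal_steps[completed] if completed < len(goal_steps) else ""


def run_agent(phone: FakePhone, goal_steps: List[str], max_steps: int = 10) -> bool:
    completed = 0
    for _ in range(max_steps):
        elements = phone.perceive()
        target = plan_next(goal_steps, completed)
        if not target:
            return True  # every planned step has been carried out
        match = next((e for e in elements if e.label == target), None)
        if match is None:
            return False  # expected element is missing: the chain breaks
        phone.tap(match)
        completed += 1
    return False


phone = FakePhone()
print(run_agent(phone, ["Food app", "Noodles", "Place order"]), phone.taps)
```

Note that nothing in the loop requires the app to expose an interface, which is the source of the approach's generality. The early `return False` is also where its fragility shows: any unexpected screen ends the task.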
The revolutionary advantage of this path lies in its 'ultimate generality.' It does not rely on app developers providing dedicated interfaces and can theoretically operate any Android app, making it the most thorough technical solution for breaking 'app silos' and enabling seamless service flow. This contrasts sharply with Alibaba's Qianwen, which emphasizes 'closed-loop ecosystems,' and Google's Gemini, which focuses on 'secure and controllable interactions.'
However, its disruptiveness comes with three high risks:
1. Ecosystem Conflict Risk: Direct simulated operations encroach on app platforms' control over interaction entry points and user data. The first-generation product quickly faced bans from mainstream apps, revealing fundamental commercial conflicts.
2. Experience Stability Challenges: Automation based on computer vision is easily disrupted by frequent app UI changes, network delays, and pop-up interruptions. Task execution chains are therefore fragile, and a wide engineering gap remains before the approach can be called 'reliable.'
3. Security and Trust Dilemmas: Granting AI user-level permissions introduces risks of unintended actions and privacy breaches during payments, authorizations, and other critical operations.
Thus, the mission of the second-generation product is to transition from 'flashy tech demos' to 'practical tools,' hinging on whether it can achieve a breakthrough balance across these three risks.
When the second-generation Doubao phone enters the market in 2026, it will find not a blue ocean but a mature battleground dominated by giants pursuing divergent paths. Competition in AI phones now goes beyond surface-level hardware comparisons and extends to interaction philosophies and ecosystem strategies, forming three distinct camps:
1. Full-Stack Integration: Defining Underlying Rules
These players control capabilities spanning chips, operating systems, and AI frameworks. Google, leveraging its Android ecosystem dominance, designed a 'virtual sandbox' path for Gemini: AI runs apps in an isolated system environment, with operations visible and interruptible by users. This preserves automation potential while prioritizing security and controllability, aiming to establish open, standardized new interaction protocols as a system provider. Apple and Huawei, relying on closed ecosystems integrating hardware and software, achieve the deepest, most efficient fusion of AI capabilities and system services. Their strength lies in delivering complete, optimized experiences as rule setters.
2. Vertical Scenario Focus: Building Killer Features
Most mainstream Android vendors choose this path. They either develop large models in-house or partner closely with model providers to bring AI into core scenarios such as photography enhancement, office productivity, and entertainment, aiming to create standout features that consumers can clearly perceive. For example, Samsung's Galaxy AI strengthens real-time call translation, while Xiaomi's HyperOS focuses on imaging intelligence. Their strategy: Maintain top-tier hardware performance while using AI to provide significant 'value-added' experiences, vying for the mass-market base. This is the loudest, most crowded track.
3. Ecosystem Empowerment and Focus: Reshaping Service Entry Points
This camp does not pursue full-stack integration, at least for now, and instead reshapes service access through distinctive AI capabilities.
Alibaba (Qianwen) adopts an 'ecosystem aggregation' route. Its large model deeply integrates with proprietary super-apps like Taobao, Alipay, Gaode, and Fliggy, transforming AI into an efficient internal service orchestration hub. User instructions are decomposed and directly fulfilled via APIs across business lines, delivering smooth, stable, and compliant experiences. Its strength lies in high efficiency within commercial closed loops, but its capabilities are clearly bounded by its proprietary ecosystem.
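The decompose-and-dispatch pattern described in this paragraph can be illustrated with a small service registry. This is only a sketch of the general idea: the service names, handlers, and keyword-based `decompose` function are all hypothetical, and none of them correspond to real Taobao, Alipay, Gaode, or Fliggy interfaces. In a real system the large model would perform the decomposition.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry of first-party services: name -> handler.
SERVICES: Dict[str, Callable[[dict], str]] = {}


def service(name: str):
    """Decorator that registers a handler under a service name."""
    def register(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        SERVICES[name] = fn
        return fn
    return register


@service("travel.search_flights")
def search_flights(args: dict) -> str:
    return f"flights to {args['city']}"


@service("maps.plan_route")
def plan_route(args: dict) -> str:
    return f"route planned in {args['city']}"


def decompose(instruction: str) -> List[Tuple[str, dict]]:
    """Hypothetical stand-in for model-driven task decomposition."""
    if "trip to" not in instruction:
        return []
    city = instruction.split("trip to", 1)[1].strip()
    return [
        ("travel.search_flights", {"city": city}),
        ("maps.plan_route", {"city": city}),
        # A step owned by a service outside the registry, for contrast.
        ("external.book_hotel", {"city": city}),
    ]


def fulfill(instruction: str) -> List[str]:
    results = []
    for name, args in decompose(instruction):
        handler = SERVICES.get(name)
        if handler is None:
            results.append(f"unsupported: {name}")
        else:
            results.append(handler(args))
    return results


print(fulfill("Arrange a weekend trip to Hangzhou"))
```

Registered calls resolve directly and predictably, which matches the smoothness and stability the route is credited with. The unsupported step shows the boundary: anything outside the registry cannot be fulfilled.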
ByteDance (Doubao), in contrast, chose an 'ecosystem-agnostic' route almost opposite to Alibaba's. Through partnerships with hardware vendors like ZTE, it deeply embeds the Doubao large model as a system-level capability, aiming to become a unified intelligent agent transcending all app boundaries. Unsatisfied with service closed loops, it challenges the entire mobile internet's interaction paradigm.
The mainstream view holds that the 2026 competition is essentially a battle for 'entry point definition.' Will super-apps remain the primary gateways, or will system-level AI evolve into the unified entry point? The second-generation Doubao phone is the most aggressive embodiment of the latter ideal.
Amid this complex landscape, the second-generation Doubao phone cannot stand out on vision alone. It must resolve the core contradictions exposed by the first generation and make the leap from 'technological marvel' to 'daily dependency.' Its success hinges on four keys:
1. Ecosystem Breakthrough: From 'Confrontation' to 'Agreements'
The first generation's 'bans' forced Doubao to restructure its relationships with mainstream apps. Industry sources suggest the second-generation device may have reached agreements with some leading apps, including those in Alibaba's ecosystem, that grant limited permissions in high-frequency scenarios like ride-hailing and food delivery. If so, this would mark a major turning point in its business model, suggesting that 'system-level AI agents' need not be purely disruptive but can work with app developers to co-create new experiences. The depth and breadth of these partnerships will directly determine the product's practical value.
2. Experience Revolution: Stability and Scenario Depth
The second-generation device must prove that its automation is not just smooth in demos but also maintains high success rates in the complex, everyday mobile environments of real users. This requires significant advances in the underlying vision models and deep system-level optimization by ZTE. At the same time, it must identify several eye-opening benchmark scenarios (e.g., complex multi-app travel planning, platform-wide price comparison shopping) where it far outperforms existing AI assistants, creating viral word-of-mouth.
3. Building 'Trustworthy Automation'
While granting AI high permissions, the product must establish unprecedented transparency and controllability mechanisms. Gemini's 'sandbox visualization' has set a benchmark. The second-generation Doubao phone may need to introduce similar visualized operation progress, confirmations at critical steps, and finer-grained tiered permission management to address deep-seated user anxieties about security, privacy, and 'loss of control.' Technological radicalism must be matched by equal or greater security commitments.
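One way to picture 'critical step confirmations' combined with tiered permissions is a gate that lets low-risk actions run automatically and pauses for the user on anything above a threshold. The sketch below is an assumption-laden illustration, not a description of any shipping product: the risk tiers, the `ACTION_RISK` table, and `execute_plan` are hypothetical.

```python
from enum import IntEnum
from typing import Callable, List, Tuple


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical tier assignments for a few agent actions.
ACTION_RISK = {
    "open_app": Risk.LOW,
    "fill_form": Risk.MEDIUM,
    "pay": Risk.HIGH,
    "grant_authorization": Risk.HIGH,
}


def execute_plan(
    plan: List[str],
    confirm: Callable[[str], bool],
    auto_limit: Risk = Risk.LOW,
) -> List[Tuple[str, str]]:
    """Run actions in order, asking the user before anything above auto_limit.

    The returned log doubles as a visible progress trail for the user.
    """
    log: List[Tuple[str, str]] = []
    for action in plan:
        # Unknown actions are treated as high risk by default.
        risk = ACTION_RISK.get(action, Risk.HIGH)
        if risk > auto_limit and not confirm(action):
            log.append((action, "blocked"))
            break  # stop the whole chain once the user declines a step
        log.append((action, "done"))
    return log


# Example: the user approves everything except the payment step.
print(execute_plan(["open_app", "fill_form", "pay"], confirm=lambda a: a != "pay"))
```

Defaulting unknown actions to the highest tier is a deliberately conservative design choice here. Raising `auto_limit` trades user interruptions for autonomy.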
4. Proving the Efficacy of Collaboration Models
ByteDance and ZTE's division of labor, 'AI brain + hardware carrier,' challenges traditional full-stack models. The second-generation device must prove that this split enables more agile AI iteration and better hardware-software co-optimization, and that it ultimately delivers experiences on par with or surpassing those of the full-stack giants. Meanwhile, ZTE's parallel development of its proprietary intelligent agent platform 'Co-Claw' signals that this cooperation is open and non-exclusive. Success for the Doubao model would encourage more 'software-hardware decoupled' alliances, potentially reshaping the industry's division of labor.
In the long run, these three paths will not remain permanently parallel but will converge through competition.
Historically, each evolution of mobile phones has fundamentally been a revolution in human-machine interaction interfaces: from physical buttons to touchscreens, from command lines to graphical interfaces. Today, the transformation brought by AI phones may be more profound than the shift from keyboards to touchscreens. It attempts to change not 'how to operate' but 'whether operation is necessary at all.'
The 'agent-based operation' vision carried by the second-generation Doubao phone ultimately aims not to detach humans from phones but to free us from tedious, repetitive digital labor, so that cognitive resources can go to decision-making, creativity, and emotional connection. The outcome of this competition will not be decided by spec sheets or benchmark scores but by how hundreds of millions of users vote with their most basic feelings: Does it make me feel lighter, more efficient, and freer?
In the second quarter of 2026, the second-generation Doubao phone will deliver its answer. Regardless of its market success, it has already joined Google's sandbox and Alibaba's closed loop in pushing the smartphone industry to think harder about the 'soul of interaction.' The AI phone story is no longer about 'adding intelligence' but about 'reconstructing interaction.' The show has just begun.