AI Glasses and AI Phones: The Dual Software-Hardware Strategy of Tech Titans!

December 5, 2025

[In the AI era, companies not venturing into hardware may find themselves at a disadvantage in the next competitive round.]

Stop focusing solely on who boasts the largest model parameters—the real competition is just beginning.

Within a single month, Alibaba unveiled six AI glasses at once, and ByteDance embedded Doubao deep into a phone's operating system, stocking 500,000 units of the new device. This is no mere experiment; it is a direct bid for market entry.

No matter how advanced the model, it holds no value if users don't recognize its capabilities. As everyone starts discussing Agents and 'intent-driven experiences,' you'll realize—the true battle for AI supremacy isn't about 'who answers more intelligently,' but 'who feels more integral to your operating system.'

One is a pair of glasses stepping out of the phone's shadow; the other is a mobile assistant rewriting the rules of the smartphone. In essence, both are competing for a ticket to the 'next generation of human-computer interaction.' What lies behind this is not merely a shift in interaction methods but a relocation of platform control.

The cloud wars are over; now, it's time to settle things on the devices.

I. After the Peak of Large Models, Entry Points Determine Success or Failure

In the first half of the AI wave, large models were the undisputed stars. Whoever had larger parameters, broader training datasets, and faster inference speeds could dominate industry competition. However, by the second half of 2024, this model race began to show signs of fatigue.

Not only have leading players like OpenAI and Anthropic delayed their next-generation model releases, but the capability gap among top domestic large models is also rapidly closing. The back-and-forth in comprehension ability among Quark, Doubao, Wenxin Yiyan, and Tongyi Qianwen has become increasingly hard for users to perceive. The technical ceiling hasn't been reached, but user enthusiasm has plateaued; the model itself can no longer be the decisive factor.

Thus, the focus has shifted—from the model's inherent 'strength' to how it is 'utilized,' and ultimately, to the user.

But users don't interact with models directly; they engage with services through terminals. This means whoever controls the touchpoints closest to users holds the dominant power to transform model capabilities into service value. In the AI context, these touchpoints are precisely embedded hardware like AI phones and AI glasses.

Jin Xian, head of Alibaba's intelligent terminal products, openly stated the logic: 'All data for training large models relies on business data generated at the endpoint. Many models are trained on data collected from usage scenarios like phones, tablets, and computers to serve those very scenarios.' In other words, the endpoint isn't just a distribution terminal for models—it's their 'feedback loop.' Every user invocation, interaction path, and operation record reinforces the model's capabilities.

Peng Deyu, a renowned tech industry commentator, added: As AI enters the 'Agent stage,' this trend becomes even more pronounced. The traditional 'you ask, I answer' Chatbot logic is no longer sufficient. Users now expect, 'Say one thing, and it gets the job done.' This means AI must not only understand language but also intervene in actual task chain execution.

Take the newly released Doubao Mobile Assistant as an example. If a user says, 'Write a positive review for last week's Meituan order,' it must navigate multiple apps, recognize page elements, simulate click paths, and complete a full task chain. Without deep operating system permissions and multimodal large models' screen understanding capabilities, this is nearly impossible.
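The loop such an agent must run, capture the screen, let a model pick an element, simulate the tap, and repeat until the goal is reached, can be sketched in miniature. Everything below is a toy stand-in: the 'phone', its navigation rules, and the planner are illustrative assumptions, not Doubao's actual APIs.

```python
# Hypothetical sketch of a GUI agent's perceive-plan-act loop.
# The "phone" and "model" here are toy stand-ins, not real interfaces.
from dataclasses import dataclass

@dataclass
class Screen:
    app: str
    elements: list  # UI labels the multimodal model would "see"

class ToyPhone:
    """Simulates an OS the agent can observe and tap."""
    def __init__(self):
        self.screen = Screen(app="home", elements=["Meituan", "Settings"])
        self.log = []

    def observe(self):
        return self.screen

    def tap(self, element):
        self.log.append((self.screen.app, element))
        # Toy navigation rules standing in for real app behavior.
        if element == "Meituan":
            self.screen = Screen("meituan", ["Orders", "Home"])
        elif element == "Orders":
            self.screen = Screen("orders", ["Last week's order", "Back"])
        elif element == "Last week's order":
            self.screen = Screen("order", ["Write review", "Back"])
        elif element == "Write review":
            self.screen = Screen("review", ["Submit"])
        elif element == "Submit":
            self.screen = Screen("done", [])

def plan_next(screen, goal):
    """Stand-in for the multimodal model: pick the next element to tap."""
    preferred = ["Meituan", "Orders", "Last week's order",
                 "Write review", "Submit"]
    for label in preferred:
        if label in screen.elements:
            return label
    return None  # nothing useful on screen: task finished or stuck

def run_task(phone, goal, max_steps=10):
    for _ in range(max_steps):
        step = plan_next(phone.observe(), goal)
        if step is None:
            return phone.observe().app == "done"
        phone.tap(step)
    return False

phone = ToyPhone()
ok = run_task(phone, "Write a positive review for last week's Meituan order")
print(ok, [e for _, e in phone.log])
```

Even in this caricature, the fragility the article describes is visible: one misrecognized element anywhere in the chain derails the whole task, which is why deep OS permissions and reliable screen understanding are preconditions rather than nice-to-haves.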

And such capabilities require the endpoint as their landing ground.

The value of the endpoint lies not just in 'interaction efficiency' but in 'ecosystem dominance.' For major players, which device users rely on, which system they execute tasks on, and who holds the permission to invoke entry points determine the foundational layout of the future platform landscape.

OpenAI's acquisition of io, the hardware company founded by former Apple chief design officer Jony Ive, for nearly $6.5 billion in May this year is widely read as a strategic signal of going all-in on Agent hardware. Google's Gemini team is collaborating with Samsung to advance endpoint deployment. Domestically, Xiaomi, Li Auto, Alibaba, and ByteDance are all intervening in terminal form-factor transformations through various means.

This isn't enthusiasm for 'building hardware' itself but anxiety over 'not losing entry points.'

If GPT brought people into the AI era, then starting in 2025, the door through which AI truly enters users' lives may not be in the cloud but in the glasses before your eyes or the phone in your hand.

II. Two Paths, One Goal: Competing for the Next Generation of Entry Points

While both Alibaba and ByteDance are making moves in the AI hardware space, their approaches are nearly polar opposites.

Alibaba has chosen to create a new species from scratch—AI glasses. The six Quark AI glasses released on November 27th appear to be 'function-first' engineering prototypes, prioritizing practicality over fashion or form. Their mission isn't to woo ordinary consumers but to validate the logic of 'perceptual human-computer interaction.'

In Alibaba's vision, AI glasses are the next-generation 'personal mobile entry point.' They aren't just accessories for phones but gradual replacements for phone scenarios. Song Gang, head of Alibaba's intelligent terminal business, explicitly stated at the launch event: 'It's the device with the greatest potential to challenge phones in the future.' This isn't marketing rhetoric but a radical reevaluation of interaction.

In the phone era, users complete tasks through 'download an app—open it—search—operate.' AI glasses aim to let users simply say, 'Take a photo and upload it to Weibo,' and have the AI invoke the camera, recognize the scene, and publish the content. The underlying logic is no longer apps but Agents—an interaction hub that understands intent and acts proactively.

This reflects Alibaba's typical approach of cloud-model-and-terminal collaboration. For large models to iterate, they must rely on business data collected at the endpoint to 'feed' them. Only by building their own hardware can they fully integrate data collection, system invocation, and user interaction.

In contrast, ByteDance has chosen a nearly opposite path: it doesn't build phones but aims to 'rebuild the phone system.'

The engineering prototype nubia M153 phone, released on December 1st in collaboration with ZTE, isn't about new hardware. Its core selling point is the 'Doubao Mobile Assistant'—an AI Agent embedded in the operating system with full task chain execution capabilities. It understands screen interfaces, simulates clicks, and jumps across apps to achieve 'intent-driven services.'

Unlike traditional voice assistants, which execute only shallow commands, Doubao Assistant reaches down into the core of the operating system and uses multimodal large models to understand graphical interfaces, allowing it to 'complete complex tasks within a virtual screen.' For example, given 'Next month I'm going to Paris, mark my saved restaurants on the map,' Doubao can break the request into six steps, including extracting restaurants from social media, marking them on Gaode Maps, booking tickets via Ctrip, and organizing notes, then execute them the way a human would.
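The decomposition described above can be pictured as a planner that turns one utterance into an ordered task chain whose steps pass results forward through a shared context. The hard-coded plan and the per-step handlers below are illustrative assumptions, not ByteDance's real pipeline.

```python
# Hypothetical sketch of intent decomposition into a task chain.
# A real system would ask a large model to plan; here the plan is fixed.

def decompose(intent):
    """Turn one natural-language goal into an ordered list of sub-tasks."""
    return [
        "extract saved restaurants from social media",
        "geocode each restaurant",
        "mark locations on the map app",
        "check flights to Paris",
        "book a ticket",
        "save an itinerary note",
    ]

def execute(step, context):
    """Each handler would drive a different app; here we record results
    so later steps can consume what earlier steps produced."""
    context.setdefault("trace", []).append(step)
    if "extract" in step:
        context["restaurants"] = ["Bistro A", "Cafe B"]  # fabricated sample
    elif "mark" in step:
        context["pins"] = len(context.get("restaurants", []))
    return context

context = {}
for step in decompose("Next month I'm going to Paris, mark my saved restaurants"):
    context = execute(step, context)

print(len(context["trace"]), context["pins"])
```

The design point is the shared context: the map-marking step can only pin as many restaurants as the extraction step actually found, which is exactly the cross-app dependency that makes a six-step chain harder than six independent commands.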

This effectively 'reconstructs the main control logic of the mobile operating system,' making AI the 'first entry point' of the system rather than a feature within an app.

ByteDance has opted for a more flexible strategy: collaborating with phone manufacturers to deeply embed its software capabilities into device ecosystems. According to Geek Park, citing a former ZTE product manager, the initial stocking volume for the nubia M153 reached 500,000 units—a highly aggressive figure for a system-level pre-installation project of an AI assistant.

This isn't ByteDance's first foray into hardware. As early as 2018, it acquired the Smartisan team to enter the phone ecosystem; in 2021, it acquired PICO to venture into VR; in early 2024, it acquired Oladance to enter AI earphones. Now, all these hardware resources have been integrated into 'ByteDance's Ocean Department,' led by Liu Chengcheng (founder of 36Kr) and reporting to Zhu Jun (head of Flow). Organizationally, this is already a rare strategic-level department configuration for ByteDance.

Alibaba is building a new entry device, while ByteDance is transforming the existing entry system. The former disrupts app logic with 'devices + scenarios,' the latter rewrites interaction protocols with 'systems + models.' But their goal is the same—whoever gains initiative at the terminal may secure the next ecosystem-level entry point in the AI platform era.

No matter how different their paths, both internet giants have arrived at the same conclusion this time: the main arena of the AI era is shifting toward device endpoints.

III. Bubble or Starting Point? The Realities and Uncertainties of AI Hardware

AI hardware may sound like the next 'big thing,' but real-world implementation is more complex than expected.

Take the Doubao AI phone. Its initial stocking of 500,000 units is a heavyweight investment for a manufacturer like ZTE, yet still far short of the 2–3 million units a mainstream flagship typically ships. At 3,499 yuan, it targets developers and tech enthusiasts rather than the mass market. The product is better understood as a 'technical validation entry point': a vehicle to test how the AI assistant lands in practice, refine its system-invocation logic, and accumulate templates for system-permission cooperation, rather than a true consumer product.

But even as a 'preview version,' Doubao Assistant exposes significant technical uncertainty. Is task-chain execution stable? Is screen recognition accurate? How does it handle exceptions, misjudged taps, and safety fallbacks across multiple apps? AI control at the system level essentially demands a reconstruction of the operating system architecture, and any bug could be disastrous for the user experience.

Official documentation explicitly states that the current 'phone operation' feature is still a technical preview, far from large-scale, stable deployment. This tension between vision and reality shows that AI Agents are still in a polishing phase.

The same goes for Alibaba's AI glasses. While launching six products at once demonstrates a high strategic commitment, such devices currently lack a clear market foundation in China. From a product standpoint, Quark AI Glasses follow a 'perception-driven + Agent-controlled' minimalist route, aiming for 'plug-and-play, dialogue-as-interaction.' Logically, this has the potential to disrupt phones, but the technical conditions aren't mature yet.

In particular, current AI glasses face severe bottlenecks in sensors, battery life, and on-device compute. Achieving 'environment recognition + intent understanding + action execution' requires stable multimodal reasoning and complete scene modeling on the device, a high bar even in 2025.

A more practical question is: Are users truly ready to hand over 'interaction rights' to AI?

Doubao Assistant can already operate in the background automatically, completing task chains without user clicks. That raises another question: how are data permissions, personal privacy, and payment security safeguarded? The official demo retains manual confirmation for payments, but an AI Agent's ability to bypass apps and simulate interactions directly still carries a risk of abuse. At a stage when security boundaries are unestablished and system-permission standards are inconsistent, such over-privileged AI behavior could fall into a regulatory gray zone.
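The safeguard the official demo hints at, keeping a human in the loop for payments, amounts to a simple policy gate: benign actions execute automatically, while sensitive ones block until the user explicitly confirms. The action names and policy set below are assumptions for illustration, not a documented Doubao mechanism.

```python
# Sketch of a human-in-the-loop gate for an autonomous agent.
# The SENSITIVE set and action names are illustrative assumptions.

SENSITIVE = {"pay", "transfer", "delete_account"}

def run_action(action, confirm):
    """confirm is a callback that asks the user; returns the outcome."""
    if action in SENSITIVE and not confirm(action):
        return "blocked"
    return "executed"

# Benign actions run unattended; payments need an explicit "yes".
results = [
    run_action("open_app", confirm=lambda a: False),  # no prompt needed
    run_action("pay", confirm=lambda a: False),       # user declines
    run_action("pay", confirm=lambda a: True),        # user approves
]
print(results)
```

The open question the article raises is precisely where to draw the SENSITIVE boundary, and who gets to draw it, when the agent can simulate taps that the OS cannot distinguish from the user's own.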

Despite these challenges, this wave of AI hardware enthusiasm isn't a bubble.

Quite the opposite—it's an inevitable phase in the evolution of large model platforms. When Chatbots lose their novelty, app user growth slows, and model capabilities become imperceptible, only by reconstructing interaction forms can AI reshape its 'user value perception interface.'

Hardware isn't the end goal; it is one link in a platform-level loop: unlock the entry point, invoke the system, collect data, feed it back to the model.

Currently, Google's AI glasses project has entered the POC stage; Xiaomi and Li Auto are frequently testing AI glasses and in-car AI assistants; OpenAI acquired IO to build Agent hardware; ByteDance is testing full-chain system integration via Doubao Assistant; Alibaba is betting on glasses to challenge phone dominance. Globally, tech companies are now positioning themselves around 'platform-level AI entry points.'

This isn't just a hardware upgrade war but a signal for the launch of a new platform cycle.

Written by Gaojian Guanchao. Original content. For reprints, please contact for authorization.
