Doubao Phone's 'Life-and-Death Struggle': Permissions, Boundaries, and Ecosystems

12/09 2025 553

Whose 'Slice of the Pie' Has Doubao Taken?

By Guo Jiage

Edited by Zhang Xiao

Within a mere week of its launch, the Doubao Mobile Assistant has already found itself at the center of multiple waves of public scrutiny.

On December 5, the Doubao team announced that, to balance technological advancement with industry acceptance, it would make "standardized adjustments" to the AI's ability to operate the phone. These include limiting its use of financial apps and certain games.

Figure/Official Weibo account of Doubao Mobile Assistant

Earlier, on December 1, ByteDance's Doubao team unveiled a technical preview of the Doubao Mobile Assistant. Built on the Doubao large model and developed with phone manufacturers at the operating system level, it was announced as debuting first on ZTE's Nubia M153. Priced at 3,499 yuan, the device, which truly embodies the concept of an "AI phone," was at one point resold for over 10,000 yuan on the secondary market.

Shortly thereafter, numerous industry insiders criticized Doubao for obtaining a high-risk operating system permission known as INJECT_EVENTS, which enables its cross-app operational capabilities. This prompted a public statement from Li Liang, Vice President of Douyin, who stated, "As long as users grant permission, there is no privacy infringement. The Doubao Mobile Assistant can only perform tasks on the phone with user authorization."

Figure/@Li Liang's personal Weibo account, Douyin Group

In just one week, this series of controversies has not only thrust system-level AI into the limelight but also exposed the potential and limitations of AI Agents on terminal devices: giving every user their own "Jarvis" is far harder than initially imagined.

A Product That Closely Resembles an AI Agent

The user-facing functions of the Doubao Mobile Assistant are not complicated.

Beyond basic multimodal capabilities and local tool invocation, its core strengths lie in cross-app automatic execution and global memory functions.

In Doubao's earliest demonstration videos, typical use cases included comparing prices and placing orders across different e-commerce platforms, retrieving locations saved by users in apps like WeChat, Xiaohongshu, and Dianping, and marking them on maps.

Similar to the contextual awareness found in early AI search software, the Doubao Mobile Assistant's global memory extends this capability to encompass all of a user's routine phone operations: a saved schedule, a favored restaurant, or someone's birthday. Global memory enables the assistant to integrate tasks scattered across different apps or time points, achieving continuous cross-app operations, such as assisting you in booking a restaurant or flight by recalling memories.

If early Doubao resembled a smart voice assistant, it now more closely resembles having your secretary directly embedded in the system. It can comprehend screen content, assess interface states, and independently find task paths without relying on plugins or APIs. While traditional voice assistants merely issued commands on your behalf, Doubao begins to perform tasks for you.

To some extent, it is a product that closely resembles an AI Agent.

However, from a technical standpoint, the core functional logic of the Doubao Mobile Assistant consists of two parts: screen reading + simulating user operations.

It can acquire the interface structure and element information of the current screen through accessibility interfaces provided by the operating system. In other words, it can "see" the position, attributes, and text information of each interactive element on the screen.

Figure/Official website of Doubao Mobile Assistant
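The two parts described above, screen reading and simulated user operations, can be illustrated with a minimal sketch. Everything in it is hypothetical: `UiNode`, `find_clickable`, and `tap_point` are illustrative names operating on a toy screen tree, not Android's accessibility API and not Doubao's implementation.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class UiNode:
    """One element of a toy screen tree (hypothetical, not a real Android class)."""
    text: str
    bounds: tuple[int, int, int, int]  # left, top, right, bottom in pixels
    clickable: bool = False
    children: list["UiNode"] = field(default_factory=list)


def walk(node: UiNode) -> Iterator[UiNode]:
    """Depth-first traversal of the tree: the 'screen reading' half."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_clickable(root: UiNode, label: str) -> Optional[UiNode]:
    """Return the first clickable element whose text contains the label."""
    for node in walk(root):
        if node.clickable and label.lower() in node.text.lower():
            return node
    return None


def tap_point(node: UiNode) -> tuple[int, int]:
    """Centre of the element: where a simulated tap would land."""
    left, top, right, bottom = node.bounds
    return ((left + right) // 2, (top + bottom) // 2)


# A made-up product page with two buttons.
screen = UiNode("root", (0, 0, 1080, 2400), children=[
    UiNode("Cat food 2kg", (0, 300, 1080, 500)),
    UiNode("Add to cart", (40, 2200, 520, 2320), clickable=True),
    UiNode("Buy now", (560, 2200, 1040, 2320), clickable=True),
])

target = find_clickable(screen, "buy now")
if target is not None:
    print(target.text, tap_point(target))
```

In a real assistant, a large model would presumably decide which element to look for and what to do next; the sketch only shows why reading the element tree and injecting a tap at computed coordinates are enough to drive an app without plugins or APIs.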

In fact, this functionality is not novel. Early Android accessibility features already permitted screen reading and app control, a system permission originally designed for visually impaired users. The distinction is that the Doubao Mobile Assistant turns a large AI model into a system-level assistant, preinstalled on the phone with the highest permissions and integrated directly into the system. As a result, its operations go beyond mechanical execution: the assistant can judge and plan autonomously based on context and task objectives, completing more complex, cross-app automated tasks.

This product model indeed appears enticing. If traditional phones operated on a "you tap" basis and voice assistants on a "you say, I help you open" basis, now it's "you say, I handle everything."

Ultimately, this represents a qualitative leap in user experience but merely a quantitative accumulation in technical foundations, lacking any disruptive breakthroughs in underlying technology.

So, the question arises: Why has this capability only emerged now?

On one hand, the multimodal understanding capabilities of large models have significantly improved over the past two years. On the other hand, the reduction in inference costs has made it feasible to keep models running in the background continuously, rather than being prohibitively expensive, laggy, or unstable as before.

This turning point arrived in late 2024, when model capabilities, computational costs, and user demand collectively approached a threshold and real user scenarios became the shared target of large AI models. Doubao seized this window of opportunity. What sets Doubao apart from other AI products is not computational power but its focus on the phone, the mobile terminal most closely tied to users.

The emergence of the Doubao Mobile Assistant is clearly not aimed at proving technical prowess but at occupying this potentially lucrative entry point. With app growth plateauing and content platform competition intensifying, "entry points" have become far more critical than "functions." Once a model can reliably execute cross-app tasks, it gains the potential to redistribute mobile ecosystem traffic.

Meanwhile, as apps become increasingly complex and ecosystems more bloated, users' dwindling patience fuels a desire for simplicity, directness, and speed, creating a link in the mobile phone value chain that AI can replace.

This also explains why the Doubao Mobile Assistant sparked so much discussion in a short time: it touches not just products but the underlying ecosystems built by internet giants over the years. Whether it can progress further depends on how it navigates more complex ecological frictions.

Collective Siege by Core Applications

The spark for public debate ignited when WeChat suddenly "isolated" the Doubao Mobile Assistant.

Just one day after its official release, users began to notice that when operating WeChat on a phone equipped with the Doubao Mobile Assistant, they triggered WeChat's "abnormal login environment" alert, forcing WeChat offline and preventing normal login.

Soon after, Alibaba followed suit. Users reported strong pop-up warnings when logging into apps like Taobao, Xianyu, and Damai: they were forced out, shown an abnormal login environment message, and told to switch devices before logging in again. At the same time, banking apps such as Agricultural Bank of China and China Construction Bank also refused access on the phone.

These core app failures rendered the Nubia engineering phone, which had seen its secondary market price surge nearly fourfold, virtually unusable within days.

WeChat's public response was that "no special actions were taken; it may have triggered existing security risk control measures." Even so, the system-level permissions that the Doubao Mobile Assistant relies on are precisely the areas such risk controls are most sensitive to.

To "operate other apps like a human," an AI assistant must obtain high-level permissions or manufacturer-level access. Continuous cross-app invocations inevitably touch upon vast amounts of third-party data, including chat records, contacts, and payment information, all requiring explicit user authorization, adherence to the principle of least privilege, and localized data protection.
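The requirements named above, explicit user authorization and the principle of least privilege, can be sketched as a small authorization gate. This is an illustration under stated assumptions, not a description of Doubao's or any app's actual policy: the category labels, scope names, and the `authorize` function are all hypothetical, and the blocked "finance" category simply mirrors the restriction on financial apps that the Doubao team announced.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK_USER = "ask_user"
    DENY = "deny"


@dataclass(frozen=True)
class Action:
    app_category: str  # e.g. "shopping", "finance" (illustrative labels)
    scope: str         # e.g. "read_screen", "tap", "pay" (illustrative scopes)


# Assumed policy values, chosen only for illustration.
BLOCKED_CATEGORIES = {"finance"}
SENSITIVE_SCOPES = {"pay", "read_messages"}


def authorize(action: Action, granted: set[str]) -> Decision:
    """Decide whether the assistant may perform an action."""
    if action.app_category in BLOCKED_CATEGORIES:
        return Decision.DENY      # whole category is off limits
    if action.scope not in granted:
        return Decision.DENY      # least privilege: not granted means refused
    if action.scope in SENSITIVE_SCOPES:
        return Decision.ASK_USER  # granted, but confirm every time
    return Decision.ALLOW


# Scopes this hypothetical user has granted.
granted = {"read_screen", "tap", "pay"}

for action in (
    Action("shopping", "tap"),
    Action("shopping", "pay"),
    Action("shopping", "read_messages"),
    Action("finance", "tap"),
):
    print(action, authorize(action, granted).value)
```

The design choice worth noting is the default: anything not explicitly granted is denied, and sensitive scopes still require a per-action confirmation even after being granted.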

When AI Agents face complex real-world conditions such as app version updates and network fluctuations, execution failures and misoperations can occur, and users have very little tolerance for them, especially in critical flows like payments.

This concerns not only user data security but also product stability and the integrity of risk control systems. For WeChat, every message sent and every operation flow incorporates sophisticated anti-cheating, anti-abuse, and data monitoring logic. Circumvention by external tools could disrupt system design, increasing abnormal incidents and liability risks.

Notably, in April of this year, WeChat issued an announcement reminding users not to use third-party tools to manage WeChat chat records. Subsequently, voice assistants like Siri, Xiao Ai, and Huawei Xiaoyi could no longer directly invoke WeChat functions via voice.

Figure/Official Weibo account of WeChat Security Center

This explains why WeChat acted so swiftly this time. Once this window opens, so does the potential for abuse. Core apps like WeChat would have to reassess the security and trustworthiness of every external invocation, which is why they chose direct blocking.

This concern is not unique to WeChat. For the entire app ecosystem, when external assistants gain system-level operational capabilities, it means any app's control could be diverted or replaced.

This is the industry-level contradiction faced by the Doubao Mobile Assistant: the battle for entry points inevitably leads to disputes over operational boundaries.

What users perceive as "convenience" may represent potential security vulnerabilities or operational risks for app developers. Whether social, payment, or e-commerce apps, every operation carries multiple considerations, including account security, transaction integrity, and user experience.

From a market perspective, these restrictions also reveal where decision-making authority lies: however high an assistant's system permissions, its ultimate ability to execute tasks depends on whether apps permit it.

Even if a system-level AI assistant possesses extensive permissions, its ability to truly execute tasks still hinges on whether individual apps provide "automatable" space in their business workflows, interfaces, and risk control systems. Without app cooperation, the AI assistant's "global execution" faces a ceiling.

More broadly, this conflict reflects industry-wide competition over system-level AI entry points. Major players aim to retain control over user operational pathways, as these form the foundation of traffic, data, risk control, and user experience.

A simple example: if a system-level agent like Doubao can "act on behalf of users" across multiple apps, users no longer need to enter each platform to complete actions. The traditional path of opening Platform ABC → searching keywords → browsing and comparing products → completing checkout transforms into simply saying, "Help me buy a pack of cat food under 50 yuan, with high sales, deliverable tomorrow," and the AI assistant automatically compares prices, selects products, and completes the purchase across e-commerce platforms.

If users no longer enter platform homepages, platforms lose search entry traffic; if AI extracts content, platforms' "recommendation feeds" cease to be user entry points, disrupting the content value chain; with reduced user browsing time, ad exposure decreases; and platforms' "shelf logic" and recommendation-based traffic distribution become ineffective.

When AI assistants attempt to cross app boundaries, they challenge the existing power distribution within the app ecosystem. While Doubao's attempt demonstrates technical feasibility, its long-term viability depends on finding a commercially operational balance between system permissions and app ecosystems.

True Entry Points Lie in Software-Hardware Integration

After nearly three years of AI development, spanning from computational power to multimodal capabilities and AI hardware implementation, the industry landscape is gradually returning to considerations of sustainable business models.

Doubao is knowingly pressing ahead despite the privacy-compliance and platform-relationship risks of cross-app automatic execution, and the reason is straightforward: with the initial land grab over, the battle for entry points has entered a new phase. AI vendors and phone manufacturers now recognize that future commercial value will stem not from isolated AI services but from lucrative entry points truly tied to user scenarios.

In the early stages of AI Agent development, products like Doubao, Yuanbao, Kimi, and Tongyi Qianwen attempted to occupy user scenarios by embedding functionalities within apps.

Doubao initially integrated into apps like Douyin and Toutiao, offering intelligent recommendations and content generation; earlier this year, Tencent even placed ads for Yuanbao almost everywhere ads were allowed in WeChat...

The last deep collaboration between AI search software and phone manufacturers occurred after DeepSeek gained popularity, when major domestic phone manufacturers like Huawei, Honor, and Xiaomi announced the integration of DeepSeek into their existing voice assistants.

In fact, system-level AI is not a "first" for Doubao.

As early as June this year, at Huawei's Developer Conference, the AI experience demonstrated on HarmonyOS 6 already showcased a prototype of system-level AI. At the time, however, Huawei emphasized coordinating multiple intelligent agents to complete tasks.

Apple is upgrading Siri toward system-level execution capabilities, with future Apple Intelligence also targeting cross-app collaboration and operational path reconstruction. Manufacturers like Xiaomi, Huawei, and OPPO are continuously improving their voice assistants' global operational capabilities, gradually moving toward centralization and integration.

On the hardware front, preparations are also under way to tap into future entry points. Consider the early AI wearable AI Pin from the well-known, celebrity-backed startup Humane, as well as Alibaba's Kuake (Quark) AI Glasses, launched in November. Both products are viewed as potential alternative entry points. For the time being, however, they have failed to draw significant user attention. Smartphones remain the most reliable and most frequently used terminals, so most vendors are still betting on the smartphone as the core entry point.

This explains why major industry players are making strides in both software and hardware development within their AI ecosystem strategies. Software development is crucial as it secures system-level operational capabilities, which can be likened to laying a solid foundation for a building. Meanwhile, hardware innovation is about exploring new forms of interaction in the future, akin to opening up new doors for users. Only by integrating software and hardware can these companies maintain long-term influence over user behavior patterns.

As the wave of AI technology continues to surge ahead and becomes more and more widespread, enterprises are faced with a simple yet critical choice: either adapt to the changing landscape or risk being left in the dust.

Header image/Generated by Doubao AI

Solemnly declare: the copyright of this article belongs to the original author. The reprinted article is only for the purpose of spreading more information. If the author's information is marked incorrectly, please contact us immediately to modify or delete it. Thank you.
