12/03 2025

On December 1st, ByteDance released a test version of the Doubao Mobile Assistant, and it immediately drew attention. This is not just an upgraded iteration of Doubao AI; it is a system-level AI effort that aims to redefine smartphone interaction. It lets the model operate the phone much as a person would and handle user intent at a higher level than a conventional assistant.

Doubao Mobile Assistant Demo Video
The technological direction is clear. As mobile devices move into the AI-native era, system-level interaction is shifting from traditional point-and-click interfaces to natural language understanding and highly automated execution. However, once this seemingly futuristic approach is put into practice, a structural obstacle surfaces: gaining deep permissions within China's domestic mobile systems and software ecosystems is far harder than anticipated. Doubao's vision depends heavily on such software and hardware permissions. Seen this way, China's intricate ecosystem structure may ultimately push Doubao toward building its own hardware rather than remaining a mere assistant application.

Tired of Fiddling with Your Phone? Let Doubao Take Over
The capabilities shown in the test version of the Doubao Mobile Assistant are notably more ambitious than those of most AI conversation assistants currently on the market. It is essentially a system-level collaboration solution that attempts to carry out cross-application actions through underlying system capabilities. The system-level AI interprets the user's intent, automatically breaks it down into a series of operational instructions, and executes them directly. From a broader technological standpoint, what Doubao aims to achieve is the 'Agent' model, which has already been extensively validated on desktop platforms, applied to tasks such as comparing e-commerce prices, placing food delivery orders, and chatting on WeChat.
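
The flow described above (interpret an intent, break it into operational instructions, execute them) can be illustrated with a minimal sketch. This is not Doubao's implementation: the `Step` type, the app names `shop_a` and `shop_b`, and the hard-coded rule standing in for the model are all hypothetical, chosen only to show the shape of the pipeline.

```python
from dataclasses import dataclass

PREFIX = "compare price of "


@dataclass(frozen=True)
class Step:
    app: str       # hypothetical target application
    action: str    # hypothetical operation inside that application
    argument: str  # payload passed to the operation


def plan(intent: str) -> list:
    """Toy planner: one hard-coded rule stands in for the language model."""
    if intent.startswith(PREFIX):
        item = intent[len(PREFIX):]
        return [
            Step("shop_a", "search", item),
            Step("shop_b", "search", item),
            Step("assistant", "summarize", item),
        ]
    return []  # unknown intent: nothing to execute


def execute(steps: list) -> list:
    """Pretend to run each step in order and return a log of what ran.

    A real agent would drive each application's interface here.
    """
    return [f"{s.app}:{s.action}({s.argument})" for s in steps]


if __name__ == "__main__":
    for line in execute(plan("compare price of headphones")):
        print(line)
```

In this sketch the planner and the executor are deliberately separate: the planner only produces instructions, while the executor is the part that would need cross-application access on a real device.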

Of course, in a PC environment, such agents can rely on relatively open systems, window structures, and permission systems to complete automated tasks smoothly. From the early success of Manus to ChatGPT's later agents, major companies have now delivered stable solutions. The situation is entirely different on mobile devices, whose ecosystems are far more closed. Whether on iOS or Android, mobile systems impose stricter restrictions, draw clearer boundaries between applications, and expose fewer system capabilities for invocation. Operations that desktop agents accomplish effortlessly turn into a series of actions requiring deep system permissions once they are moved to mobile devices. Doubao's attempt to replicate a 'global agent' on mobile is therefore significantly harder than the equivalent on PC. It must not only understand user intent but also get past the built-in barriers that mobile systems place on cross-application operations. For instance, functions such as reading WeChat chat interfaces may not be available in the official version if Tencent regards them as a privacy risk.
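
To make the permission dependency concrete, here is a small sketch under stated assumptions. The action names and the permission strings in `REQUIRED` are hypothetical and are not real Android or iOS identifiers; the point is only that a plan stops at the first action whose permission has not been granted.

```python
# Hypothetical mapping from an agent action to the permission it would need.
REQUIRED = {
    "speak": None,                        # stays inside the assistant's own app
    "read_screen": "cross_app_screen_access",
    "tap": "cross_app_input_injection",
}


def runnable_prefix(actions, granted):
    """Return the actions that can run before the first permission denial."""
    done = []
    for action in actions:
        need = REQUIRED[action]
        if need is not None and need not in granted:
            break  # the rest of the plan is blocked by the platform
        done.append(action)
    return done


if __name__ == "__main__":
    plan = ["speak", "read_screen", "tap"]
    print(runnable_prefix(plan, set()))
    print(runnable_prefix(plan, {"cross_app_screen_access",
                                 "cross_app_input_injection"}))
```

With no grants, only the action confined to the assistant's own app runs; the cross-application steps depend entirely on what the platform owner chooses to allow.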

Judging by the current test results, Doubao is trying to be not just a 'voice assistant' but a cross-system AI 'operating system.' This is indeed where mobile interaction is heading. However, that same reliance on underlying permissions has brought Doubao up against significant ecosystem barriers. Almost all leading domestic phone manufacturers are now, to some extent, developing their own 'AI-native systems.' Xiaomi, Huawei, OPPO, Vivo, and even Apple are all reinforcing their closed ecosystems through system-level integration. Under such circumstances, they are unlikely to let an external third-party assistant intervene directly at the underlying level, or to hand over critical permissions. The collaboration on a ZTE Nubia engineering prototype underscores the same issue: the manufacturers able to cooperate deeply with Doubao tend to be those with smaller ecosystems. The truly influential leading manufacturers have opposing commercial interests and are unlikely to grant such permissions.

Image Source: ZTE Mall
Doubao therefore finds itself in a precarious position. On the one hand, its vision is reasonable and forward-looking; on the other, whether it can realize that vision depends not on its own technology but on whether system permissions are granted.

When Cornered, 'Doubao Mobile' Emerges as the Only Viable Option
When Doubao chose to build an assistant this dependent on system permissions, it implicitly accepted a premise: for this capability to become a true 'standard experience,' it must reach into the core of the software and hardware ecosystem. There are only two ways to do that: persuade a leading manufacturer to open up system permissions, or develop fully controllable hardware of its own. The first path is all but unrealistic. Every leading manufacturer is developing its own so-called AI OS and would not allow an external assistant to become a system-level entry point.

AIOS Architecture
This means Doubao cannot build its core capabilities on underlying permissions provided by existing manufacturers. In fact, it is unlikely that any AI company can fully realize its needs and ambitions on third-party hardware. Meta relies on glasses, and Alibaba recently launched its Quark AI Glasses, which suggests that both are looking for hardware entry points of their own. The trend is global. OpenAI's lightweight device strategy and Google's deep integration with the Pixel series point in the same direction: the core of next-generation AI hardware is no longer the UI but the AI logic.

Pixel 10's AI Personal Summary Feature
Under these circumstances, Doubao must either become an integral part of someone else's system or own its device and control the entire chain from hardware to OS. For Doubao, the question is probably not whether to develop hardware; without hardware, its vision simply cannot be realized. Rumors that ByteDance is developing smart glasses have circulated for a long time, and the hardware team it acquired from Smartisan is clearly not idle. Given the current industry landscape, an 'operational layer' of this kind can only be fully realized on hardware that Doubao controls. Although Doubao specifically emphasized that it is 'not making phones' when it launched the mobile assistant, the technical direction, ecosystem conflicts, and permission structures make it almost inevitable that it will eventually reverse that position.