06/22 2026
AI phones are evolving into Agent phones.
A recent social media clash in the tech world has shattered the polished images of major companies in the AI era.
A Xiaomi engineer openly criticized on Weibo: 'When it comes to large models now, some companies just compete on volume and sentiment, resorting to bundling.' Onlookers believe this jab is aimed at Huawei's Pangu large model, which was just announced with great fanfare at HDC 2026, where Yu Chengdong retook the helm and vowed to claim the top spot in the industry.

To outsiders, this may seem like just another round of corporate bickering, but Lei Technology (ID: leitech) sees these criticisms as highlighting a collective anxiety in the 2026 mobile AI landscape.
Everyone is realizing that simply stacking parameters in the cloud and chasing benchmark scores no longer works. For mobile AI to survive today, it must answer two questions:
One is whether on-device AI can handle intense computational demands locally;
The other is whether system-level agents can truly assist users by breaking down the barriers between apps.
Following these two threads, Lei Technology (ID: leitech) has combined half a year of hands-on testing and industry observations to uncover the underlying strategies of major players and see what cards each holds in this new AI battle.
At HDC 2026, Huawei demonstrated not just breakthroughs in individual technologies but a 'full-stack, all-scenario' systematic strategy that is rare in the industry.
From foundational computing chips to top-level AI applications, Huawei is building an ecosystem moat with a completely self-developed technology chain.
Leveraging the Ascend computing architecture and a comprehensive cloud computing infrastructure, Huawei provides continuous data throughput support for the ongoing evolution of large models.
On the device side, Huawei deeply integrates 'Kirin chip affinity' technology and follows the 'Tao Law' of synergy between energy efficiency and computing power, keeping a native 30B on-device model (with 2B active parameters) resident in regular RAM. Through quantization, pruning, and expert-prediction algorithms, the phone delivers low-latency responses for frequent, small tasks running locally while avoiding overheating and excessive power consumption.
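To see why quantization matters for keeping a 30B model in a phone's RAM, here is a rough back-of-the-envelope sketch. The parameter counts come from the figures above; the bit widths are illustrative assumptions, since Huawei's actual quantization scheme is not stated here, and the estimate covers weights only (no KV cache, activations, or runtime overhead).

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


TOTAL_PARAMS = 30e9   # total parameters (figure quoted in the article)
ACTIVE_PARAMS = 2e9   # parameters active per token (figure quoted in the article)

# Bit widths below are assumptions chosen for illustration only.
for bits in (16, 8, 4):
    total = weight_memory_gib(TOTAL_PARAMS, bits)
    active = weight_memory_gib(ACTIVE_PARAMS, bits)
    print(f"{bits:>2}-bit: all weights ~{total:.1f} GiB, "
          f"weights touched per token ~{active:.2f} GiB")
```

Under these assumptions, the full weight set only approaches a phone-sized memory budget at low bit widths, while the small active subset is what keeps per-token compute and bandwidth manageable.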
As the intellectual core of the entire system, the openPangu 2.0 large model once again showcased its technical depth this year. Not only does it support long contexts of up to 512K, but it also plans to gradually open-source seven core components, enabling on-device-cloud collaboration.
At the system level, HarmonyOS stands as the only fully self-developed system in China. The newly released HarmonyOS 7 places 'Agent-affinitive system architecture' at its core, directly reconstructing the relationship between applications and the system. It can disassemble and reorganize traditional applications into readily callable Skills and agents, enabling one-step service access.
Xiao Yi, the system agent at the forefront, now sees 3 billion daily activations and offers over 2,100 system-level capabilities and 500+ partner-selected Skills. With HarmonyOS 7, Xiao Yi becomes a spatially aware super scheduler capable of executing complex cross-device tasks.
Huawei's greatest strength is its ability to integrate chips, systems, ecosystems, and AI into a closed loop, giving multi-device users a highly cohesive Agent handoff experience while further raising the barriers around the HarmonyOS ecosystem.
Apple Intelligence and the new Siri AI unveiled at WWDC 2026, while still out of reach for users in China, show a high level of system-layer integration. Apple's core large model, AFM, is essentially a privately deployed, modified version of Google's Gemini recipe.
Apple has dug deep into interaction design and system permissions. Siri AI now has an independent interaction history app and multimodal screen perception. It can calculate shared expenses directly from on-screen bills and search private emails across apps to automatically plan a three-day trip.
At the application layer, Safari and Shortcuts have introduced Vibe Coding. The Photos app can even use on-device models to re-render 2D photos into spatial compositions with Z-axis depth information for wallpapers.
With its stringent Private Cloud Compute (PCC) architecture, Apple holds its bottom line on privacy. By borrowing Google's brain to boost its own intelligence while keeping a firm grip on its role as the system scheduler, Apple remains as calculating as ever.
Combined with its influence in the software ecosystem, Siri AI may become the fastest-growing Agent.
Xiaomi represents the deep dive into on-device computing power in 2026. This year it invested over 16 billion yuan in AI and just released its flagship base model, Xiaomi MiMo-V2.5-Pro, with on-device active parameters soaring to 42B.
To adapt such a massive model to more devices, Xiaomi focuses on FP4 (4-bit floating-point) quantization, which compresses model size aggressively while retaining as much of the native inference precision as possible. A specially tuned version even reaches generation speeds of up to 1,000 tokens/s on general-purpose GPUs.
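For readers curious what 4-bit floating-point quantization looks like in practice, here is a minimal sketch. It assumes the common E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit) with one shared scale per block of weights; which FP4 layout and scaling scheme Xiaomi actually uses is not stated, so treat every name below as illustrative.

```python
# Magnitudes representable in the E2M1 4-bit float layout.
# Assumption: the article does not say which FP4 layout Xiaomi uses.
FP4_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_block(weights):
    """Map one block of floats to FP4 values plus a shared per-block scale."""
    max_abs = max(abs(w) for w in weights)
    # Scale so the largest weight lands on the largest FP4 magnitude.
    scale = max_abs / FP4_MAGNITUDES[-1] if max_abs > 0 else 1.0
    codes = []
    for w in weights:
        target = abs(w) / scale
        nearest = min(FP4_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(nearest if w >= 0 else -nearest)
    return codes, scale


def dequantize_block(codes, scale):
    """Recover approximate float weights from FP4 values and the block scale."""
    return [c * scale for c in codes]


block = [0.12, -0.75, 0.33, 1.5, 0.02, 0.9]  # made-up example weights
codes, scale = quantize_block(block)
restored = dequantize_block(codes, scale)
max_error = max(abs(a - b) for a, b in zip(block, restored))
print("FP4 values:", codes)
print("scale:", scale, "max abs error:", round(max_error, 3))
```

Each weight is stored in 4 bits instead of 16, at the cost of snapping to one of a handful of levels. The per-block scale is what keeps the rounding error tolerable when weight magnitudes differ across the model.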
With computing power as a safety net, Xiaomi previously opened a small-scale closed beta of Xiaomi miclaw, a native mobile agent. It reaches deep into the system's lower layers and can invoke over 50 system-level tools. For example, upon receiving a ticket purchase SMS, it can automatically read the message, create a calendar event, set an alarm, check the weather, and even pre-open the boarding pass. The whole chain takes seven steps and requires only your final confirmation.
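The 'do everything, then ask once' pattern described above can be sketched roughly as follows. This is not miclaw's implementation: the tool functions are hypothetical stand-ins, the context values are made up, and only the five actions named in the text are modeled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[dict], str]  # returns a short description of what was staged


# Hypothetical stand-ins for system-level tools; real ones would call OS services.
def read_sms(ctx): return f"parsed trip on {ctx['date']}"
def create_event(ctx): return f"calendar event drafted for {ctx['date']}"
def set_alarm(ctx): return f"alarm drafted for {ctx['alarm']}"
def check_weather(ctx): return f"forecast requested for {ctx['city']}"
def open_boarding_pass(ctx): return "boarding pass preloaded"


STEPS = [
    Step("read message", read_sms),
    Step("create calendar event", create_event),
    Step("set alarm", set_alarm),
    Step("check weather", check_weather),
    Step("pre-open boarding pass", open_boarding_pass),
]


def run_with_confirmation(steps, ctx, confirm):
    """Stage every step, then commit only if the user approves the summary."""
    staged = [f"{s.name}: {s.action(ctx)}" for s in steps]
    return staged if confirm(staged) else []


ctx = {"date": "2026-07-01", "alarm": "06:30", "city": "Beijing"}  # illustrative values
approved = run_with_confirmation(STEPS, ctx, confirm=lambda summary: True)
print("\n".join(approved))
```

Keeping the confirmation as a single gate at the end is the design point: the agent does the tedious chaining, and the user keeps the final say.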
Even more formidable, it fully integrates with the Mi Home IoT ecosystem, capable of reading and scheduling over 1 billion smart devices.
Instead of competing purely on software applications, Xiaomi uses Agents to activate its vast Mi Home hardware ecosystem—this is Xiaomi's unique moat.
OPPO, which two years ago boldly announced its 'all-in AI phone' strategy, finally reconstructed its system with a cohesive AI approach at ODC25 and ColorOS 16. Instead of stacking parameters, they introduced three technological foundations:
On-Device Compute achieves peak theoretical performance of 300 tokens/s locally with 128K long contexts;
PersonaX memory symbiosis engine builds multimodal 'lifelong memories' for users;
Agent Matrix intelligent agent ecosystem framework empowers Xiao Bu with cross-device task execution capabilities.
At the functional level, activating 'One-Touch Flash Note' while watching Bilibili allows AI to generate outlines and mind maps nearly in real-time. Clicking on the outline timeline can instantly jump back to the corresponding video segment. One-touch bookkeeping and order code recording functions using image recognition are also practical, complete with dedicated dynamic icons.
For ordinary users, these 'small but certain happiness' features that save daily hassles and grow more attuned with use are far more perceptible than model computing power and technical details.
As one of the earliest domestic manufacturers to dive deep into self-developed large model matrices, vivo has been sprinting toward lightweight on-device AI since releasing the Blue Heart large model in 2023.
Vivo understands user pain points: if the internet goes down, does AI become useless? Through Xiao V Memory 2.0, vivo directly builds a completely offline knowledge graph on the phone. Without an internet connection and with absolute privacy protection, Blue Heart Xiao V can still accurately retrieve information from massive photos and complex files.
In Lei Technology's (ID: leitech) earlier hands-on testing, budget phones struggled with large models, while the flagship vivo X300 Pro handled complex image recognition in just 32 seconds. This skill in scheduling computing power makes us look forward to vivo's upcoming on-device AI foldable, the vivo Fold 6.
In 2026, Honor largely avoids discussing its large model parameters, instead taking a clever route by focusing on reconstructing underlying interactions and hardware form factors.
In terms of hardware, the Robot Phone showcased at MWC 2026 carries a miniature three-axis mechanical stabilization gimbal on its back, letting the lens swivel like a neck to track subjects automatically and even move to the rhythm of music. It offers a physical-interaction answer to increasingly homogeneous imaging flagships.
On the system side, the YOYO agent, based on the AHI (Personal + Global + Edge Collaboration) strategy, can automatically execute over 3,000 scenarios and was the first to integrate with WeChat's A2A protocol.
Honor's approach of avoiding direct competition and using system-level scheduling to connect third-party vertical large models has allowed them to progress smoothly in breaking down app silos.
As the 'parent' of the Android ecosystem, Google's ambitions in the AI era extend far beyond being just an app—it aims to completely dominate the system foundation.
In on-device deployment, Google introduced the Gemma 4 model, designed for fully offline operation, and is testing Mobile Actions functionality in the Google AI Edge Gallery, attempting to convert natural language instructions directly into system-level operations.
While Lei Technology's (ID: leitech) earlier hands-on testing showed poor performance on budget phones, this is actually Google 'setting the rules' for the entire industry. By imposing hard requirements on the system-level software ecosystem, Google is pushing chip manufacturers like Qualcomm and MediaTek to iterate faster and bring NPU computing power down to mid- and low-end chips.
Google's most formidable trump card is its ecosystem dominance. It has the Google suite above, the Android ecosystem below, and the strong capabilities of Gemini itself. When Apple needs deep integration with Gemini and various Android flagships use it as the core brain for all-scenario Agents, Google has already won big.
Google is not just setting system-level scheduling standards for on-device AI; it's reissuing tickets for the mobile ecosystem of the next decade.
In the on-device large model and Agent race, Samsung takes a highly pragmatic approach. Having started late, it has chosen to outsource.
In overseas markets, Samsung has tied itself closely to Google, using the Gemini large model as the foundation for the Galaxy S26 series. During MWC demonstrations, its Agent could scan family group chats in the background. Upon detecting a discussion about ordering pizza, it would automatically open a food delivery app and add items to the cart, stopping only for user confirmation before checkout.
In the Chinese market, for compliance, Samsung flexibly integrates AI services from domestic giants like Baidu's Wenxin Yiyan and Meitu.
While this may seem like 'eating at a hundred tables,' I must admit Samsung's skill in refining the experience.
Whether it's circle-to-search, real-time call translation, or intelligent photo editing in the gallery, Samsung seamlessly stitches together these seemingly piecemeal capabilities. As long as they complement their top-tier hardware to keep consumers comfortable, the engine's origin doesn't matter.
Running large models locally on phones sounds great: no network dependence, zero latency, and absolute privacy protection.
But the reality is that flagship phones running local AI enjoy technological luxury, while mid-to-low-end phones suffer.
In April, Google introduced the mobile-focused Gemma 4 model designed for fully offline operation, with online tests on flagship phones receiving unanimous praise.
However, when Lei Technology (ID: leitech) tested it on the vivo Y500 Pro, equipped with a mid-range Dimensity 7400 chip and NPU 655, the results were eye-opening.
Recommendations became a disaster zone of useless information: asking for movie recommendations for a long high-speed rail ride generated 500 words locally in 2.8 minutes, ending with unnecessary reminders to bring headphones.
Logic problems stumped it: solving a seating arrangement logic puzzle took 3.3 minutes of on-screen calculation (without allowing background tasks), ultimately yielding an incorrect answer.
Image recognition crashed: showing a picture of a large mall, it failed to recognize the prominent Apple Store sign; presenting a green plant image left it loading in circles for 5 minutes before crashing.
In contrast, the same model on the flagship vivo X300 Pro solved the logic puzzle in 1.6 minutes and recognized images in just 32 seconds.
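To put those timings in perspective, here is a quick calculation using only the figures quoted above. The helper names are ours, and word count is a rough proxy for tokens rather than a measured token rate.

```python
def words_per_second(words: int, minutes: float) -> float:
    """Average generation rate over the whole response."""
    return words / (minutes * 60)


def speedup(slow_minutes: float, fast_minutes: float) -> float:
    """How many times faster the second device finished the same task."""
    return slow_minutes / fast_minutes


# Mid-range phone: 500 words of movie recommendations in 2.8 minutes.
midrange_rate = words_per_second(500, 2.8)

# Logic puzzle: 3.3 minutes on the mid-range phone vs 1.6 minutes on the flagship.
puzzle_speedup = speedup(3.3, 1.6)

print(f"mid-range generation rate: {midrange_rate:.2f} words/s")
print(f"flagship speedup on the logic puzzle: {puzzle_speedup:.2f}x")
```

Roughly three words per second is slower than most people read, which is why the mid-range experience feels like waiting rather than conversing.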
This is the harsh industry reality: without strong hardware computing power, on-device large models are purely marketing gimmicks that torture users.
To keep local RAM and bandwidth from being overwhelmed, major manufacturers are starting to rework algorithms at the lowest layers.
For example, Xiaomi focuses on FP4 (4-bit floating-point) quantization, which retains as much of the native inference precision as possible while compressing model size aggressively, reaching generation speeds of up to 1,000 tokens/s on general-purpose GPUs.
Transsion takes a practical approach, compressing offline models into phones to enable real-time offline translation of various complex dialects in network-poor regions like Africa and the Middle East, practically eliminating the digital divide with on-device AI.
AI phones in 2026 are essentially vying for operating system entry points.
Major manufacturers collectively suffer from entry point anxiety, cramming AI into power buttons, the minus-one screen (the leftmost home screen page), and sidebars. Some are even testing physical AI buttons.
But the more entry points are stacked, the more confused users become. Truly practical Agents should enable phones to complete operations autonomously, reducing user steps.
The industry's biggest headache used to be app silos. A phone assistant trying to send a WeChat message had to rely on brute-force screen reading and simulated clicks, and was easily blocked by risk controls, as happened with last year's Doubao phone.
Recently, WeChat finally opened the door a crack, launching the A2A (Agent-to-Agent) protocol with Huawei, Honor, Xiaomi, and other major players. Large models no longer have to play dumb; instead, phone assistants send work orders directly to the WeChat Agent, which executes them and returns results.
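The work-order flow described above can be pictured with a small sketch. This is only a conceptual model of agent-to-agent delegation: the class names, fields, and intent string are hypothetical and do not reflect WeChat's actual A2A message format, which is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkOrder:
    target_agent: str  # which app-side agent should handle the request
    intent: str        # what the user wants done
    params: dict       # details needed to carry it out


@dataclass(frozen=True)
class Result:
    ok: bool
    detail: str


class MessagingAgent:
    """Hypothetical app-side agent: it executes inside its own app boundary."""
    name = "messaging"

    def handle(self, order: WorkOrder) -> Result:
        if order.intent != "send_message":
            return Result(False, f"unsupported intent: {order.intent}")
        return Result(True, f"sent to {order.params['to']}: {order.params['text']}")


class PhoneAssistant:
    """Hypothetical system agent: it routes work orders and never touches the app UI."""

    def __init__(self, agents):
        self.registry = {agent.name: agent for agent in agents}

    def delegate(self, order: WorkOrder) -> Result:
        agent = self.registry.get(order.target_agent)
        if agent is None:
            return Result(False, f"no agent registered for {order.target_agent}")
        return agent.handle(order)


assistant = PhoneAssistant([MessagingAgent()])
order = WorkOrder("messaging", "send_message", {"to": "Sandwich", "text": "hello"})
result = assistant.delegate(order)
print(result)
```

The contrast with screen reading and simulated clicks is that the app stays in control of its own execution, so there is nothing for its risk controls to flag.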
Lei Technology's (ID: leitech) hands-on test with the Honor Magic8 RS showed that after activating YOYO, saying 'Send WeChat to Sandwich: Genshin Impact starts' was enough for the system to cross the ecosystem barrier and execute the request directly.
Without A2A integration, phone assistants encountering such instructions could only get as far as opening WeChat before being blocked by system pop-ups.
WeChat's openness provides the industry with a blueprint for efficient collaboration between major manufacturers' agents without traffic poaching.
At MWC 2026, we also saw excellent examples. The Nubia M153 uses on-device Nebula-GUI to run virtual machines in the background, directly simulating human fingers to operate in apps without APIs, enabling one-sentence cross-platform price comparisons and bookings.
After examining the true strategies of major players in 2026, it's clear that despite executives' verbal sparring on stage, when it comes to implementation, everyone is moving in the same direction:
Models must compress toward on-device deployment; otherwise, low-end computing power merely tortures users;
AI must evolve into multimodal Agents. From AI phones to Agent phones, future smartphones must grow 'eyes and hands' to break down app barriers;
Privacy and security must remain an unshakable bottom line.
The real battle in AI smartphones is not about whose model name is longer, but about how much time they can save for ordinary people every day. After all, smartphones are not thesis defense arenas. In the end, what determines the winner is not the parameter list, but those few minutes of usage every day.
Source: Leikeji
The images in this article are from 123RF royalty-free image library.