05/28 2026


Early yesterday morning, an announcement from Xiaomi's MiMo large model team sent ripples through the developer community.
The announcement's core message was straightforward: a significant price reduction.

Xiaomi has permanently cut API prices for the MiMo-V2.5 series, by as much as an astonishing 99%. Pricing also no longer varies with context window length. The Credits system previously introduced into billing stays in place, while each package's quota grows by roughly 5 to 8 times.
Seen more broadly, this is more than a promotion. It comes just four days after DeepSeek V4 reset the industry's price benchmarks, and it is the domestic large model market's first swift response.
Behind this so-called 'inclusive' strategy lie the harsh survival realities facing domestic models and a profound misconception about the value of tokens.
01 DeepSeek Poses the Question, Xiaomi Answers
Many AI industry observers were immediately struck by how quickly Xiaomi responded. Liang Wenfeng's DeepSeek had only just 'cast the first stone' by cutting its API prices when Xiaomi matched the prices of DeepSeek's V4 Pro and V4 Flash models.
The signal is crystal clear. The second price war in the domestic AI industry, which I had long anticipated, has already begun and has quietly moved into a 'red ocean' melee.
Objectively speaking, domestic models still trail GPT-5.5 and Opus 4.7 by a wide margin at this stage. The generational gap will be hard to close in the near term, and I personally believe it will keep widening.
At the frontier of intelligence, domestic models are still playing catch-up. In large-scale deployments of 'non-complex tasks,' however, the absolute differences in intelligence among domestic models are minimal.
When intelligence levels fail to create a generational divide, cost-effectiveness (ROI) becomes the sole differentiator.
Liang Wenfeng's DeepSeek has used aggressive pricing and back-to-back price cut announcements to make a point. For a model with top-tier domestic performance, low prices are the most effective way to attract traffic and build 'substitution dependency' among users.
Xiaomi's swift follow-up confirms a second logic: in this race, those who fail to adapt will simply watch their users drift away. The two best-performing domestic models have both cut prices this drastically, which suggests that some vendors' subscription and API prices were inflated. This is no longer a question of 'wanting to cut prices.' It is a matter of survival: price others out or be priced out.
02 The Numbers Game Behind 11 Billion Credits
Prices have shifted seismically, but the Credits billing unit that Xiaomi introduced with its subscription service remains unchanged.
From a marketing standpoint, this is indeed a simple yet ingenious move: 99 yuan for 1.3 billion tokens sounds like a steal; 99 yuan for 11 billion Credits seems like a giveaway.
Numbers this large go a long way toward easing users' worry that a price cut means a quality cut. Even so, it is prudent to stay calm and examine the business logic behind them.
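The gap between the two headline numbers is pure unit conversion. The minimal Python sketch below assumes, as the comparison implies, that the 1.3 billion tokens and the 11 billion Credits describe the same 99-yuan package. The announcement's actual Credits-to-token conversion rules are not given here, so treat the derived ratio as illustrative.

```python
# Figures quoted in the article. Treating both as the SAME 99-yuan package
# is an assumption made for illustration.
PRICE_YUAN = 99.0
TOKENS = 1.3e9    # "1.3 billion tokens"
CREDITS = 11e9    # "11 billion Credits"


def yuan_per_million(units: float, price_yuan: float = PRICE_YUAN) -> float:
    """Price in yuan per one million billing units (tokens or Credits)."""
    return price_yuan / (units / 1e6)


# How many Credits stand in for one token under the assumption above.
credits_per_token = CREDITS / TOKENS

print(f"yuan per 1M tokens:  {yuan_per_million(TOKENS):.4f}")
print(f"yuan per 1M Credits: {yuan_per_million(CREDITS):.4f}")
print(f"Credits per token:   {credits_per_token:.2f}")
```

Under that assumption, the money and the compute it buys are identical. Only the unit changes, which is why the Credits figure reads roughly 8.5 times larger.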
On the API side, Xiaomi can announce cuts of up to 99% largely because its original pricing looked dated and overly conservative after DeepSeek's move. To stay competitive and keep users from being poached overnight, the API needed a cut this drastic.

On the subscription side, Xiaomi was the first domestic vendor to launch a Token Plan. This billing method is more transparent and easier to explain, and it is gradually becoming a global standard. Xiaomi officially claims that each plan's quota now grows by roughly 5 to 8 times.
Take the Lite tier, the cheapest and most widely subscribed plan. Its token quota rose from 60 million to 500 million, which cuts the unit cost by about 88%, slightly less than the API cut. Higher-tier plans saw even smaller cost reductions.
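The 88% figure follows directly from the quota change if the Lite tier's price stayed the same, which the article's figure implicitly assumes. The short Python sketch below reproduces that arithmetic and also shows what the official 5 to 8 times range implies for unit cost. The function names are illustrative.

```python
def unit_cost_reduction(old_quota: float, new_quota: float) -> float:
    """Fractional drop in per-token cost when a plan's quota grows
    while its price stays fixed (an assumption, see lead-in)."""
    if old_quota <= 0 or new_quota <= 0:
        raise ValueError("quotas must be positive")
    # At a fixed plan price, unit cost is proportional to 1 / quota,
    # so new_cost / old_cost = old_quota / new_quota.
    return 1.0 - old_quota / new_quota


def reduction_from_multiplier(multiplier: float) -> float:
    """Same idea expressed as 'the quota grew N times'."""
    return unit_cost_reduction(1.0, multiplier)


lite = unit_cost_reduction(60e6, 500e6)  # Lite tier: 60M -> 500M tokens
print(f"Lite tier unit-cost reduction: {lite:.0%}")
for n in (5, 8):
    print(f"{n}x quota -> {reduction_from_multiplier(n):.1%} lower unit cost")
```

The same relationship explains why subscriptions lag the API cut. A 99% unit-cost reduction would require a quota 100 times larger, well beyond the 5 to 8 times on offer.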
The gap is understandable. A subscription is effectively a 'wholesale price' and is already far cheaper per token than pay-as-you-go API usage. Both in China and abroad, users tend to choose a subscription whenever one is available.
Xiaomi's moves therefore serve a clear goal: attract traffic by matching DeepSeek's API prices, then retain high-frequency users through subscriptions. The subscription discount is less dramatic than the API cut, but DeepSeek offers no subscription at all. That makes Xiaomi's Token Plan, as the second 'crab eater' (an early adopter willing to try something untested), currently the most cost-effective 'compute package' on the market.
This differentiated design also steers user behavior. It encourages high-frequency, repetitive agent calls, because these are the scenarios in which Xiaomi's costs are lowest and the prices users perceive are cheapest.
03 Same Price, Different Value
When prices are aligned to the same baseline, the sole determinant of success becomes the productive value of tokens.
According to Artificial Analysis evaluations and real-world feedback, Xiaomi's MiMo V2.5 Pro and DeepSeek V4 Pro offer distinctly different kinds of value.

DeepSeek is more of a specialist. It is slightly ahead in coding and logical reasoning and more successful at capturing user mindshare, and it is currently the first choice for many individual developers and small development studios. However, its lack of multimodal capabilities severely limits the range of scenarios it can expand into. The image recognition in its current expert mode is only marginally better than nothing.
Xiaomi, by contrast, has built an all-rounder, explicitly billed as 'omni-modal' at launch. At the same API price, Xiaomi's tokens can handle richer interaction forms such as images, audio, and video, which gives it an edge over text-only DeepSeek in agent applications.
This reinforces a point I have made repeatedly. In the agent era, multimodal capabilities must not be overlooked; if anything, they deserve more attention.
So where does Xiaomi's confidence to cut prices come from? The technical details in the announcement offer a glimpse of how Xiaomi lowers the physical cost of each token.
Two terms deserve close attention: SGLang HiCache and SWA (Sliding Window Attention). Put simply, Xiaomi's view is that the most expensive part of large model inference is the KV cache held in GPU memory.
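A size estimate makes the KV cache's cost concrete. Under standard multi-head or grouped-query attention, each layer caches one key and one value vector per KV head for every token in context. The Python sketch below uses a hypothetical model shape (48 layers, 8 KV heads, head dimension 128, 16-bit values), not MiMo's published configuration.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    bytes_per_elem: int = 2,  # fp16 / bf16
    batch_size: int = 1,
) -> int:
    """Memory needed to hold the KV cache under full (global) attention."""
    # Factor 2: every layer stores one key and one value vector
    # per KV head for every cached token.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem * batch_size


GIB = 1024 ** 3

# Hypothetical model shape, NOT MiMo's published configuration.
config = dict(num_layers=48, num_kv_heads=8, head_dim=128)

per_token = kv_cache_bytes(**config, seq_len=1)
long_ctx = kv_cache_bytes(**config, seq_len=128 * 1024)
print(f"per token: {per_token / 1024:.0f} KiB")
print(f"128K-token context: {long_ctx / GIB:.0f} GiB per sequence")
```

The total grows linearly with both context length and the number of concurrent sequences, so long contexts and many users fill GPU memory quickly.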
With SWA, the model no longer has to spend massive amounts of memory remembering irrelevant information from tens of thousands of words earlier. This explains why Xiaomi dropped context-length-based tiered pricing this time. Meanwhile, multi-level storage optimization cuts data movement among host memory, GPU memory, and SSDs to one-seventh of its former level.
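Sliding window attention caps how far back each token can look, so the cache only ever needs the most recent window of keys and values. The minimal Python sketch below shows a causal sliding-window mask and a rolling per-layer cache built on `collections.deque(maxlen=...)`. The class name, the tiny window of 4, and the string placeholders standing in for key/value tensors are illustrative assumptions. Xiaomi's actual window size is not stated here, and neither is whether MiMo mixes SWA layers with full-attention layers as some models do.

```python
from collections import deque


def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Causal sliding-window mask: mask[i][j] is True when query i may attend to key j.
    Each position sees itself and at most window - 1 earlier positions."""
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


class RollingKVCache:
    """Per-layer KV cache that keeps only the most recent `window` tokens."""

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        # deque(maxlen=...) drops the oldest entry automatically once full.
        self.keys: deque = deque(maxlen=window)
        self.values: deque = deque(maxlen=window)

    def append(self, key, value) -> None:
        self.keys.append(key)
        self.values.append(value)

    def __len__(self) -> int:
        return len(self.keys)


cache = RollingKVCache(window=4)
for pos in range(10_000):  # stand-in for a very long conversation
    cache.append(f"k{pos}", f"v{pos}")
print(len(cache))          # bounded by the window, not the conversation length
print(list(cache.keys))
```

However long the conversation runs, the cache holds exactly `window` entries per layer, so its memory cost stops depending on context length. That property makes context-length-tiered pricing harder to justify.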
Technological leadership ultimately translates into pricing flexibility.
When Xiaomi can cut cache-hit costs to one-tenth or even one-hundredth of the previous generation's, a 99% price cut is neither charity nor mere marketing. It passes technology dividends on to users while squeezing out competitors with outdated architectures and runaway costs.
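Cheap cache hits matter most for the repetitive agent workloads described earlier, where much of each request repeats a prefix the server has already processed. The Python sketch below computes a blended per-token input price. The prices are hypothetical units relative to a cache-miss price of 1.0, chosen to mirror the article's 'one-tenth or one-hundredth' ratios. They are not Xiaomi's actual price list.

```python
def blended_price(hit_rate: float, hit_price: float, miss_price: float) -> float:
    """Average price per input token when a fraction `hit_rate` of tokens
    is served from the prefix cache and the rest is computed from scratch."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


# Hypothetical prices in arbitrary units, relative to a cache-miss price of 1.0.
MISS = 1.0
for hit_price in (0.1, 0.01):  # cache hits at 1/10 and 1/100 of a miss
    for hit_rate in (0.5, 0.9):
        cost = blended_price(hit_rate, hit_price, MISS)
        print(f"hit price {hit_price:>4}, hit rate {hit_rate:.0%}: {cost:.3f}")
```

At a 90% hit rate, moving the hit price from one-tenth to one-hundredth of the miss price cuts the blended price from 0.19 to about 0.11. Once hit rates are that high, the remaining cache-miss share dominates the bill.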
04 Beware of Treating Tokens as Currency; Intelligence Is the True Value
Beyond the DeepSeek and Xiaomi price cuts themselves, everyone in the AI industry should pay attention to a deeper issue.
In today's AI market, the term 'token' seems to have been transformed into a 'standard currency.' Over the past two months, some companies have started evaluating employees based on 'how many tokens they consume each month,' while developers have begun flaunting their token usage.
But this is fundamentally a misunderstanding: tokens are not currency, and the value of tokens varies greatly across models.
Top-tier models like GPT-5.5 and Opus 4.7 have high token value because they can accomplish complex tasks with few tokens, boasting extremely high productivity density.
In contrast, tokens from low-intelligence models, even supplied in the billions, hold near-zero productive value if they fail to solve problems.
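One way to make 'productivity density' concrete is to price the solved task rather than the token. The Python sketch below assumes independent retries until success, so the expected number of attempts is 1 / success_rate. Every model profile and number in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    name: str
    price_per_million_tokens: float  # hypothetical list price
    tokens_per_attempt: float        # hypothetical tokens used per try
    success_rate: float              # hypothetical probability a try solves the task


def cost_per_solved_task(m: ModelProfile) -> float:
    """Expected spend to get one successful result, assuming independent
    retries until success (expected attempts = 1 / success_rate)."""
    if not 0.0 < m.success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = m.price_per_million_tokens * m.tokens_per_attempt / 1e6
    return cost_per_attempt / m.success_rate


strong = ModelProfile("strong", price_per_million_tokens=10.0,
                      tokens_per_attempt=20_000, success_rate=0.8)
cheap = ModelProfile("cheap", price_per_million_tokens=0.5,
                     tokens_per_attempt=80_000, success_rate=0.05)

for m in (strong, cheap):
    print(f"{m.name}: {cost_per_solved_task(m):.2f} per solved task")
```

In this made-up comparison, the model whose tokens are 20 times cheaper costs more than three times as much per solved task. As success_rate approaches zero, cost per solved task grows without bound no matter how cheap each token is.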
Recently, many vendors at home and abroad have capitalized on the popularity of coding agent software to raise subscription and API prices. In essence, they exploit the ambiguity of the token concept to mislead users unfamiliar with AI into believing that every model's tokens are the same raw material.
Now DeepSeek has flipped the table, and Xiaomi has sealed the door. At heart, both companies are restoring tokens to their true nature: a cheap 'cyber-industrial consumable' that must be affordable enough to support AI applications at scale.
The second price war among large models has quietly begun, and once prices are driven down this time, they won't rise as easily as after the last price war. For vendors still clinging to high prices without offering top-tier intelligence, winter may arrive sooner than expected.

Finally, Xiaomi's closing line in the announcement is worth sharing: the value of technology ultimately lies in how widely it is used.
When tokens are no longer expensive, domestic large models can truly move from laboratory samples to something like water and electricity, available to everyone on demand.
And this reshuffling of intelligent value has only just begun.
