03/08 2026

A set of data from OpenRouter, the world's largest AI model API aggregation platform, has caused a significant stir both domestically and internationally.
As of February 28, 2026, the total Token consumption of the top ten models on the platform has exceeded 28.7 trillion, with Chinese models contributing over 14.69 trillion. This marks the first time in history that Chinese models have accounted for more than half of the monthly Token calls, surpassing U.S. models.
From a weekly perspective, from February 16 to 22, Chinese models saw a weekly call volume of 5.16 trillion Tokens, while U.S. models fell to 2.7 trillion during the same period, giving Chinese models a 61% global share. Among the top five models in terms of call volume, China held four spots—MiniMax M2.5, Moonshot AI's Kimi K2.5, DeepSeek V3.2, and Zhipu AI's GLM-5.
Although Chinese models' call volume declined the following week and the lead proved short-lived, the ability to compete head-to-head with U.S. models on the world's largest API aggregation platform, and briefly take the lead, is itself a testament to strength.
Additionally, among OpenRouter's user base, American developers account for 47.17%, while Chinese developers make up just 6.01%. Reports also indicate that 80% of U.S. AI startups use Chinese open-source models in their product development.
This means that the primary force driving Chinese models to the top consists of overseas developers from Silicon Valley and Europe, rather than domestic market hype.
This is because large models have transitioned from a single-dimensional competition of "who is smarter" to a multi-dimensional competition of "who is smart and cost-effective."
1. The Irresistible Appeal: American Developers Fall for Chinese Tokens
The ability of Chinese AI models to surpass the United States in global call volume is backed by a systemic advantage formed by multiple overlapping factors, with the most direct driver being affordability.
Let's look at some numbers.
Research from Changjiang Securities shows that in terms of input pricing, both MiniMax M2.5 and Zhipu AI's GLM-5 are priced at $0.3 per million Tokens, while Anthropic's Claude Opus 4.6 costs $5—16.7 times more than Chinese models.
The output side is even more striking. MiniMax M2.5 is priced at $1.1 per million Tokens and Zhipu AI's GLM-5 at $2.55 per million Tokens, while Claude Opus 4.6 costs $25 per million Tokens—approximately 22.7 times and 9.8 times more expensive, respectively. Alibaba's Qwen 3.5, released at the end of February, is priced at just 0.8 RMB per million Tokens, one-eighteenth the price of Google's Gemini.
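The price multiples cited above follow directly from the per-million-token prices. A minimal sketch, using only the figures quoted in this article (the dictionary keys are labels, not official model identifiers):

```python
# Per-million-token prices cited above, in USD.
INPUT_PRICES = {"MiniMax M2.5": 0.3, "GLM-5": 0.3, "Claude Opus 4.6": 5.0}
OUTPUT_PRICES = {"MiniMax M2.5": 1.1, "GLM-5": 2.55, "Claude Opus 4.6": 25.0}

def price_multiple(prices, expensive, cheap):
    """How many times more expensive one model is than another, to one decimal."""
    return round(prices[expensive] / prices[cheap], 1)

print(price_multiple(INPUT_PRICES, "Claude Opus 4.6", "MiniMax M2.5"))   # 16.7 (input side)
print(price_multiple(OUTPUT_PRICES, "Claude Opus 4.6", "MiniMax M2.5"))  # 22.7 (output side)
print(price_multiple(OUTPUT_PRICES, "Claude Opus 4.6", "GLM-5"))         # 9.8
```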
In many daily scenarios, especially with the advent of the Agent era, user demand for affordable, high-volume computing power outweighs the need for "top-tier intelligence."
In February of this year, the open-source framework OpenClaw gained popularity, transforming AI from a "chat tool" into a "digital employee" capable of independent work.
An Agent task can easily consume hundreds of thousands to millions of Tokens, making usage-based API costs a significant expense for developers. Moonshot AI seized the moment and launched KimiClaw, which supports one-click deployment. As a result, Kimi K2.5's call volume within 20 days of release surpassed its total for the previous year, with cumulative revenue also exceeding the 2025 total.

This is also why Google and Anthropic have banned accounts making fully automated calls under subscription models—subscription fees are limited and far from covering the computing costs of fully automated calls.
When consumption grows exponentially, the unit price advantage of Tokens becomes a competitive lifeline.
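The subscription-economics problem is simple arithmetic. A rough sketch with purely illustrative numbers (the $20 subscription and 2M-tokens-per-day workload are assumptions, not any provider's actual figures; only the per-million output prices come from the article):

```python
# Illustrative assumptions: a $20/month flat subscription vs. an automated
# agent that burns 2 million output tokens per day, every day.
SUBSCRIPTION_USD = 20.0
TOKENS_PER_DAY = 2_000_000
DAYS = 30

def monthly_compute_cost(price_per_million, tokens_per_day=TOKENS_PER_DAY, days=DAYS):
    """Provider-side cost of serving the workload for a month, in USD."""
    return round(price_per_million * tokens_per_day / 1_000_000 * days, 2)

# At the cited $25/M output price, the workload costs $1500/month against
# a $20 subscription; at $1.1/M it costs $66.
print(monthly_compute_cost(25.0))  # 1500.0
print(monthly_compute_cost(1.1))   # 66.0
```

Under these assumptions a flat fee is underwater by two orders of magnitude at the high price point, which is consistent with providers banning fully automated calls on subscription plans.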
This cost advantage doesn't come out of nowhere; it's underpinned by electricity and engineering.
At the end of the day, computing power relies on electricity.
China's industrial electricity prices are 30% to 40% lower than those in the U.S., with green electricity in central and western China being 50% to 70% cheaper. Additionally, China's large industrial electricity consumption allows for the full utilization of off-peak electricity for model training, forming a physical cost moat for Chinese AI companies.
On the other hand, there's the engineering capability honed by necessity. Since April 2024, Chinese AI companies have been operating under a ban on cutting-edge chips. Unable to access the best hardware, they've pushed existing chips to their limits.
Chinese models widely adopt Mixture of Experts (MoE) architectures, a technical approach that reconstructs the logic of computing power consumption. A model with hundreds of billions of parameters activates only a small subset of its "expert networks" when handling simple questions, adopting an "on-demand activation" model that saves electricity and computing power.
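The "on-demand activation" idea can be sketched as a top-k router: the gate scores every expert but routes each token to only a few of them, so most expert parameters do no work on any given token. The expert count and k below are arbitrary illustration values, not any specific model's configuration:

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 64   # total expert networks in the layer (illustrative)
TOP_K = 2          # experts actually activated per token (illustrative)

def gate(logits, k=TOP_K):
    """Pick the top-k experts by router score and softmax-normalize their weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}

# One token's router scores over all experts; only TOP_K get any compute.
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
active = gate(logits)
print(len(active), "of", NUM_EXPERTS, "experts run")   # 2 of 64 experts run
print(f"active fraction: {TOP_K / NUM_EXPERTS:.1%}")   # active fraction: 3.1%
```

With this routing, a model can carry hundreds of billions of parameters while each token touches only a small fraction of them, which is where the electricity and compute savings come from.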
Finally, there's the positive cycle of the open-source ecosystem.
Over the past year, Chinese large models' share of global Token consumption has grown by 421%. A Stanford report states that from August 2024 to August 2025, Chinese developers contributed 17.1% of Hugging Face's total downloads, slightly higher than the U.S.'s 15.8%.
The open-source ecosystem lowers the barrier to entry for global developers and allows Chinese models to iterate rapidly through continuous technical feedback, expanding their combined advantages in capability and price. As Silicon Valley investor Aditya Agarwal put it, "Over 50% of large model calls are made through cheap open-source models. Chinese models are effectively supporting most AI applications, and U.S. counterparts can't even replace them."
The success of Chinese AI models going global is the result of technological architectural innovation, extreme cost control, open-source ecosystems, and scenario adaptation—a concentrated eruption of systemic advantages.
2. Overseas Expansion Models: From Applications to Computing Power and Ecosystems
If call volume data explains "how strong" Chinese AI is, the next question is: How do these Tokens flow globally?
In recent years, the mainstream approach for Chinese AI going global has been "application export"—packaging AI capabilities into apps and delivering them to overseas users. ByteDance's Gauthmath, Meitu's imaging products, and Kuaishou's KLING AI all follow this path.
To this day, this approach continues to contribute significant user bases and revenue.
Take Talkie, for example, an emotional-companionship app covering over 200 countries, with growing penetration among North American Gen Z. Every user conversation consumes Tokens. Such consumer-side revenue accounts for over 70% of MiniMax's income and continues to grow rapidly: in February 2026, average daily Token consumption was more than six times that of December 2025.
ByteDance's Gauthmath, which follows the same logic, captured a 47% share of the U.S. photo-based math solving market, displacing the established player Mathway.
These models don't charge users directly based on Tokens but monetize through subscriptions, in-app purchases, and advertising. However, at their core, they still consume Chinese computing power, forming the "user base" for Chinese AI's global expansion.
If AI's global expansion is likened to a supply chain, applications are the downstream, while computing power is the upstream. Chinese companies first build products and traffic downstream, then move upstream to develop infrastructure.
On one hand, they directly export computing power through API pipelines, treating it like utilities.
Overseas developers call the APIs of Chinese large models through aggregation platforms like OpenRouter, with inference completed in Chinese data centers and payment based on Tokens. Throughout this process, the computing power and electricity stay domestic; only the value crosses the border, delivered through Tokens.
This is a classic "water and electricity" business. Developers don't need to deploy models or buy GPUs themselves; their applications can run on Chinese models.
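From the developer's side, the "utility" model is just an HTTP call. A minimal sketch, assuming OpenRouter's OpenAI-compatible chat-completions endpoint; the model slug `deepseek/deepseek-chat` is illustrative, not a guaranteed current identifier:

```python
import json

# OpenRouter exposes an OpenAI-compatible chat endpoint; the model slug
# below is an illustrative assumption, not a guaranteed current ID.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model, prompt, max_tokens=512):
    """Assemble the JSON body for an OpenAI-style chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = build_request("deepseek/deepseek-chat", "Summarize this changelog.")
print(json.dumps(body, indent=2))
# Sending it is a single POST to OPENROUTER_URL with an
# "Authorization: Bearer <key>" header; inference then runs wherever
# the provider hosts the model, and billing is per Token.
```

No GPUs, no deployment: switching the `model` string is all it takes to route the same application onto a Chinese model.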
According to reports, Moonshot AI's team responsible for API services has rapidly expanded, now reporting directly to President Zhang Yutong as an independent business unit. Organizational adjustments like this underscore the rapidly rising importance of the API business.
From a commercial perspective, this model's advantage lies in its scalability and impressive profit margins. Moreover, with the Agent era's arrival, the Token consumption per task is growing exponentially, amplifying the API business's potential.
On the other hand, by building open-source ecosystems, they pave the way for computing power exports.
Alibaba's Tongyi Qianwen and DeepSeek series have chosen a seemingly "free" path: fully open-sourcing model weights, toolchains, and engineering paradigms, allowing overseas developers to download and deploy them on local servers at no cost.
The goal of offering free access is to make Chinese models a default tool in global developers' kits, becoming part of their technical stacks. Once developers become familiar with open-source models, they'll naturally prioritize calling APIs from the same series when developing commercial applications.
The number of derivative models uploaded based on Alibaba and DeepSeek's open-source models has already surpassed those based on mainstream U.S. models. This indicates that global developers are building a vast technical ecosystem around Chinese open-source models. Once an ecosystem forms, migration costs become extremely high.
Today, Chinese AI's global expansion is no longer a single "application export" but a three-tiered structure: The bottom layer is the open-source ecosystem, winning developer mindshare through openness; the middle layer is API computing power export, directly selling Tokens to global developers as the core commercial engine; and the top layer is application export, reaching end-users through products, serving as both a traffic entry point and a key scenario for computing power consumption.
These three layers support each other, collectively demonstrating that Chinese computing power is becoming the underlying infrastructure for global AI.
3. The Second Half Challenge: Commercial Advantages Meet Regulatory Barriers
The numbers on OpenRouter are indeed impressive, but OpenRouter doesn't represent the full picture.
In the developer market (individual developers, startups, Agent applications), decision-making chains are short, with cost-effectiveness and ease of use as the core metrics. Developers often decide for themselves which models to use. Under this logic, Chinese models' "affordability and abundance" are absolute advantages.
The enterprise market is different. Governments, finance, healthcare, and critical infrastructure involve long decision-making chains, covering compliance, security, auditing, supplier stability, and more.
The overseas enterprise market is even more complex.
Thus, one question is unavoidable: In international competition, pure commercial advantages like usability and low cost may not suffice.
For example, NVIDIA's H200 was previously banned from export. Although imports are now allowed, U.S. policies in the AI competition landscape could "reverse" again at any time, and current inference clusters still rely on NVIDIA's H100/H200.
Of course, blockades have a dual nature. On one hand, they increase training costs and slow model iteration; on the other hand, they force engineering optimizations to improve efficiency, driving progress in domestic chips.
But risks remain. Research from Galaxy Securities points out that the global model iteration cycle is shortening, with mainstream models now updating every few months instead of every six months. If core capability improvements slow down, cost advantages may quickly lose appeal in high-end markets.
Morgan Stanley's chief economist, Xing Ziqiang, believes that while there's certainly room for Token exports, China's open-source large models and Token exports leveraging electricity advantages shouldn't be overhyped while ignoring geopolitical and security considerations.
He cites China's 5G equipment sector, which also had cost and technological advantages but saw Chinese 5G base stations replaced in many European and U.S. telecom networks after 2018–2019.
In the enterprise market, price-sensitive SMEs may be penetrated by Chinese models' cost-effectiveness, but in sectors involving data sovereignty and critical infrastructure like government, finance, and healthcare, the entry logic shifts from "cost-effectiveness" to "compliance trust, brand recognition, and ecosystem lock-in."
The U.S. is systematically erecting entry barriers for the enterprise market through investment reviews, standard-setting, and data sovereignty rules.
This means the "geopolitical ceiling" is lowering.
In December 2025, the U.S. government proposed the so-called "Pax Silica" initiative, claiming to unite countries with leading global tech companies or other strategic resources to ensure "supply chain security" and more.
Experts argue that this is an attempt to reshape global technological division of labor and capital flows through rules, investments, and project lists—appearing as ecosystem reshaping but actually being exclusionary integration under packaging.
Who "they" refers to is self-evident.
From chip blockades to "Pax Silica," from containment to rule exportation, the U.S. aims to rewrite the rules of the game and seize discourse power at the ecosystem level.
Thus, surpassing the U.S. in model call volume is a phased achievement, but it's only half the story.
In the second half of AI's global expansion, while maintaining cost advantages, more complex challenges must be faced. Some can be addressed by improving model performance, system efficiency, and competitiveness, but others have no clear answers.