05/27 2026
After Tokens became a "hard currency," four major forces are moving fast.
"The amount of Tokens you have determines how much you can sell; the entire market is facing a supply shortage," Xin Zhou, General Manager of Baidu Intelligent Cloud's AI and Large Model Platform, told Shuzhi Qianxian. Tokens used to be sold at a discount; now, with inference demand running strong, they are unavailable even at a premium. Mao Yunhang, co-founder of AI Infra company Shishi Technology, observed that the entire market is shifting from a buyer's market to a seller's market. "APIs used to be discounted and unprofitable. Now, customers must commit to a certain consumption volume to get favorable pricing and supply," he said. Liu Weiguang, Senior Vice President of Alibaba Cloud, provided a figure: Token calls on Alibaba Cloud have increased 15-fold in the past five months.
After Tokens became scarce, four forces—cloud giants, model companies, operators, and AI Infra enterprises—have rapidly entered the Token service market.
01 Token: From Discounted Sales to Unavailable Even at a Premium
Although Token supply is tight, cloud vendors have not cashed in by selling without limit. Instead, they carefully split their GPU resources between internal model training and external Token sales. "Everyone realizes that training better models can sell more Tokens," said a person from a major company. Internal departments compete for resources, and final decisions are made through cost-benefit analyses. Last year, Tokens were sold at a discount, and selling hardware directly was more profitable. Now the situation has reversed, and companies are scaling back pure hardware sales: "Selling hardware is less profitable than selling Tokens."
Why have Tokens become so sought after 'overnight'? The reason is the surge in real demand.
Liu Weiguang said that coding has become a major dividing line. It not only generates new applications but will also unlock a large number of legacy systems in the coming year—those "too old to be moved to the cloud"—giving them new life with AI coding. More critically, non-programmers are now "coding," enabling everyone to create their own reports, analyses, and project budgets, thereby unleashing productivity.
The proliferation of agents has technically amplified Token consumption. Mao Yunhang described it as, "Tokens are gone before anything is done." Once agents have "hands and feet," every step of completing a task consumes Tokens, leading to a sharp increase in consumption.
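A minimal sketch of why multi-step agents burn through Tokens so quickly. It assumes, purely for illustration, that every step re-sends the full accumulated context to the model and that no prompt caching is applied; the function name and all numbers are hypothetical, not figures from the interviewees.

```python
def agent_token_usage(steps, system_prompt_tokens, output_per_step, tool_result_per_step):
    """Return (total_input_tokens, total_output_tokens) for a simple agent loop.

    Assumption: each step sends the whole context so far as input, then the
    model's output and the tool result are appended to that context.
    """
    context = system_prompt_tokens
    total_input = 0
    total_output = 0
    for _ in range(steps):
        total_input += context             # the full context is read again
        total_output += output_per_step    # the model emits its next action
        context += output_per_step + tool_result_per_step
    return total_input, total_output


# Hypothetical task: 2,000-Token prompt, 300 output Tokens and a
# 700-Token tool result per step.
single_call = agent_token_usage(1, 2_000, 300, 700)
ten_steps = agent_token_usage(10, 2_000, 300, 700)
print(single_call, ten_steps)
```

Under these assumptions a ten-step task reads 65,000 input Tokens, against 2,000 for a single call, because the context grows with every step.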
Over the past two years, every major company has had Token sales targets. ByteDance looks at total Token volume, while Alibaba, Baidu, and Tencent focus on model call counts. However, execution has been difficult. Xin Zhou explained, "The market didn't have that much real demand. Many uses were inappropriate or overkill, such as using large models for data cleaning or tasks that small models could handle. We call this low-quality usage." With advancements in agent technology, models, and coding capabilities, some truly valuable applications have emerged, and these applications are heavy Token consumers.
As a result, every major company has set ambitious Token targets this year, "based on judgments about real market demand."
Facing this surge in demand, Zheng Weimin, an academician at the Chinese Academy of Engineering, observed an industry shift: from MaaS (Model as a Service) to TaaS (Token as a Service). Although many companies do not explicitly distinguish between MaaS and TaaS, their focus has converged on Tokens.
Tokens are the smallest unit of information processed by large models, with 1,000 Tokens roughly corresponding to 700-800 Chinese characters. Zheng Weimin explained that Tokens now serve as three types of metrics: they are the basic unit of information processing for large models, a mapping of computational consumption in AI operations, and the standard unit for industry pricing and billing.
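As a rough illustration of that ratio, the sketch below converts a Chinese character count into a Token range. It applies only the "1,000 Tokens for roughly 700-800 characters" figure quoted above; real tokenizers differ by model, and the function name is hypothetical.

```python
def estimate_tokens(chinese_chars: int) -> tuple[int, int]:
    """Return a (low, high) Token estimate for a number of Chinese characters.

    Uses the article's rule of thumb: 1,000 Tokens ~ 700-800 characters.
    """
    low = chinese_chars * 1000 // 800        # 800 characters per 1,000 Tokens
    high = -(-chinese_chars * 1000 // 700)   # 700 characters per 1,000 Tokens, rounded up
    return low, high


# A 2,000-character document
print(estimate_tokens(2_000))
```

For a 2,000-character document this gives roughly 2,500 to 2,858 Tokens.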
Previously, MaaS addressed "model usability" with relatively crude billing methods, such as per-call settlement. TaaS, by contrast, packages AI computing power into a standardized service that is metered like water, electricity, and mobile data, refining billing granularity down to the smallest unit: the Token.
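The difference between the two billing granularities can be sketched as follows. The prices and call sizes are hypothetical, chosen only to show that per-call settlement charges a tiny request and a very large one the same amount, while per-Token billing tracks actual consumption.

```python
from dataclasses import dataclass


@dataclass
class Call:
    input_tokens: int
    output_tokens: int


def bill_per_call(calls, price_per_call):
    """Coarse billing: every call costs the same, whatever its size."""
    return len(calls) * price_per_call


def bill_per_token(calls, input_price_per_million, output_price_per_million):
    """Fine-grained billing: charge for the Tokens actually processed."""
    total_in = sum(c.input_tokens for c in calls)
    total_out = sum(c.output_tokens for c in calls)
    return (total_in * input_price_per_million
            + total_out * output_price_per_million) / 1_000_000


# One short question and one long coding request (hypothetical sizes).
calls = [Call(200, 50), Call(120_000, 8_000)]
print(bill_per_call(calls, price_per_call=0.01))
print(bill_per_token(calls, input_price_per_million=2.0,
                     output_price_per_million=8.0))
```

Splitting input and output prices is a design choice of this sketch, not something the article specifies.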
Zheng Weimin explained the deeper contradiction behind this evolution: Current AI infrastructure is primarily designed for large model training, leading to an industry dilemma of "expensive computing infrastructure, weak inference engineering, and low Token output." His judgment is that the competition in AI infrastructure has shifted from scaling computing clusters to competing on Token production efficiency per watt.
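"Token production efficiency per watt" can be read as a simple ratio: Tokens generated per second divided by power draw in watts, which is the same as Tokens per joule. The sketch below computes that ratio and the electricity needed per million Tokens; the two clusters and their figures are hypothetical.

```python
def tokens_per_joule(tokens_per_second, power_watts):
    """Tokens/s divided by watts (joules/s) gives Tokens per joule."""
    return tokens_per_second / power_watts


def kwh_per_million_tokens(tokens_per_second, power_watts):
    """Energy in kWh needed to generate one million Tokens."""
    joules = 1_000_000 / tokens_per_joule(tokens_per_second, power_watts)
    return joules / 3_600_000  # 1 kWh = 3.6 million joules


# Two hypothetical clusters with the same power draw but different
# inference engineering.
baseline = tokens_per_joule(20_000, 10_000)
optimized = tokens_per_joule(30_000, 10_000)
print(baseline, optimized, kwh_per_million_tokens(20_000, 10_000))
```

With identical hardware power, the better-engineered cluster in this example produces 50% more Tokens per joule, which is the kind of gap the efficiency argument is about.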
The race for the Token market has also intensified. Liu Weiguang of Alibaba Cloud estimated that for AI-native startups, Token expenditures account for nearly 100% of costs; for domestic internet companies, 15-20%; and for traditional enterprises, still below 5%. Alibaba Cloud requires that customers' Token expenditures account for at least 20% of their total spending this year. The company has also created sales positions dedicated solely to MaaS, primarily targeting AI-native startups and OPCs (One-Person Companies). Liu Weiguang revealed three strategies: first, mobilizing thousands of salespeople nationwide to ensure coverage and encourage customers to start using Tokens, even for basic coding modifications; second, adopting an open model strategy that treats all models deployed on Alibaba Cloud as first-party models; and third, restructuring performance metrics around three key areas: daily growth in the number of paying Token customers, the quantity and efficiency of Token integration into customers' core systems, and the efficiency of agents completing closed-loop tasks within enterprises.
Xin Zhou cited state-owned enterprises as an example, estimating that their Token expenditures account for about 1% of total IT spending, with significant room for future growth. Baidu requires that the effectiveness of agents be prioritized this year. Once customers see value, penetration and cost reduction can follow.
Against the backdrop of tight computing power, domestic AI infrastructure is seeing opportunities. Mao Yunhang observed that domestic chips are beginning to emerge, with some capable of supporting large-scale clusters. Shishi Technology's domestic adaptation work has also upgraded from small-scale or even "passion projects" to genuine production-level demand. "If a certain domestic chip can be adapted to deploy new models and meet production-level requirements, it essentially revitalizes the entire inventory of that chip," he said.
Liu Weiguang made a broader prediction: when Tokens cover "everything," the entire market's IT spending structure will fundamentally change, with software outsourcing and traditional IT procurement facing industrial reshaping. Tokens are becoming the new water and electricity.
02 Moves by the Four Major Players
After Tokens became a "hard currency," four major forces have moved quickly: cloud giants, model companies, operators, and AI Infra enterprises.
Cloud giants were the first to propose Token services. Their core advantage lies in full-stack capabilities: models, computing infrastructure, and, for nearly all of them, in-house chips. At Baidu's developer conference this year, the company proposed "chips, cloud, models, and agents," while Alibaba Cloud proposed "chips-cloud-models-inference" at its annual summit. Liu Weiguang of Alibaba Cloud told Shuzhi Qianxian last year that the "decisive factor" for cloud giants is cost-effectiveness, with full-stack technology being the core path to achieving ultimate cost-effectiveness. This year, he emphasized the deep integration of chips and models: "Every model training run is backed by powerful computing power, with the two meshing and spiraling upward. Therefore, we must follow our own path, emphasizing the integration of cloud, chips, and models even more."
On the product side, cloud vendors are moving from cloud-native and AI-native to "agent-native." Their entire cloud technology stack and service systems are being rebuilt for agent applications. Companies are currently converting their cloud product lines systematically into Skill-based, MCP-based, and CLI-based forms. At the same time, cloud vendors are not only promoting Token sales but also placing great importance on packaging Tokens into agent applications, such as coding tools and various agents, completing a closed loop from Token production to application for both consumer and enterprise markets.
The second force is model companies, including Zhipu, Minimax, Kimi, and others. Unlike the cloud giants, they are betting primarily on the models themselves.
They provide API and Token services and also entrust other parties in the industry chain to sell model call services. Although some model companies have gone public in Hong Kong with market capitalizations in the hundreds of billions, multiple parties in the industry chain, such as data center builders, observe that these companies' actual revenue and cash flow are still relatively small. They therefore generally keep a light operating model and currently hold limited self-owned computing infrastructure. Their focus is on the models themselves, with "selling the developed models" as the core goal and Tokenization as just a means. For example, Tianyi Cloud's recent Token packages for developers and small and medium-sized enterprises integrated models such as Zhipu's GLM5.
The third force is operators. In May, the three major operators collectively launched Token package services, with China Telecom acting the fastest. In fact, as early as April's Digital China Summit, Liu Guiqing, General Manager of China Telecom, conveyed that "the traditional industrial division of labor and value distribution model is being reshaped by a new business model centered on Tokens," disclosing strategic plans related to Tokens. Tianyi Cloud also began building a full-stack Token service system from IaaS to SaaS. Subsequently, China Telecom launched a trial commercial Token package in May.
Operators' core advantages lie in their large data centers, computing power, and network resources, as well as their last-mile customer reach platforms and nationwide local service capabilities. Once AI is Tokenized, it logically resembles phone charges and data traffic, and can be billed and operated like water and electricity. Operators jointly develop AI applications with ecosystem partners and use Tokenized services to drive AI adoption.
More notably, operators are among the first in China to procure domestic chips on a large scale and have a strong incentive to drive domestic chip ecosystem adaptation. Currently, the industry faces challenges such as low computing utilization, fragmented heterogeneous computing power, difficult domestic adaptation, and rapid model iteration. Industry insiders note that adapting a domestic chip to a new model and meeting production-level requirements may take several months. During that time, model companies keep releasing new models, so adaptation cannot keep pace. Operators are therefore also leveraging their ecosystem integration capabilities to mobilize all parties for multi-chip adaptation and multi-model fusion, becoming key drivers of the domestic ecosystem.
The fourth force is AI Infra enterprises, which are currently the most active in financing. The explosion of agent applications has driven up Token consumption and is reshaping the business logic of these companies. Previously, they struggled with "low margins and unprofitable operations," but now the industry's shift from a buyer's market to a seller's market has made the commercialization path of this sector increasingly clear.
Among these companies, Shishi Technology positions itself as China's equivalent of the U.S. company CoreWeave, aiming to build a robust, independent third-party domestic GPU cloud ecosystem. It focuses on large-scale cluster operations and domestic chip adaptation; it achieved profitability three years ago and has evolved into a capital-intensive independent third-party cloud platform. Silicon Flow entered the industry spotlight last year by partnering with Huawei Cloud to become the fastest to deploy the DeepSeek model; it focuses primarily on the MaaS layer and sits closer to the user side. Infinigence was the first in the industry to propose the "MxN" concept, positioning itself as an intermediary layer between M models and N chips.
Industry observers note that CoreWeave, a U.S. AI Infra company, faces limited profit margins due to pressure from both leading model companies and NVIDIA. However, Mao Yunhang told Shuzhi Qianxian that domestic AI Infra companies have a significant opportunity in domestic adaptation. The domestic market urgently needs domestic chip adaptation, and each chip has a different architecture and a different level of adaptation difficulty. Hardware vendors alone lack the time and resources, so chip manufacturers, AI Infra companies, and application developers must jointly complete the entire chain. "Domestic adaptation and optimization are the opportunities we've found in this wave of development," Mao Yunhang said.
03 Coding and Agents: The Most Certain 'Money Printers'
Among the various directions of Token services, large language models (LLMs) for coding and agents yield the highest returns. Industry insiders told Shuzhi Qianxian that the Coding Plans (coding subscription packages) launched by major companies may seem low-priced but are actually profitable. Under a monthly subscription model, most users' actual consumption is far below the limit, making "Coding Plans more profitable on average than simply selling Tokens."
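The subscription arithmetic behind that claim can be sketched as follows. All prices, quotas, and utilization rates are hypothetical; the point is only that when average usage sits far below the quota, the revenue earned per Token actually consumed rises well above a metered price.

```python
def plan_economics(monthly_price, quota_tokens, avg_utilization, cost_per_million):
    """Return (margin, revenue_per_million_used) for a flat-rate plan.

    avg_utilization is the average share of the quota users actually
    consume (0 < avg_utilization <= 1); cost_per_million is the vendor's
    serving cost per million Tokens. All inputs are illustrative.
    """
    if not 0 < avg_utilization <= 1:
        raise ValueError("avg_utilization must be in (0, 1]")
    used_millions = quota_tokens * avg_utilization / 1_000_000
    margin = monthly_price - used_millions * cost_per_million
    revenue_per_million_used = monthly_price / used_millions
    return margin, revenue_per_million_used


# Hypothetical plan: price 40 per month, 50M-Token quota, users consume
# 20% of it on average, serving cost 1.0 per million Tokens.
margin, revenue_per_million = plan_economics(40.0, 50_000_000, 0.2, 1.0)
print(margin, revenue_per_million)
```

In this example the plan earns 4.0 per million Tokens consumed; if the same Tokens were metered at a hypothetical 2.0 per million, the subscription would bring in twice the revenue for the same serving cost.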
A senior industry figure further explained to Shuzhi Qianxian that the commercial value of video generation is now far lower than that of LLMs. Xin Zhou's judgment is even more direct: once LLMs truly enter production environments, they can generate huge returns with "no upper limit on revenue."
Liu Weiguang further analyzed this point. He believes that while advertising, media, film, television, and short videos have significant market potential, they are not in the same league as LLMs for coding and agents. His reasoning is that coding not only involves programming but also gives rise to agents, which can independently complete tasks and enhance human productivity, all deeply tied to LLMs. "Our greatest focus now is on LLMs for coding and agents. The market for these models will be vastly larger than that for other models," he said.
Liu Weiguang observed that since the emergence of coding tools, application development has accelerated significantly. He predicts that once "everyone can code" becomes a reality, the number of applications or agents generated annually will multiply compared to the past. This represents not just a leap in productivity but also a structural reshaping of the entire software industry.
AI Infra companies are also paying close attention to this trend. Mao Yunhang of Shishi Technology said that hardly any programmers now work without AI, with major companies worldwide using models for coding, quietly transforming the industry. The rise of agents has further amplified this effect. "How to ensure stable code output, maximize cache utilization, turn code into complete projects, and enable agents to produce efficiently within controllable ranges—these are the most discussed engineering directions in the industry today," he said.
Opinions vary in the industry regarding the next growth trend for Tokens. Most believe that industry computing power supply will be very tight in 2026 and will become even more so in the following years. Others argue that the current Token shortage is related to domestic and foreign chip supply but that longer-term trends remain to be seen.
However, there is consensus that under the constraint of limited computing resources, maximizing unit Token production efficiency has become a core proposition for unleashing AI productivity. "One observation I have is that language models are one-dimensional, while driving is a two-dimensional plane. When we move to low-altitude, embodied, and world models, it becomes three-dimensional. From the initial training demand to full-scenario inference usage, it's another order of magnitude increase. Therefore, we can see that a lot of time and effort will still be spent on computing," Mao Yunhang said.