Computing Power Crisis: Four Major Forces Enter Token Service Market!

05/29 2026

After Tokens became 'hard currency,' four major forces rapidly moved to position themselves in the market.

'The market is undersupplied; you can sell as many Tokens as you have,' Xin Zhou, General Manager of AI and Large Model Platform at Baidu Intelligent Cloud, told Shuzhi Qianxian. Tokens used to be sold at a discount; now, with inference demand soaring, they cannot be bought even at a premium. Mao Yunhang, co-founder of AI Infra company Shishi Technology, has watched the market swing from a buyer's market to a seller's market: 'API calls used to be discounted and loss-making. Now, buyers must commit to guaranteed consumption volumes to secure favorable pricing and supply.' Alibaba Cloud Senior Vice President Liu Weiguang revealed that Token usage has surged 15-fold over the past five months.

With Tokens in high demand, four forces—cloud giants, model companies, operators, and AI Infra firms—have swiftly entered the Token service market.

01 Tokens: From Discounted Sales to Premium Unavailability

With Tokens scarce, cloud vendors are carefully allocating GPU resources between internal model training and external Token sales. 'Teams realize that better models drive higher Token sales,' a cloud insider said, noting the internal competition for GPU access. Last year, discounted Token sales underperformed hardware sales; now, vendors prioritize selling Tokens over selling raw hardware. 'Selling hardware is less profitable than selling Tokens,' the insider added.

Why the sudden Token frenzy? A surge in real demand.

Liu Weiguang highlighted Coding as a game-changer: within a year, it has not only created new applications but also revitalized legacy systems once deemed 'too old for the cloud.' Critically, non-programmers now 'code' for reporting, analysis, and budgeting, unlocking new productivity. The rise of AI Agents further amplifies Token consumption. 'Tokens vanish before any work even begins,' Mao Yunhang quipped. With Agents acting as 'hands and feet,' every step of a task consumes Tokens, drastically increasing usage.

Over the past two years, cloud giants set Token sales targets: ByteDance tracked total Tokens, while Alibaba, Baidu, and Tencent focused on model API calls. In practice, however, these targets were distorted by inflated, low-quality calls. One example is using large models for data cleaning, a task small models could handle. 'Low-quality calls were rampant,' Xin Zhou explained. As Agent technology, models, and Coding have advanced, high-value applications have emerged as heavy Token consumers. This year, cloud giants have set ambitious Token targets based on genuine market demand.

Zheng Weimin, an academician of the Chinese Academy of Engineering, observed an industry pivot from MaaS (Model as a Service) to TaaS (Token as a Service). Firms rarely draw a sharp line between MaaS and TaaS, but their focus has clearly shifted to Tokens. Tokens are the smallest unit a large model processes (1,000 Tokens ≈ 700–800 Chinese characters). They now serve three roles: basic units of information, metrics of AI computational consumption, and pricing benchmarks for the industry. MaaS emphasized model availability and used coarse billing (e.g., per API call). TaaS standardizes AI computing as a utility and refines billing down to the individual Token.
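The following minimal Python sketch shows how the two billing styles differ. It relies only on the article's rough ratio of 1,000 Tokens ≈ 700–800 Chinese characters. The function names and all prices are hypothetical and chosen purely for illustration. Real tokenizers and price lists vary by model and vendor.

```python
# Rough ratio from the article: 1,000 Tokens correspond to about 700-800 Chinese characters.
CHARS_PER_1K_TOKENS_MIN = 700
CHARS_PER_1K_TOKENS_MAX = 800


def estimate_tokens(num_chinese_chars: int) -> tuple[int, int]:
    """Return a (low, high) Token estimate for a span of Chinese text (hypothetical helper)."""
    # More characters per 1,000 Tokens means fewer Tokens for the same text.
    low = round(num_chinese_chars * 1000 / CHARS_PER_1K_TOKENS_MAX)
    high = round(num_chinese_chars * 1000 / CHARS_PER_1K_TOKENS_MIN)
    return low, high


def per_call_bill(num_calls: int, price_per_call: float) -> float:
    """MaaS-style coarse billing: every call costs the same, whatever its size."""
    return num_calls * price_per_call


def per_token_bill(tokens_per_call: list[int], price_per_1k_tokens: float) -> float:
    """TaaS-style billing: cost follows the Tokens each call actually consumes."""
    return sum(tokens_per_call) * price_per_1k_tokens / 1000


if __name__ == "__main__":
    print(estimate_tokens(1400))  # (1750, 2000)

    # Hypothetical workload: two short calls and one long, Agent-style call.
    calls = [200, 200, 5000]
    print(per_call_bill(len(calls), price_per_call=0.01))    # hypothetical price
    print(per_token_bill(calls, price_per_1k_tokens=0.002))  # hypothetical price
```

Under per-call billing, the 5,000-Token call costs the same as a 200-Token one. Under per-Token billing, it accounts for most of the bill. This is one way to see why long, multi-step Agent workloads translate directly into Token revenue.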

Zheng attributed the shift to the fact that today's AI infrastructure was designed mainly for model training. This has created a dilemma of 'costly infrastructure, weak inference, and low Token output.' He argued that the AI infrastructure race now hinges on Token efficiency per watt, not just on the size of computing clusters.
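The article does not define 'Token efficiency per watt' precisely. One minimal formulation assumes it means sustained inference throughput divided by average power draw:

```latex
\eta \;=\; \frac{R_{\text{tok}}}{P}
\quad \left[\frac{\text{tokens/s}}{\text{W}} \;=\; \frac{\text{tokens}}{\text{J}}\right],
\qquad
N_{\text{tok}} \;=\; \eta \, P \, T
```

The symbols are illustrative, not the article's: R_tok is Token throughput, P is power, and T is operating time. Because 1 W = 1 J/s, tokens per second per watt is simply tokens per joule. For a fixed power budget P and time T, the only way to raise total Token output N_tok is to raise η. In this sense, efficiency rather than cluster size becomes the competitive lever.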

Competition in the Token market is intensifying. Liu Weiguang estimated how much of their IT budgets different clients allocate to Tokens:

- AI-native startups: nearly 100%
- domestic internet firms: 15–20%
- traditional enterprises: under 5%

Alibaba Cloud aims for clients to spend at least 20% of their IT budgets on Tokens. It has created MaaS-focused sales roles dedicated to AI-native startups and OPCs (one-person companies). Liu outlined three strategies: (1) deploy thousands of salespeople nationwide to drive adoption, starting with basic Coding upgrades; (2) pursue an open model strategy that treats every model deployed on Alibaba Cloud as first-party; and (3) restructure KPIs to track daily growth in paying Token customers, Token integration into customers' core systems, and the efficiency of autonomous Agents.

Xin Zhou cited state-owned enterprises as an example: he estimated that Tokens account for about 1% of their IT budgets, leaving significant room for growth. This year Baidu is prioritizing Agent efficacy. It aims first to have clients recognize the value, then to deepen penetration and reduce costs.

Amid computing shortages, domestic AI infrastructure is gaining traction. Mao Yunhang noted rising adoption of domestic chips capable of supporting large-scale clusters. Shishi Technology's domestic adaptation work has evolved from niche projects into production-grade demand. 'Adapting a model to a domestic chip for production-level use can revive that chip's entire inventory,' Mao said.

Liu Weiguang predicted a broader shift in IT spending as Tokens come to encompass 'everything,' reshaping software outsourcing and traditional IT procurement. Tokens are becoming the new 'water and electricity.'

02 Moves by the Four Major Forces

After Tokens became 'hard currency,' four forces rapidly positioned themselves: cloud giants, model companies, operators, and AI Infra firms.

Cloud giants took the lead in Token services, leveraging full-stack capabilities that span models, computing infrastructure, and proprietary chips. Baidu's 2024 Developer Conference unveiled 'Chip-Cloud-Model-Agent,' while Alibaba Cloud's summit proposed 'Chip-Cloud-Model-Inference.' Liu Weiguang emphasized that cost efficiency, driven by full-stack technology, is the key differentiator. This year, he stressed deep chip-model integration: 'Each model trains on robust computing power, forming a symbiotic, escalating relationship. We must pursue cloud-chip-model unity.'

On the product side, cloud vendors are moving from cloud-native and AI-native to 'Agent-native,' overhauling their cloud stacks and services for Agent applications. They are systematically upgrading cloud product lines such as Skill, MCP, and CLI. At the same time, they are promoting Token sales and developing in-house Agent applications (e.g., Coding and tools) to close the loop for both consumer (C-end) and business (B-end) customers.

Model companies, including Zhipu, MiniMax, and Kimi, form the second force. Unlike cloud giants, they prioritize model development.

They offer API/Token services and also sell their model APIs through outside channels. Despite market caps in the billions of dollars, their revenue and cash flow remain modest. As a result, they run lean operations with little self-owned computing infrastructure. Their goal is to 'sell the models they develop,' with Tokenization as the means. For example, Tianyi Cloud's recent Token packages for developers and SMEs incorporated models such as Zhipu GLM5.

Telecom operators form the third force, with China Telecom in the lead. At April's Digital China Summit, China Telecom President Liu Guiqing announced a Token-centric business model that reshapes the industry's traditional roles. He also unveiled Tianyi Cloud's full-stack Token service, spanning IaaS to SaaS. China Telecom then debuted commercial Token packages in May.

Operators' strengths lie in vast data centers, computing/network resources, last-mile customer access, and nationwide local services. Tokenization aligns AI with telecom services like voice/data plans, enabling utility-style billing. By co-developing AI apps with ecosystem partners, operators drive AI adoption via Token services.

Critically, operators are the primary buyers of domestic chips in China, which drives ecosystem adaptation. The industry faces low computing utilization, fragmented heterogeneous computing, difficult domestic adaptation, and rapid model iteration. Adapting a new model to domestic chips for production can take months. During that time, model firms release newer versions, which slows overall adaptation. Operators leverage ecosystem integration to accelerate multi-chip adaptation and multi-model fusion, propelling the domestic ecosystem forward.

AI Infra firms, the hottest sector for financing, are the fourth force, and Agent-driven Token demand is reshaping their business logic. These firms long struggled with profitability. The shift from a buyer's market to a seller's market has now clarified their commercial path.

Shishi Technology, modeled after U.S.-based CoreWeave, aims to build an independent GPU cloud ecosystem in China. It focuses on large-scale cluster operations and domestic chip adaptation. Profitable for three years, it is evolving into a capital-intensive third-party cloud platform. SiliconFlow (Guiji Liudong), known for rapidly deploying DeepSeek models with Huawei Cloud, focuses on MaaS close to end users. Infinigence AI (Wuwenxinqiong) pioneered the 'MxN' concept, positioning itself as middleware connecting M models with N chips.

U.S. AI Infra firm CoreWeave is squeezed by top model companies and NVIDIA, which limits its profits. However, Mao Yunhang told Shuzhi Qianxian that domestic AI Infra firms can capitalize on domestic adaptation opportunities. Each domestic chip has its own architecture, and the urgent need to adapt them all far exceeds what hardware vendors can handle alone. This requires collaboration among chipmakers, AI Infra firms, and users. 'Domestic adaptation and optimization are our growth opportunities,' Mao said.

03 Coding and Agents: The Most Lucrative 'Cash Cows'

Among Token service directions, Coding- and Agent-oriented large models yield the highest returns. Industry insiders told Shuzhi Qianxian that cloud giants' Coding Plans, though seemingly low-priced, are profitable. Under the subscription model, most users consume far less than their limits, making Coding Plans more profitable than standalone Token sales.

An industry veteran added that video generation's commercial value pales in comparison with that of large models. Xin Zhou was blunter: large models in production environments generate immense, 'unlimited' revenue.

Liu Weiguang elaborated: advertising, media, film, and short video are vast markets, but they are small next to the market for Coding/Agent large models. Coding spawns Agents, which complete tasks autonomously and boost productivity, and all of this is tied to large models. 'Our top priority is Coding/Agent large models, whose market dwarfs the others,' he said.

Since Coding tools emerged, app development has accelerated. Liu predicted that 'universal Coding' would multiply the number of apps and Agents built each year, structurally reshaping software.

AI Infra firms are also eyeing this sector. Mao Yunhang said nearly all programmers now use AI. Tech firms worldwide are adopting model-driven Coding, quietly transforming the industry, and the proliferation of Agents amplifies this trend. 'Stabilizing code output, optimizing caches, building complete projects, and producing Agents efficiently within constraints are top engineering priorities,' he said.

Opinions on future Token growth vary. Most insiders expect severe computing shortages to persist through 2026 and beyond. Others point to chip supply factors and say the trend needs longer-term observation.

All agree that maximizing Token efficiency per unit of computing power is key to unlocking AI productivity under resource constraints. 'Language models are 1D; driving is 2D; low-altitude, embodied, and world models are 3D. Training demands and full-scene inference scale exponentially, requiring prolonged computational efforts,' Mao said.
