Models as the Gateway, Cloud as the Hinterland: Tan Dai’s Determination Is Often Underestimated

05/22 2026

In 2026, China’s AI cloud market saw the rise of two "champions," yet these leaders are tackling entirely different challenges.

Omdia’s May 19 report provided the first clue: in 2025, China’s AI cloud market reached RMB 56.7 billion, with Alibaba Cloud taking the top spot at a 38.1% revenue share, more than the second- to fourth-ranked players combined, and leading both the AI IaaS and MaaS segments.

However, a week before Omdia’s data release, IDC presented a different perspective. IDC’s figures revealed a 16-fold year-over-year surge in large model API calls on China’s public clouds in 2025, totaling 1,944 trillion Tokens. In this ranking, Volcano Engine claimed the crown with a 49.5% share in call volume and over 40% in revenue.

Omdia measures "who sells the most," while IDC tracks "who is used the most." These dual metrics collectively underscore the prosperity of China’s 2026 cloud market and highlight the distinct strategic priorities of the two companies. Alibaba Cloud maintains its revenue dominance, while Volcano Engine builds its position through usage scale, attempting to convert volume into irreplaceable platform-layer value.

On May 11, Volcano Engine unveiled China’s first Agent Plan. The offering includes GLM-5.1, Kimi-K2.6, and Volcano Engine’s proprietary models, all billed under a unified AFP (Agent Flow Pricing) model starting at RMB 40 per month. A company that accounts for nearly half of China’s public cloud MaaS API calls voluntarily includes competitors’ products on its platform.

Today, the performance gap between mainstream models has narrowed to the point where most enterprise users struggle to perceive differences in real-world applications. With Token prices continuously declining, the cost for users to switch providers approaches zero. Under these conditions, the narrative value of "the strongest model" is rapidly diminishing, while the strategic importance of "the most comprehensive platform" rises.

The Agent Plan operationalizes this logic rather than introduces it. IDC’s data confirms this judgment from a scale perspective, showing a 16-fold year-over-year increase in China’s public cloud MaaS API calls in 2025. In a market expanding so dramatically, success may never be determined by absolute model capability alone.

Low-Priced Tokens: Scaling First, Profiting Later

Volcano Engine’s near-50% MaaS market share stems not only from model capability but also from its early scale accumulation and transformation of that scale into sustainable engineering advantages. This strategy begins with a clear assessment: The traditional public cloud market has matured, leaving MaaS as the remaining avenue for differentiation.

Volcano Engine officially launched in 2020. When Tan Dai took the helm, the traditional IaaS sector had devolved into a war of attrition centered on customer retention and operational efficiency, leaving little room for latecomers to reverse the situation. MaaS thus emerged as Volcano Engine’s most promising breakthrough path—establishing an entry point through model services to drive synergistic growth across IaaS and PaaS layers.

This logic finds parallels overseas. Azure’s sale of OpenAI model APIs represents just the first link in the chain. Once enterprise clients adopt large models, they often proceed to purchase supporting cloud services like retrieval and databases, driving up overall spending.

In late 2020, Tan joined Volcano Engine through ByteDance’s acquisition of "1024," initially leading its technical architecture and bringing extensive experience in search engines. Search competition is less about "providing the smartest answers" than about "enabling users to find results at the lowest cost and highest efficiency." This DNA directly shaped his approach to MaaS: Tokens are production materials that must reach users with maximum efficiency.

According to LatePost, Volcano Engine raised its MaaS revenue targets twice in 2025, with further increases following the releases of Seed 2.0 and Seedance 2.0 that brought its 2026 target to over RMB 10 billion. Both the resource allocation and the pace of these revisions consistently serve the same objective.

Tan has explicitly stated that the primary drivers of rapid Token usage growth are the explosion of AI video creation and accelerated adoption of AI agents, rather than overall improvements in general-purpose language model capabilities. This judgment gains additional explanatory power given the discrepancy between Doubao’s market performance and its benchmark testing results.

In video generation, the scenario with the highest Token consumption, ByteDance currently leads the market. According to AI Press, Seedance commands over 80% market share by daily computing power consumption, with Kling following at ~14% and Wanxiang at ~4%. In other words, of every 10 units of computing power consumed by AI video generation each day, more than 8 go to Seedance.

AI agent scenarios similarly amplify Token consumption, as a single Agent task typically involves multiple rounds of inference, tool calls, and task execution, consuming far more Tokens than ordinary conversations. This scenario structure forms the first critical premise for understanding Volcano Engine’s market share—its call volume leadership largely rests on demand density in specific scenarios.

Price mechanisms serve as Volcano Engine’s leverage for scale accumulation. In May of the previous year, Volcano Engine slashed prices for its Doubao large model into the "fen era" (sub-cent pricing), with Doubao 1.6 pioneering tiered pricing based on input length, reducing costs by 63% compared to similar models. Tan’s post-hoc explanation was succinct: "When technology can reduce costs, we decide to cut them thoroughly in one go."

The technical foundation for this price cut lies in two engineering optimizations that Volcano Engine deployed early and at scale: PD (prefill/decode) separation and KV Cache. An analogy helps explain their roles. PD separation resembles assigning "reading the question" (prefill) and "answering the question" (decode) to separate workstations, so that each phase runs on the computing resources that suit it best. KV Cache works like a "scratchpad" for the inference process: it stores previously computed states so that each newly generated piece of content does not require recalculating everything from scratch. Both aim to lower the computational cost of each inference.
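A minimal sketch of both ideas, assuming "PD" refers to the prefill and decode phases of inference. The `ToyModel` class, the `next_token` rule, and the placeholder arithmetic are hypothetical stand-ins for real transformer math, not Volcano Engine's implementation. The sketch only counts how many per-token key/value computations each approach performs.

```python
from dataclasses import dataclass


@dataclass
class ToyModel:
    """Stand-in for a transformer; counts per-token key/value projections."""
    kv_projections: int = 0

    def project_kv(self, token: int) -> tuple[int, int]:
        # Placeholder arithmetic instead of real matrix multiplications.
        self.kv_projections += 1
        return (token * 2, token * 3)


def next_token(kv: list[tuple[int, int]]) -> int:
    # Toy "attention": the next token depends on every key/value pair so far.
    return sum(k + v for k, v in kv) % 100


def generate_without_cache(model: ToyModel, prompt: list[int], steps: int) -> list[int]:
    """Baseline: recompute keys/values for the whole sequence at every step."""
    seq = list(prompt)
    for _ in range(steps):
        kv = [model.project_kv(t) for t in seq]
        seq.append(next_token(kv))
    return seq


def prefill(model: ToyModel, prompt: list[int]) -> list[tuple[int, int]]:
    """Prefill phase ("reading the question"): build the KV cache once."""
    return [model.project_kv(t) for t in prompt]


def decode(model: ToyModel, cache: list[tuple[int, int]],
           prompt: list[int], steps: int) -> list[int]:
    """Decode phase ("answering"): reuse the cache, project only the new token."""
    seq = list(prompt)
    for _ in range(steps):
        token = next_token(cache)
        seq.append(token)
        cache.append(model.project_kv(token))
    return seq


prompt = [1, 2, 3, 4]

baseline = ToyModel()
out_a = generate_without_cache(baseline, prompt, steps=3)

cached = ToyModel()
kv_cache = prefill(cached, prompt)                 # could run on a prefill worker
out_b = decode(cached, kv_cache, prompt, steps=3)  # could run on a decode worker

print(out_a == out_b, baseline.kv_projections, cached.kv_projections)
```

Both paths produce the same sequence, but the baseline performs 15 projections while the cached path performs 7, and the gap widens as outputs get longer. Splitting `prefill` and `decode` into separate functions mirrors the separation idea: in a real deployment the two phases could be scheduled on different hardware pools, with the cache handed from one to the other.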

The benefits of these technologies depend heavily on scale. At small volumes, the cost of maintaining complex caching and scheduling systems may offset the computing power saved; only at larger scales, as cache hit rates improve, do the benefits become significant. Tan once illustrated this amplification effect: a 1% utilization gain across 1 million servers yields 100 times the benefit of the same gain across 10,000 servers.
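The arithmetic behind Tan's comparison can be sketched in a few lines. Only the fleet sizes and the 1% gain come from the text; the per-server cost is a hypothetical placeholder, and the resulting ratio does not depend on it.

```python
def annual_saving(servers: int, utilization_gain: float, cost_per_server: float) -> float:
    """Cost avoided per year when a fleet-wide utilization gain frees capacity."""
    return servers * utilization_gain * cost_per_server


# Hypothetical placeholder: annual cost of one server in RMB (not from the source).
COST_PER_SERVER_RMB = 100_000.0

small = annual_saving(10_000, 0.01, COST_PER_SERVER_RMB)
large = annual_saving(1_000_000, 0.01, COST_PER_SERVER_RMB)
ratio = large / small

print(f"small fleet: {small:,.0f} RMB, large fleet: {large:,.0f} RMB, ratio: {ratio:.0f}x")
```

The engineering effort needed to achieve the 1% gain is roughly fixed, so the larger fleet recovers that effort 100 times over, which is why such optimizations may only pay off at scale.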

As PD separation, KV Cache, and similar techniques spread across the industry and Token prices converge, the real barrier emerges: scale. Followers without scale face greater cost pressure when they try to match low prices, while larger platforms have more cost flexibility and staying power in price competition.

The second half of 2025 saw the year’s fiercest competition. Despite a wave of market entries by competitors, Volcano Engine’s call volume share edged up from 49.2% in the first half to 49.5% for the full year. This figure partially validates the defensive value of scale advantages at this stage.

Platform Pricing Emerges After Model Commoditization

The launch of the Agent Plan signals a shift. It marks Volcano Engine’s strategic pivot at the product level from model distributor to infrastructure provider.

Before 2026, MaaS followed a single fundamental business model: selling Token APIs. Enterprises paid based on usage volume, with models as the core purchase object and platforms serving as mere conduits. The Agent Plan disrupts this structure’s underlying logic by packaging proprietary Seed-series models with third-party models like GLM-5.1 and Kimi-K2.6, along with Harness tools like web search, under a unified AFP billing system. The billing unit shifts from "how many Tokens consumed" to "how many tasks completed."

Harness represents the overlooked keyword in this announcement. While MaaS provides stable model capabilities, Harness transforms inference into constrained, trackable, and sustainable workflows. Though their roles differ, both share the same goal: making Agents truly usable in production environments. When enterprises run Agent tasks through the AFP unified billing platform, with workflow logs, usage reports, and audit trails generated within the same system, migration costs become a critical consideration.

According to LatePost, Volcano Engine’s product evolution over the past few years has not only strengthened MaaS competitiveness but also gradually expanded large model services into infrastructure covering Agent development and operations. Tan’s earlier description offers a reference: "Previously, coding essentially meant defining workflows with if-else statements; now, developing Agents based on models increasingly delegates flow planning, task decomposition, and sub-Agent creation to the models themselves."

The Agent Plan’s inclusion of competitor models suggests that Volcano Engine now judges its infrastructure to be worth more than any individual model product, and that it can earn channel revenue by distributing third-party models. The arrangement resembles AWS Marketplace, which lists third-party SaaS products: the platform’s core assets are the workflow data and the depth of billing integration that users accumulate.

Two opposing forces operate simultaneously here: including competitor models reduces friction for users to switch models within the platform, representing openness; the AFP unified billing system aims to do the opposite—raising overall costs for users to leave the platform. Openness attracts users in, while billing integration keeps them there. Which objective prevails depends on enterprise clients’ platform dependency depth, measurable only after real Agent workflows deploy into production.

Currently, the key variable for testing this judgment is whether third-party models’ share of total Agent Plan call volume declines over time. If users eventually migrate toward Seed-series models, the platformization narrative holds; if the third-party share stabilizes or rises, the bundle looks more like a pragmatic way to fill capability gaps.

Supporting this platform transition is a parallel consolidation of the organizational structure. In 2025, ByteDance’s AI R&D teams underwent three integrations: AI Lab fully merged into the Seed team, while the visual generation teams and the Doubao technical departments came under unified Seed management, replacing dispersed R&D with a single coordinated effort. This is more than a gain in R&D efficiency: only a unified R&D system can give a MaaS platform a stable, predictable rhythm of model iteration.

Volcano Engine has answered MaaS’s first-phase core question: victory doesn’t require the strongest model but the lowest usage barriers, the most aggressive pricing strategy, and earlier scale accumulation than competitors. However, scale advantages must translate into platform-layer binding depth to sustain competitiveness in the next phase. Platform binding presupposes enterprise clients truly running Agent workflows here, demanding toolchain completeness and model reliability in critical scenarios.

Epilogue

The evolution from Token platforms to Agent infrastructure follows a discernible overseas trajectory. Anthropic’s partnerships with multiple cloud providers and OpenAI’s collaboration with AWS to encapsulate models into cloud-native Agent environments both aim to enable enterprises to develop and operate production-grade Agents entirely within cloud platforms. IDC reports that MaaS’s commercial boundaries are expanding from "usage-based inference services" to "operational foundations for enterprise AI workflows." Increasingly, major clients’ collaborations with platforms extend deeper into business processes rather than remaining at the billing level.

IDC’s forecast, however, adds a caution: China’s MaaS market will see Token consumption reach 40,000 trillion in 2026, corresponding to approximately RMB 18.6 billion in revenue. Consumption will expand about 21-fold in one year, with revenue growth lagging far behind volume growth, which implies further compression of average Token prices.
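The figures quoted from IDC allow a rough back-of-the-envelope check. The sketch below assumes the forecast revenue is spread evenly over the forecast Token volume, which yields only a blended average across all models and modalities; the variable names are illustrative.

```python
TRILLION = 10**12

tokens_2025 = 1_944 * TRILLION    # IDC: 2025 public cloud call volume
tokens_2026 = 40_000 * TRILLION   # IDC forecast for 2026
revenue_2026_rmb = 18.6e9         # IDC forecast: RMB 18.6 billion

# Year-over-year expansion in Token consumption.
growth = tokens_2026 / tokens_2025

# Blended average price, expressed per million Tokens.
price_per_million = revenue_2026_rmb / tokens_2026 * 1_000_000

print(f"volume growth: {growth:.1f}x")
print(f"implied average price: RMB {price_per_million:.3f} per million Tokens")
```

Under these assumptions the volume grows about 20.6-fold, consistent with the "about 21-fold" figure, and the implied blended price works out to roughly RMB 0.465 per million Tokens.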

Simultaneous volume growth and price declines reflect the industry’s collective choice at this stage: prioritize scale expansion. However, low-price strategies have a financial floor: computing costs must fall faster than Token prices. Whether that condition holds hinges on NVIDIA’s supply rhythm and the maturation of domestic chip alternatives, both currently difficult to predict precisely.
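The financial floor can be made concrete with a small sketch. Every number below is hypothetical and chosen only to contrast the two regimes; the function name and the compounding model are illustrative assumptions, not figures from IDC or Volcano Engine.

```python
def margin_after(years: int, price0: float, cost0: float,
                 price_decline: float, cost_decline: float) -> float:
    """Gross margin per Token after compounding annual price and cost declines."""
    price = price0 * (1 - price_decline) ** years
    cost = cost0 * (1 - cost_decline) ** years
    return (price - cost) / price


# Hypothetical starting point: a 20% gross margin per Token.
start = dict(price0=1.0, cost0=0.8)

# Costs fall faster than prices: the margin widens.
improving = margin_after(2, price_decline=0.5, cost_decline=0.6, **start)

# Prices fall faster than costs: the margin erodes.
eroding = margin_after(2, price_decline=0.5, cost_decline=0.4, **start)

print(f"costs fall faster: {improving:.2f}, prices fall faster: {eroding:.2f}")
```

With these illustrative rates, the margin rises above the starting 20% when costs fall faster than prices and turns negative within two years when prices fall faster than costs.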

Tan once said MaaS remains in its infancy: "We’ve only run 500 meters in a marathon; don’t get complacent over minor achievements." This served as internal motivation in 2025, but reading it in May 2026 adds another layer of meaning. By bundling competitor models into its offerings, a company that already dominates the scale competition is positioning itself ahead of potential rule changes in the second half. Whether this judgment proves correct will only be verified after enterprise clients deploy production-grade Agents.

"Good enough" won the first phase. Whether the same logic can prevail in Agent-as-a-Service competition depends on where enterprise-grade Agent scenarios’ actual tolerance thresholds for model capabilities lie—that’s the true question for the second half.
