AI Computing Power Can't Always Operate Out of Love

April 16, 2026

Computing power can pursue inclusivity and cost-effectiveness. But no provider can keep operating out of love indefinitely.


By Luo Xiaomei | Edited by Yang Xiaoruo, Zhang Hongyi

Produced by Business Show

“Over 150,000 monthly API calls on average.” Li Ran frowned as he stared at the API call volume and billing data for his team's AI customer service SaaS tool over the past three months.

On April 13, an announcement on Alibaba Cloud's official website further weighed on Li Ran's mind. The notice revealed adjustments to the free API (Application Programming Interface) quotas for DataWorks Standard and Professional Edition users, with support for pay-as-you-go pricing. For DataWorks Standard Edition, the free API call quota was reduced to 100,000 calls per month, with excess calls billed via OpenAPI on a pay-as-you-go basis.

This means that starting from the policy's effective date on April 14, Li Ran, a DataWorks Standard Edition user, will face over 8,000 yuan in additional operational costs per month due to at least 50,000 excess calls, while the net profit of his AI customer service SaaS tool barely exceeded 10,000 yuan last month.
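The arithmetic behind Li Ran's bill can be sketched in a few lines. The call volume and free quota come from the article; the per-call overage price of roughly 0.16 yuan is a hypothetical figure inferred only from "over 8,000 yuan for at least 50,000 excess calls", not Alibaba Cloud's published rate.

```python
# Back-of-envelope check of the DataWorks quota change.
# Figures from the article; the per-call overage price is an
# inferred assumption, not an official rate.

MONTHLY_CALLS = 150_000        # average monthly API calls
FREE_QUOTA = 100_000           # new Standard Edition free quota
PRICE_PER_EXCESS_CALL = 0.16   # yuan per call, hypothetical

excess_calls = max(0, MONTHLY_CALLS - FREE_QUOTA)
extra_cost = excess_calls * PRICE_PER_EXCESS_CALL

print(excess_calls)  # 50000
print(extra_cost)    # 8000.0
```

At that rate the overage alone is in the same ballpark as the tool's entire monthly net profit, which is the squeeze the article describes.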

“After crunching the numbers, this really isn't about AI reducing costs and improving efficiency anymore. Look at us this time last year—we were even worried about not using up our free call quotas!” Li Ran joked to us.

In the same period in 2025, Li Ran and his startup project incurred just 500 yuan in costs for calling 10 million Tokens. Today, with the same usage volume, compounded by price hikes from Tencent Cloud's Hunyuan model and Baidu's Wenxin, costs have surged to nearly 10,000 yuan.

He said his company's cash flow could sustain operations for another three months, but the cost increases were still putting pressure on the business.

Since the AI boom this year, particularly the OpenClaw (Longxia) craze since the 2026 Spring Festival, Token consumption logic has fundamentally changed, leaving small and medium-sized developers like Li Ran desperately in need of Token computing power.

According to JPMorgan Chase's forecast, China's AI inference Token consumption is expected to soar from approximately 10 quadrillion in 2025 to around 390 quadrillion by 2030, a roughly 39-fold increase over five years.

While the global AI industry awaits technological breakthroughs, a cost challenge triggered by computing power price adjustments is also emerging. Recently, price adjustment actions by domestic and foreign AI and cloud service providers have become increasingly frequent. According to public reports, Alibaba Cloud has confirmed that AI computing power, storage, and other products will see across-the-board price hikes starting April 18, with maximum increases of 34%.

From Alibaba and Tencent to AWS and OpenAI, no major provider has stayed on the sidelines, and overseas vendors have implemented even steeper adjustments than their domestic counterparts. This means the free API call allowances users previously enjoyed have been sharply reduced, with excess usage now requiring real payment. For high-frequency users like Li Ran, API call costs rise accordingly.

This is forcing countless small and medium-sized developers to reevaluate the cost-optimization race brought about by AI.

01 A Global Computing Power Price Adjustment

This adjustment is, in fact, a global revaluation of computing power value.

Let's first examine the adjustment paths of domestic providers. Baidu Smart Cloud was the first to act, announcing on March 18 that AI computing power-related product prices would increase by 5%-30% starting April 18, with API unit prices for its Wenxin Yiyan series rising by 12%-25%. The “permanent free unlimited” policy for low-tier models was canceled, replaced by QPS throttling and excess billing.

This is seen by the industry as the end of the computing power subsidy era, as small and medium-sized developers, previously attracted by free quotas, must now pay based on actual usage.

Tencent Cloud followed suit, adjusting Hunyuan model API prices in March and formally announcing a price adjustment on April 9, stating that AI computing power, Container Service TKE-Native Nodes, and Elastic MapReduce (EMR)-related product list prices would uniformly increase by 5% starting May 9.

ByteDance's Volcano Engine implemented more subtle adjustments, modifying Q1 Doubao LLM Token unit prices and raising video generation API prices compared to the beta phase, with a single 15-second video now costing approximately 15 yuan. Meanwhile, unlimited free calls were canceled, retaining only a short-term quota of 5 million Tokens/30 days for new users.

Zhipu AI made the most frequent adjustments; nearly every model release was accompanied by a price increase. On April 8, Zhipu launched its flagship open-source model GLM-5.1 while raising GLM series API prices by another 10%, bringing them close to Anthropic's levels. On April 12, the Zhipu Coding Plan (overseas version) saw price hikes, with monthly payments nearly doubling, Zhipu's third price increase this year.

During the Q1 2026 earnings call on March 31, Zhipu CEO Zhang Peng stated that API call pricing for Zhipu had increased by 83% in the first quarter of 2026, yet the market remained undersupplied, with call volumes surging by 400%.

Expensive as computing power has become, the trend confirms one fact: AI has transformed from an optional tool into an essential production asset for enterprises, and users now prioritize model capabilities over price sensitivity.

Overseas providers are also making aggressive adjustments. On January 22, Amazon AWS broke its 20-year tradition of “price declines” by raising EC2 machine learning capacity block prices by 15%. On February 15, Microsoft Azure adjusted GPT-4o and GPT-4 Turbo API prices, canceling free quotas for GPT-4o. On March 10, Google Cloud announced AI computing instance price adjustments starting May 1, discontinuing Gemini's low-cost subscription plans. OpenAI adjusted GPT-4o/4 Turbo API prices, raising ChatGPT Plus from $20/month to $30/month with a daily message limit of 30.

From domestic to overseas providers, from computing instances to API calls, this global collective price adjustment is pulling the AI industry back from the subsidy-driven expansion phase to a value-based pricing rationale. Free quotas are becoming a thing of the past, with pay-as-you-go becoming the norm. Developers must now recalculate and reassess their cost structures.

02 The Logic Behind the Adjustments

The collective price adjustments by global providers appear driven by profit-seeking, but they essentially reflect the AI industry's transition from expansion to profitability validation. Business Show identifies three underlying drivers behind this global adjustment.

First, the core underlying logic is the revaluation of computing power value.

As the supply of AI's key fuels (GPUs, HBM) tightens and costs rise, all downstream providers are forced to adjust prices. The starting point of this chain reaction may trace back to NVIDIA.

Currently, NVIDIA holds an 85% global market share in AI chips, with a net profit margin of 56%. To a large extent, its pricing directly determines the industry's cost baseline.

In 2026, NVIDIA's Blackwell series GPU delivery lead times extended to 2027, with single-card procurement costs rising over 30% year-on-year. Meanwhile, HBM3E high-bandwidth memory spot prices surged over 20% from the end of 2025, with a global capacity gap of 50%-60% and supply shortages emerging.

More critically, NVIDIA's closed-loop ecosystem of hardware plus software further drives up industry costs. With 90% of global AI training code written in CUDA and 5 million developers reliant on this ecosystem, each H20 chip requires a $12,000 CUDA licensing fee, with implicit costs exceeding 30% of the total.

This dual impact on performance and costs leaves providers like Alibaba, Tencent, Microsoft, and Google with no choice but to pass on cost increases to downstream users.

While rising computing power costs provide a passive rationale for adjustments, the exponential growth in Token demand gives providers active leverage for price hikes.

In 2026, AI applications evolved from single-round dialogues to the Agent era, triggering exponential growth in Token consumption. Take OpenClaw and other Agents as examples—their single-task multi-round recursion, tool calls, and reflective validation result in Token consumption 50 to 100 times higher than traditional dialogues, with a single active Agent consuming Tokens at over a thousand times the rate of average users per month.

Data shows that domestic daily average Token call volumes exceeded 140 trillion in Q1 2026, up over 1,400-fold from 100 billion in early 2024. ByteDance's Doubao consumed over 120 trillion Tokens daily, with multimodal (e.g., video/image) Tokens accounting for over 40% of the total, costing 10 times more than text-only Tokens. Baidu Qianfan platform's enterprise user Token consumption surged 280% QoQ in Q1.

The current state of computing power consumption can be summarized as follows: strong demand for low-tier free models and undersupply of high-tier paid models. When demand grows and supply tightens, prices naturally follow supply-demand dynamics, explaining why Zhipu's price adjustments led to a 400% increase in call volumes.

Thus, high-quality Tokens have become a scarce resource.

The most fundamental change, however, lies in the shifting commercial logic of the AI industry, which has transitioned from burning cash for scale and acquiring users at a loss to prioritizing profitability and refined operations, with pricing power shifting from users back to providers.

Over the past two years, the AI industry operated in a frenzy of expansion, with heavy capital investments. Providers relied on free APIs and low-cost computing power to attract users and capture market share. Sustaining AI business losses was deemed acceptable, as profits from other segments and capital injections ensured continued investment.

However, the winds shifted in 2026. Capital investments became more rational, and providers faced mounting pressure to deliver profits, as top executives demanded: “Our AI business must turn a profit.”

This explains why Alibaba Cloud adjusted free quotas and introduced pay-as-you-go pricing, while Tencent Cloud and Baidu Smart Cloud overhauled their pricing across the board. ByteDance's Volcano Engine leveraged internal scale effects to reduce costs while adjusting external prices to achieve AI business profitability. Overseas, OpenAI and Anthropic also validated model capabilities through price adjustments.

For reference, Amazon AWS took 14 years to break even, and Alibaba Cloud did not reach profitability until 2022. The price wars among domestic cloud providers began as early as 2014 and persisted for over a decade: Alibaba Cloud frequently initiated large-scale price cuts, with single reductions exceeding 50%, while Tencent Cloud swiftly followed with even lower quotes, engaging in cutthroat competition.

According to public reports, Tencent Cloud was long viewed as a cost center within the group. To rapidly seize market share amid fierce competition from Alibaba Cloud and Huawei Cloud, Tencent Cloud adopted an aggressive low-price strategy, securing key-account orders through quotes far below cost and long-term no-price-hike commitments.

While this strategy rapidly scaled Tencent Cloud's revenue, briefly securing its position as China's second-largest provider, it also trapped the business in a cycle of diseconomies of scale—the larger the scale, the heavier the potential losses. Not until 2025 did Tencent Cloud achieve full-year scaled profitability.

Undoubtedly, as AI computing power demand rises, the market size expands. Yet, the vast majority of cloud providers remain chronically unprofitable. Among them, only Zhipu, with a market capitalization exceeding HK$400 billion, has the capital resilience to continue raising prices and experimenting. The rest are barely surviving.

Under such circumstances, the survival of small and medium-sized enterprises (SMEs) becomes even more precarious.

03 Rising Costs and a Billing Reckoning

“As a small team without self-developed models or computing power reserves, we rely solely on public cloud APIs.” Li Ran's voice carried a tinge of resignation. “After costs rise, we must either adjust prices or compress profits.”

More realistically, providers will prioritize allocating computing resources to high-paying, high-volume, and high-margin clients such as finance, government, and top-tier internet enterprises. SMEs not only face increased costs but may also struggle with resource allocation, finding it harder to secure stable computing power.

The hardest hit are shell applications, enterprises and platforms that lack technical barriers and simply repackage APIs for secondary development. "Once costs rise, their cost advantages diminish, forcing them to reevaluate their business models," one investor told Business Show.

For individual developers, free quota adjustments also carry consequences, as the window for zero-cost trial and error closes. Baidu reduced free quotas for low-tier models, while ByteDance adjusted Doubao's free quotas, retaining only short-term quotas for new users (Baidu: 1 million/90 days; ByteDance: 5 million/30 days).

It's time to replan cost investments. This billing reckoning is also pushing developers to shift from mindless API calls to meticulous calculations, exploring model compression, quantization, context window optimization, RAG retrieval augmentation, and even hybrid calls to different model versions, all to reduce Token consumption.
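Of those strategies, hybrid calls to different model versions are the most immediately accessible. A minimal sketch of the idea: route simple requests to a cheap model tier and reserve the expensive tier for long or reasoning-heavy ones. The tier names, thresholds, and per-1K-Token prices below are hypothetical placeholders, not any provider's actual pricing.

```python
# Minimal sketch of hybrid model routing to cut Token spend.
# Tier names and prices are hypothetical placeholders.

TIERS = {
    "small": {"price_per_1k_tokens": 0.5},   # yuan, hypothetical
    "large": {"price_per_1k_tokens": 5.0},   # yuan, hypothetical
}

def route(prompt_tokens: int, needs_reasoning: bool) -> str:
    """Send long or reasoning-heavy requests to the large tier,
    everything else to the cheap small tier."""
    if needs_reasoning or prompt_tokens > 2_000:
        return "large"
    return "small"

def cost(prompt_tokens: int, needs_reasoning: bool) -> float:
    tier = route(prompt_tokens, needs_reasoning)
    return prompt_tokens / 1_000 * TIERS[tier]["price_per_1k_tokens"]

# A simple FAQ-style query stays on the cheap tier...
print(cost(500, needs_reasoning=False))   # 0.25
# ...while a multi-step Agent task pays the large-tier rate.
print(cost(500, needs_reasoning=True))    # 2.5
```

The same shape extends naturally to the other techniques the article lists: a RAG step or compressed context simply lowers `prompt_tokens` before the router ever sees the request.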

However, this requires time and technical accumulation. For many small and medium-sized teams, the immediate priority is to replan their company's development path. Li Ran decided to study each provider's pricing plans, noting that “combining and stacking them could be more cost-effective.”
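Li Ran's "combine and stack" idea can be sketched as a dispatcher that exhausts each provider's free quota first and only then routes paid calls to the cheapest rate. The provider names, quota sizes, and per-call prices here are hypothetical placeholders for illustration, not real pricing plans.

```python
# Sketch of stacking free quotas across providers before paying.
# Provider names, quotas, and prices are hypothetical placeholders.

providers = {
    "provider_a": {"free_left": 2, "price": 0.20},  # yuan per call
    "provider_b": {"free_left": 1, "price": 0.10},
}

def dispatch(n_calls: int) -> float:
    """Consume free quotas first, then send paid calls to the
    cheapest provider. Returns the total cost in yuan."""
    total = 0.0
    for _ in range(n_calls):
        free = [p for p in providers.values() if p["free_left"] > 0]
        if free:
            free[0]["free_left"] -= 1  # a free call, no cost
        else:
            total += min(p["price"] for p in providers.values())
    return total

total_cost = dispatch(5)
print(total_cost)  # 3 free calls, then 2 paid at 0.10 → 0.2
```

Real plans add wrinkles this sketch ignores, such as quota reset windows, rate limits, and per-model quality differences, which is exactly why Li Ran planned to study each provider's pricing in detail.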

Evidently, this price adjustment is accelerating AI industry differentiation. Leading enterprises, leveraging full-stack capabilities and scale effects, can maintain profit margins post-adjustment and even consolidate market share through resource optimization. Meanwhile, small and medium-sized providers, especially those without self-developed models or computing power reserves, face cost increases that cannot be passed on, eroding profits and forcing them to reexplore development paths.

However, exceptions exist. SMEs deeply rooted in vertical scenarios with core technologies (e.g., model optimization, cost control) may emerge stronger from this adjustment. Without relying on high-end APIs, they can find their niche by achieving cost reductions and efficiency gains in vertical scenarios.

While many observe this price adjustment with concern, arguing that provider profit-seeking pressures SMEs and developers, Business Show contends that this adjustment signals the AI industry's maturation. After all, the free AI subsidy model of the past two years led many to assume AI was free, spawning countless valueless applications that wasted computing resources. The 2026 collective adjustment essentially optimizes and eliminates such applications, forcing technological iterations. Only then can truly valuable AI applications earn reasonable commercial returns.

Computing power can pursue inclusivity and cost-effectiveness. But no provider can keep operating out of love indefinitely.

For providers, this adjustment represents a return to commercial logic, enabling sustainable AI business profitability through cost-plus-reasonable-profit pricing. For SMEs and developers, beyond cost control, the focus must shift to technological optimization and deep cultivation of vertical scenarios.

AI has never been a free lunch. In the future, the AI industry will enter an era of paying for value. Only enterprises and developers that can truly create value and manage costs effectively will avoid being left behind by the times and manage to survive and thrive.
