AI Computing Power Can't Always Operate Out of Love

04/16/2026

Computing power can pursue inclusivity and cost-effectiveness, but no provider can sustain operations purely out of love.


Writer: Luo Xiaomei | Editors: Yang Xiaoruo, Zhang Hongyi

Produced by: Business Show

"With an average of over 150,000 API calls per month," Li Ran frowned as he examined the API call volume and billing data for his team's AI customer service SaaS tool over the past three months.

On April 13th, an announcement on Alibaba Cloud's official website further weighed on Li Ran's mind. The notice revealed adjustments to the free API (Application Programming Interface) quotas for DataWorks standard and professional version users, with support for pay-as-you-go pricing. For the DataWorks standard version, the free API call quota was reduced to 100,000 calls per month, with any excess charged via OpenAPI on a pay-as-you-go basis.

This meant that starting from the policy's effective date of April 14th, Li Ran, a DataWorks standard version user, would see his operational costs increase by over RMB 8,000 per month on at least 50,000 excess calls, while the net profit of his AI customer service SaaS tool had only just topped RMB 10,000 the previous month.

"After crunching the numbers, it's clear that relying on AI for cost reduction and efficiency gains is no longer enough. Look at us this time last year: we were even worried about not using up our free call quotas!" Li Ran joked to us.
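The tiered billing Li Ran describes can be sketched as a simple calculation. The per-call rate below is a hypothetical figure chosen to match the quoted RMB 8,000 monthly increase, not Alibaba Cloud's actual OpenAPI price:

```python
# Sketch of quota-plus-overage billing. RATE_PER_CALL is an assumed
# value for illustration, not a real published rate.
FREE_QUOTA = 100_000     # free API calls per month (standard version)
RATE_PER_CALL = 0.16     # hypothetical RMB per excess call

def monthly_api_cost(calls: int) -> float:
    """Pay-as-you-go cost for calls beyond the free quota."""
    excess = max(0, calls - FREE_QUOTA)
    return excess * RATE_PER_CALL

# 150,000 calls/month -> 50,000 excess calls
print(monthly_api_cost(150_000))  # 8000.0
print(monthly_api_cost(90_000))   # 0 (still inside the free quota)
```

Under this model, any team hovering just above the quota sees costs jump from zero to thousands of RMB overnight, which is exactly the cliff Li Ran hit.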

In the same period of 2025, Li Ran and his startup paid just RMB 500 to process 10 million Tokens. Today, at the same usage, after the price hikes to Tencent Cloud's Hunyuan model and Baidu's Wenxin, that cost has surged to nearly RMB 10,000.

He said his company's cash flow could sustain operations for another three months, but the cost increase was still a source of pressure.

Since this year's AI boom, and especially the OpenClaw (Longxia) craze following the 2026 Spring Festival, which completely transformed the logic of Token consumption, small and medium-sized developers like Li Ran have urgently needed Token computing power.

According to JPMorgan Chase's forecast, China's AI inference Token consumption is expected to soar from approximately 10 quadrillion in 2025 to about 390 quadrillion by 2030, a roughly 39-fold increase over five years.
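The growth implied by the cited figures can be checked directly: a 39-fold rise over five years corresponds to a compound annual growth rate above 100%, i.e. consumption more than doubling every year.

```python
# Implied growth from ~10 quadrillion (2025) to ~390 quadrillion (2030).
start, end, years = 10, 390, 5

fold = end / start                        # 39.0x over five years
cagr = (end / start) ** (1 / years) - 1   # compound annual growth rate

print(f"{fold:.0f}-fold, CAGR = {cagr:.0%}")  # 39-fold, CAGR = 108%
```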

While the global AI industry awaits technological breakthroughs, a cost challenge triggered by computing power price adjustments is also emerging. Recently, price adjustment actions by domestic and foreign AI and cloud service providers have become increasingly frequent. According to public reports, Alibaba Cloud has confirmed that starting from April 18th, AI computing power, storage, and other products will see across-the-board price hikes, with a maximum increase of 34%.

From Alibaba and Tencent to AWS and OpenAI, no provider has stayed on the sidelines, and overseas providers have adjusted prices even more aggressively than their domestic counterparts. The free API quotas that consumers previously enjoyed have been significantly reduced, with any excess now requiring real monetary payment. For high-frequency users like Li Ran especially, API call costs will rise.

This has prompted countless small and medium-sized developers to reevaluate the cost-optimization competition brought about by AI.

01 A Global Computing Power Price Adjustment

This adjustment is, in fact, a global revaluation of computing power value.

Let's first examine the adjustment paths of domestic providers. Baidu Intelligent Cloud was the first to adjust, announcing on March 18th that starting from April 18th, prices for AI computing power-related products would increase by 5%-30%, with API unit prices for the Wenxin Yiyan series rising by 12%-25%. The "permanent free unlimited" offer for low-tier models was canceled, replaced by QPS throttling and excess billing.

This is seen by the industry as the end of the computing power subsidy era, as small and medium-sized developers, previously attracted by free quotas, must now pay based on actual usage.

Tencent Cloud followed suit, adjusting Hunyuan model API prices in March. On April 9th, Tencent Cloud officially announced a price adjustment, stating that starting from May 9th, list prices for AI computing power, container service TKE-native nodes, and elastic MapReduce (EMR)-related products would uniformly increase by 5%.

ByteDance's Volcano Engine adjusted more covertly, changing the unit price of Doubao LLM Tokens in Q1 and raising video generation API prices from the beta phase. The cost of generating a single 15-second video is now approximately RMB 15, while unlimited free calls have been canceled, retaining only a short-term quota of 5 million Tokens/30 days for new users.

Zhipu AI has made the most frequent adjustments. Almost every model release by Zhipu has been accompanied by price increases. On April 8th, Zhipu launched its flagship open-source model, GLM-5.1, and simultaneously raised API prices for the Zhipu GLM series by another 10%, nearing Anthropic's levels. On the 12th of this month, Zhipu's Coding Plan (overseas version) saw a price hike, with monthly payments nearly doubling—Zhipu's third price increase this year.

During the Q1 2026 earnings call on March 31st, Zhipu CEO Zhang Peng stated that API call pricing for Zhipu had increased by 83% in Q1 2026. Despite this, the market remained undersupplied, with call volumes surging by 400%.

Computing power may be expensive, but this confirms a fact: AI has transformed from an optional tool into an essential production resource for enterprises, and users are less sensitive to price than they are focused on model capability.

Overseas providers are also adjusting aggressively. On January 22nd, Amazon AWS broke its 20-year tradition of "price declines" by raising EC2 machine learning capacity block prices by 15%. On February 15th, Microsoft Azure adjusted GPT-4o and GPT-4 Turbo API prices, canceling the free quota for GPT-4o. On March 10th, Google Cloud announced AI computing instance price adjustments starting May 1st, discontinuing Gemini's low-priced subscription plans. OpenAI adjusted GPT-4o/4 Turbo API prices, with ChatGPT Plus rising from $20/month to $30/month and a daily message limit of 30.

From domestic providers to overseas ones, from computing instances to API calls, this global collective price adjustment has pulled the AI industry back from its subsidy-driven expansion phase onto a rational track of value-based pricing. Free quotas are a thing of the past; pay-as-you-go is becoming the norm. Developers must now recalculate and reassess their cost structures.

02 The Logic Behind the Adjustments

The collective price adjustments by global providers are driven by profit-seeking on the surface but essentially reflect the AI industry's transition from an expansion phase to a profitability verification phase. "Business Show" believes three underlying logics lie behind this global adjustment.

First and foremost is the revaluation of computing power value.

As the supply of AI's core fuels (GPUs, HBM) tightens and costs rise, all downstream providers are forced to adjust prices. The starting point for all this may trace back to NVIDIA.

Currently, NVIDIA holds an 85% global market share in AI chips, with a net profit margin as high as 56%. To a large extent, its pricing directly determines the industry's cost baseline.

In 2026, NVIDIA's Blackwell series GPU delivery lead times extend to 2027, with single-card procurement costs rising over 30% year-on-year. Meanwhile, HBM3E high-bandwidth memory spot prices have surged over 20% since the end of 2025, with a global capacity gap of 50%-60% and supply tightening.

More importantly, NVIDIA's closed-loop ecosystem of hardware and software further drives up industry costs. With 90% of global AI training code written in CUDA and 5 million developers reliant on this ecosystem, each H20 chip requires a $12,000 CUDA license fee, with implicit costs exceeding 30%.

This dual impact on performance and costs leaves providers like Alibaba, Tencent, Microsoft, and Google with no choice but to pass on cost increases to downstream users.

While rising computing power costs provide a passive reason for adjustments, the exponential growth in Token demand gives providers the confidence to adjust prices proactively.

In 2026, AI applications have evolved from single-round dialogue into the agent era, driving exponential growth in Token consumption. Take OpenClaw and similar agents: their multi-round recursion within a single task, tool calls, and reflective verification consume 50 to 100 times the Tokens of traditional dialogue. A single active agent can burn through a thousand times more Tokens per month than an ordinary user.

Data shows that domestic daily average Token call volumes exceeded 140 trillion in Q1 2026, a more than 1,400-fold increase from 100 billion in early 2024. ByteDance's Doubao consumes over 120 trillion Tokens daily, with multimodal (e.g., video/image) Tokens accounting for over 40% and costing over 10 times more than text-only Tokens. Baidu Qianfan platform's enterprise user Token consumption surged 280% QoQ in Q1.

The current state of computing power consumption can be described as strong demand even for low-tier free models and undersupply of high-tier paid models. When demand grows and supply tightens, prices naturally follow supply and demand dynamics, which explains why call volumes still surged 400% after Zhipu's price adjustments.

Thus, high-quality Tokens have become a scarce resource.

Of course, the most fundamental change lies in the shifting business logic of the AI industry, which has moved from burning money for scale and accepting losses for users to prioritizing profitability and refined operations, with pricing power shifting back to providers.

Over the past two years, the AI industry has been in a frenzy of expansion, with significant capital investments. Providers attracted users and captured market share through free APIs and low-cost computing power, even if their AI businesses sustained losses, supported by profits from other segments and capital injections.

However, the winds shifted in 2026. Capital investments became more rational, and providers began feeling profitability pressures, with executives demanding: "Our AI business must turn a profit."

This explains why Alibaba Cloud adjusted free quotas and introduced pay-as-you-go pricing, while Tencent Cloud and Baidu Intelligent Cloud overhauled their pricing across the board. ByteDance's Volcano Engine leveraged internal scale effects to reduce costs and achieve AI business profitability through external price adjustments. Overseas providers like OpenAI and Anthropic also realized model value through price adjustments.

Referencing Amazon AWS's 14-year journey to break even and Alibaba Cloud's path to profitability in 2022, the price war among domestic cloud providers began as early as 2014 and has continued unabated since. Alibaba Cloud has consistently initiated large-scale price cuts, with single reductions exceeding 50%, while Tencent Cloud swiftly followed with even lower quotes, engaging in fierce competition.

According to public reports, Tencent Cloud was long viewed as a cost center within the group. To rapidly capture market share amid intense competition from Alibaba Cloud and Huawei Cloud, Tencent Cloud adopted an aggressive low-price strategy, securing large client orders through quotes far below cost and promises of long-term price stability.

While this strategy rapidly scaled Tencent Cloud's revenue, making it a domestic leader, it also trapped the business in a cycle of diseconomies of scale: the larger the scale, the heavier the losses. Not until 2025 did Tencent Cloud achieve full-year profitability at scale.

Undoubtedly, as AI computing power demand rises, the market size expands. Yet, the vast majority of cloud providers remain chronically unprofitable. Among them, only Zhipu, with a market value exceeding HK$400 billion, has the capital confidence to continue raising prices and experimenting. The rest are barely surviving.

In this environment, the survival of small and medium-sized enterprises (SMEs) is even more precarious.

03 Cost Increases and Billing Reflections

"Our small team lacks self-developed models and computing power reserves, relying solely on public cloud APIs," Li Ran's voice carried a tinge of helplessness. "After costs rise, we must either adjust prices or compress profits."

More realistically, providers will prioritize allocating computing power resources to high-paying, high-volume, and high-margin clients, such as finance, government, and top internet firms. SMEs will not only face higher costs but also resource allocation challenges, making it harder to secure stable computing power.

The most affected are "shell applications": enterprises and platforms that lack technological barriers and simply repackage APIs for secondary development. "Once costs rise, their cost advantages diminish, forcing them to reevaluate their business models," one investor told "Business Show."

For individual developers, free quota adjustments also have an impact, as the window for zero-cost trial and error closes. Baidu's adjustment of free quotas for low-tier models and ByteDance's change to Doubao's free quotas, retaining only short-term quotas for new users (Baidu: 1 million/90 days; ByteDance: 5 million/30 days), exemplify this.

It's time to replan cost investments. This billing reflection is also forcing developers to shift from mindless API calls to meticulous calculations, exploring model compression, quantization, context window optimization, RAG retrieval augmentation, and even hybrid calls to different model versions—all to reduce Token consumption.
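One of the tactics listed above, hybrid calls to different model versions, can be sketched as a simple router that sends cheap, simple requests to a low-tier model and reserves the expensive tier for complex ones. The model names, per-token prices, and length-based complexity heuristic below are all illustrative assumptions, not any provider's actual offering:

```python
# Illustrative hybrid routing to cut Token spend. Model names and
# prices are hypothetical, not real rates from any provider.
PRICING = {                  # RMB per 1,000 tokens (assumed)
    "small-model": 0.002,
    "large-model": 0.06,
}

def route(prompt: str, complexity_threshold: int = 200) -> str:
    """Pick a model tier using a crude length-based complexity proxy."""
    return "small-model" if len(prompt) < complexity_threshold else "large-model"

def call_cost(prompt: str, est_tokens: int) -> float:
    """Estimated cost of one call after routing."""
    model = route(prompt)
    return est_tokens / 1000 * PRICING[model]

# A short FAQ-style query goes to the cheap tier:
print(route("What are your business hours?"))           # small-model
print(call_cost("What are your business hours?", 500))  # 0.001
```

Real routers use classifiers or confidence scores rather than prompt length, but the economics are the same: every request diverted to the cheap tier avoids a 30x price difference in this example.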

However, this requires time and technical accumulation. For many small and medium-sized teams, the immediate priority is to replan their company's development path. Li Ran decided to explore each provider's pricing plans, noting that "combining and stacking them will be more cost-effective."
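Li Ran's "combine and stack" idea can be illustrated with a toy allocator that exhausts each provider's free quota before buying paid calls at the cheapest remaining rate. The provider names, quotas, and per-call rates below are made-up examples, not real pricing plans:

```python
# Toy multi-provider allocator: spend free quotas first, then buy the
# remainder at the cheapest paid rate. All figures are hypothetical.
PROVIDERS = [
    # (name, free calls per month, RMB per paid call) -- assumed values
    ("provider-a", 100_000, 0.16),
    ("provider-b", 50_000, 0.12),
    ("provider-c", 20_000, 0.20),
]

def cheapest_plan(total_calls: int) -> float:
    """Total monthly cost when stacking free quotas across providers."""
    remaining = total_calls
    for _name, free, _rate in PROVIDERS:
        remaining -= min(remaining, free)
    # Buy any remainder at the single cheapest per-call rate.
    cheapest_rate = min(rate for _name, _free, rate in PROVIDERS)
    return remaining * cheapest_rate

# 150,000 calls fit entirely inside the stacked free quotas (170,000 total):
print(cheapest_plan(150_000))  # 0.0
# 200,000 calls leave 30,000 paid at 0.12 RMB each:
print(cheapest_plan(200_000))  # 3600.0
```

In practice, stacking also means juggling multiple SDKs, rate limits, and model behaviors, which is part of the "time and technical accumulation" the text mentions.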

This price adjustment is accelerating industry differentiation. Leading enterprises, with their full-stack capabilities and scale effects, can maintain profit margins after adjustments and even consolidate market share through resource optimization. Meanwhile, small and medium-sized providers, especially those without self-developed models or computing power reserves, face cost increases that cannot be passed on, squeezing profits and forcing them to seek new paths.

Of course, exceptions exist. SMEs deeply rooted in vertical scenarios with core technologies (e.g., model optimization, cost control) may thrive amid these adjustments. Without relying on high-end APIs, they can find their footing by achieving cost reductions and efficiency gains in vertical scenarios.

Many are watching this price adjustment, with some arguing that providers' profit-seeking motives burden SMEs and developers with cost pressures. However, "Business Show" believes this adjustment signals the AI industry's maturation. After all, the free AI subsidy model of the past two years led many to assume AI was free, spawning countless valueless applications that wasted computing resources. The 2026 collective adjustment is essentially the market optimizing and eliminating such applications, forcing technological iteration. Only then can truly valuable AI applications receive reasonable commercial returns.

Computing power can pursue inclusivity and cost-effectiveness, but no provider can sustain operations purely out of love.

This adjustment also marks a return to business logic for providers, who can now pursue sustainable AI profitability through cost-plus-reasonable-profit pricing. For SMEs and developers, controlling costs is only the first step; they must also pursue technological optimization and deep cultivation of vertical scenarios.

AI has never been a free lunch. As the industry enters the value-based payment era, only enterprises and developers capable of creating value and managing costs will survive rather than be left behind.
