March 16, 2026

AI Freedom: Now Out of Reach for Ordinary People
In 2026, the illusion that cloud computing was a sector where prices could only ever fall was shattered.
Recently, Tencent Cloud's Intelligent Agent Development Platform announced an "optimization" of its billing strategy for certain models.
According to the announcement, the adjustments involve two main changes: First, public beta models will no longer be free. Models like GLM 5, MiniMax 2.5, and Kimi 2.5 will end their free public beta on March 13 and transition to formal commercial services. Second, prices for the Hunyuan series models, Tencent HY2.0 Instruct and Tencent HY2.0 Think, will increase, with some models seeing hikes exceeding 400%.

Tencent Cloud, however, is not the first cloud platform to raise prices.
On February 11, UCloud announced price increases for its product services. UCloud stated that due to escalating global supply chain disruptions, infrastructure costs, particularly for core hardware procurement, have risen significantly and structurally. After careful evaluation, the company decided to adjust prices upward for all products and services for both new and renewing customers starting March 1.
This wave of price hikes is not limited to Chinese cloud providers. Looking across the Pacific, the first domino to fall was quietly pushed by Amazon.
On January 4, without any public announcement or press conference, AWS raised EC2 prices by approximately 15%, with flagship instance costs increasing from $34.61 to $39.80 per hour.
Google Cloud soon followed, announcing that starting May 1 of this year, it would raise global data transfer service prices, doubling the per-GB rate in North America from $0.04 to $0.08. The decision was made quietly, with just a single announcement.
AWS, Google, Tencent, and UCloud—cloud providers spanning the Pacific—have all independently decided to raise prices within the same timeframe, shattering a nearly two-decade-old industry belief: cloud services only get cheaper, never more expensive.
So, is this price hike the end of the two-decade myth of falling cloud service prices, or just a cyclical correction? Will token prices continue to rise indefinitely? Will your future AI bills become cheaper or more expensive?
01
How Cloud Providers Drove Prices Down for the First Two Decades
The shock of these hikes stems from an expectation users built up over two decades of continuous cloud computing price cuts.
Cloud computing is fundamentally a high-stakes gamble on scale.
When Amazon launched AWS in 2006, the logic was straightforward: its data centers had vast amounts of idle servers. Instead of letting them gather dust, why not rent them out by the hour? This business model, akin to renting out empty warehouse shelves, has, over the past two decades, upended the global IT industry's infrastructure.
According to incomplete statistics from Super Focus, AWS has proactively cut prices more than a hundred times over the past two decades. In China, Alibaba Cloud and Tencent Cloud have treated price cuts as a recurring ritual. This near-suicidal pricing strategy, sustained for years, has drastically reduced global IT infrastructure costs.
Such a scenario is unheard of in any other industry. No landlord voluntarily lowers rent annually, and no supermarket reduces price tags every quarter. Yet, the cloud computing industry has achieved this—for two decades.
Why has only cloud computing succeeded in this?
The answer is simple: the physics of the underlying hardware supported these price cuts.
Traditional cloud computing, or IaaS (Infrastructure as a Service), essentially sells "steel and electricity." Whether it's compute instances, storage, or network bandwidth, these are highly standardized digital utilities. The underlying cost of these resources is governed by a ruthless physical law: Moore's Law.
Chip transistor density doubles every 18 to 24 months, meaning the physical cost per unit of computing power has been in freefall. Cloud providers pay less for newer servers, reaping huge benefits from this technological cycle.
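To make that freefall concrete, here is a back-of-the-envelope sketch; the 21-month halving period is simply the midpoint of the range above, and the $1.00 starting cost is purely illustrative:
```python
# Back-of-the-envelope sketch: unit compute cost under Moore's Law.
# Assumes the cost per unit of compute halves every 21 months (the midpoint
# of the 18-24 month range above); the $1.00 starting cost is illustrative.

def unit_cost(initial_cost: float, months_elapsed: float,
              halving_period_months: float = 21.0) -> float:
    """Cost per unit of compute after a given number of months."""
    return initial_cost * 0.5 ** (months_elapsed / halving_period_months)

for year in (2006, 2011, 2016, 2021, 2026):
    months = (year - 2006) * 12
    print(f"{year}: ${unit_cost(1.00, months):.4f} per unit")
```
Compounded over two decades, even this conservative curve cuts the unit cost by more than three orders of magnitude; that headroom is what funded a hundred-plus price cuts.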
Yet, these savings have never fully reached cloud providers' bottom lines, as traditional cloud services are highly commoditized.
Your servers can run code; so can your competitors'. If you don't pass hardware cost savings to customers, rivals will undercut you by 30%, stealing your client base overnight.
Thus, price cuts weren't a choice but a survival instinct. If you didn't lower prices, others would.
Beyond technological drivers, economies of scale are another hallmark of cloud services.
Building a hyperscale data center—buying land, laying cables, constructing server rooms, and purchasing tens of thousands of servers—requires tens of billions in upfront investment.
Once operational, adding one more startup's website or processing one million more data requests incurs nearly zero marginal cost, aside from a few cents in electricity.
This is cloud computing's most alluring commercial leverage. At scale, serving more customers dilutes fixed costs per user.
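A toy calculation shows the leverage; the build-out cost, per-request electricity cost, and usage volumes below are all hypothetical:
```python
# Toy model of fixed-cost dilution; every figure here is hypothetical.
FIXED_COST = 10_000_000_000   # assumed up-front data center build-out, USD
MARGINAL_COST = 0.02          # assumed electricity cost per request, USD

def cost_per_customer(num_customers: int, requests_each: int = 1_000_000) -> float:
    """Average total cost per customer at a given scale."""
    fixed_share = FIXED_COST / num_customers   # shrinks as the customer base grows
    variable = MARGINAL_COST * requests_each   # flat per customer, tiny per request
    return fixed_share + variable

for n in (1_000, 100_000, 10_000_000):
    print(f"{n:>10,} customers -> ${cost_per_customer(n):>13,.2f} each")
```
At a thousand customers, the fixed cost dwarfs everything else in the average; at ten million, it all but disappears.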
Under this logic, industry giants understood: market share matters more than current pricing. By offering rock-bottom prices to dominate the market, they could spread costs, then undercut rivals further. It was a flawless virtuous cycle.
Giants calculated that with sufficient capital, they could outlast weaker competitors. Once they achieved monopoly status, pricing power would be firmly in their hands.
But reality proved harsher. The players at this table are all heavily armed "great powers."
Across the Pacific stand Amazon, Microsoft, and Google—the "three mountains." In China, Alibaba, Tencent, and Huawei dominate with seemingly bottomless cash reserves. All have strategic imperatives to endure losses indefinitely. Using low prices to outlast rivals? Everyone at this table is a trillion-dollar giant; none will back down easily.
Thus, cloud computing's price wars dragged on for two decades, a grueling war of attrition.
Objectively, this ruthless competition drove unprecedented technological advancement in cloud computing, transforming once-exorbitant enterprise IT infrastructure into today's "utilities." Without this two-decade price war, today's thriving internet ecosystem would not exist.
Yet, the price-cutting vortex never stopped—until 2026, when explosive AI demand suddenly jammed the decades-old price-cutting flywheel.
02
Short-Term Price Hikes Are a Bluff; Long-Term Bills Are the Real Threat
Cloud providers' sudden collective price hikes stem from a simple reality: their data center hardware is struggling under the 2026 AI demand surge.
When large models first gained traction, the giants didn't immediately charge for them; they offered free public betas, like supermarkets handing out samples of a new drink. Usage was light: users mostly asked frivolous questions, and cloud providers' computing reserves sufficed.
By 2026, the situation changed. Enterprises found AI genuinely useful, integrating it into customer service, internal analytics, and even core operations. Individual users discovered AI agents like Openclaw could handle simple tasks.
Token consumption skyrocketed, no longer a trickle but a tsunami.
As UCloud admitted, "infrastructure costs, particularly for core hardware, have risen significantly and structurally." Translation: users are consuming resources too aggressively, and purchasing top-tier GPUs and paying electricity bills is draining our finances.
This is the true logic behind the current price hikes—not cloud providers finally seizing pricing power, but an awkward "supply-demand mismatch."
Many data centers still rely on general-purpose GPUs designed for training large models, not the inference tasks now dominating usage. Using these expensive, power-hungry chips for routine token generation is financially unsustainable.
With enterprise usage surging and legacy infrastructure inefficient, cloud providers, facing cash flow pressure, turned to price hikes to stay afloat.
However, seeing Tencent Cloud end free tiers and AWS raise rates has spooked many. The assumption: since AI is becoming indispensable, cloud providers will gouge customers, making AI bills a bottomless pit.
This fear is misplaced. Current high computing costs stem from a transitional hardware shortage, but chip giants aren't idle.
Once large-model applications mature, the market will shift from training chips ("brain-building") to inference chips ("labor"). Data centers will soon fill with a new generation of inference hardware optimized for token generation.
At the upcoming GTC conference, NVIDIA is expected to unveil new inference chips integrating LPU technology. Domestic players like Cambricon are also focusing on inference chips.
These new chips eliminate unnecessary computing units, focusing solely on data throughput. At equal power consumption, they can generate many times more tokens, driving down per-token costs.
Critically, software engineers are also optimizing costs relentlessly.
Early models were inefficient: asking about the weather activated billions of parameters, wasting electricity. Now, techniques like mixture-of-experts models ensure only relevant neural pathways activate, keeping most of the "brain" dormant.
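A minimal sketch of that routing idea follows; the expert count, dimensions, and random gating weights are toy assumptions, not any production architecture (real systems route per token inside transformer layers, with learned gates):
```python
import numpy as np

# Minimal sketch of top-k expert routing in a mixture-of-experts layer.
# Expert count, dimensions, and random weights are illustrative assumptions.
rng = np.random.default_rng(0)
NUM_EXPERTS, TOP_K, DIM = 8, 2, 16

gate_w = rng.standard_normal((DIM, NUM_EXPERTS))               # gating network
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route the input through only TOP_K of NUM_EXPERTS; the rest stay idle."""
    logits = x @ gate_w
    top = np.argsort(logits)[-TOP_K:]                          # indices of the k best experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the chosen k
    # Only TOP_K expert matmuls actually run: 2/8 of the layer's weights are active.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

out = moe_forward(rng.standard_normal(DIM))
print(out.shape, f"- active experts: {TOP_K}/{NUM_EXPERTS}")
```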
This software-level efficiency, combined with new hardware, will continuously reduce the true cost of generating tokens in cloud data centers.
Thus, this price hike is likely a temporary rebound as legacy hardware struggles with new demand. Per-token costs will still plummet toward zero.
03
Tokens Get Cheaper, but "Intelligence" Gets Pricier
If token prices are destined to crash, will "AI freedom" soon follow? Not quite. The assumption that completing a task requires a fixed number of tokens breaks down as AI evolves from "Q&A tools" into "AI agents."
In 2023, using AI meant exchanging text: a query consumed 1,000 to 2,000 tokens, costing pennies.
By 2026, AI usage fundamentally changed. When an agent tackles a real business task—analyzing a competitor report, reviewing a contract, or processing customer emails—its backend operations are far more complex.
It logically deduces steps, repeatedly queries search engines and databases, and even learns new skills from internal libraries to complete tasks.
Each backend hesitation or tool invocation consumes tokens. Compared to 2023, token usage isn't just a few times higher—it's orders of magnitude greater.
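A rough comparison makes the gap tangible; the per-token price, round count, and tokens per round below are illustrative assumptions, not quoted figures:
```python
# Rough per-task token comparison; the price, round count, and tokens per
# round are all illustrative assumptions, not quoted figures.
PRICE_PER_1K_TOKENS = 0.002            # assumed blended price, USD

chat_tokens = 1_500                    # 2023-style: one prompt, one answer
rounds = 40                            # assumed agent reasoning/tool-call rounds
tokens_per_round = 3_000               # assumed context + output per round
agent_tokens = rounds * tokens_per_round

for label, tokens in (("chat", chat_tokens), ("agent", agent_tokens)):
    print(f"{label:<6} {tokens:>9,} tokens  ${tokens / 1000 * PRICE_PER_1K_TOKENS:.4f}")
```
Under these assumptions, one agent task burns roughly eighty times the tokens of a 2023 chat, which is how a falling unit price can still produce a rising bill.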
Economics offers a parallel: the steam engine. When James Watt improved the steam engine in the late 18th century, coal efficiency soared. Logically, coal consumption should have dropped. Instead, factories adopted steam engines en masse, and Britain's coal usage exploded, the pattern economists now call the Jevons paradox.
Today, large models' computing consumption follows the same script: higher efficiency and lower unit costs drive greater total consumption.
Some ask: will algorithmic optimizations eventually offset rising token usage?
Unfortunately, no. AI computations occur in the physical world. Each silicon transistor flip and coolant cycle consumes real electricity. With billions of agents operating 24/7, handling exponential task volumes, this will translate to relentless data center noise and soaring electricity meters.
The physical world's energy limits ensure computing power cannot grow infinitely. This answers our core question: will your future AI bills get cheaper or more expensive?
The answer is stark: absolute costs will rise, likely sharply.
Extending these commercial and physical dynamics to their logical end leads to a deeply uncomfortable conclusion:
For three decades, the classical internet era spun a comforting narrative: technology as a "great equalizer." Search engines democratized information, social media amplified grassroots voices, and smartphones bridged urban-rural divides. With near-zero software distribution costs, technological benefits transcended class barriers, reaching the masses.
However, in the AI era, this utopian logic is brutally fracturing.
When large models truly evolve from "encyclopedias in dialogue boxes" into "super agents" that think and make decisions on behalf of humans, they inherently become insatiable token-consuming beasts. Truly powerful AI capabilities will never approach being free the way web browsing once did. Their costs will escalate without bound, in proportion to the exponential rise in task complexity.
It is foreseeable that as cloud computing's two-decade era of accessibility through falling prices comes to an end, the future of intelligence will inevitably exhibit a rigid stratification.
Individuals or enterprises at the top of the financial food chain, who can afford high-quality AI computing power, will see their productivity exponentially amplified by superior agents. Their business acumen will grow sharper, their decision cycles shorter, and their execution efficiency far beyond that of ordinary people. This advantage will compound further through the data flywheel of high-frequency usage, giving them an overwhelming, asymmetric edge over everyone below.
Meanwhile, ordinary individuals and small-to-medium enterprises unable to afford the exorbitant bills will have to rely on simplified, diluted "low-tier intelligence" dressed up as free offerings. Such versions may help draft a perfunctory weekly report or generate a couple of illustrations, but when faced with the genuinely complex commercial maneuvering that moves people between social strata, top-tier medical diagnosis, or hardcore legal analysis, they can offer only vague generalities.
This is not a dystopian fantasy from science fiction; it is the coldest, hardest reality and the future unfolding around us.
Throughout history, what has always been cheap is mere "computation" and "information." Truly top-tier "cognition" has always been expensive, a privilege reserved for the few. Large models have not shattered this barrier; instead, they have used soaring electricity bills and costly token invoices to raise this cognitive wall higher than ever, and to make it even harder to see.
This is the chilling truth of our era lurking behind the current wave of cloud service price hikes.
- END -