In-Depth | Burning RMB 100 Million a Day: The First 'Token Retreat' Has Arrived

06/05 2026

Recently, a story went viral in Silicon Valley: a company spent $500 million a month on Claude, equivalent to over RMB 3.3 billion, or more than RMB 100 million burned every day.

Now, the entire AI community is trying to 'find' this company.

At the same time, overseas giants like Uber and Microsoft began slashing AI token budgets.

One reason for the runaway bills: Claude is billed per token, and employees were given unrestricted access.

For the past two years, the AI industry operated under one consensus: AI must be cheaper than humans.

Companies worldwide went all-in on burning tokens. Programmers used AI for coding, operations staff for content, customer service for replies, and designers for visuals, while companies laid off human staff.

But now, for the first time, an 'AI retreat wave' has emerged, dealing a hard blow to the 'All in AI' narrative. Is the AI bubble about to burst?

Let's examine some real money-burning figures:

1. Burning millions overnight.

A team conducting Multi-Agent experiments blew through $1 million in token fees in a single night.

2. Exhausting weekly budgets in 30 minutes.

One company redesigning a simple webpage saw AI continuously called for reasoning, burning through a week's budget in half an hour.

3. Monthly bills exceeding $1 million.

An overseas developer shared a bill: At OpenAI's official rates, monthly AI costs easily surpassed $1 million.

4. 50 million tokens daily with no idea how to spend them.

Many corporate executives receive daily budgets of more than 50 million tokens but 'don't know how to spend them' effectively.

These numbers are staggering, but more critical than the money burned is asking: Where is it going? Genuine needs or pure waste? Technological inevitability or management failure?

Li Cong, founder of Onion Group, dropped a bombshell: 'Many employees use company tokens to slack off or take private jobs. They work company hours while handling external development, design, or operations orders.'

Developer productivity platform Entelligence.AI analyzed 2,444 companies and found that only 18 cents of every $1 spent on AI tokens delivers actual user value, while 44 cents goes to fixing AI-generated bugs, 27 cents to rework, and 11 cents to review friction. (Data source: Entelligence.AI)
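To make the Entelligence.AI split concrete, here is a minimal sketch that applies the four reported shares to a bill. The $1 million bill and the function names are illustrative assumptions, not part of the study.

```python
# Split of each AI token dollar as reported in the article (in cents).
SPLIT_CENTS = {
    "user_value": 18,
    "fixing_ai_bugs": 44,
    "rework": 27,
    "review_friction": 11,
}


def allocate(spend_usd: float) -> dict[str, float]:
    """Distribute a token bill across the four reported buckets."""
    return {bucket: spend_usd * cents / 100 for bucket, cents in SPLIT_CENTS.items()}


def cost_per_value_dollar() -> float:
    """Dollars spent for every dollar that actually reaches users."""
    return 100 / SPLIT_CENTS["user_value"]


bill = allocate(1_000_000)  # hypothetical $1M monthly bill
print(bill["user_value"])                 # 180000.0
print(round(cost_per_value_dollar(), 2))  # 5.56
```

Read this way, the reported split implies that a company pays roughly $5.56 in tokens for each dollar of user value delivered.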

Hu Yanping, Distinguished Professor at Shanghai University of Finance and Economics and founder of DCCI Internet Data Center, identifies three causes of runaway token consumption: inadequate optimization (even simple tasks consume large numbers of tokens), rising computational costs, and a lack of genuine must-have demand in many scenarios.

Next, Qianbidao will analyze this emerging 'AI retreat wave' through four dimensions.

1. Who's stealing your tokens? Internal 'ghostwriters' and budget black holes in corporate AI bills.

2. The KPI curse: Why does tying AI usage to performance metrics increase losses?

3. Cost-saving strategies: Which companies treat tokens like 'foreign currency'?

4. Lucrative opportunities: Companies already profiting from token management.

- 01 -

Who's 'stealing' your tokens? Slacking, side jobs, and management failures

Li Cong, Founder of Onion Group

NASDAQ's first 'cross-border brand e-commerce' stock, specializing in cross-border retail, B-end supply chain, and overseas brand incubation

Uber's budget cuts don't surprise me. The issue isn't AI's value; it's massive token waste. Many employees use company tokens for personal gain.

Amazon's internal KiroRank leaderboard created 'Tokenmaxxing': employees ran pointless AI tasks to boost rankings, increasing computational waste before Amazon shut it down. (Source: Business Insider)

Imagine working company hours while using allocated model quotas for external development, design, or operations orders—even running cross-border e-commerce or content matrices. This explains many Xianyu platform listings.

The root problem isn't AI itself but unclear corporate priorities. Many bosses haven't decided where AI should be applied.

Overseas giants now make AI usage a KPI, requiring daily token consumption or including usage rates in performance reviews. Employees burn tokens without direction, pursuing tasks unrelated to company benefits.

A friend at a major firm received a daily budget of 50 million tokens but had no idea how to spend it productively. With KPIs looming, he had to burn it anyway.

Three scenarios dominate corporate token consumption:

1. Research (content research, data collection, information analysis)

2. Coding (the biggest consumer, especially in development departments)

3. Automated content production (mass-generating videos, graphics, live-streaming materials, and e-commerce operations)

The core issue isn't AI adoption but flawed evaluation methods. AI shouldn't be measured by KPIs but ROI (return on investment).

We only care if tokens convert to profits (or efficiency).

Allocate $100 million in tokens? Continue only if ROI-positive. Or if it significantly boosts efficiency—like increasing output from 3 to 30 articles weekly, or completing work for three people alone.
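The ROI rule described here can be sketched as a simple funding gate. The project names and figures below are hypothetical, and attributing gains to token spend is the hard part that this sketch simply assumes has been done.

```python
from dataclasses import dataclass


@dataclass
class TokenProject:
    name: str
    token_cost: float       # money spent on tokens in the period
    attributed_gain: float  # profit or savings attributed to that spend

    @property
    def roi(self) -> float:
        """Return on investment: net gain divided by cost."""
        return (self.attributed_gain - self.token_cost) / self.token_cost


def keep_funding(project: TokenProject) -> bool:
    """Continue only if the project is ROI-positive."""
    return project.roi > 0


# Hypothetical projects for illustration only.
projects = [
    TokenProject("listing-images", token_cost=20_000, attributed_gain=65_000),
    TokenProject("kpi-driven-usage", token_cost=20_000, attributed_gain=3_000),
]
for p in projects:
    print(p.name, round(p.roi, 2), keep_funding(p))
```

A usage-rate KPI would rate both projects identically, since they burn the same number of tokens. The ROI gate keeps only the first.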

Tokens function like marketing budgets: invest only when ROI-positive.

After GitHub Copilot switched to usage-based pricing, some users saw bills jump from $50/month to $3,000/month.

Many companies make a critical mistake: Assigning premium models to high-salary employees. This rarely pays off.

A senior engineer earning $80,000 monthly paired with Codex sees limited efficiency gains due to ceiling effects and inherent resistance to AI.

But giving the most expensive model to a junior programmer with 2-3 years' experience yields dramatic results. AI instantly bridges skill gaps, compensating for all weaknesses.

The correct approach: Pair inexpensive talent with premium models, not expensive talent with even pricier models.

E-commerce companies (especially second-tier and private-domain social commerce) and branding firms most effectively implement this logic, achieving clear AI ROI.

We're a prime example. Our Amazon distribution team once had 50-60 designers and creators, plus video staff. Now only four remain.

Top designers previously created 20-30 high-quality images daily. Now graduates produce hundreds daily after 3-5 days' training, maintaining quality.

The apparel industry saw similar transformations. Photography, design, operations, and content production, once labor-intensive, now require just 10% of previous staff. Profits soar as fixed costs vanish.

OpenClaw founder Peter Steinberger shared an OpenAI bill: a 3-person team managing 100 Codex agents burned 603 billion tokens across 7.6 million requests in 30 days, costing $1.3 million (Fast Mode). (Source: Peter Steinberger)
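The figures in the Steinberger bill can be broken down into unit costs. The sketch below uses only the four numbers quoted above. The results are blended averages, since the bill as quoted does not separate input, output, or cached tokens.

```python
# Figures quoted from the bill in the article.
TOTAL_TOKENS = 603_000_000_000
TOTAL_REQUESTS = 7_600_000
TOTAL_COST_USD = 1_300_000
DAYS = 30
AGENTS = 100

cost_per_million_tokens = TOTAL_COST_USD / (TOTAL_TOKENS / 1_000_000)
cost_per_request = TOTAL_COST_USD / TOTAL_REQUESTS
tokens_per_request = TOTAL_TOKENS / TOTAL_REQUESTS
cost_per_day = TOTAL_COST_USD / DAYS
cost_per_agent_day = cost_per_day / AGENTS

print(round(cost_per_million_tokens, 2))  # 2.16 (blended USD per million tokens)
print(round(cost_per_request, 2))         # 0.17
print(round(tokens_per_request))          # 79342
print(round(cost_per_day))                # 43333
print(round(cost_per_agent_day, 2))       # 433.33
```

The per-request price looks small. The bill comes from the volume: about 79,000 tokens per request, repeated millions of times.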

This shift isn't just about efficiency—it restructures organizations. Operations and design roles merge into 'multi-role' positions managed by individuals plus AI agents.

How to achieve this? Two steps:

1. Build an enterprise AI infrastructure.

Create a unified internal AI operating system that distills employee skills into reusable 'skill packages.' Standardized processes like product listings, content generation, and logistics become AI-automated. Experience-based tasks like product selection use advanced reasoning models with corporate knowledge bases for AI-assisted decisions, minimizing human intervention.

2. Distill employee expertise into AI.

Transform human capabilities, experience, and methods into AI-callable 'skill packages.' When employees leave, their abilities remain. Overseas giants like Meta follow similar approaches in what we call AI EVO AI—continuous AI self-review and iteration.

This isn't speculation—it's happening. With economic downturns, human resources remain companies' largest cost. Many firms now minimize hiring and optimize existing staff, especially SMEs under RMB 1 billion annual revenue.

AI first reduces costs rather than boosting revenue. A realistic example: An employee's RMB 10,000 salary costs the company RMB 13,000-14,000. Generating RMB 14,000 in net profit is hard, but replacing 10 workers with AI might be easy—it's just a matter of timeline and substitution rate.

Tasks for 10 people now require 1 person plus 9 AI agents. Companies focus on efficiency profits (per-capita output), which will significantly rise in AI-adopting firms for the foreseeable future.
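The substitution arithmetic above can be written out as a break-even check. The loaded cost range of RMB 13,000 to 14,000 comes from the text. The monthly AI spend of RMB 40,000 is a purely hypothetical figure.

```python
def break_even_ai_spend(replaced: int, loaded_cost_rmb: float) -> float:
    """Monthly AI spend at which replacing `replaced` employees saves nothing."""
    return replaced * loaded_cost_rmb


def net_monthly_saving(replaced: int, loaded_cost_rmb: float, ai_spend_rmb: float) -> float:
    """Payroll avoided minus what the agents cost to run."""
    return break_even_ai_spend(replaced, loaded_cost_rmb) - ai_spend_rmb


# 10 people become 1 person plus 9 agents, so 9 loaded salaries are avoided.
low = break_even_ai_spend(9, 13_000)
high = break_even_ai_spend(9, 14_000)
print(low, high)  # 117000 126000

# Hypothetical: the 9 agents burn RMB 40,000 of tokens per month in total.
print(net_monthly_saving(9, 13_000, 40_000))  # 77000
```

Under these assumptions the substitution pays off as long as the nine agents together cost less than roughly RMB 117,000 to 126,000 per month.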

New opportunities emerge in AI infrastructure R&D/services, enterprise AI transformation, and even AI intermediary services and computational power export businesses.

Since OpenAI, Claude, and other top models face payment and account barriers in China, intermediaries offer overseas account recharges, token sales, and relay services.

Despite regulatory concerns, this represents not just short-term arbitrage but a long-term track. It resembles early trading firms—importing overseas goods then, now importing overseas tokens.

This sector grows rapidly as Chinese demand for top models rises while overseas models struggle to fully enter China.

- 02 -

Burning millions overnight: AI's budget black hole arrives

Cui Wei, CEO of Shenxing Intelligence, AI consultant with algorithm/system research experience, PhD in Electrical Engineering from Tsinghua University, and master's supervisor at Zhejiang University's School of Economics

When Uber, Microsoft, and others cut AI token budgets, many wonder: Is AI failing? Quite the opposite.

The real issue isn't AI's value but companies' overly optimistic token projections last year. Many now realize: AI isn't always cheaper than humans.

Over the past year, corporate attitudes toward AI have evolved through three phases:

1. Skepticism ('AI is unreliable')

2. Enthusiasm ('All in AI')

3. Accountability ('Show me the numbers')

Now, companies face runaway AI bills.

Foreign media report that token prices have risen about 65% since late February 2026. Goldman Sachs' Delta One desk notes that the core variable in the AI trade has shifted from 'technical feasibility' to 'cost affordability.'

A few days ago, a team conducted a Multi-Agent (multi-AI collaboration) experiment and burned through millions of yuan overnight. Some overseas developers have also shared their bills publicly: Calculated at OpenAI's official prices, AI costs can reach millions of dollars per month.

When many bosses see this figure, their first reaction is: 'Why is AI more expensive than employees?' But this is the reality, especially in R&D departments.

Because the biggest Token burner isn't writing copy—it's programming. Code generation essentially expands exponentially. You might write an article that's a few thousand words long, but programs are different. They continuously generate context, debug, reason, call Agents, and rerun repeatedly.

Often, a task will be running smoothly when suddenly, the Tokens explode. The worst part is: You can't predict it in advance. The real issue for many companies now isn't that they 'can't afford it'—it's that they don't know how much they'll end up spending. You might just be modifying a webpage and burn through a week's budget in half an hour.
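One mechanism that may contribute to these sudden blow-ups is context accumulation: if every agent step resends the whole conversation so far, billed input tokens grow much faster than the number of steps. The sketch below assumes a fixed amount of new context per step and ignores output tokens and prompt caching. All numbers are hypothetical.

```python
def cumulative_input_tokens(steps: int, base_context: int, added_per_step: int) -> int:
    """Total input tokens billed when every step resends the full history."""
    total = 0
    context = base_context
    for _ in range(steps):
        total += context           # the whole context is sent as input again
        context += added_per_step  # tool output, diffs, and logs are appended
    return total


short_run = cumulative_input_tokens(steps=10, base_context=5_000, added_per_step=2_000)
long_run = cumulative_input_tokens(steps=100, base_context=5_000, added_per_step=2_000)
print(short_run)  # 140000
print(long_run)   # 10400000
print(round(long_run / short_run, 1))  # 74.3
```

Under these assumptions, ten times more steps cost about 74 times more input tokens, which is why a task that keeps rerunning can exhaust a budget without any single call looking expensive.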

This situation is strikingly similar to what happened in the cloud computing industry.

Back then, everyone said: 'Cloud computing is cheaper—no need to buy your own servers.' But many companies later discovered: Long-term cloud usage was more expensive than building their own private clouds.

So many companies started moving back to private deployments. Now, a similar trend is emerging in the AI industry. More and more enterprises are realizing: AI isn't as simple as 'just connecting an API.' It has evolved into a new financial system.

I've even been researching a potential new role lately: FinAI. Think of it like FinOps (cloud cost management) in the cloud computing era.

What does that mean? Simply put: One of the most critical future capabilities for enterprises won't just be 'whether they can use AI'—but whether they can control AI costs. For example: Which departments should use top-tier models? Which tasks only need 70%-good models? Which jobs justify burning Claude? Where is it completely unnecessary?

Because right now, many companies' biggest problem is: 'Using expensive models everywhere.'

For instance, many employees who only need ordinary models jump straight to Claude, Codex, or GPT's premium packages. Bosses initially think: 'Fine, it improves efficiency.'

But a few months later, when finance sees the bill, they're stunned. Many companies currently lack any AI usage guidelines. Who can use which models? In what scenarios can premium Tokens be burned? Which tasks must have restricted calls? Most companies have no rules at all.
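The missing usage guidelines could take the form of a small policy table that answers "who can use which model, for what." The task types, tier names, and department list below are hypothetical placeholders, not a recommended policy.

```python
from enum import Enum


class Tier(Enum):
    ECONOMY = "economy"
    STANDARD = "standard"
    PREMIUM = "premium"


# Hypothetical policy: which tier each task type may request.
POLICY = {
    "summarize_email": Tier.ECONOMY,
    "draft_copy": Tier.STANDARD,
    "refactor_service": Tier.PREMIUM,
}

# Hypothetical: only these departments may burn premium tokens.
PREMIUM_DEPARTMENTS = {"r&d"}


def allowed_tier(task_type: str, department: str) -> Tier:
    """Return the most expensive tier this request is allowed to use."""
    tier = POLICY.get(task_type, Tier.ECONOMY)  # unknown tasks default to the cheapest tier
    if tier is Tier.PREMIUM and department.lower() not in PREMIUM_DEPARTMENTS:
        return Tier.STANDARD  # downgrade instead of rejecting the request
    return tier


print(allowed_tier("refactor_service", "R&D"))        # Tier.PREMIUM
print(allowed_tier("refactor_service", "marketing"))  # Tier.STANDARD
print(allowed_tier("unknown_task", "R&D"))            # Tier.ECONOMY
```

Defaulting unknown tasks to the cheapest tier is a design choice: it makes expensive usage something a team has to opt into explicitly.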

As a result, AI usage gradually spirals out of control. My personal judgment: Right now, 70% of AI budget overruns are management issues, and only 30% are technical problems.

Management issues include: Poor KPI setting, employee abuse, model misselection, lack of ROI assessment, and no cost constraints. This is especially evident in R&D departments. Programmers are heavy AI users, and programming is one of the most Token-intensive tasks.

Over the past few years, many companies have been laying off programmers to embrace AI. But now some are realizing: If everyone operates at Claude Code levels, costs could be even higher than before. Because AI isn't a one-time fee; it's a metered, pay-as-you-go model that gets more expensive the more you consume.

There's an even trickier problem now: AI costs are highly uncertain. Today's large models can't yet 'complete a task within a set budget.'

They can only tell you when Tokens are depleted, not precisely how much a task should cost. You rarely know in advance. Sometimes a rogue Agent can burn through your entire budget.
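The point that usage is only known after the fact can be illustrated with a simple budget guard around an agent loop. This is a sketch under stated assumptions: `run_agent` and its inputs are hypothetical, and each step's token usage is assumed to be reported only once the step has finished.

```python
def run_agent(step_token_usage: list[int], budget_tokens: int) -> tuple[int, int]:
    """Run steps until the budget is used up.

    Returns (steps completed, tokens actually consumed).
    """
    used = 0
    completed = 0
    for usage in step_token_usage:
        if used >= budget_tokens:
            break      # budget exhausted: refuse to launch another step
        used += usage  # real usage is only known after the step returns
        completed += 1
    return completed, used


# Hypothetical run: the third step is a "rogue" call that balloons.
steps_done, tokens_used = run_agent([2_000, 3_000, 50_000, 4_000], budget_tokens=10_000)
print(steps_done, tokens_used)  # 3 55000
```

The guard stops the fourth step, but the budget of 10,000 tokens has already been overshot more than fivefold by one call. A post-hoc check limits the damage without making the cost predictable.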

That's why many overseas companies are now controlling Token usage. It's not that they're abandoning AI—they're realizing AI needs financial discipline.

Of course, in the long run, I believe Token prices will decline because computing power is ultimately infrastructure. Right now, it's expensive mainly due to GPU shortages, rising storage costs, and insufficient computing supply. But this phase won't last forever.

Especially with Chinese companies aggressively expanding production these past two years. Many industry forecasts suggest that by around 2028, computing and storage capacity will significantly increase. But here's an interesting phenomenon: Even if Token prices drop, total enterprise spending might not decrease—because usage will skyrocket.

This is the classic 'Jevons Paradox.' Lower costs lead to higher consumption. Previously, companies might have called AI a few times a day. In the future, every system, employee, and process might have an Agent attached.
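The Jevons effect described here reduces to one multiplication: total spend is price times usage. The prices and volumes below are hypothetical and chosen only to show the direction of the effect.

```python
def monthly_spend(price_per_million: float, tokens_millions: float) -> float:
    """Total spend is unit price times volume."""
    return price_per_million * tokens_millions


def break_even_usage_multiplier(old_price: float, new_price: float) -> float:
    """Usage growth at which a price cut leaves total spend unchanged."""
    return old_price / new_price


before = monthly_spend(price_per_million=10.0, tokens_millions=1_000)
after = monthly_spend(price_per_million=2.0, tokens_millions=20_000)  # price -80%, usage x20
print(before, after)                           # 10000.0 40000.0
print(break_even_usage_multiplier(10.0, 2.0))  # 5.0
```

In this example an 80% price cut is cancelled out once usage grows fivefold. Anything beyond that raises the total bill.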

Total Token consumption could explode even further. So what really matters isn't whether Tokens get cheaper—but whether companies can build their own AI financial models. For example, only burning premium models for the most critical problems.

Let me mention one more trend that's been extremely hot lately: Token resale platforms. Many domestic companies desperately need access to overseas top-tier models like Claude and GPT. But account setup, payment, and real-name verification are all barriers. So a flood of 'resale services' has emerged. Essentially, they profit from the access barrier.

A $100 overseas API quota might sell for double domestically—and people still buy it because they can't access it themselves. This market is profitable now but extremely risky. These resellers face: Overseas platform account bans, data risks, compliance issues, and domestic regulatory problems. Many resale platforms might disappear overnight.

That's why I believe the truly valuable long-term plays aren't simple 'account reselling' but platforms like OpenRouter. They're not just reselling—they're doing model scheduling, optimization, and acceleration. Think of them as the 'cloud operating system' for the AI era.

The entire AI industry is now entering a critical inflection point. Over the past year, the competition was about 'who burns Tokens faster.' Next, it will be about 'who manages Tokens better.' And that might mark the true start of commercialization.

- 03 -

Rising Token Costs Actually Saved AI

Li Di, Founder of Tomorrow's Journey

Tomorrow's Journey focuses on multi-agent systems and has secured two rounds of financing

Insufficient Tokens are an inevitable transitional challenge.

Whether for coding or today's popular FDE (Forward Deployed Engineer) roles, AI in enterprises primarily handles two things: first, organizational collaboration; second, point productivity. Many current issues revolve around these two areas.

Let's start with point productivity. Take programming—AI is already highly effective at the repository level. If you strictly limit tasks to this scope, ROI often makes sense and can even be extremely cost-effective.

You'll find AI already proves more efficient than traditional methods for simple and medium-complexity tasks. But as complexity increases, Token consumption grows exponentially.

The second, bigger issue is collaboration.

Many people approach AI like clients: 'I want A, B, C, and preferably D too.' But they haven't truly clarified their needs. Since AI's collaborative capabilities are still immature, such vague requests lead to massive rework and wasted efforts—all of which burn Tokens.

The third problem lies in the models' technical approaches.

Take ultra-long contexts. Many models now support million-token contexts. Technologically, this is progress, but from a cost perspective, it's enormous waste. The entire model layer, including the harness layer, desperately needs optimization.

Another reason is that many people misjudged how quickly inference costs would decline.

In 2023, many believed model inference costs would plummet rapidly. But reality proved otherwise. As technology advances, newer and more expensive models keep absorbing the savings achieved on older ones.

I believe AI will eventually reach the point where what you put in is less than what you get out, but we're still in transition. This direction is already very close to true commercialization.

Today's AI-using enterprises fall into two categories.

The first category is traditional enterprises. Their AI adoption follows a 'cost-saving and efficiency-improving' logic. They already have mature business processes and organizational structures—now they're inserting AI to optimize existing workflows.

For example, a consulting firm might have needed many data analysts to organize client data. Now AI can do it faster and cheaper—that's cost-saving.

Another example: Previously, a consulting team could serve at most 50 clients. If clients doubled, the team had to double too. But with AI, client expansion doesn't necessarily require proportional team growth—that's efficiency improvement.

This is how most companies use AI today.

But the second category is what I call AI Native super-organizations. From day one, they don't embed AI into old processes—they redesign entire organizational structures and business logic according to AI's characteristics.

Take traditional media as an example. The old logic was: Continuously collect information, analyze trends, combine with the outlet's aesthetic and viewpoint, then produce content.

AI Native organizations might operate differently. Instead of 'periodic data collection,' they use Agents for continuous updates—perhaps scanning industries every 3-5 minutes. Insights shift from 'intermittent' to 'continuous.'

This change is massive. It spawns new product forms, user experiences, even business models.

Now consider organizational structure.

Traditional enterprises follow classic HR/finance/operations structures with centralized departments. AI Native organizations might abandon this entirely.

For example, HR functions could be decentralized to business teams. Hiring teams generate their own JDs, screen candidates, conduct initial interviews—with AI handling intermediate processes. You'll find this is no longer the old centralized structure.

Thus, AI Native companies aren't about cost-saving and efficiency—they're creating new species.

Many companies are desperately transforming into AI Native organizations, but employees find model products still have massive issues in practice, leading to intentional or unintentional waste. While I lack precise statistics, I believe this waste accounts for at least 30% of usage.

When we build multi-agent systems, we prefer serving AI Native organizations.

We don't want to force new technologies into bloated organizational structures that still work but are extremely outdated. We'd rather collaborate with companies willing to refactor their business logic.

Take a content platform that initially rejected AI-generated content. Today, it faces a dilemma: Embrace AI and let AI creators flood in, disrupting the ecosystem; or reject AI and risk obsolescence.

I often say this resembles the difference between 'suicide' and 'being killed by others.' Not embracing AI might be suicide; embracing it risks being 'killed' by AI creators.

But overall, 'being killed by others' might be better. At least you're actively changing and have a chance to survive this tech cycle.

Will companies abandon AI because Tokens are too expensive?

My answer: Yes.

Because Token costs appear directly on monthly financial statements. If there's no effect this month, they'll slam the brakes next month.

This is actually a positive signal. Such pressure will force the industry into truly fine-grained management.

For example: Using smaller-parameter models to achieve near-large-model results; shifting more inference to edge devices instead of clouds; not calling top models for every task; implementing more complex, granular task allocation at the Harness layer.
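"Not calling top models for every task" is often implemented as a cascade: try a cheap model first and escalate only when it is unsure. The sketch below is one possible shape for this idea. The stub models, the confidence score, the threshold, and the per-call costs are all hypothetical.

```python
from typing import Callable, Tuple

# A model takes a task and returns (answer, confidence between 0 and 1).
Model = Callable[[str], Tuple[str, float]]


def cascade(task: str, small: Model, large: Model, threshold: float = 0.8) -> Tuple[str, str]:
    """Answer with the small model unless its confidence falls below the threshold."""
    answer, confidence = small(task)
    if confidence >= threshold:
        return answer, "small"
    answer, _ = large(task)
    return answer, "large"


def expected_cost(small_cost: float, large_cost: float, escalation_rate: float) -> float:
    """Every task pays for the small model; only escalated tasks also pay for the large one."""
    return small_cost + escalation_rate * large_cost


# Stub models for illustration: the small one is only confident on short tasks.
def small_model(task: str) -> Tuple[str, float]:
    return f"small:{task}", 0.9 if len(task) < 20 else 0.3


def large_model(task: str) -> Tuple[str, float]:
    return f"large:{task}", 0.95


print(cascade("rename a variable", small_model, large_model)[1])                # small
print(cascade("redesign the billing service schema", small_model, large_model)[1])  # large
print(round(expected_cost(0.002, 0.05, 0.2), 3))  # 0.012
```

With these hypothetical prices, escalating 20% of tasks costs about 0.012 per task on average, against 0.05 for always calling the large model. The saving disappears if nearly every task escalates, so the escalation rate is the number to monitor.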

All these will be driven by today's cost pressures.

In 2023, many investors told me: 'All businesses should use the best models.'

I immediately said that wasn't scientific.

For model companies, I've always believed the future lies not in infinitely pursuing larger models, but in miniaturization—whether small models can also achieve emergent capabilities.

Enterprises, meanwhile, are becoming increasingly pragmatic. Many are only now feeling Token costs not because waste just started, but because model companies had been subsidizing these costs before.

Today, companies shouldn't ask 'Should I chase AI hype now?' but rather: 'How should I prepare for future AI Native organizations?'

As for opportunities, I see two critical directions.

The first is swarm intelligence.

Its greatest appeal is that it doesn't rely on how smart a single agent is—but on the emergent wisdom from multiple agents collaborating.

It naturally offers two advantages: Greater robustness and higher ROI.
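The robustness claim can be illustrated with a standard majority-vote calculation: if several agents answer independently, the majority is right more often than any single agent. Independence is a strong assumption here, since agents built on the same base model tend to make correlated mistakes. The 70% accuracy figure is hypothetical.

```python
from math import comb


def majority_correct(n: int, p: float) -> float:
    """Probability that a strict majority of n independent agents is right.

    Assumes n is odd and each agent is right with probability p.
    """
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


for n in (1, 3, 5):
    print(n, round(majority_correct(n, 0.7), 4))
# 1 0.7
# 3 0.784
# 5 0.8369
```

The gain is not free: five agents cost roughly five calls. Whether the ROI improves depends on whether several cheap agents can replace one expensive call.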

Human society itself is a swarm intelligence structure.

Now even OpenAI is investing in this direction. Domestic players like Kimi and MiniMax are already developing related products. When we discussed swarm intelligence last year, many found it abstract—but I believe industry consensus is now forming.

The second direction is edge computing.

In the internet and mobile internet eras, cloud services were optimal due to declining marginal costs.

But AI is different—inference costs can't be ignored. Thus, a 'cloud + edge' rebalancing will occur, shifting increasingly toward edge devices.

This article does not constitute any investment advice.
