04/30 2026

GPT-5.5 Competes with V4 on the Same Day: Coding Becomes AI's Sole Ballast
Today, OpenAI releases GPT-5.5, while the DeepSeek-V4 preview debuts as open source.
OpenAI announces GPT-5.5, starting with the official statement: 'Our most intelligent model yet.' Among all capabilities, OpenAI highlights Agentic Coding: achieving 82.7% on Terminal-Bench 2.0, which tests complex command-line workflows, and 58.6% on SWE-Bench Pro, which evaluates real-world GitHub issue-solving abilities.
On the same day, DeepSeek's first highlighted capability is also Agent and Coding. According to public information, DeepSeek-V4-Pro has become the Agentic Coding model used internally by company employees, with special adaptations and optimizations for mainstream Agent products like Claude Code and OpenClaw, showing improved performance in code tasks and document generation.
Behind this coincidence, the AI industry has, in one year, narrowed its path from 'doing everything' to 'focusing on Coding.' Most analyses of this competition stop at superficial explanations like 'the programming market is large' or 'developers have strong willingness to pay,' but this fails to explain why OpenAI and DeepSeek chose Agentic Coding to define their flagship products on the same day.
Technical advantages established in the Coding track automatically translate into advantages across the entire Agent ecosystem. This technical premise also explains why contradictions at the business model level have erupted at this juncture. The Coding Plan was initially designed for the usage intensity of the Chatbot era, but the invocation patterns of the Agent era have fundamentally invalidated this pricing logic.
On March 23, MiniMax led the way by announcing the upgrade of its Coding Plan to a Token Plan. Subsequently, Alibaba Cloud removed the Coding Plan entrance from its BaiLian platform; Zhipu's unlimited weekly old packages ceased renewals; GitHub suspended new user registrations for its Copilot Pro series and removed Claude Opus from Pro.
The synchronized nature of these moves stems from the same structural contradiction triggered by Agent usage patterns: fixed monthly fees encountering unlimited computational consumption.
The shift from Coding Plan to Token Plan, superficially a transition from subsidies for user acquisition to usage-based pricing, in essence marks the industry's shift from 'burning money for market entry' to 'establishing sustainable business models.' For cloud providers, this is a return to their most familiar business; for the industry as a whole, this round of AI Coding competition has completed its first shakeout at the business model level.
Why Coding Became the Commanding Height of the Agent Era
Video generation was once seen as AI's most imaginative application direction, but computational power ultimately failed to pay for imagination. In March, OpenAI announced the shutdown of Sora, terminating its $1 billion strategic partnership with Disney. During the same period, Google co-founder Sergey Brin urgently assembled an internal strike force specifically targeting AI Coding, demanding the team 'pivot decisively.'
According to The Information, this team's roster even included DeepMind's CTO, with a single goal: to reclaim the high ground of AI Coding. On April 21, Musk's SpaceX announced the acquisition of Cursor for $60 billion.
The logic of value creation in the internet era revolved around traffic, conversion rates, and ARPU, culminating in advertising fees or subscription revenues, with a ceiling set by total user time and ad budgets. The Agent era follows a starkly different logic: task value, completion rates, and take rates, culminating in substituted labor costs, with a ceiling set by the global total of white-collar wages. The gap between these two logics directly drives resources toward Coding.
Coding is one of the few application scenarios that simultaneously satisfy 'high frequency' and 'high complexity.' The reality for most AI products is that users may find them novel once but won't use them daily—scene frequency determines the upper limit of stickiness. Programming is different. Professional coders work eight hours daily in IDEs, with debugging, refactoring, documentation, and code reviews all representing potential AI intervention points, resulting in inherently high invocation frequencies.
Moreover, the value of code can be precisely measured. Whether a code segment runs successfully or not, whether a function is implemented or not—these are binary outcomes with no 'almost' ambiguity. This means developers are willing to pay far higher prices for AI programming tools than for other AI products because they replace quantifiable labor costs, with ROI calculations being direct and transparent for enterprises.
As Zhu Guangxiang, general manager of Baidu's Miaoda product, noted last year, the value of Chatbots lies in answering and communication, while Coding directly generates final applications and solutions, relating to all aspects of research, production, supply, sales, and service. 'It's about productivity, creating new demand value and space, so the potential is greater.'
Anthropic, the instigator of this 'Coding craze,' transformed this judgment from theory into a capital-priced reality. Claude Code officially launched in May last year and reached $2.5 billion in ARR by February 2026, growing faster than Salesforce and Slack in their early stages and surpassing Cursor's revenue scale achieved over two years in less than one year.
SemiAnalysis estimates that approximately 4% of public code commits on GitHub are now completed by Claude Code; at this trend, Claude Code's share of GitHub's daily public commits could exceed 20% by the end of 2026.
More persuasive is the company-level comparison: by the end of 2025, Anthropic's annual revenue was $9 billion, while OpenAI's was $21.4 billion—more than double. However, just four months later, Anthropic's ARR surged to $30 billion, surpassing OpenAI's $25 billion ARR disclosed in February.
Domestically, this recognition spread with a noticeable time lag. A group of large model startups pivoted earlier and more nimbly than big tech firms. Two months after Claude Code's birth, Kimi K2 launched as open-source, positioning Coding plus Agent as the model's main axis, with Zhipu following suit.
By early 2026, the first-mover advantages of these early adopters began to manifest. Zhipu has raised prices three consecutive times since releasing GLM-5, yet demand still outstrips supply. CEO Zhang Peng stated during earnings calls that invocation volumes have grown 400%. Less than a month after the release of Moonshot AI's K2.5 large model, cumulative revenues exceeded the total for all of 2025.
Big tech's pivot came later but with greater magnitude.
Guo Daya is a top talent in code intelligence and large model reasoning. During his time at DeepSeek, he deeply participated in research for models like V3, R1, Coder, and Math, proposing the GRPO algorithm with his team in DeepSeek-Math, later applied to DeepSeek R1's training. One reason for his departure from DeepSeek was the relatively low priority of Agents internally at the time, whereas he strongly favored this direction.
With Guo on board, Seed is initiating organizational integration around Agents and Coding. The move is not just a public strategic commitment to the Coding plus Agent direction; it also signals, through its talent structure, ByteDance's judgment about the next dimension of competition.
The Inevitability of Coding Plan's Collapse
The collapse of the Coding Plan reflects contradictions embedded in the business model from its inception, forced to the surface once Agents altered consumption structures.
The subscription model's foundational assumption is that a platform's true costs are far lower than its listed prices, as most users pay but don't fully utilize services—light users' subscription revenues cover heavy users' service costs, maintaining controllable gross margins overall. This logic held during the SaaS era because software's marginal delivery costs approached zero; adding users didn't significantly increase costs.
The Coding Plan continued the SaaS pricing logic but applied it to a scenario with fundamentally different underlying economics. When usage patterns remained at the 'code completion' stage, this contradiction could be ignored. Traditional code completion involves single requests: users input a few characters, and the model returns a completion segment, with controllable Token consumption.
The Agent model is entirely different. A complex task involves planning, decomposition, multi-step execution, parallel subtasks, result verification, and error retries. Strung together, these steps consume dozens or even hundreds of times more Tokens than traditional completion. GitHub put it bluntly in its official blog: long-running, parallelized Agent sessions far exceed the resource limits supported by the original plan architecture.
Additionally, the Coding Plan had an underestimated cost issue: the integration of Agent frameworks like OpenClaw systematically disrupted cloud-side cache hits. In normal programming scenarios, context is highly coherent, with cache hit rates typically reaching 85-90% or higher; many Claude Code users even maintained stable 90%+ hit rates. Cache hits usually cost one-tenth of normal input prices, making actual computational costs far lower than estimates based on full-price inputs.
OpenClaw-like frameworks operate differently. Their request prefixes vary continuously due to version numbers, build times, and A/B testing variables, making them highly unstable and drastically reducing cache hit rates. Consequently, while all users pay the same fixed monthly fee, the Coding Plan's actual costs vary wildly depending on the type of integrated framework.
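The cost effect of cache misses is easy to quantify. The sketch below uses only the ratio stated above (cached input billed at roughly one-tenth of the full input price); the 30% hit rate for a prefix-churning Agent framework is a hypothetical figure for illustration, not measured data.

```python
# Illustrative sketch (not any vendor's real price list): how the cache hit
# rate changes the blended per-token input cost when cached input is billed
# at one-tenth the normal input price, as described above.

FULL_PRICE = 1.0    # normalized price per non-cached input token
CACHED_PRICE = 0.1  # cache hits cost ~1/10 of the full input price

def effective_cost(hit_rate: float) -> float:
    """Blended per-token input cost for a given cache hit rate."""
    return hit_rate * CACHED_PRICE + (1 - hit_rate) * FULL_PRICE

coherent_ide = effective_cost(0.90)    # coherent IDE session (source figure)
unstable_agent = effective_cost(0.30)  # prefix churn (hypothetical hit rate)

print(f"90% hits: {coherent_ide:.2f}x full price")
print(f"30% hits: {unstable_agent:.2f}x full price")
print(f"cost multiplier: {unstable_agent / coherent_ide:.1f}x")
```

Under these assumptions, dropping from a 90% to a 30% hit rate nearly quadruples the platform's real input cost per token, even though the user's monthly fee is unchanged.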
Zhipu's response trajectory clearly illustrates how this contradiction escalated from controllable to uncontrollable. The GLM Coding Plan's unlimited weekly old packages ceased automatic renewals on April 30, with platform announcements admitting: 'With sustained usage growth, the original supply model for old packages can no longer support long-term stable services.' Affected early subscribers received two months of new package benefits as compensation.
This represented a cost-pressure-triggered passive exit rather than active product iteration. Meanwhile, Zhipu restricted Coding Plan usage scenarios to AI coding and IDE tools, explicitly excluding general Agent scenarios like OpenClaw—a restriction that itself highlights the problem's core.
The speed of this collective tightening exceeded most predictions. An industry insider described it as: 'Within a quarter, from subsidized user acquisition to collective tightening—much faster than I expected.'

OpenAI adopted a different strategy in this competition. Sam Altman announced in early April that Codex reached 3 million weekly active users, then reset all package usage limits, promising to reset again for every additional 1 million users. Community users reported experiencing four quota resets within 10 days. Plus users enjoyed 10x usage during promotional periods, Pro users 2x, but promotions ended May 31, with future strategy adjustments still unknown.
Codex lead Tibo stated on X that OpenAI has sufficient computational power and advanced models to support Codex operations.
ByteDance's Volcano Engine Coding Plan has maintained relatively stable operations among similar products. However, this exception has context: ByteDance's self-owned computational infrastructure operates on a different scale than that of startups like Zhipu.
For Volcano Engine, the Coding Plan simultaneously serves strategic functions of locking in developer ecosystems and acquiring training data, with short-term cost pressures offset by the longer-term value of data assets. However, this represents a special circumstance for large-scale computational infrastructure holders, not a broadly replicable industry path.
The Endgame Is Paying for Results
Replacing the Coding Plan with the Token Plan is merely the halftime whistle in this competition.
The biggest contradiction of the Coding Plan is its fixed revenue versus fluctuating costs. Once model capabilities iterate or user habits change, costs can soar while revenue remains unchanged. The Token Plan is the best way to eliminate this contradiction, where the platform's gross margin is determined by the difference between the Token unit price and inference costs, both of which can be precisely controlled and predicted.
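The asymmetry described here can be sketched in a few lines of arithmetic. All figures below are hypothetical placeholders, not any vendor's actual prices or costs; the point is only the structural contrast between a flat fee and per-Token billing.

```python
# Hedged sketch with made-up numbers: why a fixed monthly fee breaks down
# as per-user Token consumption grows, while per-Token pricing keeps the
# gross margin constant regardless of volume.

MONTHLY_FEE = 20.0    # flat subscription price (hypothetical)
TOKEN_PRICE = 2.0     # revenue per million Tokens (hypothetical)
INFERENCE_COST = 1.2  # inference cost per million Tokens (hypothetical)

def subscription_margin(tokens_m: float) -> float:
    """Gross margin of a flat fee at a given usage (millions of Tokens)."""
    return (MONTHLY_FEE - tokens_m * INFERENCE_COST) / MONTHLY_FEE

def token_plan_margin() -> float:
    """Gross margin under usage-based pricing; independent of volume."""
    return (TOKEN_PRICE - INFERENCE_COST) / TOKEN_PRICE

# Chatbot-era vs Agent-era usage levels under both pricing schemes:
for usage in (5, 20, 100):
    print(f"{usage}M Tokens: subscription margin "
          f"{subscription_margin(usage):+.0%}, "
          f"token plan margin {token_plan_margin():+.0%}")
```

With these placeholder numbers, the flat fee is comfortably profitable at chatbot-era usage, breaks even somewhere in between, and is deeply underwater at Agent-era consumption, while the per-Token margin never moves.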
From a commercialization perspective, the Coding Plan is essentially a subsidy strategy, offering prices below cost to foster user habits and accumulate training data. Switching to the Token Plan at this juncture signifies that the industry believes the phase of subsidizing for market share has ended, and there is sufficient room between users' willingness to pay and actual usage value to support a sustainable business model.
Luo Fuli played the role of an industry pricing 'whistleblower' in this process, warning vendors not to blindly compete on price by selling Tokens at rock-bottom rates and opening the door to third parties before figuring out how to price Coding without losing money. This may seem attractive to users, 'but it's a trap—the very one Anthropic just escaped.'
According to Tencent Technology, Xiaomi's MiMo large model's Token Plan exemplifies this trend. During a two-week free promotion, MiMo-V2-Pro on OpenRouter saw weekly Token consumption exceed 4 trillion, with programming market share once surpassing 30%. However, after the free period ended, weekly usage declined from its peak, confirming that conversion rates from free to paid remain a common challenge for all large model companies.
On the launch day of MiMo-V2-Pro, Lei Jun personally announced subscription plans priced at 659 RMB/month for the Max tier and 100 USD/month internationally, directly targeting Anthropic's Claude Max 5x package. Luo Fuli later explained the rationale in public statements: the Token Plan supports third-party framework integration but bills based on Token quotas, ensuring users pay only for what they use, avoiding the 'wool-pulling' cost reversals seen in subscription models.
For cloud providers led by Alibaba Cloud, this shift holds another layer of significance: returning to their most familiar business. Maintaining a subscription service with sustained economic pressure requires constant operational efforts to offset structural losses—a non-core area for cloud vendors. In contrast, Tencent Cloud and Alibaba Cloud have sold computing, storage, and CDN traffic packages for over a decade, boasting complete infrastructure for metering, billing, prepaid/postpaid settlement, and usage quota controls. Simply replacing 'CPU core-hours' or 'GPU hours' with 'Tokens' allows seamless integration of the entire system.
The Token Plan also aligns incentives for innovation more rationally. Under the Coding Plan, launching stronger models increases inference costs without boosting subscription revenue, effectively penalizing technological progress through pricing mechanisms. With the Token Plan, stronger models encourage higher Token consumption, generating more revenue and creating a virtuous cycle: better models drive higher usage, which produces more revenue, which funds further R&D investment. This resolves a fundamental incentive mismatch left unresolved by the Coding Plan.
Current public discourse around the Token Plan reflects confusion about the transition itself, but this is inherently a question of timing rather than direction. Cursor, an early player in Coding Agents, moved to usage-based billing about a year earlier than most Chinese vendors: last year it transitioned from per-request to volume-based pricing, and this year it introduced an Ultra tier at 200 USD/month. This validates that as Agent usage intensity rises, pricing model evolution becomes inevitable.
For the Chinese market, OpenClaw's local explosion compressed this timeline dramatically. What might have been a two-year industry transition now occurs within a few quarters. The cost of this compression is that many vendors lack time to design transition plans, forcing passive responses that disrupt legacy user experiences—as seen in Alibaba's and Zhipu's package migrations, which included user compensation schemes.
However, from a longer-term perspective, The New Stance argues that the Token Plan represents only an intermediate stage in AI Coding competition, not the final state. The future ideal model is paying for results, just as ride-hailing eliminates concern for liters of gasoline consumed—using AI to solve problems shouldn't require tracking Token usage.
The current essence of Token-based billing is pricing 'compute usage rights,' purchasing opportunities for models to 'think' once on behalf of users, without guarantees about depth, quality, or problem resolution. As the first section noted, in AI Coding scenarios, 'results' can be precisely defined: did the code run? Were bugs fixed? Were features implemented? Once these outcomes can be reliably measured, result-based pricing becomes technically feasible.
At that point, 'Token efficiency' will become a formal evaluation metric for model capabilities, as fewer Tokens consumed for equivalent results imply higher gross margins under fixed result pricing. GPT-5.5's release data provides a forward-looking reference here. OpenAI emphasized that GPT-5.5 uses fewer Tokens for equivalent Codex tasks, listing this as a core capability alongside 'higher accuracy.'
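The link between Token efficiency and margin follows directly from fixing the price of the result. The toy numbers below (task price, per-Token cost, Token counts) are all hypothetical; the sketch only shows that once the task price is fixed, the Tokens burned to solve it are the sole variable left in the margin.

```python
# Toy illustration of "Token efficiency" under result-based pricing:
# if the price of a solved task is fixed, the Tokens a model spends to
# solve it directly determine the gross margin. Numbers are hypothetical.

TASK_PRICE = 1.00          # fixed price per solved task (hypothetical)
COST_PER_K_TOKENS = 0.002  # inference cost per 1K Tokens (hypothetical)

def result_margin(tokens_used: int) -> float:
    """Gross margin on one task, given the Tokens spent solving it."""
    cost = tokens_used / 1000 * COST_PER_K_TOKENS
    return (TASK_PRICE - cost) / TASK_PRICE

# An efficient model vs a verbose one solving the same task:
print(f"100K Tokens: {result_margin(100_000):.0%} margin")
print(f"400K Tokens: {result_margin(400_000):.0%} margin")
```

Under these assumptions, a model that solves the same task in a quarter of the Tokens keeps four times as much of the fixed result price as profit, which is why both GPT-5.5 and DeepSeek V4 now advertise efficiency alongside accuracy.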
DeepSeek V4's announcement similarly noted that its new attention mechanism 'significantly reduces computational and memory requirements compared to traditional methods.' Both of the day's strongest models promoted computational efficiency as flagship capabilities, redefining standards for 'better models.'
Future Coding Agent competition will evaluate efficiency and capability as two metrics on the same scorecard.

*The featured image and in-text illustrations are sourced from the internet.