After Launching the First Trillion-Parameter Model, China’s Computing Power Surmounts Initial Challenges

04/30/2026

For those born in the 1980s and 1990s, "gaming cartridges" are a shared memory.

Three decades ago, gamers enjoyed classics like Contra and Super Mario via Nintendo-produced cartridges. Authentic cartridges, costing several hundred yuan each, were considered luxury items during that era. Over time, an unyielding principle emerged in the gaming industry: whoever controlled the hardware determined where the games could be played.

The landscape of the industry was later revolutionized by the rise of the PC gaming market—an ecosystem that bypassed gaming cartridges entirely and operated under a set of rules distinct from Nintendo’s. Sometimes, a new path emerges to replace an old one.

This history led those born in the 1980s and 1990s to realize that the gaming world is fundamentally digital, and no single entity could indefinitely monopolize an industry’s future through the scarcity of physical media.

Three decades later, the same generation—now in their 30s—is witnessing a familiar scenario. This time, however, the bottleneck isn’t gaming cartridges but computing power. The choice of chips and frameworks for running large models is similarly dictated by those who control computing resources.

First-tier tech giants like OpenAI, Google, and Anthropic, the companies behind GPT-4, Gemini, and Claude, command top-tier computing cards. As export restrictions on high-end chips to China tighten, "GPU fetishism" has become a notable industry phenomenon: a company's AI capability is often gauged by how many cards it has.

On April 24, DeepSeek V4 was released, while Meituan opened beta testing for LongCat-2.0-Preview, adding two Chinese contenders to the "trillion-parameter club." Both models boast over one trillion parameters and support context windows of 1M tokens.

Notably, DeepSeek V4 successfully transitioned from NVIDIA’s CUDA ecosystem to domestic platforms like Huawei’s Ascend. Meituan’s LongCat-2.0-Preview was trained entirely on domestic computing clusters, utilizing 50,000 to 60,000 computing cards—making it the only trillion-parameter model trained on domestic hardware to date.

The simultaneous launch of these two domestic trillion-parameter models signals a shared goal: establishing a self-sustaining ecosystem independent of external computing resources.

When the "Breaking Point" of Rising Computing Costs Arrives

Products from companies like OpenAI, Google, and Anthropic have long operated at trillion-parameter scales, but entry into this "trillion-parameter club" implicitly requires alignment with NVIDIA’s ecosystem and GPUs.

These overseas giants sustain massive annual investments in computing power, guided by mature commercial logic. Maintaining computing power is treated as a fixed cost within depreciation cycles, with larger computing capacities yielding better models. Costs are passed on to users through token-based pricing models. When computing prices rise, domestic small-to-medium developers face even higher barriers.

In early April, this "breaking point" was finally reached.

Anthropic quietly revised Claude Enterprise’s pricing model. Previously, enterprise clients paid up to $200 per user per month for unlimited token usage—akin to an all-you-can-eat buffet. Now, a $20 monthly base fee applies, with additional charges based on actual computing consumption.

A co-founder of software licensing negotiation firm Redress Compliance put it bluntly: for heavy users previously covered by the $200 flat fee, costs under the new model could double or even triple.

A global surge in computing costs is underway.

Public company financials reveal shrinking flexibility amid rising computing costs and usage demands. According to Zhipu’s earnings report, API call prices surged 83% in Q1 2026, while usage volume grew five times faster than price hikes.

Price and demand pressures are stifling domestic AI development. Relying on external computing supply chains means domestic AI firms must chase technical benchmarks while passively absorbing upstream price adjustments. In the AI Agent era, global computing growth lags behind soaring demand, necessitating a return to model innovation itself.

As rising computing costs render token price wars unsustainable, a new consensus emerges: industry competition shifts from computing price to computing-generated value.

In late April, domestic trillion-parameter models DeepSeek V4 and Meituan LongCat-2.0-Preview debuted. Meituan offered 10 million free daily tokens during LongCat’s beta, while DeepSeek slashed large model usage costs to unprecedented lows.

Take DeepSeek V4’s two variants: V4 Flash input caching costs as little as ¥0.02 per million tokens, while the full-performance V4 Pro costs just ¥0.025 per million tokens.
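At these list prices, input-token spend is easy to estimate. A minimal sketch using only the two published rates (the daily token volumes below are illustrative assumptions, not figures from the release):

```python
# Estimate monthly input-token cost at the published per-million-token rates.
# Published prices: V4 Flash cached input ¥0.02/M tokens, V4 Pro input ¥0.025/M tokens.
# The workload numbers passed in below are hypothetical.

PRICE_PER_M = {"v4_flash_cached": 0.02, "v4_pro": 0.025}  # yuan per 1M input tokens

def monthly_input_cost(tokens_per_day: float, variant: str, days: int = 30) -> float:
    """Input-token spend in yuan for a steady daily token volume."""
    return tokens_per_day * days / 1_000_000 * PRICE_PER_M[variant]

# Hypothetical workload: 50M cached input tokens/day on V4 Flash for a month.
print(round(monthly_input_cost(50_000_000, "v4_flash_cached"), 2))  # 30.0 yuan
```

Even a fairly heavy 50-million-token daily workload lands at tens of yuan per month on the cached tier, which is the scale of cost collapse the release is pointing at.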

The high-efficiency frameworks and model designs drew positive feedback soon after launch.

One developer calculated over 70% cost savings using real-world data. Another user noted on social media that while DeepSeek previously generated four to five chapters of novel content, post-update output expanded to nearly twenty chapters, with improved accuracy in understanding user habits and prompts, yielding far superior quality.

Bidirectional Synergy Between Capital and Data

Architectural innovation marks the first step for domestic trillion-parameter models pursuing efficiency.

If model knowledge is likened to a library, traditional inference required searching all volumes at once. Under Meituan LongCat-2.0-Preview’s MoE (Mixture of Experts) architecture, models precisely locate relevant "bookshelves" and mobilize only the most pertinent "experts." This expands model capacity without linearly increasing computational costs per task, unlocking efficiency gains.
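The "bookshelf" analogy maps onto top-k gating: a router scores every expert for each token, and only the few highest-scoring experts actually run. A toy sketch of that routing step follows; the expert count, k, and dimensions are illustrative, and this shows the generic MoE pattern, not LongCat's actual implementation:

```python
import numpy as np

def topk_moe_forward(x, gate_w, experts, k=2):
    """Route one token through the top-k of n experts.

    x:       (d,) token representation
    gate_w:  (n, d) router weights, one row per expert
    experts: list of n callables, each mapping (d,) -> (d,)
    """
    scores = gate_w @ x                    # router score per expert
    top = np.argsort(scores)[-k:]          # indices of the k best-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()               # softmax over the selected experts only
    # Only k of n expert networks execute: capacity grows with n,
    # per-token compute grows only with k.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n = 8, 16
experts = [lambda x, W=rng.standard_normal((d, d)) / d: W @ x for _ in range(n)]
out = topk_moe_forward(rng.standard_normal(d), rng.standard_normal((n, d)), experts)
print(out.shape)  # (8,)
```

With n=16 experts and k=2, each token pays for roughly an eighth of the expert compute that a dense model of the same total capacity would, which is the decoupling the analogy describes.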

However, architectural optimization alone solves only the "cost-saving" problem. The difficulty of acquiring high-end chips remains the most tangible external constraint for domestic large models. DeepSeek-V4 Pro’s release schedule, for instance, was once delayed by high-end computing supply shortages.

For years, training top-tier large models defaulted to NVIDIA GPUs and CUDA ecosystems. Domestic AI firms faced limited choices: accept queuing and external control over supply, pricing, and schedules while exploring mid-scale models and Agent engineering; or build alternative infrastructure using domestic chips for end-to-end training and inference.

These challenges boil down to two factors: speed and infrastructure.

"Speed" refers to the direct constraints of memory capacity and bandwidth. Compared to mid-scale models, trillion-parameter models demand orders of magnitude more parallel compute and memory during training.

"Infrastructure" denotes the software ecosystem for sustained training on clusters of tens of thousands of cards. Familiar frameworks like PyTorch, along with their core operators and parallelism tools, are built on the CUDA-dominated ecosystem. Training on domestic chip clusters requires engineers to rewrite and optimize core operators for each chip's characteristics while managing the added engineering complexity.

Take DeepSeek V4: it completed migration from NVIDIA’s CUDA ecosystem to domestic platforms like Huawei’s Ascend. Challenges included not just switching from CUDA’s instruction set to CANN but also precision alignment, communication mechanism reconstruction, and parallel strategy rewrites. In essence, DeepSeek created an industrial closed loop for domestic computing power at the codebase and training framework levels.

Meituan LongCat-2.0-Preview went further by training entirely on domestic computing power.

We learned that Meituan insisted on domestic ten-thousand-card clusters, making LongCat-2.0-Preview the largest-scale training run completed on domestic computing resources to date. Training a massive model on still-maturing infrastructure posed significant engineering challenges at the outset.

Meanwhile, the team tailored training frameworks and model architectures to domestic hardware characteristics. Like optimizing fuel and gear ratios for a new engine, Meituan enhanced computational performance on domestic computing power.

This trial-and-error engineering capability not only produced LongCat-2.0-Preview but also, through Meituan’s open-source approach, provided real-world large-scale training samples and engineering feedback to mature the domestic chip ecosystem.

DeepSeek V4 demonstrated that domestic platforms can support migration of cutting-edge models, while Meituan proved domestic computing power can handle end-to-end training and inference for trillion-parameter models.

Notably, Meituan has invested in at least 14 domestic semiconductor firms, including Moore Threads and Maxel Technology, covering multiple niche market leaders.

Computing Power Should Not Be a "Barrier"

Recently, innovation across multiple paths continues unabated. Beyond the trillion-parameter race, some pursue breakthroughs in multimodal fusion, others refine lightweight edge deployments, and some focus on world models built from real-world datasets.

With domestic computing power increasingly validated, China’s AI industry no longer obsesses over replicating overseas paths. Players now believe that when computing autonomy, architectural innovation, and application scenarios form a closed loop, domestic AI can chart its own course.

Meituan, which launched LongCat-2.0-Preview, stands out due to its unique business foundations.

Wang Xing articulated a clear vision: building an AI foundation for the physical world. Local services like food delivery and group buying provide real-world data feedback, enhancing consumer experiences while improving merchant efficiency.

These scenarios offer long-term, continuous, real-world testing grounds for domestic models and computing power. For example, chips must run reinforcement learning and real-time inference under the space and power constraints of specialized deployment scenarios.

More fundamentally, chips, frameworks, and even training/inference processes represent China’s unique AI building blocks. What transforms these components into a thriving ecosystem is cross-industry AI adoption.

Every line of code DeepSeek rewrote for the CANN framework, every "map" LongCat open-sourced, every domestic GPU tape-out by Moore Threads or Maxel, and every defect detected by Huawei’s Ascend chips on factory lines—all contribute to this ecosystem. Some rewrite framework operators, others optimize hardware tape-outs, and still others debug data on the front lines.

Without centralized coordination, their collective efforts form a complete picture. Though this puzzle may take years to finish, its completion will free China’s AI industry from the "computing barrier."
