05/15 2026
The ultimate test of AI capabilities lies in their application in the real world.
By Yu Weilin | BlueHole Business
Since early this year, the global tech community has been closely monitoring developments in China’s computing power landscape.
In January, Elon Musk remarked on a podcast that China’s AI computing capabilities “will far outstrip the rest of the world.” In February, OpenAI CEO Sam Altman commented that China’s technological progress in artificial intelligence is “astonishingly rapid.”
The year 2025 marked a period of consolidation on the supply side. Domestic GPU firms such as Moore Threads and MetaX successively entered the capital markets, further solidifying the industrial foundation for domestic large-scale models. In 2026, these advancements are beginning to reach the downstream links of the industrial chain. In late April, several domestic large-scale models unveiled new versions.
On April 20, Moonshot AI introduced the Kimi K2.6 model, which excels at generating long-form code. On April 24, DeepSeek V4 was released, followed by the open beta of Meituan’s LongCat-2.0-Preview. Both models boast total parameter counts exceeding one trillion and support ultra-long contexts of 1M tokens.
Notably, DeepSeek V4 completed its migration and adaptation from the NVIDIA ecosystem to Huawei’s Ascend platform. Meanwhile, Meituan’s LongCat 2.0 is a trillion-parameter large-scale model trained and served entirely on domestic computing power, utilizing 50,000 to 60,000 domestic computing chips.
For an extended period, Chinese AI practitioners have predominantly relied on existing, mature solutions. Now, domestic AI companies are beginning to forge their own paths.
Forging Paths in Uncharted Territory
How does one tackle a formidable challenge?
Science fiction author Arthur C. Clarke responded: “The only way to discover the limits of the possible is to venture beyond them into the impossible.”
The release timeline for DeepSeek V4 underwent multiple adjustments from its initial plan. External speculation suggests that one contributing factor was the need to migrate core code from NVIDIA’s CUDA ecosystem.
After more than a decade of refinement, the CUDA ecosystem has evolved into a robust, fully-equipped development platform. Migrating the code necessitated extensive reconstruction of the underlying framework by the development team.
Ultimately, DeepSeek succeeded. Two days after the release of V4, JPMorgan Chase noted in a report that V4’s successful adaptation to Huawei’s Ascend chips confirmed the viability of domestic computing power for cutting-edge AI inference. Furthermore, DeepSeek significantly reduced inference costs through underlying technological innovations, such as hybrid attention architectures.
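Rough arithmetic illustrates why hybrid attention architectures cut inference cost (the figures below are illustrative, not DeepSeek’s published numbers): full attention scales with the square of context length, while a sliding-window layer scales linearly, so interleaving the two shrinks total attention compute.

```python
# Illustrative sketch, not DeepSeek's actual architecture: comparing
# per-layer attention-score counts for full vs. windowed attention.

def attention_ops(seq_len, window=None):
    """Pairwise score count per layer; window=None means full attention."""
    if window is None:
        return seq_len * seq_len
    return seq_len * min(window, seq_len)

seq = 1_000_000  # a 1M-token context, as in the new models
full = attention_ops(seq)
hybrid = attention_ops(seq, window=4096)  # window size is an assumption
print(full // hybrid)  # full attention computes ~244x more scores per layer
```

A hybrid design keeps a few full-attention layers for global information flow while the windowed layers handle most of the depth, which is where the savings come from.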
DeepSeek achieved cost reduction and efficiency gains through algorithmic ingenuity, completing a challenging migration by rewriting half of the model’s code stack. Concurrently, Meituan’s LongCat-2.0-Preview, which opened for beta testing on the same day, ran directly on domestic computing power.
What are the engineering hurdles for domestic computing power? Let’s examine LongCat-2.0-Preview as a case study.
The first challenge is hardware-related. The memory capacity and bandwidth of domestic hardware foundations differ from those of NVIDIA chips. When training and deploying trillion-parameter models, the Meituan team faced significant engineering challenges, requiring substantial effort to debug parallel strategies and optimize memory usage.
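A back-of-envelope calculation shows why parallel strategies and memory budgets must be re-tuned on cards with different HBM capacities (the parallelism degrees below are hypothetical, not Meituan’s actual configuration):

```python
# Illustrative sketch: weights-only memory per accelerator for a
# trillion-parameter model under tensor (tp) and pipeline (pp)
# parallelism. Activations, gradients, and optimizer states add
# substantially more on top of this.

def per_card_weight_gb(total_params, bytes_per_param, tp, pp):
    """Each card holds 1/(tp*pp) of the weights."""
    shard_params = total_params / (tp * pp)
    return shard_params * bytes_per_param / 1e9

# 1e12 params in bf16 (2 bytes/param), tensor parallel 8, pipeline parallel 16:
gb = per_card_weight_gb(1e12, 2, tp=8, pp=16)
print(gb)  # 15.625 GB of weights per card before anything else is counted
```

If a domestic card offers less memory or bandwidth than its NVIDIA counterpart, the team must raise the parallelism degrees or add techniques like activation recomputation, each of which shifts the compute-communication balance and must be debugged at scale.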
The second challenge is the maturity of the software ecosystem. To ensure precise reproducibility throughout the training process, tailored to the characteristics of domestic chips, the team had to rewrite and optimize core operators and develop self-designed, fully deterministic operators.
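The need for fully deterministic operators comes from a basic property of floating-point arithmetic: addition is not associative, so a reduction whose accumulation order varies between runs (as with atomic adds across threads) can produce slightly different sums. A minimal sketch, not LongCat’s actual operators:

```python
# Illustrative sketch of why deterministic operators matter for
# bit-exact reproducibility of training runs.

import random

def nondeterministic_sum(xs, seed):
    order = list(xs)
    random.Random(seed).shuffle(order)  # stands in for run-to-run thread timing
    total = 0.0
    for x in order:
        total += x
    return total

def deterministic_pairwise_sum(xs):
    """Fixed-order pairwise reduction: same input -> bit-identical output."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return deterministic_pairwise_sum(xs[:mid]) + deterministic_pairwise_sum(xs[mid:])

vals = [1e16, 1.0, -1e16, 1.0] * 100
a = nondeterministic_sum(vals, seed=0)
b = nondeterministic_sum(vals, seed=1)  # different order, often a different sum
c = deterministic_pairwise_sum(vals)    # always bit-identical across runs
```

Rewriting core operators this way lets the team reproduce a training run exactly, which is essential for debugging divergence on new hardware.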
The third challenge is the stability of the 10,000-card cluster. On an ultra-large-scale cluster utilizing 50,000 to 60,000 domestic computing cards, hardware failures are inevitable. To mitigate this, the team constructed a comprehensive fault-tolerance and automatic recovery system.
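At that scale, the guiding assumption is that failures are routine rather than exceptional, so training must checkpoint continuously and resume automatically. A minimal sketch of the checkpoint-and-restart pattern such systems rely on (all names here are illustrative, not Meituan’s actual system):

```python
# Illustrative sketch: atomic checkpointing plus resume-on-restart,
# the core pattern behind fault-tolerance systems for large clusters.

import json, os, tempfile

CKPT = os.path.join(tempfile.gettempdir(), "longcat_demo_ckpt.json")
if os.path.exists(CKPT):
    os.remove(CKPT)  # start the demo from a clean state

def save_checkpoint(step, state):
    # Write to a temp file, then rename: a crash mid-save never
    # leaves a corrupted checkpoint behind.
    tmp = CKPT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, CKPT)

def load_checkpoint():
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            ckpt = json.load(f)
        return ckpt["step"], ckpt["state"]
    return 0, 0.0

def train(total_steps, fail_at=None):
    step, state = load_checkpoint()  # resume instead of restarting from zero
    while step < total_steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated hardware failure")
        state += 1.0  # stand-in for one optimizer step
        step += 1
        save_checkpoint(step, state)
    return step, state

try:
    train(10, fail_at=6)  # a card fails mid-run...
except RuntimeError:
    pass
result = train(10)        # ...the job restarts and resumes from step 6
```

Real systems add failure detection, hot-spare nodes, and collective-communication re-initialization on top of this, but the principle of losing only the work since the last checkpoint is the same.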
Finally, the team made targeted hardware-affinity designs in the training framework and model architecture based on the characteristics of domestic hardware, breaking free of the adaptation limits of generic frameworks and improving computational performance.
DeepSeek’s algorithmic optimizations lowered the barrier to computing power and reduced model costs, while Meituan’s engineering practices demonstrated the feasibility of domestic chips. These explorations have also amassed engineering capabilities and experience for the domestic chip ecosystem.
Liang Wenfeng once remarked, “We didn’t intentionally become a catfish; we just accidentally became one.” Now, the “catfish effect” is evident, and DeepSeek is not alone in this endeavor.
From Isolated Points to Integrated Systems
Tang Daosheng of Tencent Cloud once employed this metaphor: “Large-scale models are engines, and users are drivers.” Users readily notice the engine’s performance, but adept drivers recognize that fuel and chassis are equally crucial.
The development of China’s computing power hinges on the collaborative progress of the entire industrial chain. Core companies in each segment continue to strive for advancement.
On the manufacturing front, public data indicates that China’s chip output is steadily rising. Companies like SMIC and Hua Hong Semiconductor are advancing process technologies, such as multiple patterning, to find a balance at the physical limits.
On the computing power front, products like Huawei’s Ascend 910C demonstrate that massive model training can be achieved by pushing cluster scaling efficiency toward linear speedup.
“He who masters the ecosystem wins the world.” One key reason for NVIDIA CUDA’s formidable moat is its establishment of universal software and hardware compatibility standards.
Industry practitioners are cognizant of this. For instance, Cambricon launched a basic software platform compatible with mainstream frameworks to lower the migration barrier for developers. The open-source system led by the Beijing Academy of Artificial Intelligence (BAAI) has built a unified underlying interface, enabling upper-layer models to run on various domestic chips.
Domestic internet giants are also taking action. Baidu’s dual-track strategy and ByteDance’s multi-billion-dollar investments are seeking better solutions for the computing power foundation.
According to public data, in the past few years, Meituan has invested in at least 21 companies spanning semiconductors/intelligent hardware and general large-scale model fields. These include chip computing power layer companies like Moore Threads and MetaX, visual chip companies like Aixin Yuanzhi, and companies in niche segments like Guangzhou Zhongshan in new materials.
While maintaining long-term technological follow-up, industrial capital is also acting as an investor and co-builder of computing power, gradually forming a positive cycle.
From the Digital Realm to Real-World Applications
“Artificial intelligence is at a pivotal inflection point in its third wave. Large-scale models are propelling it from narrow AI toward artificial general intelligence (AGI). More significantly, they are driving robots from the era of 1.0 specialized robots to the 2.0 era of general-purpose embodied AI,” stated Wang Zhongyuan, Dean of the Beijing Academy of Artificial Intelligence, emphasizing that the key application of AI capabilities lies in the real world.
On one hand, numerous domestic companies are striving to enable large-scale models to “read ten thousand books” in the cloud, enhancing model intelligence and logical reasoning rigor. On the other hand, they are also making large-scale models “travel ten thousand miles.” For example, Baidu’s Wenxin large-scale model has been integrated into the decision-making system for autonomous driving, while Tencent’s Hunyuan large-scale model’s industrial quality inspection solutions have been deployed in multiple assembly line scenarios.
Meituan’s food delivery, in-store, and travel businesses constitute the most intricate task execution network in daily life, encompassing a vast array of real-world scenarios: from the speed of meal preparation in merchant kitchens to delivery routes for riders in heavy rain, and even a user’s late-night craving for “hot pot.”
Wang Xing has explicitly stated the goal of upgrading the Meituan app into an “AI-powered app” first. This means that LongCat’s training objectives extend beyond answering questions like “which restaurant serves the best stir-fried pork” to “finding the restaurant, selecting the best group-buying coupon, and reserving two seats at 7 PM on Friday evening.”
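The shift from answering a question to delivering a task can be pictured as a decomposition into tool calls that each act on the real world. A hypothetical sketch (all tool names are invented for illustration; they are not Meituan’s actual APIs):

```python
# Illustrative sketch of "task delivery" vs. plain Q&A for an
# agentic model. Tool names and parameters are invented.

def answer(question):
    # Q&A: text comes back, the user still does all the work.
    return "Try the place around the corner for stir-fried pork."

def deliver_task(request):
    # Agentic: the request decomposes into concrete, executable steps.
    steps = [
        ("search_restaurants", {"dish": "stir-fried pork", "rank_by": "rating"}),
        ("pick_coupon",        {"strategy": "best_discount"}),
        ("reserve",            {"party_size": 2, "time": "Friday 19:00"}),
    ]
    # A real agent would execute each step and feed results forward;
    # here we return the plan just to show the decomposition.
    return steps

plan = deliver_task("Book the best stir-fried pork place for two, Friday 7 PM")
print(len(plan))  # 3 executable steps instead of one text answer
```

Each step touches live inventory, pricing, and booking systems, which is why task delivery is judged on completion in the real world rather than on the fluency of a reply.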
This underscores the importance of task delivery effectiveness and explains why Meituan emphasizes building an AI foundation for the real world.
From parameter enhancements to computing power optimization, domestic large-scale models are progressing from “usable” to “user-friendly.”
There are no shortcuts on this journey. In the future, as algorithms, computing power, capital, and scenarios continue to interact synergistically, the narrative of China’s AI will transition from “individual breakthroughs” to “systemic evolution.”