When DeepSeekV4 and Meituan LongCat Simultaneously ‘Surpass One Trillion Parameters,’ What Signals Are Being Sent?

05/01/2026

Chinese AI enterprises are beginning to forge their own paths.

Written by | BlueHole Business Yu Weilin

At the start of this year, the overseas tech community has been abuzz with discussions about China's computing power.

In January, Elon Musk stated in a podcast that China 'will far outpace the rest of the world' in AI computing power. In February, OpenAI CEO Sam Altman remarked that China's technological advancements in artificial intelligence are 'astonishingly rapid.' Nvidia CEO Jensen Huang has also publicly stated on multiple occasions that 'restricting China's AI technology will only hasten its independent research and development.'

The year 2025 can be seen as a pivotal year for mobilization on the supply side. Domestic GPU companies such as Moore Threads and Muxi Integrated Circuit (Muxi Corporation) have successively entered the capital market, further solidifying the industrial foundation for domestic large models. In 2026, changes are rippling downstream in the industrial chain. In late April, several domestic large models unveiled new versions.

On April 20, Moonshot AI introduced the Kimi K2.6 model, which excels in long-form code generation. On April 24, DeepSeek V4 was launched. Shortly after, Meituan's LongCat-2.0-Preview opened for testing. Both models boast total parameter scales exceeding one trillion and support ultra-long contexts of 1M tokens.

It's noteworthy that DeepSeek V4 has successfully migrated and adapted from the Nvidia ecosystem to Huawei's Ascend platform. Meanwhile, Meituan's LongCat 2.0 is a trillion-parameter large model entirely trained and inferred on domestic computing power, utilizing 50,000 to 60,000 domestic computing chips.

For a considerable period, the prevailing approach among Chinese AI practitioners has been to leverage existing, mature solutions. Now, however, domestic AI enterprises are beginning to chart their own courses.

Building Roads in the Wilderness

How does one accomplish a formidable task?

Science fiction writer Arthur C. Clarke's response is: 'The only way is to make the impossible itself the starting point for progress.'

The release timeline for DeepSeek V4 underwent multiple adjustments from its initial plan. External speculation suggests that one reason was the necessity to migrate core code away from Nvidia's CUDA.

The CUDA ecosystem, honed over more than a decade, is a robust and well-equipped development platform. In contrast, the domestic computing power ecosystem is still in its nascent stages of development. Migrating the code meant that the development team had to undertake extensive reconstruction of the underlying framework.

Ultimately, DeepSeek succeeded. Two days after the release of V4, JPMorgan Chase highlighted in a report that V4's successful adaptation to Huawei's Ascend chips validated the feasibility of domestic computing power for cutting-edge AI inference. Furthermore, DeepSeek significantly reduced inference costs through underlying technological innovations, such as hybrid attention architectures.

DeepSeek cut costs and boosted efficiency through algorithmic means, completing a challenging migration by rewriting half of the large model's workload. Meanwhile, Meituan's LongCat-2.0-Preview, which opened for testing on the same day, runs directly on domestic computing power.

What are the engineering challenges associated with domestic computing power? Let's take LongCat-2.0-Preview as an example.

The first challenge lies at the physical level. The memory capacity and bandwidth of domestic hardware foundations differ from those of Nvidia chips. When training and deploying trillion-parameter models, the Meituan team encountered significant engineering hurdles, requiring substantial effort to debug parallel strategies and optimize memory usage.
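The scale of that hurdle is easy to see with rough arithmetic. The sketch below is an illustrative back-of-the-envelope estimate; the byte counts and per-card memory figure are generic assumptions, not LongCat's actual configuration:

```python
# Rough memory estimate for a 1-trillion-parameter model.
# All figures are illustrative assumptions, not LongCat's real numbers.
PARAMS = 1_000_000_000_000          # 1T parameters
BYTES_PER_PARAM_BF16 = 2            # BF16 weights only
# Mixed-precision training typically also keeps FP32 master weights
# plus two Adam optimizer moments -- roughly 16 bytes per parameter.
BYTES_PER_PARAM_TRAINING = 16

weights_tb = PARAMS * BYTES_PER_PARAM_BF16 / 1e12
training_tb = PARAMS * BYTES_PER_PARAM_TRAINING / 1e12
print(f"weights alone:  {weights_tb:.0f} TB")    # -> 2 TB
print(f"training state: {training_tb:.0f} TB")   # -> 16 TB

# With, say, 64 GB of memory per accelerator card (an assumption),
# even just the training state must be sharded across hundreds of cards,
# before accounting for activations, parallelism overheads, or redundancy:
CARD_MEM_GB = 64
min_cards = training_tb * 1000 / CARD_MEM_GB
print(f"minimum cards just to hold training state: {min_cards:.0f}")  # -> 250
```

Numbers like these are why parallel-strategy tuning and memory optimization dominate the engineering effort at this scale.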

The second challenge is the maturity of the software ecosystem. To guarantee precise reproducibility throughout training on domestic chips, the team had to rewrite and optimize core operators and develop its own fully deterministic operators, tailored to the characteristics of the hardware.
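Why determinism demands special operators comes down to floating-point arithmetic: parallel reductions may sum values in a different order on each run, and floating-point addition is not associative. A toy Python illustration of the problem (not Meituan's actual operator code):

```python
# Floating-point addition is not associative, so the same values summed
# in different orders (as parallel reductions may do) give different bits.
vals = [1e16, 1.0, -1e16, 1.0]

left_to_right = ((vals[0] + vals[1]) + vals[2]) + vals[3]
pairwise      = (vals[0] + vals[2]) + (vals[1] + vals[3])

print(left_to_right)  # 1.0  (the first 1.0 was absorbed into 1e16)
print(pairwise)       # 2.0

# A deterministic operator pins one reduction order, so every run,
# on every card, reproduces bit-identical results.
def deterministic_sum(xs):
    acc = 0.0
    for x in xs:          # fixed left-to-right order
        acc += x
    return acc

assert deterministic_sum(vals) == deterministic_sum(vals)
```

Real deterministic operators apply the same idea at kernel level: fixing reduction trees and scheduling so training runs are bit-exact and thus debuggable.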

The third challenge is the stability of the 50,000-60,000 card cluster. Hardware failures are inevitable in such a massive cluster of domestic computing cards. To mitigate this, the team constructed a comprehensive fault-tolerance and automatic recovery system.
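The checkpoint-and-auto-resume pattern behind such fault-tolerance systems can be sketched in a few lines. This is a generic, single-process illustration with assumed file names, not Meituan's actual system:

```python
# Minimal sketch of checkpoint-and-auto-recover fault tolerance.
import os
import pickle

CKPT = "train_state.ckpt"  # assumed file name for this illustration

def save_checkpoint(step, state):
    tmp = CKPT + ".tmp"
    with open(tmp, "wb") as f:          # write to a temp file first so a
        pickle.dump((step, state), f)   # crash mid-write can't corrupt it
    os.replace(tmp, CKPT)               # atomic rename

def load_checkpoint():
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            return pickle.load(f)
    return 0, {"loss": None}            # fresh start

def train(total_steps, fail_at=None):
    step, state = load_checkpoint()     # auto-resume after a failure
    while step < total_steps:
        if step == fail_at:
            raise RuntimeError("simulated hardware fault")
        state["loss"] = 1.0 / (step + 1)   # stand-in for a real train step
        step += 1
        save_checkpoint(step, state)
    return step

try:
    train(10, fail_at=5)                # first run dies partway through
except RuntimeError:
    pass
print(train(10))                        # second run resumes and finishes
```

At cluster scale the same loop is wrapped with failure detection, node replacement, and distributed checkpoint sharding, but the resume logic is the core of it.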

Finally, the team conducted affinity designs tailored to the characteristics of domestic hardware in the training framework and model structure, breaking through the adaptation limitations of generic frameworks and enhancing computational performance.

DeepSeek's algorithmic optimizations lowered the barrier to computing power and reduced model prices. Meituan's engineering practices demonstrated the feasibility of domestic chips. These explorations have also accumulated engineering capabilities and experience for the domestic chip ecosystem.

Liang Wenfeng once remarked, 'We didn't intend to become a catfish; we just accidentally became one.' Today, the 'catfish effect' is already apparent, and DeepSeek is not alone in this endeavor.

From Individual Points to Systems

Tencent Cloud's Tang Daosheng once employed this metaphor: 'Large models are engines, and users are drivers.' Users readily notice the engine's performance, but adept drivers recognize that fuel and chassis are equally crucial.

The development of China's computing power hinges on the collaborative progress of the entire industrial chain. Core enterprises in each segment are continuously addressing shortcomings.

On the manufacturing front, public data indicates that China's chip output is steadily increasing, but it follows a 'dumbbell' structure, with mature processes above 28nm dominating and advanced processes at 14nm and below still scarce.

Faced with the absence of EUV lithography machines, companies like SMIC and Hua Hong Semiconductor are advancing process innovations, such as multiple patterning, to find a balance at the physical limits. Multiple reports suggest that SMIC's N+2 process (equivalent to 7nm) has achieved a yield rate exceeding 80%, indicating it has crossed the threshold for commercial mass production.

On the computing power front, domestic chips still lag behind Nvidia in single-card performance. However, products like Huawei's Ascend 910C demonstrate that massive model training can still be achieved by pushing cluster scaling efficiency, i.e., near-linear speedup ratios across many cards, to the extreme.

'He who masters the ecosystem wins the world.' One key reason for Nvidia CUDA's strong moat is its establishment of universal software and hardware compatibility standards.

Industry practitioners are cognizant of this. For instance, Cambrian launched a basic software platform compatible with mainstream frameworks to lower the migration barrier for developers. The open-source system led by the Beijing Academy of Artificial Intelligence (BAAI) has built a unified underlying interface, enabling upper-layer models to run on various domestic chips.

Domestic internet giants are also taking action. Baidu's dual-track strategy and ByteDance's tens of billions in investment are seeking better solutions for the computing power foundation.

According to public data, Meituan has invested in at least 21 companies covering semiconductors/smart hardware and general-purpose large models. These include chip computing power players like Moore Threads and Muxi Integrated Circuit (Muxi Corporation), as well as visual chip companies like Axera Intelligence. They also encompass niche sectors such as new materials, including Guangzhou Zhongshan and Oriental Computing Core.

Alongside long-term technological catch-up, industrial capital is also acting as investor in and co-builder of the computing power base, gradually forming a virtuous cycle.

From the Digital World to Real-World Tasks

'Artificial intelligence is at a critical inflection point in its third wave. Large models are propelling it from weak AI toward artificial general intelligence (AGI). More crucially, they are driving robots from the era of 1.0 specialized robots into the 2.0 era of general-purpose embodied AI,' said Wang Zhongyuan, dean of the Beijing Academy of Artificial Intelligence, highlighting the significant landing point of AI capabilities in the physical world.

On the one hand, numerous domestic vendors are striving to enable large models to 'read ten thousand books' in the cloud, enhancing their intelligence and the rigor of their logical reasoning. On the other hand, they must also let large models 'travel ten thousand miles.' For example, Baidu's ERNIE large model has been integrated into the decision-making system for autonomous driving, and Tencent's Hunyuan large model powers industrial quality-inspection solutions deployed in multiple assembly-line scenarios.

Meituan's food delivery, in-store, hotel, and travel businesses constitute the most complex task execution network in daily life, encompassing a vast array of real-world scenarios: from the speed of meal preparation in merchant kitchens to delivery routes for riders in heavy rain, and even a user's late-night craving for 'hot pot.'

Wang Xing has explicitly proposed upgrading the Meituan App into an 'AI-powered App' first. This means that LongCat's training goal is not just to answer questions like 'which restaurant has the best stir-fried pork' but also to 'find the restaurant, select the best group-buying coupon, and then reserve two seats at 7 PM on Friday evening.'

This underscores the particular importance of effective task delivery, explaining why Meituan emphasizes building an AI foundation for the physical world.

From parameter increases to computing power breakthroughs, domestic large models are advancing from 'usable' to 'user-friendly.'

There are no shortcuts on this path. In the future, as algorithms, computing power, capital, and application scenarios continue to catalyze one another, the story of Chinese AI will shift from 'individual breakthroughs' to 'systemic evolution.'
