Meituan Trains a Trillion-Parameter Model on Domestic Computing Power, an Effort It Began in 2023

07/01 2026

The LongCat team began working with domestic computing power in 2023 and quickly became one of Ascend's largest internet customers.

News spread across major platforms last night: Meituan has released and open-sourced the LongCat-2.0 large model, which has 1.6 trillion parameters. Notably, both training and inference ran entirely on a domestic computing power cluster. Large models typically go through pre-training, post-training, and inference, and pre-training is the most demanding of the three. Until now, domestic chips were used mainly for large model inference. This time, the breakthrough is in pre-training.

This makes LongCat-2.0 the first trillion-parameter model to run successfully on a domestic computing power cluster. The first 100-billion-parameter model to do so was iFLYTEK's Spark V3.5.

Meituan has not disclosed the specific chip model. According to Digitizing Frontier, LongCat-2.0 completed all of its training and inference on an Ascend cluster of 50,000 cards. On June 5, Meituan previewed LongCat-2.0 at Huawei Cloud's first INSPIRE Innovators Conference, highlighting its coding and agent capabilities.

01 Why Did Meituan, Rather Than DeepSeek, Reach This Milestone?

According to Digitizing Frontier, the LongCat team began exploring domestic computing power in 2023 and quickly became one of Ascend's largest internet customers.

An intriguing question arises: Why did Meituan, rather than DeepSeek, secure this achievement first? The answer lies in a confluence of strategic decisions by top leadership, corporate positioning, resource allocation, and business considerations.

Media reports indicate that DeepSeek's V4 model, launched this year, still relies on the NVIDIA and CUDA ecosystem for training, while it has been adapted to domestic computing platforms such as Huawei Ascend for inference. This may reflect a balancing act on DeepSeek's part: to stay at the cutting edge, it builds on the computing power and software stack it has already accumulated and prioritizes model capability itself, with training efficiency serving as a competitive edge.

Meituan, by contrast, has a longstanding collaboration with Huawei and was an early supporter of HarmonyOS. Wang Xing, Meituan's founder, has publicly expressed support for Huawei's self-developed chip roadmap. As a scenario-driven internet company, Meituan chose to train its trillion-parameter model entirely on domestic computing power out of concern for supply chain security.

According to Digitizing Frontier, the Meituan team has since 2023 worked through fundamental challenges one by one, including operator adaptation, communication optimization, and distributed stability. The core difficulty was building a technology stack from scratch on a domestic chip platform with a less mature algorithm ecosystem, smaller memory capacity, and narrower memory bandwidth. That is a systems engineering effort rather than a single fix.

Take operators as an example. For training, Meituan developed its own deterministic operators, including Embedding, FA, LSA, and MoE, and rewrote a series of foundational operators for numerical reliability to improve precision. For long-context scenarios, it also built deterministic attention operators and KL loss operators. On the inference side, Super Kernel and Weight Prefetch are operator-level adaptations. In the mature NVIDIA ecosystem, capabilities like these are typically available off the shelf.
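To see why deterministic operators matter for training, consider a reduction whose partial results are added in whatever order the workers finish. Floating-point addition is not associative, so the same inputs can produce different sums from run to run. The sketch below is a minimal pure-Python illustration of that effect and of one common remedy, reducing in a fixed order. It is not Meituan's implementation; the function names and the four partial values are hypothetical and chosen only to make the effect visible.

```python
from itertools import permutations


def sum_in_arrival_order(partials, arrival_order):
    """Accumulate partial results in the order they arrive.

    This mimics an atomic-add style reduction, where the summation order
    depends on scheduling and can change from run to run.
    """
    total = 0.0
    for worker in arrival_order:
        total += partials[worker]
    return total


def sum_in_fixed_order(partials, arrival_order):
    """Deterministic variant: buffer each result, then reduce by worker index.

    The arrival order only decides when a slot is filled, never the order
    of the additions, so the result is bit-identical across runs.
    """
    slots = [None] * len(partials)
    for worker in arrival_order:
        slots[worker] = partials[worker]
    total = 0.0
    for value in slots:
        total += value
    return total


# Hypothetical partial sums from four workers (illustrative values only).
partials = [1e16, 1.0, -1e16, 1.0]

run_a = sum_in_arrival_order(partials, [0, 1, 2, 3])
run_b = sum_in_arrival_order(partials, [0, 2, 1, 3])
print(run_a, run_b)  # the two runs disagree

fixed = {sum_in_fixed_order(partials, list(p)) for p in permutations(range(4))}
print(fixed)  # a single value, whatever the arrival order
```

Note that the fixed-order result is reproducible, not necessarily more accurate. What determinism buys in a long training run is the ability to replay a step and get the same bits, which makes loss spikes and hardware faults far easier to diagnose.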

Meituan's introduction of LongCat-2.0 opens with the statement "Pre-training was accomplished in just over a month on more than 50,000 domestic computing chips, consuming over 35 trillion tokens, with no rollbacks or irrecoverable loss spikes throughout the process," placing the engineering breakthrough front and center.
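For a sense of scale, the quoted figures can be turned into a rough throughput estimate. The sketch below takes the token and chip counts at their stated lower bounds and assumes that "just over a month" means 30 days. That duration is an assumption, not a figure from Meituan, so the result is an order-of-magnitude estimate only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

tokens = 35e12   # "over 35 trillion tokens" (lower bound from the quote)
chips = 50_000   # "more than 50,000" chips (lower bound from the quote)
days = 30        # assumption: "just over a month" taken as 30 days

# Average throughput over the whole run, ignoring restarts and idle time.
cluster_tokens_per_s = tokens / (days * SECONDS_PER_DAY)
per_chip_tokens_per_s = cluster_tokens_per_s / chips

print(f"cluster: {cluster_tokens_per_s / 1e6:.1f} million tokens/s")
print(f"per chip: {per_chip_tokens_per_s:.0f} tokens/s")
```

Under these assumptions the cluster averages roughly 13.5 million tokens per second, or about 270 tokens per second per chip. A longer run or a larger cluster would lower the per-chip figure, and a higher token count would raise it.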

It is worth noting that some of these technologies were optimized and evolved from existing industry practice. For instance, the report states explicitly that LongCat's Sparse Attention (LSA) "evolved from DeepSeek's Sparse Attention (DSA)."

02 Domestic Computing Chips Advance Toward Training

Domestic chips used to be deployed mainly for model inference. Breakthroughs are now arriving in the more demanding training phase.

In 2023, iFLYTEK and Huawei began a full-stack localization effort for large models and set up a dedicated computing power task force. Huawei deployed a team of several hundred people, with over a thousand engineers working on-site at iFLYTEK's headquarters in Hefei during peak periods. iFLYTEK built the first 10,000-card domestic computing power platform, "Feixing No.1," on Huawei Ascend, and every iteration of its Spark large models has been trained on Huawei Ascend computing power.

In October 2024, iFLYTEK announced the launch of "Feixing No.2," a 30,000-card Ascend cluster. By June of this year, the first three phases of "Feixing No.2" were in full commercial operation, focusing on technologies such as sparse MoE trillion-parameter foundation models, ultra-long contexts, and agent reinforcement learning.

Meituan's exploration of domestic computing power also commenced in 2023. In June 2026, it officially unveiled the LongCat-2.0 trillion-parameter model, powered by domestic computing resources.

At Alibaba, T-Head started the Zhenwu series PPU project in 2020, positioning it against NVIDIA's GPGPUs. In September 2025, CCTV News showcased the Sanjiangyuan Intelligent Computing Center, revealing Alibaba's 10,000-card PPU cluster. In May 2026, T-Head released the Zhenwu M890, a next-generation PPU chip designed for both training and inference.

At Baidu, the 34,000-card Kunlunxin cluster came online in April 2025, and major versions such as ERNIE 5.1 were trained on Kunlunxin. Kunlunxin's IPO is now at a pivotal stage, moving toward a dual "A+H" listing.

As for Cambricon, a major internet company that has purchased Cambricon chips confirmed to Digitizing Frontier that it uses them for model inference, not for large model training. According to reports, some industry-specific models have been trained on Cambricon chips.

Because embodied intelligence models have fewer parameters and far smaller datasets than large language models, they have become a new proving ground for training on domestic chips. In January 2026, Moore Threads, in collaboration with the Beijing Academy of Artificial Intelligence, completed full-process training of RoboBrain 2.5, an embodied brain model with 8 billion parameters, on a 1,000-card cluster built on its MTT S5000 GPU.

Automakers such as Li Auto are also developing their own chips to train the VLA (Vision-Language-Action) models that support their embodied intelligence applications.

From inference to training, and from trillion-parameter foundation models to compact embodied models, domestic computing power is making breakthroughs on several fronts. Even so, an insider at an intelligent computing center told Digitizing Frontier that, while the hope is for more domestic chips to cover the full path from model training to inference so that their capabilities are used to the fullest, getting there will take joint effort from capable chip companies and application providers, and the breakthroughs will come gradually.
