Looking Ahead to 2026: Will the mHC Architecture from DeepSeek's Liang Wenfeng Reshape Chip Design?

January 4, 2026

Editor's Note:

The groundbreaking mHC (Manifold-Constrained Hyper-Connections) architecture unveiled by DeepSeek is more than an algorithmic advance: it heralds a paradigm shift in AI hardware design, from 'general-purpose computing adaptation' toward 'deep optimization for specialized, efficient architectures'.

In layman's terms, mHC is a methodology that lets AI models train more stably and efficiently as parameter counts grow. Its primary impact is to sharply reduce reliance on brute-force compute and memory through algorithmic innovation, which in turn pushes chip design toward higher 'effective computational efficiency' rather than peak performance alone.

On New Year's Day 2026, the AI community was jolted out of its holiday lull by a DeepSeek paper titled 'mHC: Manifold-Constrained Hyper-Connections.' The work tackles the manifold-constrained hyper-connection architecture head-on, addressing core challenges of both large-model training and chip design. The inclusion of DeepSeek founder and CEO Liang Wenfeng among the paper's authors signals that this is not just an academic exercise but carries clear industrial deployment goals.

Over the past few years, competition in the AI industry has centered on 'larger parameters, more compute,' driving model iterations from billions to trillions of parameters. That trend has pushed AI chips such as GPUs to keep adding computational units. Beneath this apparent prosperity, however, a critical contradiction has emerged: chip compute has grown far faster than memory bandwidth, so expensive compute sits idle while data is moved. Known as the 'memory wall,' this dilemma is the Achilles' heel of AI chips and the primary bottleneck on AI computational performance. Data from Micron indicates that GPU compute has surged 37.5-fold over the past five years while PCIe bandwidth has increased only eightfold, an imbalance that often leaves even the most advanced AI chips running at under 30% utilization.
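A back-of-the-envelope roofline calculation makes the mismatch concrete. The sketch below uses only the growth factors cited above; everything else (the normalized units, the fixed arithmetic intensity) is an illustrative assumption, not a vendor specification.

```python
# Roofline-style sketch: achievable utilization for a bandwidth-bound kernel
# when peak compute grows 37.5x but bandwidth grows only 8x.
# All quantities are normalized and illustrative, not vendor specifications.

def utilization(peak_compute: float, bandwidth: float, intensity: float) -> float:
    """Fraction of peak compute a kernel with the given arithmetic
    intensity (FLOPs per byte moved) can sustain."""
    achievable = min(peak_compute, bandwidth * intensity)
    return achievable / peak_compute

intensity = 1.0  # FLOPs per byte; held fixed for the same workload

print(f"five years ago: {utilization(1.0, 1.0, intensity):.0%}")   # 100%
print(f"today:          {utilization(37.5, 8.0, intensity):.0%}")  # ~21%
```

Under these assumptions, the same memory-bound workload falls from full utilization to roughly a fifth of peak. Raising effective arithmetic intensity in software, rather than waiting for bandwidth to catch up, is exactly the lever techniques like mHC pull.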

The mHC architecture proposed by Liang Wenfeng's team goes beyond algorithmic optimization: by reconstructing the connection logic of neural networks through manifold constraints, it reduces the demand for memory bandwidth at the source. This approach, bridging algorithms and hardware, prompts the industry to reconsider: when software architectures can proactively adapt to hardware bottlenecks, will the current 'hardware-first, software-adapted' chip design paradigm be disrupted? This technological breakthrough in 2026 may mark a new starting point for the co-evolution of AI software and hardware.

From Uncontrollable to Controllable: The Core Breakthrough Logic of the mHC Architecture

To grasp why the mHC architecture resonates in chip design, we must first revisit the core issue it addresses: the uncontrollability dilemma of hyper-connection (HC) architectures. Throughout the evolution of Transformer models, residual connections have been the bedrock of stable deep-network training. Their 'x + F(x)' identity-mapping structure ensures that signals are neither systematically amplified nor attenuated during propagation. But as model scale expands, the expressive capacity of a single residual stream becomes inadequate, giving rise to hyper-connection architectures: by widening the residual stream into multiple channels and constructing multi-path connections, they significantly enhance model expressiveness, at the cost of new stability risks.
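As a loose structural sketch of the contrast (our simplification, not the paper's formulation; real hyper-connections also involve depth-wise connections and per-layer learned mixing):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 4                          # hidden width, number of residual streams
W = rng.normal(size=(d, d)) * 0.1
f = lambda x: np.tanh(x @ W)          # stand-in for a transformer sublayer F

# Standard residual connection: one stream, identity-preserving by design.
x = rng.normal(size=d)
y = x + f(x)                          # x + F(x): x always passes through unchanged

# Hyper-connection (simplified): n parallel residual streams, re-mixed by a
# learned matrix H before the sublayer output is added back.
H = rng.normal(size=(n, n))           # unconstrained in traditional hyper-connections
X = rng.normal(size=(n, d))           # n residual streams
Y = H @ X + f(X)                      # richer routing, but H can distort the identity path
```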

The fatal flaw of traditional hyper-connection architectures is that their connection matrices are unconstrained, which breaks the identity-mapping property of residual connections. In large-scale training, this design easily leads to signal explosions or gradient anomalies: the research data indicates that in certain scenarios, traditional hyper-connections can amplify signals up to 3,000-fold, directly causing training collapse. More critically, multi-path connections also drastically increase memory overhead: more residual streams mean more intermediate activations to store and move, further straining already limited memory bandwidth and worsening the 'memory wall' problem. Liang Wenfeng has noted in an internal technical-sharing session that this 'performance vs. stability' dilemma in hyper-connections is a major reason large models are so expensive to train.
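A toy numerical demo (ours, not a measurement from the paper) makes the failure mode visible: whenever the mixing matrix's spectral radius exceeds 1, stacking layers amplifies signal norms geometrically.

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 4, 64

# An unconstrained mixing matrix; at this scale its spectral radius
# typically exceeds 1, so repeated mixing amplifies the signal.
H = rng.normal(size=(n, n)) * 0.8
print(f"spectral radius: {max(abs(np.linalg.eigvals(H))):.2f}")

x = rng.normal(size=n)
start = np.linalg.norm(x)
for _ in range(depth):
    x = H @ x                         # one layer's worth of stream mixing
print(f"norm growth over {depth} layers: {np.linalg.norm(x) / start:.1e}")
```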

The core innovation of the mHC architecture is to put geometric 'reins' on hyper-connections. Its central idea is to project the hyper-connection's mixing matrix onto the manifold of doubly stochastic matrices (the Birkhoff polytope), mathematically guaranteeing that all entries are non-negative and that every row and column sums to 1. This seemingly simple constraint resolves signal uncontrollability at the root: a doubly stochastic matrix has spectral radius 1, so it can only redistribute weight among residual streams and can never systematically amplify signal norms. Experimental data shows that mHC keeps signal amplification strictly within 1.6-fold, eliminating the stability issues of traditional hyper-connections.
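In standard notation (our paraphrase of the constraint; the paper's symbols may differ):

```latex
% Birkhoff polytope of n-by-n doubly stochastic matrices:
\mathcal{B}_n = \Big\{ H \in \mathbb{R}^{n \times n} \;:\; H_{ij} \ge 0,\;
  \textstyle\sum_{j} H_{ij} = 1 \ \forall i,\;
  \textstyle\sum_{i} H_{ij} = 1 \ \forall j \Big\}

% By the Birkhoff--von Neumann theorem, every H in B_n is a convex
% combination of permutation matrices, each of which preserves norms, so
\| H x \|_2 \;\le\; \| x \|_2 \quad \text{for all } x.

% Mixing can therefore redistribute signal across residual streams,
% but never systematically amplify it.
```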

At the implementation level, mHC uses the mathematically mature Sinkhorn-Knopp algorithm for the manifold projection, enforcing the constraint while keeping overhead under control. During training, the model first learns an ordinary real-valued matrix, then maps it to an approximately doubly stochastic matrix through a finite number of Sinkhorn normalization steps; because this projection is differentiable, training proceeds uninterrupted. More critically, the DeepSeek team did not stop at algorithmic innovation but minimized memory overhead through three engineering optimizations: kernel fusion bundles operators such as RMSNorm and matrix multiplication into single kernels, cutting round-trips of intermediate data to memory; selective recomputation discards non-critical intermediate activations and recomputes them during the backward pass, reducing memory usage by over 70%; and DualPipe overlaps gradient communication with model computation, eliminating idle time on the compute units.
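For reference, a textbook Sinkhorn-Knopp sketch of the projection step looks like the following; DeepSeek's fused, GPU-side kernel will differ in detail, and the iteration count here is our placeholder.

```python
import numpy as np

def sinkhorn_knopp(logits: np.ndarray, n_iters: int = 10) -> np.ndarray:
    """Map a real-valued matrix toward the Birkhoff polytope by alternately
    normalizing rows and columns. Each step is elementwise and differentiable,
    so gradients can flow through a finite number of iterations in training."""
    M = np.exp(logits)                      # ensure strictly positive entries
    for _ in range(n_iters):
        M /= M.sum(axis=1, keepdims=True)   # rows sum to 1
        M /= M.sum(axis=0, keepdims=True)   # columns sum to 1
    return M

rng = np.random.default_rng(0)
H = sinkhorn_knopp(rng.normal(size=(4, 4)))
print(H.sum(axis=0))  # columns: exactly 1 (last normalization applied)
print(H.sum(axis=1))  # rows: approximately 1, converging with more iterations
```

The other two optimizations have familiar analogues: selective recomputation is essentially activation checkpointing (exposed in PyTorch as torch.utils.checkpoint), and DualPipe-style overlap hides communication latency behind computation.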

Experiments validate the approach. In training runs at 3 billion, 9 billion, and 27 billion parameters, mHC not only avoids the non-convergence issues of traditional hyper-connections entirely but also outperforms baseline models across eight downstream tasks, including a 2.1% improvement on BBH and 2.3% on DROP. More notably, at an expansion rate of 4, mHC adds only 6.7% training-time overhead, achieving a 'low-cost, high-performance' balance that makes it viable for large-scale industrial use. Liang Wenfeng's team emphasizes in the paper that mHC's value lies not in replacing the Transformer but in providing a 'controllable and trainable' theoretical and engineering framework for exploring complex residual topologies, and that generality lays the groundwork for adapting it to diverse chip architectures.

The Software-Hardware Synergy Revolution: mHC's Potential Reshaping of Chip Design

For a considerable period, AI chip design has been locked into a 'compute race' path dependency. From NVIDIA's H100 and Blackwell architectures to various domestic AI chips, core innovations have consistently focused on packing in more computational units and expanding memory capacity. The emergence of mHC, however, prompts the industry to reconsider: if software can proactively reduce demand for memory bandwidth, must chip design keep following the inertia of 'stacking hardware'? Behind this question lies the fundamental shift in software-hardware synergy that the mHC architecture brings.

Firstly, mHC has the potential to break the compute-bandwidth mismatch, shifting chip design from 'compute-first' to 'efficiency-first.' The core contradiction in today's AI chips is surplus compute paired with insufficient bandwidth, which wastes large numbers of clock cycles on data movement. Through optimizations like kernel fusion and selective recomputation, mHC consolidates many scattered memory accesses into far fewer, drastically reducing bandwidth demand. This software-level 'bandwidth saving' means chip designs need not chase high-bandwidth HBM memory at any cost. For instance, mid-range AI chips previously unable to support large-scale model training because of bandwidth limits could become viable once adapted to mHC's memory-access patterns. This suggests a differentiated future for chip design: high-end chips may keep pursuing the ultimate match of compute and bandwidth, while mid- and low-end chips lean on efficient architectures like mHC to deliver comparable training results at lower hardware cost.

Secondly, mHC's manifold constraint logic may drive innovation in chip-specific computational units. Current AI chip computational units are primarily optimized for general-purpose operators like matrix multiplication, but the Sinkhorn-Knopp projection operator in mHC has unique computational characteristics. While DeepSeek currently fuses it with existing operators through software optimization, widespread adoption of mHC could lead to the inclusion of dedicated projection operator acceleration units in chip design. The emergence of such specialized units would break the current monopoly of 'general-purpose computational units' in AI chips, driving evolution toward 'general-purpose + specialized' heterogeneous architectures. More importantly, mHC's constraint logic could deeply synergize with chip memory hierarchy design—for instance, chips could dynamically adjust caching strategies based on mHC's activation recomputation policies, prioritizing caching for critical layer inputs and freeing cache space for other computational tasks, further enhancing memory utilization.

Thirdly, the mHC architecture may lower hardware barriers for large model training, reshaping the chip market's competitive landscape. Currently, large model training is monopolized by a few tech giants with access to massive GPU clusters, primarily because smaller players cannot afford high-end AI chips. By ensuring training stability while drastically reducing memory usage and bandwidth demands, mHC enables smaller firms to conduct large-scale model training with fewer mid-range chips. This lowered barrier will drive demand growth in the mid-range AI chip market, compelling chip manufacturers to invest more innovation resources in this segment. For example, mid-range chips optimized for mHC may prioritize enhancing cache efficiency and operator fusion capabilities over blindly stacking computational units. This shift in market demand will guide chip design resources from 'high-end overcrowding' to 'mid-range inclusivity,' fostering diversified development in the AI chip market.

However, for mHC to truly reshape chip design, several challenges must be overcome. On one hand, building an ecosystem of architectural adaptation takes time: mainstream AI chip software stacks are optimized for the traditional Transformer, and persuading chip manufacturers to adapt proactively to mHC requires sufficient industry consensus. DeepSeek's open-source strategy may accelerate this process; its previously open-sourced DeepSeek-V3 model has already amassed a large developer base, and continued open-sourcing around mHC could draw more chip manufacturers into adaptation work. On the other hand, mHC's optimization effects still need validation at larger scales: it performs well at 27 billion parameters, but its memory-bandwidth savings in models with hundreds of billions or trillions of parameters remain to be proven with more experimental data. Liang Wenfeng has said in media interviews that the team is training larger mHC models, with data to be released gradually through 2026; those results will directly shape chip manufacturers' confidence in adapting.

Notably, the software-hardware synergy approach advocated by mHC has already begun resonating in the industry. Storage manufacturers like Micron have mentioned in recent technical sharing that future storage product designs need to more closely integrate with AI architectures' memory access characteristics, and mHC provides an excellent example of such synergy. NVIDIA technical leaders have also stated they are monitoring the impact of efficient architectures like mHC on chip design and may incorporate targeted optimizations in future architectures. These signals indicate that mHC is driving the AI industry from a passive 'software-adapts-to-hardware' mode toward an active 'software-hardware co-design' mode.

Conclusion

The release of the mHC architecture by Liang Wenfeng's team in early 2026 represents not just an algorithmic breakthrough but a clarion call disrupting the AI industry's inertia toward 'computational power races.' At a time when the 'memory wall' has become the core bottleneck restricting AI development, mHC offers a novel approach by combining manifold constraints with engineering optimizations to address the mismatch between computational power and bandwidth. Its advocacy for 'software proactively adapting to hardware bottlenecks' is challenging traditional chip design paradigms and driving the industry toward 'efficiency-first' software-hardware synergy.

Objectively speaking, before mHC can genuinely transform chip design paradigms, it must still clear the hurdles of ecosystem development and large-scale validation, so it is unlikely to overturn established paradigms in the near term. What it has undeniably done is add a new dimension to chip design: a chip's value lies not in stacked compute but in how efficiently every unit of compute is put to work. That shift in thinking could well become the central theme of AI chip innovation in the years ahead.

For the industry, the emergence of the mHC architecture marks a pivotal turning point. It serves as a reminder to practitioners that AI development should not solely focus on "scale" expansion but also place a premium on "efficiency" enhancements. As more teams delve into the profound synergy between algorithms and hardware, existing technological bottlenecks may be shattered, propelling AI into a phase of more sustainable growth. Regardless of whether the technological exploration in 2026 completely reshapes chip design directions, it has undoubtedly infused new vigor into AI innovation—and this, perhaps, represents the deeper significance of Liang Wenfeng's team unveiling the mHC architecture.
