No AI without Light: Harnessing Optical Interconnects for AI Computing Power

08/05 2025

At the 2025 World Artificial Intelligence Conference (WAIC), super nodes for AI smart computing centers were the central focus. Several companies unveiled super node solutions, and Huawei's Ascend 384 Super Node stole the show, becoming a must-visit stop for media outlets and international visitors.

In the era of large models, AI computing power must handle massive parallel computations. AI computing is moving from single cards to clusters, and those clusters are growing from 10,000 cards to 100,000 cards. Clusters at this scale require optical interconnects. As clusters grow in size and performance, heat dissipation and power supply challenges intensify. Fiber-based optical interconnects reduce cabling and ease the demands on heat dissipation and power supply, making them indispensable for large-scale AI cluster networking.

The Ascend 384 Super Node adopted optical interconnects to meet its colossal AI demands. The super node employs 384 Ascend NPUs and 3,168 optical fibers spanning 316 kilometers, achieving full-mesh interconnection of the NPUs through 6,912 Nebula Optical Modules. In an optical interconnect network this large, dirt on optical link end faces poses a significant challenge and threatens long-running, stable training.

(Data Source: 2025 Core Light Forum, Huawei Cloud)

Dirt on end faces can cause frequent network flickering (brief, intermittent link interruptions) and high failure rates in computing clusters. Huawei Cloud's 2023 network analysis found an initial system flickering rate of 37.27%, primarily due to dirt on optical link end faces. The resulting losses can be substantial: iFLYTEK's analysis of a 10,000-card cluster put the loss from 7 days of idle cluster equipment at 15.4861 million yuan.
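The iFLYTEK figure is easier to weigh once it is broken down per day and per card. The short Python sketch below does that arithmetic. It assumes the loss is spread evenly across all 10,000 cards and all 7 days, which the article does not state, and the function name idle_cost_breakdown is illustrative only.

```python
# Figures quoted in the article (iFLYTEK's analysis of a 10,000-card cluster).
TOTAL_LOSS_YUAN = 15_486_100  # 15.4861 million yuan
IDLE_DAYS = 7
CARDS = 10_000


def idle_cost_breakdown(total_loss, idle_days, cards):
    """Split a total idle loss into per-day, per-card-day and per-card-hour costs.

    Assumption (not from the article): the loss is spread evenly over
    every card and every day of the idle period.
    """
    per_day = total_loss / idle_days
    per_card_day = per_day / cards
    per_card_hour = per_card_day / 24
    return per_day, per_card_day, per_card_hour


per_day, per_card_day, per_card_hour = idle_cost_breakdown(
    TOTAL_LOSS_YUAN, IDLE_DAYS, CARDS
)

print(f"{per_day:,.0f} yuan per idle day")            # 2,212,300 yuan per idle day
print(f"{per_card_day:.2f} yuan per card per day")    # 221.23 yuan per card per day
print(f"{per_card_hour:.2f} yuan per card per hour")  # 9.22 yuan per card per hour
```

Under that even-spread assumption, each idle day of the whole cluster costs roughly 2.2 million yuan, which is why cutting flickering-related downtime matters at this scale.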

Traditional on-site operation and maintenance methods for detecting dirt on optical link end faces are labor-intensive and costly, with a detection rate of only 48.3%. Faced with over 110,000 optical links in a super node, these traditional methods are impractical.

To break through computing power limits, the Ascend 384 Super Node first had to solve optical link operation, maintenance, and detection at scale. Equipped with 6,912 Nebula Optical Modules, it does so through intelligent operation, maintenance, and detection built into the modules themselves.

A digital, intelligent system comprises storage, computing, and networking, which complement and support one another. In large-scale AI computing clusters, network connectivity is crucial. With the Ascend 384 Super Node, Huawei showcased its prowess in optical communications, using its strengths there to offset weaknesses elsewhere. The key to this achievement is Huawei's Nebula Optical Module.

The Ascend 384 Super Node's success hinges on ultra-large-scale optical link networking. Each Ascend 384 Pod is equipped with 6,912 Nebula 400G Optical Modules: 5,376 for scale-up networking and 1,536 for scale-out networking. These modules address the bottlenecks of smart computing center networks, enabling a breakthrough in ultra-large-scale AI computing clusters.
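The module counts above can be cross-checked with a few lines of Python. The sketch below uses only the numbers quoted in the article. The per-NPU values are plain averages obtained by division and say nothing about where the modules physically sit (on NPU boards or on switches), which the article does not describe. The helper name per_npu is illustrative.

```python
# Counts quoted in the article for one Ascend 384 Pod.
NPUS = 384
TOTAL_MODULES = 6_912
SCALE_UP_MODULES = 5_376
SCALE_OUT_MODULES = 1_536


def per_npu(modules, npus=NPUS):
    """Return (whole modules per NPU, remainder) as a plain average."""
    return divmod(modules, npus)


# The scale-up and scale-out counts add up to the quoted total.
print(SCALE_UP_MODULES + SCALE_OUT_MODULES == TOTAL_MODULES)  # True

# Average modules per NPU; a remainder of 0 means the split is even.
print(per_npu(TOTAL_MODULES))      # (18, 0)
print(per_npu(SCALE_UP_MODULES))   # (14, 0)
print(per_npu(SCALE_OUT_MODULES))  # (4, 0)

# Share of modules devoted to scale-up networking.
scale_up_share = SCALE_UP_MODULES / TOTAL_MODULES
print(f"{scale_up_share:.1%}")     # 77.8%
```

The totals are consistent: the pod averages 18 modules per NPU, 14 on the scale-up side and 4 on the scale-out side, so about seven in nine modules serve scale-up traffic.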

To address dirt on end faces, the Nebula Optical Module introduced the "Nebula Smart Inspection StarSensor." The Nebula 400G Optical Module features health diagnosis for its optical and electrical ports and automatic dirt detection. Enhanced stress testing at the optical module level makes burst errors caused by dirt easier to identify, reducing the risk of flickering. With a dirt detection algorithm that exceeds 90% accuracy and completes detection within minutes, the Nebula 400G Optical Module resolves flickering issues in existing networks.

Highly accurate and intelligent dirt detection improves optical link operation and maintenance efficiency, enabling ultra-large-scale AI computing power to take root, enhancing system availability, and providing 360° seamless protection for AI computing clusters.

After enabling the Nebula Smart Inspection function, Huawei Cloud's existing network data showed a 13.9-fold reduction in link failure error rates.

The "light" of the Nebula Optical Module has illuminated China's AI computing power path. It supported the super node in building the world's largest and most advanced AI computing cluster, surpassing NVIDIA's NVL72. More importantly, it validated the feasibility of a system-centric approach to breaking AI computing power limits.

At the foundation of AI computing power, Huawei's Ascend 384 has demonstrated that China now stands alongside the United States in this field, pointing to a more balanced development trend. Built on Huawei's optical communications expertise, the Nebula Optical Interconnect solution for smart computing centers has become a strategic asset that underpins the super node's AI computing power.

China's optical communications industry is undergoing rapid upgrades, with optical interconnects driving AI computing power development. The Nebula Optical Module not only empowers the Ascend 384 Super Node but also propels China's smart computing industry towards overall breakthroughs and upgrades.

Future AI competition will be structural and systemic. Technological assets like the Nebula Optical Interconnect will increasingly drive development, becoming standard in smart computing center construction and AI computing cluster networking.

Following the path of "light" is a viable strategy for China's AI to transcend limitations.
