12/25 2024
General-purpose computing power is evolving towards high-density, liquid-cooled rack-mounted form factors.
Early in 2023, a major internet company approached Inspur with a novel challenge: its clients have diverse application scenarios, and each scenario calls for a different optimal processor platform. For instance, lightweight container scenarios require moderate performance but high efficiency and density, whereas high-performance computing scenarios favor platforms with strong parallel processing capabilities and more high-frequency cores. The client posed a question: how can servers with varying processors be deployed swiftly across diverse businesses?
Traditionally, general-purpose server systems were custom-developed around a specific processor platform. Now, faced with demands for multiple processor platforms, how can servers adapt quickly? This poses a challenge to the almost unchanged general-purpose server architecture that has persisted for decades.
Almost simultaneously, another challenge emerged: While AI servers handle most large model training and inference, artificial intelligence also introduces new requirements for general-purpose servers, such as large-scale data storage for model training. General-purpose servers now possess intelligent acceleration capabilities and can run large model inference services. In the long run, the rapid evolution of intelligent computing clusters with tens of thousands or even millions of cards is disrupting and reconstructing data centers, guiding general-purpose servers, like AI servers, towards high-density deployment.
These two new market variables have positioned general-purpose servers, already a mature industry, at the cusp of a new transformation.
Shipment growth of general-purpose servers is projected to remain at 5% to 6%.
01
Changes and Competition in New Standards
Faced with this major internet company's diverse computing power demands, Inspur and the client held a joint brainstorming session that led to the decoupling approach. AI servers had earlier faced similar competition among multiple acceleration chips, and the OAM (OCP Accelerator Module) standard, which Inspur helped promote, adopted a decoupled, standardized module design that enabled chips from different vendors to be quickly applied and scaled up.
"The OAM approach inspired us," said Zhao Shuai, General Manager of Inspur's Server Product Line. If general-purpose servers can break the market convention and no longer design their system architecture around a specific processor, but instead split into standardized modules like processors, hard drives, IO, and power supplies, clients can assemble different modules like Legos to meet their diverse needs.
More than a year after this idea was proposed, through the efforts of multiple industry stakeholders, the decoupling approach was implemented. The Open Compute Technology Committee (OCTC) initiated the Open Computing Module (OCM) specification, establishing a standardized computing module and realizing "one machine with multiple chips." Under the specification, CPU platforms from Intel, AMD, ARM, and others can in the future be freely swapped within a single server, or even supported simultaneously. It is also the first domestic design specification for server computing modules.
Initiation of the Open Computing Module (OCM) specification
Inspur has also completed the design of the first product compliant with the OCM specification. According to Digital Frontier News, the first YuanNao server NF3290G8 based on the OCM specification has entered the testing phase and is expected to be deployed in bulk in Q1 2025.
That this standard could break a design convention that has held for decades owes much to the breakthroughs each group of industry stakeholders was seeking:
The dominant processor chip segment has "loosened up." In the past two years, diverse computing power has gained momentum, with not only the x86 architecture but also RISC-V and ARM architectures actively deployed in the computing power market. Chip competition has intensified: whoever reaches users first and achieves rapid business deployment will capture the market. Even powerful chip vendors can no longer adhere rigidly to outdated rules, which leaves room for negotiation.
End-user enterprises have also expressed urgent needs. While major internet companies require flexible and versatile computing power units, telecommunications companies face pressure to quickly deploy and scale diverse computing power.
Server companies, facing numerous chip platforms, experience multiplied development workloads and high costs. They also urgently need to enhance the efficiency of diverse computing power server research and development.
For national standard-setting organizations, an industry standard for computing power modules has been a gap, and they are willing to establish relevant standards to help the domestic server industry measure up to international standards.
These driving forces have united all industry stakeholders. Therefore, when OCTC initiated the Open Computing Module specification in 2024, the first batch of members included representatives from the China Electronics Standardization Institute, Baidu, Xiaohongshu, Inspur, Intel, AMD, Lenovo, and Ultra Fusion, among others.
However, the process of standard development was not smooth, with each party having their own demands, leading to some conflicts.
For example, major internet companies and chip vendors have differing priorities. Internet companies are more concerned with leading chip platforms being implemented in the standard, while some domestic and foreign chip vendors are more focused on platform compatibility and showcasing their respective advantages. Ultimately, the standards group included these computing platforms in the standard, conducting standardized evaluations and compatibility assessments.
Different server vendors also had their own demands, each hoping the standard would favor them. Ultimately, the standards group resolved this conflict by adopting a "standard motherboard plus tray" approach, which allows the module to be coupled quickly to different chassis or technology architecture platforms.
Recalling the initiation and development of this standard, Luo Jian, Product Planning Manager of Inspur's Server Product Line, said that all parties' ability to come together was largely due to the premise of benefiting the entire industry's healthy development. Under this premise, OCM provides a relatively fair platform. Through this platform, everyone can jointly promote the high-quality development of the computing power industry.
02
Three Major Productization Trends
After the introduction of the OCM standard, the industry began productization efforts.
Inspur promptly launched the first general-purpose server based on the OCM architecture, the YuanNao NF3290G8. The first-generation server supports two new CPU models: the Intel® Xeon® 6 processor and the fifth-generation AMD EPYC™ 9005 series processor. The former demonstrates significant performance improvements in scenarios such as AI inference and computing, generative AI, and scientific research, while the latter excels in scenarios such as all-flash storage, high network bandwidth, high-frequency financial trading, and big data analysis.
During this productization process of the OCM standard by system vendors, three major trends also warrant industry attention: decoupling, the introduction of large model technology in intelligent product management, and the trend of open hardware and open-source software.
Regarding the first major trend, the decoupling adopted by OCM represents the future direction of server system architecture. "From a system efficiency perspective, when the system is divided into standard modules such as general-purpose computing power, memory, and heterogeneous computing power, and consistent power supply, cooling, and regulation are provided, corresponding power supply and cooling optimizations can be performed for different hardware resources to achieve the ultimate energy efficiency ratio," said Luo Jian. The YuanNao NF3290G8, which adopts the OCM standard, already serves as an early prototype of this approach.
To realize decoupling and modular design, engineers focused on normalizing the power supply, management, and external high-speed interconnection of computing modules. In terms of management, since each processor chip has different management interfaces and protocols, the BMC management system must master each processor's "cipher codebook," translating the differing information into "plaintext" for unified management. Previously, this technology was held by independent BMC firmware providers (IBVs). In 2023, however, Inspur built up its own firmware development capabilities through the open-source OpenBMC route, which laid the foundation for normalizing processor management in this project.
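The "codebook" idea described above resembles a classic adapter pattern. The following is a minimal sketch under stated assumptions, not Inspur's or OpenBMC's actual implementation: the vendor names, raw field names, and units are all hypothetical and chosen only to show how differently shaped readings can be translated into one unified record.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Telemetry:
    """Unified ("plaintext") reading that the management layer consumes."""
    platform: str
    temperature_c: float
    power_w: float


class ProcessorAdapter(Protocol):
    """One adapter per processor platform: its 'codebook'."""
    def read(self, raw: dict) -> Telemetry: ...


class VendorAAdapter:
    # Hypothetical vendor A: tenths of a degree Celsius and milliwatts.
    def read(self, raw: dict) -> Telemetry:
        return Telemetry("vendor-a", raw["temp_decidegc"] / 10, raw["power_mw"] / 1000)


class VendorBAdapter:
    # Hypothetical vendor B: degrees Celsius and watts, under different keys.
    def read(self, raw: dict) -> Telemetry:
        return Telemetry("vendor-b", float(raw["tC"]), float(raw["pW"]))


# Registry mapping a platform identifier to its adapter (names are illustrative).
ADAPTERS: dict[str, ProcessorAdapter] = {
    "vendor-a": VendorAAdapter(),
    "vendor-b": VendorBAdapter(),
}


def normalize(platform: str, raw: dict) -> Telemetry:
    """Translate a platform-specific reading into the unified format."""
    return ADAPTERS[platform].read(raw)


if __name__ == "__main__":
    print(normalize("vendor-a", {"temp_decidegc": 655, "power_mw": 350000}))
    print(normalize("vendor-b", {"tC": 70, "pW": 500}))
```

With this structure, supporting a new processor platform means adding one adapter rather than changing the management logic, which is the practical benefit of normalization.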
Regarding the second major trend, intelligent product management, the new-generation server platform applies large models' ability to learn from massive amounts of data to high-failure components in general-purpose servers, such as memory and hard drives. Building on "Yuan," the large model launched by Inspur, targeted training on historical server fault logs produces a fault warning model that is integrated into the BMC management engine. Currently, the system can warn of faults seven days in advance, reducing unplanned downtime for clients and minimizing business losses.
Regarding the third major trend, openness and sharing, hardware product designs, especially those related to OCM productization, are contributed to the OCTC open community, where clients can access the relevant materials. On the software side, open-source technology from the OpenBMC community helped Inspur solve key decoupling issues, and the results were contributed back to that community. Openness and sharing is a process of continuously accumulating and pooling technical strength, which ultimately gives strong support and momentum both to a company's own development and to that of the industry chain.
In addition to these three major trends, heat dissipation, driven by the rising power consumption of general-purpose servers, is also of great concern to the industry. Reportedly, it was the biggest challenge encountered during this productization process.
Future processor platforms for general-purpose servers are expected to draw approximately 500 to 600 watts. The server may also hold four 350-watt GPUs. Smart network interface cards have become a standard configuration for cloud services, and their power consumption cannot be overlooked as bandwidth increases. Combined, these components approach 3000 watts. How can that much heat be dissipated? Luo Jian revealed that one method adopted by engineers is to separate the cooling airflow, giving CPUs, GPUs, and smart network interface cards their own air channels. This improves heat dissipation efficiency by more than 5%, which matters greatly for a data center's PUE.
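The power figure above can be checked with rough arithmetic. This is only a sketch: the article gives the CPU and GPU wattages, while the two-socket configuration and the 400-watt allowance for smart network cards, memory, drives, and fans are assumptions made here for illustration.

```python
def server_power_budget_w(cpu_w: int, cpu_count: int,
                          gpu_w: int, gpu_count: int,
                          other_w: int) -> int:
    """Sum the component power draw of one server, in watts."""
    return cpu_w * cpu_count + gpu_w * gpu_count + other_w


# From the article: 500 to 600 W per processor platform, four 350 W GPUs.
# Assumed (not from the article): 2 CPU sockets, and 400 W in total for
# smart NICs, memory, drives, and fans.
CPU_SOCKETS = 2
OTHER_W = 400

low = server_power_budget_w(500, CPU_SOCKETS, 350, 4, OTHER_W)
high = server_power_budget_w(600, CPU_SOCKETS, 350, 4, OTHER_W)

if __name__ == "__main__":
    print(f"Estimated server power: {low} to {high} W")
```

Under these assumptions the estimate lands between 2800 and 3000 watts, in line with the "approaches 3000 watts" figure.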
Moving forward, as general-purpose servers' power consumption further increases, air cooling may reach its limits, and the OCM standard may evolve towards liquid cooling.
Adopting the OCM standard significantly reduces server research and development costs. Decoupling eliminates many repetitive development tasks, shortening the path from chip research and development through testing and verification to deployment. Inspur's product development cycle has been compressed from the original 18 months to 6 to 8 months. Throughout decoupling and modularization, server reliability has not been compromised, because reliability standards were upheld, including stricter requirements for signals, power supply, structure, and system stability, and for architectural changes.
Illustration: The OCM computing module supports multiple processor platforms
03
General-Purpose Servers at the Starting Point of Transformation
OCM is an important milestone that uses a decoupling approach to change the design convention of general-purpose servers. In the long run, however, intelligent computing will have an even more profound impact on general-purpose servers.
Currently, intelligent computing is leading the industry's evolution. Large models' demand for computing power is driving rapid growth in intelligent computing power. According to analysis and forecasts by market research firm IDC, the AI server market doubled in both 2023 and 2024. In the Chinese market, for instance, the AI server market doubled to $10 billion in 2023 and is expected to double again to nearly $20 billion in 2024. AI servers are about to account for half of the overall server market. Hence the saying in the server market: the market's performance is determined by AI servers.
In AI servers, flagship GPU chips have adopted the Chiplet approach, where multiple chip dies are interconnected and packaged together to provide extreme computing power, but this also rapidly increases chip power consumption to 1200 watts or even 1600 watts, further driving up the entire computing power infrastructure's power supply demand.
Over the past decade, data center infrastructure has remained largely unchanged. At present, most data centers have a power supply capacity of 10 to 12 kilowatts per rack. With the advance of intelligent computing, however, this capacity is anticipated to reach 100 kilowatts, or even 200 kilowatts, in the future. Currently, certain AI rack servers may already exceed 400 kilowatts.
"Given this scenario, we foresee that general-purpose computing power may also undergo significant changes in the future," noted Luo Jian. In data centers with higher power supply capacity, the current deployment method for general-purpose servers is comparatively inefficient and loses its advantages. "We predict that general-purpose computing power will ultimately evolve towards a high-density, liquid-cooled rack-mounted form factor," he added.
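The density argument can be illustrated with a back-of-the-envelope calculation. This sketch assumes the power supply figures quoted above are per-rack budgets and that each server draws about 3000 watts, and it ignores physical space and cooling limits. The function name and these assumptions are introduced here for illustration only.

```python
def servers_per_rack(rack_power_kw: int, server_power_w: int) -> int:
    """How many servers a rack's power budget can feed (power-limited only)."""
    return (rack_power_kw * 1000) // server_power_w


# Assumed: roughly 3000 W per server; rack budgets taken from the article's figures.
SERVER_POWER_W = 3000

if __name__ == "__main__":
    for rack_kw in (12, 100, 200):
        count = servers_per_rack(rack_kw, SERVER_POWER_W)
        print(f"{rack_kw} kW rack: {count} servers")
```

On these assumptions, a 12-kilowatt rack powers only 4 such servers, while 100-kilowatt and 200-kilowatt racks could power 33 and 66. Using that budget would require packing far more nodes into each rack than today's deployment method allows.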
Should general-purpose servers adopt a high-density rack-mounted deployment model, the internal nodes will be designed around the concept of layered decoupling. The decoupling concept of OCM (Open Computing Module) follows the same logic, turning computing units into smaller modules. OCM could therefore serve as the catalyst for high-density server deployment in data centers. In the future, liquid cooling may be employed to raise deployment density further.
Luo Jian analyzed that during the transition towards high-density and liquid cooling, product design will undergo substantial transformations. For instance, memory may be laid flat on the motherboard, affixed to both sides of the motherboard, or configured in a manner that facilitates liquid cooling deployment.
To facilitate such a transition, the existing industry chain will expand, with companies specializing in liquid cooling, memory, power supply, and other segments joining forces. "OCM will serve as an excellent starting point," said Luo Jian. "It will propel the computing power industry to evolve and upgrade to meet future demands."