How many AI companies are currently stuck in intelligent computing centers?

11/11 2024

This may be the most costly detour for some domestic technology companies.

Author|Li Xiaodong

To date, computing power is still considered the main constraint on the development of AI in China.

To address this, and especially since the emergence of ChatGPT, intelligent computing centers have come to be seen as a new type of infrastructure for the AI era, comparable to water and electricity, and have sprung up across the country. As of the first half of this year, more than 250 intelligent computing centers had been built or were under construction nationwide. Media reports indicate that in the first seven months of this year alone, 140 new winning bids for such projects were announced.

The companies and institutions building these intelligent computing centers include urban investment companies, telecom operators, financial enterprises, some central and state-owned enterprises, large internet cloud computing companies such as Huawei Cloud, Alibaba Cloud, and Tencent Cloud, and even some companies that have crossed over from industries such as real estate.

However, another set of data is equally noteworthy: IDC points out that the utilization rate of general-purpose computing centers, with enterprises as the main users, is currently only 10%-15%. Previously, an academician of the Chinese Academy of Sciences publicly stated that many heavyweight intelligent computing centers are now mostly idle.

The shift from insufficient computing power to so-called 'excess' computing power shows that the core issues have not been resolved. Large projects with investments ranging from billions to over ten billion yuan are not only failing to deliver their intended value but have also become the costliest new challenge of this AI boom.

01

The surge in computing power supply

Many people may still be unfamiliar with the term 'intelligent computing center.' However, it is not a new term and can be understood as an advanced version of a 'computing center.'

At the dawn of computer science, computing devices were expensive and scarce, so many institutions and organizations established specialized computing centers to provide centralized high-performance computing resources and services. As computer technology has advanced and spread, from the early mainframe era to today's cloud and edge computing, the form and function of computing centers have evolved with it.

Compared with traditional computing centers, intelligent computing centers rely mainly on GPUs and similar AI accelerators rather than CPUs at the hardware level, because these chips are more efficient at the parallel processing that large-scale data set computations require. On the software side, intelligent computing centers deploy AI frameworks that can allocate computing tasks across different computing platforms to maximize efficiency.

From an application perspective, traditional computing centers primarily focus on data storage and internet service provision, whereas intelligent computing centers specifically provide computing power and data storage for AI applications.

Three years ago, against the backdrop of promoting digital transformation in industries, China's first intelligent computing center was established in Wuhan with a total investment of approximately 460 million yuan. Because such projects typically use prefabricated modular machine rooms and cabinet-based delivery, construction is fast: this one took only six months to complete.

The initial phase had a construction scale of 100P FLOPS of AI computing power, composed of thousands of Ascend AI processors, with a peak computing performance equivalent to 50,000 high-performance PCs. What does this mean? Taking astronomical exploration as an example, while ordinary computing power requires 169 days to find a specific star, an intelligent computing center can do it in less than 100 seconds.
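As a rough back-of-the-envelope check, the figures above can be turned into ratios. The sketch below takes the article's numbers at face value; the variable names are illustrative, and "less than 100 seconds" is treated as exactly 100 seconds, so the resulting speedup is a lower bound.

```python
# Figures quoted in the article; names are illustrative only.
center_flops = 100 * 10**15      # 100P FLOPS of AI computing power
equivalent_pcs = 50_000          # "equivalent to 50,000 high-performance PCs"

# Implied performance of one such PC, in TFLOPS.
per_pc_tflops = center_flops / equivalent_pcs / 10**12

# Star-search example: 169 days versus (at most) 100 seconds.
ordinary_seconds = 169 * 24 * 60 * 60
center_seconds = 100
speedup = ordinary_seconds / center_seconds

print(f"Implied per-PC performance: {per_pc_tflops:.1f} TFLOPS")
print(f"Ordinary run time: {ordinary_seconds:,} seconds")
print(f"Speedup in the star-search example: at least {speedup:,.0f}x")
```

The two comparisons imply different ratios, which suggests that the "ordinary computing power" baseline in the star-search example is not the same as the high-performance PC used in the first comparison.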

According to NewMo's statistics on intelligent computing centers built since 2021, from locally-led projects in Wuhan, Hefei, Nanjing, Beijing-Tianjin-Hebei, and other regions, to later investments and operations led by major companies such as SenseTime, Alibaba Cloud, Baidu Cloud, etc., the investment funds and computing power scale have shown a clear upward trend.

For example, the 'SenseTime AI Computing Center' that went into operation in 2022 invested 5.6 billion yuan in its first phase, with 5,000 cabinets and a peak training computing power of 3740P FLOPS. Later that same year, Alibaba Cloud's Zhangbei Intelligent Computing Center went online with a computing power scale of 12,000P FLOPS and a total investment of 18 billion yuan.

Over the past year, what has stood out is the sheer number of new intelligent computing centers.

In particular, they have sprung up in fourth- and fifth-tier cities such as Qingyang, Gansu; Suzhou, Anhui; and Zaozhuang, Shandong. Compared with large cities, these cities have abundant and inexpensive land, are eager for economic transformation, and hope to drive the development of surrounding industries. They therefore offer intelligent computing centers support such as tax incentives and financial subsidies, along with simplified approval processes that speed up construction.

At the same time, intelligent computing centers need to standardize computing power and deliver it as a service so that it can be accessed on demand, much like traditional public cloud services. In other words, they are moving from simply providing hardware resources to providing computing power services. As a result, many companies, including traditional ones, have crossed over into the computing power leasing market.

According to iFinD data, as many as 108 stocks are currently associated with the computing power leasing concept. For example, Hongbo Co., Ltd., whose main business is lottery printing, was the first A-share listed company to announce a crossover into computing power; Lianhua Health, whose main business is producing monosodium glutamate, has also procured a large number of NVIDIA GPUs for computing power leasing.

02

Computing power leasing, a new trend

In simple terms, computing power leasing means renting expensive computing equipment rather than buying it when a project needs a large amount of computing power. The service provider sets up the required computing environment or system according to the customer's needs and then rents out this computing capacity under a contract.

Customers pay rent; ownership of the equipment stays with the service provider, and customers do not need to purchase or maintain it themselves. There are typically four pricing models for computing power leasing: hourly pricing, pricing by scale of computing power, pricing by usage, and packaged pricing.

Specifically, small technology innovation companies often conduct scientific research projects or short-term data processing tasks with uncertain computing power usage times, making hourly pricing a flexible way to control costs. For example, SFCompute provides hourly pricing services, allowing users to rent H100 GPUs based on their needs at a low price.

Pricing based on the scale of computing power usually involves assessments of server performance, the number of GPUs, etc. The better the performance and efficiency, the higher the service rental fee. There are also pricing models based on data processing volume and network traffic usage. For some large enterprises or customers with special needs, providers can customize personalized packages based on specific customer requirements.
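The four pricing models described above can be sketched as simple cost functions. This is a minimal illustration only: every function name, parameter, and rate below is a hypothetical placeholder, not a real provider's price list or billing formula.

```python
def hourly_cost(gpu_count: int, hours: float, rate_per_gpu_hour: float) -> float:
    """Hourly pricing: pay only for the GPU-hours actually rented."""
    return gpu_count * hours * rate_per_gpu_hour


def scale_cost(gpu_count: int, months: float, base_rate_per_gpu_month: float,
               performance_factor: float = 1.0) -> float:
    """Scale-based pricing: fee grows with GPU count and a performance tier.

    performance_factor is an assumed multiplier standing in for
    "better performance and efficiency means a higher fee".
    """
    return gpu_count * months * base_rate_per_gpu_month * performance_factor


def usage_cost(data_tb: float, rate_per_data_tb: float,
               traffic_tb: float, rate_per_traffic_tb: float) -> float:
    """Usage-based pricing: charged on data processed and network traffic."""
    return data_tb * rate_per_data_tb + traffic_tb * rate_per_traffic_tb


def package_cost(months: int, monthly_fee: float) -> float:
    """Packaged pricing: a negotiated flat fee per month."""
    return months * monthly_fee


# Hypothetical example values (currency units are arbitrary).
short_job = hourly_cost(gpu_count=8, hours=72, rate_per_gpu_hour=15.0)
tiered = scale_cost(gpu_count=8, months=1, base_rate_per_gpu_month=20_000.0,
                    performance_factor=1.5)
metered = usage_cost(data_tb=50, rate_per_data_tb=100.0,
                     traffic_tb=10, rate_per_traffic_tb=50.0)
bundle = package_cost(months=12, monthly_fee=150_000.0)

for label, cost in [("hourly", short_job), ("scale", tiered),
                    ("usage", metered), ("package", bundle)]:
    print(f"{label:>8}: {cost:,.2f}")
```

With these made-up rates, a three-day job on eight GPUs costs far less under hourly pricing than a full month under scale-based pricing, which is the flexibility argument the article makes for small companies with uncertain usage times.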

Why rent?

As we all know, the demand for large model training, fine-tuning, and inference has surged in recent years. However, the key is that AI applications have varying computing power requirements at different development stages. During the development cycle, substantial training computing power is required; once the model is developed and launched, training computing power is no longer needed, but inference computing power is.

Current demand for computing power leasing is driven mostly by model training. As a reference for scale, the training of GPT-4 reportedly used tens of thousands of A100 cards. Taking into account server procurement and rental costs, the elastic nature of computing power demand, debugging and maintenance costs, and engineers' development time, leasing has become the best choice for most large model vendors.

However, the current computing power leasing market has many participants and a fragmented competitive landscape. Traditional cloud service providers have relatively abundant high-end computing power but lease out only a small proportion of it. Meanwhile, the number of enterprises with IDC construction and operation capabilities, such as Inspur Information and Sugon, and of companies crossing over from other industries is growing.

A further point is that China's investment in computing power has steadily narrowed the gap with other countries in recent years. In terms of scale alone, it has reached the world's top tier, and the stock of intelligent computing centers in particular is far from undersupplied.

So, why are there still so many companies competing to do leasing business in this situation?

In fact, regarding supply and demand, although China's computing power scale has increased, there has always been a gap in computing power for AI and high-performance computing. According to CCID Consulting data, in 2023, China's demand for intelligent computing power reached 123.6 EFLOPS, but the supply was only 57.9 EFLOPS, less than half.
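The size of this gap follows directly from the two CCID Consulting figures quoted above. A short calculation, using only those two numbers (the variable names are illustrative):

```python
# CCID Consulting figures for 2023, as quoted in the article (EFLOPS).
demand_eflops = 123.6
supply_eflops = 57.9

gap_eflops = demand_eflops - supply_eflops   # unmet demand
coverage = supply_eflops / demand_eflops     # share of demand that supply covers

print(f"Unmet demand: {gap_eflops:.1f} EFLOPS")
print(f"Supply covers {coverage:.1%} of demand")
```

On these figures, supply covers a little under 47% of demand, which is consistent with the article's "less than half."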

On the other hand, large cloud computing companies have not fully met this demand. According to industry insiders, the main reason is that large model training requires high-performance GPU clusters, while large companies mostly provide single-node services that lack ultra-high-bandwidth interconnects between nodes and therefore cannot meet the demand.

Furthermore, building new clusters is expensive, existing infrastructure cannot be reused, and large companies focus more on single-node availability and reliability, so they have no particular advantage in the high-performance cluster business. This is what leaves room for other computing power leasing companies to develop.

According to a research report by Dongwu Securities, computing power leasing companies have a gross profit margin of approximately 40% and a net profit margin of approximately 20%. The biggest barrier to entry is capital, which is needed for hardware procurement, venue leasing, and building an operation and maintenance team. The technical barrier, however, is not high: enterprises can quickly build and operate large-scale computing power centers by cooperating with technology suppliers and bringing in professional talent.
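To make the margin figures concrete, the sketch below applies them to a hypothetical revenue figure. Only the two margin percentages come from the report cited above; the revenue of 100 million yuan is an assumption chosen purely for easy arithmetic.

```python
# Margins quoted from the research report cited in the article.
GROSS_MARGIN = 0.40
NET_MARGIN = 0.20

# Hypothetical annual revenue in million yuan (an assumption for illustration).
revenue = 100.0

gross_profit = revenue * GROSS_MARGIN
net_profit = revenue * NET_MARGIN
cost_of_revenue = revenue - gross_profit        # e.g. depreciation, power, venue
other_expenses = gross_profit - net_profit      # e.g. staff, interest, tax

print(f"Revenue:         {revenue:8.1f} million yuan")
print(f"Cost of revenue: {cost_of_revenue:8.1f} million yuan")
print(f"Gross profit:    {gross_profit:8.1f} million yuan")
print(f"Other expenses:  {other_expenses:8.1f} million yuan")
print(f"Net profit:      {net_profit:8.1f} million yuan")
```

The cost categories named in the comments are examples only; the report as quoted does not break them down.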

For enterprises under pressure from their main business and eager to find new profit growth points, this new market segment is highly attractive.

03

The logic behind intelligent computing centers

In 2023, Lianhua Holdings established Lianhua Zixing to carry out its computing power leasing business. The announcement put the total estimated investment in the computing power project at approximately 290 million yuan. In the first eight months of this year, Lianhua Zixing's revenue exceeded 35.15 million yuan, but it posted a net loss of 3.9813 million yuan. It remains loss-making overall, mainly because of heavy equipment depreciation and interest expenses and relatively high personnel costs, and because it has yet to achieve economies of scale.

Failing to make money in the short term does not mean this is a bad business. A review of the financial data of A-share companies engaged in computing power leasing shows that most of them are indeed growing revenue without growing profits, with only a handful achieving profitability. However, the most direct benefit for these 'computing power' concept enterprises may be a significant short-term rise in their stock prices.

The uncertainties in computing power leasing include market demand, policy changes, technological level, delivery and supply chains, and domestic substitution. However, for enterprises currently in this market segment, the more critical issue is truly understanding how to run the business.

Running an intelligent computing center does not simply mean buying a batch of GPUs and then earning money through rental and sales. Hardware deployment alone involves high-performance AI chips, heterogeneous architecture design, high-speed low-latency networks, storage systems, security configuration, monitoring and management, liquid cooling, and other complex elements.
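To put the Lianhua Zixing figures in proportion, the sketch below computes its net margin and its eight-month revenue relative to the planned investment. It uses only the three numbers quoted above; the variable names are illustrative, and the investment is an announced estimate, so how much of it had actually been spent by then is not known from the article.

```python
# Figures quoted in the article, in million yuan.
estimated_investment = 290.0   # total estimated project investment
revenue_8m = 35.15             # revenue, first eight months of the year
net_profit_8m = -3.9813        # net profit, same period (a loss)

net_margin = net_profit_8m / revenue_8m
revenue_to_investment = revenue_8m / estimated_investment

print(f"Net margin: {net_margin:.1%}")
print(f"Eight-month revenue as a share of planned investment: "
      f"{revenue_to_investment:.1%}")
```

A net margin of roughly minus 11% is well short of the approximately 20% net margin cited earlier for the sector, which fits the article's point that the business has not yet reached economies of scale.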

In addition, the main leasing targets for training computing power are large model companies. So, how do leasing companies, especially those crossing over from other industries, acquire these customer resources and obtain stable customer support? These issues are more difficult to resolve in practical operations. So far, several listed companies have abandoned their computing power leasing business.

Returning to the topic at the beginning, what is the reason for the low ROI of intelligent computing centers?

Besides the mismatch between supply and demand and regional differences in economic development and industrial structure (the eastern regions, for example, have strong demand but relatively insufficient supply of computing power), the first reason is that some companies blindly invested in building intelligent computing centers without adequate market research, and theirs were the first bubbles to burst.

Secondly, considering the current predicament of the computing power leasing market, the efficient operation of intelligent computing centers not only relies on high-performance hardware but also requires software-level optimization and coordination. Intelligent computing is not just about cards but a system that coordinates hardware and software. When software capabilities are insufficient, the performance of the cards themselves will also be limited, resulting in inefficient card usage.

The sluggish performance of intelligent computing centers has directly contributed to the downturn in the computing power leasing market. Conversely, when computing power cannot be used effectively, some enterprises leave their resources idle for lack of application scenarios, which leads to waste, and the cycle repeats.

It is worth noting that, because of the technological gap between China and other countries and the chip shortage, domestic substitution is being mentioned more and more often. However, the hardest problem to solve in this process is the application ecosystem.

For example, a domestic chip manufacturer may adopt a closed technology model and maximize commercial returns through high-priced equipment sales and ancillary operating services. This lets it concentrate its efforts and achieve end-to-end control.

However, a closed model also means that very little open-source or commercial software is available, and the cost of migrating and adapting users' own software is extremely high. Some users' software cannot be adapted at all, and the intelligent computing centers built on this basis can only sit idle.

This article is an original work by NewMo.
