01/23 2025
To grow wealthy, build roads first.
For AI big models to continuously iterate and upgrade, the establishment of underlying computing power infrastructure is indispensable. Since the meteoric rise of ChatGPT in 2022, the computing power market has also seen explosive growth.
On one hand, China's tech giants, aiming to secure tickets for the future AGI era, are engaged in a "computing power arms race," frantically stockpiling graphics card resources while also constructing computing power clusters ranging from thousands to hundreds of thousands of cards.
According to an Omdia report, ByteDance ordered around 230,000 NVIDIA chips in 2024, becoming NVIDIA's second-largest customer.
Reports indicate that ByteDance's capital expenditure in 2025 will reach 160 billion yuan, with 90 billion yuan allocated for purchasing AI computing power. Tech giants of similar scale, including Alibaba, Baidu, and China Telecom, are also advancing the construction of computing power clusters at the level of hundreds of thousands of cards.
This frenetic infrastructure build-out by the tech giants is undoubtedly pushing China's AI computing power market to new heights.
However, alongside the giants' aggressive expansion, a substantial amount of computing power in China's market sits idle, and some voices even suggest that "China's overall computing power supply exceeds demand."
"The computing power market was very hot in 2023, and those using relatively low-performance A100s made money, but the market cooled down significantly in 2024, with many cards remaining unopened. However, due to various factors, the 4090, aimed at the gaming and consumer markets, is still in higher demand," Wang Wei, CTO of ZStack at Cloudaxis Technology, told Guangzhui Intelligence.
In the past two years, the computing power business has been the first track to strike gold in the wave of big models. Besides NVIDIA, countless cloud vendors, PaaS-layer computing power optimization service providers, and even chip brokers have stepped up their efforts. This surge in demand for computing power is primarily driven by the rapid development of AI big models.
The demand for AI is akin to a water pump, activating the previously stable computing power market and reigniting turbulent waves.
But now, this source of momentum has shifted. The development of AI big models is gradually transitioning from pre-training to inference applications, and more and more players are choosing to abandon the pre-training of super-large models. Recently, for example, Kai-Fu Lee, founder and CEO of 01.AI, publicly stated that 01.AI will not stop pre-training altogether but will no longer pursue super-large models.
In Kai-Fu Lee's view, pursuing AGI by continuously training ever-larger models also means investing ever more GPUs and resources. "It's still my earlier judgment: when your pre-training results are already inferior to open-source models, no company should be obsessed with pre-training."
Because of this, 01.AI, one of the "six tigers" of Chinese big model startups, began to shift its strategy, focusing on the AI big model inference application market going forward.
At this stage, where both demand and supply are rapidly evolving, the balance of the market is constantly shifting.
In 2024, a structural imbalance in supply and demand emerged in the computing power market. Whether future computing power infrastructure should continue, where computing power resources should be sold, and how new players should compete with giants have become key propositions.
A hidden world surrounding the smart computing power market is gradually unfolding.
Mismatch between Supply and Demand: Low-Quality Expansion Meets High-Quality Demand
In 1997, the still-young Liu Miao joined IBM, which was then in the ascendancy, and this brought him into the computing industry.
In the mid-20th century, the mainframes of IBM, the company known as the "Blue Giant," nearly monopolized the global enterprise computing market.
"At that time, just a few large mainframes from IBM could support the operation of a bank's core business systems nationwide, which made me see the value of computing in accelerating business systems," Liu Miao told Guangzhui Intelligence.
It was this experience at IBM that laid the groundwork for Liu Miao's later dedication to a new generation of smart computing.
After the mainframe era built around CPUs and then the cloud computing era, computing has now entered the era of smart computing dominated by GPUs, and the entire computing paradigm has undergone a fundamental transformation. Under the old architecture, large volumes of data must be routed through CPUs before reaching GPUs, wasting the GPUs' massive compute and bandwidth. Moreover, GPU training and inference scenarios place higher demands on high-speed interconnects, online storage, and privacy and security.
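To make the data-path point concrete, here is a minimal PyTorch sketch (my illustration, assuming a machine with one CUDA GPU; the sizes are arbitrary) comparing host-to-GPU copies from ordinary pageable memory with copies from pinned (page-locked) memory, which avoid an extra CPU-mediated staging step:

```python
import time
import torch

assert torch.cuda.is_available(), "requires a CUDA GPU"

n = 64 * 1024 * 1024  # 64M float32 values, roughly 256 MB

pageable = torch.randn(n)             # ordinary (pageable) host memory
pinned = torch.randn(n).pin_memory()  # page-locked host memory, DMA-friendly

def h2d_seconds(src: torch.Tensor) -> float:
    """Time one host-to-device copy, synchronizing around the transfer."""
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    src.to("cuda", non_blocking=True)
    torch.cuda.synchronize()
    return time.perf_counter() - t0

print(f"pageable -> GPU: {h2d_seconds(pageable):.4f} s")
print(f"pinned   -> GPU: {h2d_seconds(pinned):.4f} s")  # typically several times faster
```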
This has spurred the development of China's smart computing power industry chain, especially the infrastructure construction centered on smart computing centers.
At the end of 2022, the release of ChatGPT officially ushered in the era of AI big models, and China also entered the stage of the "Hundred Models War."
At that time, everyone hoped to provide computing power for the pre-training of big models, but there was also uncertainty within the industry about where the final demand for computing power would come from and who would use it. "At this stage, everyone would prioritize buying cards and hoarding resources," said Hong Rui, co-founder and dean of the research institute at Turing New Intelligence. This was the era of smart computing 1.0.
As big models' training parameters continued to grow, it became clear that the real consumers of computing power were concentrated among the players doing pre-training.
"In the early stages of this round of AI industry explosion, the hope was to continuously expand computing power consumption in basic model pre-training and explore the path to AGI (Artificial General Intelligence)," said Hong Rui.
Public data shows that ChatGPT's model has 175 billion training parameters and 45TB of training data, and generates 4.5 billion words of content daily. Supporting it requires at least tens of thousands of NVIDIA A100 GPUs, with a single training run costing more than $12 million.
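For a rough sense of what such training runs entail, the widely used approximation C ≈ 6ND (training FLOPs ≈ 6 × parameters × tokens) can be turned into a back-of-envelope estimate. The sketch below combines publicly reported GPT-3 figures with utilization and fleet-size numbers I am assuming for illustration; it is an order-of-magnitude check, not the article's accounting:

```python
# Back-of-envelope training compute via the standard C ~ 6 * N * D approximation.
N = 175e9      # parameters (GPT-3 scale, as cited above)
D = 300e9      # training tokens (publicly reported for GPT-3)
C = 6 * N * D  # total training FLOPs, about 3.2e23

peak_flops = 312e12  # NVIDIA A100 dense BF16 peak, FLOPs per second
utilization = 0.35   # assumed sustained fraction of peak
n_gpus = 10_000      # assumed fleet size

seconds = C / (peak_flops * utilization * n_gpus)
print(f"{C:.2e} FLOPs -> about {seconds / 86_400:.1f} days on {n_gpus:,} A100s")
```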
In addition, 2024's multimodal big models were a clash of titans, with the training of video, image, voice, and other data placing even higher demands on computing power.
Public data shows that the computing power demand for training and inference of OpenAI's Sora video generation big model reached 4.5 times and nearly 400 times that of GPT-4, respectively. A report from China Galaxy Securities Research Institute also shows that Sora's demand for computing power is growing exponentially.
Therefore, starting in 2023, alongside the hoarding of graphics cards by various players, China's computing power market witnessed explosive construction to meet growing demand, above all in smart computing centers.
Bai Runxuan, a senior analyst at CCID Consulting's Artificial Intelligence and Big Data Research Center, previously stated: "Starting from 2023, local governments have increased their investment in smart computing centers, promoting the development of infrastructure."
Under the dual influence of the market and policy, China's smart computing centers have sprung up rapidly in just one or two years.
These include both government-led construction projects and smart computing centers invested and constructed by enterprises such as Alibaba Cloud, Baidu Intelligent Cloud, and SenseTime. There are also some cross-border enterprises that have seen opportunities and entered this track.
At the same time, startups like Turing New Intelligence, Qujing Technology, and SiliconFlow have entered the computing power industry.
Relevant data shows that as of the first half of 2024, more than 250 smart computing centers in China were under construction or completed, and the first half of 2024 saw 791 smart computing center bidding events, a year-on-year increase of 407.1%.
However, the construction of smart computing centers is not as simple as building bridges and roads. First, it requires a high level of technology and professionalism. Second, there is often a mismatch between construction and demand. Third, there is insufficient continuous planning.
In Liu Miao's view, smart computing centers are actually a unique product of China, which to some extent undertakes part of the social mission of supporting local industrial development. However, the problem with not being purely market-driven is that after a construction cycle of 12-24 months, "they become idle because they can no longer meet the industry's demand for computing power in two years."
Currently, China's computing power market resources are indeed idle in some regions. "The root cause of the current problems in China's computing power market lies in its overly extensive nature," said Liu Miao.
However, the market cannot be described simply as oversupply or shortage. It is, in fact, a mismatch between the supply and demand of computing power: the supply of high-quality computing power falls far short of demand, while the low-quality supply cannot find much of a market. After all, players in big model pre-training often require computing power resource pools of over ten thousand cards.
However, some of the early smart computing centers in China's market "may have only a few dozen to a couple of hundred cards, which is far from enough for the pre-training of current base models, even though the equipment selected matches pre-training requirements," said Hong Rui. From the perspective of pre-training, computing power is indeed scarce; yet because of insufficient scale, this unusable computing power sits idle.
Differentiation in the Big Model Track, and a Subtle Shift in Computing Power Demand
The development and changes in the big model market are too fast.
Originally, in the pre-training stage of big models, players in the industry hoped to improve the effectiveness of big models through continuous training. If this generation did not work, they would spend more computing power and funds to train the next generation of big models.
"The development logic of the big model track was like this before, but around June 2024, the industry could clearly perceive that the pre-training of big models had reached a critical point of input and output, and investing huge resources in pre-training may not achieve the expected returns," said Hong Rui.
A very important reason behind this is the evolution of OpenAI's technology. "GPT-3.5's capability was very impressive, and GPT-4 improved on it, but from mid-2023 through 2024, the base models' overall capability gains did not match what we saw in 2023, with most improvements coming on the CoT (chain-of-thought) and Agent side," said Wang Wei.
While the upgrade of base model capabilities slows down, the cost of pre-training is also very high.
As Kai-Fu Lee, founder and CEO of 01.AI, has previously stated, a single pre-training run costs about 3-4 million US dollars. For most small and medium-sized enterprises, this is undoubtedly a high-cost investment. "The survival strategy for startups is to consider how to make good use of every dollar, rather than burning more GPUs," said Kai-Fu Lee.
Therefore, as the parameters of big models continue to increase, more and more enterprises cannot afford the cost of big model training and can only apply or fine-tune based on already trained models. "It can even be said that when the parameters of big models reach a certain level, most enterprises do not even have the ability to fine-tune," said Hong Rui.
According to relevant data statistics, in the second half of 2024, nearly 50% of the registered big models shifted to AI applications.
The shift of big models from pre-training to inference applications undoubtedly also differentiates demand in the computing power market. Hong Rui believes: "Computing centers and computing power for big model pre-training, and computing power for inference applications, are actually two different tracks."
From the perspective of big model pre-training, the required computing power is directly proportional to the number of model parameters and the amount of training data. The common rule of thumb for cluster scale: roughly 100 cards for a 1-billion-parameter model, 1,000 cards at 10 billion parameters, and 10,000 cards at 100 billion parameters.
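It is worth noting that card counts like these are driven by throughput (finishing training in an acceptable time), not by memory alone. A quick sketch, using per-parameter byte counts I am assuming from standard mixed-precision Adam setups, shows the memory floor is far below the quoted cluster sizes:

```python
import math

# Approximate memory per parameter in mixed-precision Adam training:
# fp16 weights (2 B) + fp16 grads (2 B) + fp32 master weights (4 B)
# + Adam moments (8 B) = 16 bytes, with activations excluded.
BYTES_PER_PARAM = 16
GPU_MEM_BYTES = 80e9  # one 80 GB card, e.g. A100/H100 class

def min_cards_for_states(n_params: float) -> int:
    """Lower bound on cards needed just to hold model and optimizer states."""
    return math.ceil(n_params * BYTES_PER_PARAM / GPU_MEM_BYTES)

for n in (1e9, 10e9, 100e9):
    print(f"{n / 1e9:>5.0f}B params -> at least {min_cards_for_states(n)} cards")
# A 100B-parameter model needs only ~20 cards' worth of memory; the
# 10,000-card figure comes from wanting training to take weeks, not years.
```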
In addition, an important feature of big model pre-training is that it cannot tolerate interruption. Once interrupted, training must roll back to the last checkpoint, and all progress since that checkpoint is lost.
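A minimal PyTorch checkpointing sketch (the model, optimizer, and training loop are hypothetical placeholders) shows the mechanism: progress is durable only up to the last saved step, so any work done since then is lost when a failure strikes:

```python
import torch

def save_checkpoint(model, optimizer, step, path="ckpt.pt"):
    """Persist everything needed to resume training at `step`."""
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "step": step}, path)

def load_checkpoint(model, optimizer, path="ckpt.pt"):
    """Restore states and return the step to resume from."""
    state = torch.load(path)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"]

# In the training loop, checkpoint every K steps; a crash at step 11,999
# with K = 1,000 throws away 999 steps of work:
# for step in range(start_step, total_steps):
#     train_step(...)                       # hypothetical
#     if step % 1_000 == 0:
#         save_checkpoint(model, optimizer, step)
```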
"From last year to this year, China has introduced a large number of smart computing devices, but the average failure rate is around 10%-20%. Such a high failure rate results in big model training being interrupted every three hours," said Liu Miao. "For a 1,000-card cluster, it basically gets interrupted once every 20 days."
At the same time, to support artificial intelligence moving towards the Agent era and even future general artificial intelligence, it is necessary to continuously expand computing power clusters, from thousands of cards to tens of thousands or even hundreds of thousands of cards. "Musk is a great person. He planned a 100,000-card cluster in Memphis, with the first 19,000 cards being installed and lit up in just 19 days. Its complexity is far higher than existing projects," said Liu Miao.
Currently, in order to meet the training needs of big models with higher parameters, China is actively investing in the construction of 10,000-card computing power pools. However, "everyone will find that the customers of computing power suppliers are actually concentrated in a few leading enterprises, and these enterprises will be required to sign long-term computing power leasing agreements, regardless of whether they really need this computing power," said Liu Jingqian, chief expert of China Telecom's big model and head of the big model team.
However, Hong Rui believes: "In the future, no more than 50 players globally will possess the capability to do pre-training. And when smart computing clusters reach tens of thousands or hundreds of thousands of cards, fewer and fewer players will be able to handle cluster operations and maintenance, fault troubleshooting, and performance tuning."
Currently, a significant number of small and medium-sized enterprises have shifted their focus from pre-training large models to AI inference applications, and Liu Jingqian notes that "a large number of AI inference applications are often short-lived and cyclical." However, when deployed in actual terminal scenarios, these applications require a substantial number of servers for parallel network computing, causing a sudden surge in inference costs.
Ai Zhiyuan, CEO of Qujing Technology, explained to Guangzhui Intelligence: "The reason is relatively high latency. For a large model to answer a question, it must undergo deep reasoning, during which the model is continuously computing. This means the machine's computing resources are exclusively occupied for tens of seconds. Scaled out to hundreds of servers, the inference cost becomes hard to justify."
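Ai Zhiyuan's point can be put into a toy cost model (all numbers below are my assumptions, for illustration only): when a request exclusively occupies a GPU for tens of seconds, per-request cost scales with occupancy time, and batching is the main lever for amortizing it:

```python
gpu_price_per_hour = 2.0  # assumed GPU rental price, USD
seconds_per_request = 30  # a deep-reasoning request occupying the GPU
batch_size = 1            # requests served concurrently on that GPU

cost = gpu_price_per_hour / 3_600 * seconds_per_request / batch_size
print(f"${cost:.4f} per request")  # about $0.017 at batch size 1

# At batch_size = 16 the same GPU-seconds are shared across requests,
# cutting per-request cost 16x at the price of some added latency:
# the core trade-off in inference serving.
```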
Therefore, compared to AI (large model) training scenarios that necessitate extensive computing power, AI inference has less stringent requirements on computing power performance, primarily focusing on low power consumption and real-time processing. Yue Kun, vice president of Huawei and president of the ISP and Internet System Department, states, "Training is concentrated in power hubs, while inference needs to be close to users." The latency of inference computing power should be within the range of 5-10 milliseconds, requiring a high redundancy design to achieve the construction of "two locations and three centers."
Taking China Telecom as an example, it has established 10,000-card resource pools in Beijing, Shanghai, Guangzhou, Ningxia, and other locations. To support the development of industry models, it has also established 1,000-card resource pools in seven places such as Zhejiang and Jiangsu. Simultaneously, to ensure that the low latency of AI inference applications remains within the 10-millisecond range, China Telecom is building edge inference computing power in multiple regions, gradually forming a national "2+3+7" computing power layout.
2024 is hailed as the first year of AI application landing, but in reality, the AI inference application market has not experienced the expected explosion. Hong Rui attributes this to "the current lack of an application in the industry that can be rolled out on a large scale in enterprises. After all, large models still have technical deficiencies, with insufficiently strong base models and issues such as hallucinations and randomness."
Due to the general absence of AI application explosions, the growth of inference computing power has also stagnated. However, many practitioners remain optimistic, believing that smart computing power will continue to be in "long-term shortage," and as AI applications gradually penetrate, the demand for inference computing power will definitely increase.
A representative from a chip enterprise told Guangzhui Intelligence that AI inference constantly strives for the optimal solution. Agents consume more tokens than ordinary LLMs (large language models) because they are continuously observing, planning, and executing. "o1 is an attempt inside the model, while the Agent is an attempt outside the model."
Therefore, Liu Jingqian predicts, "There will likely be a significant surge in demand for AI inference computing power next year." He adds, "We have also established a plethora of lightweight smart computing cluster solutions and comprehensive edge inference solutions."
Wang Wei also comments, "If the card volume in the computing power pool is not substantial, it is challenging to rent out cluster computing power for pre-training. The inference market does not require a large volume of training cards, and the entire market is still growing steadily, with demand from small and medium-sized internet enterprises continuously increasing."
However, at this stage, training computing power still dominates. According to the "2023-2024 China AI Computing Power Development Assessment Report" jointly released by IDC and Inspur Information, in 2023, the ratio of training to inference in domestic AI server workloads was approximately 6:4.
During NVIDIA's Q2 2024 earnings call in August 2024, management stated that inference computing power accounted for approximately 40% of NVIDIA's data center revenue over the past four quarters. In the future, revenue from inference computing power is expected to continue growing. On December 25, NVIDIA announced the launch of two GPUs, GB300 and B300, designed to meet the performance requirements of large inference models.
Undoubtedly, the transition of large models from pre-training to inference applications has split demand in the computing power market. Across the market, intelligent computing centers are still in the early stages of development, with infrastructure construction incomplete, so large pre-training players and enterprises tend to hoard graphics cards themselves. On the AI inference application track, when intelligent computing centers offer equipment leasing, most small and medium-sized customers prefer bare rental (hardware-only leasing) and prioritize cost-effectiveness.
In the future, as the penetration rate of AI applications continues to rise, the consumption of inference computing power will also increase. According to IDC's forecast, the proportion of inference computing power in the overall intelligent computing power market is projected to exceed 70% by 2027.
Improving computational efficiency to reduce the cost of inference deployment has become crucial for the development of the AI inference application computing power market.
Not Blindly Stacking Cards: How to Improve Computing Power Utilization?
Overall, since the official launch of the "East Data, West Compute" initiative in 2021, the Chinese market has not lacked underlying computing power resources. Even as large model technology advances and computing power demand grows, the infrastructure build-out boom in the computing power market is expected to continue for another year or two.
However, these underlying computing power resources share a common characteristic: they are dispersed and small-scale. Liu Jingqian notes, "Each location may have only around 100 or 200 cards, which is far from meeting the computing power requirements of large models."
Moreover, what is more critical is that the current computational efficiency of these resources is not high.
Reports indicate that even for OpenAI, GPU utilization during GPT-4 training was only 32%-36%, and the effective utilization rate for large model training overall is below 50%. Wu Hequan, an academician of the Chinese Academy of Engineering, has acknowledged: "The utilization rate of computing power in China is only 30%."
The reason for this is that during the training cycle of large models, GPU cards cannot achieve high resource utilization at all times. During stages with relatively minor training tasks, resources may remain idle. In the model deployment stage, due to business fluctuations and inaccurate demand forecasts, many servers often remain in standby or low-load states.
Hong Rui explains, "The overall development of CPU servers in the cloud computing era is already very mature, and the availability requirement for general-purpose cloud services is 99.5% to 99.9%, but it is very difficult to achieve this for large-scale GPU clusters."
Behind this lies the inadequate development of GPU hardware and the entire software ecosystem. Software-defined hardware is gradually becoming the key to the development of the intelligent computing power era.
Therefore, in the realm of intelligent computing power, various players have entered the market with their core advantages, racing to expand their presence by focusing on the construction of intelligent computing power infrastructure, integrating idle social computing power resources, and improving computing efficiency through software algorithms.
These players can be broadly categorized into three groups:
The first category includes large state-owned enterprises and central enterprises, such as China Telecom, which can better fulfill the computing power needs of state-owned and central enterprises due to their status.
On one hand, China Telecom has established computing power resource pools with thousands, tens of thousands, and hundreds of thousands of cards. On the other hand, through the XiRang Intelligent Computing Integration Platform, China Telecom is actively integrating idle social computing power resources, enabling unified management and scheduling across service providers, regions, and architectures, thus improving the overall utilization rate of computing power resources.
Liu Jingqian explains, "Our initial step was to build an intelligent computing scheduling platform for state-owned and central enterprises. By integrating over 400 different idle computing power resources from society into the same platform and then connecting the computing power needs of state-owned and central enterprises, we can address the imbalance between supply and demand of computing power."
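At its core, such a platform solves a matching problem: placing each job's card requirement onto scattered idle pools. The deliberately simplified sketch below (hypothetical data and policy, not XiRang's actual design) shows one greedy best-fit version:

```python
from dataclasses import dataclass

@dataclass
class Pool:
    name: str
    region: str
    chip: str        # e.g. "A100" or a domestic GPU type
    idle_cards: int

def place(job_cards: int, chip: str, pools: list[Pool]) -> Pool | None:
    """Greedy best-fit: pick the smallest compatible pool that still fits,
    keeping large contiguous pools free for big training jobs."""
    candidates = [p for p in pools if p.chip == chip and p.idle_cards >= job_cards]
    if not candidates:
        return None  # demand unmet: the supply-demand mismatch, in miniature
    best = min(candidates, key=lambda p: p.idle_cards)
    best.idle_cards -= job_cards
    return best

pools = [Pool("east-1", "Zhejiang", "A100", 120),
         Pool("west-1", "Ningxia", "A100", 64)]
print(place(48, "A100", pools))  # placed on west-1, the tighter fit
```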
The second category comprises cloud vendors dominated by internet companies, including Alibaba Cloud, Baidu Intelligent Cloud, Volcano Engine, etc. These cloud vendors are actively transforming their underlying infrastructure architecture from CPU clouds to GPU clouds, forming a full-stack technical capability centered on GPU clouds.
Tan Dai, president of Volcano Engine, previously stated, "In the next decade, the computing paradigm will shift from cloud-native to a new era of AI cloud-native." AI cloud-native will reoptimize computing, storage, and network architectures with GPUs as the core, allowing GPUs to directly access storage and databases, significantly reducing IO latency.
From the perspective of underlying infrastructure, the construction of intelligent computing centers often does not rely solely on GPU graphics cards of a single brand. Instead, it may involve a combination of NVIDIA and domestic GPU graphics cards or even heterogeneous computing power scenarios where multiple different types of computing units such as CPUs, GPUs, FPGAs (field-programmable gate arrays), and ASICs (application-specific integrated circuits) collaborate to meet computing needs in diverse scenarios and maximize computing efficiency.
Therefore, cloud vendors have also emphasized upgrading their capabilities for "multi-chip mixed training." For instance, in September 2024, Baidu Intelligent Cloud upgraded its Baige AI Heterogeneous Computing Platform to version 4.0, achieving 95% multi-chip mixed-training efficiency on clusters of ten thousand cards.
Above the underlying infrastructure, beyond GPU performance itself, deploying large model training and inference depends closely on networking, storage, and database products and their software toolchains; improving processing speed often requires several products accelerating in concert.
Besides large cloud vendors, there is also a group of small and medium-sized cloud vendors that have entered the computing power industry from their differentiated perspectives, such as CloudAxis Technology, which focuses on the scheduling and management of computing power resources based on platform capabilities.
Wang Wei admits, "Initially, GPUs were merely accessories in the business system architecture but gradually became a separate category."
In August 2024, CloudAxis Technology released its new-generation AI Infra product, the ZStack AIOS platform ZStack AI Tower. The platform is tailored for enterprise-grade AI applications, helping enterprise customers deploy new large model applications from three angles: "computing power scheduling, AI large model training and inference, and AI application service development."
Wang Wei explains, "We will use the platform to calculate the specific usage of computing power, perform operation and maintenance, and also segment computing power for customers to enhance computing power utilization in scenarios with limited GPU graphics cards."
Furthermore, in operator scenarios, where numerous computing power resource pools exist, Wang Wei states, "We will also collaborate with customers to assist them with the operation, computing, and unified operation management of these resource pools."
The third category includes startups that focus on improving computing power efficiency through algorithms, such as Turing New Intelligence, Qujing Technology, and SiliconFlow. These new players are weaker in overall strength than the large cloud vendors but have gradually gained a foothold in the industry by breaking through on single-point technologies.
Liu Miao states, "Initially, we were an intelligent computing cluster manufacturing and service provider. At the current stage, we have transformed into a computing power operation service provider. In the future, we aspire to become an intelligent data and application service provider. These three roles are constantly evolving, so our positioning is as a new-generation computing power operation service provider."
Turing New Intelligence hopes to build an independent platform that integrates idle computing power resources, enabling computing power scheduling, leasing, and services. Liu Miao explains, "We are constructing a resource platform to connect idle computing power to the platform, similar to the early Taobao platform." The idle computing power comes primarily from intelligent computing centers in various regions.
In contrast, companies like Qujing Technology and SiliconFlow focus more on the AI inference application market, concentrating on using algorithmic capabilities to improve computing efficiency and reduce the cost of large model inference, though their solutions enter from different angles.
To tackle the "impossible triangle" of large model effectiveness, efficiency, and cost, Qujing Technology has introduced two technical approaches tailored to AI inference applications: full-system heterogeneous collaborative inference and RAG (retrieval-augmented generation) scenario optimization. Both leverage the idea of "trading storage for computing," using storage capacity to offload computation. The company says this approach has cut inference costs tenfold and response latency twentyfold.
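The general shape of "trading storage for computing" is simple caching: spend storage once so repeat work is skipped. The minimal sketch below is my illustration of the concept, not Qujing Technology's system; `expensive_fn` stands in for any costly per-document computation, such as embedding or prefilling attention state for RAG:

```python
import hashlib

_cache: dict[str, object] = {}  # in practice this would be a persistent store

def doc_key(text: str) -> str:
    """Stable key for a document's content."""
    return hashlib.sha256(text.encode()).hexdigest()

def get_or_compute(text: str, expensive_fn):
    """Compute once per unique document, then serve from storage."""
    key = doc_key(text)
    if key not in _cache:
        _cache[key] = expensive_fn(text)  # pay the compute cost once...
    return _cache[key]                    # ...answer from storage thereafter
```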
Looking ahead, in addition to continuously refining the middle AI infrastructure layer that bridges underlying computing resources with upper-layer applications, Ai Zhiyuan, founder and CEO of Qujing Technology, expressed, "Our aspiration is to build a framework where everyone can develop applications, thereby leveraging our framework to further minimize costs."
It is evident that Qujing Technology aims to be more than just an algorithm optimization solution provider; it aspires to become a leading AI large model application service provider.
Furthermore, current industry solutions for optimizing large model computing power often prioritize boosting GPU utilization. However, Ai Zhiyuan noted that GPU utilization rates have already surpassed 50%, making further significant improvements exceedingly challenging.
"While there is still room for improvement in GPU utilization, it is indeed a daunting task. It encompasses technologies such as chip design, video memory, inter-card connectivity, multi-machine communication, and software scheduling. This is a complex issue that necessitates the collaborative efforts of the entire industry chain, rather than being solvable by a single company or technology," Hong Rui also emphasized to Guangzhui Intelligence.
Hong Rui believes that the industry currently lacks the expertise to effectively operate and maintain ultra-large-scale intelligent computing clusters. Simultaneously, the software layer remains immature. "Even with ample computing power, inadequate software optimization, reasoning engines, or load balancing can significantly impair performance."
Observing these major players, whether they be operators like China Telecom, cloud vendors, or newcomers, their strategies for entering the computing power market vary, yet they all share the common goal of securing a portion of the global computing power market.
In fact, at this stage, compared to large model services, computing power leasing presents a more tangible business opportunity.
The leasing of computing power has become increasingly homogeneous, with a premium placed on refined and professional operational services.
Despite the rapid development of AI large models over the past two years, throughout the entire industry chain, only computing power service providers led by NVIDIA have truly reaped financial rewards, achieving both revenue growth and stock market success.
In 2024, the benefits of computing power are gradually extending beyond NVIDIA to the broader computing power sector. Server vendors, cloud providers, and even players involved in the resale and leasing of various cards have also seen some profit returns, albeit significantly less than NVIDIA.
"In 2024, we didn't incur losses overall, but we also didn't make substantial profits," Wang Wei admitted. "AI applications haven't taken off yet, and the largest volume related to AI is still at the computing power layer, where revenue from computing power applications is relatively strong."
Regarding expectations for 2025, Wang Wei also acknowledged that a complete forecast has not been made. "It's really hard to predict for next year, but in the long run, there will be significant incremental progress in AI applications over the next three years."
However, based on the development of intelligent computing centers in various regions, few have achieved profitability, with the primary goal being to cover operating costs.
Calculations by Yue Yuanhang, CEO of Zhibole Technology, show that even if an intelligent computing center's equipment rental rate rises to 60%, it would still take at least seven more years to recoup the initial investment.
Currently, intelligent computing centers primarily rely on computing power leasing as their main revenue source, but "equipment leasing is highly homogeneous, and what is truly lacking is an end-to-end service capability," Hong Rui told Guangzhui Intelligence.
End-to-end service capability implies that, in addition to hardware, intelligent computing centers must support enterprises throughout the entire process, from the development and iterative upgrading of large model applications to their subsequent deployment, providing comprehensive full-stack services. Currently, there are relatively few vendors that can truly offer such end-to-end services.
Nevertheless, overall data indicates that the prospects for China's intelligent computing service market are increasingly optimistic. According to the latest IDC report titled "China Intelligent Computing Service Market (First Half of 2024) Tracker," the overall market for intelligent computing services in China during the first half of 2024 grew by 79.6% year-on-year, reaching a market size of 14.61 billion yuan. "The intelligent computing service market is growing at a pace far exceeding expectations. Based on the growth trend of intelligent computing services, the market will continue to expand at a high speed over the next five years," said Yang Yang, research manager of IDC's China Enterprise Research Department.
Hong Rui also stated that after experiencing the era of intelligent computing 1.0, characterized by frantic hoarding of card resources, and the era of intelligent computing 2.0, marked by extensive expansion of intelligent computing centers and supply-demand imbalances, the ultimate outcome of the era of intelligent computing 3.0 will undoubtedly be specialized and refined computing power services.
Ultimately, as pre-training and inference become distinct tracks, the AI inference application market will gradually evolve, the technology stack will mature, service capabilities will improve, and the market will further consolidate scattered and idle computing power resources to maximize utilization.
However, the current Chinese computing power market still faces significant challenges. Amid a shortage of high-end GPU chips, "the domestic GPU market is currently too fragmented, with each GPU having its independent ecosystem, leading to an overall fragmented ecosystem," Wang Wei noted. This results in exceedingly high adaptation costs for the entire domestic GPU ecosystem.
But as Liu Miao pointed out, the 20-year journey of intelligent computing has only just begun, and we may still be in the first year. On the path to achieving AGI, numerous uncertainties remain, undoubtedly presenting both opportunities and challenges for numerous players.