The AI revolution has propelled OpenAI from obscurity to a globally renowned company with a valuation exceeding $150 billion. However, when discussing the primary beneficiary of the AI era, NVIDIA stands out as the clear winner.
Financial reports show that in the third quarter of fiscal year 2025 (reported in November 2024), NVIDIA's revenue reached $35.082 billion, up 93.6% year-on-year, with net profit of $19.309 billion, up 108.9% year-on-year. Notably, data center revenue came to $30.8 billion, a staggering 112% increase year-on-year. At one point NVIDIA's market value even surpassed Microsoft's, ranking second globally.
AI companies and NVIDIA fuel each other's growth: the former rely on the latter for the GPUs or specialized AI accelerators needed to train and run large models, which in turn has propelled NVIDIA's revenue and profit to new heights. Recognizing this symbiotic relationship, NVIDIA understands the importance of investing in and supporting AI companies to nurture a robust customer base.
According to the UK's Financial Times, in 2024 alone, NVIDIA participated in over 50 funding rounds and multiple deals, investing approximately $1 billion in AI startups, marking a roughly 15% increase from the $872 million invested in 2023. Based on this trend, it is highly likely that NVIDIA will invest in even more AI startups in 2025.
Besides NVIDIA, other internet companies at home and abroad are also increasing their investments in AI, both by building their own AI teams and by investing in other AI companies. For small AI companies, the chance to receive investment from large enterprises is undoubtedly good news. As a result, countless enterprises flocked into the AI industry in 2024.
Data from Tianyancha shows that in 2024, more than 500,000 new AI-related enterprises were established in China. With AI technology becoming increasingly mature today, the question arises: Is there still an opportunity to enter the AI industry?
"Money" remains paramount, making AI entrepreneurship challenging to break into.
The black-box nature of large AI models makes them hard to interpret and debug. Developers face technical challenges spanning semantic understanding, mathematical logic, reasoning ability, and AI "hallucinations." Training a mature, usable large model therefore requires recruiting a large number of technical specialists to tackle these issues. At the same time, industry competition drives companies to poach talent from one another, pushing up the cost of hiring AI professionals.
For instance, Xiaomi CEO Lei Jun personally recruited Luo Fuli, who took part in developing the DeepSeek-V2 large model, to Xiaomi. Online rumors even claim that Xiaomi offered Luo Fuli an annual salary in the tens of millions of yuan. Even if those rumors are exaggerated, her annual salary is almost certainly a seven-figure sum.
Beyond talent, GPU computing power poses a significant burden on AI companies. Using Xiaomi as an example, a recent report by Jiemian News revealed that Xiaomi is building a 10,000-GPU computing cluster. Shen Dou, Executive Vice President of Baidu Group and President of Baidu Intelligent Cloud Business Group, once stated that the procurement cost for a cluster of 16,000 GPUs is as high as several billion yuan, not to mention the costs associated with setup, operation, and maintenance.
(Image source: AI-generated)
A 10,000-GPU cluster is merely the beginning. To create top-tier AI, more GPUs or specialized AI computing cards are required. Lv Wei, the Chief Analyst of Computer Research at Minsheng Securities, stated that based on ChatGPT's monthly active user data of 600 million, the training task necessitates approximately 120,000 A100 GPUs, while the inference task requires around 350,000 A100 GPUs.
To meet the future training and inference demands of large AI models, Sam Altman, CEO of OpenAI, even floated a $7 trillion plan to reshape the global semiconductor industry. Investment in computing clusters seems endless, and no AI company can currently say how much will ultimately be needed. For perspective, in the Terminator films the self-aware Skynet system was said to command only 60 trillion floating-point operations per second, less than a single RTX 4090 delivers today.
While the costs of recruiting talent and building computing clusters are high, they are at least manageable. A more pressing issue is that there is not enough data to train large models on. GPT-5, originally expected in mid-to-late 2024, has yet to complete training, partly because of this data shortage. To address it, OpenAI has resorted to hiring engineers, mathematicians, and physicists to produce new training data for its large models.
(Image source: AI-generated)
The more parameters a large AI model has, the higher its performance ceiling, but these parameters require extensive data for training. To train GPT-4 and GPT-5, OpenAI nearly exhausted publicly available research papers, news articles, and social media posts, even facing lawsuits from media outlets like The Canadian Press, CBC, Torstar, and The Globe and Mail.
It's not that the data is completely exhausted; rather, the remaining data is mostly unpublished and held within major companies, making it difficult for AI companies to access. Sam Altman bluntly stated that the cost of training a large model in the future could exceed $1 billion.
Faced with the three major costs of talent, computing power, and data, even internet giants feel the pressure, let alone startups flooding into the AI industry.
However, there are methods for training large models at a lower cost. Xiaomi's offer of a sky-high annual salary to recruit Luo Fuli likely stems from Luo's involvement in the development of DeepSeek-V2.
DeepSeek's success demonstrates that cost reduction and efficiency enhancement are possible.
Recently, the most talked-about news in the AI industry has undoubtedly been the arrival of the DeepSeek-V3 model. Its training cost was as low as $5.576 million, roughly one-twentieth of GPT-4's cost and less than one two-hundredth of GPT-5's estimated cost. In our hands-on tests, DeepSeek-V3 performed on par with well-known domestic AI applications such as Doubao, ERNIE Bot, and Kimi in areas like text generation and mathematical reasoning.
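To put those ratios in perspective, here is a quick back-of-the-envelope calculation using only the figures quoted in this article; the GPT-4 and GPT-5 numbers it produces are implied estimates, not disclosed training costs:

```python
# Back-of-the-envelope check of the cost ratios quoted above.
# The GPT-4 and GPT-5 figures below are implied by the article's ratios,
# not officially disclosed training costs.
deepseek_v3_cost = 5.576e6                    # USD, as reported for DeepSeek-V3
implied_gpt4_cost = deepseek_v3_cost * 20     # "one-twentieth of GPT-4's cost"
implied_gpt5_cost = deepseek_v3_cost * 200    # "less than one two-hundredth of GPT-5's estimated cost"

print(f"Implied GPT-4 training cost:  ${implied_gpt4_cost / 1e6:.0f} million")   # ~ $112 million
print(f"Implied GPT-5 training cost: >${implied_gpt5_cost / 1e9:.1f} billion")   # ~ $1.1 billion
```

The implied figure for GPT-5 lines up with Sam Altman's remark, quoted earlier, that training a large model could eventually cost more than $1 billion.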
DeepSeek's successful large model sends a message to other AI companies: training a large model doesn't have to cost a fortune. There is a trade-off, though: models trained on the cheap come with problems that are hard to get around.
DeepSeek's secret to training a large model at extremely low cost, with performance comparable to Doubao and Kimi in certain scenarios, lies in three key factors. First are the MLA and MoE architectures. MLA (multi-head latent attention) compresses the attention mechanism's keys and values into a low-dimensional latent representation, cutting memory use and computation and thereby lowering training costs. MoE (mixture of experts) gives the model 671 billion parameters in total but activates only about 37 billion of them for each token, minimizing the demand for computing resources.
(Image source: AI-generated)
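To make the MoE idea concrete, here is a minimal, illustrative sketch in PyTorch (not DeepSeek's actual code; the layer sizes, expert count, and top-k value are arbitrary): a router sends each token to only a few experts, so only a fraction of the layer's parameters participates in any given forward pass.

```python
# Minimal mixture-of-experts (MoE) layer: each token is routed to its top-k
# experts, so only a small share of the total parameters is used per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # One small feed-forward network per expert.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        # The router scores each token against each expert.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                                   # x: (n_tokens, d_model)
        scores = self.router(x)                             # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)      # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                    # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = TinyMoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64]); only 2 of the 8 experts run per token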
Second is the FP8 mixed-precision training framework, which performs much of the computation in 8-bit floating point. Trading a small amount of numerical precision for large savings in memory and compute speeds up training and cuts costs without materially hurting model quality.
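The mixed-precision idea can be illustrated with PyTorch's widely available automatic mixed precision (AMP). This is an FP16 analogue rather than DeepSeek's FP8 framework, which relies on specialized kernels, but the principle is the same: do most of the math at low precision, keep a master copy of the weights at higher precision, and scale the loss so that small gradients don't vanish.

```python
# Minimal mixed-precision training loop using PyTorch AMP (FP16 stand-in for FP8).
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

for step in range(3):  # toy loop on random data
    x = torch.randn(32, 512, device=device)
    y = torch.randint(0, 10, (32,), device=device)
    optimizer.zero_grad()
    with torch.autocast(device_type=device, dtype=torch.float16, enabled=(device == "cuda")):
        loss = nn.functional.cross_entropy(model(x), y)  # forward pass runs in low precision
    scaler.scale(loss).backward()   # scale the loss so small gradients stay representable
    scaler.step(optimizer)          # unscale and apply the update on the FP32 master weights
    scaler.update()
    print(step, loss.item())
```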
Third is model distillation. Whereas training a large model from scratch requires vast amounts of data, distillation uses a pre-trained model as a "teacher" to generate or filter training signals for a "student" model, thereby reducing costs. Distillation is undoubtedly a key factor behind DeepSeek's strong reputation among open-source models and its success in building what is currently the most powerful open-source large model, DeepSeek-V3.
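The teacher-student setup can be sketched in a few lines; this is the classic distillation recipe for illustration only, not DeepSeek's actual pipeline, and the model sizes and temperature are arbitrary:

```python
# Minimal knowledge distillation: the small "student" is trained to match the
# softened output distribution of a larger, already-trained "teacher".
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10))  # stand-in for a pre-trained model
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))    # much smaller model
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens the teacher's distribution so it carries more signal

for step in range(100):
    x = torch.randn(32, 128)                  # stand-in for real inputs
    with torch.no_grad():
        teacher_logits = teacher(x)           # the teacher only supplies targets
    student_logits = student(x)
    # KL divergence between the softened distributions is the classic distillation loss.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```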
While these technologies reduce costs, they come at the expense of precision, especially distillation technology, which has a severe flaw: the "student model" cannot surpass the capabilities of the "teacher" model. AI enterprises pursuing excellence do not overly rely on distillation technology for AI training. Wang Hanqing, a computer researcher at Nanyang Technological University, stated that the top researchers he knows barely engage in model distillation anymore.
In fact, almost all AI enterprises use data produced by other AIs to train large models. However, over-reliance on data distillation can lead to issues like biased generated data, loss of diversity, and even meaningless responses. Following the launch of DeepSeek-V3, there were instances where it identified itself as ChatGPT while responding to user queries.
In response, Sam Altman personally stated that replicating pioneers' work is relatively easy, while doing something new, risky, and challenging is difficult. Researchers who dare to take on difficult challenges deserve more credit; it's the coolest thing in the world. While not stating it explicitly, Sam Altman was essentially criticizing DeepSeek-V3 for potential plagiarism, hinting at the infringement issues facing distillation technology in the future.
Regardless, DeepSeek-V3's success paves a new path for other AI companies and enterprises considering entry into the field. By reducing precision, compressing content, and distilling models, outstanding AI large models can be trained without the exorbitant cost of hundreds of millions of dollars.
The AI industry remains a playground for "ambitious individuals".
Over the past decade, the two most successful emerging industries globally have been new energy vehicles and large AI models, and humanoid robots may join their ranks in the future. The development trajectory of the new energy vehicle industry is likely to repeat itself in AI: many enterprises pile in at first, and those lacking technical strength and management capability are gradually weeded out by competition.
The AI industry is currently midway through the transition from rapid growth to maturity, which still leaves room for an enterprise to grow into a giant. Whether that opportunity is seized depends on a company's technical proficiency, management capability, and strategic vision.
(Image source: AI-generated)
The AI industry is still open for entry but suits only two types of enterprises. The first are financially robust enterprises aiming for industry leadership, bearing the responsibility of driving industry progress and pushing the capabilities of AI to new heights.
Apple, which spent roughly $10 billion yet never brought a car to market, and Evergrande's car-making arm (formerly Evergrande Health), which has racked up losses of around 110 billion yuan, show that every emerging industry carries serious risk and that money alone does not guarantee success. The same applies to AI, where input and output are not necessarily proportional. But risk and opportunity coexist, and only the enterprises that excel will reap significant profits in the future.
The second type aims for "good enough" rather than chasing the frontier. By reducing precision and using distillation, these companies can build large models with decent performance at much lower cost. At a training cost of only $5.576 million, the DeepSeek-V3 recipe is within reach of many startups.
As for ambitious enterprises or entrepreneurs lacking sufficient capabilities, the current AI industry is no longer suitable. Despite investments from giants like NVIDIA and Microsoft in startups, these funds are minuscule compared to the development costs of top-tier large AI models. The AI industry remains a playground for ambitious individuals, but the entry threshold is now higher, necessitating greater caution.
Source: Lei Technology