02/07 2025
Source | BohuFN
For years, the consensus held that the gap between China and the U.S. in large AI models was about 1-2 years. However, this perception was shattered with the release of DeepSeek-R1 by Chinese tech company DeepSeek.
DeepSeek-R1 achieved capabilities comparable to OpenAI's top-tier GPT-4 model, but at a fraction of the cost—less than one-twentieth. The ripple effects of this breakthrough quickly spread across the globe.
On January 27, U.S. tech stocks experienced a significant downturn, with the Philadelphia Semiconductor Index (SOX) plummeting 9.2%, marking the largest single-day drop since March 2020. NVIDIA's share price tumbled nearly 17%, erasing nearly $600 billion in market value overnight—the largest single-day market value decline in U.S. stock market history. Other tech giants such as Broadcom, TSMC, ASML, Google, and Microsoft also saw their share prices fall by 17.4%, 13%, 7%, 4%, and 2.14%, respectively.
European tech stock markets were similarly affected, with various tech stocks being sold off.
The protagonist behind this "global capital market earthquake" is DeepSeek, a previously obscure tech startup founded just over a year ago by Liang Wenfeng, the creator of the quantitative fund QuantFront (known in English as High-Flyer).
In August last year, DeepSeek made headlines by announcing a significant reduction in its API prices, with input fees adjusted to 0.1 yuan per million tokens and output fees to 2 yuan per million tokens. This move ignited a price war in the large model market.
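To make these prices concrete, the bill for a single workload can be estimated as below. This is a minimal sketch, assuming the per-million-token rates quoted above apply linearly; the function name and the sample token counts are illustrative and are not part of any DeepSeek SDK.

```python
# Rates quoted in the article, in yuan per million tokens.
INPUT_YUAN_PER_MILLION = 0.1
OUTPUT_YUAN_PER_MILLION = 2.0


def request_cost_yuan(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a workload in yuan (hypothetical helper).

    Assumes simple linear pricing with no discounts, caching, or minimum fees.
    """
    input_cost = input_tokens / 1_000_000 * INPUT_YUAN_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_YUAN_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Illustrative workload: 5 million input tokens, 1 million output tokens.
    print(f"{request_cost_yuan(5_000_000, 1_000_000):.2f} yuan")
```

Under these assumptions, output tokens dominate the bill: one million output tokens cost as much as twenty million input tokens.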
Liang Wenfeng, the founder of DeepSeek, has been an early adopter of AI technology. In 2023, when computing power was scarce, only five companies in China possessed tens of thousands of graphics cards. QuantFront was one of them, alongside Alibaba, Tencent, Baidu, and ByteDance.
DeepSeek has been called the "mysterious force from the East" in Silicon Valley. On one hand, U.S. companies have followed suit by launching similar reasoning models; on the other, DeepSeek faces calls for a ban or even for its access to computing power to be cut off.
On January 28, DeepSeek issued two consecutive announcements stating that its online services had been subjected to large-scale malicious attacks.
Objectively, in terms of actual product performance, DeepSeek is currently in the industry's top tier, but it has not yet comprehensively surpassed existing products, nor has it achieved a paradigm shift at the technical level.
However, DeepSeek has indeed paved a new path for the long-term development of the global AI industry, bringing fresh perspectives to a field long dominated by Silicon Valley. As AI luminary Andrew Ng noted, "DeepSeek's innovation shows that the gap between China and the U.S. in generative AI is rapidly narrowing, and China is already showing signs of leadership in certain areas."
01 DeepSeek: Breaking the Barriers
In an interview with 36Kr's AnYong account last December, DeepSeek discussed the price war it initiated in August last year. Unlike many large companies that burn money to subsidize low prices, DeepSeek is profitable at the prices it charges.
As early as May last year, DeepSeek's DeepSeek-V2 model demonstrated remarkable efficiency: its inference cost was reduced to only 1 yuan per million tokens, approximately one-seventh that of Llama 3 70B and one-seventieth that of GPT-4 Turbo.
Compared to OpenAI, a large company with thousands of employees from top universities worldwide, DeepSeek has only a few hundred employees and does not boast the same density of globally top-ranked talent. Instead, it gathers elite graduates with doctorates and master's degrees in related disciplines from domestic universities.
Impressively, the GPU used to train DeepSeek-V3 is NVIDIA's H800, a reduced-performance AI chip supplied specifically for the Chinese market. In contrast, GPT-4 uses tens of thousands of NVIDIA H100 chips, which outperform the H800.
This runs counter to the long-held assumption that frontier models require the largest teams and the most advanced chips.
The sharp drop in NVIDIA and U.S. stock prices points directly to DeepSeek. The company's success has disrupted the conventional logic of "competition through investment" in the field of large AI models. The industry has long been rooted in the belief that only by spending vast amounts of money and computing power can large AI models be created.
In 2023, OpenAI CEO Sam Altman visited India, where he expressed doubts about whether an Indian team could build a substantial AI model on a budget of only $10 million.
In his view, a good large model cannot be trained without hundreds of millions of dollars in training costs. As a leading enterprise in the AI industry, OpenAI has yet to achieve profitability, largely due to the astonishing cost of training cutting-edge AI models and its high operating costs. It is estimated that keeping ChatGPT running alone costs $700,000 per day. Altman has stated that the cost of future AI models is expected to exceed $1 billion.
These high costs have prompted other players to increase their own investments, using OpenAI as the benchmark. Musk's xAI has built a supercomputing data center equipped with 100,000 NVIDIA H100 GPUs, one of the most powerful AI training clusters in the world. After taking office, Trump announced the $500 billion "Stargate" project, which aims to consolidate U.S. hegemony in AI through massive funding and computing power.
Other tech giants are also actively deploying. In the past year, Microsoft and Google's capital expenditures have exceeded $50 billion, mostly used for AI-related infrastructure construction, and they plan to increase this investment to $70-80 billion by fiscal year 2025. Domestically, according to Zhejiang Merchants Securities' analysis, ByteDance's capital expenditure in 2024 is about 80 billion yuan, expected to reach 160 billion yuan in 2025, with about 90 billion yuan for AI computing power procurement and 70 billion yuan for IDC infrastructure and network equipment.
DeepSeek has not relied on any cost-reduction magic; it has explored a different path. Its researchers proposed a new MLA (Multi-head Latent Attention) architecture and combined it with DeepSeekMoE, a sparse Mixture-of-Experts structure. The advantage of MLA is that its attention cache occupies only 5%-13% of the memory used by the commonly used MHA (Multi-Head Attention) architecture.
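A rough sense of where such savings come from can be had by counting what each attention variant must cache per token. The sketch below assumes that standard MHA stores full keys and values for every head, while an MLA-style layer stores one compressed latent vector plus a small positional component. The dimensions are hypothetical, chosen for illustration only, and are not DeepSeek's published configuration.

```python
def mha_cache_elems_per_token(n_heads: int, head_dim: int) -> int:
    """Standard multi-head attention caches one key and one value per head."""
    return 2 * n_heads * head_dim


def mla_cache_elems_per_token(latent_dim: int, rope_dim: int) -> int:
    """An MLA-style layer caches a shared latent vector plus a positional part.

    This is a simplified accounting, not DeepSeek's exact implementation.
    """
    return latent_dim + rope_dim


def cache_bytes(elems_per_token: int, n_layers: int, seq_len: int,
                bytes_per_elem: int = 2) -> int:
    """Total cache size for one sequence (2 bytes per element ~ 16-bit floats)."""
    return elems_per_token * n_layers * seq_len * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical model shape, for illustration only.
    n_heads, head_dim = 32, 128
    latent_dim, rope_dim = 512, 64
    n_layers, seq_len = 32, 8192

    mha = mha_cache_elems_per_token(n_heads, head_dim)
    mla = mla_cache_elems_per_token(latent_dim, rope_dim)
    print(f"MHA cache: {cache_bytes(mha, n_layers, seq_len) / 2**20:.0f} MiB")
    print(f"MLA cache: {cache_bytes(mla, n_layers, seq_len) / 2**20:.0f} MiB")
    print(f"MLA / MHA: {mla / mha:.1%}")
```

With these assumed dimensions the ratio happens to fall inside the 5%-13% range quoted above, but the exact figure depends entirely on the model shape and on which baseline is used for comparison.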
Unlike the industry's usual practice of training models on trillions of tokens (units of text), DeepSeek reduces the amount of computation spent on data through "data distillation," thereby lowering costs.
Because of this, DeepSeek has been dubbed the "Pinduoduo of AI." While this description may not be entirely accurate, it essentially captures the impact DeepSeek has had on the current mainstream AI landscape. Through this low-cost model, DeepSeek can launch new products and services more quickly, significantly lowering market entry barriers and attracting more enterprises and institutions to participate in AI research and development.
02 The Power of Open Source
DeepSeek's impact on AI extends beyond cost reduction.
As a Chinese company, DeepSeek has demonstrated unprecedented confidence by implementing an open-source strategy for its products. This means publicly releasing the code and architecture of the model, allowing the public to view, use, and modify it. This approach enables many small and medium-sized enterprises to directly use its model, significantly reducing R&D costs for many enterprises.
In contrast, AI products under OpenAI and Google are closed-source. DeepSeek's open-source and low-cost strategy will also have a substantial impact on international mainstream AI tools that rely on high fees.
DeepSeek is not alone in this endeavor. In an era where the open-source wave is sweeping the globe, Meta's LLaMA and Alibaba's Tongyi Qianwen are both attempting to prove that "openness wins the future." Even Musk supports open-source technology, having previously criticized OpenAI for moving towards closed-source, calling it "CloseAI," and accusing it of violating its original open-source intent, while Altman continues to move forward with closed-source. This controversy has extended beyond verbal sparring and has reached the courts.
Despite this, many insiders still view open-source with skepticism.
Last year, an industry insider stated: Closed-source large models are the optimal solution for AI commercialization.
The reason is that open-source models still have critical flaws: "The so-called open-source model often only provides a large number of parameters of the model. But to effectively apply these models, a lot of follow-up work is still needed." Even with the parameters published, developers still struggle to gain insight into how the parameters were generated and what data they were trained on, which are the core "recipes." This semi-transparent state makes secondary development akin to the blind men feeling an elephant: each developer grasps only a part of the whole.
"Due to the lack of understanding of the generation process and data sources of these parameters, it is difficult to directly achieve the synergistic effect of 'many hands make light work.' Even if the model source code is obtained, one may not be clear about the specific quantities and proportions used to train these parameters. Therefore, obtaining these open-source materials is not enough to allow one to stand on the shoulders of giants and easily iterate and develop."
This is also one of the reasons why OpenAI has been able to maintain its technological leadership and uniqueness for a certain period through closed-source, building its own business ecosystem.
In addition, in highly sensitive fields such as healthcare and finance, the closed-source advantage also plays a unique role in protecting technological and commercial interests, ensuring intellectual property security, and preventing technology abuse.
However, it is undeniable that DeepSeek has seized an excellent opportunity to fill an empty niche in the ecosystem.
In an interview with AnYong, Liang Wenfeng said, "In the long run, we hope to form an ecosystem where the industry directly uses our technology and output, and we only focus on basic models and cutting-edge innovations, while other companies build toB and toC businesses based on DeepSeek."
Users can enjoy powerful AI inference capabilities without paying, promoting the popularization of AI technology and allowing ordinary users to experience the convenience of cutting-edge technology in their daily work and life. In the open-source ecosystem, DeepSeek has attracted a large number of developers and formed a thriving community. As more developers and enterprises recognize the value of the open-source model, DeepSeek is expected to further expand its influence and reshape the AI industry landscape.
03 The DeepSeek Storm: Shifting the Focus of China-U.S. AI Narratives
With the global spotlight on DeepSeek's breakthrough, the company has disrupted people's inherent perceptions of AI development, setting off an unprecedented wave of technological change. This, to some extent, indicates the end of the arms race in computing power.
For a long time, the AI industry has relied on large-scale computing power and huge capital investments. OpenAI, for example, invests hundreds of millions of dollars in model training, procures NVIDIA's top GPU chips on a large scale, and is committed to building massive data centers. Building on its early business of selling graphics cards, NVIDIA has successfully ridden the wave of AI computing power, creating a business legend in the field of computing chips. For a time, technology giants such as Google, OpenAI, and Apple queued up to hand money to NVIDIA, enabling it to dominate the AI computing power market.
However, DeepSeek has upended this model. It achieves performance comparable to industry giants using only about 2,000 chips and an investment of less than $6 million. This achievement has triggered deep reflection in the industry: "If DeepSeek's innovation is true and effective, do AI companies really need such a large number of graphics cards?" While NVIDIA boasts of its 200TB per second memory bandwidth, DeepSeek has used its open-source code to make a powerful case that true artificial intelligence should not be constrained by computing power.
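The sub-$6 million figure is best read as a rental-cost estimate rather than a total budget. Below is a minimal sketch of the arithmetic, assuming the figures given in the DeepSeek-V3 technical report: about 2.788 million H800 GPU-hours, priced at an assumed $2 per GPU-hour, on a cluster of 2,048 GPUs. These inputs do not appear in this article, and the estimate excludes research, data, and staffing costs.

```python
# Assumed inputs, taken from the DeepSeek-V3 technical report rather than
# from this article; treat them as approximate.
GPU_HOURS = 2_788_000          # total H800 GPU-hours for training
USD_PER_GPU_HOUR = 2.0         # assumed rental price per GPU-hour
CLUSTER_SIZE = 2_048           # number of GPUs used in parallel


def training_cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Rental-style cost estimate: GPU-hours times hourly price."""
    return gpu_hours * usd_per_gpu_hour


def wall_clock_days(gpu_hours: float, cluster_size: int) -> float:
    """Elapsed days if the whole cluster runs continuously."""
    return gpu_hours / cluster_size / 24


if __name__ == "__main__":
    cost = training_cost_usd(GPU_HOURS, USD_PER_GPU_HOUR)
    days = wall_clock_days(GPU_HOURS, CLUSTER_SIZE)
    print(f"Estimated training cost: ${cost / 1e6:.3f} million")
    print(f"Estimated wall-clock time: {days:.1f} days")
```

Under these assumptions the estimate comes to roughly $5.6 million and a little under two months of continuous training, which is consistent with the "about 2,000 chips" and "less than $6 million" cited above.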
According to incomplete statistics from Global Semiconductor Observation, a total of 20 companies have announced that they have adapted to and launched DeepSeek model services. They include foreign giants (NVIDIA, AMD, Microsoft, Amazon Web Services, and Intel); domestic GPU enterprises (MoXi, Tianshi, Moore Threads, and Haiguang Information); cloud computing giants (Huawei Cloud, Tencent Cloud, Tianyi Cloud, Alibaba Cloud, Baidu Intelligent Cloud, and Volcano Engine); and WWXQ, Biren Technology, Silicon Fluidity, PPIO, and CloudAxis Technology. With well-known domestic and foreign cloud platforms and technology enterprises launching DeepSeek's large models one after another, the AI market has entered a new wave of change.
On the other hand, the sharp drop in NVIDIA's share price reflects the gradual erosion of its AI chip hegemony under the impact of the DeepSeek storm. Nassim Taleb, an advisor to the hedge fund Universa Investments, warned that NVIDIA has staked everything on the hope that people will keep using its chips, assuming that demand will continue to increase while neglecting the possibility of revolutionary improvements in software or other innovative methods. These assumptions are now being challenged, and future corrections could be several times the current decline.
It is noteworthy that the technological paradigm shift instigated by DeepSeek constitutes not merely a potent countermeasure against U.S. technological dominance but also prompts global developers to reevaluate the immense potential of AI in China. Amidst the backdrop of persistent U.S. efforts to hinder China's AI and chip advancements through various restrictions, this "efficiency revolution" spearheaded by a Chinese team could replicate the thrilling trajectory of electric vehicles usurping fuel vehicles—achieving lower costs and fostering a more open ecosystem, thereby transforming AI from an "exclusive preserve of U.S. giants" into a "pragmatic tool for universal benefit".
As stated on the cover page of the DeepSeek technical white paper: "We are not pursuing GPT, but rather demonstrating that the route to AGI is not monolithic and confined to Silicon Valley." From an industrial development standpoint, as DeepSeek continues to expand and mature, it will progressively incorporate more domestic chips, thereby effectively mitigating risks within the industrial chain. Under the leadership of DeepSeek, domestic chips are anticipated to gradually ascend from low-end to high-end applications, ultimately severing dependency on U.S. chips and securing a more prominent position within the global AI industry.