02/07 2025
By Wang Huiying
Edited by Ziye
This Spring Festival, a Chinese large model took the global AI community by storm.
The AI discussion that ignited this global wave centered on DeepSeek R1, a model that directly rivals OpenAI's o1 series, which was released in September 2024.
Data underscores DeepSeek's popularity. On January 27th, it simultaneously topped the free charts on the Apple App Store in both China and the United States, with weekly downloads nearing 2.4 million.
Until DeepSeek R1's release, OpenAI's o1 had left Chinese large model manufacturers trailing. DeepSeek R1's emergence prompted OpenAI CEO Sam Altman to exclaim, "This is an impressive model," while NVIDIA hailed it as the most advanced large language model.
Crucially, unlike OpenAI's closed-source models and paid usage restrictions on the o1 model, DeepSeek R1 is not only open-source but also freely available for unlimited global invocation.
This AI storm unleashed by DeepSeek has drawn worldwide attention from AI practitioners, but it has also triggered panic and attacks.
Meta, "the king of large model open source," felt it first: internal employees revealed that "Meta's generative AI department is in panic." The pressure on OpenAI was also evident; it released three large models within two weeks: two AI agents, Operator and Deep Research, and the reasoning model o3-mini.
On January 28th, DeepSeek's official website announced that its online services had suffered large-scale malicious attacks.
DeepSeek stands at the epicenter of this storm. Notably, it has shattered the industry consensus that computational power is paramount. DeepSeek created DeepSeek R1, which performs on par with OpenAI's o1, using less than one-tenth of OpenAI's resources.
This raises a question: if large models do not require extensive computational power for training and inference, will the upstream and downstream AI industries remain as attractive?
The capital market has already answered. By the close of U.S. trading on January 27th, tech stocks had plummeted, with the Philadelphia Semiconductor Index down 9.2% and NVIDIA's share price dropping nearly 17%.
Overnight, subtle changes rippled through all AI-related aspects. The butterfly effect persists, and the AI industry's direction remains unpredictable. DeepSeek appears to be redefining the game's rules.
1. Attacked, Supported, DeepSeek is "Surrounded"
During the 2023 Spring Festival, ChatGPT ignited the AI industry's flame. Since then, a popular large model has emerged each Spring Festival: in 2024, it was Sora; in 2025, it was DeepSeek.
The difference is that the previous two years' spotlight was on OpenAI, a U.S. company. This year, it's China's DeepSeek.
For several days, DeepSeek topped global download charts on the Apple App Store, surpassing 20 million daily active users within 20 days of launch. For an AI startup, this "mysterious force from the East" has shaken the entire AI industry.
This force swiftly spread across the ocean to Silicon Valley. Since ChatGPT's emergence, followed by Sora and then the deep reasoning model o1, OpenAI has been the industry's paradigm, with other large model enterprises typically playing catch-up.
Take the o1 model: since its launch in September 2024, domestic large model enterprises had yet to introduce a real competitor. This time, DeepSeek delivered DeepSeek R1 with less computational power and at lower cost, and it unsurprisingly captured the market's attention.
Unlike OpenAI and its Chinese counterparts, which spend hundreds of millions of dollars on large model training, DeepSeek's approach has always been "doing big things with little money."
In late December 2024, DeepSeek released the V3 model, comparable to GPT-4o, using only 2048 NVIDIA H800 chips at a cost of approximately $5.6 million. In contrast, GPT-4o was trained on tens of thousands of NVIDIA H100 chips (superior to the H800) at a cost of approximately $100 million.
In May 2024, DeepSeek released DeepSeek-V2 at a price roughly one percent of GPT-4 Turbo's. Major players such as ByteDance, Alibaba, and Baidu subsequently announced successive price cuts, while DeepSeek itself reduced prices three times in a year, each time by over 85%.
Whether in pricing or training cost, DeepSeek pursues focused, targeted innovation rather than a sprawling, all-encompassing route.
For example, DeepSeek proposed a new MLA (Multi-head Latent Attention) architecture, which, combined with DeepSeek MoESparse (Mixture of Experts Structure), reduces memory usage to 5%-13% of the MHA (Multi-head Attention) architecture most commonly used in other large models.
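A back-of-the-envelope sketch makes the memory claim concrete. All sizes below are hypothetical, chosen only for illustration (they are not DeepSeek's actual dimensions): standard multi-head attention caches full per-head keys and values for every past token, while a latent-attention scheme caches only one shared low-rank latent per token.

```python
# Hypothetical sizes, chosen only for illustration (not DeepSeek's real dims)
n_heads, d_head, d_latent, seq_len = 16, 64, 128, 4096

# Standard MHA inference caches full per-head K and V for every past token:
mha_cache_entries = seq_len * 2 * n_heads * d_head

# An MLA-style scheme caches one shared low-rank latent per token and
# re-derives K/V from it with small up-projections at attention time:
mla_cache_entries = seq_len * d_latent

ratio = mla_cache_entries / mha_cache_entries
print(f"latent cache is {ratio:.1%} of the full KV cache")
```

With these illustrative numbers the latent cache is 6.25% of the full KV cache, which falls inside the 5%-13% range quoted above.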
Furthermore, DeepSeek trains its models using "data distillation" technology, where a high-precision general large model serves as a teacher, achieving equivalent results with only one-fifth of the data volume, thereby reducing costs.
Competing with OpenAI is only the surface story. In essence, DeepSeek is shaking up the entire large model industry, and it now faces unprecedented pressure as a result.
Twenty-four hours after topping the China and U.S. Apple App Stores, DeepSeek found itself at the storm's center again: OpenAI accused DeepSeek in the media of "distilling" its proprietary technology without permission.
Several U.S. officials supported this accusation, including David Sacks, Trump's AI advisor, and Howard Lutnick, Trump's nominee for U.S. Secretary of Commerce.
"Distillation" refers to allowing a smaller model to achieve similar results at a lower cost on specific tasks by learning from a larger, more powerful model.
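The idea can be sketched in a few lines. In this toy NumPy example (not DeepSeek's or OpenAI's pipeline; the function names and temperature value are illustrative assumptions), the teacher provides temperature-softened probabilities and the student is penalized by the KL divergence from them:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)   # teacher's soft targets
    q = softmax(student_logits, temperature)   # student's predictions
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

# A student whose logits already match the teacher incurs near-zero loss;
# training minimizes this loss over many unlabeled inputs, transferring
# the teacher's behavior into a smaller model.
```

The higher temperature softens the teacher's distribution so the student also learns the relative ranking of wrong answers, which is where much of the transferred knowledge lives.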
This controversy traces back to Microsoft, OpenAI's largest investor. On January 29th, foreign media reported that Microsoft's security researchers had discovered that individuals associated with DeepSeek may have used OpenAI's API to extract a large amount of data without authorization.
Additionally, DeepSeek's online services have faced varying degrees of attacks, initially SSDP and NTP reflection amplification attacks, followed by a surge of HTTP proxy attacks on January 28th.
Despite being questioned and attacked, many enterprises are quickly adapting to and embracing the changes brought by DeepSeek.
In the cloud sector, platforms like Huawei Cloud, Tencent Cloud, Alibaba Cloud, Baidu Intelligent Cloud, Volcano Engine, JD Cloud, and 360 Digital Security have all announced support for the DeepSeek large model. Overseas cloud giants like Amazon Web Services and Microsoft Azure had already officially announced support.
In the chip sector, following support from NVIDIA, AMD, and Intel, domestic chip companies Iluvatar CoreX and Moore Threads have successively announced support for the DeepSeek model.
Attacked and supported at once, DeepSeek is indeed "surrounded", the inevitable fate of an industry's star project. Only by withstanding questioning and comparison can it gain a firm footing in the "war of a hundred models."
2. Relying on Technological Innovation, DeepSeek Stirs Up an AI Storm
Since 2023, the large model industry has never lacked news, but truly explosive projects have been rare. ChatGPT and Kimi were notable, and now DeepSeek joins their ranks.
Unlike many large models following OpenAI, DeepSeek has become the initiator of a new AI storm.
Currently, DeepSeek R1 is widely recognized as one of the most advanced large language models, providing high-quality language processing capabilities. Its performance on tasks like mathematics, coding, and natural language reasoning is comparable to OpenAI's o1 model.
In the AIME 2024 math benchmark test, DeepSeek R1 scored 79.8%, while OpenAI's o1 scored 79.2%; in the MATH-500 benchmark test, DeepSeek R1 scored 97.3%, and OpenAI's o1 scored 96.4%.
DeepSeek R1's powerful reasoning ability is inseparable from DeepSeek's technological innovation, and its innovative training method provides new industry insights: DeepSeek abandons traditional supervised fine-tuning (SFT) and instead optimizes the reasoning path through reinforcement learning (RL).
The prevailing view is that large model training must first undergo SFT with labeled data to enable basic capabilities, followed by RL for ability enhancement. OpenAI's previous data training relied heavily on human intervention, consuming significant human and financial resources.
However, DeepSeek's research found that large models can acquire powerful reasoning abilities solely through reinforcement learning, without supervised fine-tuning.
This training method first appeared in the experimental R1-Zero version and was subsequently applied to the DeepSeek-V3-Base model, completely abandoning the traditional supervised fine-tuning stage.
The results show that, without manually labeled data, the DeepSeek series of models demonstrated continuous self-evolution capabilities through trial and feedback.
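The core idea, that reward feedback alone, with no labeled targets, can shift a model toward better outputs, can be illustrated with a toy REINFORCE loop. Everything here is a deliberately simplified assumption (three candidate "reasoning paths" with hand-set rewards); DeepSeek's actual recipe is far more involved:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(42)
logits = np.zeros(3)                  # policy over 3 candidate "reasoning paths"
reward = np.array([0.1, 0.9, 0.5])   # hypothetical automatic reward (e.g. answer checks)
lr = 0.5

for _ in range(200):
    p = softmax(logits)
    a = rng.choice(3, p=p)            # sample an action; no labeled target anywhere
    baseline = p @ reward             # expected reward as a variance-reducing baseline
    grad = -p                         # d log p(a) / d logits ...
    grad[a] += 1.0                    # ... is (one_hot(a) - p)
    logits += lr * (reward[a] - baseline) * grad   # REINFORCE update

# After training, the policy concentrates on the highest-reward path,
# learned purely from trial and reward feedback.
```

Self-evolution in this sense means the loop needs only a way to score its own outputs, not a human-labeled answer for each input.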
On the architecture side, DeepSeek adopts innovative designs such as the MoE-2048 architecture, in which each token activates 8 expert modules, reportedly raising model parameter utilization to 72% and improving training efficiency roughly threefold compared with a traditional dense Transformer architecture.
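The "activate 8 experts per token" routing can be sketched in NumPy. The token count and total expert count below are assumptions for illustration; the point is that only the selected experts' parameters run for each token:

```python
import numpy as np

def topk_gate(router_logits, k=8):
    """Pick the top-k experts per token and renormalize their gate weights.

    router_logits: array of shape (num_tokens, num_experts), hypothetical values.
    Returns (indices of chosen experts, normalized weights over those experts).
    """
    topk_idx = np.argsort(router_logits, axis=-1)[:, -k:]            # k largest logits
    topk_logits = np.take_along_axis(router_logits, topk_idx, axis=-1)
    w = np.exp(topk_logits - topk_logits.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)                            # softmax over the k
    return topk_idx, w

# 4 tokens routed among 64 hypothetical experts; each token uses only 8 of them,
# so the other 56 experts' parameters are never touched for that token.
logits = np.random.default_rng(0).normal(size=(4, 64))
idx, w = topk_gate(logits, k=8)
```

This sparsity is what lets total parameter count grow without a matching growth in per-token compute.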
The capabilities of the DeepSeek large model are evident, and more importantly, DeepSeek is disrupting the market landscape with its innovative low-cost approach.
Taking DeepSeek R1 as an example: its performance is comparable to OpenAI's o1, yet the pre-training of its base model cost only $5.576 million, roughly one-tenth of what OpenAI spends. Meanwhile, DeepSeek's API pricing is significantly lower than OpenAI's, with an output price of 16 yuan (approximately $2.2) per million tokens versus o1's $60 per million tokens.
From an industry perspective, while providing cheaper and more user-friendly deep reasoning models, DeepSeek also practices open-source concepts, making deep reasoning models transparent and bringing new inspiration to the industry.
In the past two years, the debate between open-source and closed-source routes has been the focus in the large model industry, with Meta representing the former and OpenAI the latter.
Previously, OpenAI's ChatGPT and Anthropic's Claude adopted closed-source approaches, leveraging their influence to push the closed-source route. Now, DeepSeek's success undoubtedly gives confidence to open-source supporters.
After DeepSeek's popularity surge, Yann LeCun, Meta's chief AI scientist and an open-source supporter, stated that DeepSeek's success represents a victory for open-source AI models. "Open-source models are surpassing proprietary models," he wrote on LinkedIn.
In fact, the far-reaching significance of open-source initiatives like DeepSeek lies in their transparency, publicly disclosing model building processes through papers, thereby driving progress across the large model industry.
With DeepSeek R1 being both free and open-source, and performing beyond expectations at the first tier, it poses a direct question to the large model industry: do similar products that tech giants developed with far more computational power and funding truly deserve their high valuations?
DeepSeek has not only shattered the "computational power competition" logic in the AI large model field but also shaken investors' confidence in high-tech chips: The AI industry may not need so many chips to train high-performing large models.
These voices hit U.S. tech stocks directly. On January 27th, tech stocks plummeted, with NVIDIA's share price falling nearly 17% and its market value shedding nearly $600 billion, the largest single-day loss in U.S. stock market history.
The storm unleashed by DeepSeek continues. From a market competition perspective, DeepSeek's rise has disrupted the balance, challenging traditional AI giants and prompting the entire industry to re-examine its technological roadmap and market strategy.
3. Panic, Follow-Up, Price Reductions: The Butterfly Effect of DeepSeek Has Arrived
Before this Spring Festival, DeepSeek was unfamiliar to most, with the industry's attention fixed on OpenAI and a handful of technology giants.
Unexpectedly, DeepSeek's emergence caused ripples in the turbulent waters, triggering a series of butterfly effects.
As Jim Fan, a senior research scientist at NVIDIA, commented, "We are living in a special era: A non-American company is truly fulfilling OpenAI's original mission - conducting truly open-source cutting-edge research to empower everyone."
Some are amazed, while others are panicked.
None are more restless than OpenAI. Since ChatGPT's launch two years ago, OpenAI has been the industry bellwether, with many of its ideas treated as received wisdom.
Take the open-source route. On February 1st, Altman took part in a Reddit "Ask Me Anything" session, where he admitted for the first time that OpenAI's closed-source strategy stands "on the wrong side of history." As Altman put it, "We need to find a different open-source strategy," adding that OpenAI is currently in a "complex and delicate" situation and faces numerous challenges.
For instance, in terms of training methodologies. Initially, OpenAI established a four-stage process for large model training: pre-training, supervised fine-tuning, reward modeling, and reinforcement learning. However, this paradigm has been disrupted as DeepSeek has demonstrated that certain stages can be bypassed and streamlined, thereby enhancing model training efficiency and performance.
Another area impacted is the "scaling law." Over the past two years, OpenAI CEO Altman has emphasized three crucial factors driving the company's business: chips, data, and funding. Similar to adding coal to a steam train's furnace, the more these elements are invested, the more potent the AI produced.
Yet again, DeepSeek has challenged this theory. If lower-cost or even free models can achieve results comparable to OpenAI, then OpenAI's business model will come under scrutiny, and its market share may diminish.
Amidst such a significant impact, OpenAI has little time to panic but must swiftly adapt. On February 1, OpenAI unveiled o3-mini, its first reasoning model accessible to free users. On February 5, it announced that the o3-mini large language model was officially available for ChatGPT users and developers. The following day, OpenAI made the ChatGPT search function accessible to all users without registration.
These significant actions within a span of days are a direct response to the pressure exerted by DeepSeek.
According to Wired, OpenAI released o3-mini early to counter DeepSeek's R1, a large language model focused on reasoning that was unveiled last Monday. The release of R1 triggered a sharp decline in AI stock prices and raised questions about the cost-effectiveness of OpenAI's models.
OpenAI claims that o3-mini is its most cost-effective reasoning model, boasting strong capabilities in fields such as science, mathematics, and programming. It also combines the low cost and low latency features of o1-mini. While o3-mini can be used in conjunction with internet search functions, it currently does not support visual capabilities.
Simultaneously, OpenAI keeps lowering its API call prices; since the launch of GPT-4, its price per token has dropped by 95%. o3-mini is priced at $1.10 per million input tokens and $4.40 per million output tokens, still higher than DeepSeek R1's pricing.
OpenAI's urgency and subsequent adjustments are but one facet of this butterfly effect. DeepSeek, the butterfly, is flapping its wings with considerable force.
On January 30, Dario Amodei, CEO of Anthropic, published a lengthy article advocating for a "lockdown on chip exports" to ensure that AGI only emerges in the United States. On the same day, foreign media reported that the United States is considering imposing additional restrictions on chip sales to China, including the H20 chip that Dario recommended restricting.
However, it is widely acknowledged that technological blockades are not a sustainable strategy for maintaining an advantage. Openness and cooperation are more conducive to the AI industry's future development. Technological blockades may merely mark the beginning of this AI battle, and Chinese AI enterprises, represented by DeepSeek, still face numerous challenges.
Currently, the AI landscape is undergoing transformations. The changes introduced by DeepSeek are profoundly altering the entire AI industry chain. A low-cost development model could lead to a series of low-threshold industry innovation rules and methods, attracting more entrepreneurial players to enter the market.
This new "ChatGPT" moment sparked by DeepSeek continues to unfold, revealing new narratives.