02/07 2025
Written by: Yideng
During the Chinese New Year period, the hottest topics undoubtedly revolved around "Nezha 2" and DeepSeek.
One stems from ancient Chinese mythology and legends, while the other is an emerging star in the AI field. Despite their disparate origins, they unexpectedly "complemented each other" during this Spring Festival.
Image source: DeepSeek official website
Many have been following DeepSeek's progress, including its much-discussed 83-hour defense battle. Sitting in the cinema, watching the conflict between the Twelve Golden Immortals and the Dragon Clan, the "Monster Hunting Team" capturing innocent monsters to refine into elixirs, and the Dragon Clan's desperate counterattack, they may feel a pang of recognition: art does mirror life, but life can be even more cruel and irrational.
Despite the flood of media coverage of DeepSeek, "Jiedian Caijing" still wants to offer its own perspective on the company, and on matters beyond the model itself.
There have been numerous introductions to DeepSeek and its AI large model recently, so we will not reiterate its achievements here. Instead, let's briefly explore some of its implications for the industry.
First, it can "bypass" computational power and excel through algorithmic innovation.
Historically, computational power was widely considered the core of AI development, to be accumulated by endlessly stacking GPUs. When OpenAI rose to prominence, NVIDIA profited, and the US sought to curb the development of Chinese AI by banning the sale of NVIDIA's high-end GPUs to China.
While others were burning money to stack computational power, DeepSeek chose to focus on enhancing algorithms.
MLA (Multi-head Latent Attention) sharply reduces the memory footprint and cost of long-context inference by compressing the attention key-value cache; its MoE (Mixture of Experts) innovations solve the routing-collapse problem; and Multi-Token Prediction (MTP) markedly improves inference speed. Each of these targets a different bottleneck in the Transformer architecture, enabling DeepSeek to achieve strong results with minimal resources.
Overview of DeepSeek v3 architecture, image source: CSDN
Here's a simple analogy: traditional large models are like a restaurant with many waiters and chefs. Each waiter independently handles orders, food delivery, checkout, and cleanup for their customers from start to finish. When complex dishes appear, all chefs gather to discuss who can make them and how.
This can lead to problems such as multiple waiters recording the same order, a crush at the kitchen door at serving time, and chefs sitting idle or overloaded through poor coordination.
In DeepSeek's model design, MLA technology allows all waiters to share an intelligent tablet, which synchronizes orders, table numbers, and dish statuses in real-time (eliminating duplicate recording). When serving food, only the responsible waiter works, with others intervening as needed (division of labor on demand). This not only speeds up tasks but also ensures quality.
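To make the "shared tablet" concrete, here is a minimal sketch of the idea behind MLA, with illustrative dimensions and random weights that are not DeepSeek's actual configuration: instead of caching full per-head keys and values for every past token, the model caches one small latent vector per token and re-expands it on demand.

```python
import numpy as np

# Toy MLA-style attention cache: store a compressed latent per token,
# then reconstruct keys/values from it when attention is computed.
d_model, d_latent, n_heads, d_head = 512, 64, 8, 64

rng = np.random.default_rng(0)
W_down = rng.standard_normal((d_model, d_latent)) * 0.02            # compress hidden state
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02   # expand latent to keys
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02   # expand latent to values

hidden = rng.standard_normal((1, d_model))   # one new token's hidden state
latent_cache = [hidden @ W_down]             # cache only the 64-dim latent

# At attention time, keys and values are rebuilt from the tiny cache:
c = np.concatenate(latent_cache, axis=0)
keys, values = c @ W_up_k, c @ W_up_v

full_kv = 2 * n_heads * d_head               # floats per token in a vanilla KV cache
print(f"cache per token: {d_latent} vs {full_kv} floats "
      f"({full_kv // d_latent}x smaller)")   # 64 vs 1024 -> 16x
```

The payoff is the cache line: a vanilla KV cache here would hold 1,024 floats per token, the latent cache only 64, and that saving compounds over long contexts.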
Simultaneously, Multi-Token Prediction allows waiters to suggest desserts and drinks immediately after customers order main courses, preparing services in advance rather than waiting for customers to order one by one, making the service smoother and enhancing the customer experience.
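The same idea in toy code, again with made-up shapes rather than DeepSeek's real MTP module: extra output heads draft the tokens after the next one in a single forward pass, and a verification step keeps only the prefix that step-by-step decoding would have produced anyway.

```python
import numpy as np

# Toy multi-token prediction: three heads draft tokens t+1, t+2, t+3 at once.
d_model, vocab = 512, 1000
rng = np.random.default_rng(0)
heads = [rng.standard_normal((d_model, vocab)) * 0.02 for _ in range(3)]

hidden = rng.standard_normal(d_model)                 # hidden state at position t
draft = [int(np.argmax(hidden @ W)) for W in heads]   # 3 tokens drafted in one pass
print("drafted tokens:", draft)
# A verification pass then accepts the longest prefix of the draft that matches
# what ordinary one-token-at-a-time decoding would have emitted.
```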
The MoE model clearly knows each chef's culinary expertise. When facing complex dishes, the model intelligently assigns them to the most suitable chef based on the dish's characteristics, thereby improving processing efficiency and reducing unnecessary resource waste.
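In code, the "dispatcher" is a small gating network that scores every expert for each token and activates only the top k. The sketch below uses invented sizes and plain top-k routing; routing collapse, the failure mode DeepSeek's innovations guard against, is when this gate learns to send almost everything to the same few experts, leaving the rest untrained.

```python
import numpy as np

# Toy MoE layer: a gate picks the k best experts per token; only they run.
d_model, n_experts, k = 512, 8, 2
rng = np.random.default_rng(0)
gate = rng.standard_normal((d_model, n_experts)) * 0.02
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]

token = rng.standard_normal(d_model)
scores = token @ gate
top_k = np.argsort(scores)[-k:]                                # the k best experts
weights = np.exp(scores[top_k]) / np.exp(scores[top_k]).sum()  # normalized gate weights

# Only the chosen experts do any work; the other six stay idle, saving compute.
output = sum(w * (token @ experts[i]) for w, i in zip(weights, top_k))
print("routed to experts:", top_k.tolist())
```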
The application of these innovative technologies and architectures allowed the pre-training of DeepSeek-V3, the base model on which R1 is built, to be completed on a cluster of 2048 NVIDIA H800 GPUs (a performance-limited export version) at a cost of only $5.576 million. In contrast, companies like OpenAI train on thousands or even tens of thousands of top-tier graphics cards such as the NVIDIA A100 and H100, with training costs often running to hundreds of millions of dollars.
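As a back-of-the-envelope check, that $5.576 million follows directly from the GPU-hour accounting in the DeepSeek-V3 technical report, which prices H800 time at an assumed $2 per GPU-hour:

```python
# DeepSeek-V3's published training-cost arithmetic:
# ~2.788M H800 GPU-hours in total, at an assumed $2/GPU-hour rental price.
gpu_hours = 2_788_000
price_per_gpu_hour = 2.0                                           # USD, the report's assumption
print(f"${gpu_hours * price_per_gpu_hour / 1e6:.3f}M")             # -> $5.576M
print(f"~{gpu_hours / 2048 / 24:.0f} days on a 2048-GPU cluster")  # ~57 days wall-clock
```

The report itself notes that this figure covers the final training run only, excluding earlier research and ablation experiments.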
It is evident that while the AI industry is generally obsessed with the "computational power arms race," DeepSeek's "breakthrough" proves that rather than frantically stacking servers, optimizing the algorithm structure and implementing "targeted therapy" for technical bottlenecks can enable large models to shed the label of "power-hungry monsters" and usher in a new era of low cost and high performance.
Second, it can "bypass" generality and enter from vertical scenarios.
According to DeepSeek's benchmark data, DeepSeek-R1 makes extensive use of reinforcement learning in the post-training phase. In tasks such as mathematics, code, and natural-language reasoning, its performance is comparable to that of OpenAI's official o1 release, at only about 3% of the price.
Image source: DeepSeek
However, this does not mean DeepSeek-R1 has surpassed OpenAI's models. The GPT series prioritizes "general intelligence," investing heavily to achieve the versatility of a generalist. Most domestic enterprises developing large AI models follow the same approach, hoping their models will have no obvious weaknesses and can quickly reach commercial viability.
DeepSeek, on the other hand, chooses to enter from vertical scenarios, first pursuing better performance in certain fields (such as mathematics and code) and then gradually improving capabilities in other fields in phases. This is a development strategy that enables rapid growth and establishes differentiated advantages.
It is worth mentioning that ERNIE Bot, a large language model rooted in the Chinese market, has, according to Baidu's official introduction, surpassed the currently strongest GPT-4 model in multiple Chinese-language evaluations, which would make it one of the world's top AI models for understanding and generating Chinese content.
Therefore, "Jiedian Caijing" believes that Chinese AI enterprises, especially startups, do not need to focus solely on "versatile large models." They can choose vertical scenarios for targeted breakthroughs: This not only avoids the computational power struggle with general models but also establishes data moats, thereby carving out a niche in specific fields.
Finally, it can "bypass" commerce and adhere to the pursuit of technology.
The reason DeepSeek has caused such a sensation this time is not only due to the model's excellent performance and significantly reduced development and training costs but also because DeepSeek advocates for free and open source.
It should be noted that other well-known large models, whether Baidu's ERNIE Bot and Huawei's Pangu at home or OpenAI's GPT series and Meta's Llama overseas, are either closed-source from the start or gradually close up out of commercial and competitive considerations. Some claim to be open source but impose so many restrictions that they fall short of true openness.
In contrast, DeepSeek not only opens its model weights and inference code but also publishes detailed technical reports; it open-sources not just its full 671B R1 model but also a family of distilled models from 1.5B to 70B; and it adopts the most permissive MIT License, allowing anyone to use, modify, and distribute the models freely, including for commercial purposes.
Liang Wenfeng, the founder of DeepSeek, previously talked about his vision for open source, stating that in the future, DeepSeek can be responsible for basic models and cutting-edge innovations, while other companies build To B and To C businesses based on DeepSeek. "In this wave, our starting point is not to make a quick profit but to move to the forefront of technology and promote the development of the entire ecosystem."
Image source: "Zhanjiang Fabu" WeChat official account
In the view of "Jiedian Caijing," perhaps because it is backed by a multi-billion-yuan quantitative fund, or perhaps out of pure idealism, the DeepSeek team, at least for now, values technological breakthroughs over commercial monetization and industry prosperity over monopoly advantage.
As Jim Fan, a senior research scientist at NVIDIA, commented: "We live in an era where a non-American company is continuing the original intention of OpenAI, namely, to conduct truly open-source cutting-edge research that empowers everyone."
On January 28, several US officials alleged that DeepSeek amounted to "theft" and said its national-security implications were under investigation. Subsequently, some countries and organizations also began to "pay special attention" to DeepSeek:
According to Bloomberg, OpenAI and Microsoft recently launched a joint investigation, reviewing the accounts that DeepSeek used to access the OpenAI API last year and canceling their access due to suspected violations of service terms related to model distillation.
In domestic public opinion, some self-styled "geeks" have also begun picking at DeepSeek's technical details, alleging "plagiarism" or "a lack of technical transparency" and trying to prove it with papers and data.
Of course, the Western countries led by the US are concerned about more than just DeepSeek.
The Wall Street Journal recently published a report titled "It's Not Just DeepSeek. A Guide to the Chinese AI Companies You Need to Know," listing the Chinese large-model companies Americans should watch and noting that Baidu was the first in China to launch a public-facing generative AI, ERNIE Bot, which now has 430 million users.
Image source: The Wall Street Journal
Even if the truth of those overt accusations remains to be verified, so that they cannot yet be written off as deliberate smearing, suppression, or cognitive warfare by Western countries, what followed is harder to explain away: between January 25 and 29, DeepSeek's server cluster was hit with more than 230 million malicious DDoS requests per second, a total attack volume reportedly equivalent to three days of network traffic for all of Europe combined.
On January 28, the DeepSeek official website showed that its online services were under large-scale malicious attacks. Image source: DeepSeek official website
It is understood that, to protect DeepSeek, the 360 Security Response Center immediately sounded the alarm and locked onto the attack signatures; Huawei Cloud activated its traffic-scrubbing system to shield the servers; and the China Red Army Alliance traced the attack entirely to US sources within 12 hours and struck back.
Simultaneously, NetEase Thunderfire's game-server array was urgently converted into a traffic buffer pool; Dahua Technology used AI to pick out the 0.00017% of requests coming from real users; Cainiao Network contributed logistics algorithms to optimize bandwidth; DingTalk opened emergency communication channels to keep command flowing... Alibaba Cloud, Hikvision, Taishan Cloud, Inspur, and other enterprises also joined the DeepSeek defense battle, each lending its strength.
At 8 pm on January 29, after 83 hours of fierce battles, Chinese internet enterprises successfully suppressed 97.2% of the attack traffic, defending the dignity of DeepSeek and China's AI industry.
However, this cybersecurity defense battle under the AI rivalry between China and the US is only the beginning. According to monitoring by the QiAnXin XLab laboratory, the intensity of attacks against DeepSeek's online services suddenly escalated in the early hours of January 30, with attack instructions increasing by more than 100 times compared to January 28.
Moreover, at least two Mirai-variant botnets, HailBot and RapperBot, joined the attack, which involved 118 C2 ports across 16 C2 servers and came in two waves, at 1 am and 2 am.
Details of some attack instructions. Image source: QiAnXin
What was promised as fair competition decided by innovation has turned out, in practice, to be open and hidden attacks that are all but impossible to guard against.
To be honest, although DeepSeek has delivered real results in both the model itself and its path of innovation, it is far from having surpassed OpenAI, nor should its algorithms be "deified." Computational power remains a necessary condition for the sustained development of large models, and it remains our weak point. DeepSeek has found ways to use computational power more efficiently, but that does not make the demand for it dispensable.
Therefore, in the view of "Jiedian Caijing," the emergence of DeepSeek cannot yet be called a revolutionary technological breakthrough; its larger effect is to make everyone rethink the prevailing research assumptions and business models in the AI field. Yet DeepSeek has already drawn worldwide attention and is being besieged by every available means, no less than what Huawei once faced.
In this atmosphere, who is feeling insecure? Who is setting the pace? And who wants to perpetuate hegemony? It is actually self-evident.
Whether it is genuine coincidence or overthinking, watching "Nezha 2" I kept feeling that the "Battle of the Gods" mirrors the rivalry between China and the US: the immortal Wuliang Xianweng capturing monsters to refine into elixirs and boost his own power is the US harvesting global assets and suppressing dissenters, while the Dragon Clan helping Nezha counterattack the Jade Void Palace is like the recent DeepSeek defense battle.
I discussed the server attack incident and the content of the "Nezha 2" movie with DeepSeek and asked for a summary.
DeepSeek emerges as a modern-day Nezha, a fervent idealist striving to shatter barriers through technological advancements and redefine industry norms via an open-source ecosystem.
While the future remains uncertain about how far DeepSeek will advance and how long it will maintain its open-source status, the aspiration to transform the AI landscape is already thrilling.
After all, "we are all too young to comprehend the boundless expanse of possibilities the sky holds."
*The featured image has been generated by AI.