February 28, 2025
Before the buzz around DeepSeek-R1 has even faded, is the DeepSeek-R2 model already on the way?
A recent Reuters report, citing three people familiar with the matter, said that DeepSeek-R2, originally scheduled for release in May this year, is being developed at an accelerated pace and may arrive ahead of schedule. The new model is expected to generate better code and to reason in languages other than English. DeepSeek's parent company, the quantitative fund High-Flyer (Huanfang), responded only that official announcements should be consulted.
Considering that DeepSeek was founded on July 17, 2023, and has been operating for less than two years, it has already shipped multiple models covering programming, mathematical reasoning, general language, multimodal, and dialogue scenarios. Launching R2 before May falls within DeepSeek's normal release cadence. From R1 to R2, DeepSeek is reshaping the AI industry.
Giants in a panic, the R1 model creates a miracle
After OpenAI released GPT-4 in March 2023, the AI industry, rather than being energized by the new model, seemed somewhat subdued. Even though OpenAI went on to launch GPT-4o, o1, Sora, and other models, and other companies rolled out their own large language, video-generation, and multimodal models, none of them replicated the leap from GPT-3 to GPT-4.
It was not until the arrival of DeepSeek-R1 that the industry seemed to regain its vitality, prompting companies across sectors to actively embrace AI and the R1 model. Once R1's capabilities had been verified, a large number of domestic phone, TV, and PC makers integrated it into their products.
(Image source: DeepSeek)
Taking the phone industry as an example, Huawei, Honor, OPPO, and vivo were the first to officially announce R1 integration into their smart assistants, and Xiaomi, after some hesitation, followed suit by integrating R1 into its XiaoAi assistant. Recently, well-known tech blogger @i Ice Universe revealed that Samsung is also actively working to integrate R1 into the models it sells in China.
Moreover, internet companies such as Tencent, Baidu, and 360, despite having their own AI teams and in-house large models, still chose to plug R1 into their AI tools. That even rival AI companies did so amounts to an acknowledgment that R1 leads them in certain areas.
More crucially, before R1 appeared, AI tools were mostly free to use, but quite a few platforms offered paid tiers, ChatGPT and ERNIE Bot among them. To chase profitability, OpenAI even abandoned open source, turning from OpenAI into "CloseAI", while Baidu was a staunch advocate of the closed-source approach.
However, under the impact of the R1 model, both companies have changed course. Baidu announced that ERNIE Bot would become free to all users from April 1 and be officially open-sourced on June 30. OpenAI promised that the free tier of ChatGPT will get unlimited access to GPT-5 at the standard intelligence setting, and CEO Sam Altman polled users on X about which open-source project to build: an o3-mini-level model that still needs to run on GPUs, or a capable model small enough to run on edge devices.
(Image source: generated by Doubao AI)
DeepSeek also proved with R1 that techniques such as knowledge distillation, mixed parallelism strategies, dynamic sparse training, just-in-time compilation, and hierarchical sparse attention can dramatically cut the training cost of large AI models. Previously, AI companies needed massive compute clusters and vast amounts of data to train large models: OpenAI even hired people to write training data, and Musk's xAI built the world's largest compute cluster, with 100,000 GPUs, just to train Grok 3.
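Knowledge distillation, the first of the techniques listed above, is conceptually simple: a small "student" model is trained to match the softened output distribution of a large "teacher", so the student inherits capability without the teacher's training cost. The sketch below is a minimal illustration of the idea; the temperature value and toy logits are illustrative assumptions, not DeepSeek's actual recipe.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature softens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL divergence between the softened teacher and student distributions.

    Minimizing this teaches the student the teacher's "dark knowledge":
    the relative probabilities it assigns even to wrong answers.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy example: a student that mimics the teacher gets a much lower loss
# than one whose preferences are reversed.
teacher = [3.0, 1.0, 0.2]
student_good = [2.8, 1.1, 0.3]
student_bad = [0.1, 0.2, 3.0]
```

In a real training loop this loss is usually blended with the ordinary cross-entropy loss on ground-truth labels, but the distillation term is what lets a small model absorb the large one's behavior cheaply.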
Had that continued, AI might have become a contest of financial muscle among internet giants, eventually slowing down for lack of data. DeepSeek's outsized impact on the industry rests on three inseparable factors: strong capability, low training cost, and open source. Notably, by wrapping low-level layers such as NVIDIA's PTX instruction set and programming frameworks like CUDA, ROCm, and OpenCL behind a unified interface, its models no longer depend on NVIDIA GPUs and can be deployed more freely across a variety of hardware.
After R1 shook the foundations of the AI industry, DeepSeek did not stop there. The upcoming R2 model may now pick up where R1 left off and transform the industry further.
Continuing the tradition of low cost and high performance, the R2 model raises high expectations
The logical reasoning of R1 is on par with models trained at far higher cost by companies such as ByteDance, Alibaba, and Moonshot AI, and it benchmarks against the o1 model from industry leader OpenAI. But OpenAI has more than o1: ChatGPT Pro subscribers can already use the more powerful o1 Pro and o3 models. Having caught up with o1, R1's successor will naturally take aim at o3.
With support from technologies such as dynamic sparse architectures, quantized knowledge distillation, the mixture-of-experts (MoE) architecture, and multi-head latent attention (MLA), DeepSeek's cost of training R2 is expected to fall further. A co-adaptive system for training data and model parameters could dynamically adjust the interplay between the two, improving the efficiency, generalization, and adaptability of the learning system.
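The MoE architecture mentioned above saves compute by routing each token to only a few "expert" sub-networks, so most of the model's parameters sit idle on any given token. A minimal sketch of top-k gating, with toy scalar functions standing in for real expert networks (the expert functions and gate values here are invented for illustration):

```python
import math

def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts for one token and renormalize
    their gate weights with a softmax over just the selected experts."""
    idx = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    m = max(gate_logits[i] for i in idx)
    exps = {i: math.exp(gate_logits[i] - m) for i in idx}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_forward(x, experts, gate_logits, k=2):
    """Combine the outputs of only the selected experts, weighted by the gate.
    The unselected experts are never evaluated, which is the compute saving."""
    weights = top_k_route(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Toy setup: 4 "experts" are simple scalar functions; only 2 run per input.
experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x, lambda x: x * x]
out = moe_forward(3.0, experts, gate_logits=[0.1, 2.0, -1.0, 1.5], k=2)
```

In a production MoE model the gate logits come from a learned router and the experts are feed-forward layers, but the routing logic follows the same shape: select top-k, renormalize, and mix.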
In recent days, DeepSeek has been open-sourcing a batch of code libraries, one release per day. The newly announced DeepGEMM implements FP8 general matrix multiplication, supports both dense and MoE models, and achieves kernel performance rivaling expert-tuned libraries in only about 300 lines of code, which lowers the inference cost of large AI models. This technology will naturally find its way into R2.
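The core idea behind FP8 matrix multiplication, doing the arithmetic in a narrow number format while carrying a separate scale factor to preserve dynamic range, can be illustrated without GPU code. The sketch below simulates it with 8-bit integers as a stand-in, since standard Python has no FP8 type; DeepGEMM's actual CUDA kernels are far more sophisticated, and this toy matrix and vector are invented for illustration.

```python
def quantize(row, levels=127):
    """Map floats into [-levels, levels] integers plus one scale per row."""
    scale = max(abs(v) for v in row) / levels or 1.0
    return [round(v / scale) for v in row], scale

def quantized_matvec(matrix, vec):
    """Multiply in 8-bit integer arithmetic, then rescale back to floats.
    The cheap low-precision inner products are where GEMM kernels save time."""
    qv, sv = quantize(vec)
    out = []
    for row in matrix:
        qr, sr = quantize(row)
        acc = sum(a * b for a, b in zip(qr, qv))  # low-precision accumulate
        out.append(acc * sr * sv)                 # undo both scale factors
    return out

# The quantized result closely tracks the exact float result.
m = [[0.5, -1.2, 3.0], [2.0, 0.1, -0.7]]
v = [1.0, 2.0, -1.0]
exact = [sum(a * b for a, b in zip(row, v)) for row in m]
approx = quantized_matvec(m, v)
```

The trade-off is a small rounding error in exchange for much cheaper multiplies and halved memory traffic relative to FP16, which is why FP8 GEMM cuts inference cost.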
(Image source: generated by Doubao AI)
The R1 paper notes that increasing reinforcement learning (RL) data not only improves a large model's reasoning on complex tasks but also causes complex behaviors, such as reflection and exploring alternative approaches, to emerge spontaneously. At this stage R1 was trained with relatively little RL data, which will be increased significantly in future versions.
Overall, like R1, R2 will be built on the V3 foundation and benchmarked against OpenAI's o3, making it an iterative upgrade rather than a redesign. With more RL data behind it, R2 is expected to improve its reasoning ability and response speed, leaning on its capacity for "reflection" to produce more accurate reasoning results.
The future V4 will benchmark GPT-4.5, which OpenAI plans to release in the middle of this year. The R3 model, developed based on the V4 foundation + RL, will compete with OpenAI's next-generation model GPT-5.
Beyond improvements in cost and capability, the R2 model will take the open-source ethos to new heights. Starting with o1, OpenAI has doubled down on closed source: its models are no longer open, the chain of thought is hidden from users, and users are even warned that prompting the model to reveal its full chain of thought may get their accounts restricted. GPT-4.5 will be OpenAI's last independently released foundation model; GPT-5 will usher in the era of hybrid models, turning large AI models into a complete "black box".
(Image source: generated by Doubao AI)
DeepSeek sticks to open source, allowing other companies and individuals to deploy, use, modify, and distribute the R2 model, continuously pushing the AI industry forward. As 360 CEO Zhou Hongyi once put it, without open source there would be no Linux and no internet, and his own company would not have come this far without building on open-source technology. Closed source may offer more revenue opportunities, but open source accelerates the industry's progress.
Since the release of GPT-4, large AI models have continued to improve, but there have been no epoch-making changes. R1, with its low cost and high performance, has changed the AI industry to a real degree. R2 may not replicate R1's blockbuster success, but its significantly improved reasoning will put more pressure on other AI companies.
DeepSeek becomes the "king of competition", giving rivals a headache?
Only 13 months separated the release of DeepSeek-V1 from R1, and the gap between R1 and R2 may be just three to four months, earning DeepSeek the title of "ultimate king of competition". Baidu, Tencent, 360, and the like can integrate R2 just as they did R1, but leading internet companies ultimately need large AI models of their own rather than relying on DeepSeek's open-source releases to upgrade their AI tools.
In my experience, the vast majority of domestic large AI models fall short of R1 at deep reasoning, with only a few matching it in certain scenarios. The upcoming R2 raises the pressure further: AI companies need to strengthen their models before R2 goes live or risk being left behind by DeepSeek.
Using DeepSeek's open-source models to improve the capabilities of AI tools is only a stopgap measure. Baidu, Tencent, and 360 have never given up on the development of large AI models. For example, Baidu's ERNIE 4.5 is already on the way.
(Image source: generated by Doubao AI)
As users, we naturally prefer AI tools that integrate multiple models, letting us pick whichever is most useful. Leading AI companies in particular, with their greater compute, can serve R1 with smoother responses on reasoning tasks, a better experience than DeepSeek's own website or app.
DeepSeek has not only delivered the outstanding R1 model but also, through the various low-cost techniques it adopted, pointed a direction for other AI companies. With methods such as knowledge distillation and mixed-precision training, any AI company can train large models at low cost; how capable those models turn out depends on the strength of its R&D team.
With open source as its keynote, DeepSeek will act as a catalyst for the AI industry, pushing every AI company to speed up development of new models and to keep exploring new directions.
Source: Leitech