02/11 2025
DeepSeek's emergence has broken through long-standing constraints on computing power and model design, yet numerous challenges remain, such as targeted model distillation, data system construction, and aligning the interests of ecosystem stakeholders. This is no longer merely a technical discussion but an industrial proposition for the upward development of the entire AI sector.
Undoubtedly, the tide of China's AI large model industry in 2025 will surge forward, unstoppable.
Author | Dou Dou
Editor | Pi Ye
Produced by | Industry Expert
DeepSeek's rise seems to gradually sketch out a deterministic blueprint for the practical application of AI.
Over the past few years, the entry barriers for AI large models have been clear: trillion-level parameter scales, robust computing power support, and extensive, high-quality data resources—all indicative of steep "entry prices".
During the 2025 Spring Festival, DeepSeek, akin to a dark horse, forcibly rewrote the rules of the Chinese and global AI large model arena.
This team, rooted in a quantitative institution, significantly reduced large model parameters to one-tenth of their original size. Leveraging reinforcement learning and model distillation, a small model outperformed GPT-4 in solving mathematical problems. Furthermore, DeepSeek open-sourced its code and APIs, showcasing capabilities on par with OpenAI at ultra-low prices, astonishing netizens worldwide with this "mysterious Oriental power".
While these accomplishments undoubtedly shook the AI industry, the questions more worth pondering are the core propositions that hung over the AI industry in 2024: namely, how close are we to industrial-grade large models? At the convergence of data, computing power, and models, where AI applications nearly reached consensus last year, what impact will the DeepSeek phenomenon bring?
In 2025, the curtain on industrial digital intelligence has quietly risen.
1. Technological Paradigm Shift
Models Enter the Era of "Low Cost, High Quality"
Traditional AI large models face numerous hurdles to widespread adoption. Chief among them is the seemingly bottomless "money burn".
Take GPT-4 as an example. Its training data reportedly runs to 13 trillion tokens, spanning text from across the internet. Annotating data at that scale is not only costly but time- and labor-intensive. Its computing power demands are likewise astronomical, relying on clusters of tens of thousands of A100 GPUs, with a single training run reportedly costing over $100 million. These costs and resource requirements hinder technological implementation.
DeepSeek's popularity stems from its ability to achieve "self-evolution" through pure reinforcement learning (RL), granting it a significant edge in data preparation.
In other words, it eliminates the need for annotated data, drastically reducing data preparation costs and complexity, saving developers considerable time and effort, allowing them to focus more on model training and optimization.
Moreover, DeepSeek's reward design is minimalist, utilizing only "answer correctness" and "format specifications" as reward signals. This streamlined reward mechanism mitigates the risk of "cheating" posed by complex reward models, enhancing model training efficiency and stability.
This minimalist reward design also better guides the model's development in the right direction, improving the model's training effect and avoiding unexpected deviations.
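The two-signal reward described above can be sketched as a simple rule-based function. The sketch below is an illustration only, assuming a hypothetical output format with `<think>` and `<answer>` tags; it is not DeepSeek's actual implementation.

```python
import re

def rule_based_reward(response: str, reference_answer: str) -> float:
    """Minimal sketch of a two-signal reward: format compliance + answer accuracy.

    Assumes (hypothetically) that the model is asked to wrap its reasoning in
    <think>...</think> and its final answer in <answer>...</answer> tags.
    """
    reward = 0.0

    # Format reward: did the model follow the required output structure?
    if re.search(r"<think>.*?</think>\s*<answer>.*?</answer>", response, re.DOTALL):
        reward += 0.5

    # Accuracy reward: does the extracted answer match the reference?
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match and match.group(1).strip() == reference_answer.strip():
        reward += 1.0

    return reward

resp = "<think>7 * 6 = 42</think><answer>42</answer>"
print(rule_based_reward(resp, "42"))  # 1.5: format reward + accuracy reward
```

Because both signals are mechanical string checks rather than a learned reward model, there is nothing for the policy to "cheat" against beyond actually formatting well and answering correctly.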
Additionally, DeepSeek employs the GRPO algorithm, using group scoring to replace the traditional Critic model and reducing computing power consumption by over 30%, further easing hardware demands, i.e., the industry's habitual reliance on GPU "cards".
Notably, its model capabilities remain robust despite reduced computing power.
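The core of the group-scoring idea can be sketched in a few lines: sample several responses per prompt, then normalize each response's reward against the group's mean and standard deviation, so no separate Critic network needs to be trained or run. This is a simplified illustration of the advantage computation only, not the full GRPO objective.

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Sketch of GRPO's group baseline: each sampled response is scored
    relative to its own group's statistics instead of a learned Critic."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All responses scored equally: no signal to prefer any of them.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Four sampled answers to one prompt, scored by a rule-based reward:
rewards = [1.5, 0.5, 0.5, 1.5]
print(group_relative_advantages(rewards))  # [1.0, -1.0, -1.0, 1.0]
```

Dropping the Critic removes an entire second network's forward and backward passes from each training step, which is where the computing power savings come from.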
A paper published by DeepSeek presents data showing that DeepSeek-R1 achieved a Pass@1 score of 79.8% on the AIME 2024 test, slightly ahead of OpenAI-o1-1217. On MATH-500 it scored 97.3%, on par with OpenAI-o1-1217 and significantly outperforming other models.
With DeepSeek, the world seems to be realizing that computing power and parameter counts are no longer AI's entry barriers. More accurately, DeepSeek demonstrates a lower-threshold, lower-cost path to AI implementation.
From an industrial standpoint, medium and large enterprises stand to benefit the most from this change. Over the past two years, large state-owned enterprises, universities, and public service departments alike have publicly tendered for large model projects. A significant portion of these projects involve pre-training, often with unit prices in the tens or even hundreds of millions of yuan, representing targeted investments by these organizations.
However, post-DeepSeek, it's predictable that the targets of medium and large-scale large model projects will significantly shift this year. For medium and large enterprises, even state-owned ones, they can deploy large model projects at a lower cost or shift focus to data governance to further enhance the final model's effectiveness.
Small technology companies also benefit. Previously, they may have been deterred by funding and technical constraints from venturing into AI. But DeepSeek's emergence offers them possibilities. Enterprises can develop AI applications tailored to their business needs based on DeepSeek at relatively low costs, fostering business development and innovation.
Overall, with the reinforcement learning (RL) technological paradigm shift, not only will the thresholds and costs for AI large model implementation decrease, but more opportunities for enterprises and developers to participate in AI innovation will emerge. This not only propels AI technology development but also provides new impetus for various industries' digital intelligence transformation and upgrading.
2. Accelerated Open Source:
The Era of Vertical Small Models Has Arrived
In the paper published by DeepSeek, besides the RL technological paradigm shift, another highlight is the construction of a cross-dimensional knowledge distillation system.
Data shows that DeepSeek-R1-Distill-Qwen-7B surpassed the original QwQ-32B-Preview with a score of 55.5% in the AIME 2024 evaluation, achieving a 23% performance improvement with an 81% reduction in parameter scale. Its 32B version even reached a stunning accuracy rate of 94.3% in the MATH-500 test, nearly 40 percentage points higher than traditional training methods.
By deconstructing the large model's reasoning logic into transferable cognitive patterns and then injecting them into the 7B small model through a dynamic weight allocation mechanism, it realizes the transfer of "thinking paradigms" rather than mere "knowledge memories".
Under this technical approach, the small model not only inherits the large model's problem-solving ability but also acquires meta-abilities such as problem decomposition and logical deduction. This means that the large model's reasoning mode can be distilled into the small model, outperforming direct reinforcement training on the model.
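In its simplest form, this kind of reasoning distillation amounts to collecting the teacher's full chain-of-thought traces, filtering them for correctness, and using them as supervised fine-tuning data for the small model. The sketch below illustrates only that data-collection step; `teacher_generate` and the checker are toy stand-ins, not DeepSeek's pipeline.

```python
# Sketch of reasoning distillation: the teacher's chain-of-thought traces
# become supervised fine-tuning data for the small model, so the student
# learns the *thinking pattern*, not just final answers.

def build_distillation_dataset(prompts, teacher_generate, is_correct):
    """Collect teacher reasoning traces, keeping only verified-correct ones."""
    dataset = []
    for prompt in prompts:
        trace = teacher_generate(prompt)      # full reasoning + final answer
        if is_correct(prompt, trace):         # filter out wrong traces
            dataset.append({"prompt": prompt, "completion": trace})
    return dataset

# Toy stand-ins for illustration (a real teacher would be a large model):
prompts = ["2+2?", "3*3?"]
fake_traces = {"2+2?": "think: 2+2=4. answer: 4",
               "3*3?": "think: 3*3=8. answer: 8"}
teacher = lambda p: fake_traces[p]
check = lambda p, t: t.endswith(str(eval(p.rstrip("?"))))

# Only the correct trace survives; the wrong "3*3=8" trace is discarded.
print(build_distillation_dataset(prompts, teacher, check))
```

Filtering on answer correctness is what keeps the student from inheriting the teacher's mistakes along with its reasoning style.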
In the AI realm, the perception that "the larger the model, the stronger the performance" has long prevailed. The evolutionary trajectory from GPT-3 to GPT-4 seemingly confirms the rule that "parameter scale determines model capability".
With the advent of this "distillation + reinforcement learning" composite training method, it appears that the era of small models is finally upon us.
It's worth noting that for many enterprises, especially small and medium-sized enterprises and specialized vertical field enterprises, pursuing model performance is often constrained by the huge computational resource costs of large models.
After DeepSeek proves small models' efficacy, these enterprises can reduce expenditures on hardware equipment procurement and leasing (e.g., high-performance servers, GPUs) and lower energy consumption costs.
For instance, a small medical image analysis enterprise originally needed to build an expensive computing cluster to use large models for image data processing. Now, with the optimized model, it can complete tasks on ordinary computing devices, significantly cutting costs.
Moreover, with model effectiveness, enterprises with industry expertise typically have a deep understanding of their business processes and data characteristics. They can often integrate models into existing business systems more swiftly.
Since small models generally have simpler architectures and fewer parameters, developers can more easily customize them to meet specific industry needs. For example, a financial risk control enterprise, leveraging its expertise in financial risk assessment, can quickly embed the appropriate model into its risk control system, shortening the development cycle and bringing model deployment and business optimization online sooner.
In a highly competitive market, this advantage can enable certain enterprises to overtake rivals in AI, becoming rule-makers and leaders in vertical AI tracks.
3. Efficiency and Scenario Breakthroughs
The Era of Terminal-Side Application Explosion Has Arrived
As we all know, in practical applications, especially in edge computing and real-time decision-making scenarios, traditional AI models often face numerous limitations.
In edge computing scenarios, limited device resources, such as mobile phones and smart glasses, make it difficult to run large AI models, thereby restricting AI technology's application in these fields.
Additionally, in real-time decision-making scenarios, like financial transactions and industrial production, traditional AI models' reasoning speed and accuracy often fall short of demands.
DeepSeek provides a novel solution. Its breakthroughs in model compression, reasoning efficiency, and training cost optimization strongly support its implementation in multiple scenarios, ushering in significant efficiency and scenario breakthroughs.
Through model compression technology, DeepSeek enables its optimized models to better adapt to resource-limited devices, such as edge computing devices like smart glasses. This equips edge computing devices with enhanced AI capabilities, providing users with more convenient and intelligent experiences.
For example, in smart glasses, DeepSeek can achieve faster and more accurate image recognition and voice interaction functions. Users can more efficiently obtain information, navigate, and recognize objects through smart glasses, greatly enhancing their practicality and application scenarios.
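One common compression technique behind this kind of edge deployment is post-training quantization. The sketch below shows symmetric int8 weight quantization as a general illustration of the idea, not DeepSeek's specific method: weights are stored as 8-bit integers plus one scale factor, cutting weight memory roughly 4x versus float32.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric int8 post-training quantization (illustrative sketch):
    map float weights onto [-127, 127] using a single per-tensor scale."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for use at inference time."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.01], dtype=np.float32)
q, s = quantize_int8(w)
print(q.nbytes, w.nbytes)  # 3 vs 12 bytes: 4x smaller weight storage
print(np.max(np.abs(dequantize(q, s) - w)))  # small reconstruction error
```

The accuracy cost is a bounded rounding error per weight (at most half the scale), which is why quantized models can run on phones and glasses with little quality loss.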
In real-time decision-making scenarios, its efficient reasoning ability also plays a crucial role.
Taking financial transactions as an example, financial institutions need to analyze and process vast amounts of market data in a very short time to make accurate investment decisions. DeepSeek can swiftly analyze and predict data, providing real-time decision support for financial transactions and helping financial institutions improve trading efficiency and profitability.
In industrial production, real-time quality inspection and fault diagnosis are also paramount. DeepSeek can also rapidly analyze data during the production process, promptly detecting quality issues and equipment failures, thereby enhancing production efficiency and product quality and reducing production costs.
It can be said that in 2025, DeepSeek's emergence may spark a new wave of terminal-side application explosions. Its scenario breakthroughs not only showcase its technological prowess but also provide robust technical support and new solutions for various industries' digital transformation and upgrading.
4. Ecological Transformation
Large Players Refine Models, Small and Medium Players Build Applications
DeepSeek also catalyzes changes in the AI ecosystem, and these changes will present more possibilities for AI's industry implementation.
The reality is that the current AI industry exhibits a "pyramid structure": giants like OpenAI and Google control the foundation models; middle-tier enterprises depend on API calls and fall into "data hollowing"; and small and medium developers at the bottom, lacking customization capabilities, become ecological appendages.
This structure's fatal flaw is stagnant innovation: to maintain their monopoly, the giants inevitably restrict model openness.
DeepSeek open-sources its core models and API customization capabilities, disrupting the previous OpenAI-dominated "pyramid" ecosystem.
In the new ecological model, large players can focus on refining models, leveraging their technical strength and resource advantages to continuously optimize model performance and capabilities.
For example, platforms like Alibaba Cloud and Tencent Cloud can become "model supermarkets," offering hundreds of small vertical-field models to meet the needs of different industries and users. These large players can keep researching and introducing more advanced model architectures and algorithms, propelling AI technology forward.
Meanwhile, small and medium-sized players can concentrate on developing applications, swiftly crafting dedicated AI tools on top of open-source models without relying on the "black box" capabilities of tech giants. This gives them broader horizons and opportunities, letting them harness their flexibility and innovation to build AI applications better aligned with user needs and industry-specific traits.
For instance, some can focus on deterministic needs like industrial quality inspection and supply-chain forecasting, fine-tuning models via APIs to create efficient, precise AI applications and deliver tailored solutions. This ecological shift also brings multiple advantages: technological democratization, a positive ecological cycle, and scenario customization.
Technological democratization enables non-tech enterprises, such as those in manufacturing and agriculture, to participate in the application and innovation of AI technology, accelerating the digital transformation and upgrading of various industries. A positive ecological cycle optimizes models through industry data contributed by developers and shares profits from these models, creating a collaborative "data - model - application" network that propels the sustainable growth of the AI industry.
DeepSeek's ecological transformation not only presents new growth avenues for the AI industry but also injects fresh momentum into the digital transformation and upgrading of various sectors. As DeepSeek technology continues to evolve, its potential for ecological transformation will be further unleashed, opening up even more possibilities for the development of the AI industry.
5. 2025: New Directions for AI
By 2025, the direction for implementing AI in industry is becoming increasingly clear.
In 2025, AI development will gradually shift from a past obsession with technology to a greater emphasis on practical, business-oriented implementation. This transformation is evident in various aspects, including technology research and development, commercialization pathways, and the construction of ecological alliances.
In technology R&D, enterprises are increasingly recognizing that blindly increasing model parameters is not a wise strategy. A model with hundreds of billions of parameters is not a universal solution, and the success of DeepSeek-R1 and its distilled variants convincingly demonstrates that models with tens of billions of parameters can rival far larger ones through algorithm optimization.
As a result, future R&D investments will focus more on reinforcement learning (RL) and model distillation technology.
Compared to simply expanding data volumes, RL's self-evolving capabilities and the ecological value of distillation technology offer greater potential in commercial applications. These technologies allow enterprises to enhance model performance while reducing costs and expanding application scenarios, thereby embarking on a cost-effective path of AI and business integration.
In selecting commercialization pathways, the B-end market has emerged as a priority for deployment.
By collaborating with leading enterprises across various industries, such as automakers, hospitals, and banks, to jointly develop industry-specific models and adopt a pay-per-performance model, enterprises can not only forge deep bonds with customers but also foster collaboration in value creation.
Simultaneously, enterprises should not overlook the potential market demands of small and medium-sized customer groups. By offering open-source models and low-code platforms, these customers can access convenient "AI capability containers," effectively reducing customization costs and meeting the diverse needs of the long tail market, thereby achieving comprehensive market coverage.
Building an ecological alliance is also crucial for enterprise development.
On one hand, open-sourcing core frameworks, like DeepSeek's open RL training toolchains, can attract developers to actively participate in ecological construction, pooling the wisdom and resources of all parties to create powerful technical synergies.
On the other hand, establishing cross-border alliances is essential. Collaborating with chip manufacturers (e.g., Huawei), cloud service providers (e.g., Alibaba Cloud), and specialized enterprises in vertical fields to form an iron triangle cooperation model of "computing power - model - scenario" can foster collaborative innovation throughout the industrial chain and create a win-win industrial ecological environment.
Judging from the current industry landscape, while it is temporarily challenging for Chinese AI large models to comprehensively surpass OpenAI in general capabilities, there are ample opportunities to achieve differentiated breakthroughs through deep cultivation in vertical scenarios and open cooperation within the ecosystem.
Looking ahead to 2025, the development goal for China's AI industry is to create a batch of "small but beautiful" industry models. These models will establish a local advantage over Western "large and comprehensive" models in specific fields, gradually penetrating and expanding into the realm of general intelligence through in-depth application and optimization in specific industries.
This development path not only fully leverages China's industrial strengths in specific areas but also provides an innovative model and solution with Chinese characteristics for the advancement of the global AI industry, fostering the diversified development and application of AI technology worldwide.
Final Thoughts:
DeepSeek's technological innovation and ecological openness have transformed AI from a "game of giants" to a "creation of all people." The mutual catalysis of digitization and AI has created a flywheel effect where "the more prevalent technology becomes, the richer the data, and the smarter the models."
However, the implementation of industrial AI should be approached with caution. While DeepSeek has broken some inherent constraints in computing power and models, many challenges remain, such as targeted model distillation, data system construction, and coordinating the interests of all parties within the ecosystem. This is no longer solely a technical issue but rather an industrial proposition for the upward development of the industry.
Nevertheless, it is undeniable that the wave of China's AI large model industry in 2025 will inevitably surge forward, unstoppable.