DeepSeek Breaks Through: How China's Homegrown Large Model Defies the Odds

December 3, 2025

From Attention to Agent: The Fundamental Solution for Capability Leaps

By late 2025, Google had largely regained its position as the global leader in large model technology. Gemini 3 Pro burst onto the scene, surpassing all open-source models across multiple authoritative benchmarks and re-establishing the technical dominance of the closed-source camp. Suddenly, industry doubts resurfaced—“Have open-source models reached their limit?” and “Has the Scaling Law truly hit a wall?”—as a sense of stagnation spread through the open-source community.

But at this critical moment, DeepSeek refused to stay silent. On December 1, it unveiled two groundbreaking models: DeepSeek-V3.2, which matches GPT-5 in reasoning performance, and the Speciale version, which excels in mathematics, logic, and multi-round tool invocation. This was not just a showcase of technical prowess but also a direct response to the new “closed-source ceiling,” achieved despite limited computational resources.

This was no simple model update. DeepSeek sought a new path in the post-Scaling era: How to bridge the pre-training gap through architectural innovation? How to achieve low-token, high-efficiency agent performance via “thinking chains during tool use”? Most crucially, why has Agent capability evolved from a secondary feature into the core engine of model advancement?

This article analyzes three key themes: How did DeepSeek break through technical bottlenecks? Why did it prioritize Agent capabilities within the open-source ecosystem? And does this mean open-source models still have a path to penetrate closed-source fortresses?

I. From Lagging to Leading: How DeepSeek Joined the Top Tier

In the arena of elite AI models, open-source contenders have long been seen as mere “catch-up” players, incapable of true competition. Yet DeepSeek-V3.2’s performance defies this narrative.

According to DeepSeek’s official data, V3.2 now rivals GPT-5 in public reasoning benchmarks, trailing only slightly behind Gemini 3 Pro. In critical evaluations, it consistently outperforms Kimi-K2-Thinking and sets new records for open-source reasoning capabilities in China. Across mathematics, logic, and complex Q&A tasks, V3.2 approaches the performance of closed-source leaders, securing its place among the global “second tier.”

The breakthrough lies not in simply “scaling up” but in reimagining the foundational architecture, particularly through the introduction of the DeepSeek Sparse Attention (DSA) mechanism. Traditional Transformer architectures compute attention between every token and all preceding tokens, leading to computational complexity that grows quadratically with sequence length—a major cost bottleneck in large model inference.

DSA’s “Lightning Indexer” acts as a “rapid predictor,” screening critical token pairs through minimal, low-precision index heads (operable in FP8) instead of full attention allocation. This reduces the core attention mechanism’s complexity from quadratic to near-linear, maintaining stable computational loads even with ultra-long 128K context inputs.
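The mechanism described above can be sketched in a few lines. This is a toy single-query illustration, not DeepSeek's implementation: the low-rank projection `P` stands in for the low-precision indexer head (the real indexer is multi-head and runs in FP8), and `k` is an invented budget. The point is the shape of the computation: a cheap O(n) scoring pass selects the top-k tokens, and exact attention runs only over those survivors.

```python
import numpy as np

def indexer_sparse_attention(q, K, V, P, k=8):
    """One query step of indexer-guided sparse attention (illustrative).

    P is a tiny low-rank head standing in for the "lightning indexer":
    it scores all past tokens cheaply, then full-precision attention is
    computed only over the k highest-scoring tokens.
    """
    qi, Ki = P @ q, K @ P.T                 # cheap low-dim projections
    idx_scores = Ki @ qi                    # O(n * r) indexer pass
    top = np.argsort(idx_scores)[-k:]       # keep the k most relevant tokens

    logits = K[top] @ q / np.sqrt(q.size)   # exact attention, k tokens only
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ V[top]

rng = np.random.default_rng(0)
n, d, r = 64, 16, 4
q, K, V = rng.normal(size=d), rng.normal(size=(n, d)), rng.normal(size=(n, d))
P = rng.normal(size=(r, d)) / np.sqrt(d)
out = indexer_sparse_attention(q, K, V, P)
print(out.shape)  # (16,)
```

Per query, cost drops from O(n·d) attention over all past tokens to O(n·r) indexing plus O(k·d) attention, which is the source of the near-linear scaling claimed for long contexts.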

Notably, DeepSeek avoided radical replacement by adopting a “dense warmup—sparse transition” dual-phase training strategy. During early pre-training, the original attention structure was retained while training the indexer to mimic its distribution. Later, during post-training, the architecture gradually shifted to sparse mode without interruption. This “gradual architectural evolution” enabled V3.2 to achieve efficiency gains without sacrificing precision in long-context reasoning. Tests like Fiction.liveBench and AA-LCR showed marked improvements in information recall, contextual consistency, and compressed expressiveness.
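The warmup phase can be made concrete with a small sketch. The assumption here (consistent with "training the indexer to mimic its distribution") is a KL objective between the frozen dense attention distribution and the indexer's softmax; the function name and test values are invented for illustration.

```python
import numpy as np

def indexer_warmup_loss(dense_attn, idx_scores):
    """Warmup objective sketch: with the dense attention branch frozen,
    train the cheap indexer so softmax(idx_scores) matches the dense
    attention distribution, via KL(dense || indexer)."""
    p = np.asarray(dense_attn, dtype=float)
    q = np.exp(idx_scores - np.max(idx_scores))
    q /= q.sum()
    eps = 1e-12
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# A perfectly matched indexer has zero loss; a mismatched one does not.
p = np.array([0.7, 0.2, 0.1])
good = indexer_warmup_loss(p, np.log(p))
bad = indexer_warmup_loss(p, np.array([0.0, 0.0, 5.0]))
print(good < bad)  # True
```

Once the indexer's ranking tracks the dense distribution well, switching the model to sparse selection changes which tokens are attended to far less abruptly, which is why the transition can happen without interrupting training.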

But the most industry-defining breakthrough lies elsewhere. DeepSeek introduced the “Thinking in Tool-Use” paradigm in V3.2, transforming the model’s execution chain from a linear “think→invoke tool→end” process into an interleaved “think→invoke→continue thinking→re-invoke” logic. This aligns closely with the “Interleaved Thinking” direction in Agent research, enhancing logical continuity in tool invocation and enabling models to reuse intermediate reasoning states across tasks.

This capability is critical in real-world Agent scenarios, where tasks rarely unfold in a single step but demand multi-round information gathering, verification, and strategy refinement. Traditional models “forget” with each tool invocation, forcing them to restart reasoning from scratch. V3.2, however, retains “reasoning trajectories” as part of its context, extending the original thought path after new information arrives. This reduces redundant token generation and minimizes logical disruptions caused by state drift.
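The contrast between the linear and interleaved execution chains can be sketched as a loop. Everything here is hypothetical scaffolding (the `llm` and `tools` callables are stand-ins, not any real API); the key detail is that the growing `trace` keeps every intermediate thought, so the model resumes its reasoning after each tool result instead of restarting.

```python
def run_agent(llm, tools, user_msg, max_steps=8):
    """Interleaved think -> invoke -> continue-thinking loop (sketch)."""
    trace = [("user", user_msg)]
    for _ in range(max_steps):
        step = llm(trace)                      # model sees the full trace
        trace.append(("thought", step["thought"]))
        if step.get("tool") is None:           # model decides it is done
            return step["answer"], trace
        result = tools[step["tool"]](step["args"])
        trace.append(("tool_result", result))  # appended, never a reset
    return None, trace

# Toy stand-in "model": requests one addition, then answers from the trace.
def toy_llm(trace):
    results = [v for role, v in trace if role == "tool_result"]
    if not results:
        return {"thought": "I need to add the numbers", "tool": "add", "args": (2, 3)}
    return {"thought": "I have the sum", "tool": None, "answer": results[-1]}

answer, trace = run_agent(toy_llm, {"add": lambda a: a[0] + a[1]}, "What is 2+3?")
print(answer)  # 5
```

In a linear pipeline the second `llm` call would start from a blank slate; here it reads its own earlier thought alongside the tool result, which is exactly the continuity the paragraph above describes.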

Ultimately, DeepSeek’s leap forward stems not from brute-force FLOPs but from “smarter computational allocation.” DSA optimizes compute distribution, while interleaved thinking stabilizes tool invocation. Together, they advance a singular goal: transforming models into “sustainable thinking agents” rather than mere language completers.

This signals a broader shift: as scaling dividends diminish, future model competition will pivot from “parameter count” to “thinking organization” and “energy efficiency.” V3.2 embodies this transition.

II. Betting on Agents: Not Trend-Following, But a Strategic Inflection Point

Beyond technical advancements, DeepSeek-V3.2’s strategic pivot lies in elevating “Agent capability” alongside “reasoning ability” as a core metric in its technical documentation—a direction rarely emphasized by domestic open-source models. In DeepSeek’s view, Agents are no longer secondary modules for tool invocation but bridges between model capabilities and industrial adoption, even serving as vanguards for future large model platforms.

This judgment is grounded in reality. Over the past year, the large model industry has shifted: enterprises now recognize diminishing marginal value from “smarter chatbots.” True commercial viability lies in Agents with “actionable capabilities,” from automated report writing and dashboard generation to batch ticket processing and code repair. Businesses pay for executable intelligence, not human-like dialogue.

This explains DeepSeek’s heavy investment in Agent training systems post-V3.2, including a self-built, large-scale task generation pipeline. The team synthesized over 1,800 Agent environments and designed ~85,000 high-complexity task prompts around Agent missions. These tasks were not manually annotated but automatically generated via environment builders and trajectory scoring mechanisms, with closed-loop training enabled by reinforcement learning (RL).

This approach diverges from traditional pre-training’s reliance on massive conversational datasets. Agent task trajectories offer stronger structural integrity, verifiability, and scarcity. Once constructed, their training efficacy far exceeds conventional “dialogue completion.” Critically, RL mechanisms allow continuous model optimization through feedback loops, breaking free from the unidirectional iteration of pre-training.
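The closed loop described above (environment builders, automatic trajectory scoring, no manual annotation) can be sketched as follows. The environment, rubric, and threshold are all invented for illustration; the real pipeline synthesizes far richer tasks.

```python
import random

def build_env(seed):
    """Hypothetical environment builder: each synthesized env pairs a
    task prompt with a programmatic checker (no human annotation)."""
    target = random.Random(seed).randint(1, 100)
    return {"prompt": f"Report the number {target}",
            "check": lambda traj, x=target: bool(traj) and traj[-1] == x}

def rubric_score(env, trajectory):
    """Auto-rubric stand-in: completion dominates, plus a brevity bonus."""
    return float(env["check"](trajectory)) + 0.1 / max(len(trajectory), 1)

def collect_trajectories(policy, n_envs=50, threshold=0.5):
    """Closed loop: synth envs -> rollouts -> auto-score -> keep winners."""
    kept = []
    for seed in range(n_envs):
        env = build_env(seed)
        traj = policy(env)
        if rubric_score(env, traj) >= threshold:
            kept.append((env["prompt"], traj))
    return kept

# A trivially correct "policy" that parses the target from the prompt.
solved = collect_trajectories(lambda env: [int(env["prompt"].split()[-1])])
print(len(solved))  # 50
```

Because the checker is executable, every kept trajectory is verified rather than merely plausible, which is the structural advantage over scraped dialogue data.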

DeepSeek employed its self-developed GRPO (Group Relative Policy Optimization) strategy, deeply adapted for large-scale, multi-round task training. Here, models must optimize not just single-round outputs but balance reasoning consistency and linguistic stability across rounds. To mitigate “catastrophic forgetting” in traditional RL, DeepSeek integrated reasoning rewards, language consistency scores, and task completion grades into a multi-dimensional reward signal, preserving Agent execution chain integrity during training.
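The two mechanisms named above, group-relative normalization and the multi-dimensional reward, can be sketched briefly. The normalization step is the core of GRPO as publicly described (it removes the need for a learned value critic); the reward weights and the three signal names are invented for illustration, not DeepSeek's actual values.

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantage: normalize each sampled rollout's reward
    against its own group's mean and std (GRPO's critic-free trick)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def combined_reward(task, reasoning, language, w=(0.6, 0.25, 0.15)):
    """Multi-signal reward in the spirit described above; weights are
    illustrative assumptions."""
    return w[0] * task + w[1] * reasoning + w[2] * language

# One "group" of three rollouts sampled for the same prompt.
adv = grpo_advantages([
    combined_reward(task=1.0, reasoning=0.8, language=0.9),
    combined_reward(task=0.0, reasoning=0.5, language=0.9),
    combined_reward(task=1.0, reasoning=0.9, language=0.7),
])
print(adv.round(2))
```

Rollouts that complete the task while staying linguistically coherent get positive advantage within their group; the failed rollout is pushed down, even though no absolute value baseline was ever learned.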

Supporting this complex mechanism requires upgraded “state awareness.” V3.2 introduced a full contextual management strategy: models reset thinking states only upon new user messages, retaining reasoning trajectories during continuous tool invocation. This enables “thought residue” accumulation, allowing models to resume reasoning after new information arrives instead of restarting logic. Dubbed the “state continuation mechanism,” it ensures multi-round behavioral continuity and task decomposition across complex, cross-stage missions.
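The reset rule just described reduces to a simple invariant, sketched here with illustrative names: tool results extend the reasoning trace, and only a fresh user message clears the thinking state.

```python
class AgentContext:
    """Sketch of the "state continuation" rule: thought residue survives
    tool invocations and is cleared only by a new user message."""

    def __init__(self):
        self.history = []        # durable conversation history
        self.reasoning = []      # accumulated "thought residue"

    def on_model_thought(self, text):
        self.reasoning.append(("think", text))

    def on_tool_result(self, result):
        self.reasoning.append(("tool", result))   # no reset here

    def on_user_message(self, msg):
        self.history.append(("user", msg))
        self.reasoning.clear()   # the one and only reset trigger

ctx = AgentContext()
ctx.on_model_thought("plan the query")
ctx.on_tool_result({"rows": 3})
ctx.on_model_thought("refine using the rows")   # thought path continues
print(len(ctx.reasoning))  # 3
ctx.on_user_message("new task")
print(len(ctx.reasoning))  # 0
```

The design choice this encodes: a tool call is part of one continuous thought, while a user turn is a genuine task boundary.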

Systematically, DeepSeek’s view of Agents has evolved from “task execution plugins” to components of a “model operating system.” Agents are not add-ons but integral to the model’s core architecture. This systemic shift implies future large model platforms will resemble scheduling OSes: the model serves as the kernel, Agents as user-state programs, and plugin tools as callable modules. Whoever standardizes the Agent layer will dominate platform discourse in the AI era.

This explains DeepSeek’s push to standardize “interleaved thinking + tool use” and propose “Thinking in Tool-Use” as foundational design language. This transcends technical details, reflecting a platform-centric mindset.

For the industry, DeepSeek’s pivot marks a new watershed: Agent capability is no longer an optional engineering feature but a core branch of model development. Platform-level Agent proficiency has become a key indicator of long-term model competitiveness.

III. The Limits of Open-Source Models? DeepSeek’s “Post-Training Tactics” Offer a Response

While V3.2 and Speciale reversed the “catch-up” narrative across multiple benchmarks, DeepSeek’s technical reports acknowledge persistent gaps between open-source and closed-source systems in knowledge breadth, ultra-complex task handling, and token generation efficiency. Open-source models remain constrained by resources, data, and budgets.

Rather than conceal these limits, DeepSeek responded with actionable strategies: if resources lag, deepen training methodologies.

At the core lies its unique “post-training trifecta”: expert distillation + multi-track reinforcement learning + tool-thinking integration.

First is Expert Distillation. While most models rely on mixed general-purpose training, DeepSeek crafted six expert models for V3.2, covering mathematics, programming, logical reasoning, general Agents, Agent programming, and Agent search. Each task domain has dedicated models trained on proprietary datasets and generated trajectories, honing singular skills. These experts do not deploy directly but generate high-quality training samples to refine the main model.

Subsequently, data from these “task-specialized models” trains a unified general model. Technically, this resembles feeding a well-rounded “all-rounder” with outputs from multiple hyper-specialized “prodigies,” avoiding skill dilution in multi-task training while preserving structural connections across tasks.
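The distillation flow can be sketched as a data-generation step: each specialist answers prompts in its own domain, a quality filter keeps the good samples, and the pooled pairs train the single main model. The experts, judge, and domain names here are all toy stand-ins.

```python
def distill_corpus(experts, prompts_by_domain, judge):
    """Expert-to-generalist distillation sketch: specialists generate,
    a filter vets, and the pooled pairs become SFT data for one model."""
    corpus = []
    for domain, expert in experts.items():
        for prompt in prompts_by_domain.get(domain, []):
            answer = expert(prompt)
            if judge(prompt, answer):             # keep only vetted samples
                corpus.append({"domain": domain,
                               "prompt": prompt,
                               "target": answer})
    return corpus

# Toy specialists standing in for the six expert models.
experts = {"math": lambda p: str(eval(p)),
           "search": lambda p: f"results for {p}"}
prompts = {"math": ["2+2", "3*7"], "search": ["DSA paper"]}
corpus = distill_corpus(experts, prompts, judge=lambda p, a: bool(a))
print(len(corpus))  # 3
```

Because each sample is tagged with its source domain, the unified training mix can be rebalanced per skill, which is how multi-task dilution is kept in check.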

The second layer involves expanded reinforcement learning (RL). DeepSeek extended the GRPO (Group Relative Policy Optimization) strategy from V3.2-Exp, upgrading data and reward structures. Models must now optimize language quality, reasoning chain logic, and natural tool invocation while completing tasks. Over 10% of post-training computational budgets—rare in open-source ecosystems—were allocated here.

Critically, RL relies not on human scoring but on task environment feedback and automatic rubric-based scoring. This enables closed-loop learning through “structured tasks → auto-scoring → behavior optimization,” creating model capabilities rarer—and more reusable—than chat data.

The third layer fuses tool use with “thinking chains.” Early in training, models struggle to grasp “when to invoke tools versus continue thinking,” causing fractured reasoning. DeepSeek designed a cold-start system prompt for V3.2, embedding tool invocation examples within thought trajectories. This teaches models to “think with tools” across multi-round tasks rather than “invoke after thinking.”

Contextual states were also redesigned: tool invocation no longer interrupts thought, and only new user input triggers clearance. This drastically reduces token redundancy and eliminates per-task reasoning restarts.

These engineering-heavy designs address a fundamental question: How can open-source models enhance “intelligence density per token” under constrained parameters and training scales?

DeepSeek’s answer: concentrate resources on critical reasoning chain paths, maximizing information per round while minimizing repetition. This is not a victory of scale but of methodology.

Even so, DeepSeek has not fully bridged the knowledge gap with closed-source models. Its reports note that V3.2’s world knowledge breadth still lags behind leading closed-source models, and while Speciale excels in complex competitions, its token costs remain prohibitive for general-purpose use.

But if Gemini 3 Pro represents the closed-source camp's continued exploration of “bigger, faster, stronger,” then what V3.2 and Speciale represent may be a new path of “lighter, steadier, smarter.” At a time when the industry is still debating the prospects of the Scaling Law, DeepSeek is attempting to reshape the competitive order of open-source models through stronger reasoning and organizational capability, lower resource consumption, and more efficient training paradigms.

Written by Gaojian Guanchao. Original content; please contact the author for authorization before reprinting.

Disclaimer: The copyright of this article belongs to the original author. It is reprinted solely to share information more widely. If the author's information is labeled incorrectly, please contact us promptly for correction or removal. Thank you.