May 29, 2026
By Intelligent Relativity
Since last year, Chinese large models have caught up with the global top tier. MiniMax M2.5 and Kimi K2.5 consistently rank near the top for token consumption on OpenRouter, and DeepSeek V4 is often benchmarked against GPT-5. What many overlook is that these models can 'run' only because the existing computing power foundation is still 'sufficient.'
So when does 'sufficient' stop being enough? The answer is the Agent Era, in which 'being able to run' and 'running comfortably' are two very different things.
At this year's Kunpeng & Ascend Developer Conference 2026 (Ascend AI Developer Summit 2026), Ascend offered a more fundamental quantitative picture: over the past year, model invocation frequency has surged 50 to 100 times, while sequence lengths have grown from about 4K tokens in the Chatbot era to nearly 1M, a roughly 250x increase. MoE inference latency requirements are also tightening, moving from around 10 milliseconds toward 1 millisecond.

This is not simply a quantitative matter of 'models getting bigger'; it is a qualitative signal that the underlying logic of the entire computing power foundation must be rewritten. The real question, then, is not whether 'existing computing power is enough' but how long current computing architectures can stay 'sufficient' as Agent appetite grows exponentially, with Agents both consuming more and generating explosive new demand.
From this perspective, the three topics Ascend addressed at this year's summit (hypernode architecture innovation, comprehensive software open-sourcing, and developer experience upgrades) all answer the same question: how can the computing power foundation evolve from 'able to run models' to 'naturally Agent-friendly'?
These are not three independent topics but one technical chain running from hardware to software to developers, and the whole chain needs systemic reconstruction before the Agent Era fully arrives.
Hypernodes: Unleashing Greater Power Across Thousands of NPUs
In the Chatbot era, with sequence lengths of around 4K tokens, a request's KV Cache could be served from a single NPU, and inter-NPU communication pressure remained manageable.
In the Agent Era, however, inference chains stretch to nearly 1M tokens, and the KV Cache must frequently move across NPUs. The issue is no longer just whether bandwidth is sufficient but an architectural one: how NPUs communicate directly determines how well the system can perform.
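To see why a 1M-token context strains a single NPU, here is a back-of-the-envelope KV Cache size calculation. The model shape (60 layers, 8 KV heads, head dimension 128, 2-byte values) is a hypothetical example rather than the configuration of any model named in this article, and the formula assumes standard multi-head or grouped-query attention without any cache compression.

```python
GIB = 1024 ** 3

def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """KV Cache size for one sequence under standard (GQA-style) attention.

    The leading factor 2 accounts for storing one key and one value vector
    per layer, per KV head, per token.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical model shape, chosen only for illustration.
CONFIG = {"num_layers": 60, "num_kv_heads": 8, "head_dim": 128}

chatbot = kv_cache_bytes(4 * 1024, **CONFIG)    # ~4K-token Chatbot turn
agent = kv_cache_bytes(1024 * 1024, **CONFIG)   # ~1M-token Agent context

print(f"4K context: {chatbot / GIB:.2f} GiB")   # 0.94 GiB
print(f"1M context: {agent / GIB:.2f} GiB")     # 240.00 GiB
print(f"growth: {agent // chatbot}x")           # 256x
```

In this hypothetical configuration, a single 1M-token sequence needs about 240 GiB of cache before counting model weights or other concurrent requests, which is why the cache ends up spread across, and fetched from, other NPUs.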
In traditional designs, each NPU has its own independent memory, and cross-NPU access relies on message semantics (send/recv), with each individual communication taking time in the microsecond range.
Ascend's hypernodes are designed to be naturally Agent-friendly. The core criterion for judging a hypernode is whether it achieves unified global memory addressing. Here, the Ascend 950 chip, whose architecture integrates both SIMT and SIMD programming models, achieves this through three simultaneous changes.
The first is a shift to memory semantics. Building on the Lingqu interconnect protocol and bus controller, the Cube and Vector cores (AIC/AIV) access remote memory directly through MTE instructions, with no intermediate copies. Compared with traditional Ethernet-based message passing, the Lingqu-based hypernode architecture completes the access with a single MTE instruction. The difference is not an incremental optimization but a structural redesign.
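The contrast between message semantics and memory semantics can be illustrated with a toy Python model. Everything here (`Npu`, `Fabric`, `send`, `recv`, `remote_load`) is hypothetical and unrelated to the actual Ascend, Lingqu, or MTE interfaces; the sketch only counts the staging copies each style implies. In the two-sided path both sender and receiver must take part and the data is copied twice, while the one-sided load is issued by the reader alone and returns a view of the remote bytes in place.

```python
class Npu:
    """Toy NPU with a private memory region (hypothetical, not an Ascend API)."""

    def __init__(self, npu_id, size):
        self.npu_id = npu_id
        self.mem = bytearray(size)


class Fabric:
    """Toy interconnect that counts staging copies for each access style."""

    def __init__(self, npus):
        self.npus = {n.npu_id: n for n in npus}
        self.copies = 0
        self.inbox = {}  # (src, dst) -> pending message payload

    # Message semantics: two-sided, data is staged through a message buffer.
    def send(self, src, dst, offset, length):
        payload = bytes(self.npus[src].mem[offset:offset + length])  # copy 1: into the message
        self.copies += 1
        self.inbox[(src, dst)] = payload

    def recv(self, src, dst, dst_offset):
        payload = self.inbox.pop((src, dst))
        self.npus[dst].mem[dst_offset:dst_offset + len(payload)] = payload  # copy 2: into receiver
        self.copies += 1
        return len(payload)

    # Memory semantics: one-sided load issued by the reader, no staging buffer.
    def remote_load(self, owner, offset, length):
        return memoryview(self.npus[owner].mem)[offset:offset + length]


npu0, npu1 = Npu(0, 64), Npu(1, 64)
npu0.mem[0:4] = b"KV01"
fabric = Fabric([npu0, npu1])

# Two-sided: NPU 0 sends, NPU 1 receives into its own memory.
fabric.send(src=0, dst=1, offset=0, length=4)
fabric.recv(src=0, dst=1, dst_offset=0)
print(bytes(npu1.mem[0:4]), fabric.copies)  # b'KV01' 2

# One-sided: NPU 1 reads NPU 0's memory in place; the copy count does not change.
view = fabric.remote_load(owner=0, offset=0, length=4)
print(bytes(view), fabric.copies)  # b'KV01' 2
```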
The second is unified global memory addressing. Within a single global virtual address space, NPUs and CPUs can access any location directly by virtual address, without code changes, explicit routing, or copying. The KV Cache can be shared globally, and ultra-long contexts can expand seamlessly.
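A minimal sketch of the idea behind a single global address space, assuming each device's memory is mapped as one contiguous window. Real systems use page tables and hardware address translation, and the class, device names, and 64 GiB / 512 GiB sizes below are hypothetical; the point is only that a caller holds one flat address and never has to decide which device to route to.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class Location:
    device: str
    offset: int


class GlobalAddressSpace:
    """Toy global virtual address space built from contiguous per-device windows."""

    def __init__(self):
        self._windows = []  # list of (base, size, device)
        self._next_base = 0

    def map_device(self, device, size):
        """Map a device's memory into the global space and return its base address."""
        base = self._next_base
        self._windows.append((base, size, device))
        self._next_base += size
        return base

    def translate(self, addr):
        """Resolve a global address to the owning device and a local offset."""
        for base, size, device in self._windows:
            if base <= addr < base + size:
                return Location(device, addr - base)
        raise ValueError(f"unmapped address {addr:#x}")


gas = GlobalAddressSpace()
npu0_base = gas.map_device("npu0", 64 * GIB)
npu1_base = gas.map_device("npu1", 64 * GIB)
ddr_base = gas.map_device("cpu-ddr", 512 * GIB)

print(gas.translate(npu1_base + 0x1000))  # Location(device='npu1', offset=4096)
```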
The third is the efficiency gain from memory pooling. Hierarchical pooling of on-chip memory and DDR makes it possible to replace recomputation with lookup, significantly raising KV Cache hit rates. In typical scenarios such as LLMs, recommendation systems, and Engram, query latency falls by 3-4x and training and inference throughput rises by 3-4x compared with traditional clusters.
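'Replacing recomputation with lookup' can be sketched as a two-tier cache: a small fast tier (standing in for on-chip memory) backed by a larger pooled tier (standing in for pooled DDR), with recomputation only when both miss. The class, its LRU demote/promote policy, and the capacities are assumptions for illustration; the source does not describe Ascend's actual pooling or eviction policy. The replay models an Agent that re-reads the same 4-block prompt prefix on every turn.

```python
from collections import OrderedDict


class TieredKVCache:
    """Hypothetical two-tier LRU cache: small fast tier, larger pooled tier."""

    def __init__(self, fast_capacity, pool_capacity):
        self.fast = OrderedDict()  # stands in for on-chip memory
        self.pool = OrderedDict()  # stands in for pooled DDR
        self.fast_capacity = fast_capacity
        self.pool_capacity = pool_capacity
        self.stats = {"fast_hit": 0, "pool_hit": 0, "recompute": 0}

    def _put_fast(self, key, value):
        self.fast[key] = value
        if len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)  # demote the LRU entry
            self.pool[old_key] = old_value
            if len(self.pool) > self.pool_capacity:
                self.pool.popitem(last=False)  # evict from the pool entirely

    def get(self, key, compute):
        if key in self.fast:
            self.stats["fast_hit"] += 1
            self.fast.move_to_end(key)  # mark as most recently used
            return self.fast[key]
        if key in self.pool:
            self.stats["pool_hit"] += 1
            value = self.pool.pop(key)  # lookup succeeds: promote to the fast tier
        else:
            self.stats["recompute"] += 1
            value = compute(key)  # both tiers miss: fall back to recomputation
        self._put_fast(key, value)
        return value


def prefill(key):
    # Placeholder for the expensive prefill that produces one KV block.
    return f"kv{key}"


def replay(pool_capacity, turns=3, prefix_blocks=4):
    cache = TieredKVCache(fast_capacity=2, pool_capacity=pool_capacity)
    for _ in range(turns):  # each Agent turn re-reads the same prefix
        for block in range(prefix_blocks):
            cache.get(("system-prompt", block), prefill)
    return cache.stats


print(replay(pool_capacity=8))  # {'fast_hit': 0, 'pool_hit': 8, 'recompute': 4}
print(replay(pool_capacity=0))  # {'fast_hit': 0, 'pool_hit': 0, 'recompute': 12}
```

With the pooled tier, only the first turn pays for prefill; without it, the cyclic access pattern defeats the small fast tier and every access is recomputed.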
Taken together, these changes bring communication round-trip time (RTT) down to 3 microseconds, with bandwidth reaching the terabyte level. This is the real value of hypernodes: not 'stacking more NPUs' but making every token cheaper and more efficient to produce.
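A rough latency budget shows why a few microseconds per round trip matters against the roughly 1-millisecond MoE target mentioned earlier. The 1 ms budget and 3 µs RTT come from this article; the 60 MoE layers, the two round trips per layer (one dispatch, one combine), and the 10 µs comparison point are purely hypothetical, and the sketch assumes every round trip sits on the critical path with no overlap with compute.

```python
def comm_share(rtt_us, layers, round_trips_per_layer, budget_us=1000.0):
    """Fraction of a per-step latency budget consumed by serial cross-NPU round trips.

    Assumes all round trips are serialized on the critical path (no compute overlap).
    """
    return rtt_us * layers * round_trips_per_layer / budget_us


# Hypothetical MoE shape: 60 MoE layers, each with one dispatch and one combine exchange.
LAYERS, TRIPS_PER_LAYER = 60, 2

print(f"serial round trips that fit in 1 ms at 3 us: {1000 // 3}")                 # 333
print(f"budget used at 3 us RTT:  {comm_share(3, LAYERS, TRIPS_PER_LAYER):.0%}")   # 36%
print(f"budget used at 10 us RTT: {comm_share(10, LAYERS, TRIPS_PER_LAYER):.0%}")  # 120%
```

Under these assumptions, communication alone would take about a third of the step at 3 µs but would blow through the budget at the hypothetical 10 µs, which is the 'latency wall' the article refers to.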
The summit's core judgment is therefore straightforward: interconnect capability determines hypernode capability, and overall system performance depends on the product of hypernode scale and single-chip performance. Once interconnect bandwidth reaches the terabyte level, the point of a hypernode is not to 'stack more NPUs' but to redefine how chips communicate.
But pushing physical limits is only the price of entry. Once the hardware is built, what truly determines the fate of a computing power ecosystem is a different question: how high is the software barrier, and will developers actually come?
Ascend Forges Its Own Path
The key to attracting developers is open-sourcing, but the underlying logic of open-sourcing has changed.
In the past, many hardware 'open-source' efforts were like a window set into a wall: you could see the code, but you couldn't modify the core layers or take part in roadmap decisions.
That is an 'open posture,' not an 'open-source ecosystem.' Why does the distinction matter? Because being open-source and merely looking open-source lead to two entirely different ecosystem outcomes.
True open source gives developers the confidence to keep investing in your platform: they can modify the code, take part in the roadmap, and trust that technical development won't suddenly stop. Fake open source forces them to keep an exit strategy ready, knowing that any optimization might be wasted. In the Agent Era, when software demand is exploding and new scenarios emerge every week, developers choose infrastructure ecosystems on the basis of trust: trust that the platform won't push them backward by closing up.
An 'open posture' attracts onlookers; an 'open-source ecosystem' keeps people.
This time, Ascend is going further in open-sourcing. The core difference is a complete, efficient, and open operator development system that lets developers find a path forward whatever their entry point.
Engineers chasing peak performance can use Ascend C for fine-grained control over computation, memory access, and pipelining, with every step tunable. Ascend has also introduced a Tensor API, added support for host-device hybrid programming, and added CCU communication capabilities.
AI algorithm engineers focused on rapid innovation can use TileLang or Triton. Both interfaces are now 100% compatible with their mainstream open-source ecosystems, deliver 0.6-0.9x the performance of Ascend C, and cut development cycles to one week. More than 600 Triton operators and 300 TileLang operators are currently supported.
Developers looking for a sweet spot between performance and development efficiency can choose PyPTO.
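Part of what makes Triton and TileLang fast to develop in is their tile-based programming model: the programmer writes the body of one 'program instance' that owns a block of data, and the compiler maps instances onto the hardware. The plain-Python sketch below emulates that structure for a vector add, mirroring the shape of Triton's vector-addition tutorial (`tl.program_id`, `tl.arange`, masked `tl.load`/`tl.store`). It is an illustration of the model only, not Triton or TileLang code, and it does not run on an NPU; `add_program` and `launch_add` are hypothetical names.

```python
import math


def add_program(pid, x, y, out, n, block_size):
    """Body of one program instance: it owns one tile of `block_size` elements."""
    offsets = [pid * block_size + i for i in range(block_size)]  # like tl.arange
    for off in offsets:
        if off < n:  # the mask: skip lanes past the end of the ragged last tile
            out[off] = x[off] + y[off]


def launch_add(x, y, block_size=4):
    n = len(x)
    out = [0.0] * n
    grid = math.ceil(n / block_size)  # one program instance per tile
    for pid in range(grid):  # sequential here; parallel on real hardware
        add_program(pid, x, y, out, n, block_size)
    return out


print(launch_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))  # [11, 22, 33, 44, 55]
```

The developer reasons about one tile and a bounds mask instead of per-core scheduling and pipelining, which is the kind of detail Ascend C exposes for hand tuning.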
At the other end of this multi-path operator programming system, the CANNBot operator agent covers the 'last mile.' It distills microarchitecture optimization experience into a skill library, generating a single Vector operator in just 3 hours and completing the full process from generation to deployment in 1 day, more than 5x faster than traditional manual development. Paired with an evaluation set covering 22 typical operator categories and an automated verification system with over 4,000 evaluation points, it means developers no longer have to start from scratch.
Together with the full open-sourcing of the AscendNPU IR compilation foundation and more than 15 ecosystem operator libraries co-built with over 30 companies and universities, Ascend is extending a crucial olive branch to newcomers and developers: here, writing an operator from scratch no longer requires 'expert status.'
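Automated verification of a generated operator generally means comparing it against a trusted reference on many inputs, including edge cases, within numeric tolerances. The sketch below shows that pattern for a softmax operator; it is not CANNBot's evaluation set or verification system, and all function names, cases, and tolerances are hypothetical. It catches a plausible bug (a softmax that skips max subtraction and overflows on large inputs) that a happy-path test would miss.

```python
import math
import random


def allclose(actual, expected, rel_tol=1e-5, abs_tol=1e-8):
    """Element-wise tolerance check (math.isclose semantics, not NumPy's)."""
    return len(actual) == len(expected) and all(
        math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, e in zip(actual, expected)
    )


def reference_softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def generated_softmax(xs):
    # Stands in for an operator produced by a code-generating agent.
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    inv = 1.0 / sum(exps)
    return [e * inv for e in exps]


def buggy_softmax(xs):
    # Omits the max subtraction, so math.exp overflows on large inputs.
    exps = [math.exp(v) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def verify(candidate, reference, cases):
    """Return the names of cases where the candidate errors out or disagrees."""
    failures = []
    for name, xs in cases:
        try:
            ok = allclose(candidate(xs), reference(xs))
        except (OverflowError, ZeroDivisionError, ValueError):
            ok = False
        if not ok:
            failures.append(name)
    return failures


rng = random.Random(0)
CASES = [
    ("small", [0.1, 0.2, 0.3]),
    ("negative", [-3.0, -1.0, -2.0]),
    ("large", [1000.0, 1001.0, 1002.0]),  # exp(1000) overflows without max subtraction
    ("random", [rng.uniform(-5.0, 5.0) for _ in range(64)]),
]

print(verify(generated_softmax, reference_softmax, CASES))  # []
print(verify(buggy_softmax, reference_softmax, CASES))      # ['large']
```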
From Intelligent Relativity's perspective, the true watershed of open-sourcing has never been how much code is released but whether developers can write an operator from scratch on Ascend. Now, the answer is becoming 'yes.'
Of course, opening the door is one thing; keeping people is another. That depends on the third dimension: developer experience.
The Flywheel of Developer Experience Is Fully Spinning
Traditionally, taking a model from idea to deployment means researching ecosystem compatibility, adapting operators by hand, building validation environments, quantizing manually, and debugging deployments. The process often takes weeks and demands all-around skills, and 'all-around' means a high barrier to entry.
Ascend's real-world deployment of DeepSeek-V4-Flash, however, followed a different path: model status retrieved in 1 minute, adaptation completed within a day, a few hours of automated validation and quantization, and then deployment services and model documentation produced in 30 minutes. That is a 4x efficiency gain over traditional manual methods, and far more than 4x once differences in environment setup are taken into account.

How is this achieved? Not through more tools, but by turning 'expert experience' into 'system capabilities.'
Specifically, we find two mechanisms driving this change.
The first is the Skills system. The experience, pitfalls, and best practices that more than 4,000 Ascend engineers accumulated over years of tuning have been structured into more than 200 callable Skills modules. The modules work with mainstream Agent platforms such as Claude Code, Codex, and OpenClaw and can be invoked with two commands. Problems that once required tracking down the right expert can now be solved directly through Skills.
The second is the Agentic workflow. Developers now only need to describe their intent, and seven foundational Agents automatically orchestrate the entire process of research, adaptation, optimization, and deployment, handing work from one to the next. 'People finding tools' has become 'tools finding people,' fundamentally changing how developers work.
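The relay pattern can be sketched as a pipeline of stages that share one context object, where each stage consumes the previous stage's artifact and produces its own. The source mentions seven foundational Agents but names only four stages, so only those four appear here; every name below is hypothetical, and each 'agent' is a plain function standing in for what would, in a real system, be model and tool calls.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Shared state handed from one agent to the next."""
    intent: str
    artifacts: dict = field(default_factory=dict)
    log: list = field(default_factory=list)


def require(ctx, key):
    """Fail loudly if the previous stage did not produce the expected artifact."""
    if key not in ctx.artifacts:
        raise RuntimeError(f"missing artifact from previous stage: {key}")
    return ctx.artifacts[key]


# Each hypothetical "agent" reads its predecessor's artifact and writes its own.
def research(ctx):
    ctx.artifacts["plan"] = f"plan({ctx.intent})"


def adaptation(ctx):
    ctx.artifacts["adapted"] = f"adapted({require(ctx, 'plan')})"


def optimization(ctx):
    ctx.artifacts["optimized"] = f"optimized({require(ctx, 'adapted')})"


def deployment(ctx):
    ctx.artifacts["service"] = f"service({require(ctx, 'optimized')})"


PIPELINE = [research, adaptation, optimization, deployment]


def run(intent, stages=PIPELINE):
    ctx = Context(intent)
    for stage in stages:
        stage(ctx)                      # the agent does its part...
        ctx.log.append(stage.__name__)  # ...and hands the context to the next one
    return ctx


ctx = run("deploy model X")
print(ctx.log)                   # ['research', 'adaptation', 'optimization', 'deployment']
print(ctx.artifacts["service"])  # service(optimized(adapted(plan(deploy model X))))
```

The developer supplies only the intent string; ordering and handoff live in the pipeline rather than in the developer's head.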
These are mainly technical improvements to developer experience. But technology alone may not be enough to get the developer experience flywheel fully spinning, so Ascend has introduced two further offerings.
The first is a zero-cost trial space: one-click automatic deployment, an average of two minutes to run the first demo, and compute resources of tens of thousands of cards dedicated to the open-source community. This addresses not only the cost of compute but also the psychological barrier of a first attempt. Many developers are not unwilling to try new platforms; they are afraid of investing time without results. By drastically lowering the cost of trying it out, Ascend largely removes that resistance.
The second is career returns that developers can actually cash in. Working with leading internet companies, Ascend has created a three-tier certification system that comes with resume recommendations, internships at top firms, and other benefits, turning 'being good at Ascend' into a marketable skill that supports career mobility. Developers stay in an ecosystem not just because the tools are good but because the skills they build there translate into tangible rewards elsewhere. Recognizing this need matters more than it might seem.
Taken together, these send developers a clear message: come to Ascend, and you don't need to start from scratch. For developers and Ascend alike, the flywheel is now fully spinning.
In Closing
During the live-streamed Hypernode Summit Dialogue, one industry view stood out: 'When software production becomes more efficient, human demand for software will explode. Software used to be too expensive, and that suppressed a great deal of demand.'
In the same vein, the Agent Era is not just consuming more tokens; it is creating entirely new demand that did not exist before. Software is no longer about reusing standardized products but about instant customization for every person and every scenario. Once unleashed, this demand will drive computing power appetite up not linearly but explosively. Anthropic co-founder Jack Clark has predicted that Agents may begin evolving autonomously by 2028, with token consumption entering a phase of nonlinear growth.
Both lines of reasoning point to the same conclusion: the Agent Era will not wait. You cannot wait until Agents have fully taken off to catch up on architecture, open-source your software, or lower developer barriers; that would be like starting road repairs in the middle of a traffic jam.
Looking back at Ascend's three initiatives at the summit, each is a direct response to this judgment.
First, hypernodes redefine how NPUs communicate, keeping Agent-era compute consumption from hitting a latency wall. This lays the hardware foundation for the Agent Era.
Second, CANN's multi-path operator programming system plus CANNBot enable any developer to write high-performance operators from scratch on Ascend. This solidifies software capabilities for the Agent Era.
Finally, the Skills system and Agentic workflow package the experience of 4,000 engineers into the starting point for every newcomer. This paves a 'user-friendly' path for developers.
Together, these three capabilities explain what Ascend is doing today: not just responding to the present, but preparing for an Agent Era that has not fully arrived yet but is already on its way.
*All images in this article are sourced from the internet.