February 28, 2026
Preface: As the parameter counts of large AI models approach the trillion scale and context windows stretch past a million tokens, HBM (High-Bandwidth Memory) is fast but limited in capacity and costly, while NAND flash is capacious but short on bandwidth. In this gap between compute and storage, a new storage technology called HBF (High-Bandwidth Flash) is moving from the lab to the forefront of the industry.
The "Upgrade" Logic from HBM to HBF
HBF (High-Bandwidth Flash) is not a simple replacement for HBM. Rather, it uses NAND stacking to achieve bandwidth approaching HBM's, aiming to serve as a strategic "secondary cache" for the AI inference era. Its core design idea is to place a "middle layer" of storage next to GPUs or AI accelerators, with capacity far greater than HBM's and bandwidth far higher than a traditional SSD's. Around this idea, a technological competition among the storage giants has quietly begun.
From a technical standpoint, HBM stacks DRAM dies, while HBF stacks NAND flash dies. First-generation HBF products are expected to stack 16 layers of 32GB NAND, for a total capacity of 512GB and bandwidth exceeding 1638GB/s (roughly 50 times that of a PCIe 6.0 ×4 link). In AI computing architectures, HBF serves as a "capacity expansion" for HBM.
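As a quick sanity check, the arithmetic below reproduces those figures. It is only a back-of-envelope sketch: the layer count and per-die capacity are the projections quoted above, and the PCIe 6.0 ×4 baseline of about 32GB/s per direction is a spec-level approximation.

    # Back-of-envelope check of the first-generation HBF figures cited above.
    layers = 16
    die_capacity_gb = 32
    hbf_bandwidth_gbps = 1638          # quoted HBF stack bandwidth, GB/s
    pcie6_x4_gbps = 64 * 4 / 8         # 64 GT/s/lane x 4 lanes / 8 bits = 32 GB/s per direction

    print(layers * die_capacity_gb)                    # 512 (GB per stack)
    print(round(hbf_bandwidth_gbps / pcie6_x4_gbps))   # 51, i.e. roughly 50x PCIe 6.0 x4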
The "Memory Wall" Dilemma in the AI Inference Era
The emergence of HBF directly addresses the most acute contradiction in today's AI computing systems: HBM capacity cannot keep pace with the exponential growth of model parameters. During inference with large AI models, especially in long-context, multi-turn dialogue, and agent scenarios, the system frequently reads and writes key-value caches (KV caches) to maintain contextual memory. Once the context window reaches the million-token level, HBM quickly fills with this cached data, crowding out the core compute workload.
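To make the scale concrete, here is a rough KV-cache footprint estimate using the standard sizing formula. The model configuration (80 layers, 8 KV heads, head dimension 128, FP16) is a hypothetical 70B-class setup chosen for illustration, not a figure from this article.

    def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
        # Two tensors per layer (K and V), each of shape [n_kv_heads, seq_len, head_dim].
        return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

    per_token = kv_cache_bytes(80, 8, 128, seq_len=1)
    full_context = kv_cache_bytes(80, 8, 128, seq_len=1_000_000)

    print(per_token / 1024)          # ~320 KiB of cache per token
    print(full_context / 1024**3)    # ~305 GiB for a single million-token session

Under these assumptions, a single million-token session already outgrows the entire HBM pool of today's flagship accelerators, which is exactly the overflow HBF is positioned to absorb.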
Traditional workarounds either recompute the overflowed vectors or fetch them from remote SSDs, and both approaches introduce significant latency. HBF steps in to fill the wide gap between HBM and SSDs. SK Hynix's proposed H3 (HBM+HBF) hybrid storage architecture positions HBF as a "secondary expansion" of HBM, dedicated to storing read-only data and key-value caches. The company's simulations show that a workload previously requiring 32 GPUs can be completed with just 2 GPUs once HBF assists. This means HBF not only breaks through capacity bottlenecks but may also fundamentally alter the economic model of AI computing clusters.
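The GPU-count saving is easiest to see as a pure capacity calculation. The sketch below assumes the workload is bound by memory capacity rather than compute, and every size in it (a ~4.5TB working set, 141GB of HBM per GPU, eight 512GB HBF stacks per GPU) is an illustrative assumption chosen to reproduce the simulation's headline numbers, not a disclosed parameter.

    import math

    working_set_gb = 4500            # hypothetical weights + KV cache for a large job
    hbm_per_gpu_gb = 141             # e.g., an H200-class device
    hbf_per_gpu_gb = 8 * 512         # eight hypothetical 512GB HBF stacks per GPU

    gpus_hbm_only = math.ceil(working_set_gb / hbm_per_gpu_gb)
    gpus_with_hbf = math.ceil(working_set_gb / (hbm_per_gpu_gb + hbf_per_gpu_gb))

    print(gpus_hbm_only)    # 32 GPUs just to hold the working set in HBM
    print(gpus_with_hbf)    # 2 GPUs once HBF absorbs the cold capacity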
Technical Characteristics: Rebalancing Capacity, Bandwidth, and Cost
The high hopes for HBF stem from its unique rebalancing of capacity, bandwidth, and cost. Capacity is its most prominent advantage, with HBF offering 8 to 16 times the storage capacity of HBM. This means model parameters and cached data that previously needed to be distributed across multiple GPUs can now be more centrally processed by a single GPU paired with HBF, significantly reducing system complexity and interconnection overhead.
In terms of bandwidth, HBF achieves roughly 80% to 90% of HBM's transfer speed. A gap remains, but given the capacity advantage, this level is sufficient to keep data flowing in most inference scenarios. On power, HBF consumes roughly 40% less than HBM, a meaningful saving for AI clusters whose power density is becoming increasingly hard to manage.
Cost-wise, NAND-based HBF offers far lower cost per unit of capacity than DRAM-based HBM. GF Securities analysis suggests that HBF could expand GPU memory capacity to 4TB (on the order of eight of the 512GB stacks described above per accelerator), making it the optimal solution for meeting the memory capacity requirements of large AI models.
Of course, HBF has inherent limitations: owing to the physical characteristics of NAND flash, its write endurance is poorer and its access latency higher than DRAM's. The mainstream design approach is therefore to place read-only data and infrequently written key-value caches in HBF, while keeping frequently rewritten dynamic data in HBM.
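In code, that placement rule might look like the sketch below. This is a minimal illustration of the tiering policy, assuming a hypothetical runtime that tags each buffer with its access pattern; none of the names or the threshold here come from a real HBM/HBF API.

    from dataclasses import dataclass
    from enum import Enum, auto

    class Tier(Enum):
        HBM = auto()   # fast DRAM tier: hot, frequently rewritten data
        HBF = auto()   # stacked-NAND tier: large, read-mostly data

    @dataclass
    class Buffer:
        name: str
        read_only: bool          # e.g., model weights
        writes_per_sec: float    # observed or predicted write rate

    WRITE_RATE_CUTOFF = 10.0     # illustrative threshold, not a spec value

    def place(buf: Buffer) -> Tier:
        # Read-only or rarely written buffers tolerate NAND's weaker write
        # endurance and higher latency, so they go to HBF; hot data stays in HBM.
        if buf.read_only or buf.writes_per_sec < WRITE_RATE_CUTOFF:
            return Tier.HBF
        return Tier.HBM

    for b in [Buffer("weights", True, 0.0),
              Buffer("prefix_kv_cache", False, 0.5),
              Buffer("active_kv_window", False, 500.0)]:
        print(b.name, "->", place(b).name)   # weights/prefix -> HBF, active window -> HBM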
Global Giants Enter the Competition, Korean Duo Leads
Recognizing HBF's strategic value, global storage giants have entered the field, with Korean manufacturers in the lead. SK Hynix is the most active in HBF R&D, planning to release HBF1 samples as early as 2026 and aiming for mass production by 2027. Strategically, SK Hynix takes an HBM-centric approach, positioning HBF as a complement to HBM rather than a replacement, and using the two in tandem to optimize AI inference efficiency.
Samsung Electronics displays even greater ambition. Leveraging its advantages in logic foundry, Samsung is exploring the production of HBF control logic using its in-house 4nm process and optimizing the energy efficiency and control performance of next-generation NAND solutions. Samsung aims to integrate HBF into a broader reconstruction of AI memory hierarchies, with its collaboration with SanDisk progressing toward applications in NVIDIA, AMD, and Google products by late 2027 to early 2028.
SanDisk is one of the earliest advocates of HBF technology, working closely with Professor Jung-Ho Kim's team at KAIST in South Korea to drive the standardization of HBF. SanDisk believes that HBF is the key solution to the GPU HBM memory wall problem, but its success requires the establishment of industry standards and adoption by mainstream customers like NVIDIA.
Notably, the three manufacturers have begun collaborating on HBF standardization. Following SK Hynix, Samsung Electronics has also joined the HBF technology camp that SanDisk pioneered, with the three parties working together to establish HBF as an industry-wide standard.
Future Outlook: From "HBM Dependency" to "AI Memory New Pillar"
Despite its promising prospects, HBF's industrialization still faces multiple challenges. Technically, HBF requires GPU manufacturers to redesign their memory architectures and developers to modify software for effective hardware-software co-design, which involves substantial semiconductor- and system-level complexity.
Ecologically, NVIDIA's stance is crucial. Currently, NVIDIA has not publicly expressed interest in HBF, instead developing ICMSP technology as an alternative solution, using DPU-connected NVMe SSDs to handle overflow data. However, with continued promotion from core suppliers like SK Hynix and Samsung, as well as HBF's impressive energy efficiency improvements demonstrated in simulation tests, NVIDIA's attitude may shift.
Securities institutions predict that the HBF market will grow from $1 billion in 2027 to $12 billion by 2030. Professor Jung-Ho Kim even predicts that HBF demand will surpass that of HBM starting in 2038. On the commercialization timeline, SK Hynix is expected to showcase an early HBF test version later this month, while widespread HBF adoption is anticipated to wait until the HBM6 era, when a single base die will integrate multiple memory stacks.
Conclusion
In 2026, as inference becomes the new primary battleground for AI computing power, whoever first establishes an HBM+HBF hybrid storage system may gain a critical advantage in the ultimate competition for computing efficiency, ushering in the next golden decade of AI compute.
Online References:
Sohu: "SK Hynix Explores H3 Storage: Leveraging HBM and HBF Advantages to Optimize Inference Efficiency"
IT Home: "Report: Micron Loses NVIDIA HBM4 Order, Korean Duo SK Hynix and Samsung Divide Market"