04/22 2026
Can you imagine?
An AI chip, nearly the size of an iPad.
This isn’t a conceptual design—it’s the solution from an AI chip company racing toward an IPO. Not long ago, Cerebras disclosed its prospectus, officially stepping into the spotlight.
By the numbers, its growth has been nearly “exponential.”
Revenue surged from $24.6 million in 2022 to $510 million in 2025, a more than 19-fold increase in three years.
More critically, profitability transformed in tandem. After a $482 million loss in 2024, it turned positive in 2025, achieving $238 million in net profit.
This year, Cerebras has secured two pivotal clients in rapid succession. On one side, OpenAI signed a computing power deal exceeding $10 billion; on the other, Amazon began integrating its chips for inference acceleration in the cloud.
This means it’s gradually entering the mainstream computing power ecosystem.
Capital market expectations are also rising. According to foreign media, Cerebras plans to raise over $3 billion, implying a valuation of at least $35 billion.
So the question arises: What problem is a company building “iPad-sized” chips truly solving?
Today, let’s talk about Cerebras.
/ 01 / GPUs Are Not the Answer for AI Computing
Let’s start with why Cerebras exists. Many may not realize that for the past 40 years, the computing industry has consistently proven an ironclad rule:
New computing demands inevitably spawn new computing architectures.
The PC era had x86, the mobile era had ARM, and the graphics era had GPUs. Whenever a new computing paradigm emerges, the industry initially tries to “make do” with old architectures. But as technology evolves, it becomes unavoidable—old architectures hit their limits, necessitating a redesign of the underlying system.
Now, this is happening all over again with AI.
AI’s computing methods are fundamentally different. Traditionally, most computations were “local and independent”—like graphics rendering, where each pixel could be calculated separately without affecting others.
But AI models are different. They’re essentially a highly coupled computational network where data, parameters, and computational processes require extremely frequent information exchange.
This leads to a radical shift: AI is fundamentally a “communication-intensive” computing problem, not just a pure “computing power” issue.
This is also why GPUs are hitting bottlenecks.
GPUs excel at parallel computing, but only when tasks are independent. In AI, every computational step depends on the previous one, so the hardware spends vast amounts of time “waiting for data.”
This architectural mismatch becomes glaring during AI model training and inference.
During training, a single GPU isn’t powerful enough, forcing models to be split across thousands of GPUs. But once split, constant communication is required, drastically reducing efficiency while skyrocketing system complexity and costs.
In short: Single GPUs aren’t enough, but multiple GPUs are inefficient.
During inference, the problem becomes even more intuitive.
When generating each token (word), the entire model must be run once. But because the model is so large, its weights can’t fit into the chip’s internal cache, forcing compute units to repeatedly fetch them from external memory.
The issue? This “data movement” is far slower than the computation itself. High-end GPUs widely use HBM (High Bandwidth Memory), but HBM’s strength is capacity; its bandwidth still lags far behind on-chip memory.
Cerebras founder Andrew Feldman once offered a stark figure: For a relatively small 7-billion-parameter (7B) model, assuming each weight uses 16 bits, generating one token requires moving approximately 14GB of weights from memory.
To generate the next token, the system must move that same 14GB again, over and over. This computing model’s demand for memory bandwidth is staggering.
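The arithmetic is worth sketching out. Here is a minimal back-of-the-envelope calculation; the HBM bandwidth figure is an H100-class assumption added for illustration, not a number from Feldman:

```python
# Rough arithmetic behind the "memory wall" in autoregressive decoding:
# every generated token reads all model weights from memory once.

params = 7e9           # 7B-parameter model
bytes_per_weight = 2   # 16-bit weights (fp16/bf16)

bytes_per_token = params * bytes_per_weight  # ~14 GB moved per token

# Assumed H100-class HBM bandwidth, roughly 3.35 TB/s; it puts a hard
# ceiling on single-device decode speed when weights don't fit on-chip.
hbm_bandwidth = 3.35e12  # bytes/s

max_tokens_per_sec = hbm_bandwidth / bytes_per_token

print(f"{bytes_per_token / 1e9:.0f} GB moved per token")
print(f"at most ~{max_tokens_per_sec:.0f} tokens/s, bandwidth-bound")
```

Under these assumptions, memory bandwidth alone caps a single device at a couple of hundred tokens per second for even a small 7B model, regardless of how much raw compute sits idle.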
The result? In some scenarios, GPU utilization can plummet below 5%. How can such a costly, high-latency architecture support future real-time AI applications demanding millisecond-level responses?
/ 02 / A Chip Nearly the Size of an iPad
Cerebras’ solution to these problems is both direct and extreme:
Make the chip large enough to fit “computing power, memory, and bandwidth” onto a single silicon wafer.
Thus, Cerebras created the world’s first and only commercial wafer-scale processor, the Wafer-Scale Engine (WSE), and claims its third-generation chip, WSE-3, is the “largest and fastest AI chip ever brought to market.”
Compared to GPUs, WSE-3’s defining feature is its size.
WSE-3 spans 46,000 square millimeters, nearly matching an iPad’s screen; the H100 measures just 814 square millimeters—a 57-fold difference.
Using Feldman’s own metaphor:
“Imagine a glass is memory, the cola inside is data, and your mouth represents computing power. How fast you can drink the cola depends entirely on the straw’s thickness. NVIDIA GPUs’ fundamental problem is that the straw is too thin. Our breakthrough? We threw the straw away and poured the cola directly into our mouths.”
This insane size delivers three revolutionary outcomes:
First, computing is brutally “centralized.”
WSE-3 crams in 900,000 compute cores—52 times more than the H100! Even crazier, all 900,000 cores sit on a single silicon wafer, adjacent to each other, eliminating the need for inter-chip communication entirely.
Second, memory is brought “closer.”
Traditional GPUs rely on HBM (essentially DRAM)—large capacity but slow access. SRAM is lightning-fast but has tiny capacity.
Cerebras’ approach? Simply make the chip large enough to house massive amounts of SRAM—WSE-3 integrates 44GB of on-chip SRAM, compared to the H100’s ~0.05GB—an 880-fold difference.
This means large model parameters can sit “face-to-face” next to compute units, eliminating constant data shuffling.
Third, and most critically, the bandwidth problem is “eliminated.”
WSE-3’s on-chip memory bandwidth reaches 21 petabytes per second (PB/s), compared to the H100’s ~0.003 PB/s—a 7,000-fold difference. On-chip interconnect bandwidth exceeds H100’s by over 3,700 times.
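The headline ratios in this section can be checked with simple division. A quick sketch follows; the H100 core count of 16,896 (the SXM variant’s CUDA-core figure) is an assumption added here, and with it the core ratio comes out near ~53x rather than the quoted 52x, depending on which core figure is used:

```python
# Spec figures quoted in this section (WSE-3 vs. H100).
wse3 = {"die_area_mm2": 46_000, "cores": 900_000, "sram_gb": 44, "mem_bw_pb_s": 21}
h100 = {"die_area_mm2": 814, "cores": 16_896, "sram_gb": 0.05, "mem_bw_pb_s": 0.003}

# Ratio of each WSE-3 figure to its H100 counterpart
ratios = {k: wse3[k] / h100[k] for k in wse3}
for name, r in ratios.items():
    print(f"{name}: {r:,.0f}x")  # die area ~57x, SRAM ~880x, bandwidth ~7,000x
```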
In GPU systems, vast amounts of time are spent “moving data.” On WSE, data rarely needs to leave the chip.
In summary, Cerebras does one thing: Stop moving data. Make computation happen around the data.
/ 04 / Who’s Paying for This Crazy Story?
No matter how revolutionary the technology, it’s worthless if no one buys it. The existential question for Cerebras: Who’s footing the bill?
The answer: Middle Eastern tycoons.
From 2022 to 2025, revenue skyrocketed from $24.6 million to $510 million—a 20-fold+ increase in three years. Net profit also turned positive in 2025, hitting $238 million.
But nearly all this money came from Middle Eastern backers.
In 2024, Abu Dhabi’s G42 accounted for 85% of revenue; in 2025, the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and G42 together contributed 87%.
In other words, Cerebras’ lifeline rests in the hands of two major Middle Eastern clients. This precarious revenue structure naturally became investors’ biggest red flag during the IPO.
To survive, Cerebras went on a client-acquisition spree. The real turning point came from OpenAI.
In January 2026, Cerebras secured an epic deal: From 2026 to 2028, it will provide OpenAI with 750 megawatts of computing power, totaling over $10 billion!
Not only that, but OpenAI also invested $1 billion (~RMB 6.8 billion) to help Cerebras build data centers.
The backstory runs deep. OpenAI’s leader, Sam Altman, was an early Cerebras investor, and OpenAI had been eyeing Cerebras’ technology since 2017.
With a giant like OpenAI onboard, the tide shifted.
In March 2026, AWS (Amazon Web Services) followed suit, becoming the first hyperscale cloud provider to adopt Cerebras.
AWS’ strategy was cunning: For inference tasks, it used its own Trainium chips for “input understanding,” then offloaded the dirty work of “output generation” to Cerebras’ CS-3 chips. The two interconnected at high speed, forming a lethal combo.
Rumor has it this setup is over 5x faster than existing solutions!
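The division of labor described above can be illustrated with a toy latency model. Everything here is an illustrative assumption, not AWS or Cerebras specifics: prefill (“input understanding”) processes the whole prompt in one parallel, compute-bound pass, while decode (“output generation”) emits tokens serially and is bandwidth-bound, so a faster decode engine dominates end-to-end latency:

```python
# Toy model of disaggregated inference. All throughput numbers below are
# made up for illustration; only the prefill/decode split mirrors the text.

def serve(prompt_len, out_len, prefill_tok_per_s, decode_tok_per_s):
    """Total latency when the two phases run on separate devices."""
    prefill_time = prompt_len / prefill_tok_per_s  # compute-bound phase
    decode_time = out_len / decode_tok_per_s       # bandwidth-bound phase
    return prefill_time + decode_time

# Same prefill engine; a 5x-faster decode engine for the serial phase.
baseline = serve(1000, 200, prefill_tok_per_s=10_000, decode_tok_per_s=100)
offloaded = serve(1000, 200, prefill_tok_per_s=10_000, decode_tok_per_s=500)

print(f"end-to-end speedup: {baseline / offloaded:.1f}x")
```

Because decode dominates total latency in this toy setup, a 5x-faster decode engine yields roughly a 4x end-to-end speedup; the exact gain depends on prompt and output lengths.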
/ 05 / The Ultimate Showdown: What’s Cerebras’ Moat Against NVIDIA?
In this arena, you can’t avoid the ultimate question:
How do you dismantle NVIDIA’s famed moat?
The entire AI computing power industry believes one narrative: CUDA is NVIDIA’s impenetrable fortress.
At GTC 2026, Jensen Huang declared: “We’ve built CUDA for 20 years. It’s embedded in every cloud, every computer. Our moat is the entire software ecosystem!”
But to challenger Feldman, the CUDA myth is vastly overblown.
Why? Because while CUDA holds value for training, it offers nearly zero lock-in for inference.
Today’s AI development is dominated by PyTorch, decoupling applications from underlying hardware. With a robust compiler, porting models across hardware is trivial.
“Switching from NVIDIA to Cerebras to someone else isn’t hard for inference,” Feldman argues.
Compared to the intangible CUDA ecosystem, Feldman sees NVIDIA’s real moat elsewhere: its overwhelming market share. Dominant share, he argues, is the most insurmountable moat of all.
He cites Intel: Despite repeated mistakes, it held 70%+ market share for years. AMD spent a decade to claim just 20%+.
This means once a company becomes the default option, even superior competitors face an uphill battle to displace it.
For today’s NVIDIA, this advantage is even starker: Everyone learns and builds AI within its ecosystem. NVIDIA is the “default starting point.”
But Feldman isn’t despairing. He predicts NVIDIA’s near-100% monopoly could shrink to 50–60% in five years.
The reason is simple: While NVIDIA dominates training, the truly massive inference market is exploding—and it craves new architectures.
Meanwhile, Feldman believes another controversial truth:
Even in the future, chip companies will utterly dwarf model companies in value!
His logic stems from a classic metaphor: Markets are a “voting machine” short-term but a “weighing machine” long-term.
Model companies thrive in short cycles—just months—with leadership constantly shifting, making long-term barriers elusive.
Chips are different. Their barriers lie in the physical layer: manufacturing, process, supply chain, engineering. Once established, these are nearly impossible to replicate quickly.
Over decades, the truly enduring giants have emerged from this layer.
