03/09 2026
Foreword:
In February 2026, Taalas, a Toronto-based startup, announced the completion of a new funding round worth $169 million, bringing its total funding to approximately $219 million, and simultaneously unveiled its first chip, HC1.
Taalas's MSIC (Model-Specific IC) Route: "Blowing Up the Memory Wall"
The HC1 runs Llama 3.1 8B at an astonishing 17,000 tokens per second, dozens of times faster than NVIDIA's B200, at roughly one-twentieth the cost. This bold "model as chip" gamble has torn open a rift in the AI compute market.
Since the von Neumann architecture was established in 1945, the design separating compute from storage has dominated the chip industry for eight decades: data must constantly shuttle between memory and compute units, and this "memory wall" has become the core bottleneck of AI compute. Taalas's answer is a "mask ROM recall fabric + SRAM" architecture: the 8 billion parameters of Llama 3.1 8B are etched into the chip's metal layers as mask ROM, while SRAM regions are reserved for the KV cache and for LoRA fine-tuning.
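To make that division of labor concrete, here is a minimal conceptual sketch in Python. It is not Taalas's toolchain, and every name in it is illustrative; it only models the split described above, in which the weights behave like constants fixed at fabrication time while a small mutable region stands in for the SRAM holding per-request state and LoRA deltas.

```python
from dataclasses import dataclass, field
import numpy as np

# Illustrative sketch only: models the division of labor in a
# ROM-plus-SRAM design, not Taalas's actual hardware or toolchain.

@dataclass(frozen=True)
class MaskROMWeights:
    """Parameters etched into the metal layers at fabrication: read-only."""
    w: np.ndarray  # frozen=True makes reassignment raise an error

@dataclass
class SRAMScratchpad:
    """Small mutable on-die memory for per-request state."""
    kv_cache: list = field(default_factory=list)
    lora_delta: np.ndarray | None = None  # optional low-rank "knowledge patch"

def forward(rom: MaskROMWeights, sram: SRAMScratchpad, x: np.ndarray) -> np.ndarray:
    """One linear step: fixed ROM weights plus an optional LoRA correction."""
    y = rom.w @ x                       # weights never move: no DRAM fetch
    if sram.lora_delta is not None:
        y = y + sram.lora_delta @ x     # small adjustable term lives in SRAM
    sram.kv_cache.append(x)             # per-token state is all that gets written
    return y

rom = MaskROMWeights(w=np.eye(4))       # "taped out" once; immutable thereafter
sram = SRAMScratchpad()
print(forward(rom, sram, np.ones(4)))
```

The point of the split is that the enormous tensor (the weights) never moves, which is precisely the traffic the memory wall penalizes; only the small per-token state is ever written.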
Built on TSMC's 6nm process, the HC1 packs 53 billion transistors into an 815 mm² die while drawing only about 200 W; a system of 10 cards needs just 2.5 kilowatts and air cooling. Led by CEO Ljubisa Bajic, a former AMD and NVIDIA architect and founder of the high-profile chip company Tenstorrent, Taalas's team of 25 developed this "counterintuitive" chip in just two and a half years, at an R&D cost of only $30 million.
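The article's own figures allow a rough efficiency check. The sketch below uses only numbers quoted in this piece, plus one loudly labeled assumption: a B200 board power of roughly 1,000 W, which the article does not state.

```python
# Back-of-the-envelope check using the figures quoted in this article.
# ASSUMPTION: B200 board power of ~1,000 W (a typical published figure),
# which is NOT stated in the article itself.

hc1_tps = 17_000         # tokens/s on Llama 3.1 8B (per the article)
hc1_watts = 200          # per-chip power draw (per the article)
speedup_vs_b200 = 48     # headline claim from the cited Chipwise piece
cost_ratio = 1 / 20      # HC1 claimed at ~1/20th the cost of a B200

b200_tps = hc1_tps / speedup_vs_b200  # implied B200 throughput, same model
b200_watts = 1_000                    # assumed, see note above

print(f"HC1 : {hc1_tps / hc1_watts:6.1f} tokens/s per watt")
print(f"B200: {b200_tps / b200_watts:6.2f} tokens/s per watt (implied)")
print(f"Implied perf-per-dollar advantage: {speedup_vs_b200 / cost_ratio:.0f}x")
```

Under these assumptions the HC1 lands roughly two orders of magnitude ahead on tokens per second per watt, the shape of advantage one would expect when weight movement is eliminated entirely.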
The Cost of Hardwired Hardware and the Risk of "Carving the Boat for the Sword"
In the late 1990s, 3dfx's Voodoo graphics cards dominated 3D graphics by hardwiring the rasterization steps of 3D rendering into silicon. But from 1999 onward, as developers pushed for richer 3D effects (first hardware transform and lighting, then programmable shaders), Voodoo's hardwired design could not support the new features; it was displaced by NVIDIA's GeForce, and 3dfx went bankrupt.
During the AI chip boom of 2016-2018, a wave of startups designed specialized "convolutional acceleration engines" for CNNs (Convolutional Neural Networks), excelling in image tasks like facial recognition and autonomous driving. However, after the release of "Attention is All You Need" in 2017, the Transformer architecture fundamentally changed the underlying mathematical logic of AI, rendering those companies that had hardwired CNNs into their chips obsolete due to their lack of general matrix computing capabilities.
Comparing the two cases shows that the degree of hardware solidification determines the level of risk. Voodoo solidified a rendering pipeline, which became outdated but remained usable; CNN chips solidified an algorithm family, sharply narrowing their applicable scenarios while retaining some value; Taalas solidifies a specific model version, so once the model updates, the chip may turn directly into electronic waste. This extreme binding rests on a single bet: that AI algorithms have entered a plateau phase in which architectures no longer change drastically.
The Survival Logic of Specialized Chips in Vertical Scenarios
Despite the high risks, Taalas is not chasing a fantasy. In the real commercial world, not every scenario needs an all-knowing generalist; many vertical scenarios instead demand an extremely stable, cheap, and blazing-fast "electronic workhorse" that excels at one specific task.
Taalas's HC1 is aimed precisely at application scenarios that are sensitive to latency and require a stable model version.
This "complementary rather than substitute (tì dài, meaning "replacing")" positioning has found survival space for Taalas under NVIDIA's shadow.
The Time Race Between Model Iteration and Hardware Delivery
However, a clear positioning does not eliminate the core uncertainty of the business model—the massive misalignment between model iteration cycles and chip development cycles.
Today, open-source large models evolve on a monthly or even weekly cadence, while an advanced-node chip typically takes 18 to 24 months from architectural design to tape-out and mass production. By the time the chip leaves the fab, the model it freezes in place may already look antiquated in the fast-moving algorithmic world. Worse, if the hardwired model turns out to have a fatal flaw, the entire batch of chips can only be scrapped.
Taalas's defensive strategy is rapid physical iteration. Working with TSMC, it developed a "two-layer metal" scheme: swapping in a new model does not require redesigning the base silicon, only new masks for the chip's top two metal layers, which compresses the hardware turnaround for a new model to about two months. Meanwhile, the HC1 retains support for LoRA fine-tuning, letting enterprises attach small "knowledge patches" alongside the physically frozen model to tune performance on specific tasks.
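LoRA's fit with physically frozen weights is easy to see in the arithmetic. Below is a generic LoRA sketch (the standard technique, not Taalas's implementation; the sizes are illustrative): the full weight matrix stays untouched, exactly as if printed into metal, and only two small low-rank matrices need to live in writable memory.

```python
import numpy as np

# Generic LoRA arithmetic, shown only to illustrate why adapters suit
# a chip whose base weights are physically immutable.

d, r = 4096, 16                         # layer width and LoRA rank (illustrative)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))         # base weights: frozen, "in the metal"
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection (SRAM-sized)
B = np.zeros((d, r))                    # trainable up-projection, zero-initialized
alpha = 32.0                            # standard LoRA scaling factor

def lora_forward(x: np.ndarray) -> np.ndarray:
    # The base path touches only the immutable weights; the correction
    # term involves d*r + r*d parameters instead of another d*d.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d)
print(lora_forward(x).shape)            # (4096,)
print(f"adapter params: {2 * d * r:,} vs frozen params: {d * d:,}")
```

At width 4096 and rank 16, the adapter weighs in at about 131 thousand parameters against roughly 16.8 million frozen ones, which is why such a patch can plausibly live in on-die SRAM.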
From "General-Purpose Dominance" to "Coexistence of General-Purpose and Specialized"
In 2026, as inference becomes the new main battlefield for AI compute, the market is shifting from "general-purpose dominance" toward a split landscape in which general-purpose and specialized chips coexist.
NVIDIA's $20 billion acquisition of Groq's inference technology signals a general-purpose giant's concession to the specialized track. Meanwhile, Etched hardwires the Transformer architecture, Groq runs a pure-SRAM LPU route, Cerebras attacks the memory wall with wafer-scale engines, and Tenstorrent bets on a programmable RISC-V architecture; these competing routes are collectively eroding a once-impregnable inference market.
The future AI compute landscape may settle into a three-way division: NVIDIA GPUs and general-purpose accelerators will keep exploring the intellectual frontier of AGI and handle the most complex and volatile unknown tasks; physically hardened chips like Taalas's will permeate every streetlight, home appliance, and industrial robot; and cloud vendors' self-developed chips, such as Google's TPU and Microsoft's Azure Maia, will entrench themselves in in-house cloud deployments.
Conclusion:
Taalas's HC1 makes the case that when large models become as cheap and ubiquitous as resistors and capacitors, the true explosion of AI will finally begin.
Online Sources:
TMTPost: "$169 Million Funding Bet on Specialized Chips: Taalas Aims to Rewrite the AI Computing Power Landscape with 'De-GPUization'"
Vicor News: "Challenging NVIDIA's Computing Power Hegemony? A Toronto Startup 'Carves' Large Models into Chips"
Chipwise News: "17,000 Tokens/s! 48 Times Faster Than NVIDIA B200! Who Is This Company That 'Carves' Large Models into Chips?"
Sohu: "The Global Chip Circle Explodes: This 'Madman' Carves Models into Silicon Wafers, Discardin g an 80-Year-Old Architecture on a Whim"