Behind the Release of Its First Self-Developed Chip, Is OpenAI's Full-Stack Ambition on Display?

Home

Finance

ICV

Smart City

Digital Live

Cloud

Optics

Home Finance AI ICV Smart City Digital Live Cloud Optics

07/02 2026 367

Author｜Qingping Chuiguo

Editor｜Wuxin Chaliu Liu Chengzhi

OpenAI has produced its first self-developed chip.

Not long ago, OpenAI posted a picture on social media showing Sam Altman receiving a wafer from Broadcom CEO Hock Tan.

This is not a PPT or a conceptual diagram, but an engineering prototype that has successfully run real models in the laboratory.

Named Jalapeño, it is OpenAI's first AI inference chip designed from scratch, specifically addressing the issues of AI inference being too expensive, too slow, and too power-hungry.

Nine months ago, it was just a blank slate. Nine months later, it can steadily run models like GPT-5.3-Codex-Spark in the lab, with both frequency and power consumption meeting targets.

What's even more outrageous is that the design was accelerated by OpenAI's own models.

Is OpenAI's AI building itself a new 'body'?

Moreover, this new 'body' will be used to run even smarter next-generation AI. The ChatGPT and Codex you use are now helping to build the next generation of hardware they will run on.

I have a hunch that once this flywheel starts spinning, many things will be rewritten.

Nine months shatters industry records

Those familiar with chip development know that a high-performance ASIC (Application-Specific Integrated Circuit), from architecture definition to tape-out, typically takes two to three years.

Google's first-generation TPU took about three years from architecture definition to tape-out, while traditional chip companies often take five to seven years for projects of similar scale.

OpenAI did it in just nine months.

This speed is comparable to a 'special forces extreme cross-country' in the chip industry, dubbed 'the fastest ASIC development cycle in the history of high-performance advanced semiconductors.' And it's not just hype.

How did they achieve this? Two reasons: one relies on 'brainpower,' the other on 'physical effort.'

First, AI helps design circuit diagrams, turning the verification deadlock into 'fast-forward mode.'

The most torturous part of chip design isn't architecture design but the endless cycle of verification, modification, and re-verification after design completion. This often consumes the majority of the project timeline.

AI excels at this kind of 'brute-force enumeration' grunt work.

While OpenAI hasn't disclosed specifics, they explicitly mentioned using their own models to accelerate parts of the design and optimization process.

This scenario is somewhat sci-fi: AI helps design its own circuit diagrams, which become chips that run even stronger AI, which then designs the next generation of chips.

This forms a closed loop where AI 'hands itself a wrench,' accelerating the process.

The engineering prototype is already running in the lab, powering cutting-edge models like GPT-5.3-Codex-Spark, with both frequency and power consumption meeting mass production targets.

Second, they recruited a professional team, having the most code-savvy experts teach chips how to run code efficiently.

Richard Ho, head of OpenAI's hardware team, spent nine years at Google working on TPU chip engineering, rising to Senior Engineering Director.

During his tenure, he helped invent machine learning-assisted chip design methods that enabled multiple TPU projects to succeed on their first tape-out. He later worked at photonic computing company Lightmatter, essentially treading every key node on the 'AI hardware' path.

He joined OpenAI in late 2023 and quickly assembled a elite team of about 40 people.

Having someone who understands LLM (Large Language Model) operations lead a team of experts in LLM operations to design hardware specifically for running LLMs is like treating the disease with the right medicine!

But one question remains: Does successful tape-out equal worry-free mass production?

Of course not.

Tape-out merely sends the design for manufacturing. Mass production still faces hurdles like yield rates, supply chains, and software ecosystems.

However, transforming a blank slate into a functional silicon chip in nine months is impressive enough on its own.

Why focus on 'inference' instead of 'training'?

Jalapeño has a clear positioning (positioning): inference only, no training.

Training and inference have entirely different chip requirements.

Training is teaching models to 'read and write.' You feed them all publicly available data, books, and code from the internet, letting them discern patterns from massive datasets.

This process demands extreme computational density, massive high-bandwidth memory for data feeding, and high general-purpose computing capabilities.

In this domain, NVIDIA's GPUs, with their CUDA ecosystem and hardware architecture, have built an almost insurmountable moat that no one can dislodge short-term.

Inference is different—it's when models 'answer questions.'

When you type a query into ChatGPT and click send, the server generates a response within seconds and sends it back. That's inference.

Inference tasks are characterized by: the model is already finalized, needing no further learning, only requiring fast, low-cost processing of massive user requests within established frameworks.

Such work is perfect for specialized chips to optimize precisely, squeezing maximum value from every watt and dollar.

OpenAI's decision to start chip development with inference is highly pragmatic.

Why? Because many don't realize that inference is quietly surpassing training as the top consumer of computational resources.

For mainstream industry models, inference-side computational consumption has gradually exceeded training-side.

Training a model is like building a skyscraper: once the blueprints are done and the structure topped out, the money is spent in one go, with no further costs. But inference? After the building is completed, every day it operates—air conditioning, elevators, lighting, maintenance—costs money every second. As long as users keep querying ChatGPT, the costs never stop.

In other words, training is the 'down payment,' while inference is the 'mortgage'—one you pay for life.

For products like ChatGPT with user counts in the hundreds of millions, if inference costs remain high, profitability will always be elusive.

The more users and queries, the larger the computational bill grows, eating into revenues before they can even warm up.

Jalapeño targets this exact pain point.

Broadcom CEO Hock Tan revealed that compared to mainstream AI graphics processing units on the market, Jalapeño can slash inference costs by nearly half.

If a request previously cost 20 cents, now it costs just over 10 cents. The savings are pure profit margin.

Not just costs—performance news is explosive too.

Early tests show Jalapeño's performance per watt 'significantly outperforms the current industry state-of-the-art.' In a Reuters interview, Tan even claimed this chip can compete with NVIDIA's Blackwell and Google's TPU.

However, these figures are currently official claims without third-party independent lab benchmarking.

When officials claim 'comparable to Blackwell,' details like which models were tested, under what concurrency conditions, and against which Blackwell configuration remain unclear until technical reports are officially released.

But that doesn't prevent us from getting excited.

OpenAI's 'full-stack' ambition goes beyond just building a chip

Jalapeño means 'jalapeño pepper' in Spanish, one of the mildest varieties.

The name hints that this is just OpenAI's first step—hotter things are coming.

Previously, OpenAI's approach was simple: focus solely on models, leaving computational power and data centers to Microsoft and NVIDIA to worry about.

This let them sprint ahead in the early model arms race but buried a structural risk—they didn't control pricing for computational costs.

With their lifeline in others' hands, they paid rent at the landlord's whim.

Today, OpenAI faces a vastly different competitive landscape than two or three years ago.

In the past, model capability was the biggest competitive variable—as long as models got smarter, high computational costs could be tolerated. But as we enter new phases like AI agents, AI programming, and AI video, model capabilities are converging. The real differentiator now becomes who can deliver intelligence most efficiently at lower costs and larger scales.

In other words, AI competition is shifting from 'who has the best model' to 'who delivers intelligence most efficiently.' And inference chips are the underlying infrastructure determining this efficiency.

After autumn 2025, OpenAI's actions noticeably accelerated.

First, they announced a collaboration with Broadcom to develop self-designed accelerators. Then they signed a $100 billion data center agreement with NVIDIA, locked in 6GW of chip supply with AMD, and in June 2026, secured a 750MW inference computational power order with Cerebras.

With Jalapeño's official debut, four parallel paths now exist:

self-developed chips, NVIDIA, AMD, and Cerebras.

Does OpenAI still seem like just a 'model company' now?

Obviously not.

Long-term, self-developed chips mean more than just cost savings.

The AI industry is entering a new competitive phase where models, chips, networks, data centers, and software stacks are increasingly tightly coupled. Future leading AI companies will compete not just on a single model but on whether their entire infrastructure can collaborative optimization (collaboratively optimize).

Just as Apple integrated hardware, operating systems, and app ecosystems through its self-developed M-series chips, OpenAI is attempting to co-design models, inference systems, and underlying hardware. This means optimizing not just individual models but the entire AI service chain.

Moreover, OpenAI's IPO preparations are underway. Launching self-developed chips at this juncture sends a clear signal to investors: this company has the means and plans to plug the biggest bleeding point—inference costs.

Financial and technological narratives converge here.

In terms of collaboration models, OpenAI chose a pragmatic route.

They lead on chip architecture, kernel optimization, and service systems; silicon implementation and network interconnection go to Broadcom (Google's TPUs are mass-produced through Broadcom, a proven path); board, rack, and system integration go to Canada's Celestica—also Google TPU's preferred manufacturing partner, by the way.

These three companies complete the puzzle.

Broadcom CEO Hock Tan called this collaboration 'a fundamental commitment'—not a one-time project but the starting point of a multi-generational roadmap.

Jalapeño is just the beginning. According to previous media reports, the second-generation chip is codenamed 'Serrano,' another type of pepper.

The deployment timeline is set: initial deployment begins in late 2026, expanding year by year (year by year) thereafter, with the ultimate goal of building gigawatt (GW)-scale data centers alongside partners like Microsoft.

What does gigawatt mean?

Roughly equivalent to the residential electricity consumption of entire Beijing municipality. Scheduling power at this scale is the ultimate stress test of infrastructure construction capabilities.

OpenAI is transforming from an AI lab into a full-stack AI company.

They train models in-house, design chips themselves, optimize inference autonomously, and control deployment independently.

Your ChatGPT bill is being rewritten.

After all that's been said, what does this have to do with us ordinary people?

It has a lot to do with us. The AI we use in the future will become increasingly 'high-quality and affordable.'

Greg Brockman said during the release: 'The world is shifting towards an economy driven by computing power.'

This may sound grand, but upon closer thought, it's not exaggerated.

Whether it's a student using AI to understand a paper, a small business using an API to build a customer service system, or an independent developer relying on a code assistant to complete a project, computing power is quietly at work behind the scenes.

The cost of computing power directly determines whether these tools are for the few or the daily routine of the many.

By positioning Jalapeño on the inference side, Open AI is essentially doing one thing: making intelligence cheaper, faster, and more stable.

Every second reduction in inference latency, every penny drop in the price per API call, translates to ordinary people being able to use AI more smoothly and affordably, even accessing advanced capabilities that today belong only to large enterprises and paying users.

For example, you used to spend money on a monthly subscription to get a 'high-end AI chat companion' that could chat with you. But by next year, for the same price, you might get an 'all-round intern' who can finish your entire PPT and clean up your Excel sheets while at it.

The price stays the same, but the 'worker's' capabilities have skyrocketed several levels—more value for the same price.

Of course, it's not just about users who spend money. When computing power becomes as cheap as utilities, what was once a 'superbrain' only affordable for big companies might become a standard tool accessible to everyone.

In Closing

Looking back at the whole thing, the most brilliant part is actually this cycle:

AI designs chips → chips run AI → stronger AI designs the next generation of chips.

Better infrastructure → higher computing efficiency → better training and services → stronger models → better products → more users and revenue → reinvestment in next-generation infrastructure.

Once this flywheel starts spinning, it accelerates on its own.

Google has TPU, Amazon has Trainium, Meta has MTIA, Microsoft has Maia. And now OpenAI has Jalapeño.

Once this flywheel starts spinning, it accelerates on its own.

Google has TPU, Amazon has Trainium, Meta has MTIA, Microsoft has Maia. And now OpenAI has Jalapeño.

For large model companies, building chips is shifting from a question of 'should we' to 'can we.'

Of course, from tape-out to mass production, there are still issues like yield rates, supply chains, and software ecosystems to resolve.

Keep in mind, NVIDIA's CUDA platform is like the 'universal language' or 'utility network' of the AI world. Everyone has been using it for over a decade, and it's become second nature. Now, you want everyone to switch to a new system? That's no easier than getting all drivers in the country to switch to driving on the left overnight.

Look at the predecessors: Google's TPU took a decade to reach where it is today. Amazon's Inferentia went through countless pitfalls and iterations before daring to roll out on a large scale.

Does Open AI expect to reach the top overnight with a single 'pepper'? There's still a long road ahead.

But Jalapeño's emergence at least proves one thing: AI can not only help people work but also help build its own 'body.'

Using the model that best understands LLMs to design the hardware that best runs LLMs—once this cycle starts, the future where 'computing power is cheaper and AI is more widespread' will arrive even faster.

Solemnly declare: the copyright of this article belongs to the original author. The reprinted article is only for the purpose of spreading more information. If the author's information is marked incorrectly, please contact us immediately to modify or delete it. Thank you.

Newest

Links