Two Silicon Valley Geniuses Clash: Will AI Programming Lead to Real-World Disaster?


06/05 2026

Two AI geniuses at the heart of Silicon Valley, George Francis Hotz and Andrej Karpathy, have clashed over AI programming, reflecting a deeper rift within Silicon Valley and even the broader U.S. AI market.

By June 1, George Hotz's blog post had been circulating for more than a week. On May 24, he asserted that bringing AI Agents into software development could be one of the costliest mistakes in the industry's history. At first it sounded like the grumbling of a famous hacker, but the following days showed that this was more than a squabble between him and Karpathy.

Gary Marcus, known for his skepticism of large models, immediately raised the stakes to the level of trillion-dollar market caps. He argued that if even a hardcore AI advocate like Hotz was saying 'code is garbage and will drag down big companies,' then the entire generative AI movement rested on a massive lie: programming is its cornerstone, and once people realize this, the bubble will burst.

As AI Agents begin writing code for humans, the core debate in the software industry is no longer whether it works but this: is it truly improving productivity, or is it burning the most expensive tokens to create an engineering disaster that is harder to detect?

And this is not just a technical squabble between two ordinary programmers.

—Lead

Behind Karpathy stand Geoffrey Hinton, Fei-Fei Li, OpenAI, Tesla, and Anthropic.

One of OpenAI's eleven original co-founders, Tesla's former head of AI, a student of Fei-Fei Li, and a PhD graduate of the Stanford lab that changed the history of computer vision, he coined the term 'vibe coding' and has been one of the best explainers of complex AI systems to the general public over the past decade. Now he has stepped into the heart of Claude's R&D.

Behind Hotz stand iPhone hacks, PlayStation 3 reverse engineering, Sony lawsuits, Elon Musk, and the hacker culture of autonomous driving.

At 17, he became the first person in the world to unlock the iPhone; later, he reverse-engineered the PlayStation 3 and was sued by Sony; then he founded comma.ai to take on Tesla's Autopilot with aftermarket devices. He is neither an academic nor a corporate executive. His path to fame has always been simple: pick up a screwdriver, pry open the system, and find the screw that truly matters.

Both stand deep within the AI world, but on opposite sides. One is inside the power center, witnessing how model capabilities are rewriting software engineering; the other stands outside the system, convinced that Agent programming will push the software industry into an expensive self-deception.

And the train of AI replacing human programming has already left the station—Google says 75% of new code is now generated by AI and then approved by engineers; Zuckerberg predicts that AI will soon write and review most of the code for AI teams.

On the tool side, Claude Code, Codex, Cursor, and Devin have pushed 'AI writing code' from editor completions to a more radical position: letting machines read requirements, modify files, run tests, check documentation, submit patches, and even coordinate multiple sub-Agents simultaneously.

Hotz wants to stop this train, while Karpathy tries to stop Hotz.

To halt it, Hotz stepped forward, saying: This could be one of the most expensive mistakes in software development history. In his blog, he wrote: 'Agents can't program, and it's becoming increasingly difficult for us to realize that they can't.'

Five days before he said this, Karpathy had just joined Anthropic and publicly expressed a clear stance: AI Agents have already transformed software development.

These two statements tore open the industry's contradictions. The question is no longer whether AI is useful; that framing is too superficial. The real issue is this: low-skilled individuals can now generate vast amounts of code with the most expensive tokens, many large companies (including Chinese tech giants) treat 'AI adoption rates' as a measure of organizational progress, and more and more 'one-person companies' rely on AI programming to tell their stories. All of these vested interests are now caught between intense anxiety and self-contradiction.

Because they, too, want to know: is the software industry truly undergoing a productivity revolution, or will it first have to pay off all the technical debt, bugs, maintenance costs, and talent gaps before any such revolution arrives?

Hotz and Karpathy are the most dramatic faces of this debate.

One has spent his life taking machines apart, prying open closed systems. The other has spent his life explaining machines, making complex systems understandable to everyone. Both are obsessed with understanding complex systems.

Why are these two so representative in their clash over AI programming? It goes back to their youth.

In 2007, in Glen Rock, New Jersey, a 17-year-old held one of the world's most closed—and tempting—electronic devices: the original iPhone. It only worked on AT&T's network, with hardware and software heavily locked down by Apple. Hotz wanted to set it free.

He didn't start with a sleek software tool; he went straight at the hardware. Using a small eyeglass-repair screwdriver, he removed the back screws, slid a guitar pick along the nearly invisible seam of the casing, and pried it open. Then came soldering, circuit modifications, and one hardware-level problem after another. A few days later, he became the first person in the world to publicly unlock the iPhone. Then he did something very Hotz-like: he went online to brag. Overnight he became famous, earning the nickname geohot.

This scene was almost a preview of his lifelong methodology. Faced with closed systems, his first instinct was never to accept the interfaces but to dismantle them—to see for himself which chip defined the boundaries.

Around the same time, another teenager was grappling with complex systems, but instead of prying them open, he was explaining them.

Born in 1986 in Bratislava, Czechoslovakia (today the capital of Slovakia), Karpathy moved to Toronto with his family at 15. In 2006 he started a YouTube channel called badmephisto, posting Rubik's Cube speed-solving tutorials that broke a seemingly inscrutable system into step-by-step instructions anyone could follow. The videos eventually drew more than 9 million views and were even watched by world-class speedcuber Feliks Zemdegs.

This was no mere side note. Much of what Karpathy did later carried the same spirit: an intense desire to share, almost a compulsion to explain. When he saw complex systems, he not only wanted to understand them but to explain them to others. This was true for the Rubik's Cube, neural networks, ImageNet, GPT, and vibe coding alike.

Even their academic backgrounds were starkly different. Hotz never finished a formal degree; he is the quintessential self-taught, unmanageable outsider.

Karpathy followed a glittering orthodox trajectory: undergraduate studies at the University of Toronto under 'Godfather of Deep Learning' Geoffrey Hinton, followed by a PhD at Stanford under 'Mother of ImageNet' Fei-Fei Li.

One breaks down walls from outside the system; the other climbed to the core of the establishment. Years later, these two paths led them to the same question: when machines start writing code, how can humans still understand the machines?

Hotz soon tasted the cost of fame. After the iPhone, he turned to Sony, gaining read-write access to the PlayStation 3 by late 2009 and publishing the key information to unlock the entire machine.

Sony's lawyers sued the young man, then in his early twenties. Hotz didn't hide; he escalated. He crowdfunded his legal defense in two days and even released a rap video taunting Sony, built around one simple point: why should they dictate what I can do with something I paid for?

In 2011, they settled, with Hotz promising not to reverse-engineer any Sony products again. But the lawsuit angered the hacker community, leading to a series of attacks on Sony, including the PSN breach affecting ~77 million accounts and a 20+ day service outage. Hotz repeatedly denied involvement.

This episode revealed something: when a technical conflict becomes entangled with closed ecosystems, legal machinery, and community emotion, it spills over into larger systemic events. It also helps explain his lasting wariness of complex systems. What mattered to him was never surface functionality but whether a system could be understood, controlled, and taken apart, and this is a crucial reason for his fundamental distrust of AI programming.

On the other side of the continent, Karpathy was doing something seemingly mundane yet emblematic of his life: personally competing with machines to 'see.'

In 2014, as deep learning began showing promise in image recognition, the strongest convolutional networks on ImageNet had already achieved astonishingly low error rates. To gauge whether machines were approaching human levels, one first needed to know human performance.

So Karpathy sat down and did something no ordinary person would—he mimicked the machine, examining and categorizing images one by one, measuring true human performance: a top-5 error rate of 5.1%, while his less patient colleagues reached as high as 15%. He discovered that humans struggled most with fine-grained distinctions, like differentiating hundreds of nearly identical dog breeds. From then on, he jokingly called himself 'ImageNet's reference human.'
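For readers unfamiliar with the metric: a "top-5 error" counts a guess as correct if the true label appears anywhere among the five labels offered, so Karpathy's 5.1% means roughly 51 misses per 1,000 images. The minimal Python sketch below computes it; the function name and the toy labels are illustrative assumptions, not Karpathy's actual evaluation code.

```python
from typing import Sequence


def top5_error(predictions: Sequence[Sequence[str]], labels: Sequence[str]) -> float:
    """Fraction of examples whose true label is missing from the first five ranked guesses."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need one ranked guess list per label, and at least one label")
    # Only the first five guesses count; anything ranked sixth or lower is a miss.
    misses = sum(1 for guesses, label in zip(predictions, labels) if label not in guesses[:5])
    return misses / len(labels)


# Toy example: an annotator's ranked guesses for three images (hypothetical data).
guesses = [
    ["beagle", "basset", "foxhound", "harrier", "bloodhound"],
    ["tabby", "tiger cat", "egyptian cat", "lynx", "persian"],
    ["norwich terrier", "norfolk terrier", "cairn", "border terrier", "yorkshire terrier"],
]
truth = ["foxhound", "tabby", "australian terrier"]
print(f"top-5 error: {top5_error(guesses, truth):.1%}")  # prints: top-5 error: 33.3%
```

The third image shows why fine-grained classes are hard: five plausible terrier guesses can all miss the one breed that is actually pictured.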

A man competing with machines to see the world with his own eyes, personally measuring the line between human and machine—this was the wildness beneath his orthodox resume.

Around the same time, in 2015, he wrote "The Unreasonable Effectiveness of Recurrent Neural Networks," a blog post that went on to be widely cited.

He trained a small character-level model to imitate Shakespeare, mathematical typesetting, and even Linux kernel source code. The results looked plausible at first glance but fell apart under scrutiny—it learned the 'appearance' of code but not its 'meaning.' Karpathy was both amazed and dismayed by this model that produced 'plausible yet broken' code.
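The "plausible yet broken" effect can be reproduced even with a far cruder model than Karpathy's RNN. The sketch below swaps in a character-level bigram model, which predicts each character only from the one before it. It is a hypothetical illustration of what "learning the appearance of code" means, not a reimplementation of his model; the corpus and function names are made up for the example.

```python
import random
from collections import Counter, defaultdict


def train_bigram(text: str) -> dict[str, Counter]:
    """Count, for every character, which characters follow it in the training text."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    return counts


def sample(counts: dict[str, Counter], start: str, length: int, seed: int = 0) -> str:
    """Generate up to `length` characters, drawing each in proportion to how often it followed the previous one."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    out = [start]
    while len(out) < length:
        followers = counts.get(out[-1])
        if not followers:  # this character never had a successor in training: stop early
            break
        chars, weights = zip(*followers.items())
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)


corpus = "int main(void) { return 0; }\nint add(int a, int b) { return a + b; }\n"
model = train_bigram(corpus)
print(sample(model, "i", 80, seed=1))
```

The output is built from fragments that look like C (braces, `int`, `return`), yet it rarely forms a valid program, because the model knows only which character tends to follow which, not what the code means.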

In a sense, Karpathy's journey was even closer to—and more contradictory toward—AI programming than Hotz's. He had experimented with it, sighed over its unsatisfactory results, yet ultimately became its champion. How many intellectual twists did this journey entail?

What truly placed both men on the same scale was Elon Musk.

Around 2015, Musk tried to recruit Hotz to Tesla to work on the next-generation Autopilot, reportedly offering a multimillion-dollar bonus. Hotz refused, publicly accusing Musk of 'constantly changing the conditions' and boasting that he could build a self-driving system on his own. Then he actually went home and spent about a month in his garage converting an Acura into a self-driving prototype, which led to a famous Bloomberg story.

He even challenged Musk to a duel: his Acura versus a Tesla with Autopilot on Los Angeles' 405 freeway, but Musk declined.

Hotz secured top-tier VC funding for comma.ai and announced comma one, a $999 aftermarket self-driving kit, in fall 2016. Then regulators stepped in: a California DMV ban, followed by a polite yet threatening letter from the U.S. National Highway Traffic Safety Administration (NHTSA) warning of fines of up to about $20,000 per day. It contained a line that left him speechless: 'It is almost certain that drivers will use your product in ways beyond its intended purpose.'

Hotz immediately canceled the product, tweeted that he'd rather spend his life developing awesome tech than dealing with regulators and lawyers, then open-sourced the solution and walked away.

This theme repeated throughout his life: exiting iPhone jailbreaking, canceling comma one, and in 2022, volunteering to 'intern' for Musk at newly acquired Twitter for a month in exchange for San Francisco living costs.

Before leaving, he borrowed Musk's style and polled his followers on whether he should quit. When about 60% voted for him to stay, he left anyway, signing off with 'Time to go write code.' He remains a man forever outside the system, forever convinced that 'one person who truly understands the system can outmaneuver an entire army.'

Karpathy chose the opposite path. In 2017, he accepted Musk's invitation to join Tesla as Senior Director of AI, reporting directly to Musk and overseeing Autopilot's vision systems for five years. During this time, he proposed the influential 'Software 2.0' concept: future software would no longer be written line-by-line by programmers but trained from data.

After leaving Tesla, he briefly returned to OpenAI to work on GPT-4 and ChatGPT, then founded an AI education company, returning to what he did best since the Rubik's Cube era—explaining the most arcane topics to the broadest audiences.

It was during this teaching phase that he casually coined 'vibe coding' in a post on X, and the term went viral worldwide. His original description was lighthearted: 'fully give in to the vibes, embrace exponentials, and forget that the code even exists.' It echoed his now-famous earlier line: 'The hottest new programming language is English.'

One turned Musk down and vowed to go it alone; the other accepted Musk's offer and stepped into the core. Their opposite choices about Musk mapped onto two different answers to an older question: when facing massive systems, should you trust the individual who understands everything, or the machine that keeps getting stronger?

The true turning point of this story came in winter 2025. The fascination lies in how both men flipped positions, in opposite directions, at nearly the same time, using the same newly matured models.

First, Karpathy. Few knew that as late as the second half of 2025, the inventor of 'vibe coding' himself was still skeptical of AI Agents, publicly stating that these products were far from mature and that hype had outpaced reality. Even the creator of the term didn't fully believe in AI programming yet.

But what really flipped his view was one particular weekend. He wanted to build a small dashboard for his home security cameras that could analyze the video footage. A few months earlier, this would have taken him an entire weekend. This time, he simply described the task to the Agent in plain language and watched it work for about half an hour, debugging on its own, searching online for solutions, writing code, and configuring services, before it delivered a working product.

The experience changed his tune. "Programming is becoming unrecognizable," he said. He noted that Agents basically didn't work before December and have worked since. And he made a strong call: this is definitely not a "business as usual" moment for the software industry.

So, he took the playful term "vibe coding" a step further in a more serious direction, calling it "agentic engineering."

But Karpathy never became a mindless cheerleader for AI programming, a fact that makes him far more formidable than he appears.

Because even at his most excited, he stayed measured. He stressed that the Agent is still essentially an "intern": you have to supply the aesthetics, judgment, and taste, and you have to supervise it. It only truly excels at clearly defined, verifiable tasks. In this era, deep technical expertise is not a smaller multiplier but a larger one.

To some extent, he is right, and the caveats he raises even overlap with Hotz's concerns. But he speaks from his own experience: his mind was changed by evidence, and he changed it honestly, admitting that just a few weeks earlier he had been on the opposite side.

Hotz is moving in the opposite direction, and many overlook a key point: before this reversal, he was not an old-school programmer who never touched AI. He tried it—deeply—in the real projects he cared about most: using various Agents to write tinygrad and reverse-engineer hardware. For someone who has spent his life taking machines apart to their core, this was never just "playing around."

After six months, his conclusion is harsh: Agents front-load all the progress and then hand you a slot-machine lever, inviting you to pull it again and again in the hope of finishing the polish, but the result always falls just short.

What he truly hates is this "just short" part. To most people, if it runs, it's good enough; if it can be demonstrated, it's fine; if it meets deadlines, it's acceptable. But for someone like Hotz, the most valuable part of software engineering often lies in that final bit: why this abstraction works, why the boundaries are drawn this way, why shortcuts can't be taken here, why this bug can't be fixed by simply regenerating... This is actually the first complex system in his life that he can only oppose but cannot pry open with a screwdriver.

He even suspects that the entire narrative of "you'll be left behind if you don't use AI" is itself a "psychological operation manufactured to sell Agents." He delivered the line that has since been repeatedly quoted: This will be a golden age where tons of garbage code floods in and a dark age for high-quality, refined work.

Finally, he made his stance clear: on the issue of LLMs, he sides with LeCun and Marcus. LeCun recently denied once again that LLMs possess intelligence, arguing that intelligence lies in finding solutions in unfamiliar situations, not in imitating existing things with varying precision.

In Hotz's view, what true programming Agents need is a world model, not the current system he harshly describes as "commenting out failed tests and then telling you all tests have passed."

Placing their reversals side by side: one, who once measured the line between "what humans can see and machines cannot," admits that line has been pushed to a new position; the other, who has spent his life physically taking machines apart, declares that bad code written by machines is becoming more like good code—harder to distinguish and thus more dangerous. One says the gap is narrowing; the other says understanding is diminishing.

This is not just a clash between two opinion leaders but software's sudden glimpse of its two possible futures in the same winter.

Is AI programming a false proposition?

No, that question is already outdated. From Cursor to Claude Code, from Devin to Codex, it is no longer confined to demo videos but has entered corporate workflows, management efficiency narratives, and investors' reimagining of software companies' cost structures. For some startups, it has even become part of their valuation logic: why can a small team do what used to take dozens? Because they have a team of Agents.

This is precisely what makes it dangerous. Once something is written into valuation logic and organizational metrics, it is no longer just a technical issue. It becomes a KPI, something management demands be "used," and a fig leaf for rank-and-file practitioners.

To judge whether Hotz is too pessimistic, we must call upon witnesses from the real world—and their testimonies stand at opposite ends of the argument.

The first witness comes from the lab. METR, a nonprofit dedicated to studying AI capabilities and risks, conducted a randomized controlled trial: experienced open-source developers were asked to complete tasks in familiar, mature projects with or without AI. The results were stark: with AI, they were 19% slower. Even more jarring was the second half: these developers predicted AI would make them 24% faster beforehand and still believed they were 20% faster afterward, even though they were slower.
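Because the METR figures mix a forecast, a self-report, and a measurement, it helps to put them on one clock. The sketch below assumes a hypothetical 60-minute task (the baseline is not a figure from the study) and treats each percentage as a change in completion time, which is how the study framed its result.

```python
def time_with_ai(baseline_minutes: float, percent_change: float) -> float:
    """Apply a percentage change to task completion time (+19 means 19% longer, -24 means 24% shorter)."""
    return baseline_minutes * (1 + percent_change / 100)


BASELINE = 60.0  # hypothetical task length without AI, in minutes; not a figure from the study

scenarios = {
    "forecast before": -24,  # developers expected AI to cut completion time by 24%
    "self-estimate after": -20,  # afterward they believed it had cut time by 20%
    "measured": 19,  # in fact, tasks took 19% longer
}
for name, change in scenarios.items():
    print(f"{name}: {time_with_ai(BASELINE, change):.1f} min")

# Difference between what the developers believed and what was measured.
gap = time_with_ai(BASELINE, 19) - time_with_ai(BASELINE, -20)
print(f"unnoticed gap: {gap:.1f} min")
```

Under that assumption, the developers believed an hour-long task took about 48 minutes when it actually took about 71, a gap of roughly 23 minutes that they did not notice.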

Feeling faster but actually slower, with no self-awareness. This is almost the lab version of Hotz's claim that "badness is becoming more hidden."

Later, METR tried to rerun the experiment with newer tools but couldn't—because more and more developers refused to work under "no AI" conditions. Tools may be getting stronger, but organizations are finding it harder to know how much stronger they truly are—because adoption rates themselves are obscuring real output.

The second witness comes from the code itself. GitClear, a code analysis firm, analyzed over 200 million lines of code changes for two consecutive years. The data was blunt: in 2024, copied-and-pasted code blocks surged to several times their previous levels and, for the first time in history, exceeded refactored, reusable code. The proportion of "code movement" (a measure of refactoring) fell from about a quarter in 2021 to less than 10% in 2024. Meanwhile, the proportion of newly written code rewritten within two weeks nearly doubled over several years.

In plain language: people are rapidly dumping new code into their systems but are increasingly unwilling to organize it, reuse it, or keep it coherent. This is not laziness on the part of individual programmers but a default tendency of the tools: AI excels at generation, not cleanup. And duplicated code is a breeding ground for bugs.
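GitClear's copy/paste signal rests on spotting the same block of lines in more than one place. The article does not describe GitClear's actual method, so the following is only a minimal sketch under that assumption: it slides a fixed-size window over whitespace-stripped, non-blank lines and reports every block that appears more than once. The function name, window size, and sample files are hypothetical.

```python
from collections import defaultdict

Location = tuple[str, int]  # (file path, 1-based line number where a block starts)


def find_duplicate_blocks(files: dict[str, str], window: int = 3) -> dict[tuple[str, ...], list[Location]]:
    """Return every run of `window` consecutive non-blank lines that appears at two or more locations."""
    seen: dict[tuple[str, ...], list[Location]] = defaultdict(list)
    for path, source in files.items():
        # Keep original line numbers, strip indentation, and skip blank lines.
        lines = [(num, text.strip()) for num, text in enumerate(source.splitlines(), start=1) if text.strip()]
        for i in range(len(lines) - window + 1):
            block = tuple(text for _, text in lines[i:i + window])
            seen[block].append((path, lines[i][0]))
    # Only blocks found in more than one place count as duplicates.
    return {block: where for block, where in seen.items() if len(where) > 1}


files = {
    "billing.py": "def total(items):\n    s = 0\n    for x in items:\n        s += x.price\n    return s\n",
    "report.py": "def report_total(items):\n    s = 0\n    for x in items:\n        s += x.price\n    return s\n",
}
for block, where in find_duplicate_blocks(files).items():
    print(where, block)
```

Real clone detectors also normalize identifiers and tolerate small edits; exact matching keeps the sketch short but misses copies that were lightly renamed.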

The third witness, logically, should not be on this side at all. Mario Zechner, creator of the well-known game framework libGDX, and Armin Ronacher, creator of Flask (a name anyone who has written a Python web application has run into), both personally built core components of popular AI programming tools. By all rights, they should be cheering this wave. Instead, they issued warnings, calling much of today's AI programming "vibe slop": programmers no longer design and test carefully but let AI slap something together quickly, producing software that cannot stand the test of time.

Zechner warned that infrastructure is crumbling and software is more bug-ridden than ever. "This game can last a few more months or even years, but it will eventually cost us," he said. His words carry weight because he is not an outside critic; he is inside the house.

The fourth witness is money.

If the first three witnesses spoke in technical language, Uber provides the language CFOs and COOs understand. Uber's COO said in an interview that internal AI costs are becoming increasingly difficult to justify as "reasonable investments." Previously, Uber's CTO complained that the company had already exhausted its 2026 Claude Code budget. This sparked serious internal discussions at Uber: what exactly are we getting for all these token expenditures?

After discussing with multiple senior engineering leaders, the COO reached an uncomfortable conclusion—more tokens do not mean the company is delivering proportionally more truly useful features. "That correlation simply doesn't exist yet," he said.

When these four witnesses are placed side by side, a common fact emerges: adoption does not equal value.

A company can have 95% of its engineers using AI, burn through enormous numbers of tokens, and watch usage soar on its dashboards, yet still be unable to say clearly whether consumer-facing features have increased proportionally, whether system quality has improved, or whether incidents have decreased.

In fact, this is the trickiest part of AI programming—the local experience feels extremely strong, but overall accounting is difficult. Programmers writing scripts genuinely feel faster; entrepreneurs building demos genuinely feel amazed. But at the organizational level, tokens cost real money, code reviews consume real labor, future maintenance incurs real costs, and online incidents cause real damage.

So, while the saying "low-level people use the most expensive tokens to produce massive amounts of garbage code" sounds harsh, it hits the nail on the head. It is not about insulting junior programmers but about a distortion in organizational incentives—when companies start rewarding AI adoption rates, more commits, faster delivery, and "tenfold code volume," what gets amplified first may not be the best engineering judgment but the most easily quantifiable surface output.

This is especially fatal in large organizations—where feedback loops are slow, responsibility is diffuse, codebases are massive, and technical debt is deep. An Agent can generate seemingly correct code without knowing why someone deliberately avoided that approach a decade ago. A new hire can use an Agent to submit patches rapidly without understanding which business landmines they might be triggering.

More importantly, when managers see "adoption rates" rise, they assume productivity is rising—but they cannot see the maintenance costs quietly accumulating for the next three years. This is currently happening frequently in China—major AI giants talk about "adoption rates" and "replacement rates" but dare not mention risks or openly discuss costs.

This is where Hotz's true value lies. What he opposes is not "can AI write code"—AI clearly can, and will only get better. He opposes a sleight of hand: confusing "generation speed" with "engineering capability," mistaking "passing tests" for "understanding systems," and equating "AI adoption rates" with "productivity."

In his view, Agents learn the distribution of "programming outputs," not the process by which human engineers form judgments, weigh trade-offs, and bear consequences in real systems.

Code is not like an article, where a piece containing some true facts still has value worth spreading. Code's value does not lie in parts of it possibly being right, but in being correct under critical conditions and not becoming an obstacle as the system evolves.

The difficulty with AI programming is that bad code from the past was obvious, but bad code generated by AI today is highly concealed—uniform in style, with pretty variable names, complete comments, and passing tests. It does not fail clumsily but fails smoothly.

Syntax, style, and tests, the quality signals once used to judge whether code was trustworthy, are now being forged by models cheaply and in bulk.

Karpathy is honest. Though his stance is clear, he does not avoid this point. In a recent podcast, while insisting that using Agents correctly can raise productivity tenfold or more, he admitted: when he actually reads Agent-generated code, he sometimes gets "heart palpitations"—bloated, full of copy-paste, fragile abstractions. It runs, but it's rough. The biggest cheerleader personally confirms GitClear's data.

Of course, the optimists' rebuttal is strong: models will improve, contexts will lengthen, memory will get better. Today's intern-like Agent may not stay an intern tomorrow. The most impartial middle witness is Simon Willison—co-creator of the Django framework and one of the first to warn about "prompt injection"—who has always been extremely alert to AI reliability. But even he admits that AI coding tools have genuinely changed over the past year. He even said vibe coding and agentic engineering are becoming closer than he had hoped. This is subtle: tools are indeed getting stronger, and even the most cautious are using them more. But precisely because of that, boundaries are harder to define.

So the real question is no longer "should we slam the brakes on this train?" It cannot stop, nor should it simply stop. The real question is: does this train have brakes at all?

Placing Hotz's and Karpathy's disagreements into the real industrial environment reveals two very difficult paths.

If Hotz is right, the industry will someday collectively admit that Agent programming is fundamentally unreliable. The consequences would reach far beyond a single blog post: the valuation logic of every AI model company would be severed at once. Programming is currently the largest and most monetizable application of large models. Remove it, and Wall Street will not go along, investment funds will not sign off, and the market values of the giants and startups that tout "how much of our code AI wrote" in financial reports and roadshows would collapse overnight.

Even more critical is the other side: programmers and organizations already dependent on Agents would be asked to switch back to old modes—rewriting line by line, debugging manually. But they cannot go back. The growth paths of junior engineers have shifted; team workflows have been rebuilt.

If Hotz's opponents are right—meaning Agent programming's capabilities will not be fundamentally denied but will keep rising—the scene will only get more chaotic. Code generation becomes easier; AI and tool companies' valuations soar; large enterprises adopting Agents see market caps surge. Myths of "one-person companies" and "solo founders building billion-dollar tools" will be retold endlessly.

But consider this: if Hotz's true fear materializes amid this golden age of hype and frenetic growth, massive amounts of "smoothly failing" code could be continuously merged into payment systems, air traffic control, power grids, and trading engines, enormous systems that no one fully understands or wants to take responsibility for. Collapse will not arrive as "this function was written wrong today" but as a sudden disaster, potentially triggered all at once by some unexpected edge case.

Both paths warrant deep discussion because this is not merely a debate between optimists and pessimists but two simultaneously valid questions—

Will AI Agents change software development? The answer is already yes. Will AI Agents automatically make the software industry better? The answer is far less certain.

Epilogue

They might both be right, or they might both be wrong. Hotz may have underestimated how fast the tools evolve: historically, many skills that seemed impossible to delegate were eventually absorbed, encapsulated, and commoditized by tools. Karpathy may have underestimated how fast organizations degrade: the same tool can be a lever in a master's hands but a debt machine in a low-performing organization. Technological progress never automatically brings progress in discipline; sometimes what it destroys first is precisely the disciplines that seem to have become 'less necessary.'

And the rest of us are living in the days before the check for their bet comes due.

