Latest Forecast: Artificial Intelligence Set to Make Formal Verification Mainstream

12/17/2025

There has been extensive discussion on the various ways artificial intelligence (AI) will revolutionize software development, including enhancing efficiency, generating code, automating testing, facilitating low-code development, and even giving rise to the idea of 'everyone being a programmer.'

However, one pivotal transformation has often gone unnoticed—AI is propelling formal verification, a technology once confined to the highly academic and specialized realm, into the mainstream of practical software engineering.

Recently, Martin Kleppmann, a researcher in distributed systems at the University of Cambridge, wrote a blog post delving into AI's impact on formal verification.

In his blog, he pointed out that three trends are unfolding simultaneously:

The cost of formal verification is falling rapidly.

AI-generated code urgently needs a reliable alternative to manual review.

The certainty that formal verification provides is a precise counterweight to the probabilistic, imprecise nature of large models themselves.

The convergence of these three factors is poised to catapult formal methods from a niche research area into mainstream software engineering practices.

The real hurdle may no longer be technological limitations but whether the industry is ready to update its assumptions and embrace a completely new software development paradigm.

Formal verification is not a novel concept. Over the decades, researchers have built a series of proof assistants and proof-oriented programming languages, such as Rocq (formerly Coq), Isabelle, Lean, F*, Agda, and others.

These tools empower developers to write precise formal specifications for their code and, through rigorous mathematical reasoning, demonstrate that the program adheres to these specifications under all conceivable circumstances—including extreme boundary conditions that are nearly impossible to cover through conventional testing.
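To make this concrete, here is a minimal sketch in Lean 4, one of the languages named above. The function and theorem names are invented for illustration and are not taken from the source post; the theorem plays the role of the formal specification, and the proof script convinces the checker that the property holds for every possible pair of inputs, not just the cases a test suite happens to exercise.

```lean
-- Implementation: a small function we want to verify.
def myMax (a b : Nat) : Nat :=
  if a ≤ b then b else a

-- Specification: the result is never smaller than the first argument.
-- The proof covers all natural numbers a and b, including boundary cases.
theorem le_myMax_left (a b : Nat) : a ≤ myMax a b := by
  unfold myMax
  split
  · assumption           -- branch where a ≤ b holds: the result is b
  · exact Nat.le_refl a  -- branch where a ≤ b fails: the result is a itself
```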

Leveraging these tools, humans have achieved some remarkable engineering milestones, such as formally verified operating system microkernels, C compilers, and cryptographic protocol stacks. These accomplishments demonstrate that formal verification is not only technically feasible but can deliver a level of assurance close to the theoretical ideal.

However, the crux of the matter has never been about feasibility but rather about its value proposition.

For an extended period, formal verification has been predominantly confined to research institutions and a handful of high-security projects. Even in traditionally high-reliability software domains like medical devices and aerospace, the industry's adoption of formal methods has been exceedingly limited. The fundamental reason is that the process of proving is overly difficult, time-consuming, and costly.

A classic example is the seL4 microkernel. As of 2009, the kernel's implementation code comprised merely about 8,700 lines of C code. Yet, to achieve formal verification, approximately 20 person-years were invested, resulting in over 200,000 lines of Isabelle proof scripts.

On average, each line of implementation code required over 20 lines of proof (roughly 200,000 ÷ 8,700 ≈ 23) and close to half a workday of manual effort (20 person-years amounts to on the order of 4,400 working days spread over about 8,700 lines). More significantly, there may be only a few hundred people worldwide who genuinely possess the skills to write such proofs. That expertise depends heavily on experience and specialized 'jargon,' making it nearly impossible to scale.

From an economic standpoint, this is not a cost-effective proposition. For the vast majority of software systems, the anticipated losses stemming from software defects are often lower than the costs required for formal verification. More realistically, the costs associated with software defects are frequently borne by users rather than developers themselves. This has left formal verification in an awkward position of being 'theoretically correct but practically not cost-effective.'

This landscape is now being disrupted by artificial intelligence.

In recent years, programming assistants based on large models have not only made rapid strides in generating implementation code but have also begun to exhibit remarkable potential in writing proof scripts. Currently, this process still necessitates guidance from human engineers with professional backgrounds. However, the trend is unmistakable: as model capabilities improve, proof generation itself is expected to become highly automated in the coming years.

Once the primary cost of proof shifts from 'human experts' to 'machine computation,' the economic foundation of formal verification will be completely rewritten. Verification will no longer be a luxury but may become a routine step in the software development process.

Crucially, AI not only reduces the cost of formal verification but also creates a compelling demand for it. As an increasing amount of code is generated by artificial intelligence, traditional manual code review will rapidly become a bottleneck. Instead of having humans inspect AI-written programs line by line, a more reasonable approach is to have the AI provide verifiable correctness proofs for the code it generates.

If a piece of code can be formally proven to meet its specification, then whether a human wrote it by hand becomes irrelevant. Compared with seemingly polished yet flaw-ridden 'handcrafted code,' AI-generated code accompanied by a rigorous proof is actually more trustworthy.

From this perspective, proof scripts themselves may be one of the most ideal application scenarios for large models. Even if the model produces 'hallucinations' during reasoning, this poses no substantial risk: any invalid proof will be rejected by the proof checker, compelling the model to try again. The proof checker itself is a small piece of rigorously verified core code that is nearly impossible to bypass.
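As an illustration, here is a deliberately failing Lean 4 sketch (the theorem name and the bogus proof step are invented for this example). The statement is true, but the attempted 'proof' does not establish it, so the checker refuses to accept it, which is exactly the behavior that makes hallucinated proof scripts harmless.

```lean
-- A true statement about natural numbers.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  -- A "hallucinated" proof step: rfl only succeeds when both sides are
  -- definitionally equal, which is not the case for arbitrary a and b,
  -- so Lean reports an error and the theorem is not accepted.
  rfl
-- Only a genuine proof, e.g. `exact Nat.add_comm a b`, gets the
-- statement past the checker.
```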

Of course, this does not imply that the software world will become 'defect-free' overnight. As the proof process becomes automated, the real challenge will shift to the definition of specifications themselves: how can we ensure that what we prove is precisely the property we truly care about? Writing formal specifications still demands experience, judgment, and rigorous thinking. Nevertheless, the cost of defining specifications remains significantly lower than manually completing a full proof.
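A classic illustration of this risk, sketched here in Lean 4 with invented names rather than material from the source post: if the specification of a sorting routine only demands that the output be ordered, and forgets to require that the output contain the same elements as the input, then a function that simply discards its input satisfies the specification perfectly.

```lean
-- A deliberately broken "sort" that throws its input away.
def badSort (xs : List Nat) : List Nat := []

-- An under-specified property: each element is ≤ the next one.
-- Stated as a small recursive predicate to keep the sketch self-contained.
def ordered : List Nat → Prop
  | [] => True
  | [_] => True
  | a :: b :: rest => a ≤ b ∧ ordered (b :: rest)

-- The spec says nothing about the output containing the input's elements,
-- so the empty list satisfies it trivially and the proof goes through.
theorem badSort_ordered (xs : List Nat) : ordered (badSort xs) := by
  simp [badSort, ordered]
```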

In the future, artificial intelligence may even participate in writing specifications themselves, translating between natural language requirements and formal languages. Although this process carries the risk of losing semantic details, compared to the uncertainties prevalent in current software engineering, it is a manageable and verifiable risk.

If all this materializes, the paradigm of software development will undergo a fundamental transformation. Developers will only need to describe 'what properties the code should possess' in a high-level, declarative manner, leaving the implementation and proof processes to AI. Just as programmers today no longer concern themselves with the machine code generated by compilers, in the future, we may no longer need to read specific implementation code.
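A hypothetical sketch in Lean 4 of what that workflow could look like (the names and the property are invented for illustration): the developer states only the desired property, the `sorry` placeholders mark the implementation and proof that an AI assistant would be asked to produce, and Lean refuses to count anything that still contains `sorry` as verified, so nothing unproven can slip through.

```lean
-- Declarative intent only: we want a function on natural numbers whose
-- result does not depend on the order of its arguments.
def combine (a b : Nat) : Nat := sorry          -- implementation left to the AI

theorem combine_comm (a b : Nat) :
    combine a b = combine b a := sorry          -- proof left to the AI
```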

References:

https://martin.kleppmann.com/2025/12/08/ai-formal-verification.html
