Artificial Intelligence April 2, 2026

How AlphaEvolve Cracked Open Complexity Theory With New Math

A coding agent built by Google DeepMind has done something that decades of human effort could not: it discovered intricate combinatorial structures that immediately translate into new theorems in complexity theory. The system, called AlphaEvolve, pairs the generative power of Google’s Gemini large language models with evolutionary search and automated verification to explore mathematical landscapes far too vast for manual investigation. The results aren’t speculative – they come with machine-verified proofs of correctness.

AlphaEvolve’s contributions span from breaking a 56-year-old record in matrix multiplication to constructing gadgets and graphs that tighten our understanding of computational hardness. What makes this particularly striking is the system’s generality. It isn’t a narrow tool designed for one problem class. It evolved entire codebases, discovered novel optimization procedures, and generated structures that experts describe as “far more complex” than anything previously found by human or computer-assisted methods. Here’s how it works and what it found.

The Engine: How AlphaEvolve Evolves Code Into Discoveries

AlphaEvolve operates through an iterative evolutionary pipeline. It begins with populations of code snippets, evaluates the mathematical structures those snippets produce, and then uses an LLM to mutate the most successful candidates toward better solutions. This feedback loop – generate, evaluate, select, mutate – mirrors natural selection, but applied to algorithms rather than organisms.
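
The loop can be sketched in a few lines. This is a schematic toy, not AlphaEvolve's implementation: a bit-flipping `mutate` stands in for the LLM proposing code edits, and a bit-counting `evaluate` stands in for the automated verifiers.

```python
import random

def evolve(population, mutate, evaluate, generations=200, keep=4):
    """Generic generate-evaluate-select-mutate loop. Keeping the top
    `keep` candidates as parents is the selection step, so the best
    score never regresses between generations."""
    for _ in range(generations):
        parents = sorted(population, key=evaluate, reverse=True)[:keep]
        children = [mutate(random.choice(parents))
                    for _ in range(len(population) - keep)]
        population = parents + children
    return max(population, key=evaluate)

# Toy usage: evolve a 16-bit string toward all ones.
random.seed(0)

def flip_one_bit(s):
    i = random.randrange(len(s))
    return s[:i] + ('1' if s[i] == '0' else '0') + s[i + 1:]

best = evolve(['0' * 16] * 8, flip_one_bit, lambda s: s.count('1'))
print(best)
```

The key design choice mirrored here is that selection operates only on the evaluator's score, never on the mutation mechanism's opinion of itself – which is what lets an unreliable generator still drive reliable progress.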

The system leverages an ensemble of Gemini models. Gemini Flash, the fastest and most efficient variant, maximizes the breadth of ideas explored, while Gemini Pro provides depth through more sophisticated suggestions. Together, they propose code that implements algorithmic solutions, which automated evaluators then score against objective performance metrics.

What sets AlphaEvolve apart from a standard LLM generating code is the verification layer. Every proposed structure must pass rigorous automated checks, eliminating the hallucination problem that plagues standalone language models. The system represents its search targets in three distinct ways: as direct solutions, as constructor functions that build solutions, or at a meta-level where it designs entire search algorithms – including custom loss functions and gradient-based optimizers. This meta-level capability is what enabled some of its most counterintuitive discoveries, producing approaches that no human researcher would have attempted.

Shattering Strassen’s 56-Year Matrix Record

The headline achievement is AlphaEvolve’s discovery of a faster algorithm for multiplying 4×4 complex-valued matrices. Since Volker Strassen’s landmark 1969 algorithm, the best known method required 49 scalar multiplications – the count obtained by applying Strassen’s seven-multiplication 2×2 scheme recursively to a 4×4 matrix split into 2×2 blocks. AlphaEvolve found a novel tensor decomposition that accomplishes the same task in just 48 scalar multiplications – a roughly 2% reduction that applies to exact multiplication of both complex and real matrices.
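
To make "counting scalar multiplications" concrete, here is Strassen's classical 2×2 scheme, which uses seven multiplications where the naive method uses eight. Applied recursively to 2×2 blocks of a 4×4 matrix, it gives the 7 × 7 = 49 count that AlphaEvolve reduced to 48.

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 scalar multiplications
    (Strassen, 1969) instead of the naive 8."""
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    e, f, g, h = B[0, 0], B[0, 1], B[1, 0], B[1, 1]
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

rng = np.random.default_rng(0)
A, B = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
assert np.allclose(strassen_2x2(A, B), A @ B)
```

Note that additions and subtractions are "free" in this accounting: asymptotic cost is driven entirely by how many multiplications each recursion level spawns.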

A 2% improvement in a fundamental operation might sound modest in isolation. It isn’t.

Matrix multiplication is the computational backbone of modern AI training, scientific simulation, and graphics rendering. At scale, this translates to estimated savings of $150,000 to $500,000 per AI model in training costs, with potential for over $20 million in deferred hardware costs if widely adopted. AlphaEvolve also refined algorithms for 14 other matrix sizes beyond the 4×4 case, and its discoveries are verifiable through accompanying Google Colab notebooks. This represents a significant advance over DeepMind’s earlier AlphaTensor system, which specialized in matrix multiplication but only found improvements for 4×4 matrices in arithmetic modulo 2.

New Theorems in Complexity Theory

AlphaEvolve’s most intellectually significant contributions lie in theoretical computer science, where it generated new theorems by discovering combinatorial structures that had eluded researchers for years. The work, detailed in the paper “Reinforced Generation of Combinatorial Structures: Applications to Complexity Theory,” targeted two distinct areas of computational hardness.

MAX-4-CUT: Pushing the Inapproximability Frontier

The MAX-k-CUT problem asks you to partition a graph’s nodes into k sets to maximize the number of edges crossing between different sets. It’s NP-hard, meaning no efficient exact algorithm is expected to exist. The critical question for theorists is: how well can approximation algorithms do?
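
For small graphs the definition can be made concrete by brute force – which also shows exactly why the exponential blow-up forces theorists toward approximation algorithms. A toy sketch:

```python
from itertools import product

def max_k_cut(edges, n, k=4):
    """Brute-force MAX-k-CUT: try every assignment of n nodes to k
    parts and count edges whose endpoints land in different parts.
    Runs in O(k^n) -- fine for tiny graphs, hopeless at scale."""
    best = 0
    for parts in product(range(k), repeat=n):
        cut = sum(1 for u, v in edges if parts[u] != parts[v])
        best = max(best, cut)
    return best

# K5: every pair among 5 nodes is an edge (10 edges total).
edges = [(u, v) for u in range(5) for v in range(u + 1, 5)]
print(max_k_cut(edges, 5, k=4))
```

By pigeonhole, partitioning the five mutually connected nodes of K5 into four sets leaves at least one pair sharing a part, so the brute-force answer is 9 of its 10 edges.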

For MAX-4-CUT, the previous best result proved it is NP-hard to approximate within a factor of 0.9883. AlphaEvolve discovered a new gadget reduction – an intricate structure involving 19 variables (nodes) with a complex weighting scheme where some connections carry up to 1,429 times the weight of others. This gadget established a new inapproximability bound of 0.987 – a smaller factor here means a stronger hardness result – meaningfully tightening what we know about the limits of efficient computation.

In the mature field of hardness of approximation, improvements like this typically require significant new techniques or combinatorial insights. The gadget AlphaEvolve found is substantially more complex than prior constructions, which typically involved around 10 nodes with far simpler weight distributions.

Ramanujan Graphs and Average-Case Hardness

The second complexity theory breakthrough concerns the hardness of certifying properties of random graphs – specifically, bounding the maximum cut (MAX-2-CUT) value and the maximum independent set size of sparse random graphs. Recent theoretical work connected this problem to the existence of specific Ramanujan graphs: deterministic graphs that mimic the statistical properties of random graphs.

Prior computer-assisted methods had found such graphs on up to 10 nodes. AlphaEvolve navigated a vastly larger search space to discover 4-regular Ramanujan graphs with unusually large 2-cuts on as many as 163 nodes. These discoveries significantly improved the lower bounds for average-case hardness. Combined with new non-AI algorithmic progress, the results nearly settled the computational hardness of these questions, matching upper and lower bounds to within the third decimal place.
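
The Ramanujan property is spectral, so any candidate graph the system emits can be checked mechanically: a d-regular graph is Ramanujan when every non-trivial adjacency eigenvalue has absolute value at most 2√(d−1). A minimal check, illustrated on the complete graph K5 (which is 4-regular, with eigenvalues {4, −1, −1, −1, −1}):

```python
import numpy as np

def is_ramanujan(adj, d):
    """d-regular graph test: all adjacency eigenvalues other than the
    trivial one (which equals d) must satisfy |lambda| <= 2*sqrt(d-1)."""
    eig = np.sort(np.linalg.eigvalsh(np.asarray(adj, float)))
    nontrivial = max(abs(eig[0]), abs(eig[-2]))  # eig[-1] is the trivial d
    return bool(nontrivial <= 2 * np.sqrt(d - 1) + 1e-9)

K5 = np.ones((5, 5)) - np.eye(5)
print(is_ramanujan(K5, 4))  # 1 <= 2*sqrt(3) ~ 3.46, so True
```

For K5 the bound holds comfortably; a disconnected 4-regular graph fails it, since the eigenvalue 4 then appears twice.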

The Lifting Technique: From Finite Structures to Universal Theorems

A natural question arises: how does finding a specific finite structure – a gadget with 19 nodes, a graph with 163 nodes – prove anything universal? The answer lies in a technique called “lifting.” In complexity theory, established proof frameworks rely on the existence of specific, highly optimized finite structures. When a better structure is found, the entire framework lifts this local improvement into a stronger universal result.

Think of a proof as a long chain. AlphaEvolve replaces one link in that chain with a stronger one, while keeping the connections to the rest of the proof intact. To certify overall correctness, you only need to verify the finite structure that was evolved – something a computer program can do automatically, without any human intervention. This is what makes AlphaEvolve’s contributions rigorous rather than speculative. Every structure it discovers comes with a machine-checkable certificate of correctness.

Beyond Complexity Theory: A Broader Mathematical Toolkit

AlphaEvolve’s reach extends well beyond complexity theory. When applied to over 50 open problems across mathematical analysis, geometry, combinatorics, and number theory, the system rediscovered state-of-the-art solutions in roughly 75% of cases. In 20% of cases, it improved upon the best known solutions.

| Problem Domain | Previous Best | AlphaEvolve Result | Improvement |
| --- | --- | --- | --- |
| 4×4 Complex Matrix Multiplication | 49 scalar multiplications (Strassen, 1969) | 48 scalar multiplications | ~2% reduction, breaking 56-year record |
| MAX-4-CUT Inapproximability | NP-hard within factor 0.9883 | New bound of 0.987 (19-node gadget) | Tighter hardness threshold |
| Ramanujan Graphs | Up to 10 nodes | Up to 163 nodes with large 2-cuts | Bounds matched to third decimal place |
| Hexagon Packing (11 hexagons) | Side length 3.943 units (Morandi, 2015) | Side length 3.931 units | 0.3% edge reduction, 0.6% area savings |
| Kissing Number (11 dimensions) | Previous lower bound | 593 non-overlapping spheres | New lower bound established |

The hexagon packing result is a good example of AlphaEvolve’s unconventional approach. For packing 11 unit regular hexagons into a bounding hexagon, the previous best solution from 2015 used uniformly aligned hexagons with a bounding side length of approximately 3.943 units. AlphaEvolve reduced this to 3.931 units by tilting each inner hexagon at varying angles – a strategy that’s simple in retrospect but wasn’t explored by human geometers. This kind of optimization has practical applications in semiconductor wafer layouts and battery anode design.
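
The two percentages quoted above are consistent with each other, since hexagon area scales with the square of the side length – a quick check:

```python
# Side lengths from the source: previous best vs. AlphaEvolve.
old, new = 3.943, 3.931
edge_reduction = (old - new) / old       # linear shrink of the bounding side
area_savings = 1 - (new / old) ** 2      # area scales with side length squared
print(f"edge: {edge_reduction:.2%}, area: {area_savings:.2%}")
```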

The kissing number result is equally notable. This geometric challenge, which has fascinated mathematicians for over 300 years, concerns the maximum number of non-overlapping spheres that can simultaneously touch a central unit sphere. AlphaEvolve discovered a configuration of 593 outer spheres in 11 dimensions, establishing a new lower bound.
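
Verifying a claimed kissing configuration is purely mechanical, which is what makes such discoveries machine-checkable: each outer unit sphere's center must lie at distance 2 from the origin (touching the central unit sphere), and centers must be pairwise at least 2 apart (no overlap). A sketch, illustrated with the classical 24-sphere configuration from the D4 lattice in four dimensions – AlphaEvolve's 593-point configuration in 11 dimensions would be checked the same way:

```python
import numpy as np
from itertools import combinations, product

def is_valid_kissing(centers, tol=1e-9):
    """Validate a kissing configuration of unit spheres around a
    central unit sphere: all centers at distance 2 from the origin,
    all pairwise distances at least 2."""
    centers = np.asarray(centers, float)
    touch_ok = np.allclose(np.linalg.norm(centers, axis=1), 2.0, atol=tol)
    apart_ok = all(np.linalg.norm(p - q) >= 2.0 - tol
                   for p, q in combinations(centers, 2))
    return touch_ok and apart_ok

# The 24 minimal vectors of D4 -- all (+-1, +-1, 0, 0) patterns --
# scaled to length 2, realize the kissing number 24 in dimension 4.
vecs = []
for i, j in combinations(range(4), 2):          # the two nonzero axes
    for si, sj in product([1, -1], repeat=2):   # their signs
        v = np.zeros(4)
        v[i], v[j] = si, sj
        vecs.append(np.sqrt(2) * v)
print(len(vecs), is_valid_kissing(vecs))
```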

Real-World Impact: From Theory to Production

AlphaEvolve isn’t confined to abstract mathematics. Its algorithms have been deployed across Google’s computing infrastructure with measurable results – most notably a scheduling heuristic for Borg, Google’s cluster management system, that recovers an average of 0.7% of Google’s worldwide compute resources.

These gains compound. In already-heavily-optimized environments, even fractional improvements cascade across fleet-wide infrastructure. The 0.7% compute recovery in Borg, for instance, means that at any given moment, more tasks can be completed on the same hardware footprint – a sustained efficiency gain with significant energy and cost implications.

What Makes This Different From Previous AI Systems

AlphaEvolve represents a qualitative shift from earlier systems like AlphaTensor, which was purpose-built for matrix multiplication and operated within a narrower search framework. Three capabilities distinguish it.

First, generality. AlphaEvolve tackles problems across mathematics, hardware design, compiler optimization, and scheduling using the same evolutionary pipeline. It isn’t retrained or restructured for each domain.

Second, meta-level search. Rather than just proposing solutions, AlphaEvolve can design the search algorithms themselves. For matrix multiplication, it created a gradient-based optimization procedure complete with custom loss functions, weight initialization strategies, and hyperparameter sweeps – requiring 15 non-trivial mutations during the evolutionary process. This meta-capability produced approaches described as things “no human would have tried,” including counterintuitive complex-to-real matrix conversions.
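
The underlying framing – a matrix multiplication algorithm as a low-rank decomposition of a fixed tensor, searched with a differentiable loss – can be sketched for the 2×2 case. This is an illustrative toy under that framing, not AlphaEvolve's actual procedure: plain gradient descent only shrinks the reconstruction error, whereas a real search needs the custom losses, initialization strategies, and exact rounding described above.

```python
import numpy as np

n, rank = 2, 7   # 2x2 matmul; Strassen shows rank 7 suffices
# T[a, b, c] = 1 exactly when scalar entry a of A times entry b of B
# contributes to entry c of C = A @ B. A rank-r decomposition of T is
# a multiplication algorithm using r scalar multiplications.
T = np.zeros((n * n, n * n, n * n))
for i in range(n):
    for j in range(n):
        for k in range(n):
            T[i * n + k, k * n + j, i * n + j] = 1.0

rng = np.random.default_rng(0)
U, V, W = (0.3 * rng.standard_normal((n * n, rank)) for _ in range(3))

def residual(U, V, W):
    """Difference between T and the rank-`rank` reconstruction."""
    return T - np.einsum('ir,jr,kr->ijk', U, V, W)

loss0 = float(np.sum(residual(U, V, W) ** 2))
lr = 0.005
for _ in range(300):
    R = residual(U, V, W)
    # Analytic gradients of the squared-error loss w.r.t. each factor.
    U += 2 * lr * np.einsum('ijk,jr,kr->ir', R, V, W)
    V += 2 * lr * np.einsum('ijk,ir,kr->jr', R, U, W)
    W += 2 * lr * np.einsum('ijk,ir,jr->kr', R, U, V)
loss = float(np.sum(residual(U, V, W) ** 2))
print(f"loss: {loss0:.3f} -> {loss:.3f}")
```

Driving this loss exactly to zero at rank 48 for the 4×4 tensor – and doing so with coefficients that round to exact values – is the hard part that the evolved optimization procedures were built to solve.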

Third, verified correctness. Every discovery comes with automated proof verification. When the system finds a new gadget for MAX-4-CUT or a Ramanujan graph with specific properties, the correctness of that structure is confirmed computationally without human involvement. This addresses the fundamental challenge of using AI in mathematics, where absolute correctness is non-negotiable.

The Road Ahead for AI-Driven Mathematical Discovery

AlphaEvolve’s results suggest a future where AI systems serve as genuine research partners in theoretical mathematics and computer science – not replacing human mathematicians, but navigating search spaces too vast for manual exploration. The system’s ability to find gadgets with 19 variables and weight ratios of 1,429:1, or Ramanujan graphs on 163 nodes, demonstrates that LLM-driven evolutionary search can reach regions of combinatorial space that were previously inaccessible.

The practical implications extend beyond Google’s infrastructure. The matrix multiplication improvements alone could reshape the economics of large-scale AI training. The complexity theory results push forward our fundamental understanding of what computers can and cannot efficiently solve. And the system’s success across 50+ open mathematical problems – improving 20% of them – hints at a much broader role for AI in advancing human knowledge.

Perhaps most telling is a recursive detail: AlphaEvolve enhanced the efficiency of training the very large language models that power AlphaEvolve itself. That kind of self-improving loop, grounded in verified mathematical correctness rather than unchecked generation, may be the most significant signal of all about where algorithmic discovery is heading.
