To Err is AI

Chess and coding

I was thinking about the progress of AI and the progress of chess. There are a lot of parallels here.

One less obvious analogy is how chess algorithms work. They analyze the game tree and quickly prune obviously wrong parts of it. People often say things like “I asked AI to do one thing and it did it wrong.” That’s not a useful framing. If you compare solving a problem to exploring a game tree, it’s actually quite similar. AI, like humans, can invest in the wrong solution path.
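To make the analogy concrete, here is a minimal sketch of game-tree search with alpha-beta pruning, the classic way chess engines discard "obviously wrong" branches. The nested-list tree below is a toy example of mine, not a real chess position.

```python
# Minimal minimax with alpha-beta pruning over a toy game tree.
# Leaves are plain numbers (position evaluations); inner nodes are lists.

def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, (int, float)):
        return node  # leaf: just return its evaluation
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:  # remaining siblings cannot change the result
                break          # -> prune them without ever looking
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

tree = [[3, 5], [6, [9, 2]], [1, 2]]
print(alphabeta(tree, float("-inf"), float("inf"), True))  # → 6
```

The pruning step is the interesting part: once a branch is provably no better than something already found, the engine stops exploring it entirely rather than finishing the analysis.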

When solving a problem starts getting harder and harder, that’s often a sign that the approach is wrong. Right now, AI gets stuck exactly like that — it follows one solution and commits to it. But that’s not how we solve problems. We make mistakes, rewrite, sometimes throw out most of the code.

AI doesn’t have to be perfect

I often see people benchmarking AI with numbers like “80% accuracy” on some coding tasks. Two things to keep in mind:

  1. These benchmarks are biased and likely overfit
  2. You don’t need the first output to be perfect

What matters is that the output is above a certain threshold and that the reasoning can self-correct.

Let’s say each attempt at a problem succeeds with some fixed probability, and assume a retry after a failed attempt is an independent attempt with the same probability of success as the first one.

If the AI is 80% accurate, the chance of success looks like this:

  • n = 1 → 80%
  • n = 2 → 96%
  • n = 3 → 99.2%

So with three tries, you’re already above 99%. Obviously this is simplified, but it shows you don’t need perfection. You need retries and some form of feedback.
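The numbers above follow from the independence assumption: the chance that at least one of n attempts succeeds is 1 − (1 − p)ⁿ. A quick sketch:

```python
# Cumulative probability that at least one of n independent attempts
# succeeds, where each attempt succeeds with probability p.
# p = 0.8 matches the "80% accurate" example above.

def success_after(p, n):
    """Probability that at least one of n independent attempts succeeds."""
    return 1 - (1 - p) ** n

for n in (1, 2, 3):
    print(n, round(success_after(0.8, n), 4))  # ≈ 0.8, 0.96, 0.992
```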


We're stuck expecting perfect output

Right now we expect the AI to write perfect code in one shot. That's not realistic. Without proper backtracking, you’re basically rolling the dice once and hoping it lands on a full solution. That’s not how real coding works.

The real work starts after the first version. Refactoring, debugging, shifting abstractions, sometimes going back to rethink what the system even is. AI doesn’t do that yet in a meaningful way - or it would be extremely expensive and time-consuming to apply it.

You can’t solve non-trivial problems without exploring dead ends and doubling back. And real backtracking is expensive. It requires keeping internal state, understanding where things broke, and trying again in a smarter way. It’s not just running the same prompt with different wording.

We’re not there yet.

Backtracking should be built in

Being wrong and backtracking are fundamental to problem-solving. Future AI coding systems will have to do this better than they do now.

Don’t think of problem-solving as a straight path. It’s more like a labyrinth with dead ends. Right now, AI gets stuck by just following one line. But real problem-solving involves jumping out of a path and trying something else entirely.

It is often overlooked that in complex systems the initial solution is rarely the best one. Sometimes it takes multiple experiments and a trip back to the "whiteboard" to plan another approach. We should treat AI-written code the same way - don't assume the first solution is the best one.

The horizon effect, and why AI can't escape it

Again the chess analogy seems appropriate. Chess engines suffer from something called the horizon effect: even the best and fastest algorithms can only search so deep, so consequences that lie beyond the search horizon are invisible to them - an engine may even delay an unavoidable loss just to push it out of sight.

Any complex solution probably runs into a similar effect - you cannot write the whole system in one go, because some consequences of your design choices only become visible later, and you simply cannot predict them all up front.

Backtracking sudoku example

Below you can see how backtracking works when solving a sudoku puzzle with a single valid solution. This naive implementation fills in the missing numbers by trying out different possibilities. With every new entry in the grid it checks whether the constraints still hold - every row, column, and 3×3 box must contain unique numbers. If a guess turns out to be wrong, it tries the next number; when no number fits at all, that means an earlier guess was wrong, and the algorithm backtracks to revise it.

In the worst case the algorithm has to unwind all the way back to its very first guess. There are many heuristics that could speed this up - but this implementation is naive on purpose.
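A minimal version of that naive solver, in Python. The function names (`valid`, `solve`) are my own, and the sample puzzle is a common textbook example with a unique solution; zeros mark empty cells.

```python
# Naive backtracking sudoku solver: fill empty cells one by one,
# undo a guess as soon as no digit fits further down the grid.

def valid(grid, r, c, d):
    """Can digit d go at (r, c) without repeating in its row, column, or 3x3 box?"""
    if d in grid[r]:
        return False
    if any(grid[i][c] == d for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[br + i][bc + j] != d for i in range(3) for j in range(3))

def solve(grid):
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:                # 0 marks an empty cell
                for d in range(1, 10):
                    if valid(grid, r, c, d):
                        grid[r][c] = d
                        if solve(grid):
                            return True
                        grid[r][c] = 0         # backtrack: this guess led to a dead end
                return False                   # no digit fits -> an earlier guess was wrong
    return True                                # no empty cells left: solved

puzzle = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]

if solve(puzzle):
    for row in puzzle:
        print(row)
```

The `return False` line is the backtracking step in miniature: it doesn't just fail, it tells the caller that its own earlier guess must be revised.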

Compute won’t just make responses smarter

GPUs aren’t just about model size. They’ll probably be used to explore multiple solution paths in parallel. Not just deeper thinking, but more branches. More pruning. More restarts.

Just like chess engines didn’t just become “smarter,” they became better at searching.

The recent success of AlphaEvolve seems to confirm that. AI is very good at guessing, and it still takes thousands or millions of very good guesses to improve on some hard problems.

AI code might become hard to follow

If we keep pushing in the direction of chess, AI will eventually write solutions that are ahead of what people write. Some of those solutions will be difficult to understand. And they’ll be studied just to figure out how they work.

But as with chess, people still play it and enjoy it. Computers didn’t kill the game - they deepened it. They showed us patterns we missed, and the game is more popular than ever. Coding might follow the same path.

AI might write code that solves complex problems better. But writing code, solving problems, building things - it should still be fun. That’s the part we shouldn’t lose.