The Schema Staircase: Why AI Mathematical Reasoning Hits a Wall

Every math student knows the feeling. You open the exam, scan the first problem, and think: "I've seen this before, just with different numbers." You breathe out. You know what to do.

But what exactly changed between the moment you didn't know what to do and the moment you did? And why does that feeling vanish so completely when you sit down at a Math Olympiad?

Mathematical difficulty

We were trying to understand why large language models can now win gold at the International Math Olympiad but still can't do research mathematics with the same success rate. We kept circling back to the same intuition: the difficulty of mathematical problem-solving isn't a continuous spectrum. It's a staircase, with discrete levels where each step demands a qualitatively different kind of thinking.

At the bottom, there's the exam problem that's literally the same as one you've practiced, just with the numbers swapped out. You recognize the schema, you apply the procedure, done. One step up, and the problem doesn't look familiar anymore — the challenge is figuring out which technique applies. Another step, and you need multiple techniques woven together, where it's not clear which combination works or whether you're heading in the right direction at all. Somewhere further up, you leave the territory of known techniques entirely and need a genuinely original idea — the kind László Lovász reportedly says he's happy to have once a year.

We don't know exactly how many steps there are, or whether the staircase is the same in every branch of mathematics. But the key claim is this: the transitions between levels aren't gradual. Each step involves a different type of cognitive work, not just more of the same. Calling it all "pattern matching" is like saying both a calculator and a poet are "processing symbols" — technically true, but it collapses something essential.

Polányi's hierarchical levels

Michael Polányi (Polányi Mihály, 1891–1976), a Hungarian-born polymath, spent decades arguing that reality is organized into emergent hierarchical levels.

His idea works like this: each layer has its own rules that you cannot reduce to the layer below. Chemistry depends on physics, but physics alone doesn't explain chemistry. Biology depends on chemistry, but biology has its own principles too. The higher layer doesn't break the rules of the lower one — it uses the freedom that the lower layer leaves open.

We think mathematical difficulty is a staircase, not a slope, for the same reason. Each step is not just "the previous step, but harder." It is a new type of difficulty. And the relationship between neighboring steps looks a lot like Polányi's layers: the lower step is necessary, but it doesn't determine the higher one.

Why this matters now

A recent AI benchmark called MATH-Perturb (Huang et al., 2025) tested exactly the bottom of this staircase. Researchers gave language models "the same problem with different numbers" — small changes — and the models handled them fine. But when the researchers changed the problems so that the original solution method no longer worked — bigger, structural changes — performance dropped 10–25% across every model they tested. The models kept applying memorized methods without noticing those methods no longer fit.

That is a system that can handle one step but fails at the transition to the next. And that transition — the ability to notice when your current approach has stopped working — might be where the truly interesting questions about mathematical reasoning begin.
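To make the two kinds of perturbation concrete, here is a toy sketch in Python. Everything in it is our own illustration, not taken from MATH-Perturb: the "solver" is a hypothetical stand-in that has memorized exactly one schema (the sum 1 + 2 + ... + n equals n(n+1)/2), and the two problem families are ones we made up. Changing n is the "different numbers" case. Asking for the sum of the first n odd numbers looks similar on the surface, but the memorized formula no longer fits.

```python
# Toy illustration only. The solver and both problem families are
# hypothetical and are NOT part of the MATH-Perturb benchmark.

def memorized_solver(n: int) -> int:
    """Blindly applies one memorized schema: 1 + 2 + ... + n = n(n+1)/2."""
    return n * (n + 1) // 2


def sum_first_integers(n: int) -> int:
    """Ground truth for the original problem family."""
    return sum(range(1, n + 1))


def sum_first_odds(n: int) -> int:
    """Ground truth for the structurally changed family: 1 + 3 + ... + (2n - 1)."""
    return sum(2 * k - 1 for k in range(1, n + 1))


def accuracy(ground_truth, inputs) -> float:
    """Fraction of inputs on which the memorized schema gives the right answer."""
    correct = sum(memorized_solver(n) == ground_truth(n) for n in inputs)
    return correct / len(inputs)


inputs = [10, 37, 250]

# Simple perturbation: same schema, different numbers.
simple_acc = accuracy(sum_first_integers, inputs)

# Hard perturbation: similar-looking problem, but the schema no longer applies.
hard_acc = accuracy(sum_first_odds, inputs)

print(f"simple perturbation accuracy: {simple_acc:.2f}")
print(f"hard perturbation accuracy:   {hard_acc:.2f}")
```

The toy solver never checks whether its schema still applies, so it gets every number-swapped problem right and every structurally changed one wrong. Real models are nowhere near this brittle, but the failure has the same shape: the method is applied without noticing that it has stopped fitting.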

We don't have answers yet. We're not even sure we have the right questions. But we believe the schema staircase is real, and we plan to investigate it further — with concrete examples, with AI interpretability tools, and with Polányi's framework as a guide.

Written by:

Fuszti
