ai-models · 3 min read

ChatGPT 5.5 Pro Solved an Open Math Problem in Under an Hour. But Don't Call It Cheaper.

ChatGPT 5.5 Pro solved an open combinatorics problem in under an hour. That's the story. Token efficiency isn't.

Tim Gowers posted something remarkable this week: he fed ChatGPT 5.5 Pro an open research question from a combinatorics paper by Mel Nathanson, and the model solved it. Not approximately solved it. Produced a novel construction, improved the bound three separate times, and generated preprint-ready LaTeX. Meanwhile, the conversation around GPT-5.5's "improved token efficiency" is burying the actual story: burning fewer tokens doesn't mean anything if the price per token went up to match.

What ChatGPT Actually Did With the Math

The problem was a diameter question from additive number theory: constructing sets with small sumset diameter using efficient building blocks. Nathanson's paper left it as an open question. ChatGPT 5.5 Pro worked through it in stages, and Gowers logged the times.

The model's core contribution was inventing what Gowers describes as "B-sets" and "C-sets": constructions that behave like half a geometric series squeezed into a polynomial interval. This is not retrieval. It's not pattern-matching against known proofs. The model used techniques grounded in finite field theory (Singer 1938, Bose-Chowla 1963) and Sidon sets to build something genuinely new.

Isaac Rajagopal, an MIT researcher, reviewed the improved bound and confirmed it was correct at both the line-by-line and conceptual levels. Gowers put it plainly:

"It is the sort of idea I would be very proud to come up with after a week or two of pondering, and it took ChatGPT less than an hour to find and prove."

That's the real headline. Not "AI does math" in the vague hype-adjacent sense. A specific, verifiable, research-level result on an open problem, produced in under an hour.

The Token Efficiency Story Is Mostly a Red Herring

GPT-5.5 Pro does use tokens more efficiently than earlier versions, generating more meaningful output per token consumed. OpenAI has positioned this as a win for users. The problem is what people who are actually running cost comparisons report: the per-token price appears to have risen enough to offset the efficiency gain. Fewer tokens burned, a higher price per token, roughly the same bill.

"the token efficiency improvement means nothing if the pricing per token went up enough to cancel it out which seems to be what actually happened"

To be clear: exact per-token pricing comparisons between 5.5 Pro and previous versions haven't been independently verified in the research available here, so treat that claim as directionally accurate but not confirmed. What is confirmed is that "token efficiency" is not a synonym for "cheaper," and anyone evaluating GPT-5.5 purely on that metric is asking the wrong question.
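The cancellation argument is just arithmetic, and it can be sketched with hypothetical numbers. The token counts and prices below are illustrative assumptions, not OpenAI's published rates; the point is only that a 25% efficiency gain disappears under a roughly 33% price increase.

```python
# Illustrative arithmetic only: these token counts and per-token prices
# are hypothetical, not actual published rates for any model.
def session_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of a session: tokens used times the $/1M-token rate."""
    return tokens / 1_000_000 * price_per_million

# Older model: more tokens at a lower per-token price.
old_cost = session_cost(tokens=800_000, price_per_million=10.0)
# Newer model: 25% fewer tokens, but a ~33% higher per-token price.
new_cost = session_cost(tokens=600_000, price_per_million=13.33)

print(f"old: ${old_cost:.2f}, new: ${new_cost:.2f}")
# Both sessions land at roughly the same bill.
```

Plug in whatever real rates you are quoted: if the ratio of the price increase matches the ratio of the token savings, "efficiency" buys you nothing on the invoice.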

The Right Question Is Output Quality Per Dollar

The math experiment makes the real calculus obvious. If you're using a frontier model for work that actually requires reasoning, the metric that matters is whether the output quality justifies the cost. Not token count. Not efficiency ratios.

For the Gowers use case, there's no question. A research mathematician got multiple novel proofs with verified correctness in a single session. Whatever that cost, it's competitive with weeks of human research time. For simpler tasks like drafting emails or writing boilerplate code, the cost-quality tradeoff looks completely different, and the efficiency framing matters even less.

The extended thinking times are also worth flagging. Sessions where the model reasoned for 47 or more minutes are not free, and they require context windows and infrastructure that most lightweight workflows don't need. GPT-5.5 Pro's ceiling is clearly high. Whether that ceiling is relevant to your workload is a separate question.

Bottom Line

The token efficiency narrative around ChatGPT 5.5 Pro is a distraction. The model solving an open combinatorics problem in under an hour is not. If your work involves genuine reasoning tasks, complex problem-solving, or research-level thinking, GPT-5.5 Pro is worth serious evaluation. If you're trying to cut API costs on simple workflows, the efficiency improvements won't save you what the pricing changes take back.
