Limitless: An AI Podcast - OpenAI and Google Just Beat the World's Smartest Mathematicians
Episode Date: July 24, 2025

In this episode, we celebrate OpenAI and Google's historic gold medal wins at the International Math Olympiad, showcasing significant advancements in problem-solving abilities. We discuss the technological breakthroughs enabling these achievements and the implications for education as AI challenges traditional notions of intelligence. However, the competition was not without its share of AI drama, as the giants continue to compete at all costs in the AI game of thrones.

------
💫 LIMITLESS | SUBSCRIBE & FOLLOW
https://limitless.bankless.com/
https://x.com/LimitlessFT
------
TIMESTAMPS
0:00 Intro
1:35 AI vs. Math Olympiad
4:11 OpenAI's Breakthrough
6:54 The Gold Medal Debate
8:39 The Controversy Unfolds
12:51 The Google OpenAI Drama
13:42 OpenAI's Desperate Moves
15:20 The Models' Progress
17:38 A New Era of Intelligence
21:05 The Impact on Education
25:13 Redefining Intelligence
25:37 Conclusion and Farewell
------
RESOURCES
Josh: https://x.com/Josh_Kale
Ejaaz: https://x.com/cryptopunk7213
------
Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures
Transcript
All right, Josh, the AI nerds are fighting again.
This past weekend, there was a very prestigious competition called the International Math Olympiad,
which hosts some of the brightest, smartest mathematicians of our time.
And they're typically high schoolers.
And basically, they come together and they take a really hard math test.
This is like four to five hours.
And those that score the highest get medals.
You can get bronze, silver, and the highest scorers get gold medals.
So what does this have to do with AI?
Well, recently, over the last couple of years, the organizers of this International Math Olympiad decided to start inviting AI models to participate as contestants.
And they did terribly.
Like, no one's come even near the human geniuses, except this year, Josh, where they came to play.
And not one, but two AI models achieved not silver, but gold medals, which is just an insane thing, right?
So it should be all fun and games, right?
What a fairy tale story.
Well, unfortunately, OpenAI and Google got into an online spat
where they started accusing each other of cheating.
Now, remember, these are trillion-dollar companies.
So essentially, Josh, I was teleported this weekend back to my high school days
where I felt like the teacher had to come in,
separate the kids from arguing over some kind of random homework problem,
and get them to chill out.
We will look back at this episode and laugh at it like it's a joke,
because these AIs are competing against high schoolers.
That's so lame.
Only high schoolers?
Like, come on.
And you're just barely getting gold.
Well, in their defense, Josh, these are some pretty smart high schoolers, man.
Like, I was looking at some of these math problems.
I don't know if you can see my screen here.
I'm sharing the official site.
And if you look at some of these problems, here we go.
Okay, so basically, they host this competition in a different country each year.
And you can download the test yourself after the fact
to see how well you would do.
I had a look at this one.
Josh from the Afrikaans.
I basically don't understand anything.
One second.
All right.
Take a look at that.
Yeah.
Take a look at this.
That's quite a lot of squiggly lines on a page.
You know what?
That could be mistaken for a piece of art in a gallery if you didn't peer too closely at it.
This looks insane.
Okay, so I take it back.
So the high schoolers were probably pretty smart then.
And I guess the AI
performing as well as the high schoolers
is probably a pretty big deal, right?
Because those look like very complicated math problems
that I'm assuming most of the smartest people in the world cannot solve.
Exactly.
Yeah, this is something that is
technically set for high schoolers and sometimes college kids,
but it's meant to demonstrate prowess in the field.
There are a lot of university academics
who obviously do math degrees and PhDs,
but those focus on very specific problems.
Like in science,
you need to pick your lane
and then dedicate your life to it.
High schoolers and college kids
are kind of like the last point
before you jump into your specialization.
So really, if you're the best at generalized maths,
you're going to compete in this competition.
And what's so interesting is,
typically AI models haven't been able to perform very well
because they needed a lot of context beforehand
about the problem, Josh.
So they needed to know that, you know,
X equals something and Y equals something,
and they had to have defined parameters to figure out the problem.
But this was the first time that AI models basically were just given a blank sheet of paper,
or not a blank sheet of paper, but they stared at the problem just as we just looked at it just now
and had to read the words, read the characters, interpret what that meant in the context of that
situation and the way the question was framed, and then figure it out themselves.
So it's as if the AI models had a camera that looked at a paper, the same way we look at test
papers as kids, through our eyes, and figured it out themselves. So what changed? What happened in the last year that
made it so much better? Because it went from, what, basically zero of six to now five or six of six
questions answered, and now it's a gold medalist. So what happened? So listen, I'm not going to try and
explain it, but maybe you and I can decipher it through the legends themselves that built these
models, right? Okay. So let me paint the scene for you, Josh. It is Saturday evening. You know,
normal people are usually out and about, having fun.
They're probably having dinner, catching up with friends, or chilling at home watching a movie.
And there's this guy called Alexander Wei, who works on reasoning at OpenAI.
Reasoning is basically this new fancy technique that AI models have typically demonstrated,
which has brought them up to like the frontier level of AI models.
Basically, if your model can do reasoning, it's typically a pretty smart model, right?
And he posts this tweet saying, I'm excited to share that our latest OpenAI experimental
reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance
on the world's most prestigious math competition, the International Math Olympiad. And he goes on to
describe how the model took on each problem on its own terms and solved it,
and how this is a massive win for AI models and how, most importantly, OpenAI
was the first ever to do this. And not too long after he posted
that tweet, Josh, Sam Altman jumps in here, right? And he kind of echoes similar
thoughts: we achieved gold medal level performance on the 2025 IMO competition with general-purpose
reasoning. And then he kind of shills GPT-5 at the end. Basically, it's a
promotional thing for OpenAI. And I will say that this is really cool because
what they've achieved is something that hasn't been done before, right? So, a very impressive feat.
And in terms of how this works specifically, Sheryl Hsu here
has a really good breakdown.
She says the model solves these problems without tools like coding or Lean, which is a formal proof language.
It just uses natural language.
So as I said earlier, it reads the paper and interprets what it thinks it means.
And it also has the same amount of time to do the test as the human contestants, so 4.5 hours.
And she says, we see the model reason at a very high level, trying out different strategies,
making observations from examples, and testing different hypotheses out.
And she says it's crazy how we've gone from 12% on the AIME test, which is what GPT-4o, OpenAI's earlier model, scored, to an IMO gold medal in 15 months.
So just to set that in context, Josh, that is a crazy leap in 15 months.
Imagine going from eighth grade level math to the best mathematician in the world in 15 months.
It's a pretty insane thing.
Yeah, I'd say so.
So essentially the breakthrough that Cheryl is highlighting here is number one, the model didn't need any context.
Number two, it used really high-level reasoning to figure out the problems from first principles.
And number three, it was able to test out multiple hypotheses at the same time instead of trying to one-shot the problem.
Typically in the past, when AI models have been given a prompt or a problem, they just give it their best shot and give you one solution, Josh,
whereas what these reasoning models do really well
is entertain many different hypotheses
and then pick the one they think is the best answer.
And it ended up with the gold medal, which is insane, right?
But it wasn't entirely without a few glitches here and there, Josh.
So if you look at this post from Jasper,
he read through the entire kind of like problem set
that OpenAI's model went through.
And he points out some weird anomalies.
He talks about how it analyzed a bunch of things.
And then he says, however, the write-up is kind of messy.
It overuses shorthand and sentence fragments.
It introduces new terms without definitions.
For example, forbidden and sunny partners.
I have no idea what either of those terms could mean,
but it was apparently interspersing these phrases during its analysis.
And so as reviewers or examiners were reading this, they were like,
sorry, wait, what is it talking about?
It got to the right answer, but what is it talking about, right? The other key point from this post is it was unable to solve one problem,
problem six. And I'm not even going to try and get into why it failed on that problem, but it was
just particularly hard for it to figure out. But it still scored a high enough percentage that it got
a gold medal. So it's basically a win for OpenAI, but that's when the drama starts unfolding.
So I've got this post up from Mikhail Samin, which kind of like sparks this entire fight, Josh. He goes,
According to a friend, the IMO, which is the International Math Olympiad,
asked AI companies not to steal the spotlight from kids
and to wait a week after the closing ceremony to announce the results.
OpenAI instead announced the results before the closing ceremony.
Yikes.
And then he goes on to basically say how this is some kind of clout-chasing move from OpenAI.
And, okay, I tried to evaluate this, Josh, from OpenAI's perspective,
which is they basically want to steal the limelight,
but also say that they were the first AI model
to ever achieve gold on this competition,
which puts them in a good light
and makes users want to choose OpenAI
and solidifies the branding that OpenAI is the best, right?
But on the other side, you know,
they're kind of stealing the spotlight from the kids,
as this post says. But that's not actually the main issue.
The main issue here, Josh, is that OpenAI wasn't the only model
to achieve a gold, right?
At the same time, during the same testing period, you had Google achieving the exact same score.
So then the question becomes, okay, who was ethical about announcing their result?
Here's a post from Demis Hassabis, Google's head of AI, and I'll note it came two days later: Official results are in.
Gemini, which is their flagship model, achieved gold medal level in the International Math Olympiad.
An advanced version was able to solve five out of six problems.
So, same as OpenAI.
Same thing.
Struggled on the sixth problem.
Incredible progress.
Huge congrats to the team.
And a tweet here says that Google basically had to wait until Monday for marketing to approve the tweet,
but OpenAI shared theirs first, at 1 a.m. on Saturday.
And we see the screenshot from Demis Hassabis, where he further clarifies this,
basically saying, by the way, as an aside, we didn't announce on Friday because we respected
the IMO Board's original request that all AI labs share their results only after the
official results had been verified. Now that we've been given permission to share, blah, blah, blah,
he shares. So Demis is playing the good Samaritan. He's like, yeah, we also have the good
model, but we have some pride and some manners about how we deal with these things.
That's where it starts to get a little uglier, Josh, because we have OpenAI chiming in on this
tweet. And this is some random person commenting on OpenAI and this entire
situation, basically saying OpenAI has zero advantages except the size of the team, i.e., the
OpenAI team was claimed to be smaller than Google Gemini's team. So what he's implying here is
there's no real difference between OpenAI's models and Google Gemini's models. You can pretty
much use either; OpenAI maybe has a smaller team to build that model, but who the hell cares?
And then one of the AI model researchers at OpenAI basically comes in and says, well, I think it's also interesting that they, they being Google, curated and provided useful context to the model, which we did not.
Feels like taking your tutor's cheat sheet with you into the exam.
So shots basically being fired from OpenAI saying, hey, you cheated.
You gave context to your model.
And that was why it was able to achieve gold.
We, OpenAI, didn't provide any of that context
and it was able to reason from first principles.
There you have it.
But then directly beneath it, Vinay Ramasesh,
a Google DeepMind AI researcher, responds:
it's worth noting, actually, that a Deep Think system,
which is Google's AI system,
with no access to this corpus,
so no context, also got gold,
again according to the official graders,
and he puts that part in brackets
because OpenAI didn't wait for the official graders
to mark their score,
with exactly the same score.
So basically this is like a pissing contest between two of the top AI model providers.
Here's my take, Josh, and then I really want to kind of lean into what you think about this
whole debacle. Number one, this seems so childish to me. AI models were
eventually going to get smart enough to solve these mathematical problems. And I think
you said this earlier on: this is something we're probably going to laugh about
10 years from now, right? That they were able to solve
whatever the most complex
mathematics problems for humans were, right?
Mere humans.
And now AI is off creating
wonderful scientific discoveries
for us that we would have never
comprehended or figured out ourselves, right?
So firstly, like,
you're arguing over something that's so silly.
But number two,
this kind of seems desperate
on the OpenAI side.
And maybe I'm being biased,
but like I'm just going to give you my take.
OpenAI has had a series of stumbles recently.
They claimed that they were going to release GPT-5,
which is their brand new frontier model,
but they've delayed it many months now.
They got outperformed by Grok 4 from xAI,
so now they have a new benchmark
that they need to beat,
a new model that they basically need to outcompete.
They claimed that they were going to release
a new open source model
and then delayed it after a Chinese open source model
was released and had one trillion parameters
and outperformed not just their model,
but any other open source model out there.
And so I feel like
they're looking for a win, right?
They released their agent this week or last week,
and that got mixed feedback.
So I feel like Sam is desperate for a win.
People are consistently questioning their moat, asking, what has OpenAI got?
They've lost a ton of researchers to Meta and other companies.
I feel like their backs are against the wall.
Sam's scared.
And he basically needs to grab any kind of win.
So it reeks of desperation.
What's your take, Josh?
I do empathize with the team.
They've been coming under fire from every single angle.
I mean, you have Zuck poaching all of their talent,
and then all of the other open source AI models are beating them at their own game.
And they're just kind of, they're really getting beat up now.
And I think that they're looking to get some footing.
I'm sure this probably plays a role in it.
But I'm sure behind the scenes,
they're really trying to fight hard to put their feet back on stable ground,
to get GPT-5 out the door,
to build Project Stargate and make this big infrastructure network.
They need a win.
So sure, this was probably an attempt to get ahead, make them look good, win over some more hearts and minds.
But I think the most interesting part of the whole story is less the drama and more the fact that these models were able to accomplish a really impressive feat over such a short period of time.
From what I understand, previously when they attempted to solve these problems, they used a custom training data set.
They used custom tool sets.
It was mostly a model trained on solving mathematical problems.
And with this version, both the OpenAI model and the Gemini models are general-purpose models.
They were not trained specifically with the intention of solving mathematical problems.
These are the general models that people day to day are using.
They're just now able to solve these math problems using this new general intelligence.
So it's a really interesting breakthrough, which I think comes from reinforcement learning: there's now not so much of an advantage to training a model specific to one skill set
when you could just make it great at everything.
There was one thing I noticed that some people call cheating and other people don't.
With the actual test the high schoolers had to take,
they're not allowed to use tools, and they have a limited amount of time per question to answer.
The OpenAI model and the Gemini model had an infinite amount of time to answer,
and they were allowed to use tools.
So there are still small differences between the two.
Were they allowed to like use the internet?
I don't know the specifics.
I would imagine at least calculators,
and at most probably the full repertoire of what we have currently available to us, which is full internet
search and code-writing abilities, so they could do their own mathematical checks. So I would just assume
the minimal amount of constraints possible. So there were far fewer constraints on the models,
but they did solve the questions. And I think that's super impressive. They got five out of six
right, which was gold and better than almost every student, if I'm not mistaken. Only a few
students got the six out of six completely correct. It's just cool to see the rate of progress of these
models getting better, that over the course of the last 15 months or so, they went from horrible
and narrowly trained to incredible and generally trained. And as long as that trend keeps going,
I think the drama matters less than the output, which is models are getting really good at solving
really hard math problems and original ones too, that the world has never seen before.
Yeah, well, that last point is actually the main takeaway that I had, Josh, which is that these are original,
never-before-seen problems. Typically, these AI models are trained on things
that they've seen before.
As you said, right?
They trained on data sets.
So they've already seen the problem,
they know the answer,
and they have to work out how to get there, right?
So they kind of have a head start.
Here, it's just kind of like completely unknown.
The other thing is this is kind of like
the culmination of a trend, Josh,
which is these AI models are really good at doing kind of binary tasks.
And I don't want to reduce mathematics to binary tasks,
but technically, it's numbers, sequences, formulas, that kind of stuff, right?
So if you can throw enough compute at a thing and get that AI model to consider all the different decision paths, it's eventually going to get to the answer, right?
But it's always a specific answer at the end of that, right?
Whereas when it comes to more subjective things, more human experiential things, AI has typically struggled to improve
at the same rate that it has for all these different scientific and math problems.
So I'm glad that we've reached this pinnacle feat.
I think AI models are really good at one thing and not so great at other things.
And I'm excited to see how like they kind of like try to start leapfrogging each other over the next couple of years.
Yeah, it's that directional progress that we like.
Math is clearly the first because you can write down proofs and you could check your work.
And there is an actual verifiable solution.
And I think that's why we're seeing a lot of the progress start early in math and then hopefully go on to these other places.
But what we are seeing is these first signs of new knowledge breakthroughs where it's solving a new and novel problem that hasn't been released before based on its previous data set.
So it's not just pattern matching like you mentioned earlier where it has this data set of questions.
It's kind of finding the right examples and then applying that logic to the question.
It's actually reasoning, and it's reasoning in many instances, and then it's comparing its work,
and it's coming to a conclusion. And we saw this with the Grok 4 Heavy model last week, too, when it was released,
where I think the new meta is many instances solving hard problems and then comparing.
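To make the "many instances, then compare" idea concrete: neither lab has spelled out here exactly how its system combines attempts, so this sketch swaps in one simple, well-known scheme, majority voting over independent attempts. It assumes each attempt is independently correct with probability p and that an answer is simply right or wrong, which real model samples are not; the function name majority_vote_accuracy is hypothetical. It computes the chance that more than half of n attempts land on the right answer.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Chance that a strict majority of n attempts is correct.

    Hypothetical helper for illustration. Assumes every attempt is
    independently correct with probability p and that answers are simply
    right or wrong; n must be odd so a tie is impossible.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    needed = n // 2 + 1  # smallest number of correct votes that wins
    # Sum the binomial probabilities of getting k correct attempts, k >= needed.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


if __name__ == "__main__":
    # Accuracy of the vote as the number of attempts grows, for p = 0.6.
    for attempts in (1, 5, 15, 45):
        print(attempts, round(majority_vote_accuracy(0.6, attempts), 3))
```

With p = 0.6, a single attempt is right 60% of the time, while a vote over five attempts is right 68.256% of the time, and the figure keeps climbing as attempts are added. The catch is that voting only helps when each attempt is more likely right than wrong: with p = 0.4, a five-attempt vote falls to about 32%. Full olympiad proofs also can't be tallied like short numeric answers, so this only illustrates the error-rate intuition, not how the IMO systems actually work.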
So you lower that error rate more and more and more each time. And what we're seeing is great
progress. So, I mean, although OpenAI and Google are fighting again, they're both fighting over exciting
progress. And sure, maybe one tried to sweep in and steal the valor, but they both did an excellent
job in actually completing these problems and placing gold in a test that was previously not possible
for an AI model. You know who the real winners are here out of this, Josh? Who's that?
High school kids who now have an AI model that can do all their math homework for them.
Isn't that incredible? Like, man, think about it. I wish I had that. You have an AI model that is
as smart as the smartest high schoolers on planet Earth.
If it can solve those math problems,
it can solve anything.
It sounds human as well, Josh.
So, like, your teacher's going to struggle
unless they use AI themselves
to figure out whether you did that yourself
or just ran it through GPT
on your mom's GPT subscription.
It really forces you to reevaluate the school model, right?
Because now that this information is so readily accessible,
it's so easy to solve these problems,
is that the actual thing worth learning,
or is it more important to know how to use these tools to get to the answer?
And there's this dual-pronged approach.
And we see developers and programmers talk about this a lot,
where as soon as they start to rely too heavily on the tools,
they start to lose their touch,
they start to lose their ability to deeply understand how it reaches conclusions.
But is that worth it in exchange for getting to the answer much quicker
and then being able to seek many more answers?
I don't know.
It's a weird dynamic.
If I was a teacher, I'd be worried because, I mean,
similar to what we saw with the calculator, it can just replace the thinking process and just
yield you an answer. And the thing with the calculator is like you, you're using the calculator,
sure it figures out the answer for you, but you kind of loosely understand how it is working,
right? You know what numbers it's crunching to get to that answer. And then typically you do
a few things on a calculator and then you get to your eventual answer for whatever the original
question was. The concern that you're
highlighting here with AI is that it's doing really complex problems, which kids don't even need to
understand in the first place just to get an answer, which they can then give to their teacher,
get a grade, and then go to university. But the kids don't actually learn actively in that process.
And it's going to be a concerning trend if we see kids just trying to go from zero to 100
percent without understanding anything in between. It's a trend to watch.
This was the topic of our episode from a few weeks ago:
Is AI making you dumber?
Yes.
And I think that's just going to continue to be the question.
And I think the answer is it's all dependent on how you choose to use the tools that you're given.
And if you use these tools as further leverage, well, I'm sure these math olympiad contestants who can actually complete the problems
would love to have this model to check the problems, to work through the problems, and to figure out shortcuts for solving these problems. If you deeply understand it, then this becomes an amazing tool to check your work and to generate new questions for you.
It's a great study buddy. Or, if you're not an olympiad contestant and you still want to get to the answer, well, you just
kind of cheat your way through and ask it for exactly what you want. So it's that split again,
and it's up to the person to take their own agency, solve their own problems, and try to use these
as tools of leverage instead of just problem-solving machines. That actually reminds me of this
tweet I saw yesterday, Josh. So what you're looking at here is a tweet from Dave White.
Dave White is a very prestigious investment slash research advisor at this fund called Paradigm,
which is basically a crypto fund, but it is one of the wealthiest funds out there.
So a lot of the investments they made were massive wins.
And a lot of the reasoning of those wins was from Dave White's analysis.
He is a deeply thoughtful mathematician at his core.
And he is famed for doing a lot of analyses on companies, mathematical analyses
that have ended up, you know, determining whether a fund puts $100 million in a company or zero, right?
So a very important job worth hundreds of millions of dollars, right?
And what he says here is basically him having an identity crisis, because he has looked up to the IMO,
the International Math Olympiad, and he goes on to say in this tweet that
whenever he's met an IMO gold medalist, he's always subconsciously thought that they were smarter than him
and respected them more.
And now with this news that AI models basically can do his job for him,
can reason better than him at some of these math problems,
he now has an identity crisis.
He doesn't know kind of where to go from this.
And if people like Dave White are having this kind of disillusioned sentiment
about how smart AI is,
you can imagine how this is going to happen for everyone else
in all of their other sectors, Josh, right?
It doesn't matter if you're a mathematician or an investment
research advisor, you could be a technician in some kind of engineering industrial role,
or you could be a teacher, or you could be a kid or a high schooler. I think this disillusionment
is going to spread, and I think it's super important for people to kind of evolve their thinking,
like you said, Josh, and learn how to leverage these tools versus just consume. Yeah, this is,
I mean, this is crazy. There are a lot of people who are going to have to adapt to this new world
order of intelligence, where if you build up your entire identity around being
intelligent, well, perhaps you're going to have to alter the way you present yourself as
intelligent because the meaning of intelligence is becoming commoditized among these tools that
are now reduced into a single chat box. Yep, benchmarks are going to have to reset themselves
completely. But folks, that is the end of this episode. Thank you so much for tuning in again.
Josh and I are going hammer and tongs at Limitless. Our goal is to get you the hottest and
trending topics and news fresh out the door, give you our commentary, our thoughts, and hopefully
some useful insights for you. If you enjoyed this episode, if you enjoyed any of our previous
episodes, please continue to share and spread them with all your friends and family and whoever
you think might be interested in this. We are getting tons of feedback from you guys. And with
every episode that we release, we're getting better. So please remember to like, subscribe, follow us.
It's hugely appreciated and helpful for us. And we'll see you on the next one.
