Limitless Podcast - OpenAI and Google Just Beat the World's Smartest Mathematicians

Episode Date: July 24, 2025

In this episode, we celebrate OpenAI and Google's historic gold medal wins at the International Math Olympiad, showcasing significant advancements in problem-solving abilities. We discuss the technological breakthroughs enabling these achievements and the implications for education as AI challenges traditional notions of intelligence. However, the competition was not without its share of AI drama, as the giants continue to compete at all costs in the AI game of thrones.

------

💫 LIMITLESS | SUBSCRIBE & FOLLOW
https://limitless.bankless.com/
https://x.com/LimitlessFT

------

TIMESTAMPS
0:00 Intro
1:35 AI vs. Math Olympiad
4:11 OpenAI's Breakthrough
6:54 The Gold Medal Debate
8:39 The Controversy Unfolds
12:51 The Google OpenAI Drama
13:42 OpenAI's Desperate Moves
15:20 The Models' Progress
17:38 A New Era of Intelligence
21:05 The Impact on Education
25:13 Redefining Intelligence
25:37 Conclusion and Farewell

------

RESOURCES
Josh: https://x.com/Josh_Kale
Ejaaz: https://x.com/cryptopunk7213

------

Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures

Transcript
Starting point is 00:00:03 All right, Josh, the AI nerds are fighting again. This past weekend, there was a very prestigious competition called the International Math Olympiad, which hosts some of the brightest, smartest mathematicians of our time. And they're typically high schoolers, and basically they come together and they take a really hard math test. This is like four to five hours, and those that score the highest get medals. You can get bronze, silver, and the highest scorers get gold medals. So what's this got to do with AI? Well, recently, over the last couple of years,
Starting point is 00:00:35 the organizers of this International Math Olympiad decided to start inviting AI models to participate as contestants, and they did terribly. Like, no one's come even near the human geniuses, except this year, Josh, where they came to play and not one, but two AI models, achieved not silver, but gold medals,
Starting point is 00:00:58 which is just an insane thing, right? So it should be all fun in games, right? What a fairy tale story. Well, unfortunately, Open AI and Google got into an online spat where they started accusing each other of cheating. Now, remember, these are trillion-dollar companies. So essentially, Josh, I was teleported this weekend back to my high school days where I felt like the teacher had to come in, separate the kids from arguing over some kind of random homework problem,
Starting point is 00:01:27 and get them to chill out. We will look back at this episode and laugh at it like it's a joke. because these AIs are, they're competing against high schoolers. That's so lame. Only high schoolers? Like, come on. You're just barely getting gold. Well, these are, in their defense, Josh, these are some pretty small high schools, man.
Starting point is 00:01:42 Like, I was looking at some of these math problems. I don't know if you can see my screen here. I'm sharing the official site. And if you look at some of these problems, here we go. And then like, okay, so they have basically, they host this competition in a different country each year. And you can kind of like download the test yourselves after the fact. to see how well you could do it. I had a look at this one.
Starting point is 00:02:04 Josh from the Afrikaans. I basically don't understand anything. One second. All right. Take a look at that. Yeah. Take a look at this. That looks like quite a bit of squiggly lines on a page.
Starting point is 00:02:17 You know what? That could be mistaken for a piece of art in a gallery if you didn't peer too closely at it. This looks insane. Okay, so I take it back. So the high school was probably pretty smart then. And I guess the AI performing as well as the high school is probably a pretty big deal, right? Because that looks like very complicated math problems that I'm assuming most of the smartest people in the world cannot solve. Exactly.
Starting point is 00:02:40 Yeah. This is like something that is technically set for high schoolers and sometimes college kids, but is meant to demonstrate prowess in the field. So there's a lot of university academics, which obviously do math degrees and you do PhDs. But those are in very specific problems. So you kind of, like in science, you just need to kind of pick and choose your lane and then dedicate your life to it. High schoolers is kind of,
Starting point is 00:03:04 and college kids are kind of like the last point before you jump into your specialization. So really, if you're the best at generalized maths, you're going to compete in this competition. And what's so interesting is, typically AI models haven't been able to perform very well because they needed a lot of context beforehand about the problem, Josh.
Starting point is 00:03:22 So they needed to know that, you know, there were certain, you know, X equals something and Y equals something, and they had to have defined parameters to kind of figure out the problem. But this was the first time that AI models basically were just given a blank sheet of paper, or not a blank sheet of paper, but they stared at the problem just as we just looked at it just now and had to read the words, read the characters, interpret what that meant in the context of that situation and the way that the question was framed, and then figuring it out themselves. So it's as if the AI models had a camera that looked at a paper,
Starting point is 00:03:55 similar way that we look at test papers as kids through eyes and figure it out themselves. So what changed? What happened in the last year that made it so much better? Because it went from, what, basically zero of six to now six or five of six? Questions answered. Now it's a gold medalist. So what happened? So listen, I'm not going to try and explain it, but maybe you and I can decipher it through the legends themselves that built these models, right? Okay. So let me paint the scene for you, Josh. Okay. It is. Saturday evening, you know, normal people are usually out and about, they're having fun, they're probably having dinner, catching up with friends, or chilling at home, watching a movie.
Starting point is 00:04:34 And this guy called Alexander Way, who is Open AI's head of reasoning. Reasoning is basically this new fancy technique that AI models have typically demonstrated, which has brought them up to like the frontier level of AI models. Basically, if your model can do reasoning, it's typically a pretty smart model, right? And he posts this tweet saying, I'm excited to share that our latest Open AI experimental reasoning LLM has achieved a longstanding grand challenge in AI, a gold medal level performance on the world's most prestigious math competition, the International Math Olympiad. And he goes on to describe, you know, how the model basically took on each problem in its own
Starting point is 00:05:15 regard and solved it and how this is a massive success and win for AI models and how, most importantly, Open AI was the first ever model to complete this. And not too long after he posts that tweet, Josh, Sam Altman jumps in here, right? And he goes, again, he kind of echoes similar thoughts. We achieved gold medal level performance on the 2025 IMO competition with general purpose reasoning. And then he kind of like shills GPT5 at the end. Basically, it's like a promotive thing for Open AI. And I will say that this is really cool because, you know,
Starting point is 00:05:47 what they've achieved is something that hasn't been done before, right? So very impressive feat. And in terms of how this works specifically, Cheryl Sue here gives a really good breakdown. She says the model solves these problems without tools like coding or lean, which is another coding tool. It just uses natural language. So as I said earlier, it kind of reads the paper and just kind of interprets what it thinks it means. And it also has the same amount of time to do the test as other kits, so 4.5 hours.
Starting point is 00:06:17 And she says, we see the model reason at a very high level, trying out. different strategies, making observations from examples, and testing different hypotheses out. And she says, it's crazy how we've gone from 12% on the AIME test, which is what GPT4A, which is Open AI's early model, got to IMO Gold, Gold, Gold, Gold Medal, in 15 months. So just to set that in context, Josh, that is a crazy leap in 15 months. Imagine going from eighth grade level math to the best math, mathematics. in the world in 15 months. It's a pretty insane thing.
Starting point is 00:06:55 Yeah, I'd say so. So essentially the breakthrough that Cheryl is highlighting here is, number one, the model didn't need any context. Number two, it used really high-level reasoning to figure out the problems from first principles. And number three, it was able to test out multiple hypotheses at the same time instead of trying to one-shot the problem. Typically in the past, when AI models have been given a prompt or a problem, it tries to
Starting point is 00:07:20 just like give it its best shot. and give you one solution, Josh. Whereas what these models, these reasoning models do really well is they are able to hypothetically entertain many different scenarios and then pick the best one of which it thought it was an answer. And it ended up with the gold medal, which is insane, right?
Starting point is 00:07:34 But it wasn't entirely without a few glitches here and there, Josh. So if you look at this post from Jasper, he read through the entire kind of like problem set that Open AI's model went through. And he points out that some weird anomaly. So he kind of like talks about like how it kind of like analyze and a bunch of things. And he goes, however, the write-up is kind of messy, he goes. It overuses shorthand and sentence fragments. It introduces new terms without definitions, for example, forbidden
Starting point is 00:08:02 and sunny partners. I have no idea what either of those terms could mean, but it was just apparently just interspersing these phrases during its analysis. And so as a reviewer or as an examiner, they were reading this, they were like, sorry, wait, what is it talking about? It got to the right answer, but what is it talking about, right? The other key point from this post is it was unable to solve one problem, problem six. And I'm not even going to try and get into why it failed on that problem, but it was just particularly hard for it to figure out. But it still scored a high enough percentage that it got a gold medal. So it's basically a win for Open AI, but that's when the drama starts unfolding. So I've got this post up from Mikhail Samin, which kind of like sparks this entire
Starting point is 00:08:50 fight, Josh. He goes, according to a friend, the IMO, which is the International Math Olympiad, asked AI companies not to steal the spotlight from kids and to wait a week after the closing ceremony to announce the results. Open AI instead announced the results before the closing ceremony. And then he goes on to basically say how this is essentially like some kind of clout chasing move from Open AI. And, okay, I tried to evaluate this, Josh, from Open AI's kind of perspective, which is they basically want to steal the limelight, but also say that they were the first AI model to ever achieve gold on this competition, which puts them in a good light and makes users want to choose Open AI and solidify the branding that Open AI is the best, right? But on the other side,
Starting point is 00:09:37 you know, they're kind of like stealing the spotlight from the kids, as this post says, but that's not actually the main trope. The main trope here, Josh, is open AI wasn't the only model to achieve a gold, right? At the same time, during the same testing period, you had Google achieving the exact same score. So then the question becomes, okay, well, it just, it was whoever was ethical about announcing their own result. This post from Demis Hasabas, which is Google's head of AI, basically posts, and I'll note two days later, official results are in, Gemini, which is their flagship model, achieved gold medal level in the international. Math Olympiad, an advanced version was able to solve five out of six problems. So same as Open AI. Same thing. Struggle on the sixth problem. Incredible progress. Huge congrats to the team. And a tweet here says that Google basically had to wait for marketing to approve the tweet until
Starting point is 00:10:34 Monday. But Open AI shared there's first at 1 a.m. on Saturday and stole the spotlight. And we see this screenshot from Demis Hasabis, which, you know, he further clarifies this, basically saying, by the way, as an aside, we didn't announce on. on Friday because we respected the IMO's board's original request that all AI labs share the results only after the official results have been verified. Now that we've been given permission to share, blah, blah, blah.
Starting point is 00:10:59 He shares. So Demis is playing the like, good Samaritan. He's like, you know, we also have the good model, but we, you know, we have some pride and some manners about how we deal with these things. That's where it starts to get a little uglier, Josh, because we have open
Starting point is 00:11:14 AI chiming in to this tweet, which basically says, and this is some randomer commenting on OpenAI and this entire situation, so Open AI basically has zero advantages except the size of the team, aka the OpenAI team was claimed to be smaller than Google Gemini's team. So what he's inferring here is there's no real difference between Open AI's models and Google Gemini's models. You can pretty much use either or. Open AI maybe has a smaller team to build that model, but who the hell cares? And then one of the AI model researchers at OpenAI basically comes in and says, well, I think it's also interesting that they, they being Google,
Starting point is 00:11:56 curated and provided useful context to the model, which we did not. Feels like taking your tutors cheat with you into the exam. So shots basically being fired from Open AI saying, hey, you cheated. You gave context to your model and that was why it was able to achieve gold. We, Open AI, didn't provide any of that context and it was able to reason for first principles. There you have it. But then directly beneath it, Venei Rameshashas, who is a Google deep mine AI researcher response, it's worth noting actually that a deep think system, which is Google's AI system with no access to this corpus. So no context also got gold. Again, according to the official graders and he puts this in brackets because Open Air I didn't wait for the official creators to mark their score with exactly the same score.
Starting point is 00:12:46 So basically this is like a pissing contest between two of the top AI model providers. Here's my take, Josh. And then I really want to kind of lean into what you think about this whole debacle. Number one, this seems so childish to me. Like eventually AI models were eventually going to get smarter or smart enough to solve these mathematical problems. And I think you said this earlier on. this is something that they're going to probably laugh about 10 years from now, right? That they were able to solve whatever, the most complex mathematics problems for humans, right?
Starting point is 00:13:18 Mere humans. And now AI is off creating wonderful scientific discoveries for us that we would have never comprehended or figured out ourselves, right? So firstly, like, you're arguing over something that's so silly. But number two, this kind of seems desperate on the open AI side. And maybe I'm being biased, but like I'm just going to give you my take. Open AI has kind of had like a series of stumbles recently. They claimed that they were going to release GPT5,
Starting point is 00:13:45 which is their brand new frontier model, but they've delayed it many months now. They got outperformed by GROC 4 from XAI, so now they have a new benchmark that they need to beat, a new model that they basically need to outcompete. They claimed that they were going to release a new open source model and then delayed it after a Chinese open source model was released and had one trillion parameters
Starting point is 00:14:09 and outperformed not just their model but any other open source model out there. And so I feel like they're looking for a win, right? They released their agent this week or last week. And so, you know, that had mixed, review, mixed feedback. So I feel like Sam is desperate for a win. People are criticizing consistently their moat, asking, what is Open AI got?
Starting point is 00:14:32 They've lost a ton of researchers to meta and other companies. I feel like their backs against the wall. Sam's scared and he basically needs to grab any kind of win. So it reeks of desperation. What's your take, Josh? I do empathize with the team. They've been coming under fire from every single angle. I mean, you have Zuck poaching all of their talent,
Starting point is 00:14:52 and then all of the other open source AI models are beating them at their own game. And they're just kind of, they're really getting beat up now. And I think that they're looking to get some footing. I'm sure this probably plays a role in it. But I'm sure behind the scenes, they're really trying to fight hard to put their feet back on stable ground, to get GPT-5 out the door, to build Project Stargate and make this big infrastructure network. They need some wins.
Starting point is 00:15:16 So sure, this was probably an attempt to get ahead, make them look good, win over some more hearts and minds. But I think the most interesting part of the whole story is less the drama and more the fact that these models were able to accomplish a really impressive feat over such a short period of time. From what I understand, previously when they attempted to solve these problems, they used a custom training data set, they used custom tool sets. It was mostly a model trained on solving mathematical problems. And with this version, both the OpenAI version and the Gemini models, they were both general purpose models. They were not trained specifically with the intention of solving mathematical problems. These are the general models that people day to day are using.
Starting point is 00:15:58 They're just now able to solve these math problems using this new general intelligence. So it's a really interesting breakthrough that I think we get from reinforcement learning that now there is not so much of an advantage to training a model specific to one skill set when you could just make it great at everything. There was one thing that I noticed that some people call it cheating, other people don't. So with the mathematical, with the actual test, the high school was had to take, they're not allowed to use tools and they have a limited amount of time per question to answer. The models that, the open AI model and the Gemini model, they had infinite amount of time to answer. and they were allowed to use tools.
Starting point is 00:16:34 So there still are small differences in these. Were they allowed to, like, use the internet? I don't know the specifics. I would imagine at least calculators, at most probably the full repertoire of what we have currently available to us, which is full internet search, code writing abilities,
Starting point is 00:16:49 they could do their own mathematical checks. So I would just assume the minimal amount of constraints possible. So there was much less constraints on the models, but they did solve the questions. And I think that's super impressive. They got five out of six right, which was gold and better than almost every student, if I'm not mistaken, only a few students got the six out of six completely correct. It's just cool to see the rate of progress of these models
Starting point is 00:17:12 getting better, that over the course of the last 15 months or so, they went from horrible and narrowly trained to incredible and generally trained. And as long as that trend keeps going, I think the drama matters less than the output, which is models are getting really good at solving really hard math problems and original ones too, that the world has never seen before. Yeah, well, that last point is actually the main takeaway that I had, Josh, which is it's original never-before-seen problems. Typically, these AI models are trained on things that they've seen before. As you said, right? They trained on data sets.
Starting point is 00:17:45 So they've already seen the problem. And then they have to work out, they know the answer, and they have to work out how to get that, right? So they kind of have a leading factor. Here, it's just kind of like completely unknown. The other thing is this is kind of like the culmination of a trend, Josh, which is these AI models are really good at doing kind of binary tasks. And I don't want to reduce mathematics to binary tasks, but technically it's numbers, sequential, formulas, that kind of stuff, right?
Starting point is 00:18:18 So if you can run enough compute at a thing, and if you can get that AI model to consider all different decision parts, it's going to eventually get to the answer, right? But it's always a specific answer at the end of that. Whereas when it comes to more subjective things, more human experiential things, AI is typically struggled to improve at the same rate that it has for like all these different scientific and math problems. So I'm glad that we've reached this pinnacle feat.
Starting point is 00:18:47 I think AI models are really good at one thing and not so great at other things. And I'm excited to see how like they kind of like try to start leapfrogging each other over the next couple of years. Yeah, it's that directional progress. that we like. Math is clearly the first because you can write down proofs and you could check your work and there is an actual verifiable solution. And I think that's why we're seeing a lot of the progress start early in math and then hopefully go on to these other places. But what we are seeing is these first signs of new knowledge breakthroughs where it's solving a new and novel problem
Starting point is 00:19:23 that hasn't been released before based on its previous data set. So it's not just pattern matching like you mentioned earlier, where it has this data set of questions. It's kind of finding the right examples and then applying that logic to the question. It's actually reasoning, and it's reasoning in many instances, and then it's comparing its work, and it's coming to a conclusion. And we saw this with the Grock Heavy model last week, too, when it released, where I think the new meta is many instances solving hard problems and then comparing. So you lower that error rate more and more and more each time. And what we're seeing is great progress. So, I mean, although OpenAI and Google are fighting again, they're both fighting over exciting progress.
Starting point is 00:20:06 And sure, maybe one tried to sweep in and steal the valor. But they both did an excellent job in actually completing these problems and placing gold in a test that was previously not possible to do from an AI model. You know who the real winners are here out of this, Josh? Who's that? High school kids who now have an AI model that can do all their math homework for them. Isn't that incredible? Like, man, think about it. I wish I had that.
Starting point is 00:20:32 You have an AI model that is as smart as the smartest people on planet Earth in high school. If it could solve those math problems, it could solve anything. It sounds human as well, Josh. So, like, your teacher's going to struggle unless they use AI themselves to figure out whether you just did that yourself or completely just ran that through GPT, your mom's GPT subscription. It really forces you to reevaluate the school model, right? Because now that this information is so readily accessible, it's so easy to solve these problems, is that the actual thing worth learning? Or is it how to use these tools that's more important to get to the answer? And there's this dual-pronging approach. And we see developers and programmers talk about this a lot, where as soon as they start to rely too heavily on the tools, they start to lose their touch, they start to lose their ability to deeply understand how it reaches conclusions. But is that worth it in exchange for getting to the answer? much quicker and then being able to seek many more answers. I don't know.
Starting point is 00:21:28 It's a weird dynamic. If I was a teacher, I'd be worried because, I mean, similar to what we saw with the calculator, it can just replace the thinking process and just yield you an answer. And the thing with the calculator is like you, you're using the calculator, sure it figures out the answer for you, but you kind of loosely understand how it is working, right? You know what numbers it's crunching to get to that answer. And then typically you do a few things on a calculator. and then you get to your eventual answer for whatever the original question was.
Starting point is 00:21:59 The issue with or the concern that you're highlighting here with AI is it's doing really complex problems which kids don't even need to understand in the first place just to get an answer, which they can then give to their teacher, get a grade and then go to university. But the kids don't actually learn actively in that process. And it's going to be a concerning trend if we see kids just trying to go from zero to 100% without understanding anything in between, a trend to watch. This is our episode from a few weeks ago,
Starting point is 00:22:33 is AI making you dumber? Yes, exactly. And I think that's just going to continue to be the question. And I think the answer is it's all dependent on how you choose to use the tools that you're given. And if you use these tools as further leverage, so I'm sure these math Olympians who can actually complete the problems, would love to have this model to check the problems and to work through the problems and to figure out shortcuts on solving these problems, where if you deeply understand it,
Starting point is 00:22:56 then this becomes an amazing tool to check your work, to generate new questions for you. It's a great study, buddy. Or if you are not an Olympiad and you still want to get to the answer, well, you just kind of cheat your way through and you just ask it for exactly what you want. So it's that split again, and it's up to the person to take their own agency, solve their own problems, and try to use these for tools of leverage instead of just problem-solving machines. That actually reminds me of this tweet. I saw yesterday, Josh.
Starting point is 00:23:22 So what you're looking at here is a tweet from Dave White. Dave White is a very prestigious investment slash research advisor at this fund called Paradigm, which basically it's a crypto fund, but it is one of the wealthiest funds out there. So a lot of the investments they made were massive wins. And a lot of the reasoning of those wins was from Dave White's analysis. He is a deeply thoughtful mathematician at his core. and he is famed for doing a lot of analyses on companies, mathematical analyses that have ended up, you know,
Starting point is 00:23:56 determining whether a fund puts $100 million in a company or zero, right? So a very important job worth hundreds of millions of dollars, right? And what he says here basically is him having an identity crisis because he has looked up to the IMO, the International Math Olympiad, and he goes on to say in this tweet that subconsciously, he is, whenever he's met a gold medalist IMO champion, he's always subconsciously thought that they were smarter than him, that he is more respecting of them.
Starting point is 00:24:26 And now with this news that AI models basically can do his job for him, can reason better than him at some of these math problems, he now has an identity crisis. He doesn't know kind of where to go from this. And if people like Dave White is having this kind of like disillusioned sentiment from how smart AI is, you can imagine how this is how this is going to happen for everyone else in all of their other sectors, Josh, right? It doesn't matter if you're a
Starting point is 00:24:51 mathematician or an investment research advisor. You could be a technician in some kind of engineering industrial role or you could be a teacher or you could be a kid or a high schooler. I think this disillusionment is going to spread. And I think it's super important for people to kind of like evolve their thinking like you said, Josh, and learn how to leverage these tools versus just consume. Yeah, this is, I mean, this is crazy. There's a lot of people that are going to have to adapt this new world order of intelligence, where if you build up your entire identity around being intelligent, well, perhaps you're going to have to alter the way you present yourself as intelligent because the meaning of intelligence is becoming commoditized among these tools that are
Starting point is 00:25:32 now reduced down to a single chat box. Yep, benchmarks are going to have to reset themselves completely. But folks, that is the end of this episode. Thank you so much for tuning in again. Josh and I are going hammer and tong at Limitless. Our goal is to get you the hottest and trending topics and news fresh out the door, give you our commentary, our thoughts, and hopefully some useful insights for you. If you enjoyed this episode, if you enjoyed any of our previous episodes, please continue to share and spread them with all your friends and family and whoever you think might be interested in this.
Starting point is 00:26:04 We are getting tons of feedback from you guys, and with every episode that we released, we're getting better. So please remember to like, subscribe, follow us. It's hugely appreciative and helpful for us, and we'll see you on the next one.
