Limitless: An AI Podcast - OpenAI and Google Just Beat the World's Smartest Mathematicians

Starting point is 00:00:03 All right, Josh, the AI nerds are fighting again. This past weekend, there was a very prestigious competition called the International Math Olympiad, which hosts some of the brightest, smartest mathematicians of our time. And they're typically high schoolers. And basically, they come together and they take a really hard math test. This is like four to five hours. And those that score the highest get medals. You can get bronze, silver, and the highest scorers get gold medals.

Starting point is 00:00:32 So what's this got to do with AI? Well, recently, over the last couple of years, the organizers of this International Math Olympiad decided to start inviting AI models to participate as contestants. And they did terribly. Like, no one's come even near the human geniuses, except this year, Josh, where they came to play. And not one, but two AI models achieved not silver, but gold medals, which is just an insane thing, right? So it should be all fun and games, right? What a fairy tale story. Well, unfortunately, OpenAI and Google got into an online spat

Starting point is 00:01:10 where they started accusing each other of cheating. Now, remember, these are trillion-dollar companies. So essentially, Josh, I was teleported this weekend back to my high school days where I felt like the teacher had to come in, separate the kids from arguing over some kind of random homework problem, and get them to chill out. We will look back at this episode and laugh at it like it's a joke, because these AIs are, they're competing against high schoolers.

Starting point is 00:01:35 That's so lame. Only high schoolers? Like, come on. And you're just barely getting gold. Well, these are, in their defense, Josh, these are some pretty smart high schoolers, man. Like, I was looking at some of these math problems. I don't know if you can see my screen here. I'm sharing the official site.

Starting point is 00:01:48 And if you look at some of these problems, here we go. And then like, okay, so they have basically, they host this competition in a different country each year. And you can kind of like download the test yourselves after the fact to see how well you could do. I had a look at this one, Josh, from the Afrikaans. I basically don't understand anything. One second.

Starting point is 00:02:09 All right. Take a look at that. Yeah. Take a look at this. That looks like quite a bit of squiggly lines on a page. You know what? That could be mistaken for a piece of art in a gallery if you didn't peer too closely at it. This looks insane.

Starting point is 00:02:25 Okay, so I take it back. So the high schoolers are probably pretty smart then. And I guess the AI performing as well as the high schoolers is probably a pretty big deal, right? Because that looks like very complicated math problems that I'm assuming most of the smartest people in the world cannot solve. Exactly.

Starting point is 00:02:40 Yeah, this is like something that is set, technically set for high schoolers and sometimes college kids, but is meant to demonstrate prowess in the field. So there's a lot of university academics, which obviously do math degrees and do PhDs, but those are in very specific problems. So you kind of, like in science, you just need to kind of pick and choose your lane

Starting point is 00:03:00 and then dedicate your life to it. High schoolers and college kids are kind of like the last point before you jump into your specialization. So really, if you're the best at generalized maths, you're going to compete in this competition. And what's so interesting is, typically AI models haven't been able to perform very well

Starting point is 00:03:18 because they needed a lot of context beforehand about the problem, Josh. So they needed to know that, you know, there was a certain, you know, X equals something and Y equals something, and they had to have defined parameters to kind of figure out the problem. But this was the first time that AI models basically were just given a blank sheet of paper, or not a blank sheet of paper, but they stared at the problem just as we just looked at it just now

Starting point is 00:03:41 and had to read the words, read the characters, interpret what that meant in the context of that situation and the way that the question was framed, and then figure it out themselves. So it's as if the AI models had a camera that looked at a paper, in a similar way to how we look at test papers as kids, through our eyes, and figured it out themselves. So what changed? What happened in the last year that made it so much better? Because it went from what, basically zero of six to now five or six of six questions answered, and now it's a gold medalist. So what happened? So listen, I'm not going to try and explain it, but maybe you and I can decipher it through the legends themselves that built these models, right? Okay. So let me paint the scene for you, Josh. It is Saturday evening. You know,

Starting point is 00:04:27 normal people are usually out and about, they're having fun. They're probably having dinner, catching up with friends or chilling at home, watching a movie. And this guy called Alexander Wei, who is OpenAI's head of reasoning. Reasoning is basically this new fancy technique that AI models have typically demonstrated, which has brought them up to like the frontier level of AI models. Basically, if your model can do reasoning, it's typically a pretty smart model, right? And he posts this tweet saying, I'm excited to share that our latest OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI, a gold medal level performance

Starting point is 00:05:04 on the world's most prestigious math competition, the International Math Olympiad. And he goes on to describe, you know, how the model basically took on each problem in its own regard and solved it, and how this is a massive success and win for AI models and how, most importantly, OpenAI was the first ever model to complete this. And not too long after he posted that tweet, Josh, Sam Altman jumps in here, right? And he goes, again, he kind of echoes similar thoughts. We achieved gold medal level performance on the 2025 IMO competition with general purpose reasoning. And then he kind of like shills GPT-5 at the end. Basically, it's like a promotional thing for OpenAI. And I will say that this is really cool because, you know,

Starting point is 00:05:47 what they've achieved is something that hasn't been done before, right? So very impressive feat. And in terms of how this works specifically, Sheryl Hsu here has a really good breakdown. She says, the model solves these problems without tools like coding or Lean, which is another coding tool. It just uses natural language. So as I said earlier, it kind of reads the paper and just kind of interprets what it thinks it means. And it also has the same amount of time to do the test as the other kids, so 4.5 hours. And she says, we see the model reason at a very high level, trying out different strategies,

Starting point is 00:06:22 making observations from examples, and testing different hypotheses out. And she says it's crazy how we've gone from 12% on the AIME test, which is what GPT-4o, which is OpenAI's earlier model, got, to IMO gold medal in 15 months. So just to set that in context, Josh, that is a crazy leap in 15 months. Imagine going from eighth grade level math to the best mathematician in the world in 15 months. It's a pretty insane thing. Yeah, I'd say so. So essentially the breakthrough that Sheryl is highlighting here is number one, the model didn't need any context. Number two, it used really high-level reasoning to figure out the problems from first principles.

Starting point is 00:07:09 And number three, it was able to test out multiple hypotheses at the same time instead of trying to one-shot the problem. Typically in the past, when AI models have been given a prompt or a problem, they try to just give it their best shot and give you one solution, Josh, whereas what these models, these reasoning models, do really well is they are able to hypothetically entertain many different scenarios and then pick the one they think is the best answer. And it ended up with the gold medal, which is insane, right? But it wasn't entirely without a few glitches here and there, Josh. So if you look at this post from Jasper,

Starting point is 00:07:40 he read through the entire kind of like problem set that OpenAI's model went through. And he points out some weird anomalies. So he kind of like talks about how it kind of like analyzes a bunch of things. And he goes, however, the write-up is kind of messy, he goes. It overuses shorthand and sentence fragments. It introduces new terms without definitions. For example, forbidden and sunny partners.

Starting point is 00:08:05 I have no idea what either of those terms could mean, but it was just apparently interspersing these phrases during its analysis. And so as a reviewer or as an examiner, they were reading this, they were like, sorry, wait, what is it talking about? It got to the right answer, but what is it talking about, right? The other key point from this post is it was unable to solve one problem, problem six. And I'm not even going to try and get into why it failed on that problem, but it was just particularly hard for it to figure out. But it still scored a high enough percentage that it got

Starting point is 00:08:39 a gold medal. So it's basically a win for OpenAI, but that's when the drama starts unfolding. So I've got this post up from Mikhail Samin, which kind of like sparks this entire fight, Josh. He goes, According to a friend, the IMO, which is the International Math Olympiad, asked AI companies not to steal the spotlight from kids and to wait a week after the closing ceremony to announce the results. OpenAI instead announced the results before the closing ceremony. Yikes. And then he goes on to basically say how this is essentially like some kind of clout chasing move from OpenAI.

Starting point is 00:09:16 And, okay, I tried to evaluate this, Josh, from OpenAI's kind of perspective, which is they basically want to steal the limelight, but also say that they were the first AI model to ever achieve gold on this competition, which puts them in a good light and makes users want to choose OpenAI and solidify the branding that OpenAI is the best, right? But on the other side, you know,

Starting point is 00:09:38 they're kind of like stealing the spotlight from the kids, as this post says, but that's not actually the main trope. The main trope here, Josh, is OpenAI wasn't the only model to achieve a gold, right? At the same time, during the same testing period, you had Google achieving the exact same score. So then the question becomes, okay, well, it just, it was whoever was ethical about announcing their own result. This post from Demis Hassabis, who is Google's head of AI, basically posts, and I'll note two days later, official results are in. Gemini, which is their flagship model, achieved gold medal level in the International Math Olympiad.

Starting point is 00:10:20 An advanced version was able to solve five out of six problems. So same as OpenAI. Same thing. Struggled on the sixth problem. Incredible progress. Huge congrats to the team. And a tweet here says that Google basically had to wait for marketing to approve the tweet until Monday. But OpenAI shared theirs first at 1 a.m. on Saturday.
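The "five out of six is gold" arithmetic can be sketched in a few lines. This is only an illustration, not anything either lab published: each IMO problem is graded out of 7 points, and the gold cutoff of 35 used here is an assumption based on the reported 2025 threshold. The function names are hypothetical.

```python
# Illustrative sketch of the scoring arithmetic behind "five of six problems = gold".
POINTS_PER_PROBLEM = 7   # each IMO problem is graded out of 7 points
NUM_PROBLEMS = 6         # six problems per contest
GOLD_CUTOFF = 35         # assumption: reported gold threshold for 2025


def total_score(per_problem):
    """Sum per-problem scores after checking they form a valid score sheet."""
    if len(per_problem) != NUM_PROBLEMS:
        raise ValueError("expected one score per problem")
    if any(not 0 <= s <= POINTS_PER_PROBLEM for s in per_problem):
        raise ValueError("each score must be between 0 and 7")
    return sum(per_problem)


def is_gold(per_problem, cutoff=GOLD_CUTOFF):
    """Return True when the total meets the (assumed) gold cutoff."""
    return total_score(per_problem) >= cutoff


# Five problems fully solved, the sixth unsolved.
five_of_six = [7, 7, 7, 7, 7, 0]
print(total_score(five_of_six), "out of", POINTS_PER_PROBLEM * NUM_PROBLEMS)
print("gold:", is_gold(five_of_six))
```

Because the cutoff moves from year to year, it is passed as a parameter rather than hard-coded into the check.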

Starting point is 00:10:39 And we see the screenshot from Demis Hassabis, which, you know, he further clarifies this basically saying, by the way, as an aside, we didn't announce on Friday because we respect the IMO board's original request that all AI labs share the results only after the official results have been verified. Now that we've been given permission to share, blah, blah, blah, he shares. So Demis is playing the like, good Samaritan. He's like, yeah, we also have the good model, but we, you know, we have some pride and some manners about how we deal with these things. That's where it starts to get a little uglier, Josh, because we have OpenAI chiming in to this tweet, which basically says, and this is some randomer commenting on OpenAI and this entire

Starting point is 00:11:23 situation, so OpenAI basically has zero advantages except the size of the team, aka the OpenAI team was claimed to be smaller than Google Gemini's team. So what he's inferring here is there's no real difference between OpenAI's models and Google Gemini's models. You can pretty much use either or. OpenAI maybe has a smaller team to build that model, but who the hell cares? And then one of the AI model researchers at OpenAI basically comes in and says, well, I think it's also interesting that they, they being Google, curated and provided useful context to the model, which we did not. Feels like taking your tutor's cheat sheet with you into the exam. So shots basically being fired from OpenAI saying, hey, you cheated. You gave context to your model.

Starting point is 00:12:12 And that was why it was able to achieve gold. We, OpenAI, didn't provide any of that context and it was able to reason from first principles. There you have it. But then directly beneath it, Vinay Ramasesh, who is a Google DeepMind AI researcher, responds: it's worth noting actually that a Deep Think system, which is Google's AI system,

Starting point is 00:12:32 with no access to this corpus, so no context, also got gold. Again, according to the official graders, and he puts this in brackets, because OpenAI didn't wait for the official graders to mark their score, with exactly the same score. So basically this is like a pissing contest between two of the top AI model providers.

Starting point is 00:12:52 Here's my take, Josh, and then I really want to kind of lean into what you think about this whole debacle. Number one, this seems so childish to me. Like, AI models were eventually going to get smarter or smart enough to solve these mathematical problems. And I think, I think you said this earlier on. This is something that they're going to probably laugh about 10 years from now, right? That they were able to solve, whatever, the most complex mathematics problems for humans, right? Mere humans.

Starting point is 00:13:20 And now AI is off creating wonderful scientific discoveries for us that we would have never comprehended or figured out ourselves, right? So firstly, like, you're arguing over something that's so silly. But number two, this kind of seems desperate

Starting point is 00:13:34 on the OpenAI side. And maybe I'm being biased, but like I'm just going to give you my take. OpenAI has kind of had like a series of stumbles recently. They claimed that they were going to release GPT-5, which is their brand new frontier model, but they've delayed it many months now. They got outperformed by Grok 4 from xAI,

Starting point is 00:13:54 so now they have a new benchmark that they need to beat, a new model that they basically need to outcompete. They claimed that they were going to release a new open source model and then delayed it after a Chinese open source model was released and had one trillion parameters and outperformed not just their model,

Starting point is 00:14:11 but any other open source model out there. And so I feel like they're looking for a win, right? They released their agent this week or last week. And so, you know, that had mixed reviews, mixed feedback. So I feel like Sam is desperate for a win. People are criticizing consistently their moat, asking, what has OpenAI got? They've lost a ton of researchers to Meta and other companies.

Starting point is 00:14:35 I feel like their back's against the wall. Sam's scared. And he basically needs to grab any kind of win. So it reeks of desperation. What's your take, Josh? I do empathize with the team. They've been coming under fire from every single angle. I mean, you have Zuck poaching all of their talent,

Starting point is 00:14:52 and then all of the other open source AI models are beating them at their own game. And they're just kind of, they're really getting beat up now. And I think that they're looking to get some footing. I'm sure this probably plays a role in it. But I'm sure behind the scenes, they're really trying to fight hard to put their feet back on stable ground, to get GPT-5 out the door, to build Project Stargate and make this big infrastructure network.

Starting point is 00:15:14 They need some win. So sure, this was probably an attempt to get ahead, make them look good, win over some more hearts and minds. But I think the most interesting part of the whole story is less the drama and more the fact that these models were able to accomplish a really impressive feat over such a short period of time. From what I understand, previously when they attempted to solve these problems, they used a custom training data set. They used custom tool sets. It was mostly a model trained on solving mathematical problems. And with this version, both the OpenAI version and the Gemini models, they're both general purpose models. They were not trained specifically with the intention of solving mathematical problems.

Starting point is 00:15:55 These are the general models that people day to day are using. They're just now able to solve these math problems using this new general intelligence. So it's a really interesting breakthrough that I think we get from reinforcement learning, that now there is not so much of an advantage to training a model specific to one skill set, when you could just make it great at everything. There was one thing that I noticed that some people call cheating, other people don't. But the, so with the mathematical, with the actual test the high schoolers had to take, they're not allowed to use tools and they have a limited amount of time per question to answer. The models, the OpenAI model and the Gemini model, they had an infinite amount of time to answer

Starting point is 00:16:33 and they were allowed to use tools. So there still are small differences in these. Were they allowed to like use the internet? I don't know the specifics. I would imagine at least calculators. at most probably the full repertoire of what we have currently available to us, which is full internet search, code writing abilities, they could do their own mathematical checks. So I would just assume the minimal amount of constraints possible. So there was much less constraints on the models,

Starting point is 00:16:56 but they did solve the questions. And I think that's super impressive. They got five out of six right, which was gold and better than almost every student, if I'm not mistaken. Only a few students got the six out of six completely correct. It's just cool to see the rate of progress of these models getting better, that over the course of the last 15 months or so, they went from horrible and narrowly trained to incredible and generally trained. And as long as that trend keeps going, I think the drama matters less than the output, which is models are getting really good at solving really hard math problems and original ones too, that the world has never seen before. Yeah, well, that last point is actually the main takeaway that I had, Josh, which is it's original

Starting point is 00:17:37 never-before-seen problems. Typically, these AI models are trained on things that they've seen before. As you said, right? They trained on data sets. So they've already seen the problem. And then they have to work out, they know the answer, and they have to work out how to get that, right?

Starting point is 00:17:50 So they kind of have a leading factor. Here, it's just kind of like completely unknown. The other thing is this is kind of like the culmination of a trend, Josh, which is these AI models are really good at doing kind of binary tasks. And I don't want to reduce mathematics to binary tasks, but technically, it's numbers, sequential, formulas, that kind of stuff, right? So if you can run enough compute at a thing and if you can get that AI model to consider all different decision paths, it's going to eventually get to the answer, right?

Starting point is 00:18:28 But it's always a specific answer at the end of that, right? Whereas when it comes to more subjective things, more human experiential things, AI has typically struggled to improve at the same rate that it has for like all these different scientific and math problems. So I'm glad that we've reached this pinnacle feat. I think AI models are really good at one thing and not so great at other things. And I'm excited to see how like they kind of like try to start leapfrogging each other over the next couple of years. Yeah, it's that directional progress that we like. Math is clearly the first because you can write down proofs and you could check your work.

Starting point is 00:19:06 And there is an actual verifiable solution. And I think that's why we're seeing a lot of the progress start early in math and then hopefully go on to these other places. But what we are seeing is these first signs of new knowledge breakthroughs where it's solving a new and novel problem that hasn't been released before based on its previous data set. So it's not just pattern matching like you mentioned earlier where it has this data set of questions. It's kind of finding the right examples and then applying that logic to the question. It's actually reasoning, and it's reasoning in many instances, and then it's comparing its work, and it's coming to a conclusion. And we saw this with the Grok Heavy model last week, too, when it released, where I think the new meta is many instances solving hard problems and then comparing.
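The "many instances solving, then comparing" idea can be sketched as simple majority voting over independent attempts. This is a toy illustration, not how OpenAI, Google, or xAI actually run their systems: `noisy_solver` is a hypothetical stand-in for one model instance, the 60% per-attempt accuracy is made up, and it assumes final answers are short values that can be compared for equality, which full proofs are not.

```python
import random
from collections import Counter


def noisy_solver(rng, correct="42", p_correct=0.6):
    """Hypothetical stand-in for one model instance: right with probability p_correct."""
    if rng.random() < p_correct:
        return correct
    # A wrong attempt lands on some arbitrary answer.
    return str(rng.randint(0, 99))


def majority_vote(answers):
    """Return the most common answer among the attempts."""
    return Counter(answers).most_common(1)[0][0]


def solve_with_instances(n, seed=0):
    """Run n independent attempts and keep the answer they agree on most."""
    rng = random.Random(seed)
    answers = [noisy_solver(rng) for _ in range(n)]
    return majority_vote(answers)


if __name__ == "__main__":
    for n in (1, 5, 25):
        print(n, "instances ->", solve_with_instances(n))
```

Wrong attempts tend to scatter across many different answers while correct ones pile up on the same value, so agreement across instances is a useful signal even when each individual attempt is unreliable.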

Starting point is 00:19:53 So you lower that error rate more and more and more each time. And what we're seeing is great progress. So, I mean, although OpenAI and Google are fighting again, they're both fighting over exciting progress. And sure, maybe one tried to sweep in and steal the valor, but they both did an excellent job in actually completing these problems and placing gold in a test that was previously not possible to do from an AI model. You know who the real winners are here out of this, Josh? Who's that? High school kids who now have an AI model that can do all their math homework for them. Isn't that incredible? Like, man, think about it. I wish I had that. You have an AI model that is as smart as the smartest people on planet Earth in high school,

Starting point is 00:20:36 if it could solve those math problems, it could solve anything. It sounds human as well, Josh. So, like, your teacher's going to struggle unless they use AI themselves to figure out whether you just did that yourself or completely just ran that through GPT, your mom's GPT subscription.

Starting point is 00:20:52 It really forces you to reevaluate the school model, right? Because now that this information is so readily accessible, it's so easy to solve these problems, is that the actual thing worth learning? or is it how to use these tools that's more important to get to the answer? And there's this dual-pronged approach. And we see developers and programmers talk about this a lot, where as soon as they start to rely too heavily on the tools,

Starting point is 00:21:14 they start to lose their touch, they start to lose their ability to deeply understand how it reaches conclusions. But is that worth it in exchange for getting to the answer much quicker and then being able to seek many more answers? I don't know. It's a weird dynamic. If I was a teacher, I'd be worried because, I mean, similar to what we saw with the calculator, it can just replace the thinking process and just

Starting point is 00:21:35 yield you an answer. And the thing with the calculator is like you, you're using the calculator, sure it figures out the answer for you, but you kind of loosely understand how it is working, right? You know what numbers it's crunching to get to that answer. And then typically you do a few things on a calculator and then you get to your eventual answer for whatever the original question was. The issue with, or the concern that you're highlighting here with AI is it's doing really complex problems which kids don't even need to understand in the first place just to get an answer, which they can then give to their teacher, get a grade, and then go to university. But the kids don't actually learn actively in that process.

Starting point is 00:22:19 And it's going to be a concerning trend if we see kids just trying to go from zero to 100 percent without understanding anything in between, a trend to watch. This is our episode from a few weeks ago. Is AI making you dumber? Yes. And I think that's just going to continue to be the question. And I think the answer is it's all dependent on how you choose to use the tools that you're given. And if you use these tools as further leverage, so I'm sure these math Olympiads who can actually complete the problems,

Starting point is 00:22:48 would love to have this model to check the problems and to work through the problems and to figure out shortcuts on solving these problems, where if you deeply understand it, then this becomes an amazing tool to check your work, to generate new questions for you. It's a great study buddy. Or if you are not an Olympiad and you still want to get to the answer, well, you just kind of cheat your way through and you just ask it for exactly what you want. So it's that, it's that split again. And it's up to the person to take their own agency, solve their own problems, and try to use these as tools of leverage instead of just problem solving machines. That actually reminds me of this tweet I saw yesterday, Josh. So what you're looking at here is a tweet from Dave White. Dave White is a very prestigious investment slash research advisor at this fund called Paradigm, which basically, it's a crypto fund, but it is one of the wealthiest funds out there.

Starting point is 00:23:37 So a lot of the investments they made were massive wins. And a lot of the reasoning of those wins was from Dave White's analysis. He is a deeply thoughtful mathematician at his core. And he is famed for doing a lot of analyses on companies, mathematical analyses that have ended up, you know, determining whether a fund puts $100 million in a company or zero, right? So a very important job worth hundreds of millions of dollars, right? And what he says here basically is him having an identity crisis because he has looked up to the IMO, the International Math Olympiad, and he goes on to say in this tweet that subconsciously,

Starting point is 00:24:17 he is, whenever he's met a gold medalist IMO champion, he's always subconsciously thought that they were smarter than him, that he is more respecting of them. And now with this news that AI models basically can do his job for him, can reason better than him at some of these math problems, he now has an identity crisis. He doesn't know kind of where to go from this. And if people like Dave White are having this kind of like disillusioned sentiment from how smart AI is,

Starting point is 00:24:45 you can imagine how this is going to happen for everyone else in all of their other sectors, Josh, right? It doesn't matter if you're a mathematician or an investment research advisor, you could be a technician in some kind of engineering industrial role, or you could be a teacher, or you could be a kid or a high schooler. I think this disillusionment is going to spread, and I think it's super important for people to kind of evolve their thinking, like you said, Josh, and learn how to leverage these tools versus just consume. Yeah, this is, I mean, this is crazy. There's a lot of people that are going to have to adapt to this new world

Starting point is 00:25:17 order of intelligence, where if you build up your entire identity around being intelligent, well, perhaps you're going to have to alter the way you present yourself as intelligent because the meaning of intelligence is becoming commoditized among these tools that are now reduced into a single chat box. Yep, benchmarks are going to have to reset themselves completely. But folks, that is the end of this episode. Thank you so much for tuning in again. Josh and I are going hammer and tongs at Limitless. Our goal is to get you the hottest and trending topics and news fresh out the door, give you our commentary, our thoughts, and hopefully some useful insights for you. If you enjoyed this episode, if you enjoyed any of our previous

Starting point is 00:25:57 episodes, please continue to share and spread them with all your friends and family and whoever you think might be interested in this. We are getting tons of feedback from you guys. And with every episode that we release, we're getting better. So please remember to like, subscribe, follow us. It's hugely appreciated and helpful for us. And we'll see you on the next one.
