OpenAI Podcast - How a reasoning model cracked an 80-year-old math problem - Episode 20

Starting point is 00:00:00 Hello, I'm Andrew Mayne, and welcome to the OpenAI Podcast. On today's episode, we're speaking with Alexander Wei, Hongxun Wu, and Lijie Chen from the reasoning research team behind a recent math breakthrough from an OpenAI model. They'll tell us the story behind the discovery and what stood out to them about the reaction. Everyone had a hard time sleeping because it's so, so exciting. Okay, this model is something that's really amazing. I mean, this is something that can be published in the best journal of math. Maybe this is the one-in-a-hundred time where it's too good to be true, but it's actually true. Lijie, tell me what you work on.

Starting point is 00:00:39 Oh, I work on reasoning with Alex. Okay. How did you find your way into reasoning? Last summer, Alex had this breakthrough in, like, IOI and IMO. You know, I used to be a participant in IOI. Okay. And then I was like, oh, that's crazy. You know, models can already win medals.

Starting point is 00:00:56 IMO gold medals. At that time, I was an assistant professor at UC Berkeley. But then I was thinking, like, maybe I should try to rethink my career. It seems like making the model smarter will maybe have some bigger impact on the world. And then I just kind of had a conversation with Alex back last October. And then I got super excited about this thing. And eventually, I just joined OpenAI. We hear IOI and IMO come up a lot.

Starting point is 00:01:26 Alex, you want to unpack those for everybody? So IMO and IOI are these two competitions for high schoolers. They stand for the International Math Olympiad and International Olympiad of Informatics, respectively. And these are just devilishly hard math problems. You get two sessions for each of these exams that are like four and a half to five hours, and you just have to do three problems. And so for a long time, these were sort of an implicit, like, grand challenge in AI: when would we be able to get models that could perform as well as the best humans on these exams?

Starting point is 00:02:05 That was a pretty interesting starting point, I think, for measuring the success of the model, and we're here to talk about how far things have gone since then, which is pretty incredible. But how did you find your way into reasoning? So I did my PhD in ML. And towards the end of my PhD, I got excited about this idea of spending more compute

Starting point is 00:02:25 at inference time to solve, you know, harder and harder reasoning problems. At the time, I was playing with GPT-3.5 Turbo in the API. And I didn't really get any interesting results. But there was this team at OpenAI that seemed to be doing something pretty similar. And so I got super excited about it. And I was lucky enough to be able to join.

Starting point is 00:02:48 So probably the simplest way to describe inference-time compute is basically letting the model think longer about it. Yes, that's right. So basically before this era of test-time compute, models sort of answered immediately, like right off the cuff, without thinking. And what inference-time compute, test-time compute does is you now give the model a chance to think and improve its answer and, like, try different things before having to finally output something. That obviously just helps make the models smarter, lets them do things that they wouldn't

Starting point is 00:03:24 otherwise be able to do instantly. When you started to work on reasoning, did you have an idea of where you wanted to see this go, like what your expectations were? Were you looking at it purely from, hey, this is very cool from an academic point of view, or did you have some sort of other vision? I think for me, the draw of reasoning when I first got excited about it was that this was something that, you know, models just obviously can't do right now. So this was like end of 2023, start of 24, models were, you know, struggling with grade school math problems. And so at that time, it was just like, can we just get these models to do something reasonable on math at all,

Starting point is 00:04:03 let alone have them be, like, you know, much, much better than I am at it. I remember on my first day at OpenAI, Noam Brown asked me, you know, when I thought models would get IMO gold. That was just a benchmark we talked about. I think at the time, a lot of people, even within research, thought that, like, you know, IMO gold was out of reach this year, but maybe like 2026. I felt like, you know, I had an idea that, you know, if we just like pushed for it, maybe, I thought we could do it by April.

Starting point is 00:04:42 It took until June to get a really good model. And then IMO rolled around and, you know, we were able to get gold. And I think zooming out, I think this happened a lot faster than I expected. And it's, I mean, it's crazy to me that, you know, progress since then has kept up at this same sort of blistering pace. Like, it was just 10 months ago, but it feels like the IMO level of problem, like, feels like far in, like, the rearview mirror of AI today. Noam asked me the same question. I mean, not about IMO gold, but about whether a model can solve P versus NP. I think P versus NP might be something quite hard

Starting point is 00:05:28 because, I think the reason is that for solving P versus NP you would need to build a new theory. Maybe you have to write many books of new ideas to get there. So currently it seems we are still far from that. But, you know, who knows what will happen in the future. So yeah. Hongxun, what do you work on? I was working on theoretical computer science.

Starting point is 00:05:51 I was collaborating a lot with Lijie in my PhD. I was at Berkeley. And I remember when, like, o1 came out, I was talking to my advisor saying, oh, there's no barrier in, like, models solving math problems anymore. I think he just smiled.

Starting point is 00:06:10 And he knew that he was going to lose a student. Oh, wow. So let's talk a bit about that because it's an interesting point, because, as you said, it went from the model just having a moment to try to figure out the answer, then all of a sudden you've given it the ability to spend longer and to think about it, you know, reasoning. And the results have come pretty quickly and, I think, surprised a lot of people.

Starting point is 00:06:35 You had a model that was able to basically disprove one of Erdos's conjectures. Could you explain that just a little bit? Yeah. So our models last week, they were able to produce a proof, or a disproof rather, of the unit distance conjecture due to Erdos. And this was an 80-year-old open problem in the field of combinatorial geometry, where basically the question concerns: if you have n points, let's say on a piece of paper, how many pairs of them can be exactly one inch apart, and how does this number grow asymptotically with the number of points on the piece of paper? This wasn't a trivial problem.

Starting point is 00:07:25 When Erdos put this together, the idea was to say that it could, you know, I think ideally it had to be only done on a plane or something like this, but there was, you know, the idea that maybe there was no better way. And this has been out there because it's a very interesting problem. And the fact that a model solved this is pretty profound. And also this model was a general purpose model, correct? Yes, that's right. So Erdos's original conjecture was essentially that the optimal,

Starting point is 00:07:52 the optimal solution to having as many distance-one pairs of points on the plane was to arrange them in a square grid. And what the model proved was that the square grid was not actually close to optimal at all and that you can do much better with a different construction using a lot of high-powered number theory. How did you choose these problems? I guess we didn't really choose the problem. What happened was we wanted to test the upper bound of our model's capability.
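As a rough illustration of the counting question described above (this is not part of the conversation and is not the model's construction), the sketch below counts pairs of points at a fixed distance by brute force. The function names `count_pairs_at` and `square_grid` and the `sq_dist` parameter are hypothetical, chosen only for this example; integer coordinates are assumed so that distances can be compared exactly.

```python
from itertools import combinations


def count_pairs_at(points, sq_dist=1):
    """Count unordered pairs of points whose squared distance equals sq_dist.

    Points are assumed to have integer coordinates, so the comparison is exact.
    """
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == sq_dist
    )


def square_grid(k):
    """Return the k x k square grid of integer points (illustrative helper)."""
    return [(x, y) for x in range(k) for y in range(k)]


if __name__ == "__main__":
    for k in (2, 3, 10):
        pts = square_grid(k)
        # A k x k grid has 2 * k * (k - 1) pairs at distance exactly 1.
        print(len(pts), count_pairs_at(pts))
```

The brute-force check is quadratic in the number of points, so it is only meant for small arrangements. The question in the conversation is about how the best possible count grows as the number of points goes to infinity, which no finite enumeration like this settles.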

Starting point is 00:08:26 So we just used a selected subset of Erdos problems to test the capability of the model. I would love to know one thing: who is the one that hit enter and asked the model the question? I guess both of us, like me and Hongxun. You guys at the same time, like, pressed? Yeah, maybe. I think what happened was actually we were testing, like, two slightly different internal models, and we both saw some correct solutions. It was really, really exciting for us.

Starting point is 00:08:58 How did you know that it worked? Of course, you first asked the model to check it. Okay. But of course, you know, model sometimes they are not reliable. I got it. It's good. Don't worry about it. Yeah.

Starting point is 00:09:07 So then, after we checked it with the model, it seemed plausible. Then we just asked a bunch of, you know, all our mathematician friends in the company, you know, Mehtaab and Mark Sellke. And at first, they were like, oh, there's no way this can be true. It's a major open problem. But after, you know, they thought about it for a day, they couldn't find any mistake. Then they became more convinced. Then eventually they were like, actually this may be correct. Yeah, then like everyone had a hard time sleeping because it's so

Starting point is 00:09:37 so exciting. What was the conversation like when you started getting, you know, people saying that this was accurate? For me, I was not that surprised because, I guess, okay, what happened was, first, Mehtaab said this is definitely wrong. But I actually knew that he probably just spent like five minutes, 10 minutes looking at it. So like in my heart, I didn't really believe that. But later he told me it's 50%.

Starting point is 00:10:03 I was thinking, okay, if we extrapolate the trend, maybe next night, it would be 100%. So yeah, it's a little bit dreamlike, but it also feels a little bit natural that this model would do something amazing. Later, it just became more and more real that this might actually be correct. This might actually be a big deal.

Starting point is 00:10:32 The first time the model can publish something that would get into top math journals. We knew this day was going to come, but never knew that it was going to become reality so fast. It's like living your dream. I mean, this is something that can be published in the best journal of math. It is way beyond, like, you know, IMO level.

Starting point is 00:10:55 So I only expected something like this to happen at some time, at some point. But maybe not just this May. One of the things I think that we've seen emphasized at OpenAI is that OpenAI doesn't try to train to specific benchmarks and stuff, that OpenAI tries to build really good general overall models. And I think sometimes people say, like, well, we just try to build a generally smart model. And we find these things along the way. And when it comes to reasoning, it's the same thing.

Starting point is 00:11:22 With something that's really good at reasoning overall, you find these capabilities. Does that ring true for you? Yeah, I think, yeah. I think for this model in particular, I think it's one that all of us have also just used, like, in lieu of the current model in Codex. And it works quite well as just a general purpose model. Having the capabilities to do this Erdos unit distance result, I think people will be able to do this at home in the near future. It's been exciting to see people react to this and pay attention to this.

Starting point is 00:12:01 We went from just a very short period of time ago where people said models weren't good at math, and now models are doing this. What have been some of the more fun things you've seen online or reactions from people? Ever since we announced the result, my friends in TCS started to ask me to try their open problems, including my advisor, who gave me two, three open problems to try.

Starting point is 00:12:26 I think the reaction was very positive. I think people really get a sense that the frontier of AI today can really come up with research output that I think many human mathematicians would be proud to achieve. And I think it's really great that we're able to communicate this, like, you know, that this is the frontier of progress, to the rest of the world. I've seen people like, you know, make these, like, designs trying to sketch out, like, the model's construction. And if you plot it on a grid, it's actually like this very, like, pretty, symmetric, geometric design. Yeah, I guess we are thinking maybe we'll try to make one of the designs, you know, put it in a frame and put it on a desk or something to celebrate this moment.

Starting point is 00:13:19 Yeah, I think it's going to be fun when we start seeing it, things like tiling problems and other stuff where we can actually just look at the artifacts we need. So we've been hearing more about Erdos problems lately, and some seem like they weren't as challenging to solve as perhaps people thought. They just needed some attention, yet this one seems to be a little bit more complicated. Where would you rank this? Oh, yes. I think he proposed like a thousand questions

Starting point is 00:13:42 or more, right? So like, you know, these problems are just a collection of all the problems he has asked. For some problems he offered some money for a solution. Some problems he just, you know, noted. And for this problem he, you know, he offered, I think, $500, which is for the last century. So, you know, it was a little bit. And also, like, this is one of the central questions in this field of discrete geometry. And, you know, this has been, you know, heavily discussed by mathematicians in, like, many discrete geometry papers. And so it's kind of one of the questions people have thought about a lot and really want to see the answer to.

Starting point is 00:14:23 So I would say this is more like a major open problem in a concrete field of mathematics instead of, just like, you know, many other Erdos questions, which may be just, you know, something Erdos asked after lunch or something. So how do you collect that $500? Did it disappear when he passed away? I think there's a special agency for that. You usually just frame the check. Yeah.

Starting point is 00:14:51 So maybe we'll just frame the check in Sam's office. I don't know. How do you feel this proves that reasoning is effective? I think the biggest proof is that if you look at the plot in the official blog, if you give the model more time to think, the accuracy on this problem goes up. Like, if you give it a lot of time, it can get it correct almost 50% of the time. So more thinking, more correctness. I think that's really a proof of reasoning being effective.
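As a back-of-the-envelope aside on the roughly 50% figure (this is not from the conversation or the blog plot, which concerns thinking time rather than repeated attempts): if one assumes, purely for illustration, that separate attempts succeed independently with the same probability p, then the chance that at least one of k attempts succeeds is 1 - (1 - p)^k. The helper name `at_least_one_success` is hypothetical.

```python
def at_least_one_success(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds.

    Assumes every attempt succeeds with the same probability p and that
    attempts are independent, which is an idealization.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    for k in (1, 2, 5):
        print(k, at_least_one_success(0.5, k))
```

Under that independence assumption, a per-attempt success rate near one half would mean a handful of attempts is very likely to contain a success. Real attempts by the same model on the same problem are unlikely to be fully independent, so this is only a rough intuition.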

Starting point is 00:15:20 But Alex, we'll go back to this. This isn't a math model. This is a model that can do a lot of many different things. Do you see a correlation between as these get better at solving things like mathematics that it works with other general problems? That's the hypothesis at least is that, you know, this model was not trained specifically for math. And so, and we just wanted to, you know, like we have this new model, how we came about this. We wanted to take it on a test drive, essentially.

Starting point is 00:15:48 And so we evaluated it on some, you know, very challenging math problems just to see, like, what can it do? When you go through the proof and you look at what it came up with, were there things that surprised you, things that you would describe as creative? So for some context, like, the proof is well above my own mathematical pay grade. But just at a high level, my understanding was that, you know, this idea of taking class field theory and applying it to problems in combinatorial geometry hadn't really been done before, though some people, like, knew that there could be this bridge between these two fields. Being able to do that and execute it, first of all, to make the connection requires quite a bit of, like, insight and creativity, and then to execute the proof

Starting point is 00:16:41 is also like, you know, a very, like, delicate, careful affair that very few people would be able to do. I think the most surprising thing for me is you tell the model to do something and you go to have lunch, and when you come back, you see that it actually does much better than you thought,

Starting point is 00:17:00 and at that moment you feel like, okay, this model is something that's really amazing. So going back to GPT-3.5 Turbo and working with that and looking at a model that was doing automatically instant sort of inference and figuring these things out, to now a model that's able to do incredible mathematical proofs: is it using tools? Is it using Lean? Is it using some other things like that? Or is this done purely inside the model? For this particular case, the model basically is like Codex. It can code. It can look at websites and find information.

Starting point is 00:17:37 Yeah, so it's basically a general ChatGPT setup. ChatGPT can also write Python and execute it. But I don't think the model wrote any Lean. I think Lijie has a story about the Cambridge Dictionary. Oh, okay. So, okay, the first thing the model did when it got to a website was to check what "unit" means in the Cambridge Dictionary. It's a little bit ridiculous, yeah.

Starting point is 00:18:01 So it looked up the word "unit"? Yeah, it also makes sure it has the absolute correct understanding of what it is doing. Have you seen it do other things like that where you're saying, like, oh, it's trying to ground itself to make sure it understands the question? Definitely. A lot of the time in the model's answer, it will actually explain the definitions again to show that it actually grounded the definitions. As people who are very knowledgeable about computer science, people who know a lot about mathematics,

Starting point is 00:18:31 is it intimidating to all of us to see this happen? I think it should not be intimidating. I think it should be empowering. After the proof actually came out, mathematicians have, first, improved the bound that was proved. And second, they used the intuition,

Starting point is 00:18:47 the motivation of the construction, to knock down other open problems as well. So I think the trend is going to continue. Like, a model can make a good breakthrough on some very hard questions we don't know how to solve. But then how to digest that idea, how to use that method for other good things, I think humans still

Starting point is 00:19:11 have a role in this. So what do you think the role of somebody working in mathematics is going to be like five years from now? I think there will be a lot of AI and human collaboration. Because now, you know, AIs, they know a lot, right? They can connect distant ideas. But humans can also think for longer. Currently, it seems AI cannot build a new theory for math, for example. But I guess humans, once they have the help of AI, they can just grab all the ideas from distant fields of math. I think it can empower humans way more. Do you see this working in other fields? Are we going to see discoveries in physics? So I mean, I can't speak for physics, but I mean, I guess like we're all researchers in AI. And I think definitely for me,

Starting point is 00:19:57 like my day to day looks completely different than when I first started doing research in this field. I think so much of my work is now done by coding agents. I can just do so much more. And I think that's been a sort of, like, magical feeling, that you can really start now to feel like you can use AI to build AI faster. How much has AI changed the way you do these sorts of things? I think it's changed completely. Even when I just joined half a year ago, I was hand-coding the code, looking up the Slack channels for directions. But now the default is just ask Codex. And I ask Codex to do a lot of things.

Starting point is 00:20:47 And then I just go to lunch. I just go to talk to people. The work completely changes. And you can use Codex on your phone and check on it. Yeah. It's interesting too how much more I want to do things now that you have this sort of tool that can work all the time and do stuff. Lijie, how do you explain this to your friends who are sort of trying to understand what this

Starting point is 00:21:11 means and how it's going to impact other fields? So, I mean, I have some mathematician friends and I have some, you know, friends in other fields. Yeah. So I think the way I want to tell them is that I feel like, you know, some maybe are afraid that, you know, AI will replace them, you know, AI will just replace mathematicians. Yeah, but I think it's really about, you know, empowering, like, every theoretical researcher. Yeah, because, you know, AI really has this advantage of knowing so much stuff and connecting things.

Starting point is 00:21:38 Currently, it seems like a problem that is hard for humans may not be hard for AI. And that's a really great thing. Like we can use AI to solve those problems, get new ideas, and then we can digest them and make new discoveries, just like Hongxun said. So I think some of them get very excited about this. And of course, one thing is that, you know, currently it's only math. But I believe that because it's a general reasoning model, like all the, at least theoretical researchers, they can benefit a lot from that. Like, I think the dream world will be like everyone has some access to top-level reasoning ability. So other researchers can use it to discover whatever they want to discover.

Starting point is 00:22:19 And then basically OpenAI will accelerate science, like, a lot, because you are empowering every scientist to accelerate science worldwide. And that's our mission. So if I was a researcher, how would I get started? What advice would you have to say, okay, try this first? We'll start with you, Hongxun. Get a GPT Pro subscription. It's really, really much better than thinking without Pro. Because it thinks longer.

Starting point is 00:22:45 Yeah. And try to ask the boldest question you can ask. I had an experience where sometimes I tried to decompose a problem into smaller problems and ask the model. And it turned out that it was not as good as just directly asking the question, because my decomposition was not the best way. Why do you think that was? I think because as humans we have all kinds of priors on how problems should be solved,

Starting point is 00:23:12 and they are very helpful in reducing our thinking time. But very often the priors are wrong, and there are blind spots. And AI models, they sometimes just can surprise us by discovering these hidden things. When I spoke to Alex Lupsasca, he talked about kind of treating it like a graduate student, you know, not talking down too low, but not too high, but at the right level so you could just understand that it knew the terms that worked for you. Alex, how about for you? What advice would you give somebody who's a researcher who wants to try to figure out how to be more effective with this? Yeah, I think a lot of it is

Starting point is 00:23:48 actually like, I think these days, learning to trust the model and, like, figuring out, you know, how far you can go in trusting the model and also learning, like, you know, what's beyond what the model can do. Because if you don't have a sense of that, you don't, like, you know, maximally use the full capabilities of the model. I think Lijie has taught me a lot about how to use these tools better. I feel like I'm sort of a dinosaur in some respects in terms of adoption. Because I think I started, like, working at OpenAI well before these tools existed.

Starting point is 00:24:22 And so I think I have a lot of old bad habits where I don't trust the models enough. I still think it's like the models of six months ago or something. That's an interesting paradigm. Okay. So, Lijie, what advice would you give? Oh, I have this method where, you know, every time, you double your trust in the model and see when it fails. And if it fails, you just go back. And, like, you do this every month.

Starting point is 00:24:48 Then you can quickly get to the point where you can maximally trust the model, but also not break your stuff. Yeah, and apparently for the last five months, it's going really, you know, exponentially. Back in the GPT-3 days, I had like a list of tests and things like this I would do. And I'd watch them sort of incrementally get better with GPT-4. And then by the time o1 came out, I'd have to throw it out because those were just toy problems at that point. And I feel like I have to continuously sort of adjust and kind of keep trying bigger and more complicated things to do that with. Do you think that for somebody who's in mathematics or in a related field right now who's feeling a little bit concerned by this, do you think that they should be taking a more optimistic approach?
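The "double your trust, and go back if it fails" habit described above can be sketched as a small probing loop. This is only one possible reading of the remark, not anything the speakers wrote: the function name `calibrate_trust`, the idea of measuring trust as a numeric task size, and the `task_succeeds` callback are all hypothetical.

```python
def calibrate_trust(task_succeeds, start=1, rounds=6):
    """Probe how much work can be delegated by doubling the task size.

    task_succeeds(size) is a caller-supplied check (hypothetical) that
    reports whether a delegated task of that size came back correct.
    On success the trusted size doubles; on failure it stays where it was.
    """
    trusted = start
    history = []
    for _ in range(rounds):
        attempt = trusted * 2
        if task_succeeds(attempt):
            trusted = attempt          # the bigger task worked: trust more
        history.append((attempt, trusted))
    return trusted, history


if __name__ == "__main__":
    # Toy stand-in: pretend the model handles any task of size 10 or less.
    trusted, history = calibrate_trust(lambda size: size <= 10)
    print(trusted)
    print(history)
```

Rerunning the probe periodically, as the remark suggests, matters because the failure threshold is expected to move as models improve; the loop itself only finds the threshold for the model being tested at that moment.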

Starting point is 00:25:32 I think it's legit to feel concerned, especially when a lot of the field is problem-solving oriented, because models are going to be really good at problem-solving. But mathematics is really, really much more than problem solving. It's more about understanding the structure and building new theories, like Lijie said. And I think we should try to figure out how to better use the model to help us in solving the problems that we meet, and then try to accelerate the speed at which we build new theory and come up with new understandings. I think that's the more optimistic view. When Codex becomes much better, it can do so many more things for you.

Starting point is 00:26:25 You would expect you will work less because Codex is good. But somehow you actually work more because there are way more things you can do. So I actually hope this can happen for math as well. You know, like, the model becomes so good. A mathematician may have 10 ideas. You can ask 10 models to try them and see, like, one of them succeed. And they don't have to do tedious calculations by themselves, right? So like, I imagine, like, maybe, you know, what happened with coding can

Starting point is 00:26:53 happen to mathematicians. It's interesting too because, you know, when we talk about the Erdos problems, Paul Erdos was a very interesting person who found a lot of things curious and said, oh, this is neat. And we have this category of problems he put together. But there's not a lot of rhyme or reason to them. They were just things that he found curious or worked with other people on. And I think that's kind of a big thing that's neat about science in general, as often we think that there are these real specific hierarchies, but it literally can just be things we're curious about. That being said, how long before there are no more unsolved Erdos problems? Some of them are very, very hard.

Starting point is 00:27:28 Yeah. Yeah, so, yeah, I don't know. Do you foresee us, maybe Alex, needing to come up with a new category of problems? I think probably, like, the hardest problems on that list. I think that list includes, like, the Collatz conjecture. Like these are problems that feel like very, very far out of reach of, like, the mathematical technology of today, even though many of them are quite simple to state. So we'll still have some more things to work on and continuously move things.

Starting point is 00:27:52 So that's good to know. It's exciting, though, too, to think about what happens when you do start applying this to other areas in physics and astronomy and start looking at data sets and stuff and what kind of discoveries are going to be in store. Do you have any particular area that you're hoping to see? Oh, I hope they just solve P versus NP. Okay. How about you, Alex?

Starting point is 00:28:14 I think the next milestone in my head is really, like, AI that can do AI research. I think there are so many, like, unsolved problems here. We're in a sense, like, in many ways limited by, you know, all the limitations of just, you know, our own intelligence. I'm optimistic about, like, you know, just having AI broadly available as a technology because there's just, like, so much more demand for intelligence in the world than, you know, like, humans can supply. Oh, I wanted to say P versus NP too, but Hongxun said it. Yeah. So I guess beyond that, like, one concrete thing I'm very interested in is, like, you know,

Starting point is 00:28:58 like it currently seems AI is trying to combine ideas from different fields. And of course, in a very novel and, you know, sophisticated way, but like can AI actually generate completely new ideas from scratch? I mean, that's something like we haven't really seen. concretely in AI. And that's something I maybe want to see next happening. And that can be very cool. Have you seen traces of that yet?

Starting point is 00:29:22 I think so. Like, you know, even in this math problem, I mean, I think if you look at the chain of thought, which is like 125 pages, I think some of the thoughts are pretty creative, although they didn't work out. Yeah. I mean, the final idea is more like combining all the stuff, but it has some creative thoughts. But it is interesting, you know, early on arguments were like,

Starting point is 00:29:44 these models weren't creative, but you could give it two ideas that had never been connected before and say, what's the relationship? And that would be something very, very new and felt like something different. And I feel like we'll probably be seeing more of that. Do you see us coming up with new forms of mathematics? I think that actually will be a ways down the line. Next year? Maybe.

Starting point is 00:30:09 I think because models now are very, very good at coming up with some idea to solve a problem, but not good at proposing a completely new, different kind of math or proposing new theory. How to get a model to do that is still very, very open. How I would think about it is we see this, like, you know, like Moore's Law for the time horizon at which these models are effective. And I think you sort of feel that in math where, you know, there's like every few months, the amount of

Starting point is 00:30:44 time these models can, like, sort of work independently for doubles, at least the amount of human-equivalent time. And so, you know, for solving problems, if you're really, really good at it, maybe some problems actually have pretty short paths to the solution, and you don't need to take that long. But I think for inventing, like, new ways of doing mathematics, that's much more like a years- or decades-long process. And so I think, you know, it'll still take a bit of time for that exponential to get there.

Starting point is 00:31:17 This was done by an internal model that you guys worked on. And since then, 5.5 has been able to do the same thing. We've seen other labs have said that they've been able to do this as well. But this was several weeks ago, which is now ancient history. What have we seen since then? I think one difference between the original result and I think what the follow-up findings have been is that, like, actually for the original model, there was like no scaffolding needed. You sort of just asked it to do the problem and then it gave you the answer.

Starting point is 00:31:49 And so actually you can read the original prompt and response in the note we uploaded on the blog, whereas I think the follow-up efforts have had a little bit more, like, structure or steering of the models. But I think one interesting data point here is that, like, you know, it's really all about test-time compute scaling. After we initially solved the problem, and this is the plot Lijie brought up earlier, we saw that with enough test-time compute budget,

Starting point is 00:32:23 the model is able to solve the problem around 50% of the time. So it's not surprising that, like, you know, you can get there with other methods. You can find this with other methods as well. But I think what was really important here is just that, like, you know, as you pour in more test-time compute, you get better results. It seems like it's kind of a virtuous cycle

Starting point is 00:32:45 where you take today's model, give it more compute, let it solve for that, and you understand how these problems can be solved. Next-generation models can learn from that and just get more and more efficient and you just have this basically. It seems like it just scales forever, right?

Starting point is 00:32:59 What do you think we're going to see by the end of the year? What I want to see is like, you know, people use our model to discover lots of new stuff and not only in math, but also in all of science. Of course, OpenAI wants to do some cool math stuff. I think it would be better if everyone can use the model to discover their own science. And I would expect many mathematicians will use the model.

Starting point is 00:33:22 I mean, maybe not rely completely on the model, but collaborate with the model to discover a lot more math results. I think that would be really cool. I've talked to some mathematicians who know others who are very reticent to even try using AI in mathematics. What is the best argument you can give? I think I'll just show them the proof of the unit distance conjecture. The disproof of the conjecture. I think it's just about productivity, right? Like we do math not just to enjoy the pleasure of the problem solving,

Starting point is 00:33:56 but also to advance the field and to understand the truth, that's what we're looking for. And using AI is going to speed that up by a lot. And it's going to tell us what we are really struggling to find. And it will be hard to resist using AI at some point. You could be an astronomer and not use a telescope, but you kind of have to ask why. Yeah, exactly. I know one of the researchers here likes to watch computers play chess against each other, and it feels like he sometimes learns things from that.

Starting point is 00:34:38 Do you think that we'll learn to be better mathematicians or researchers or scientists or just thinkers in general by watching the solutions the models come up with? Looking at the 125 pages of thinking, it's probably not very helpful for a mathematician. But just by looking at the answer, you actually do learn some idea that was not there, that you didn't know before. And that inspired the later mathematical work that knocked down other problems.

Starting point is 00:35:13 So I definitely think people learn some, like mathematicians learn something from AI solutions. Yeah, some of the mathematicians that we asked to review the proof, together with collaborators, actually used the idea to disprove some product conjecture, but for real numbers. I think that's like one very good example. AI can crack important questions and give us ideas that we can apply elsewhere. Yeah, I think it's remarkable that like, you know, like this group of mathematicians has already like just in the span of a week already used it to, you know, disprove this result that I think is maybe of like similar importance to the unit distance conjecture. So I think this is a wonderful example of mathematicians

Starting point is 00:36:01 like seeing this and using it as inspiration and, you know, bringing the ideas to bear on a different problem. What does this mean for the mathematical community? I think for us, like when we do these experiments, I think we want to, like, make sure we, like, empower the academic communities we interact with, where we don't just, like, go to some community and, like, you know, from the outside, try to, like, solve a bunch of their problems and, like, give them a bunch of, like, AI slop. But what we really want to do is we want to make these tools available to researchers

Starting point is 00:36:42 and let them direct, you know, this, like, all this, like, AI test-time compute at the problems they think are important. And I think it's not, I think it's, like, it really shouldn't be viewed as a race to, like, you know, solve as many Erdős problems as we can, but more like, you know, we want to like make people aware that the technology is out there. This is what it can do. You're not trying to solve every Erdős problem. Yeah, I would not say that as our goal.

Starting point is 00:37:13 I think this just happened to be a particularly significant result that we thought would be important to, you know, share with the world that this is the capability level of models today. But this is like, it's really not the goal to like, you know, just like go through the list as if it were a race. Do you foresee things applying to like cryptography? And there's also some debate too about do these models get so good that we kind of surpass even where quantum computing goes, which sounds kind of crazy. Yeah, I think cryptography is really an important topic these days because, you know, the foundation of cryptography is really about some problems, like factoring, that are hard to solve by computers, right? But basically, we only have conjectures. There's no mathematical proof of this fact.

Starting point is 00:38:02 And suppose the model gets really good at the algorithms. Maybe they will prove some of the cryptography conjectures and say, okay, those protocols are actually secure. We don't have to conjecture them to be secure. Or maybe they'll find some loophole. And that's also very important. I think we need to make sure the foundation of all security is good. So the model can stress test the foundation of the cryptography to make sure we have better security.

Starting point is 00:38:33 What about quantum computing? I think that's a very different territory, right? Like quantum computing, like, okay, actually I used to study quantum computing. Like my first paper is on quantum advantage, which shows like for some tasks, quantum computers can do better than classical computers. But so far, I think the models, I mean, they are just classical computers. I mean, they do what humans can do. I mean, maybe a bit better. But quantum computers, they can sometimes do, like, more fancy stuff,

Starting point is 00:39:05 like simulating some quantum effect in chemistry, which we probably not... I'm not an expert on that, but that might not... It's not clear, like, it is just two different paradigms. So, yeah, so I'm not super sure how they, like, compare to each other. But I think AI is going to greatly accelerate the pace at which we develop quantum computers.

Starting point is 00:39:25 Like just in recent years, there's improvement in like error correction. Like you have quantum error correcting codes that only use like simpler types of operations. And that really speeds up the like physical implementation. So I expect more of these to come from like collaboration with AI, that AI can propose new like quantum error correction algorithms, and then we can develop the quantum computers much faster. Once you ask the model to solve a question, you can of course follow up with, you know, how did you solve it? Can you explain this part of the proof to me?

Starting point is 00:40:07 And then the model will patiently try to teach you how everything goes line by line. Yeah. So it's like it's actually not just, you know, one-shot problem solving. You can ask it follow-up questions to, you know, learn how the proof works. And I really like that. One thing you learn very quickly as a researcher is that if your results are too good to be true, you probably have a bug somewhere. I think every researcher has had an experience where, like, you know, they see amazing numbers from their experiments. And it turns out, you know, their experiment was actually wrong.

Starting point is 00:40:44 The numbers were wrong. You know, when I first heard about this from Lijie and Hongxun, that was my prior. I was like, oh, I'll wait for them to, like, find the bug. But then I think, as the days wore on, you sort of like have this like growing like optimism that oh like, you know, maybe this is the one in a hundred times where it's like it's like too good to be true, but it's actually true.

Starting point is 00:41:13 Well, gentlemen, thank you very much. Thanks so much. Thank you so much.
