No Priors: Artificial Intelligence | Technology | Startups - AI and the Future of Math, with DeepMind’s AlphaProof Team

Episode Date: November 14, 2024

In this week’s episode of No Priors, Sarah and Elad sit down with the Google DeepMind team behind AlphaProof: Laurent Sartran, Rishi Mehta, and Thomas Hubert. AlphaProof is a new reinforcement learning-based system for formal math reasoning that recently reached a silver-medal standard in solving International Mathematical Olympiad problems. They dive deep into AI and its role in solving complex mathematical problems, featuring insights into AlphaProof and its capabilities. They cover its functionality, unique strengths in reasoning, and the challenges it faces as it scales. The conversation also explores the motivations behind AI in math, practical applications, and how verifiability and human input come into play within a reinforcement learning approach. The DeepMind team shares advice and future perspectives on where math and AI are headed.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @Rishicomplex | @LaurentSartran | @ThomasHubert

Show Notes:
0:00 Personal introductions
2:19 Achieving silver medal in IMO competition
3:52 How AlphaProof works
5:56 AlphaProof’s strengths within mathematical reasoning
8:56 Challenges in scaling AlphaProof
13:40 Why solve math?
17:50 Pursuing knowledge versus practical applications
21:30 Insights on verifying correctness within reinforcement learning
28:27 How AI could foster more collaboration among mathematicians
30:28 Surprising insights from AI proof generation
34:17 Future of math and AI: advice for math enthusiasts and researchers

Transcript
Starting point is 00:00:00 Hi, listeners, and welcome to No Priors. Today we have Thomas Hubert, Rishi Mehta, and Laurent Sartran from DeepMind's AlphaProof team. AlphaProof is a new AI system that can find and verify mathematical proofs, building on DeepMind's earlier successes in chess and Go to tackle one of AI's greatest challenges: mathematical reasoning. In today's episode, we'll explore how AlphaProof works, its implications for math and AI, more about test-time RL, and what this reveals about machine learning's capability to reason rigorously. Really happy to have you guys. Welcome. Thank you for having us, yeah. Maybe you can start by just talking a little bit about your backgrounds and how you came to be working on AlphaProof together.
Starting point is 00:00:45 I'm Rishi. I was one of the tech leads on AlphaProof. I've been working in computer science and machine learning for a while. I'm a chess player, and I came across the AlphaZero paper and saw some of the chess games that that agent produced, and I found it really inspiring. And I thought, this is the kind of thing I need to work on. Coming up with something beautiful and superhuman and almost alien felt magical. I came over to DeepMind, and the AlphaZero team, which Thomas was leading, was working on math. And that's how I got into math. My background: so, yeah, I started working in industry in my early career.
Starting point is 00:01:21 I worked on a many detection computer networks. I worked on ad-targeting and switched to AI research. And there a constant interest of mine has been systems that can spend more computers to either tackle harder problems or to think more. And math seem to be a perfect domain for that. Yeah, on my side, I was actually a Go player. So instead of doing programming since the age of 10,
Starting point is 00:01:49 I was actually playing Go. And I played a lot of goal during my youth. And then at some point, it was also my dad's dream to build a computer goal program. And so I was kind of figuring out, like, what do I need to know to be able to be the computer goal program? And then I realized that maybe it was being built at that time.
Starting point is 00:02:10 And so that's how I discovered DeepMind and how I discovered AI. And that's how I joined the company. And I've been kind of involved with AlphaGo and Alpha Zero, MuZero, this line of work. And recently we had worked on Alpha code and Alpha Tensor. So you know, like that's way before, change the pt, but we already knew that transformers were kind of changing a little bit how things
Starting point is 00:02:33 were done. And so we, you know, I found that in mass, yes, you could get this perfect verifiability. And with alpha code, we realized we can generate a lot of good code. And so it was very natural at that time to think about the potential there was for mathematics. Can you contrast? So maybe just as context, like for any listener who wasn't a super cool mathlet like me. IMO is the sort of oldest and most prestigious math competition for young people. There's a set of six problems. They feel impossibly hard. And Alphalproof had this really amazing results of solving four of the six problems this year. Can you talk a little bit about math and the IMO in particular as a problem relative to gameplay and other search problems like chess?
Starting point is 00:03:22 I think, you know, like first, there's a big difference is that in board games, you play against someone, and that's a lot of fun in the board games. And for instance, when we did this alpha-go or alpha-zero algorithms, we could really have this thing, this thing about self-play. You could always play against someone who is exactly just your strengths, and that proved to be a powerful idea. When you're kind of trying to learn to do mass, in some sense, you know, you don't really have an opponent. You just have to think about it. And mass is a bit special in a sense that it's almost like a purely cognitive kind of thing where you can just, you know, the only thing you can do is to think more. You know, maybe you can't really go into the real world and run an experiment.
Starting point is 00:04:08 I guess mathematician says that, you know, sometimes it's a good thing to take a nap and that your unconscious self kind of think about the problem and there's a good way to come up with new ideas. But it's a lot about thinking. And so, you know, like when you're confronted with a really hard problem, there is a whole question about how do you go and try to solve it? Maybe this is a good time to ask you to describe for our listeners, most of whom are technical or somewhere in the tech field, but also a broader business audience.
Starting point is 00:04:33 Like, you know, how does Al-A-proof work overall architecturally? Yes, sure. So, Alpha-proof is based on this thing called Alpha-Zero. So maybe let me start there. Alpha-Zero, as this program we developed to, you know, like, be basically kind of solve, kind of perfect information on board games. And the way it works is it's also based on a reinforcement learning algorithm. So you can think of Alpha Zero as maybe three components, like one, a neural network,
Starting point is 00:05:00 two kind of large-scale reinforcement learning, basically learning from trial and errors. And three, it also has like this planning and kind of search component to it, to try to, given the current situation, trying to search for kind of the best answer. And it turned out that, you know, like maybe we weren't very much, that when we are doing chess, but if you can handle kind of infinite action spaces, then instead of looking for a chess move, you can look for, for instance, a line of a proof. And so that's what we try to do when we started off a proof. It's basically can we look, our action spaces to generate lines of proofs.
Starting point is 00:05:42 And maybe very importantly, we used formal language to do that. So basically another way to say that is we use kind of code to write math. and it's become quite popular recently. And the advantage of that is that once the proof is complete, then the machine would give you a signal back to say, yes, your proof is correct or not. And so we could search for kind of correct proofs. Once we find a correct proof, we can learn from it,
Starting point is 00:06:10 and get a better neural network and kind of go into this self-improving loop of tackling harder and harder problems and learning more and more about mass. So at a very high-level picture, that's kind of how it works and how kind of the ideas behind Office Zero are adapted to the mass. One of the things that was an interesting tidbit from your announcement was, you know, if there's a series of problems, solves one problems within minutes, three days to solve other problems. Is there a way you can characterize like the search space for math overall or what alpha proof is better at in terms of domains than others, like types of reasoning within math? The search space and math is quite large compared to something like chess or other board games. So, you know, some people might think of writing math proofs as like picking from a bag of known tricks at each step. But in fact, like there are many proofs where like you've got to come up with some non-triple constructions, like you've got to invent a function out of thin air or you've got to like come up with like some way to manipulate.
Starting point is 00:07:11 Even if you're just rewriting an expression, you can rewrite it in infinite ways. and there's only like a few ways you could rewrite it to actually make progress on the proof. Sometimes thinking about some novel problem requires like decades of theory building and approaching it from an entirely new perspective to arrive at the angle that helps you solve it. And so in that sense, you know, the reason why there are like a lot of math problems that are like very simple to state but have stood the test of time in that they've been unsolved for centuries even, is that the search space is not sort of easy to navigate in most cases. I think what Alphabroof is good at amongst the IMO categories is it's largely good.
Starting point is 00:07:51 So the IMO problems have come in four categories. So there's algebra, number theory, combinatorics, and geometry. The two that it's strongest at are algebra and numbers theory. It's relatively weaker at combinatorics, although it can do quite good at some IMO combinatorics problems. And we didn't apply it at geometry at this year's contest. and one of the ways in which it navigates this massive search space is via an idea that we came up with which we call test time RL. So this is an idea where like, let's say you're confronted with the problem
Starting point is 00:08:24 that you don't know how to solve and you know you can do some search with what you know right now and you're not able to find a proof to it. What the agent then does is it constructs many variations of the problem in the vicinity of that problem and attempts to solve all of them. and if it manages to solve any of them, it learns from that experience and sort of comes closer and closer to solving the original problem.
Starting point is 00:08:48 And you can think of this as like when confronted with the new scientific problem, many of your priors that you've developed from other problems may not directly apply to it. Instead, you've got to do like very bespoke experiments on this problem itself to learn something about it and then he'll climb your way towards the solution of this problem. And so this process kind of mimics that.
Starting point is 00:09:08 And so when you talk about, you know, these, like, the problems that are solved out to three days. Like, these are the problems that had this, in this loop of, like, we can't solve it with the base network, but we propose many, many variations. We try all of them. We get slightly better, slightly better, slightly better, and then we hill clam our way to the final solution after, like, days of inference time compute. Or what are the limitations or approaches to scaling or, you know, increasing the set
Starting point is 00:09:33 of problems that alpha-proof can solve? And I guess there's two dimensions to that. To your point, there's a set of areas where it already does quite. well, algebra and other areas. And then there's a set of areas where you may need new forms of training data or other things like certain aspects of geometry or, you know, potentially other areas of mathematics. I'm a little bit curious how you think about what is necessary to increase performance dramatically from here.
Starting point is 00:09:57 And obviously what you've done is incredibly impressive. I'm just sort of curious, like, how you think about what are the obstacles to future growth? Maybe the main thing that Alpha Proof doesn't do is theory. building, it doesn't imagine its theory, not for profoundly equipped this. Number theory wouldn't be able to come up with the complex analysis necessary to derive results in number of theories.
Starting point is 00:10:23 And it's one of the main things that we don't even we generate variants and get better by solving these and that enables us to tackle the original problems of interest. Theory building is a component that is that would be required to go further. Maybe another dimension. Some problems like combinatorics can be stated in a somewhat obfuscated manner.
Starting point is 00:10:52 That doesn't mean that all combinatorics problems, just as a minor communitarics problems are all about counting the number of things. For instance, if you have 40 socks in the drawer, how many pairs do you have? It can be stated in some obfuscated way and how to translate them in lean is a major difficulty and then how to solve then the is a bit, it's a bit unwieldy. How to best cycle this is still an open question.
Starting point is 00:11:19 I think there is a link a little bit to your theory building because, for instance, you know, if there were, in some sense, part of the difficulty comes from the fact that the tools you need to express those things don't actually exist currently in the, in the library that we're using. And so, for instance, if you had like some kind of things that expressed, you know, what is an agent, what is a, an environment, what is a strategy, then for instance, some most of the, you know, a lot of the combinatorics problems would be much easier to state. But because that doesn't exist at the
Starting point is 00:11:48 moment in some sense, kind of the auto formalization would need to kind of come up with all these things beforehand before actually kind of formalizing the problem. And so that's kind of, maybe that's kind of linked to the fact that, you know, at some point you will need to be able to develop new definitions, new mathematical objects, kind of come up with their properties and their proofs, et cetera, et cetera. So I guess I think it was 1900 when David Hilbert posed his famous 23 problems that kind of defined a lot of the big areas that at the time he felt were important to mathematics and the sort of unsolved things that were important or topical.
Starting point is 00:12:22 Or is there some Turing test equivalent, I guess, as a generic question. It's an excellent question. That totally defined kind of, you know, like the mathematics in the 20th century and maybe, you know, the million-eum problems. the seven millennium problems of which one has been solved so far is another attempt at defining kind of what you know what could be one trajectory of of mathematics for the 21st century what our chances of solving that that's like kind of that's like really hard question because we know you know our brain kind of thinks linearly and but we know from our experience and working on the eye that
Starting point is 00:13:01 we should actually try to think exponentially things might might change pretty quickly um At the same time, you know, I think we have no idea how hard maybe something like the Riemann hypothesis is, is it, you know, is it two orders of magnitude, three orders? Maybe it's like 10 orders of magnitude away from what we can do, right? Because we don't have an existence kind of proof, it's much harder to kind of have an estimate of how hard this actually is. But I imagine that, you know, solving that problem will involve like kind of creating brand new mass, brand new theories and things like this. So if we want to take alpha proof all the way there, I think that's a capability we need to, either it emerges, it's possible because to prove harder problems,
Starting point is 00:13:46 you start to need to be able to introduce this new kind of mathematical objects, to decompose the problem in some problems. And we see that already happening for the IMO at a small scale. But it's very possible that you could emerge from just trying to solve problems. Or potentially, you know, we have to kind of explain specifically think more about what does it mean to build theories and how can we encourage both to do these kind of things. Yeah, just for our listener friends who don't spend as much time on the unsolved math
Starting point is 00:14:18 problems, just to contextualize a little bit, the Remain hypothesis is, it essentially predicts that prime numbers follow a specific pattern, like distribution, and it has important implications. Maybe we could go from there to thinking a little bit about when you were, working, beginning to work on math as a search space, there are like lots of interesting hypotheses for like why go work on math at all. One is like the field itself, right? And so be curious if any of you have problems you want to work on. Another is the premise that like this sort of advancement in reasoning will allow us, it will transfer to other domains,
Starting point is 00:15:00 be they, you know, science or what we think of as more like language base, like less verifiable non-code, non-rules-based domains as well. Where do you all want to take this? I'm not really a mathematician, so I didn't have like a problem I absolutely wanted to solve, but it's been my favorite subject when I was a student. And I think, yeah, you're absolutely right. There are like, you know, at least two kind of main reasons why you would want to potentially spend a lot of time on mass.
Starting point is 00:15:27 One is that, I guess, it's been described as the language of the universe, and, you know, it's been extremely powerful to both describe and predicts the natural world. And of course, to shape it and you see that, you know, basically we see mass being at the core of all the technology we're using now. So kind of having a good understanding of mass is probably kind of very important to understand our current world. And then of course you could, as you alluded, to make an argument that, you know, kind of solving mass or, you know, like, which requires, you know, reasoning, generalization, abstraction, all these things that we think about when we're talking about
Starting point is 00:16:05 something like a cognitive HGI would be, is a path to, you know, go towards HGI, and that could really help him kind of the development of HGI. So I think at least for me, you know, these are maybe the two main reasons as to, you know, why Mass is a particular interesting, you know, topic to try to solve in a general way, even though, you know, like Mass, okay, let's say it's still kind of a constraint domain, but this kind of unbounded complexity, and actually it's representing a lot of stuff. But it might be worse kind of making a really kind of focused effort on it because of all the potential implications it may have. Again, that's something about my own motivation. I've always been interested in the question of making systems that get better by
Starting point is 00:16:56 thinking more. I looked at this in mesh translation and text diffusion. navigating agents that discover the environment where they do better by thinking more. Math seems to be, in particular, proving statements, so the core of alpha proof. Seems to be one of the last big challenge domains where I still had a long way to go. And one of the key ingredients would be figuring out a way of thinking more. I said that I can be searching more. That can be a test immoral and maybe the next step of thinking more will be thinking so much that the agent comes up with its own theories.
Starting point is 00:17:37 So math has a test bit for thinking more. What was my motivation? One of the big reasons to perceive AI at all is to like uncover the secrets of the universe, you know, like the deep questions of like why do things work the way they do and why are we here and you know, why am I conscious and what's going on? And I guess like math, like solving these pure math problems, it feels like one domain, where we're already, like, there's, like, a large number of people who are already just doing this.
Starting point is 00:18:02 They're seeing math as a search for truth. Like, you know, people are pursuing these problems, not because they have any applications often, but more that it's like, you know, what actually is the answer? Why does this thing work this way? Something like you, in solving the remand hypothesis feels like it has the flavor of, like, you know, a sort of pure search for truth, which is just quite a feeling. I guess, like, related to that, you know, there's a long history in mathematics of people, or in science in general or science and mathematics of doing things for their own sake
Starting point is 00:18:31 or for the pursuit of knowledge for its own sake. And, you know, Sarah mentioned as an example, number of theory, and its applications in cryptography, zero knowledge proofs are propping up in different ways in cryptocurrencies, you know, group theory and algebra propped up in quantum mechanics over time and was sort of developed beforehand and then applied and developed further there. Are there specific areas that you're most excited about from an applications area? for some of the work that you're doing? Or is it mainly doing it for the love of the theoretical part of it?
Starting point is 00:19:01 It's a good question. I think we can, maybe, you know, maybe the answer depends on each one of us. I think one thing that I'd be motivated about is to learn from mathematicians, you know, what they are interested in and what they find interesting in the current, you know, world of mathematics. And so, for instance, you know, I've been reading about this thing called the Lang Lang Langs program that is trying to connect different areas of math, And it was described to me a little bit as kind of, you know,
Starting point is 00:19:27 just like in physics where you're trying to look for this unified theory of physics. It's like trying this unified theory of mathematics where maybe, you know, we have number theory here and we have like geometry there. And, and, but maybe there is kind of something behind that, you know, that is more unifying. So I personally like those abstract ideas, even though maybe I don't understand them very well. But yeah, I'd be very motivated to just see, you know, like, you know, and what mathematicians care about, and they might disagree between themselves as well.
Starting point is 00:19:59 I think they would disagree, right? If you look at the historical examples, like when Vombelli starts working on imaginary numbers, he gets completely ridiculed by, I don't know, the more important mathematicians at the time. And many years later, we get alternating current and the ability to describe electricity. And so I think it's really interesting as a question. Like, if you have the machine, the machine is able to develop its own theorems, like, where do you point it? for interest or usefulness eventually. It's a really big question.
Starting point is 00:20:28 To that, one domain I'm particularly excited is code verification. That is, at the moment, when we write software, we write the code and rewrite tests. And when the tests fast, we are reasonably happy. Everyone's in a wild bugs that pass the test or even security issues. It would be much better if we could express encode the properties that the algorithm is supposed to verify. and to prove in code that the algorithm does verify these properties. So it's already done for very critical domain like Avionics, cryptography, where it's very important, but it has to be done by humans.
Starting point is 00:21:07 I think the software industry would be in a much better place if verification was much more common, and if we remove the bottleneck, which is humans writing those proofs. So if humans could be enabled by specialized tools, we could handle the miniature of these proofs. I think we'd have done a major stepfons. I think another application area I'm excited about is the transfer of this technology to many other domains.
Starting point is 00:21:33 There are two kinds of players who take. One is just the transfer of the mathematical reasoning skill that this agent acquires via this kind of training to many other domains. And as we see, math is critical for engineering and science and you name it. But also, like, you know, some of the, the sort of tech we developed here of like, you know, like scaling RL and like figuring out how to spend a lot of inference time compute stuff like this feels like it's quite generally
Starting point is 00:21:58 applicable to many other problems. I'm going to ask a question that is more, more conjecture, right? We've been talking about math and code and domains which are not easily formalizable, but are formalizable and verifiable. How much do you think this applies in the language domain can NAI make something funnier, and it's easier to tell if something is funny than to, like, write a good stand-up set, right? And so I think it's kind of an instructive example, but like, do you have any intuition for how to take some of these learnings around the ability to verify correctness or quality, you know, within the R.L approach? I guess, you know, there are, so as you said, you know, there are domains where there is a ground
Starting point is 00:22:50 truths and there are domains where they aren't a electron trust and so when they aren't and you know like for instance funny is probably kind of you know a human kind of it's a it's a fuzzy human concept and maybe there were kind of aliens out there they would have a different sense of humor and so for those kind of problems I think that you know the only way you can kind of get your branding is through like the humans are your grounding for other domains maybe you know the real world is some sort of grounding. And then, you know, you have to go to the real world to get your branding.
Starting point is 00:23:26 But basically, the kind of the R.L would allow you to say, well, you know, like, where does my reward come from? I guess sometimes it comes from the real world. Sometimes you can perfectly round it. And sometimes, you know, like the reward comes from the human. And I think that's fine. There is, there are some techniques, basically, that we develop in a sense that, you know, of course, when you can kind of machine check things, you can, well,
Starting point is 00:23:48 at a scale that is quite different from, you know, if you have to ask humans in general, even though we are quite a lot of humans. So there's still probably quite a lot of scaling that is possible, especially for things like this where, you know, like humor, where you could, you know, everyone should have a say about what is funny or not. So definitely, you know, for instance, the RL framework, I think is still a great framework to think about those questions.
Starting point is 00:24:17 And part of, you know, what makes RL work that should kind of also transfer. And maybe the things where we rely on perfect verification, that's maybe they wouldn't transfer to that particular question. Where is there a room for, I suppose, like human rating or input in terms of the explicit description or labeling of their reasoning in domains that are more verifiable, right? Like if you have unlimited access to Terry Tao to do labeling for you, is that useful? Or, you know, how should we think of that versus, like, just do more search and work on better, like, formalization?
Starting point is 00:24:57 The way alpha proof currently operates there is that it discovers its own proofs, and when they are valid, it lands from them and develop its own style, which has been commented upon as looking, yeah, quite alien. So with unlimited access to Territah would create proofs that are correct with some niceness as percised by this human. And we could start optimizing amongst the set of valid proofs towards proofs that look nice for purposes of interpretability for purposes of teaching. So there was definitely still room in this space of valid proofs for proofs that that humans might prefer. That's actually like a pretty damning statement to some degree, right? Because it's just a, it's just like preference versus perhaps like, you know, capability advancement.
Starting point is 00:25:55 Like what do I care if the proof looks alien if we have new knowledge? Interpretability seems useful in this case, but. Yes, I'm not saying that's the only angle for sure. With more supervised data, we can avoid the exploration problem. And we could translate all the pros that are unknown to man. And that certainly would make the agent much better. That would save us years of compute for, for sure. I'm not saying that interpretability is the only way we could use.
Starting point is 00:26:27 We could use the mathematician, for sure. But it could provide us a signal that we couldn't get otherwise, presumably. With time, compute, theory building, we might be able to, should maybe have rediscovered proofs that are already know. I think it's an interesting question because it like, it highlights the sort of complementarity of specialist human data and like RL data. And I think it's especially prominent with like LLMs, where like LLMs have generated,
Starting point is 00:26:55 have these like very strong human priors because they're pre-trained on a lot of human data. But then when we do RL with them, they have an opportunity to take these human priors and build on them and develop their own styles of doing things like Aval Proof has done. And I think one thing we've seen in this project is that often, like, the small amounts of precious human data can be really useful to exceed the agent's behavior and sort of, you know, get it from like a complete zero state to like somewhere much higher where it can play with the environment in a much more efficient way. And then beyond that, it can take it from there and, you know, perhaps match or exceed the human, but with its own style. So that's how, I guess, like, that feels to me, like, that's probably going to be the way forward with RL for LMs in general, is that, like, the specialist humans are going to serve to, like, get the LLM from, like, just a bunch of weights that knows how to do nothing to, like, something that, like, surprisingly strong. And then the RL is going to take you from there to, like, something that's, like, superhuman.
Starting point is 00:27:52 I think it was supposed to be point career, somebody who was the last mathematician who knew all of mathematics or at least, you know, understood big portions of it. And suddenly you have an AI or a system that could potentially encapsulate all of mathematics in a single program. And it could really be used as something that could help check a proof. It could help sort of push a mathematician forward in their own research. You know, often when somebody proves something now, if they're in a more side field, there aren't that many people who can actually verify or check the proof that they've done. How many mathematicians have access right now to alpha proof? How do you think about engaging with the mathematics community about day-to-day usage of
Starting point is 00:28:28 this pretty amazing set of advancements. So at the moment, you know, mathematicians don't have access to alpha proof. And to be honest with you, you know, like we at the moment, we can't rival at all with someone like Territale. We, I think we demonstrated that, what we've demonstrated is that we can learn
Starting point is 00:28:46 general mathematics, almost from scratch and arrive at kind of an impressive high school level. And then we need to grow our knowledge base so that we become useful. But I don't know. if you've seen kind of Terry Tau's kind of recent kind of interviews over the last year
Starting point is 00:29:04 and he's been saying that, you know, one thing that is interesting in mass is that collaborations has been relatively small. A lot of papers are one, two author paper, and max five authors. And it's been because it's been very hard to collaborate with more mathematicians because you need to check
Starting point is 00:29:20 what they are doing. And so it's very time consuming. But if you instead rely on a formal system to check everyone else's work, then you could do a little bit like an astronomy where you could have an amateur kind of living in the middle of maybe nowhere and you've never met. And then you wouldn't have to trust him like you could trust in some sense the machine to check the work.
Starting point is 00:29:44 And if the machine says it's a correct proof, then it's a correct proof. And then you can kind of start to work with many, many more people in some sense, kind of that's cool because you can kind of start to think about making collaborations between between many humans, but also potentially AI's, right? And so that could be, you know,
Starting point is 00:30:04 that's one way we are thinking about it as well. It's like, oh, can we, can we, could we potentially collaborate to those projects? And at the moment,
Starting point is 00:30:13 the way, the way we think about it is, you know, we think more of ourselves as some kind of, it would be great if we could be like a graduate student and kind of the mathematicians
Starting point is 00:30:22 could give us kind of the questions. I think that's maybe, you know, related to theory and building at the moment, But we're not really good at asking the right questions. And there's definitely kind of we would like to rely on the whole mathematical community to kind of ask the questions that they care about and see if we can help even a little bit
Starting point is 00:30:42 answering those questions. I was also a Go player growing up. And so, you know, seeing the least little match and like very, as you described, alien moves and go was incredibly interesting to me too. Is there something that has felt alien or surprising in terms of like an alpha proof so far that you can talk about? Yeah, sure. So let me show you guys the proof that or part of the proof that Alphabroof
Starting point is 00:31:11 Proof came up with for Problem 6 at the IMO. I think it'll be much more easy to understand if I flashed it on the screen. So this is Problem 6. So the IMO typically has six problems. and problems three and six are the hardest ones. And so this is supposed to be one of the hardest problems of this year's IMO. But it actually turned out to be one of the hardest problems ever because only five out of the 500-odd contestants managed to solve it.
Starting point is 00:31:43 So, you know, it is really hard. So we can just go over very quickly what this question is saying. And I won't go through the whole proof because it's probably incomprehensible in the time that we have. But like I'll point out like one cool construction that I'll have. who came up with. So the question talks about like the definition of a function being aquasulian. So aquasulian
Starting point is 00:32:01 is not really a mathematical term. This is just a tongue-in-cheek reference to the fact that Baath, where this contest was being held, used to be called aquasuli's by the Romans. But anyway, so they've given us two equations that characterize a aquasulian function. And we've got to show that
Starting point is 00:32:17 there exists some integer C such that for any aquacillian function F, there are at most C different rational numbers of this form. So this is a bit complicated to understand, but essentially this form is the sort of even part of a function. And what it's trying to say is that you could like take any acquisition function and give it many rational numbers and figure out this expression for many rational numbers are and you might get like infinite values or you might get five values or three values. So it's asking you to prove that there is some bound C beyond which you cannot have more unique values and also asking you to find what this bound C is. So often, so that's a question, what is C?
Starting point is 00:32:57 So often when you're trying to answer a question like this, one strategy you can use is to show an upper bound for C and a lower bound for C, and then show that those two converge, and then you find C. So I'm going to pick up in the proof from the point where AFA proof had already proved that C is less than or equal to two. That's itself quite an interesting proof, and you can look at it on our blog post if you're interested.
Starting point is 00:33:21 But at this point in the proof, Azo proof has to decide is C 1 or 2? And the way it's going to have to decide that is by finding... So it's trivial to show that C can be 1. What's hard to show is like, is there any acquisitive in function F such that C is 2? And, you know, you can pause and try and find functions
Starting point is 00:33:43 F that follow these properties and see if you can find one for which you can find two different values of this FFR plus FF minus are expressed. But one thing that's interesting is like Tim Gowers, who was one of our judges and is also a Fields Medalist, tried this question for a couple hours, and he couldn't find the construction for function that had this property, which is, of course, not to say that Alpha Proof is better than Tim Gowers at math, but it's just, it just highlights how hard this part of the problem
Starting point is 00:34:12 is. And Alpha Proof came up with this construction, which is this function, F of X equals minus X plus two seal of X, which is this funky-looking function that I've plotted over here. And interestingly, this function, if you plug in R equal to minus one and R equal to half, you get two different values. And so you can tell that C must be equal to two. So in my view, this is maybe the hardest problem that Alpha Proof ever solved. And it was a really cool, funky construction that relies on this seal function. Very cool.
Starting point is 00:34:46 Can we start with advice for people who are working on math today and investing time in it, given alpha-proof and related models. Formal math is going to be an increasingly important thing going forward, not just because AI is going to be really a very, very strong tool within formal math, but also because, like, it's emerging as a way of collaboration between mathematicians and many prominent mathematicians are adopting it. And, you know, it's still a small minority of the mathematical community that operates in Lean, but it's a growing minority.
Starting point is 00:35:20 And AI will only accelerate that. And so I guess I'm not a mathematician, so who am I to advise mathematicians? But like, I guess my advice from the outside would be like learn lean as early as you can. I realize as well it's a great tool for education because, you know, like most of the proofs are written at the level of abstraction of the person who writes a proof. And so for instance, you know, if that person thinks that step is trivial, then, you know, you wouldn't have any explanation about it. But if you have that proof in a formal language, then you can zoom in, you know, at the level of detail that is adapted for you. And so that's one thing. You can learn much more at your own level.
Starting point is 00:35:59 And the second thing is you can actually do self-play with yourself in a sense that, you know, maybe you are trying to learn about a very abstract part of mass. And that's, you know, the usual thing is, you know, it's difficult to explain mass because it's quite abstract. And then maybe you can't tell whether what you're doing is correct or not. And if you are doing it on paper, then maybe the next best thing you can do is to try to find someone or your teacher, then that might take you a lot of time, right, to get some feedback. But if you are using a formal language, then, you know, you would know exactly what's left to be to be proved. If you've done the thing correctly or if you've kind of fooled yourself a little bit. So even for educational purposes, I think it might be very interesting.
Starting point is 00:36:42 It's not really advice. It's more speculation. I said that there were two types of mathematicians, the ones who build theories and the one who proved theorems. I understand that in order to prove theorem, sometimes you have to build a whole theories, as well as done for the last theorem. But it seems that there is more room for human creativity, human taste, human scale, in building theories than in the proof part. One thing that I kind of struggle with here is it is increasingly true across a bunch of domains that are being, where the hill is being climbed with AI, I suppose, that taste seems to matter increasingly much. It could be Terry Tao taste in problems because if you're working on arithmetic progression in primes, it actually leads to, like, advancements in methods and tools. But I actually think it's really, one, it's challenging because, like, telling somebody go develop. a good taste is a more nuanced, you know, directive than make sure you learn lean, right?
Starting point is 00:37:53 But two, it's also pretty interesting and parallel to the work you guys have been doing in that to some degree, you know, if you look at the behavior of where you're applying computation, it's like, oh, well, why do you work on, you know, abstractly some problems around this area because it will improve your ability to solve the problem, right? And so I think maybe, as you said, we'll all do a little bit more test time RL. I don't know if we'll all develop taste, even if that looks useful. I saw this from somewhere, which is like, as machines get better at finding the answers, like we're going to have to get better at finding the questions.
Starting point is 00:38:26 And, you know, like these systems don't have a, you know, their own sort of notion of what questions are interesting. And given a large question, how do I break it down into smaller questions that might be interesting? And that's probably where, like, not just in math, but in many domains, like, even as a sort of software engineer, right, like, I find that, like, more and more my job is, like, not figuring out these sort of small, low-level details, but, you know, making, like, good high-level decisions that tools can help you with. And that's probably going to be true of more and more domains. Thanks so much for doing this. It's great to meet all of you, and I appreciate you doing this well. Find us on Twitter at No Pryor's Pod. Subscribe to our YouTube channel. If you want to see our faces, follow the show on. Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-dash priors.com.
