Dwarkesh Podcast - Terence Tao – Kepler, Newton, and the true nature of mathematical discovery

Episode Date: March 20, 2026

We begin the episode with the absolutely ingenious and surprising way in which Kepler discovered the laws of planetary motion. People sometimes say that AI will make especially fast progress at scientific discovery because of tight verification loops. But the story of how we discovered the shape of our solar system shows how the verification loop for correct ideas can be decades (or even millennia) long. During this time, what we know today as the better theory can actually make worse predictions. And the reason it survives this epistemic hell is some mixture of judgment and heuristics that we don't even understand well enough to actually articulate, much less codify into an RL loop. Hope you enjoy!

Watch on YouTube; read the transcript.

Sponsors

- Jane Street loves challenging my audience with different creative puzzles. One of my listeners, Shawn, solved Jane Street's ResNet challenge and posted a great walk-through on X. If you want to try one of these puzzles yourself, there's one live now at janestreet.com/dwarkesh.
- Labelbox can get you rubric-based evals, no matter your domain. These rubrics allow you to give your model feedback on all the dimensions you care about, so you can train how it thinks, not just what it thinks. Whatever you're focused on—math, physics, finance, psychology or something else—Labelbox can help. Learn more at labelbox.com/dwarkesh.
- Mercury just released a new feature called Insights. Insights summarizes your money in and out, showing you your biggest transactions and calling out anything worth paying attention to. It's a super low-friction way to stay on top of your business.
Learn more at mercury.com/insights.

Timestamps

(00:00:00) – Kepler was a high temperature LLM
(00:11:44) – How would we know if there's a new unifying concept within heaps of AI slop?
(00:26:10) – The deductive overhang
(00:30:31) – Selection bias in reported AI discoveries
(00:46:43) – AI makes papers richer and broader, but not deeper
(00:53:00) – If AI solves a problem, can humans get understanding out of it?
(00:59:20) – We need a semi-formal language for the way that scientists actually talk to each other
(01:09:48) – How Terry uses his time
(01:17:05) – Human-AI hybrids will dominate math for a lot longer

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe

Transcript
Starting point is 00:00:00 Okay, today I'm chatting with Terence Tao, who needs no introduction. Terence, I want to begin by having you retell the story of how Kepler discovered the laws of planetary motion, because I think this will be a great jumping off point to talk about AI for math. Okay, yeah, so I've always had an amateur interest in astronomy, and so I've loved stories of how the early astronomers worked out the nature of the universe. So Kepler was building on the work of Copernicus, who was himself building on the work of Aristarchus. So Copernicus very famously proposed the heliocentric model: that instead of the planets and the sun going around the earth, the sun was at the center of the solar system and the other planets were going around the sun. And Copernicus proposed that the orbits of the planets were perfect circles.
Starting point is 00:00:47 And his theory kind of fit the observations that the Greeks and the Arabs and the Indians had worked out over centuries. I think Kepler got interested when he learned about these theories in his studies, and he made this observation that the ratios of the sizes of the orbits that Copernicus predicted seemed to have some geometric meaning. I think he started proposing that if you take, say, the orbit of the Earth and you enclose it in, I think, maybe a cube, the outer sphere that encloses the cube
Starting point is 00:01:22 almost matched perfectly the orbit of Mars and so forth. And there were six planets, known at the time, five gaps between them and there were five perfect platonic solids, the cube, the tetrahedron, isocry, octagian, and dodeca asian. And so he had this theory which he thought was absolutely beautiful that he could inscribe these platonic solids between the spheres of the planets and it seemed to fit. And it seemed to be to him like, you know, God's design of the planets was matching this mathematical perfection of the platonic solids.
Starting point is 00:01:53 So he needed data to confirm this theory. And at the time there was only one really high quality data set in existence, which was Tycho Brahe's. So Tycho Brahe, this Danish astronomer, very wealthy, eccentric astronomer, had managed to convince the Danish government to fund this extremely expensive observatory, on, in fact, an entire island, where he had taken decades of observations of all the planets, Mars, Jupiter, every night, at least every night for which the weather was clear. With the naked eye, actually; he was the last of the naked-eye astronomers. And so he had all this data which Kepler could use to confirm his theory
Starting point is 00:02:29 and so Kepler started working with Tycho, but Tycho was very jealous of the data; he only gave little bits of it at a time, and I think Kepler eventually just stole the data. He copied it and had to have a fight with Brahe's descendants. But he did get the data, and then he worked out, to kind of his disappointment, that his beautiful theory didn't quite work. Like, the data was sort of off from his platonic solid theory
Starting point is 00:02:55 by 10% or something. And he tried all kinds of fudges, moving the circles around and things. It didn't quite work. But he worked on this problem for years and years. And eventually he figured out how to use the data to work out the actual orbits of the planets. And that was incredibly clever, genius amount of data analysis. And then he eventually worked out that the or the ellipses, not circles, which was shocking for him. And then he worked out
Starting point is 00:03:25 the two laws of planetary motion: the ellipses, and also that equal areas are swept out in equal times. And then, ten years later, after collecting a lot of data, the furthest planets like Saturn and Jupiter were the hardest for him to
Starting point is 00:03:42 work out. But then he finally worked out this third law also that the the time it takes for a planet to complete its orbit was proportional to some power of the distance to the sun. And these are the three famous Kepler's laws of motion. And he had no explanation for them. It was just all driven by experiment. And it took Newton a century later to give a theory
Starting point is 00:04:07 that explained all three laws at once. The take I want to try on you is that Kepler was a high temperature LLM, where Newton comes up with this explanation of why the three laws of planetary motion must be true. And of course, the way that Kepler discovered the laws of planetary motion, or figured out the relative orbits of the different planets, is, as you say, a work of genius. But then, you know, through his career, he's just trying random relationships. And in fact, in the book in which he writes down the third law of planetary motion, it's sort of an aside in The Harmony of the World, which is this book about, you know, how all these different planets have these different harmonies. And the reason there's so much famine
Starting point is 00:04:46 and misery on Earth is because the Earth is Mi-Fami, that's the note of Earth. And so all this random astrology. But in there is the Cube Square law, which tells you what relationship the period has to a planet's distance from the sun, which is, as you're detailing, if you add that to Newton's F equals MA and then the equation for centripetal acceleration, you get the inverse square law. And so Newton works that out. But the reason I think this is an interesting story is I feel like LLMs could do the kind of thing of like 20 years, let's try random relationships, some of which make no sense.
Starting point is 00:05:22 as long as there's a verifiable data bank like Brahe's data set where, okay, I'm going to try out random things about like musical notes. I'm going to try out random things about platonic objects. I'm going to all these different geometries. I have this bias.
Starting point is 00:05:32 There's some important thing about the geometry of these orbits. And then one thing works. And as long as you can verify it, it can then drive, these empirical regularities can then drive actual deep scientific progress. Traditionally, when we talk about the history of science,
Starting point is 00:05:46 idea generation has always been kind of the prestige part of science. So, I mean, solving a scientific problem involves many steps. You know, you have to identify a problem, and then you have to identify a good problem to work on, a fruitful problem. And then you need to collect data, you need to figure out a strategy to analyze the data to make a hypothesis, and at this point you need to propose a good hypothesis, and then you need to validate, and then you need to write things up and explain. There's a dozen different components. But yeah, the ones we celebrate are these eureka genius
Starting point is 00:06:18 moments of idea generation. And yeah, so Kepler certainly had to, as you say, cycle through many ideas and several which didn't work and I bet many that he didn't even publish at all, because yeah, they just didn't fit. And that's an important part of the process, trying all kinds of random things and seeing if they worked. But as you say, they have to match by an equal amount of verification. Otherwise, it's slop. I mean, we celebrate Kepler, but we should also celebrate Brahe for his assidious data collection, which was sometimes more precise than any previous observation. And that extra decimal point of accuracy was actually essential for Kepler to get his results. And he was using, you know, Euclidean geometry and like the most advanced mathematics
Starting point is 00:07:13 he could use at the time to match his models with the data. So, like, all aspects had to be in play. You know, the data and the theory and the hypothesis generation. I'm not sure nowadays that hypothesis generation is the bottleneck anymore.
Starting point is 00:07:33 Science has changed in the centuries since. So, classically, sort of the two big paradigms for science were theory and experiment. Then in the 20th century, numerical simulation came along. And so you can also do computer simulations to test theories. But then finally, in the late 20th century, we had big data. We had the era of data analysis.
Starting point is 00:07:58 And so a lot of new progress is actually driven now by analyzing massive data sets first, collecting large data sets, and then drawing the patterns from them to deduce laws, which is a little bit different from how science used to work, where you make a few observations, or you just have one out of the blue idea and then you collect data to test your idea. That's a classic scientific method.
Starting point is 00:08:19 Now it's almost reversed. You collect big data first and then you try to get hypotheses from it. I mean, Kepler was maybe one of the first early data scientists, but even he didn't start with Tycho's data set and analyze it. He had some preconceived theories first.
Starting point is 00:08:37 But it seems like this is less and less the way we make progress just because, yeah, the data is just so much more massive, it's just so much more useful. Oh, interesting. I actually feel like the mold of 28th century science that you're describing is actually very well describes what happened in Kepler where he did have these ideas. 1595 and 96 is where he comes up with first polygons and then platonic objects theory. But they were wrong.
Starting point is 00:09:06 And then a few years later, he gets Brahe's data, and it's only after 20 years of just trying random things that he gets this empirical regularity. And so it actually feels closer to: Brahe's data is analogous to some massive data bank of simulations. And then, now that you've got the data, you can keep trying random things. But if it weren't there, Kepler would be out there
Starting point is 00:09:28 just writing books about harmonics and platonic objects and there would be nothing to actually verify against. Yeah, yeah, yeah. So the data was extremely. important. But the distinction you're not trying to make is that sort of traditionally you make a hypothesis and then you test it against data. Yeah. But now with machine learning and data analysis and statistics and something, you can start with data and through, say, statistics work out laws that were not a person before. So Kepler's third law is a little bit like this,
Starting point is 00:10:03 except that for the third law, instead of having the thousands of data points that Brahe had, Kepler had like six data points. Like, for every planet, you knew the length of the orbit and the distance to the sun. And there were like five or six data points. And he did what we would now call regression. You know, he could fit a curve to these six data points. And he got the square-cube law, which was amazing. But actually, he was quite lucky, I mean, that these six data points gave him the right conclusion.
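The regression Tao describes can be redone today in a few lines. The sketch below uses modern semi-major axes (in AU) and periods (in years) for the six planets Kepler knew, not Brahe's actual numbers, and fits the exponent by ordinary least squares on a log-log scale:

```python
import math

# Semi-major axis (AU) and orbital period (years) for the six planets
# known to Kepler -- modern reference values.
planets = {
    "Mercury": (0.387, 0.241),
    "Venus":   (0.723, 0.615),
    "Earth":   (1.000, 1.000),
    "Mars":    (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn":  (9.537, 29.457),
}

# Fit log T = k * log a + c by least squares; k is the exponent.
xs = [math.log(a) for a, _ in planets.values()]
ys = [math.log(t) for _, t in planets.values()]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
k = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
    sum((x - xbar) ** 2 for x in xs)

print(f"fitted exponent: {k:.4f}")
```

The fitted exponent comes out essentially 3/2, i.e. T² ∝ a³ — Kepler's third law — and, as Tao notes, six points is a remarkably thin data set for such a clean fit.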
Starting point is 00:10:30 You know, that's not enough data to be really reliable. There was a later astronomer, Johannes Borda, who told him. who took the same data, actually, the distances to the planets. And inspired by Kepler, I think, he had a prediction that the distances of the planets formed basically a shifted geometric progression. He also fit a curve. Except that there was one point missing. So there was a big gap between Mars and Jupiter.
Starting point is 00:10:55 His law predicted that there was a missing planet. So it was kind of a crank theory, except when Uranus was discovered by Herschel, the distance to Uranus fit exactly this pattern. And then Ceres was discovered, this asteroid, I think, in the asteroid belt. And it also fit the pattern. So people got really excited that Bode had discovered this amazing new law of nature. But then Neptune was discovered, and it was completely, like, way off. And basically it was just a numerical fluke.
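The shifted geometric progression Tao recounts is usually written as the Titius-Bode rule, a_n = 0.4 + 0.3 · 2ⁿ AU. The formula and the modern distances below are standard reference values rather than anything quoted in the episode; a quick check reproduces both the eerie fit for Uranus and the failure for Neptune:

```python
def bode(n):
    # Titius-Bode rule: predicted distance in AU for slot n
    # (Mercury is conventionally the n = -infinity slot at 0.4 AU).
    return 0.4 + 0.3 * 2 ** n

# Slot number and modern semi-major axis (AU) for each body.
bodies = [("Venus", 0, 0.723), ("Earth", 1, 1.000), ("Mars", 2, 1.524),
          ("Ceres", 3, 2.766), ("Jupiter", 4, 5.203), ("Saturn", 5, 9.537),
          ("Uranus", 6, 19.19), ("Neptune", 7, 30.07)]

for name, n, a in bodies:
    err = abs(bode(n) - a) / a
    print(f"{name:8s} predicted {bode(n):5.1f} AU, actual {a:6.2f} AU, "
          f"{100 * err:4.1f}% off")
```

Everything through Uranus lands within a few percent, while the Neptune prediction (38.8 AU against an actual 30.1 AU) misses by nearly 30% — the "numerical fluke" unraveling.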
Starting point is 00:11:26 There was six data points. Yeah, so maybe one reason why Kepler didn't highlight his third law as much as the first two laws. is that maybe instinctively, even though he didn't have modern statistics, he kind of knew that with six data points, he had to be somewhat tentative with the conclusions. But maybe to ask the question about the analogy more explicitly, does this analogy make sense to if we have, you know, in the future we'll have smarter and smarter AIs
Starting point is 00:11:55 and we'll have millions of them? And then they can go out and hunt for all these empirical regularities. It sounds like you don't think the bottleneck in science is finding more things that are, for each given field, the equivalent of the third law of planetary motion, so that then later on somebody can say, oh, we need a way to explain this,
Starting point is 00:12:13 let's work out the math. Here's the inverse square law of gravity. Right. So I think AI has basically driven the cost of idea generation down to almost zero. Yeah. In a very similar way to how the internet drove the cost of communication down to almost zero.
Starting point is 00:12:27 Yeah. Which is an amazing thing, but it doesn't make, it doesn't create abundance by itself. Yeah. So now the bottleneck is different. So we're now in a situation where suddenly people can generate thousands of theories for a given scientific problem. And now we have to verify them, evaluate them.
Starting point is 00:12:45 And this is something where we have to change our structures of science to actually sort this out. So, you know, in fact, traditionally we built walls. So in the past, before we had AI slop, we had, you know, amateur scientists with their own theories of the universe, many of which were basically of very little value. And so we've built these, like, peer review publication systems and things to kind of filter out and try to isolate the high signal ideas to test. But now that we can generate these possible explanations
Starting point is 00:13:22 at massive scale and some of them are good and a lot are terrible, I mean, human reviewers, they're already being overwhelmed, actually. Many, many journals are reporting AI during submissions are just flooding their submissions. So it's great that we can generate all kinds of things now with AI, but it means that the rest of the aspects of science have to catch up. Verification, validation, and assessing what ideas actually move the subject forward and which ones are dead ends or red herrings.
Starting point is 00:13:57 And that's not something that we know how to do at scale. You know, for each individual paper, we can discuss it, you know, have a debate among scientists and get to a consensus in a few years. But when we're generating, you know, a thousand of these every day, this doesn't work. So I think there is this incredibly interesting question of, if you have billions of AI scientists, not only how do you gauge which ones are real progress, but, I mean, this is actually a question that human science has had to face, and we've solved it somehow, and I actually am not sure how we solved it. But take any given field, let's say in the 1940s.
Starting point is 00:14:31 And there's, if you're at Bell Labs or if you're just generally trying to, there's these new technologies coming out, pulse code modulation, basically how do you transfer signals, how do you digitize signals, how do you transfer them over analog wires? And then, but there's like all these papers about the engineering constraints there and the details. And then there's one which is like comes up with the idea of the bit, which has implications across many different fields. And you need some system which can then look at that and say, okay, we need to apply this
Starting point is 00:14:56 to probability. We need to apply this to computer science, et cetera. And in the future, the AIs are coming up with, you know, the next version of this kind of unifying concept. And how would you identify it among millions of papers which might actually constitute progress, but which have much less general unifying ideas? A lot of us are the test of time. So many great ideas didn't actually get a great reception at the time that they were first proposed. It was only after some other scientists realized that they could take it further and apply them to their own, you know, deep learning itself.
Starting point is 00:15:28 was actually a niche area of AI for a long time. The idea of getting answers entirely through training on data and not through first principles reasoning was very controversial. It took a long time before it actually started bearing fruit. You mentioned the bit. I mean, there were other proposals for computer architectures than the 01 that is universal today. I think there were trits, you know, zero one, three-valued logic. And in an alternate universe, maybe a different paradigm would have,
Starting point is 00:15:58 would have showed up. People have argued that the transformer, for example, is the foundation of all modern large language models. And it was the first deep learning architecture that really was sophisticated
Starting point is 00:16:09 enough to capture language. But it didn't have to be that way. There could have been some other architecture that was the first to do it. And once that was adopted, it would become the standard. So I think one reason why it's hard to assess
Starting point is 00:16:25 whether a given idea is going to be fruitful is that it depends on the future. And it depends also on the culture and society, like which ones get adopted, which ones don't. You know, the base-10 numeral system in mathematics is extremely useful, much better than the Roman numeral system,
Starting point is 00:16:43 for instance. But again, there's nothing special at 10. It's a system that we, it's useful for us because everyone else uses it. And we've standardized it and we've brought all our computers and our number of representation systems around it. And so we're stuck with it now, actually. You know, some people occasionally push for other systems than decimal,
Starting point is 00:17:03 but it's this is no, this is no, there's too much inertia. So you can't look at any given scientific achievement purely in isolation and give it an objective grade without being aware of the context, both in the past and the future. And so it may never be something that you can just reinforce and learn the same way that you can for much sort of more localized problems. Yeah.
Starting point is 00:17:32 It seems often in the history of science when a new theory comes up that in retrospect we realize it's correct, it seems to make implications that just either make no sense because they're wrong and we realize later on why they're wrong or they're correct
Starting point is 00:17:48 but seem wildly implausible at the time. So as you talked about, Aristarchus had heliocentrism in the third century BC and then the ancient Athenians were like, this can't be because if the Earth is going around the sun, we should see the relative position of the stars
Starting point is 00:18:05 change as we're going around the sun. And the only way that wouldn't be the case is if they're so far away that you don't notice any parallax, which is actually the correct implication. But there's times when actually the implication isn't correct and we just need to graduate to a better level of understanding. So Leibniz would, you know, chide Newton and disagree with Neon's here gravity
Starting point is 00:18:24 on the basis that it implied action at a distance, and we don't know the mechanism. And Newton himself was sort of stunned that inertial mass and gravitational mass were the same quantity. So all these things, which were resolved by Einstein. Yes, yes. But it was still progress. And so the question for a system of peer review for AI would be,
Starting point is 00:18:43 even if you can falsify a theory, how would you notice that it still constitutes progress relative to the thing before? Yeah, so often actually the ultimately correct theory initially is worse in many ways. Yeah, so Copernicus's theory of, of the planets, it was less accurate than Tomley's theory. So geocentrism had been developed for, you know, a millennium by that point. And they had made many, many tweaks and very increasingly complicated ad hoc fixes to make it more and more accurate.
Starting point is 00:19:13 And Copernicus's theory was a lot simpler, but not as accurate. There were only a couple of places where it was more accurate than Ptolemy's theory. I mean, science is always a work in progress. So when you only get part of the solution, it looks worse than a theory which is incorrect but somehow has been completed to the point where it kind of answers all the questions. As you say, Newton's theory had big mysteries, the equivalence of masses and action at a distance, which were only resolved with a very conceptually different approach centuries afterwards. Often progress has been made actually not by adding more theories, but by deleting some
Starting point is 00:19:59 assumptions that you have in your mind. So, you know, one reason why geocentrism held up for so long is we had this idea that objects naturally want to stay at rest. This is the Aristotelian notion of physics. And so if the Earth was moving, you know, how come we weren't all falling over? Once you have Newton's laws of motion, you know, an object in motion remains in motion and so forth, then it makes sense. But conceptually, it's a very big leap to realize that the Earth
Starting point is 00:20:29 is in motion. It doesn't feel like it's in motion. And like the biggest advances, you know, Darwin's theory of evolution is the idea that species are not static. But, you know, it's not obvious, because you don't see evolution in your lifetime. Well, now we actually can. But, you know, it seems permanent and static. You know, right now we're going through a cognitive version of the Copernican revolution, where we used to think
Starting point is 00:21:03 that human intelligence is the center of the universe, and now we're actually seeing that there are very different types of intelligence out there with very different strengths and weaknesses. And so our assessment of which tasks require intelligence and which ones don't has to be reordered quite a bit. And so, you know, in trying to fit AI into sort of our theories of scientific progress and what is hard and what is easy, we're struggling quite a lot. We have to ask questions that we've never really had to ask before.
Starting point is 00:21:32 Or maybe the philosophers had, but now we all have to deal with it. This actually brings up a topic I've been very curious about. So you mentioned Darwin's theory of evolution. There's this book, The Clockwork Universe, by Edward Dolnick, which covers a lot of this era of history we're talking about. And he has this interesting observation in there that The Origin of Species is published in 1859. The Principia Mathematica is published in 1687. So The Origin of Species comes out basically two centuries after the Principia.
Starting point is 00:21:58 And conceptually, it seems like Darwin's theory is simpler. There's a contemporaneous biologist to Darwin who reads the origin of species Thomas actually. And he says, how stupid not to have thought of that? And nobody ever says that about friendshipia. The chiding themselves are not having beaten you to gravity. And so there's a question of, well, why did it take? longer. It seems like a big part of the reason is that the evidence for natural selection is cumulative and retrospective, whereas Newton can just like, here's my equations,
Starting point is 00:22:27 let me see the moon's orbital period and its distance. And if it lines up, then we've made progress. And so Lucretius actually had the idea, this idea that species adapted to their environment in the first century, BC. But nobody ever like really talks about it until Darwin, because Lucretius can't run some experiment and people are forced to pay attention. And so I wonder if wheel and retrospect end up seeing much more progress in domains which have this kind of tight data loop
Starting point is 00:22:56 where you can verify them quite easily even though they're conceptually much more difficult. I think one aspect of science is it's not just creating a new theory and validating it, but communicating it to others. So Darwin was actually an amazing science communicator. He wrote in English, in natural language, I'm speaking like a... In Olean.
Starting point is 00:23:19 Okay, yeah, okay. I have to sort of get out of my technical mindset. Yeah, okay. He spoke in plain English. You know, he didn't use equations. And he synthesized a lot of, you know, disparate facts. Yeah, so, you know, little pieces of evolution had been worked out in the past, but he had this very compelling vision.
Starting point is 00:23:40 And again, still missing things. Like he didn't know the mechanism for heredity. He didn't have DNA. Yeah, but his writing style was persuasive, and that helped a lot. Newton wrote in Latin. He had invented entire new areas of mathematics just to explain what he was doing. He was also from an era where scientists were much more secretive and competitive. So, you know, academia is still competitive.
Starting point is 00:24:08 But it was even worse back in Newton. Day. So he held back some of his best insights because he didn't want his rivals to get any advantage. He was also actually somewhat unpleasant person from what I what I gather. So it was actually only a couple of decades after Newton where other scientists explained his work in much simpler terms that they became widespread. So yeah, the art of exposition and making a case and creating a narrative is also a very important about a science. And if you have the data, it helps, but people need to be convinced,
Starting point is 00:24:49 otherwise they will not push it further, or they won't make the initial investment to learn your theory and really explore it. And that's another thing which is really hard to reinforcement-learn on. How can you score how persuasive you are? Well, there are entire marketing departments who are trying to do this.
Starting point is 00:25:06 So maybe it's good that AI are not yet optimized to be persuasive. So, yeah, there's a social aspect to science. Even though we pride us also having an objective side to it where there's data and there's experiment and validation, we still have to tell stories and convince our fellow scientists. And that's a soft, squishy thing. Like it's a combination of data and, yeah,
Starting point is 00:25:39 and painting a narrative, and papering over gaps, you know. I mean, even Darwin, as I said, there were pieces of his theory he could not explain. But he could still make a case that, you know, in the future, people would find transitional forms, that they would find the mechanisms of inheritance. And they did. Yeah. I don't know how you can quantify that in such a precise way that you can start to reinforce something. Maybe that will forever be the human side of science. One takeaway I had from reading and watching your stuff on the cosmic distance ladder... By the way, I highly, highly, highly recommend people watch your series with 3Blue1Brown on the Cosmic Distance Ladder.
Starting point is 00:26:21 But one takeaway was that the deductive overhang in many fields could be so much bigger than people realize where if you just had the right insight about how to study a problem, you might be surprised at how much more you could learn about the world. And I wonder if you think that's sort of a product of astronomy at the particular times and history that you're studying or is just that based on the data that is incident on the Earth right now, we could actually divine a lot more than we happen to know. Right. So astronomy was one of the first sciences to really embrace data analysis and squeezing every last possible drop of information out of the information they had. because data was the bottleneck.
Starting point is 00:27:05 I mean, it still is the bottleneck. I mean, it's really hard to collect astronomical data. So astronomers are the best, you know, almost, world class in extracting, you know, almost like Sherlock. You know, it's just like extracting all kinds of conclusions from little traces of data. I hear that a lot of quant hedge funds, they're preferred hires in astronomy PhD.
Starting point is 00:27:28 I understand. They also are very interested, for other reasons, in extracting signals from various random bits of data. Okay, speaking of clever ideas, one of my listeners, Sean, solved the puzzle that Jane Street made for my audience and posted a great walkthrough on X.
Starting point is 00:27:44 For context, Jane Street trained a Resnet and then shuffled all 96 layers and then challenged people to put them back in the right order using only the model's outputs and training data. You can't brute force this. There's more possible orderings than atoms in the universe. So Sean broke the problem into two different parts. First, pair the layers into 48 different blocks, and second, put those blocks in the right order.
Starting point is 00:28:06 For pairing, Sean realized that in a well-trained resonant, the product of two weight matrices in a residual block should have a distinctive negative diagonal pattern. And this arises as a way for the model to keep the residual stream from growing out of control. From this insight, he was able to recover the right pairings. For ordering, Sean noticed that the model seemed to improve if he sorted the blocks by the size of the residual contributions. Starting with that rough approximation, he combined a clever ranking heuristic with local swaps to recover the exact right order. His full walkthrough is linked in the description. Don't worry if you didn't get to this puzzle in time, though.
Starting point is 00:28:40 There's still one up, about backdoored LLMs, that even Jane Street doesn't know how to solve. You can find it at janestreet.com/dwarkesh. All right, back to Terence. We do under-explore how to extract extra information from various signals. To pick one random study: I remember reading once that people were trying to measure how often scientists actually read the papers that they cite.
Starting point is 00:29:11 So how do you measure this? You can try to survey scientists, but they had a clever trick. Many citations have little typos: a number is wrong, or a punctuation symbol is wrong. And they measured how often a typo got copied from one reference to the next, so they could infer whether an author was just cutting and pasting a reference without actually checking it. And from that, they were able to infer some measure of how much attention people were paying.
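As a toy illustration of this trick, with entirely invented data: a shared misprint propagating through citations gives a lower bound on how many authors copied the reference without checking the original.

```python
from collections import Counter

# Suppose many papers cite the same target work. The correct starting page
# is 123, and 132 is a known misprint introduced by one early paper.
# The citation list below is invented purely for illustration.
cited_pages = [123, 132, 123, 132, 132, 123, 132, 123, 132, 123]

counts = Counter(cited_pages)
# Authors who repeat the misprint almost certainly copied the reference
# rather than checking the original, giving a lower bound on copying.
copied_fraction = counts[132] / len(cited_pages)
print(copied_fraction)  # 0.5
```

The real studies work the same way, just with thousands of references and several distinct typo lineages.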
Starting point is 00:29:46 So there are also clever tricks to extract this. These questions you posed earlier, of how we can assess whether a scientific development is fruitful or interesting or represents real progress: maybe there are really useful metrics, or footprints of this phenomenon in the data. We can examine citations, and how often something is mentioned at a conference, things like that. Maybe there's a lot of sociology-of-science research to be done that could actually detect these things. Maybe we should even get some astronomers on the case. Okay, so I think this brings us nicely to the progress that, from the outside, it seems like AI for math is making.
Starting point is 00:30:39 And I think you had a post recently where you pointed out that over the last few months, AI programs have solved 50 out of the 1,100-odd Erdős problems. But then, I don't know if it's still correct, but as of a month ago, you said that there had been a pause because the low-hanging fruit had been picked. First of all, I'm curious whether it's still the case that we have picked the low-hanging fruit and are now at a plateau. It does seem so.
Starting point is 00:31:03 I mean, there's still activity out there. Yeah, 50-odd problems have been solved with AI systems, which is great, but there are like 600 to go, and people are still chipping away at one or two of these right now. But we're seeing a lot fewer pure AI solutions now, where the AI just one-shots the problem. There was a month where that happened, and that has stopped.
Starting point is 00:31:27 Not for lack of trying. I know of three separate attempts to get frontier AI models to attack every single one of the problems somewhat seriously. And they picked out some minor observations, or maybe they found that some problems were already solved in the literature, but there hasn't been any further purely AI-powered solution yet. People are still using AI a lot, currently.
Starting point is 00:31:50 So someone might use AI to generate a possible proof strategy, and then another person will use a separate AI tool to critique it or rewrite it, or generate some numerical data for it, or do a literature survey. And some problems have been solved by an ongoing conversation between lots of humans and lots of AI tools. But it does seem like the one-shot burst was a one-off thing. So maybe one analogy for these problems: imagine you're in some mountain range with all kinds of cliffs and walls. Maybe there's a little wall which is three feet high, and one that's six feet high, and then one that's 15 feet high, and then there are some mile-high
Starting point is 00:32:36 cliffs. And you're trying to climb as many of these cliffs as possible. But it's in the dark: we don't know which ones are tall and which ones are short. So we try to light some candles and make some maps, and slowly we figure out that some of them are climbable, and for some of them we can identify a partial route up the wall that you can reach first. And then these AI tools are kind of like jumping machines that can jump
Starting point is 00:33:04 two meters in the air, higher than any human. And sometimes they jump in the wrong direction, and sometimes they crash, but sometimes they can reach the tops of the lowest walls that we couldn't reach before. And so we basically set them loose in this mountain range, hopping around, and then there was this exciting period where they could actually find all the low ones and reach them.
Starting point is 00:33:29 But then there's been nothing since. Maybe the next time there's a big advance in the models, they will try again and maybe a few more will be breached. But it's a different style of doing mathematics than the usual one, where normally we would hill-climb and we would make little
Starting point is 00:33:51 markers and identify partial things, whereas these tools either succeed or they fail, and they've been
Starting point is 00:34:00 really bad at creating partial progress or identifying intermediate stages that you should focus on first.
Starting point is 00:34:08 Again, going back to our previous discussion, we don't have a way of evaluating partial progress the same way
Starting point is 00:34:15 you can evaluate the one-shot success or failure of solving a problem. So there are two different ways to think through what you've just said.
Starting point is 00:34:22 And one of them is more bearish on AI progress and one of them is more bullish. The bearish read being: oh, they're only getting to a certain height of wall, which is not as high as humans are reaching. And the bullish one is that, well, they have this powerful property that once they achieve a certain waterline, they can solve every single problem that is available at that waterline, which we simply can't do with humans, where we can't make a million copies of you and give each of them a million dollars of inference compute and have you do 100 years of subjective-time research on
Starting point is 00:34:54 100 different problems at the same time, or a million different problems at the same time. But once AIs reach Terence Tao level, they could do that. And once they reach intermediate levels, they could do the intermediate version of that. So the same reason that we should be bearish now is the reason we should be especially bullish, not even when they achieve superhuman intelligence,
Starting point is 00:35:13 but just when they achieve human-level intelligence, because their human-level intelligence is qualitatively wider and more powerful than our human-level intelligence. I agree. So they excel at breadth and humans excel at depth, human experts at least. So I think they're very complementary. But our current way of doing math and science is focused on depth, because that's where the human expertise is, because humans can't do breadth. So we have to redesign the way we do science to take full advantage of this breadth capability that we now have. So as I said,
Starting point is 00:35:50 we should put a lot more effort into creating very broad classes of problems to work on, rather than one or two really deep, important problems. We should still have the deep, important problems, and humans should still be working on them. But now we have this other way of doing science. We can explore entire new fields of science by first getting these broad, moderately competent AIs to map them out and make all the easy observations,
Starting point is 00:36:21 and then identify certain islands of difficulty, which human experts can then come and work on. So I see very much a future of very complementary science. Eventually, you would hope to get both breadth and depth, and somehow get the best of both worlds. But I think we need practice with the breadth side. It's too new.
Starting point is 00:36:46 We don't even have the paradigms yet to take full advantage of it. But we will, and then science will be unrecognizable after that. To this point about complementarity: programmers have noticed that they're way more productive as a result of these AI tools. And I don't know if you as a mathematician feel the same way,
Starting point is 00:37:08 but it does seem like one big difference between vibe coding and vibe researching is that with software, the whole point of the thing is to have some effect on the world through your work, and if it leads to you better understanding a problem or coming up with some clean abstraction to embody in your code, that is instrumental to the end goal. Whereas maybe with research, the reason we care about solving the Millennium Prize Problems is presumably that in the process of solving them, we discover new
Starting point is 00:37:38 mathematical objects or new techniques, and those advance our civilization's understanding of mathematics. And so the proof is instrumental, and the intermediate work is the real goal. I don't know if you agree with that dichotomy, or whether it in any way explains the relative uplift we'll see in software versus research. Right. So certainly in math, the process is often more important than the problem itself. The problem is kind of a proxy for measuring progress. I think even in software, there are different types of software tasks. If you just create a webpage that does the same thing that a thousand other webpages do,
Starting point is 00:38:17 there's sort of no skill to be learned. Well, there's still some skill, maybe, that the individual programmer could pick up. But boilerplate-type code is definitely something you should offload to AI. But sometimes, once you make the code, you still maintain it, and there are issues with upgrading it and making it compatible with other things. And I've heard programmers reporting that even if an AI can create the first prototype of a
Starting point is 00:38:50 tool, making it mesh with everything else and making it interact with the real world the way they want is an ongoing process. And if you didn't pick up the skills from writing the code yourself, that may impact your ability to maintain it down the road. So certainly in mathematics, we've used problems to build intuition and to train people to have a good idea of what's true, what to expect, what is provable, what is difficult. And so just getting the answers right away
Starting point is 00:39:28 may actually inhibit that process. So I made a distinction between theory and experiment before. In most sciences, there's an equal division between a theoretical side and an experimental side. But math has been almost unique in that it's almost entirely theoretical. We place a premium on trying to have coherent, clean theories of why things are true and false.
Starting point is 00:39:56 And we haven't done many experiments, like: maybe we have two different ways to solve a problem; which one is more effective? We have some intuition, but we haven't done large-scale studies where we take a thousand problems and just test them. But we can do that now.
Starting point is 00:40:12 So I think AI-type tools really will revolutionize the experimental side of math, where you don't care so much about individual problems and the process of solving them, but you want to gather large-scale data about what works and what doesn't. The same way that if you're a software company and you want to roll out a thousand pieces of software, you don't want to handcraft each one and learn lessons from each; you want to find the workflows that let you scale. The idea of doing mathematics at scale is still in its infancy, but that's where AI is really going to revolutionize the subject.
Starting point is 00:40:52 Interesting. I feel like a big crux in these conversations about how good AI will be for science is, I think you said this, that they're using existing techniques and modifying them. And it would be interesting to understand how much progress one can make simply from using existing techniques. If I looked at the top math journals, how many of the papers are coming up with new techniques, whatever "coming up with a technique" means, versus applying existing techniques to new problems? And what is the overhang: if you just applied every known technique to every open problem, would that constitute a humongous uplift in our civilization's knowledge? Or would it not be that impressive and useful? This is a great question.
Starting point is 00:41:37 We don't have the data to fully answer it yet. Certainly, a lot of the work that human mathematicians do: when you take on a new problem, one of the first things we do is look at all the standard things that have worked on similar problems in the past, and we try them one by one. And sometimes that works, and that's still worth publishing sometimes, because the question was important. Sometimes they almost work and you have to add one more wrinkle, and that's also interesting. But the papers that go into the top journals are usually ones where the existing methods can solve maybe 80% of the problem, but there's this 20% which is resistant.
Starting point is 00:42:16 And a new technique has to be invented to fill in the gaps. It's very, very rare now that a problem gets solved with no reliance on past literature, where all the ideas come out of nowhere. That was more common in the past, but math is so mature now that it's just so much of a handicap to not use the literature. So AI tools are getting really good at the first part of that, just trying all the standard techniques on a problem, often now actually making fewer mistakes in implementing them than humans. They still make mistakes, but I've tested these tools on little tasks that I can do myself. Sometimes they pick up errors that I make; sometimes I pick up errors that they make. It's about a tie right now. But I haven't yet seen them take the next step. So when there are holes in the argument, where none of the standard things are working,
Starting point is 00:43:22 then what do you do? They can suggest random things, but often I find that trying to chase those down and make them work, only for them not to, wastes more time than it saves. So I think some fraction of problems that we currently think are hard will fall to this method, especially the ones that haven't received enough attention. Like with the Erdős problems: almost all of the 50 problems that were solved by AIs were ones for which there was basically no literature. I mean,
Starting point is 00:43:55 Erdős just posed the problem once or twice. Maybe some people tried it casually and couldn't do it, but they never wrote anything up. But it turned out that there was a solution, and it was maybe just combining one obscure technique that not many people know about with some other result in the literature. And that's kind of the median level of what AI can accomplish. And that's really great: it clears out 50 of these problems.
Starting point is 00:44:20 So I think you'll see some isolated successes. But what we found, now that people have done large-scale sweeps of these Erdős problems, is that if you only focus on the success stories that get broadcast on social media, it looks amazing: all these problems
Starting point is 00:44:35 that hadn't been solved for decades are now falling. But whenever we do a systematic study, on any given problem an AI tool has a success rate of maybe 1 or 2%. It's just that they can operate at scale, and if you just pick the winners, it looks great. So I think there will be a similar thing happening with the hundreds of really prestigious, difficult math problems out there. Some AI may get lucky and actually solve them.
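The selection effect described here is easy to see numerically. This toy simulation, with all numbers invented for illustration, shows how a roughly 1.5% per-problem success rate, swept over a thousand problems, still generates a steady stream of publicizable wins:

```python
import random

# A tool with a ~1.5% per-problem success rate, applied at scale.
# The failure side of the ledger is what never makes it to social media.
random.seed(0)
n_problems, p_success = 1000, 0.015

solved = sum(random.random() < p_success for _ in range(n_problems))
# Expected wins: n_problems * p_success, i.e. around 15 headline results,
# against roughly 985 unreported failures.
print(f"{solved} wins to publicize, {n_problems - solved} quiet failures")
```

If only the wins circulate, the tool looks far stronger than its per-problem success rate warrants, which is why systematic sweeps matter.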
Starting point is 00:45:03 And there was some backdoor to the problem that everyone else missed, and that will get a lot of publicity. But then people will try these fancy tools on their own favorite problem, and they will again experience the 1 to 2% success rate. Right. So there will be a lot of noise amongst the signal of when they're working and when they're not. So it's increasingly important to collect really standardized data sets. There are efforts now to create a standard set of challenge problems for AI to solve,
Starting point is 00:45:37 and not just rely on the AI companies to only publish their wins and not disclose their negative results. So that will maybe give more clarity as to where we actually are. Well, I think it's worth emphasizing how much progress in AI it already constitutes to have models that are capable of applying some technique that nobody had written down as applicable to this particular problem. The progress is simultaneously amazing and disappointing. It is a very strange feeling to see these tools in action
Starting point is 00:46:07 and also to get acclimatized really quickly. I remember when Google's web search came out 20 years ago, and it just blew all the other search engines out of the water. You were just getting relevant hits on the front page, almost exactly what you wanted, and it was amazing. And then after a few years, you just took for granted that you could Google anything. And yeah, 2026-level AI would be stunning in 2021.
Starting point is 00:46:34 And a lot of it, face recognition, natural speech, doing college-level math problems, we just take for granted now. Right. Yeah. Okay. So speaking of 2026: you made a prediction in 2023 that by 2026, AI would be, what was it, like a colleague in mathematics? Yeah, a trustworthy co-author, if used correctly. Good. Which is looking pretty good in retrospect. Yeah, I'm pretty pleased. So, you know, let's continue the streak.
Starting point is 00:47:11 In what year would you say you personally will be 2x more productive as a result of AI? Yeah, so productivity, I think, is not quite a one-dimensional quantity. I'm definitely noticing that the style in which I do mathematics is changing quite a bit, and the type of things I do. For example, my papers now have a lot more code and a lot more pictures, because it's so easy to generate these things now. So some plot which would have taken me hours to do, I can now do in minutes. But in the past, I just wouldn't have put the plot in my paper in the first place; I would just talk about it in words. So it's hard to measure what 2x means. On the one hand, the type of papers that I write today,
Starting point is 00:47:51 if I had to do them without AI assistance, would definitely take five times longer. Interesting. But I would not have written my papers that way. 5x, yeah. But it's because these are auxiliary tasks: things like doing a much deeper literature search, or supplying a lot more numerics. They enrich the paper.
Starting point is 00:48:17 But the core of what I do, actually solving the most difficult part of a math problem, hasn't changed too much. I use pen and paper for that. But there are lots of little things. I use an AI agent now to reformat: sometimes my parentheses are not quite the right size, and where I used to change them by hand, I can now get an AI agent to do all that quite nicely in the background. So they've really sped up lots of secondary tasks. They haven't yet sped up the core thing that I do.
Starting point is 00:48:53 But it's allowed me to add more things to my papers. By the same token, if I were to write a paper I wrote in 2020 again, and not add all these extra features but just have something at the same level of functionality, then it doesn't save that much time, to be honest. So it's made the papers richer and broader, but not necessarily deeper. You made this distinction between artificial cleverness and artificial intelligence, and I would like to better understand those concepts. What is an example of intelligence that is not just cleverness? Yeah.
Starting point is 00:49:36 So intelligence is famously hard to define. It's one of these things where you know it when you see it. But when I talk to someone and we're trying to collaboratively solve a math problem together, there's this conversation where neither of us knows how to solve the problem initially, but one of us has some idea and it looks promising. And so then we have some sort of prototype strategy. And then we test it and it doesn't work, but then we modify it.
Starting point is 00:50:10 And there's some adaptivity and continuous improvement of the idea over time. And eventually we've systematically mapped out what doesn't work and what does, and we can kind of see a path forward. But it's evolving with our discussion, and this is not quite what the AI does. The AI can mimic this a little bit. So to go back to this analogy of the jumping robots: they can jump and fail, and jump and fail, and jump and fail.
Starting point is 00:50:44 But what they can't do is jump a little bit, reach some handhold, stay there, pull other people up, and then try to jump from there. There isn't this cumulative
Starting point is 00:51:03 trial and error and just repetition brute force which it scales and it can work amazingly well in certain contexts but
Starting point is 00:51:14 Yeah, this idea of building up cumulatively from partial progress is what's still not quite there yet. Interesting.
Starting point is 00:51:23 You're saying that if Gemini 3 or Claude 4.5, whatever, solves a problem, it is not the case that its own understanding of math has progressed. Or even if it works on a problem without solving it, it's not that its own understanding of math has progressed.
Starting point is 00:51:36 Yeah, you run a new session and it has forgotten what it just did. It has no new skills to build on for related problems. Maybe what you just did becomes some vanishingly small fraction of the training data for the next generation, so maybe eventually some of it gets absorbed.
Starting point is 00:51:53 But yeah. So Terence talks about the importance of decomposing particularly gnarly problems into a series of easier chunks. Even if this doesn't result in a full solution, approaching problems this way helps you build up the intuitions and practice the techniques that you'll need to keep making progress. But models today tend to struggle with these kinds of problem-solving techniques. That's where Labelbox comes in.
Starting point is 00:52:14 Labelbox helps you train models not just to get the right answer, but to think the right way. They've operationalized these reasoning behaviors into rubrics, giving you the ability to evaluate every important dimension of a model's output. These rubrics go beyond simple correctness. Did the model reach for the right tools? Did it check its own work and explore alternative paths? How clear was its response?
Starting point is 00:52:35 These skills are useful across domains: math, physics, finance, psychology, and more. And they're becoming increasingly important as models take on harder open-ended problems, some of which have multiple solutions and some of which we don't even know the solutions to. Labelbox can get you rubrics tailored to your domain, helping you systematically measure and shape how your models think. Learn more at labelbox.com/dwarkesh.
Starting point is 00:53:05 One big question I have is how plausible it is that if we just keep training AI to get better and better at solving problems in Lean, they will continue to solve more and more impressive problems, and then we will, in retrospect, be surprised at how little insight we got from some Lean solution to proving the Riemann hypothesis or something. Or do you think it is a necessary condition of solving the Riemann hypothesis, even by an AI that is doing it entirely in Lean, that the constructions which are made, the definitions which are created, even in the Lean program, have to advance our understanding of mathematics? Or do you think it could just be assembly
Starting point is 00:53:36 code gobbledygook? Yeah, we don't know. Some problems have been basically solved by pure brute force; the four-color theorem is a famous example. We have still not found a conceptually elegant proof of that theorem, and maybe we never will. Some problems may only be solvable by splitting into some enormous number of cases and doing a brute-force, uninsightful computer analysis on each case. Part of the reason we prize problems like the Riemann hypothesis is that we're pretty sure something amazing has to happen: a new type of mathematics has to be created, or a new connection between two previously unconnected areas of mathematics has to be discovered, to make it work.
Starting point is 00:54:21 We don't even know what the shape of the solution is, but it doesn't feel like a problem that will be solved just by exhaustively checking cases. I mean, it could be false, actually. There is an unlikely scenario that the hypothesis is false, and then you could just use compute: oh, here's a zero off the line, and a massive computer calculation verifies it.
Starting point is 00:54:44 That would be very disappointing. I don't know. I do feel that fully autonomous one-shot approaches are not the right approach for these problems. I think you'll get a lot more mileage out of the interplay of humans collaborating with these tools. And I can see one of these problems being solved by some smart humans assisted by some extremely powerful AI tools. But the exact dynamic may be very different from what we envision right now. It could be a collaboration of a type that just doesn't exist yet. There may be a way to generate a million variants of the Riemann
Starting point is 00:55:35 zeta function and do some AI-assisted data analysis, and we discover some pattern connecting them which we didn't know about before, and this lets you transform the problem into a different area of mathematics. There could be all kinds of scenarios. Suppose the AI figures it out, and latent in the Lean is some brand-new construction which, if you realized its significance, we could apply in all these different situations. How would we even recognize it? Again, a very naive question, but if the AI came up with the equivalent of Descartes's idea that a coordinate system can unify algebra and geometry, in Lean code that would just look like R × R, and it wouldn't look that significant. And I'm sure there are other constructions which have this kind of property.
Starting point is 00:56:25 Well, the beauty of formalizing a proof in something like Lean is that you can take any piece of it and study it atomically. So when I read a paper by my fellow humans which solved some difficult problem, there's some big sequence of lemmas and theorems. Ideally, the author will talk their way through what's important and what's not, but sometimes they don't reveal which steps were the important ones and which are just boilerplate, standard steps.
Starting point is 00:56:56 But you can study each lemma in isolation, and for some of them I can say: oh, this looks fairly standard, this resembles something I'm familiar with, I'm pretty sure there's nothing interesting going on here. But this lemma, oh, that's something I haven't seen before, and I can see why, if you had this result, it would really help prove the main result.
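The atomicity Tao describes can be seen even in a trivial Lean 4 sketch (the lemma and names here are invented for illustration, not drawn from any real formalized paper):

```lean
-- A lemma in Lean is an atomic, independently checkable unit: its statement
-- and hypotheses are fully explicit, so a reader (or another tool) can study
-- it in isolation from the rest of a large proof.
theorem step_lemma (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A later step depends on it by name, so the proof's structure is explicit
-- and machine-traceable, which is what makes atomic study possible.
example (n : Nat) : n + 1 = 1 + n :=
  step_lemma n 1
```

In a real formalized proof the same property holds at scale: every lemma carries its exact dependencies, so each one can be inspected, replaced, or ablated independently.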
Starting point is 00:57:13 So you can assess whether some things are really key to your argument or not, and Lean really facilitates that: the individual steps are identified really precisely. I think in the future
Starting point is 00:57:31 there'll be entire professions of mathematicians who take a giant Lean-generated proof and do some ablation on it: try to remove parts of it and find more elegant ways, maybe use some other AIs to do some reinforcement learning on making the proof more elegant, and maybe other AIs to grade
Starting point is 00:57:52 whether this proof looks better or not. One thing that will change quite a bit in the near future: until recently, writing papers was the most time-consuming and expensive part of the job, and so you did it very rarely. You only wrote up your results once all the parts of your argument had checked out, because rewriting it,
Starting point is 00:58:15 refactoring, was just a total pain. But that's one thing that's become a lot easier now with modern AI tools. So you don't have to have just one version of your paper; you can generate hundreds more. So yeah, one giant, messy Lean proof may not be very meaningful or understandable on its own, but other people can refactor it and do all kinds of things with it. We have seen this with the Erdős problem website: an AI will generate a proof, and there will be 3,000 lines of code that verify the proof. But then people got other AIs
Starting point is 00:58:48 to summarize the proof, and people write their own proofs. There's actual post-processing: once you have one proof, we have a lot of tools now to deconstruct it and interpret it. It's a very nascent area of science, or mathematics, but I'm not as worried as some people are, who are concerned:
Starting point is 00:59:11 what if the Riemann hypothesis is proven with a completely incomprehensible proof? I think once you have the artifact of a proof, we can do a lot of analysis on it. You posted recently that it would be helpful to have a formal or semi-formal language for mathematical strategies, as opposed to just mathematical proofs, which is what Lean specializes in. I would love to learn more about what that would involve or look like. We don't really know. I mean, we've been very lucky in mathematics that we have worked out the laws of logic and mathematics, but this is actually a fairly recent accomplishment.
Starting point is 00:59:44 It was started by Euclid millennia ago, but only in the early 20th century did we finally say: here are the axioms of mathematics, the standard axioms of ZFC, and the axioms of first-order logic, and this is what a proof is. And this we've managed to automate and have a formal language for.
Starting point is 01:00:05 But there could be some way to assess plausibility. So you have a conjecture that something is true, you test a few examples, and it works out. How does this increase your confidence that the conjecture is true? We have a few mathematical ways to model this, Bayesian probability, for example, but
Starting point is 01:00:29 you often have to set certain base assumptions, and there's still a lot of subjectivity in these tasks. So it's not clear.
Starting point is 01:00:42 I mean, this is more of a wish than a plan to develop these languages. But we've seen how successful having a formal framework in place,
Starting point is 01:00:54 like Lean, has made deductive proofs so much easier to automate and train AI on. Right now, the bottleneck for using AI to create strategies and make conjectures is that we have to rely on human experts and the test of time to validate whether something's plausible or not. If there were some semi-formal framework where this could be done semi-automatically, in a way that isn't easily hackable... Of course, it's really important with these formal proof systems that there are no back doors or exploits that let you somehow get your certified proof without actually proving it, because reinforcement learning
Starting point is 01:01:44 is just so good at finding these back doors. But yeah, if there were some framework that mimics how scientists talk to each other in a semi-formal way, using data and argument, but also constructing narratives... There's some subjective aspect of science that we don't know how to capture in a way that lets us insert AI into it usefully. Interesting. So yeah, this is a future problem. I mean, there are research efforts to try to create automated conjecturing, and maybe there are ways to benchmark these and get some way to simulate this. But this is
Starting point is 01:02:33 all very, very new science. Can you help me get some intuition? I have a two-part question. One, it would be helpful to have a specific example of what something like this would look like: the way scientists communicate that we can't formalize yet. And two, it seems almost definitionally paradoxical to build up some narrative, some natural-language explanation, and then also have something which could be formalized. I'm sure there's some intuition behind where that overlap is, and I'd love to understand
Starting point is 01:03:17 that better. So, an example of a conjecture. Gauss was interested in the prime numbers, and he created one of the first mathematical data sets: he computed the first 100,000 prime numbers or so, hoping to find patterns. And he did find a pattern, but maybe not the pattern he was expecting. He found a statistical pattern in the primes:
Starting point is 01:03:40 if you count how many primes there are up to 100, 1,000, 1 million and so forth, they get sparser and sparser, but the drop-off in the density is inversely proportional to the natural logarithm of the range of numbers. So he conjectured what we now call the prime number theorem: the number of primes up to x
Starting point is 01:04:01 is like x divided by the natural log of x. And he had no way to prove this; it was data-driven. So this was a conjecture, and it was revolutionary for its time, because it was maybe the first really important conjecture in math that was statistical in nature. Normally you talk about patterns,
Starting point is 01:04:23 like maybe the spacing between the primes having a certain regularity, but this was really something different: it didn't tell you exactly how many primes there were in any given range. It just gave you an approximation that got better and better as you went further and further out. But it helped,
Starting point is 01:04:42 so it started the field of what we call analytic number theory. And it was the first of many conjectures like this, many of which got proved, which started consolidating the idea that the prime numbers don't really have a pattern, that they behave like random sets of numbers with a certain density. I mean, they have some patterns, like being almost all odd, and they're not actually random; they're what's called pseudorandom. There's no random number generation involved in creating the prime numbers. But over time, it became more and more productive to think of the primes as if they were just generated by some
Starting point is 01:05:21 god rolling dice all the time to create this random set, and this allowed us to make all these other predictions. So there's still an open conjecture in number theory called the twin prime conjecture: that there should be infinitely many pairs of primes that are twins, that is, two apart, like 11 and 13. We can't prove that, and there's actually good reason why we can't prove it. But because of this statistical random model of the primes, we are absolutely convinced it's true. We know that if the primes were generated by flipping coins, just by random chance, like infinite monkeys at a typewriter, we would see twin primes appear over and over again.
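Gauss's data-driven observation and the random-model heuristic Tao describes are easy to reproduce; a small sketch (not from the episode; 0.6601618 is the standard twin prime constant from the Hardy-Littlewood conjecture):

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, is_p in enumerate(sieve) if is_p]

N = 1_000_000
primes = primes_up_to(N)

# Gauss's observation (the prime number theorem): pi(x) ~ x / ln(x).
print(len(primes), round(N / math.log(N)))  # 78498 vs 72382: crude, but same order

# Twin primes: if the primes behaved like a random set with density
# 1/ln(x), twin pairs up to N should number on the order of N/(ln N)^2;
# Hardy-Littlewood sharpens this with the constant 2 * 0.6601618...
twins = sum(1 for p, q in zip(primes, primes[1:]) if q - p == 2)
print(twins, round(2 * 0.6601618 * N / math.log(N) ** 2))
```

The measured counts track the heuristic predictions in order of magnitude, which is exactly the kind of statistical (rather than exact) pattern the conversation is about.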
Starting point is 01:05:57 And we have over time developed this very accurate conceptual model of what the primes should behave like, based on statistics and probability. It's mostly heuristic and non-rigorous, but extremely accurate. The few times when we can prove things about the primes, it has matched up with the predictions of
Starting point is 01:06:21 what we call the random model of the primes. So we have this conjectural conceptual framework for understanding the primes that everyone believes in. It's the same reason why we believe the Riemann hypothesis is true, and why we believe that cryptography based on the primes is mathematically secure. It's all part of this belief. In fact, one reason why we care about the Riemann hypothesis
Starting point is 01:06:41 is that if the Riemann hypothesis failed, if we knew it was false, it would be a serious blow to this model. It would mean there's a secret pattern to the primes that we were not aware of. And I think we would very rapidly abandon any cryptography based on the primes, because if there was one pattern we didn't know about, there are probably more, and these patterns can lead to exploits in crypto. Yeah, it would be a big, big shock, so we really want to make sure that doesn't happen.
Starting point is 01:07:12 So yeah, we've become convinced of things like the Riemann hypothesis over time. Some of it is experimental evidence; some of it is that the few times we've been able to get theoretical results, they've always aligned. It is possible that the consensus is wrong and we've all just missed something very basic; there have been paradigm shifts in scientific history. But we don't really have a way of measuring this, I think partly because we don't have enough data on how math, or science, develops.
Starting point is 01:07:46 We have one timeline of history, and we have maybe 100 stories of turning points in history. If we had access to a million alien civilizations, each with its own development of the history of science, in different orders, then maybe we'd actually have a decent shot at understanding how to measure
Starting point is 01:08:08 what progress is and what a good strategy is, and we could maybe start formalizing it and actually having a framework. Maybe what we need to do is start creating lots of mini-universes, simulations of AIs solving very basic problems in arithmetic or whatever, but coming up with their own strategies for doing these things, and having these little laboratories to test. I mean, there are people who investigate what's the smallest neural network
Starting point is 01:08:38 that can do ten-digit multiplication and things like that. I think we could actually learn a lot just from evolving small AIs on simple problems. We could learn a lot. I was super excited when Mercury reached out about sponsoring the podcast, because I've been banking with them for years. I think I opened my first account with them in 2023. Something I've come to appreciate over the last few years
Starting point is 01:08:58 is that Mercury is constantly updating things and adding new features. Take their newest feature, Insights. Insights summarizes your money in and out, showing you your biggest transactions and calling out anything that deserves extra attention. Like maybe your revenue from a particular partner has gone down, or you've got a big uncategorized purchase that needs to be investigated. It's a super low-friction way for me to keep tabs on my business and make quick decisions.
Starting point is 01:09:19 For example, I try to invest any cash that I don't need on hand to keep running the business. With Insights, with just a couple of clicks, I was able to see exactly how much money I spent in each month of 2025. That lets me know exactly how much cash I'll need for the next year or so of operations, and then I can go invest the rest. Mercury just keeps adding new features like this. Go to mercury.com to check it out. Mercury is a fintech company, not an FDIC-insured bank. Banking services provided through Choice Financial Group and Column N.A., Members FDIC.
Starting point is 01:09:47 You have to learn about new fields not only very rapidly, but deeply enough to contribute to the frontier. So in some sense, you're also one of the world's greatest autodidacts. What is your process for learning about a new subfield in math? What does that look like? Yeah, so I certainly identify with that. We talked about depth and breadth before, and it's not a purely human-versus-AI distinction; humans also split. I think it was Isaiah Berlin who split them into hedgehogs and foxes: a hedgehog knows one thing very, very well, and a fox knows a little bit about everything. I think of myself as a fox. I work with hedgehogs a lot, and sometimes I can be a hedgehog if need be.
Starting point is 01:10:39 I've always had a little bit of an obsessive streak. If there's something I read about which I feel like I should understand, which I have the capability to understand, but I don't understand why it works, there's some magic in it. Say someone was able to use a type of mathematics I'm not familiar with to get a result
Starting point is 01:10:56 which I would like to prove, and I can't do it by myself, but they could do it by their method. Then I want to find out what their trick was. It bugs me that someone else can do something which I think I can do, but I can't. So I've
Starting point is 01:11:09 always had that kind of obsessive, completionist streak. I've had to wean myself off computer games, because if I start a game, I want to play it to completion, all the levels. So that's one way in which I learn new fields. I also collaborate with a lot of people who have taught me other types of mathematics. I just make friends with another mathematician who's working in another area of mathematics, and I find their problems interesting,
Starting point is 01:11:38 but they have to teach me some of the basic tricks, and what's known and not known, and I learn a lot from that. I've also found that writing about what I've learned helps. I have a blog where I sometimes record things that I've learned, because in the past, when I was younger, I would learn something, including some core trick, and say, okay, I'm going to remember this.
Starting point is 01:12:02 And six months later, I'd have forgotten it. I remember remembering it, but I can't reconstruct the argument. The first few times, it was so frustrating to have understood something and then lost it that I resolved I should always write down anything cool that I've learned. And this is part of how this blog came about.
Starting point is 01:12:22 How long does it take you to write a blog post? It's something I often do when I don't want to do other work. You know, like there's some referee report, something that feels slightly unpleasant to do at the time, and writing a blog post feels creative and fun, like it's something that I do for myself. So maybe, depending on the topic, it could be a quick half an hour, or several hours.
Starting point is 01:12:46 Because it's something that I do voluntarily, time flies when I write these things, as opposed to doing something which I have to do for administrative reasons, which is just drudgery. Those are the tasks that AI is really helping with nowadays, actually. If civilization could, from first principles, decide how to use Terry Tao's time, you know, as a limited resource: what is the biggest difference between
Starting point is 01:13:18 what the veil of ignorance would decide about how to use Terry Tao's time, versus what happens now? Okay. This podcast wouldn't be happening. Yeah, as much as I complain about certain tasks that I don't want to do but have to do... As you get more senior in academia, you get more and more responsibilities.
Starting point is 01:13:37 I get on more committees and whatever. But I have also found that a lot of events that I reluctantly went to, because I was obliged to for one reason or another, were outside my comfort zone, and I would often have interactions with people I wouldn't normally talk to, like you, for instance. And I would learn interesting things and have interesting experiences, and I would have opportunities to network with other people that I would never
Starting point is 01:14:05 have done to for. So I do believe a lot in sanctity I mean I do optimize my time
Starting point is 01:14:12 in when I so there's some portions of my day where I do schedule
Starting point is 01:14:18 very carefully but I have been willing to leave some portions just okay
Starting point is 01:14:24 I'm going to do something which is which is not my usual thing and maybe it'll be
Starting point is 01:14:28 a waste my time but maybe I will learn something and more often
Starting point is 01:14:33 than not it's I've I feel like I've gotten a positive experience which is not something I would have planned for. And so I believe a lot in servendipity. And maybe there's a danger actually that in the modern society, it's not just AI,
Starting point is 01:14:49 but we've become really good at optimizing everything, and maybe we are over-optimizing. With COVID, for example, we switched a lot to remote meetings, so everything was scheduled. And we kept busy; at least in academia, we met almost the same number of people that we'd met in person,
Starting point is 01:15:14 but everything had to be planned; you had to schedule things in advance. What we lost out on was the casual: knocking on a door in the hallway, just meeting someone while getting a coffee. These serendipitous interactions may seem not optimal, but they're actually really important. When I was a grad student, if I wanted a journal article,
Starting point is 01:15:40 I had to physically go down to the library, check out the journal, and read the article. And sometimes you could just browse through, and the next article was also interesting. Sometimes it wasn't,
Starting point is 01:15:52 but you could accidentally find interesting things, which is something that has basically been lost now. If you want to access an article today, you just type it into a search engine, or even an AI, and you instantly get what you want, but you don't get the accidental things
Starting point is 01:16:08 that you might have gotten if you'd done it more inefficiently. So yeah. I once spent a year at the Institute for Advanced Study, which is a great place: there are no distractions, you're there to just do research. And the first few weeks you're there, it's great. You're getting all these papers written up that you've been wanting to do for a long time.
Starting point is 01:16:34 You're thinking about problems in blocks of hours at a time. But I find if I stay there for more than several months, I run out of inspiration somehow. I get bored; I actually surf the internet a lot more. You actually do need a certain level of distraction in your life. It somehow adds enough randomness, enough temperature, high temperature if you need it. So yeah, I don't know the optimal way to schedule my life. It just seems to work.
Starting point is 01:17:05 I'm very curious when you expect AIs that can actually do frontier math better than, or at least as well as, the best human mathematicians. I mean, in some ways they're already doing frontier math, superintelligent things that humans
Starting point is 01:17:20 can't do, but it's a different frontier from what we're used to. I mean, you could argue that calculators were doing frontier math that humans could not accomplish, but it was just number crunching. But replacing Terry Tao
Starting point is 01:17:35 completely? I mean, what do you want me for? I'd just go on all the podcasts afterwards. I'm not sure; it might not be the right question to ask. I think
Starting point is 01:17:57 within a decade, a lot of the things that mathematicians currently do, where we spend the bulk of our time, and a lot of the stuff we put in our papers today, can be done by AI. But we will find that that actually wasn't the most important part of what we do. You know, 100 years ago,
Starting point is 01:18:19 a lot of mathematicians were just solving differential equations. Physicists needed some exact solution to some system, and they hired a mathematician to laboriously go through the calculus and work out the solution to this fluid equation or whatever. A lot of what a 19th-century mathematician would do, you could now do with a call to Mathematica or Wolfram Alpha or a computer algebra system, or now even more easily an AI, and it would just solve the problem in a few minutes.
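The kind of laborious hand calculation Tao describes can be illustrated with a toy example (my own, not from the episode): solving y' = y by stepping through it numerically, the way a hired human computer would, and checking against the closed form e^t that a computer algebra system now returns instantly.

```python
import math

def euler(f, y0, t0, t1, steps):
    """Fixed-step Euler integration of y'(t) = f(t, y): the laborious,
    step-by-step calculation that used to be done by hand."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y += h * f(t, y)
        t += h
    return y

# Solve y' = y with y(0) = 1 out to t = 1; the exact answer is e.
approx = euler(lambda t, y: y, 1.0, 0.0, 1.0, 100_000)
print(approx, math.e)  # the Euler estimate converges to e as steps grow
```

The contrast between the loop above and a one-line symbolic solve is the "we moved on" point that follows.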
Starting point is 01:18:50 But we moved on. We worked on different types of problems after that. Once computers came along... computers used to be human, you know. People used to laboriously create log tables and work out primes, as Gauss did. And that has all been outsourced to computers. But we moved on. In genetics, sequencing the genome of a single organism used to be an entire PhD for a geneticist.
Starting point is 01:19:19 So carefully separating all the chromosomes and whatever. And now you can just spend $1,000, send it to a sequencer, and get it done. But genetics is not dead as a subject; you move to a different scale. Maybe you study whole ecosystems rather than individuals. I take your point. But on the question of,
Starting point is 01:19:36 well, when is most mathematical progress, almost all mathematical progress, happening by AI? So if you find out that, oh, this year a Millennium Prize problem has been solved,
Starting point is 01:19:44 you would put 95% odds that an AI did it autonomously. Surely there will be such a year. I guess. I mean, I do believe that hybrid human-plus-AI
Starting point is 01:19:58 will dominate mathematics for a lot longer. It will depend; it will require some additional breakthroughs beyond what we already have, so it's going to be stochastic. You know, I think AIs are currently very good at certain things, but really terrible at others.
Starting point is 01:20:15 And while you can add more and more frameworks on top to reduce the error rates and make them work with each other a bit more and so forth, it feels like we don't yet have all the ingredients to have a truly satisfactory replacement for all intellectual tasks.
Starting point is 01:20:37 It is complementary currently; it's not a replacement. But maybe, because the current level of AIs will accelerate science in so many ways, hopefully new discoveries and new breakthroughs will happen more quickly. I mean, it's also possible that
Starting point is 01:21:02 by somehow destroying serendipity, we actually inhibit certain types of progress. Anything is possible, really, at this point. I think the world is very, very unpredictable at this point in time. What is your advice to somebody who is considering a career in math, or is early in a career in math,
Starting point is 01:21:21 especially in light of AI progress? How should they be thinking about the career differently, if at all, as a result of AI progress? Yeah, so we live in a time of change. As I said, we live in a particularly unpredictable era, and in time, things that we've taken for granted for centuries may not hold anymore. So the way we do everything, not just mathematics, will change. In many ways, I would prefer a much more boring, quiet era where things are much the same as they were 10 years ago, 20 years ago.
Starting point is 01:22:05 But I think one just has to embrace this: there's going to be a lot of change. The things that you study, some of them may become obsolete or revolutionized, but some things will be retained. And you always have to keep an eye out, because there will be a lot of opportunities for things that you wouldn't have been able to do before. So, in math, you previously had to go through years and years of education
Starting point is 01:22:41 in a math PhD before you could contribute to the frontier of math research. But now it's quite possible, at the high school level or whatever, that you could get involved in a math project and actually make a real contribution, because of all these AI tools and Lean and everything else. So there'll be a lot of non-traditional opportunities to
Starting point is 01:22:59 learn. So you need a very adaptable mindset. There'll be room for pursuing things just out of curiosity, for playing around. I mean, you still need to get your credentials;
Starting point is 01:23:13 I mean, that'll be true for a while. It'll still be important to go through traditional education and learn math and science the old-fashioned way,
Starting point is 01:23:41 for a while. But you should also be open to very, very different ways of doing science, some of which don't exist yet. Yeah, so it's a scary time, but also very exciting. Yeah. Awesome, that's a great note to close on. Thanks so much. Yeah, pleasure.
