Theories of Everything with Curt Jaimungal - The AI Math That Left Number Theorists Speechless

Episode Date: May 23, 2025

Head on over to https://cell.ver.so/TOE and use coupon code TOE at checkout to save 15% on your first order. Get ready to witness a turning point in mathematical history: in this episode, we dive into the AI breakthroughs that stunned number theorists worldwide. Join us as Professor Yang-Hui He discusses the murmuration conjecture, shows how DeepMind, OpenAI, and Epoch AI are rewriting the rules of pure math, and reveals what happens when machines start making research-level discoveries faster than any human could. AI is taking us beyond proof straight into the future of discovery. As a listener of TOE you can get a special 20% off discount to The Economist and all it has to offer! Visit https://www.economist.com/toe Join My New Substack (Personal Writings): https://curtjaimungal.substack.com Listen on Spotify: https://open.spotify.com/show/4gL14b92xAErofYQA7bU4e Timestamps: 00:00 Introduction to a New Paradigm 01:34 The Changing Landscape of Research 03:30 Categories of Machine Learning in Mathematics 06:53 Researchers: Birds vs. Hedgehogs 09:36 Personal Experiences with AI in Research 11:44 The Future Role of Academics 14:08 Presentation on the AI Mathematician 16:14 The Role of Intuition in Discovery 18:00 AI's Assistance in Vague Problem Solving 18:48 Newton and AI: A Historical Perspective 20:59 Literature Processing with AI 24:34 Acknowledging Modern Mathematicians 26:54 The Influence of Data on Mathematical Discovery 30:22 The Riemann Hypothesis and Its Implications 31:55 The BSD Conjecture and Data Evolution 33:29 Collaborations and AI Limitations 36:04 The Future of Mathematics and AI 38:31 Image Processing and Mathematical Intuition 41:57 Visual Thinking in Mathematics 49:24 AI-Assisted Discovery in Mathematics 51:34 The Murmuration Conjecture and AI Interaction 57:05 Hierarchies of Difficulty 58:43 The Murmuration Breakthrough 1:00:28 Understanding the BSD Conjecture 1:01:45 Diophantine Equations Explained 1:03:39 The Cubic Complexity 1:19:03 Neural Networks and Predictions 1:21:36 Breaking the Birch Test 1:24:44 The BSD Conjecture Clarified 1:26:21 The Role of AI in Discovery 1:30:29 The Murmuration Phenomenon 1:32:59 PCA Analysis Insights 1:35:50 The Emergence of Murmuration 1:38:35 Conjectures and AI's Role 1:41:29 Generalizing Biases in Mathematics 1:44:55 The Future of AI in Mathematics 1:49:28 The Brave New World of Discovery Links Mentioned: - Topology and Physics (book): https://amzn.to/3ZoneEn - Machine Learning in Pure Mathematics and Theoretical Physics (book): https://amzn.to/4k8SXC6 - The Calabi-Yau Landscape (book): https://amzn.to/43DO7H0 - Yang-Hui’s bio and published papers: https://www.researchgate.net/profile/Yang-Hui-He - A Triumvirate of AI-Driven Theoretical Discovery (paper): https://arxiv.org/abs/2405.19973 - Edward Frenkel explains the Geometric Langlands Correspondence on TOE: https://www.youtube.com/watch?v=RX1tZv_Nv4Y - Stone Duality (Wiki): https://en.wikipedia.org/wiki/Stone_duality - Summer of Math Exposition: https://some.3b1b.co/ - Machine Learning meets Number Theory: The Data Science of Birch–Swinnerton-Dyer (paper): https://arxiv.org/pdf/1911.02008 - The L-functions and modular forms database: https://www.lmfdb.org/ - Epoch AI FrontierMath: https://epoch.ai/frontiermath/the-benchmark - Mathematical Beauty (article): https://www.quantamagazine.org/mathematical-beauty-truth-and-proof-in-the-age-of-ai-20250430/ SUPPORT: - Become a YouTube Member (Early Access Videos): https://www.youtube.com/channel/UCdWIQh9DGG6uhJk8eyIFl1w/join - Support me on
Patreon: https://patreon.com/curtjaimungal - Support me on Crypto: https://commerce.coinbase.com/checkout/de803625-87d3-4300-ab6d-85d4258834a9 - Support me on PayPal: https://www.paypal.com/donate?hosted_button_id=XUBHNMFXUX5S4 SOCIALS: - Twitter: https://twitter.com/TOEwithCurt - Discord Invite: https://discord.com/invite/kBcnfNVwqs #science Learn more about your ad choices. Visit megaphone.fm/adchoices

Transcript
Starting point is 00:00:00 I want to take a moment to thank today's sponsor, Huel, specifically their Black Edition Ready-to-Drink. So if you're like me, you juggle interviews or research or work or editing, whatever else life throws at you, then you've probably had days where you just forget to eat, or you eat something quickly and then you regret it a couple hours later. That's where Huel has been extremely useful to me. It's basically fuel: a full, nutritionally complete meal in a single bottle. 35 grams of protein, 27 essential vitamins and minerals, and it's low in sugar. I found it especially helpful on recording days so I don't have to think about prepping food or stepping away to cook. I can just grab something in between conversations and keep going. It's convenient, it's consistent, it doesn't throw off my rhythm.
Starting point is 00:00:54 You may know me, I go with the chocolate flavor. It's simple and it doesn't taste artificial. That's extremely important to me. I was skeptical at first but it's good enough that I keep coming back to it, especially after the gym. Hey, by the way, if it's good enough for Idris Elba, it's good enough for me. New customers visit huel.com slash theoriesofeverything today and use my code theoriesofeverything to get 15% off your first order plus a free gift.
Starting point is 00:01:23 That's huel.com slash theoriesofeverything, all one word. Thanks again to Huel for supporting the show. Become a Hueligan by visiting the link in the description. What the hell? This is bizarre that this button exists. It went from zero to almost 100% immediately. That was amazing. I'm extremely excited. Today we're going to be talking about a new paradigm of science, AI research in particular for math and physics, as well as do a deep dive into Professor Yang-Hui He's murmuration conjecture. Professor, what am I leaving out?
Starting point is 00:02:00 I think, well thank you very much for having me again. I had so much fun talking to you last time Well, thank you very much for having me again. I had so much fun talking to you last time. And it's great fun. So I got so excited about this new field of AI assisted theoretical discovery, in pure mathematics and theoretical physics, that we're really entering a new era of discovery. And it's amazing. It's happening every week. You know, there's something new that's that could potentially be transformative in the way we do science Yes, you said it shocked the math world. Maybe even the physics world and the science world in general. Yeah, absolutely. So And I'm gonna talk about a little bit more details about this but but just private companies like DeepMind, the fact that they're getting Nobel prizes and all these various AI companies, OpenAI, EpochAI, they're actually doing fundamental research with their models that are beginning
Starting point is 00:03:03 to solve problems at real research and you know at the real research level. So this is quite exciting, quite exciting times. Why don't you talk about how the landscape of research has changed with the advent of these new AI models. Now there's a large class like there's LLMs but then there's also new machine learning techniques in general that you have uncovered and other people have uncovered.
Starting point is 00:03:26 So why don't you just talk about that? Yeah, absolutely. So, you know, as I said, I think I mentioned in our first long conversation, I think I got into this thing as a complete novice. You know, my background was in the interface between algebraic geometry and string theory. And in 2017, I was just beginning to learn from Coursera, just the very basics of machine learning. And so since then, I personally have evolved,
Starting point is 00:03:54 and I have seen the field evolve of how far we can go, because 2017 was pre-Chad GPT. So this was just really machine learning of data from pure mathematics. But now we're much more advanced than this. We're beginning to benchmark DeepMind's AlphaGeo2, DeepMind's AlphaProof. They're beginning to show these signs of reasoning. Whether they are reasoning is a purely philosophical question, which I have no right to answer.
Starting point is 00:04:28 But what I'm seeing is that they're certainly beginning to outperform your typical undergraduate. And now EPOC AI, which is another company, has launched this tier one, two, three problems, which are research level mathematics, the kind of problems you would give to graduates to give a collaborator or a colleague. And I can show you some of that later, some of the stuff. Yeah, I'd love that.
Starting point is 00:04:53 And they're entering their tier four phase. So then in fact, I'm flying to Berkeley on Thursday to have a meeting with a bunch of other mathematicians to benchmark how far that their tier four reasoning machine can go. So this is really rather, you know, it's happening. It's happening as we speak, how the landscape of research can be changed. Yeah, sorry.
Starting point is 00:05:17 Yeah. Yeah, I think that's in part why, firstly, it's an honor to speak with you again. Our last conversation went viral. I think that's this whole AI-assisted research in math and physics is part of the reason because our last conversation was quite specialized in nature. It just means the audience loves you, loves hearing from you. I also love speaking with you. You also have some books. In 2018, you had a book published called Topology and Physics, and I believe that's followed by your textbook
Starting point is 00:05:45 in 2020 on machine learning and pure math. Right. Yeah. So the 2018 one was when I finished and posted on archive which is a very kind thing that Springer lets me do. But then the final book was the first textbook on there. I wrote that book primarily to teach myself machine learning and AI because I was a complete novice in this and it was just I wanted to share this experience as a theoretician, as a mathematical physicist, how to share with this community, how to even begin with learning
Starting point is 00:06:18 about machine learning and AI and this advanced data techniques. But now I think the field has progressed way beyond that in the last eight years. We're all very, very impressed with how we as a community, we're all very impressed with how fast this thing is going. It's a great pleasure talking to you. As I said, I'm a big fan. The depth that you go into is quite different from the normal science communication because you really, I like the fact that you like to dig deep in your topics.
Starting point is 00:06:56 I guess even more advanced than quanta, which is really good. There's not that much out there that does this. That's why I appreciate what you're doing so much. Okay. Well, let's dig deep. What are some different ways that people use machine learning? So for instance, Terry Tao uses it as an assistant to proofs but also to generate conjectures and perhaps they even point to existing tools that he may not have heard of. And those are more of the LLM sort. Then there's another sort of just finding patterns in large data sets and that connects to your murmuration conjecture. Isn't there a third category?
Starting point is 00:07:26 Yeah that's right. So I was just trying to, I think I outlined this briefly last time, I mean just trying to, because this is exactly the kind of thought process that's going on, how to categorize the different approaches. And of course they're all interrelated and it's hard to delineate them. But, so this is my, you know, my top-down mathematics is this intuition-guided basis of mathematical research. And this LLM approach is what I call meta-mathematics, and this is kind of LLM-assisted co-ilots that Terry Tao is talking about. And then this third category of bottom up where not necessarily any AI is involved. This is I'm thinking about things like lean provers and proof as copilots where you just
Starting point is 00:08:20 have millions of lines of code. And sooner or later somebody is going to process that by AI. The interplay between these three directions and between that and the human is clearly beginning to change the landscape of mathematical research. Now before we get into your presentation here you mentioned on another talk on another either an article or a podcast I don't recall from my research, that there are two types of researchers. One is a bird and another is a hedgehog. And you're more of the type that likes to fly and connect. I'm also similar.
Starting point is 00:08:54 So I think that's one of the reasons we get along so well and the audience loves you. But tell us more about this. Yeah, I think they're very, you know, they're very as tribute to this quote. I think it originated from a Greek fable where the fox and the hedgehog are compared, where the fox knows a lot of things and the hedgehog likes to dig in. And I think the great mathematician Arnold made a reference to this in classifying mathematicians where he calls things like, you know, one is an eagle that flies and tries to see the landscape and then the Hedgehog digs deep to one particular problem, solves it. I mean, there's no particular, no, neither is superior to the other. It says we definitely need both and each mathematician can function as both. But certainly AI is
Starting point is 00:09:54 helping us in both personalities, simply because there's so much literature out there. The AI can have an overview of what everything is in terms of literature and also there's so much technical detail. You know, there's some very boring parts of the proof that you just simply don't have time to iron out and that can certainly be helped by LLM models and it is beginning to do that. I was actually just at a conference last week in Exeter. There was a conference called the Impact of AI for Mathematics, where the organizer, Madhu Das, she's a number theorist, and she said she's just recently completed this very complicated paper where she had a proof of lemma. And then what she did was she knew there's another lemma which has got to be true.
Starting point is 00:10:52 Okay. And so what she did was she copied and pasted the entire proof of her lemma, it's a bit technical stuff, into ChachiPT 01 and then said, now can then copy the lemma and said, can you supply the basic proof strategy of this lemma, which I know to be true. Now, to be fair, this is very far from automated reasoning. This is just language model. And then the importantly what she did was was and then she went line by line, symbol by symbol, the proof that Chachi Biti gave her for the Big Long Lemma. And it was largely
Starting point is 00:11:33 correct. And with a bit of prompting, she was actually able to nudge out a complete version of that proof. And then that was done. I mean, so it would have taken her much longer to have ironed out all the details herself. So this is really rather impressive. And this is only because of O1. And O1 came out, I think this year or end of last year. So it's really transforming the kind of stuff, the boring stuff you can delegate
Starting point is 00:11:59 and also the pattern recognition part you can also delegate. So we become like superhuman all of a sudden and when interact with these agents who can help us actually do research. Not just do very elementary problems now, but serious research level mathematics. Why don't you talk about some of your personal use cases? So do you use chat GPT more than Claude or do you use Gemini more than the others?
Starting point is 00:12:26 What is your mixture? I think in terms of research, to be honest, prior talking to her, I never even really played around with chat, this kind of LLMs to help with my research. I didn't know that was as possible. I know it's kind of thing you can, if you want to know very quickly a topic, you can go to Wiki, you can Google, but indeed
Starting point is 00:12:50 ChachiBT or DeepSeek will probably answer your question very quickly. If there's some theorem that you've just forgot or there's a field that you really don't want to know about it, DeepSeek will summarize it much better than, much more efficiently than if I went to some expert and wasted his or her time and just have that, you know, the process is much more efficient. If I just wanted a very quick overview of some specific, even a specialized topic. So that's been very, very helpful to me. So it's only been three years since ChatGPT came out, and already we're seeing this massive change in the landscape. Do you imagine that three years from now, or let's say 10 years from now, that the role of the future academic or intellectuals or mathematicians, if you want to specialize, will be that of a decider or director like a curator rather than a doer.
Starting point is 00:13:47 So the doer is the one that right now we use computation, we use syntax, we come up with some proof but the decider would be one that just okay it governs, it says this is what you should do. Compute already helped us a lot by the end of the 20th century. So, no professional mathematician really goes by the end of the 90s and early 2000s. No professional mathematician, for example, does boring integrals anymore during research because that's completely outsourced to say something like Wolfram Mathematica and then later SageMath because this is just boring and we know how to do it.
Starting point is 00:14:28 It will take as many hours if we really want to grind out some technical integral. Of course, I'm not saying that you shouldn't teach undergraduates integrals anymore because that's part of the learning process and it's still important to teach undergraduates this kind of thing. But no professional mathematician was started. You know, it's, they're, they're really boring and horrible things. Like even the simplest thing, you know, sine 17x, nobody really wants to go on it. It just type into Mathematica. Because if I just need that, I need that result very quickly so I can supplant to my next step of what I envision in my paper, I'm not going to waste next step of what I envision in my paper. I'm not going to waste a couple of hours trying to integrate something very elementary.
Starting point is 00:15:09 And I'll probably even get it wrong. There'll be factors wrong. So that already transformed it. So I can see 10 years from now, simple basic, maybe I'm being conservative here, simple bits of a proof or simple bits of a derivation can just be outsourced to the likes of ChachiPT or DeepSeq or something even more specialized. We don't currently have LLM just for mathematics. That's surely going to come very soon. I'm sure the Frontier Math project by Epoch AI will start providing this kind of services. Okay, well let's get into your presentation on the AI mathematician. Yeah, sure. Well, thank you. So I guess, you know, last time we talked about various things,
Starting point is 00:15:59 and I just want to share more in this chat some of the capabilities both in the tops of top down and bottom up of what we're looking at. And to be honest even since our conversation which was what five months ago the field has advanced significantly which is very very impressive. So just briefly to as I was saying last time, and as I also just mentioned, I tried to classify this in this review article. These three directions of mathematics, of course, they are intertwined and the memorandum of gestures that I did with my collaborators Lee, Oliver and
Starting point is 00:16:43 Posnikov is really a very good example of this top-down. I don't know explain why. This top-down mathematics is one that I want to emphasize here and then I will of course I will go back to just refresh people's mind a little bit about how people like Terence Tao and all these great and top minds are doing a proof of systems in terms of what I would call bottom up and mathematics. This is a very interesting point where I want to emphasize the typical mathematician, and that includes theoretical physicists. Historically, we do things top down by just looking at patterns and spotting patterns. We do many things in terms of practice before foundation. This is very important.
Starting point is 00:17:35 This is something that can't really be formalized because linguistically trying to formalize mathematics is an extremely important program. All of these benchmarking of problem solving using large language models when you have a precise, well-defined problem and trying to find a solution. But the history of scientific discovery is certainly not that. I would say probably more than 50% is actually finding the problem or have a vague notion of something before he can formalize it.
Starting point is 00:18:06 Can you give an example? So, for example, yeah, so here I'll give an example. So, for example, Newton invented calculus without any notion of what even convergence means. He just had this intuitive idea of motion and then he, because it was Newton, he intuited there is this thing called derivative. This is way before we could even have epsilon delta limits, which came in the 19th century, almost 300 years later. Algebraic geometry is something closer to my heart. Algebraic geometry was just started with
Starting point is 00:18:38 the Apollonius and Euclid with just shapes and stuff and we can intuitive the kind of theorems we want to prove before this Babaki school in the 1950s and 1960s you know the height of the Babaki school try to formalize that in terms of definitions of fields and rings and polynomial rings and ideals. Now this is just this is how theoretical discovery has always happened. This is how theoretical discovery has always happened. In some sense, the reason I want to emphasize this bit is not only just because this is one I'm most familiar with and the one that I suppose I've been mostly involved with, another reason I want to emphasize is that
Starting point is 00:19:20 it's hard to imagine how AI can help us with this because it's so vague and it's so human. There's a lot of mistakes and if you train some language model, there's not even any data to train on because these are not formal proofs. These are just grasps of ideas of intuition. The point I want to make that is even in this direction, AI is beginning to help us.
Starting point is 00:19:45 Okay, so let's imagine we're back in the 16th, sorry, the 17th century with Newton. And Newton was saying, okay, I want to come up with something like calculus. He didn't even have that term. He just had this notion of motion, like you said. What would Newton do with an LLM? Like, what is your vision? So, I don't usually stop mid-conversation about mathematics to talk about metabolism, but I've been using something that's made an appreciable difference. It's called Cell Being by Verso. Summer's coming up, and if you're like me, getting lean while juggling work, stress,
Starting point is 00:20:18 and everything else isn't exactly straightforward. I train, I eat well, but as I age, as we all age, fat loss gets harder. Cell being uses research-backed ingredients that help your body boost NAD levels. NAD, by the way, is crucial for metabolism, energy, and even DNA repair. However, there's a large added benefit here to cell being. This formula also helps regulate hunger hormones and supports fat breakdown. Basically, it tells your body to burn more and crave less. Since taking it, I've noticed I'm not as peckish, I feel more clear-headed, my energy is distinctly more stable throughout the day, and I'm discernibly quicker mentally. I haven't changed anything
Starting point is 00:21:00 else about my supplement regimen, I've just added this and it's helped me stay consistent without forcing it. Also Verso third party tests every batch and publishes the results, which matters to me personally because this way I know exactly what I'm getting. So if you're looking to dial in your energy, metabolism and shed a bit of that stubborn fat before summer, then check it out. Head to cell.ver.so.toe and use the code TOE to get 15% off your first order. That's cell.ver.so.toe. The code is TOE. Thanks to Verso for sponsoring this episode.
Starting point is 00:21:43 What would Newton do with an LLM? Like what is your vision? That's an interesting point. So what Newton would do with an LLM, if he had LLM, which was certainly to process all previous literature. Now to be fair, at Newton's time, somebody like Newton could read almost the entirety of any relevant literature up to his point. I'm thinking about everything from Euclid's Elements, Galileo, bits of Kepler, and he certainly wouldn't have that, and he would just go and read it, and that's fine. But now literature has grown so exponentially, there are no more Newton's, human Newton's that could possibly read
Starting point is 00:22:29 the entire literature of a field and that's why LLMs could come in to help. So this is in the LLM space of discovery. You can summarize literature and you can try to create new possible links between literature. And this is happening now, I think. I think there are, I think LAMA, LAMA, which is LLM for math, like LAMA, double L, LAMA, LAMA is something, it's an AI tool that's beginning to digest the archive, for example. And on the other hand, that's the LLM side of the story. And now what about the other half is how could Newton based on mathematical patterns and he did have a lot of patterns. It would be things like this
Starting point is 00:23:18 will be mathematical data. So certainly they had data in terms of theoretical and experimental physics, where you could measure the rolling of the kind of stuff that Galileo did, the rolling of balls along inclined planes, that kind of data. Or he had the astronomical data of Kepler. But he also would have had mathematical data or platonic data. I like this word platonic data because it's pure. The kind of data that would be like sets of polynomial equations in two variables, which he actually tried to classify himself. How many cubics, because he knew about the conic classification problem.
Starting point is 00:24:00 And he would look at these things and then he would spot patterns. And this kind of stuff also gave rise, I would imagine, I can't imagine, you know, the mind of Newton, but I would imagine he would look at vast amounts of such data and then try to formulate a theory out of it. Ah, okay. And that's kind of really nice. So that's actually, that's an LLM independent thing. This is just pattern spotted. So like Newton, there are these things called Newton polynomials, which express,
Starting point is 00:24:32 it's a technical thing, which will be Newton, certain symmetric polynomials in multivariables being expressible in some basis. I would imagine Newton would have written pages and pages of this stuff and spotted a pattern and then tried to prove a general theorem, which is now the theory of Newton polynomials. I thought where you were going was this amorphous ideation with an LLM. So for instance, Newton would say, okay, I have balls coming down an incline.
Starting point is 00:25:06 I don't have a precise formula for velocity. I don't even know if velocity is a great concept, but I noticed that it moves faster, but it can also so that may be associated with something I would call acceleration. But then there's something else like an impact which I may call force. Can you help me with this? Is there a way to make this concrete like to take something that's ill-defined and make it well-defined? I thought that's where you're going. Do you think LLMs help with that or is that not what you were thinking about? No, no, I think that definitely helps a bit. We're not quite there yet, I think, where we could just start asking LLMs and say here's the literature, digest it all and give me new possible links. But we are
Starting point is 00:25:47 actually not insanely far away from that goal now because the LLMs are getting so good at doing this sort of thing. So it's actually not impossible. In some sense Newton would yeah the mind of Newton is something like that. I guess, let me see my next slide. I think I do mention, yeah, yeah, I do mention that the another mind as great as Newton and Gauss and I would do, I will give that example in a minute. All right, let's get back to the presentation. Oh, so I think I don't, I can't remember whether, speaking of Newton, I can't remember whether
Starting point is 00:26:24 I talked about this last time. If not, this is a joke worth repeating again, which is what is the best neural network of the 18th century? You could argue the 17th century was Newton and the best neural network of the 18th to 19th century is clearly the brain of Gauss. And that's another, here's a very, very good example as well of just top-down intuition-guided mathematics. And I might have mentioned this last time, but it's worth repeating. So what is the thought process of this great discovery here? So everybody knew about the primes.
Starting point is 00:27:01 You know, Euclid already proved that there's an infinite number of primes and the proof is very, very beautiful and it's kind of intricate. It's the first thing you would teach in a number theory course. It's not obvious at all there's an infinite number of primes, but Euclid had this proof by contradiction argument why there should be an infinite number of them. Gauss certainly would have known that proof, but Gauss wanted to know more, and it's been 2000 years since Euclid, Gauss wanted to know about more details about the distribution of primes. We know the primes get rarer and rarer even though it's an infinite number of them. They do get rarer and rarer. How
Starting point is 00:27:39 rarer do they get? And he just devised this function which we now call the prime counting function p of x. It's a terrible name because p is pi but it got stuck in the literature. And p of x is simply the number of prime numbers less than a given positive real number x. So that was one of his insights was to devise this function which is now continuous because primes are inherently discrete. And this is very interesting. He plotted this and he looked at this curve. He invented regression, apparently, in order to do a curve fitting because he needed it. This is all done at the age of 16. He invented regression to see what is the best shape that fits this. And all of this done by hand, and he
Starting point is 00:28:28 even had to compute the primes into the hundreds of thousands because the tables stopped there. And he even got some of them wrong because it's a very boring and tedious computation. Imagine what Gauss could have done. One, if he had math sage or Mathematica, how much more conjectures would he raise? Well, the problem with that is that if he has access to the modern tools, he also has access to TikTok and it's not clear if Gauss would be... Let's hope that because Gauss is Gauss, he wouldn't waste his time becoming like, I don't know, watching YouTube videos.
Starting point is 00:29:04 And watch more, maybe he would, maybe he'll watch meaningful YouTube videos like other conversations like The kind of conversation you put on theories of everything Okay, great but anyway But he would do this thing and and he looked at his P of X and he says it's clearly X of a natural log of X You won't be able to see this just by looking. So he really actually had to invent regression to do this. And so statistical statistics was a side product of this problem. As far as I know, I could, but this needs to be checked with the, you know, real historians of
Starting point is 00:29:35 science, but at least that's the, that's the story here. And this is just crazy. Like, how do you even, what do you even do? And this, the proof of this fact was given 50 years later by Adamard and Delavay Poussin because you had to wait for Cauchy and Riemann to invent complex analysis in order to give the tool to prove this fact. How did Gauss know this? Based on just, at the time time was surely because large data right he really went into into the tens and hundred thousand range in order to spot this kind of patterns and it's just amazing and I can plot this by because
Starting point is 00:30:17 if you just plot the first hundred or so he looks kind of like a like a log or kind of like a line or something but you but you really need to go into the thousands or tenth of thousands range in order to do something like this. So amazing. That's exactly the kind of top-down guiding intuition. There was not even that foundation to prove something like this until years later. Riemann as well, I mean a great example. Riemann hypothesis, which is arguably the most famous open problem in all of human intellect and is certainly the one that we all bow down and worship. The Riemann hypothesis has so many implications in mathematics. It's one of the Millennium Prize problems.
Starting point is 00:30:59 The Riemann hypothesis is so important precisely because there are now probably tens of thousands of mathematics papers whose first opening line is, let us assume that the Riemann hypothesis is correct and the rest of the paper. So it has so many implications. So that's the kind of conjectures that are great, that it has implications to so many other possible results. The interesting thing about the Riemann hypothesis is that it appeared as a footnote in Riemann's paper.
Starting point is 00:31:33 Riemann was just doing the Riemann zeta function precisely to address a similar problem to this, to the precise distribution of primes. And he wrote in a footnote that I checked the first couple of zeros and they all have real part a half. I believe that this is true, but I don't really need this result right now. Really, you can see that amazing footnote. And then we just said, I can't think about this right now, but I think this is kind of interesting. And then that was the the beginning the birth of that.
Starting point is 00:32:07 Now how did Riemann intuit something like this? Well these number theorists and their margins. Exactly, number theorists love margins. Exactly, now yeah that's what I never thought of it. That's a good point. That's a good point. But that's an actually extremely good point you mentioned. They're written into margins because they haven't been formally approved. If it's structured, it would be bottom-up mathematics and it would be in the main text. And often, this marginalia are just afterthoughts or just sparks of genius of these people who just relegate this thing to a side comment. And that kind of intuition leads to centuries of research. So that's a very good point, Iris, about the difference between margins and formal text.
Starting point is 00:33:06 Because papers are written, whether it's pure mathematics or theoretical physics, papers are written in a very structured backwards kind of way, quite different from the way they're reached in this intuitive kind of way. Yeah, so that's a good point. Yeah. And then this doesn't stop. And this is something that the memorization stuff will get more into, which is this BST conjecture, which is another Millennium price problem. This is another one that carries from $1 million tag. And how did this come about? And I will talk more about what the BST conjecture is. This is Birch, Brian Birch, who is still alive. He's I think 90 something.
Starting point is 00:33:47 Mathematicians are very long lived because they're happy. The Birch's went and died in the 60s. They're in the basement in Cambridge and they just plotted loads and loads of data for ranks of conductors of the elliptic curve. Now lots by 1960 standards would be in the order of hundreds to thousands. The LMFDB, which I'm going to talk about in a minute, is a database of 3.6 million curves. So we've progressed quite a bit. A 3.6 million is a kind of data scale where you could really train things on. And that's where I could really come in. And this is just another stop. They plotted this and they raised this conjecture. They noticed a certain pattern
Starting point is 00:34:34 between r and ranks, these technical terms, which I'm going to define in a minute. And that was the birth of yet another great and foundational problem. And this is regarded as central piece of mathematics as well. Okay. And these are all intuited, if you wish. Right. So just one quick slide. I got into this because of algebra geometry. And so I think I mentioned it in the last talk. Just trying to see how machine learning can help us spot patterns if you wish.
Starting point is 00:35:09 But I think since 2017, I grew a lot alongside with my son. We both grew. He's growing very fast and I'm growing intellectually just to digest this field. And it's a humbling experience just to see this vast interaction of so many different people and experts. And so again, I'd like to thank all of my collaborators. Now, I can't possibly read out all the names, but that's where the QR code comes.
Starting point is 00:35:39 Scan this QR code. There's a long, this will point you to Google Doc, where I will try as much as possible to keep up to date all of the names and affiliations and the papers with my co-authors, so you can think, thank them properly. At some point, now this is an interesting part, I wanted to chat GPT to give the, generate the list of these people and search the internet
Starting point is 00:36:02 and find a picture of each of them and give them affiliation so to save me the time of typing them out so we're talking about with a hundred people yeah we can talk off air about that I can call that up for you extremely easily okay but but what is really interesting is that chat GPT did a terrible job it found random affiliations of people who didn't exist because you know they're doing LLM. So it's confusing my collaborators. So I couldn't possibly credit Chachi BT for this. And so this is a very early thing. And also they just couldn't. Chachi BT could not produce
Starting point is 00:36:39 the correct photos of any of these people. So we are limited. So as excited as I am, I must point out limitations to all of that. I tried DeepSeek, Claude, I tried them all. None of them could even answer this problem, which should be a simply, a simple problem. There are ways of using the agent or the LLM to interrogate itself
Starting point is 00:37:00 so that it can double check. So we can talk about that off air. Oh, wow. That's that. If you can help me with that, can talk about that off air. Oh, wow. That's that though. If you can help me with that, it'll be great because this is something, this is obviously something I can help us because this is boring and it just needs to be done
Starting point is 00:37:13 as even part of scientific discovery. It took me an hour to find that LLMs were useless in this, but that hour I could have tried to do something more meaningful. But you would take hours. If I would just do it properly and include copy and paste pictures, it would take many hours. And you know that's why we're so, this is just to tell you we're not quite there yet even with a simple task like this. So surprisingly it can help us with mathematical discovery. But you
Starting point is 00:37:43 know, you know all of this will change very quickly. That's the book I think you mentioned earlier, which is this. This is book of my learning experience, trying to learn about machine learning. This finally came out, I think, in an archive version in 2018. I think it appeared in 2020. This is the landscape.
Starting point is 00:38:05 This is from everything from machine learning. And then this editorial in 2020. So now let's get back to the real meat of the subject, which is I believe this is still part of this review I was trying to say. I tried to emphasize that bottom-up mathematics is a natural language processing because this is however you want to define it. Metamathematics is LLM and top-down mathematics is intuition-guided thing that end of the
Starting point is 00:38:40 day is image processing. I like this image processing idea which is you know any mathematics any mathematical data platonic data if you wish at some level is an image you can pixelate it. I like this image analogy because the great David Mumford who is also a Fields Medalist even back in the 90s after he got the Fields Medal he stopped everything after he got the Fields Medal, he stopped everything. He got the Fields Medal for doing topology in K-Theory, if I remember, algebraic topology. And then he switched fields completely, dropped mathematics altogether, and started working in computer vision.
Starting point is 00:39:16 Now that I've read more of his recollections, he blogs as well. He's a very excellent blogger like Terry Tao. So David Monfort, he said the reason he got into this computer vision thing was I think he was really having early visions of how AI can help with research because he was trying to imagine the human mind being an image processing machine. What does a mathematician actually see? The mathematician is beginning to have mental images of formulae. There is this transformation process from what you see as abstraction, as mathematics, into a mentally constructed image. That's why he
Starting point is 00:39:56 was so interested in vision. And that image in the mind is somehow, well, in our, I guess in today's language, this will be the latent representation of your data. You know Hadamard, how he had a book on how mathematicians think. Oh, I heard of that. I have not read that. That's kind of interesting. Is that the same Hadamard as the Hadamard de la Valle, de la Valle, de Hadamard? Oh, interesting.
Starting point is 00:40:23 Oh, cool guy. Yeah. And, and I'm wondering if he did a historical analysis for Euler, because Euler was blind for half his life or something like that, some large portion of his life. So I'm wondering if Euler still use mental imagery to formulate or solve his problems or then abstract to something else. Absolutely. Yeah, it's quite imagined. I had a student once in Oxford a few years ago now. She was quite remarkable because she's completely blind.
Starting point is 00:40:54 I think she was blind from an early age. So she sat through my lessons without being able to see anything. I just had to, she had to picture what I was saying and then digest it all in her head and do all of the mental calculations in her head. Interesting. So I was wondering what was she actually doing? She did fairly well in her final exams with being completely, she needed somebody obviously to translate whatever she's had to dictate to someone.
Starting point is 00:41:26 Yeah, so it was quite remarkable that I got to know this student. But anyhow, this is a... But my image processing is this kind of thing is now that I read Mumford, I'm beginning to think why I was beginning to think that all of mathematics, there's all of top-down, all pattern recognition is an image processing problem. Uh-huh. Breaking news, a brand new game is now live at Bet365.
Starting point is 00:41:56 Introducing PrizeMatcher, a daily game that's never ordinary. All you have to do is match as many tiles as you can, and the more you match, the better. We also have top table games like our incredible super spin roulette, blackjack, and a huge selection of slots. So there you have it. How can you match that? Check out PrizeMatcher and see why it's never ordinary at bet 365. Must be 19 or older Ontario only. Please play responsibly. If you or someone you know has concerns about gambling visit connexontario.ca, T's and Z's apply. Yeah this I guess there is this old debate
Starting point is 00:42:27 and this involved all the grades like Atiyah and Dygraph and Hitchin and Witten, all these people. Is physics or is theoretical physics or is mathematics inherently, is nature algebraic or is it geometric? Interesting. So this top down mathematics, this is the debate. Newton was clearly geometrical.
Starting point is 00:42:50 He had such a distaste and disdain for algebra, because it's meaningless symbols to him. He made some comment about algebra is this, I can't remember the original quote, but he would say something, it's a very disgusting disgusting thing that you had to resolve to this meaningless symbols. He was very, very pictorial. His proof of what we now call the Gauss theorem about integration over spheres, this is just about a gravitational body exerting a force on an external object. Gauss would just surround this by and then use
Starting point is 00:43:28 this Gauss's law. But Newton actually integrated piece by piece and used all his intricate pieces together in a diagram and got the same answer. It's the kind of horror show you would never do because Gauss's law, Gauss's theorem is just one line. You just do this integral. But Newton actually had to piece together. So Newton was definitely visual. Roger Penrose is definitely visual. Penrose, my last conversation with him, he said, he almost said, I think, just in case I'm putting words in his mouth.
Starting point is 00:43:58 But I believe he says something like, if it's not intuitive and if it's not geometrical, he doesn't even accept that as a proof. I think Conway was like that as well. This is one of the reasons why Conway never really accepted Richard Borchardt's proof of the moonshine conjectures. Because Borchardt used this very strange vertex operated algebra. I think you had a conversation with Ed Frankel about this. And Borchardt's actually.
Starting point is 00:44:26 And Borchardt's, yeah, exactly. But Borchardt, he borrowed this piece of completely crazy stuff, vertex-operated algebras, and had this beautiful structure. I mean, it's obviously awesome and brilliant. He got him the Fields Medal, and he was able to use that to prove the moonshine conjecture from Kai. And Conway to my knowledge, who was the one who told me this? It was oh gosh the previous director of the IS. Oh Robert Dijkgraaf? The one before. Oh, Robert Dijkgraaf? The one before the...
Starting point is 00:45:04 Oh, Goddard! Peter Goddard! Peter Goddard knew Conway very well and he was telling me that Conway never really deep down accepted his proof of butchers because he's not visual. Conway is this very playful guy, as you can imagine. He wanted everything to be pitory. He wanted to see his lattices. So, anyhow, so in a way, this diagram puts geometry in this direction and puts algebra in this direction. Well, how is Conway visualizing the monster group? Maybe in terms of the leech lattice. He had this picture of the leech lattice. To him, the monster group is some extension of the automorphism of the leech slats.
Starting point is 00:45:49 Which I guess in a way, that's how he and Reece and Reeba originally came up with Monster. It wasn't by very hardcore, this whole funny business of classifying simple groups that he really intuited it in a way. He got this group out by doing norm two lattices and he was able to see the symmetry, the group of symmetries of this lattice and that's a remarkable thing. Yeah, it's just unfortunate that whole generation of people are, that generation of this lattice, early computer algebra, finite group people are slowly dying out. Conway is dead, Norton is dead, and my own dear friend John McKay, started Moonshine passed away. I wrote this obituary because I was his last close collaborator. He actually became, he
Starting point is 00:46:50 became a grandfather figure to me. He became, you know, he saw my kids grow up. Interesting. And McKay would tell me these stories about how he was interacting with Conway and how all of that people, how portraits. And McKay is also, here's another crazy, that's another whole conversation about what is this bizarre intuition that Mekhi had. If there's anybody in the later part of the 20th century who had a almost Ramanujan-like intuition, it would be John Mekhi. In a way, he's an unsung hero because he would just look at lists of numbers or look at pictures and graphs and see, ah, but this field is related to this. And that is very
Starting point is 00:47:34 much like this. He's AI before AI. Even physicists? I think that's a whole different conversation. We're going to have to do part three and is okay I get very emotional when I talk about John because it's like, you know, he's a is a very much like a like a father figure To me. Well the next time I'm in Oxford, we should have a part three and we have a part three conversation Just talking about moonshine conjectures In the in the from the perspective became I know you you certainly oh, yeah And he pronounces names Mackay not okay, and he pronounces his name as Mackay, not Mackay, even though it's written down as Mackay. He insisted on being called John Mackay.
Starting point is 00:48:10 I know you chatted with Franco and you chatted with Borchardt on Moonshine and that stuff. But it's unfortunate that he passed away before you started all this thing. He passed in 2022 because he had a very interesting But it's unfortunate that he passed away before you started all this thing. He passed in 2022 because he had a very interesting knowledge of that world of moonshine and stuff. But anyhow. Have you heard of stone duality?
Starting point is 00:48:36 Stone? Yeah, stone duality or a stone type duality. No, not at all. What is that? So, it's a duality between topology and then Boolean algebras, which some people see as an analogy or an equivalence between geometry and then syntax or something more algebraic. Oh, interesting. Oh, I'll have to look into that. Thanks for pointing that out. I didn't know. Is this like the stone of the Stone-Weierstrass theorem? I believe so. You got them all correct. Yeah. I don't know. Okay. I just didn't know there was this stone. This is called the stone of the stone virus to us. There are still I believe so. I hate you got them all correct. Yeah, I don't know.
Starting point is 00:49:05 OK, I just didn't know there was this. This is called the stone correspondence. Oh, yeah. Oh, it's don't doality. Yeah, duality, duality. Oh, I love I would love to see. Sorry. I just bumped in my.
Starting point is 00:49:15 I love to see that. Oh, interesting. Very interesting. Anyhow, so so. Back to back to our current story. So in 2022, Chachi BT actually, as you know, part into this conversation, Chachi BT passed the Turing test, which again, I'm very surprised he was not on every single newspaper headline.
Starting point is 00:49:35 I don't know why this wasn't emphasized, you know, we can't really... The Turing test was a big thing. I think the fact that Chachi BT passed the Turing test just simply showed that you don't need reasoning or understanding to have intelligent conversation. Maybe it says a lot about humans. It says more. Chachi BT passing the Turing test says more about humans than it says about AI thinking. We give too much credit, too much credit to what meaningful conversations are. Sort of as a response to that, we organized a conference in Cambridge with loads of people, and you can probably recognize some of the names, Buzzard and Birch. We tried to formulate something that's more stringent than a Turing test for AI guided discovery. I reported this with my friend Bertz as a nature correspondence.
Starting point is 00:50:38 I can't remember whether I talked about this in our... Did I talk about the birch test? Yes, the birch test plus plus or the Turing test plus plus was the birch test yeah is the birch test yeah last time so we'll put a link on screen for the last conversation in case you're just tuning in and you're wondering it was a wonderful conversation and I believe we talked about let's see the birch test bottom up top down metamathematics and even classifications of CY manifolds and then this database construction. Right. Okay, so that can pass with the B. So this is AI plus N for the Birch test. So let me just, I guess I'm very good at digressing. Sometimes I digress so much that
Starting point is 00:51:17 I don't even remember what I'm digressing on anymore. But the point is this is clear, clear signs of ADHD. But I've never had it. As you mentioned in speaking about these shower thoughts or the margins, the digressions are sometimes more meaty than the meat. Often, yeah, often, yeah. But back to this AI guided discovery,
Starting point is 00:51:37 which in terms of AI assisted top-down, intuition guided discovery in mathematics, there have been various candidates in the past eight years or so. Some of the ones, of course, everybody talks about this beautiful paper in this DeepMind collaboration by Alex Davis. Alex comes here a lot because DeepMind is in St. Pancras, which is a 30-minute walk from this institute, which is kind of very nice. We have a nice hub in London for this sort of thing.
Starting point is 00:52:12 Google DeepMind isn't in California? There's a London office. Oh, okay. So there's a branch at least. I guess, yeah, there must be. So Alex is actually in London. So that's the Davis et al paper. And DeepMind is...
Starting point is 00:52:33 Cool. So he comes very regularly. Of course, because he works for DeepMind, he can't tell us exactly what he's working on, nor can he tell us what the next project will be. But at least he can summarize what's going on know what what's going on in that in the in the tech world which is kind of kind of interesting the fact that Nobel prizes are being given to non-university organizations which is right very very nice which which is what this organization is at some point oh that's another whole conversation again so I think I asked for another whole conversation again. So I think I mentioned to you last time, these are the rooms where Faraday lived. Okay, so
Starting point is 00:53:11 these are, we're very lucky, the London Institute, we're at the second floor of the Royal Institution, where the likes of Humphry Davy and Thomas Young and Michael Faraday lived. So I'm very fortunate to be in this space to work and try to get this in. But one of our themes, the reason I mention this, one of the themes, one of our four research themes is AI for theoretical discovery of this institute. And it's kind of, and we're independent of the universities so that we could devote our time fully to research. That's kind of it. So how does this lead to the murmuration conjecture?
Starting point is 00:53:54 Yeah, I'm supposed, I promised to tell you about the murmuration. Yeah, so these early, this Clabielle manifolds, which we spent so much time talking about last time, this is because it was my bread and butter as I was growing up as a grad student so that was clearly the first thing I'm gonna apply machine learning to. The kind of experiments of producing neural networks to predict topological invariants of these
Starting point is 00:54:16 varieties in the image processing kind of way immediately fails the Birch test out straight because it's not interpretable. Sure now it's been improved to to 99.999% or whatever it is, but it's useless to a scientist. It just simply says that, oh yes, there is an underlying pattern, but how do you actually extract anything meaningful from that pattern? That's the main question. So the closest so far, and when I say so far, So the closest so far and when I say so far it really It's this could change in a couple of months. Oh, you have no idea because the field is growing so fast the closest so far in the last
Starting point is 00:54:59 Gosh is almost a decade. I guess my son is is eight now You know in the last decade of all this AI discovery There there's been hundreds of papers now on various things, let's on how do I use machine learning to do this in number theory, in theoretical physics, in quantum field theory, there are literally hundreds of papers. Now, the one that really made Buzzard and Birch happy is this memorization conjecture. The discovery process of this
Starting point is 00:55:27 is something that I would like to see, at least that this is sort of the state of the art of human machine interaction. That's why it's so close to my heart. And this is joint work with Kiu Huan Li, Thomas Oliver, and Alexey Potsniakov. And now with a paper to appear with Andrew Sutherland, who is the guy who set up this LMFDB. So I think I mentioned last bits of machine learning experiments in number theory, you know, providing, can AI predict primes?
Starting point is 00:56:00 No, we're certainly not at that stage. I'm not saying you can't. If even AI can detect a pattern of primes by itself, then we will be at the next level in not only proving the Riemann hypothesis, we would also crack every single code in every single bank in the world, because that's all the cryptography is dependent on this. So wait, what's the main impediment for AI to not predict primes? There's a large data set there. Yeah. So actually, that's a good question.
Starting point is 00:56:34 The short answer is I don't know. I've certainly fed millions of primes, in whatever representation, into a neural network of whatever architecture, and simply asked it to predict the next one. It does terribly. I think somebody has even written a paper on why prime prediction is so hard for neural networks; I can't remember the precise reference. At some level, this again goes back to the Riemann hypothesis: the exact pattern in the distribution of the zeros of the Riemann zeta function in the critical strip would give you precise
Starting point is 00:57:28 patterns in the distribution of primes. And people have proven statistical statements about the distribution of the zeros, that they're truly stochastic up to some level; I'm not an analytic number theorist, but basically, there is so much noise, truly stochastic randomness, in the distribution of the zeros of the zeta function that training on them is very difficult. In other words, training a neural network on the zeros of the zeta function is like training it on noise. Interesting.
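A minimal sketch of the kind of failed experiment being described: feed a small network the primes and ask for the next one. Nothing here is from the actual work; the gap-window encoding is an assumption made just for this illustration.

```python
# Predict the next prime gap from the previous ten gaps. As described above,
# the model does terribly: prime gaps look essentially like noise to it.
import numpy as np
from sympy import primerange
from sklearn.neural_network import MLPRegressor

primes = np.array(list(primerange(2, 10**6)))
gaps = np.diff(primes)

X = np.stack([gaps[i:i + 10] for i in range(len(gaps) - 10)])  # 10-gap windows
y = gaps[10:]                                                  # the next gap

split = int(0.8 * len(X))
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=50)
model.fit(X[:split], y[:split])
print(model.score(X[split:], y[split:]))  # R^2 typically stays near zero
```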
Starting point is 00:58:16 It may be something fundamental, or it could just be that the representation we're using for the zeta function is not very good. We should dig deeper. But of course, if you find the right representation, the one that gives a very good pattern-spotter, that representation is the new mathematics we're looking for. So, speaking of classifications, as we're both fans of classifications: is there a way to map this problem of doing image processing to predict primes, or the zeros of the Riemann zeta function, onto P versus NP? Or is it a new class, like problems in math that can be solved with image recognition or image prediction versus problems that can't? This is a deep question, and at some point I was talking to model theorists, especially
Starting point is 00:59:04 Boris Zilber, who is a leading figure in model theory, because model theory tries to classify mathematical problems in terms of hierarchies of difficulty. And this is not a P-versus-NP kind of difficulty, but a difficulty in the underlying structures themselves. The question is: why is a polynomial over the integers so much harder to think about than a polynomial defined over the complex numbers? Even though the complex numbers are, in some sense, a completion of the integers, why is it so much harder to look for solutions over the integers? And at some point we were thinking that maybe the problems
Starting point is 00:59:47 we're going to encounter, the ones the neural networks struggle with, are the ones that sit higher in this hierarchy of difficulty. We haven't thought much more about this, but it would be very interesting to correlate the two. This is not computational complexity, though; there should be a new definition of complexity, of how difficult a problem is, but it's hard to say how to define it. Hmm, actually, just as an aside on this. An aside on the side, yeah.
Starting point is 01:00:16 So there's a contest called the Summer of Math Exposition by 3Blue1Brown, and it's about getting people to make animations and lessons for different math topics. We're doing one on this podcast, Theories of Everything, but for physics, and also for AI and complexity. This is a teaser of an announcement, not the full announcement; for those who are listening, it's going to be announced shortly, and there'll be prize money for the best explanations. The top five get, well, you'll see. Oh, I'd love to see that, amazing stuff. Oh, sorry, back to BSD. So now, finally, this is the murmuration again. This was the one that
Starting point is 01:00:59 got Quanta interested, and Quanta considered it one of the breakthroughs of 2024, because it was AI-guided and it really surprised the experts. I want to tell you the story of this, because it shows where we are in terms of AI-assisted discovery. Probably my biggest contribution to all of this was to have insisted on this paper being called "murmuration".
Starting point is 01:01:59 Because I remember the Skype call. The original paper was, oh gosh, three years ago, when this pattern appeared and my collaborators were saying, you know, this reminds me of that thing birds do. And I said, oh, you mean murmurations of starlings. And then I said, you know, I'm going to insist that when this paper gets finished, we call this the murmuration phenomenon. And it kind of stuck. That's probably my biggest contribution.
Starting point is 01:02:31 Because my collaborators are card-carrying number theorists. We teamed up because they were trying to explore this AI-assisted world, and I needed some real experts. So this already breaks the Birch test: the fact that I had to look for human experts to generate something like this already fails it. But it was worth it. It was worth breaking Birch for, because I made friends with number theorists and it was something surprising to their community. Just a bit about the importance of the BSD conjecture.
Starting point is 01:03:06 This is kind of nice, because it gives me an opportunity to share my own ignorance of the BSD conjecture: I learned about it not as a number theorist, which I'm not, but as an amateur coming from the AI-discovery side. And it made me appreciate why the BSD conjecture is so important and so interesting, how surprisingly AI can help with it, and how the Birch test was almost met
Starting point is 01:03:40 by this particular problem. So let's go way back to Diefantin equations. Diefantin equations, named after Diefantus, is just about finding rational or integer solutions to polynomials. I said these two are equivalent because you can always rationalize the denominator and cancel out. So finding're finding over finding solutions of a Q is really kind of the same as finding these solutions over Z. So a typical example of a Diophantine equation is find all the rational
Starting point is 01:04:12 solutions to x squared plus y squared equals one, and the solution here is Pythagoras. Pythagoras gives us probably the most famous example: three-fifths squared plus four-fifths squared is equal to one. If you think about it, this is actually highly non-trivial. The fact that the two squares add up to twenty-five twenty-fifths and cancel to one is already kind of interesting, bizarre. But Pythagoras, or I can't remember whether it was Pythagoras or Euclid, gave the full solution to this equation. And the solution is that there is a one-parameter infinite family of rational solutions. The rational solutions are often called Q-points, Q for rational, or rational points on this quadratic. We say points
Starting point is 01:05:01 because, thanks to Descartes, you can plot this, and it's a circle. So you're finding rational points on the unit circle. That's why the words solution and point are used interchangeably in this field called arithmetic geometry. This is great, but what's less well known, and what I want to emphasize, is that (3/5, 4/5) is obviously just one solution, and there is a one-parameter infinite family of solutions.
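For reference, the classical parameterization being alluded to, the standard rational parameterization of the unit circle:

$$ (x, y) \;=\; \left( \frac{1 - t^{2}}{1 + t^{2}},\; \frac{2t}{1 + t^{2}} \right), \qquad t \in \mathbb{Q}, $$

which sweeps out every rational point except $(-1, 0)$; for instance, $t = 1/2$ recovers $(3/5, 4/5)$.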
Starting point is 01:05:33 So, in other words, all solutions can be parameterized in a specific way. That's the quadratic case. If we recall high school algebra, or high school Cartesian geometry, a quadratic is what's known as a conic section: you're slicing a cone. And by the way, all of the quadratic ones can be solved in a similar way; all conic sections, not over the complex numbers but over the rationals, can be solved in a similar way. But if you bump up the degree, it already becomes extremely difficult, essentially an impossible problem. So instead of considering x squared plus y squared equals one, once you go
Starting point is 01:06:18 into cubics, you're completely stuck. Even for something simple where I change the 2 into a 3, how do you find all the rational points? In some sense we still don't know in general. And you can see the kind of problems we run into: Fermat, for example, is about higher-degree polynomials. Fermat's last theorem asks you to find all rational points on the curve x to the n plus y to the n equals one. That's it. And the theorem states that non-trivial rational points exist only in the quadratic case; from degree 3 and above, there are none.
Starting point is 01:07:02 So for x cubed plus y cubed equals one, for example, there cannot exist any non-trivial rational points, and so on and so forth. Now, the degree-two curves are the conic sections, curves like parabolas, circles, and ellipses. Once it's a cubic, it's called an elliptic curve. And you can imagine there could be more terms, right? Why not have something like x squared y, which is also degree 3, or x y squared, and so on? There's a theorem by Weierstrass that all of these can be reduced, after transformations of variables, into a standard form.
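For the record, the general Weierstrass form being described, and the short form it reduces to over the rationals; the two surviving coefficients are presumably the pair called g2 and g4 below:

$$ y^{2} + a_{1}xy + a_{3}y \;=\; x^{3} + a_{2}x^{2} + a_{4}x + a_{6} \quad\longrightarrow\quad y^{2} \;=\; x^{3} + Ax + B. $$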
Starting point is 01:07:45 So you have a quadratic in y, a cubic in x, and a linear term in x; you can transform away all the other coefficients if you wish. This is called the general Weierstrass representation of an elliptic curve. I grew up with this one, and it's important to emphasize that my bias towards it is exactly what prevented me from understanding any of this from the AI point of view in the beginning. Because for an algebraic geometer, this is the representation we always use; it's our favorite thing. And I will tell you how an experiment failed playing with it.
Starting point is 01:08:26 Just because I was taught to always think in terms of the Weierstrass form. It's a canonical representation of elliptic curves: it has no quadratic term in x, and no linear or cubic term in y. So, just before we move on, this Weierstrass form means that any elliptic curve can be classified by these two numbers, g2 and g4? Yeah, exactly, exactly. There's a variable transformation that
Starting point is 01:08:53 puts you into this canonical form. Okay, so I know you'll get to it, but I'm interested as to why this prevented you, because I could imagine these two numbers serving as something like pixels or RGB values. You're reading my mind. That's exactly the experiment that I tried, and I failed; in hindsight, it's not surprising how I failed, but I'll get to that in a minute, so let's park that idea. Just as conics can be written in the standard forms we remember from high school, the canonical cubic can be written in this Weierstrass form. The important thing about the cubic is that, for very deep reasons, this curve captures a lot of non-trivial arithmetic and number theory.
Starting point is 01:09:46 For example, Fermat's last theorem was cracked because Frey and friends were able to reduce Fermat's equation, which is neither a conic nor a cubic, to a particular elliptic curve called the Frey elliptic curve, and then Wiles comes in and proves the modularity theorem. That's a whole big story. So somehow this cubic sits right at the intersection. And by the way, cubics are great because the elliptic curve is the only example of a Calabi-Yau manifold in dimension one. So there's something about the elliptic curve, the cubic curve: it is the Calabi-Yau in complex dimension one. Remember the picture that I drew last
Starting point is 01:10:30 time: positive curvature is the sphere, zero curvature is the torus, and surfaces of general type have negative curvature. This is the Riemann uniformization theorem. The critical case between positive and negative curvature is zero curvature, the torus, and if you represent that algebraically, it's exactly this elliptic curve. So elliptic curves are Calabi-Yau manifolds of complex dimension one, and this is precisely the critical case that also captures so much number theory. That's why algebraic geometers, differential geometers, physicists, and number theorists are all interested in Calabi-Yau-ness: because of this intrinsic zero curvature.
Starting point is 01:11:26 There's a lot of depth to that statement. Zero-curvature objects give so much wealth because they sit right at the boundary of positive and negative curvature. Oh, and that's what you mean by criticality: zero curvature?
Starting point is 01:12:22 It's zero, yeah. It's the dividing point. And there are lots of conjectures; Yau has various conjectures about the finiteness of topological types in this space. But anyhow, back to the point: that's why I know about these curves, because I came from this
Starting point is 01:12:43 algebraic-geometry-and-string-theory background that also wanted to study these Ricci-flat, zero-curvature objects. So, back down to earth. Now there is a theorem, due to so many people that several of them got the Fields Medal for proving different parts of it, and it really spanned a long time. People like Weil, Deligne, Grothendieck, Dwork, Faltings, and Mordell all contributed. And the theorem is this.
Starting point is 01:13:18 We can't say something like Pythagoras, where there's a one-parameter infinite family of rational points on the conic; that would be too hard. But at least we can say the following: for any elliptic curve over Q, the rational points themselves form a group, and the group has a specific form. There are r copies of an infinite family of solutions, and this r is called the rank: it counts how many copies of the infinite solutions there are. And then there is a finite part, the so-called torsion, and there are only 15 possible types of torsion.
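In symbols, the group structure being described, which is the Mordell-Weil theorem named a moment later:

$$ E(\mathbb{Q}) \;\cong\; \mathbb{Z}^{r} \,\oplus\, E(\mathbb{Q})_{\mathrm{tors}}, $$

where the free part has rank $r$ and the torsion part is finite.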
Starting point is 01:14:10 This is really at the heart of it. All of this stuff, by the way, is at the heart of the Langlands program; Edward Frenkel would have told you how excited he is about this sort of thing. But the one particular thing I want to emphasize is r, the number of infinite families of solutions. And this r is the rank of an elliptic curve. Yeah.
Starting point is 01:14:38 This is the Mordell-Weil theorem, or the Mordell... Exactly, exactly. Mordell-Weil. Ah, Weil.
Starting point is 01:14:46 André Weil. Anyhow, the reason I want to emphasize this, and it still has nothing to do with BSD yet, is that the rank measures how many infinite families of solutions there are over Q. In the case of Pythagoras, the rank, if you wish, would be 1, because there's one infinite family. So r is the generalization of that 1 from the conic-section case to the elliptic-curve case. That's it. This really is the state of the art in what you can say about the rational points of an elliptic curve. There is this wonderful thing called the rank, which is actually quite difficult to compute. It's not like I can just read it off.
Starting point is 01:15:36 It's not like there's some analytic formula where I look at a curve, this x cubed, that y squared, and just tell you the rank; I can't even remember what it is in this case, two or whatever. Sure. Yeah, and the earliest experiment I did was exactly as you suggested, but this was back in 2019. I took a database of about three million elliptic curves in Weierstrass form, and I took the two numbers g2 and g4.
Starting point is 01:16:10 This was a paper I did with two data scientists who were using the fanciest techniques possible, and we were all a bunch of amateurs as far as BSD is concerned. The idea was to take g2 and g4 as two parameters, plot them, label them by r, the rank, and try to see a pattern. We got a null result, because the g2 and g4 values in the database, I can show you, are massive; they're in the trillions. So it's very hard to get much.
Starting point is 01:16:45 We had to take the log of all these numbers to even establish a plot. And the rank was so randomly distributed, even with the fanciest technology. So what's the solution to this problem? Well, in the plot we didn't see anything; we couldn't get g2 and g4 to predict the rank to any level of accuracy. But nevertheless, this was featured by New Scientist, because it was such a strange and novel thing to do.
Starting point is 01:17:16 Even though it was a null result, at least it was inching towards something: somebody, someday, must be able to say something intelligible about BSD from a data-science point of view. Anyhow, back to this. So what should one do? This is where number-theory expertise actually comes in. First of all, there is old lore: if you can't find solutions over the integers,
Starting point is 01:17:49 solve the equation modulo a prime and see how far you can get. For example, I can't produce a rational point on this elliptic curve off the top of my head, though I think solutions exist; but at least let me try to work modulo a prime. So modulo 23, this works: you can check it, because 2 cubed is 8, 8 plus 16 is 24, and 24 is 1 modulo 23. So that works. And you can play the same game modulo 5. Okay, so this seems like a game.
Starting point is 01:18:28 But the deep result of people like Deligne and Weil is that if you work over a sufficient number of primes, in fact, if you work over all primes and take a limit, you should learn something very deep about the solutions over Q. And that's the point. So in particular, what you should record are these Euler coefficients,
Starting point is 01:18:52 which is the number of solutions, modulo prime, and how they deviate from p plus one, the prime itself. So this is what's known as an Euler coefficients. Just keep track, start with two, and then try three, try four, I said try three, try five, try seven, and then find how many solutions there are. And this is a now this is a finite problem. You can just do in the worst case, a grid search because you're doing modular prime, you just have to count the number, you know, and just check by brute force. So now what you is to form what's called the local zeta function.
Starting point is 01:19:30 The local zeta function is this exponential generating function that keeps track of the number of solutions over P. You fix a prime and you literally count the number of solutions modulo p. The more technical thing, what you really should do is because all finite fields are prime powers, what you actually do is to do an exponential generating function over the number of solutions of the field of prime characteristic p. But that's a technical side.
Starting point is 01:20:07 But what you really should do, what you're essentially doing is to keep track of the number of solutions of the elliptic curve, not q points, but the number of fp points, modulo prime p. Okay. And the fact that this generating function becomes this polynomial divided by polynomial form is what got DeLing, when DeLing proved this, he got the Fields Medal for showing that this is in this particular form. It's a very, very, very deep result in number theory.
Starting point is 01:20:38 So okay, long story short is that when I met Oliver and Lee, they told me that whatever the hell I was doing with these data scientists, no offense to data scientists, we were just amateurs, we shouldn't have used the Weierstrass representation because the Weierstrass representation inherently is not what an arithmetic geometry, what a number theorist would be using that would capture the fundamental arithmetic of elliptic curves. So now there's the geometry of elliptic curves, which is my background, but that doesn't capture the Varshares form, doesn't capture that. The AP coefficients captures the arithmetic. So we should be using AP coefficients to predict the rank. So then we did.
Starting point is 01:21:28 So instead of using a pair of very large integers, G2 and G4, and set up a neural network to predict R, you take instead the first, say, 100 AP coefficients. Now we're in a very interesting point cloud of a hunt a hundred dimensional point cloud It turns out 50 is enough. It doesn't have to be a hundred just randomly chosen you take the first 50 50 primes It's just not too crazy now for you know for it for by today's AI key standards Take the first list list do a hundred take the first AP cover take take the first 100 primes now Take an elliptic curve, reduce
Starting point is 01:22:10 this elliptic curve and count how many solutions modulate these primes and compute these polar coefficients. So for each elliptic curve you get this 100 dimensional vector. Move to the next elliptic curve, 100 dimensional vector. Now you just start labeling. Because of this wonderful thing called LMFDB, the LMFDB was set up by a bunch of people. I think Andrew Sutherland at MIT is one of the instigators of it. Sutherland and Booker and a bunch of people set up this thing, which just records everything you ever wanted to know about elliptic curves.
Starting point is 01:22:44 There are tens of millions, on the order of 10 million in this dataset. which just records everything you ever wanted to know about the Liberty Curves. There are tens of millions, on the order of 10 million in this dataset. So now you've got 100 dimensional vectors as you march through. Because the LMFDB has the rank information, you can start to set up a newer network. So we set up a newer network or some other classifier. And just like what we did with, just like what I did with Alessandro Andretti and Paranchelli on using G2G4, this pair of Vastroskovich's predict rank, that gave no result. And when we did this for 100 differential vector, this immediately gave 99.99%
Starting point is 01:23:25 accuracy in prediction you went from you know zero to almost a hundred percent immediately and you're using less data as well and using less data remember with the data side is with the G2 G4 with these guys we used something like all 3.5 million near-plete curves. And we couldn't get any accuracies at all. But with this one, even with 100,000, so you give me any elliptic curve, you just look at the Euler coefficients in this machine on demands. I can tell you with almost 100% accuracy what this rank is going to be. Interesting.
Starting point is 01:24:01 So, of course, this immediately breaks birds' nests because I had to talk to real human experts who told me to actually use the RASHA, to use the Euler representation rather than the RASHA representation. But at the time I was amazed. I was like, this is so cool. Of course, this is still useless for science. This was just a very, very cool thing to do in terms of visualizing the curves. And then after some digging,
Starting point is 01:24:29 we realized what really was under the hood that was happening. This is when Lee and Oliver were telling me, because they're number theory. So well, he says, well, this is no surprise because this is the BSD conjecture at work. Somewhere under the hood, this is BSD conjecture at work. So now I can finally define for you with the BSD conjecture at work. Somewhere under the hood, this is the BSD conjecture at work.
Starting point is 01:24:46 So now I can finally define for you with the BSD conjecture. I will give you the weak version so as not to bore you for two reasons. One, the strong version is very too technical and two, I don't even understand it very well myself. But the weak version simply says, if you take this generating function that keeps track of the Euler coefficients, form this product polynomial, now you take a product of all primes, this is what's called the local to global behavior because you localize it. Remember this zeta function is local to a particular prime. And now you take the product of all primes,
Starting point is 01:25:28 and you form this new function called the global L-function. Now, I want to emphasize that the local one is called a zeta function not by abuse of terminology: it literally is the Riemann zeta function if you work not over the elliptic curve but over a point. The product is exactly the Euler product for the Riemann zeta function if you work over a point. But we were more sophisticated: our algebraic variety, our manifold, is not just a point but an actual elliptic curve,
Starting point is 01:26:04 so it becomes a much richer structure. So this global L-function really is an analog of the Riemann zeta function. That's why this whole Langlands business is so beautiful and so intricate: it unifies geometry with harmonic analysis with number theory. This is why Edward Frenkel was so excited about all of this stuff: because it unifies so many different branches of mathematics. Anyhow, the BSD conjecture states: we don't know what the rank is, and you can't get it by
Starting point is 01:26:42 looking at the curve; but once you have the L-function, by this strange procedure of working modulo a fixed prime and then taking the product over all primes, you get an analytic function. Now we have an analytic function, and I can start mucking around with it. I can use complex analysis.
Starting point is 01:27:01 The order of vanishing of this analytic function at one: that is exactly the rank. That's the BSD conjecture. If you can prove that the order of vanishing at one, for any elliptic curve, is exactly the rank, then you get a million dollars. That's the BSD conjecture. All right.
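In symbols, the global L-function and the weak BSD statement just given, writing only the Euler factors at primes of good reduction:

$$ L(E, s) \;=\; \prod_{p} \Big( 1 - a_{p}\,p^{-s} + p^{\,1-2s} \Big)^{-1}, \qquad \operatorname{ord}_{s=1} L(E, s) \;\overset{?}{=}\; r. $$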
Starting point is 01:27:24 Remember, the Riemann zeta function has a simple pole at one, so its reciprocal has a zero of order one at s equal to one. So it's a good analogy with the Riemann zeta function. But for an elliptic curve instead of a point, the order of vanishing should exactly equal the rank. So this gives you a way to compute the rank, albeit a very, very convoluted and intricate way. But what surprised us was that a neural network was able to predict this rank, this number, without going through any of this stuff, and predicted it to almost 100% accuracy.
Starting point is 01:28:09 So where's your million dollars? Well, there's no million dollars, because, one, there's no proof; there isn't even a statement. So what? As far as the Birch test goes, the A test had already failed, because we had to talk to real experts. The I test had also failed at this point, because there was no interpretability. So how do we pass the I test, and then, surprisingly, also pass the N test, the non-triviality? That's why the murmuration phenomenon was important.
Starting point is 01:28:48 So we got this prediction and we were very excited. Oh yeah, it's all really cool. Then the excitement died down and we were like, so what? What do we do? This is the wonderful moment where you recruit an undergraduate intern. Alexey Pozdnyakov was at the time a second-year undergraduate student of Kyu-Hwan Lee's. Alexey was given the task: dig under the hood and tell us what this neural network, or this classifier, the Bayes classifier or the tree classifiers, is actually doing. And finally we honed in on a PCA, a principal component analysis. Because basically we found that this rank prediction
Starting point is 01:29:34 was doing so well with basically anything: naive Bayes classifiers, neural networks of very simple architecture. We didn't even have to go to transformers or encoder architectures. A simple feed-forward structure with, I think, a basic sigmoid activation function
Starting point is 01:29:56 was good enough to do this. And PCA certainly would do it. So, PCA: I had to learn it, I didn't know what a PCA was until 2017. PCA is just principal component analysis. Here's 100-dimensional data, a point cloud I can't visualize.
Starting point is 01:30:14 So I find the eigenvalues and the principal components, meaning the directions in which the data has the most variance, and project onto those eigen-directions. This is like Stats 101, which I never took. But luckily I knew what a PCA was, so we were just mucking around with PCA. Here's the remarkable fact.
Starting point is 01:30:35 If you take a PCA projection of this 100-dimensional vector space of elliptic-curve Euler coefficients onto principal components, and in this case a two-dimensional projection is sufficient, you can see that the elliptic curves of different ranks actually separate out, essentially, up to some noise. Sorry, is it important that you do your principal component analysis down to two dimensions,
Starting point is 01:31:03 or does it just happen to work out that way in this case? In our case we chose two because it was easier to see; if you project into other dimensions you see the same kind of separation. Sure. Yeah. Two was just nicest, and in retrospect we really should have colored it like an Italian flag, or a French flag, because it really just looks like a flag. In this plot, these are elliptic curves of rank 0, rank 1, and rank 2.
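A sketch of the PCA step being described, with X and y as in the earlier classifier sketch; the colors are arbitrary:

```python
# Project the 100-dimensional a_p vectors onto their first two principal
# components and color the points by rank, giving the "flag" picture described.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

Z = PCA(n_components=2).fit_transform(X)
for rank, color in [(0, "tab:red"), (1, "tab:blue"), (2, "tab:green")]:
    pts = Z[y == rank]
    plt.scatter(pts[:, 0], pts[:, 1], s=2, color=color, label=f"rank {rank}")
plt.legend()
plt.show()
```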
Starting point is 01:31:33 This is already quite interesting, right? It's still useless in terms of actual mathematics, but it was a nice moment; at this point it's not even really AI, it's just PCA, just data analysis producing a picture that analyzed elliptic curves in a way that had never been done before. This was 2020 when we had this result. So we looked at it and saw that it was kind of nice.
Starting point is 01:32:01 The elliptic curves of different ranks separated. Oh, by the way, the ranks of elliptic curves are a huge subject in themselves, as you can imagine, because of BSD. Manjul Bhargava was able to prove that almost all elliptic curves have rank either zero or one in the infinite limit, and he got the Fields Medal for that.
Starting point is 01:32:22 So this is obviously a very, very important result. But in the LMFDB you also have higher ranks; it's just that in the infinite limit you are vastly dominated by ranks zero and one: either no infinite families of rational points, or a one-parameter family, just like the conic-section case. The world record, I believe, is held by Noam Elkies, who discovered an elliptic curve of rank at least 28. It's huge; you can write it out. Rank 28 means there are 28 independent infinite families of rational points on that particular curve. In the LMFDB you can already see there are sufficiently many rank 0, 1, and 2 cases to show all three populations; there are rank 4s as well, but you won't see them in this picture.
Starting point is 01:33:21 Now, why do I emphasize PCA? I'm finishing with this. Because PCA is just a matrix projection, a linear transformation. You can look under the hood and just look at the matrices. Okay. And that's what we asked Alexey to do. What does it mean to look under the hood at the matrix? Because this is a 100-dimensional point cloud being projected to two dimensions, there's a whole bunch of 100-by-2 matrices. You can just look at them. Nobody ever does this, right? You don't look at what the AI algorithm is doing.
Starting point is 01:34:03 But in this case, you can actually look. So we gave it to Alexey to look at, not expecting much; he's an undergraduate. But Alexey exceeded all expectations. He really looked at a lot of samples of these matrices and noticed that almost all of the non-zero values were concentrated on essentially just one row; that one row vastly dominated all the others. So what does that mean? That's interesting. If you have a PCA projection, if you
Starting point is 01:34:39 have a matrix projection that's concentrated on just one row, that means it's essentially doing a sum, an average in some sense. And that's exactly it. What it's actually doing is taking the Euler coefficients and averaging over a particular range of elliptic curves ordered by conductor, and I won't bore you with the details of the conductor. It's just computing this average, and if you plot this average against the primes, you start seeing the murmuration phenomenon. So let me emphasize a few points about this. First of all, you're taking,
Starting point is 01:35:26 and we wouldn't have done this if we hadn't done a PCA analysis. If we hadn't done the machine-learning exercise on rank in the first place, we wouldn't have dug under the hood at all; so that was already AI-guided. Then we homed in on PCA because we thought it might be the most interpretable thing. And once we interpreted it, it gave this equation, a very, very simple equation, which says just to do the following.
Starting point is 01:35:57 Take families of elliptic curves, order them by conductor range, and average over the different elliptic curves at a fixed prime. This is known as a vertical average. It's a very strange thing to do, because traditionally you would average for a fixed elliptic curve over different primes; the product formula, for example, is for a fixed elliptic curve, running over all the primes. But the PCA told us: no, you do the opposite. You take different elliptic curves over a conductor range, average at a fixed prime, and plot the result against the prime. Once you plot it, you see exactly what this is: the red bits are all the rank-zero curves, and the blue bits are exactly the rank-one curves.
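A sketch of this vertical average, again using the hypothetical df and primes from the earlier sketches; the conductor window is an illustrative choice:

```python
# For curves in a fixed conductor range, average a_p at each fixed prime p and
# plot against p: the rank-0 and rank-1 averages oscillate in opposite ways.
import matplotlib.pyplot as plt

window = df[(df["conductor"] >= 7500) & (df["conductor"] < 10000)]
for rank in (0, 1):
    A = window[window["rank"] == rank][[f"a_{p}" for p in primes]].to_numpy()
    plt.plot(primes, A.mean(axis=0), label=f"rank {rank}")  # the murmuration
plt.xlabel("p")
plt.legend()
plt.show()
```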
Starting point is 01:36:56 So the reason all of the neural networks and classifiers were fundamentally able to tell differently ranked curves apart was that these averages oscillate in different ways. And it's so striking, right? You see this, and now it guides you in what to do: you can do this for all of the other ranks, just isolate the different ranks and plot them.
Starting point is 01:37:26 It turns out that all of the even-parity ranks oscillate one way, and all of the odd ranks oscillate the other way. Remember, there are all these different ranks: 0, 1, 2, 3. Fundamentally, you can tell the parity of the rank just from the way these oscillation patterns behave. And this is the point. This was a plot produced by Lee and Pozdnyakov.
Starting point is 01:37:52 Pozdnyakov did this and showed it to us, and I remember the Zoom chat very well. This was 2020 to 2021, still COVID times, right? All we did all day was Zoom people. And my friends were saying, this looks like what those birds do. And I said, ah, murmurations, that's what it is: murmurations of starlings. And I said, we should absolutely call this phenomenon murmuration, instead of calling it a boring oscillatory pattern. Because it's not quite oscillatory, right? There's noise around
Starting point is 01:38:34 it, and this noise is very interesting; I'll tell you about it in a bit. The noise is part of the statistical error you get from working with finite data: we only had the 3 million or so curves in the LMFDB. But the point is, we immediately wrote to Sarnak and to Sutherland, who are the leaders in this field, thinking, oh, this is trivial. This is the typical kind of thing: you write to the expert and you get a reply within a day saying, oh yeah, this is trivial, it's a consequence of this theorem that I proved twenty years ago in
Starting point is 01:39:11 this paper. That's the usual story; it happens a hundred times. But not only did we not get that kind of reply, we got a long message back from Peter Sarnak, who wrote, essentially: what the hell, guys, it's bizarre that this pattern exists. Why? Then there were many, many emails back and forth. To be honest, a lot of those emails went way over my head, because my contribution was the AI-guided algebraic geometry.
Starting point is 01:39:44 I'm not a number theorist. So there was lots of back and forth, and then this became the murmuration phenomenon, and all these conferences were organized. Back to the Birch test: it failed the A, because it wasn't automatic; we needed humans, and we were mucking around the whole time with human experts. It passed the I test, because it became interpretable.
Starting point is 01:40:11 There was a precise formula. Most importantly, it passed the N test, and this is the first time the N test was passed, because it actually galvanized a field of study in number theory. Now there's a whole field around the murmuration phenomenon. Wow. Totally, like, what the hell? I mean, it's above my pay grade, because I don't know the number-theory community that well at all.
Starting point is 01:40:36 So that's why Quanta was so excited. There were conferences organized at ICERM, and apparently there are workshops in Bristol and various universities: oh yes, there's a murmuration workshop on this. I need to wrap up, because this is getting into too much detail, but I want to tell you that there are other parts to this murmuration story. The point is that this is now a precise conjecture, one raised guided by AI explorations together with humans. And Peter Sarnak put it really well. He said: this is a conjecture that Birch and Swinnerton-Dyer could have raised themselves,
Starting point is 01:41:20 but they didn't, because they never thought to take this average. It's a bizarre thing to do. The AI doesn't know what it's doing; all it's doing is spotting patterns. So, to emphasize the scope of the phenomenon: it is expected to be true for all L-functions. In other words, murmuration should be a general phenomenon across the entire Langlands program. It's now proven for Dirichlet characters: the murmuration for Dirichlet characters has been proved, and it actually converges to a precise curve. This was proven, and I wasn't involved because it was
Starting point is 01:41:59 an actual number-theory paper, by Nina Zubrilina, who was a PhD student of Sarnak's, and then by Alex Cohen. And now they've proved it for weight-two modular forms as well. This was all in 2023 and 2024, and more results are being precisely proven. And what it really is, and this goes back to Gauss and to the Riemann zeta function, is that murmuration fundamentally generalizes a bias in the distribution of primes. That's also quite striking. So here's the interesting fact. Chebyshev noticed this before; it's called the Chebyshev bias. If you take
Starting point is 01:42:46 the primes and find their remainders upon division by four, you're going to get remainder either one or three, right, because you're working modulo four. And you would have thought it's 50% remainder one and 50% remainder three in the large limit. But Chebyshev, in the 19th century, already noticed there's just a tiny bit of a bias towards three over one. And that was just a conjecture of his, again from mucking around with data, this platonic data: why are the primes more biased towards one of the remainders upon division by four? This is known as the Chebyshev bias.
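A quick empirical check of the bias just described; nothing here is from the talk:

```python
# Count odd primes by residue class mod 4. For most cutoffs, the class
# 3 (mod 4) stays slightly ahead of 1 (mod 4): the Chebyshev bias.
from sympy import primerange

ones = threes = 0
for p in primerange(3, 10**6):
    if p % 4 == 1:
        ones += 1
    else:
        threes += 1
print(ones, threes)  # threes is slightly ahead at this cutoff
```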
Starting point is 01:43:44 And this was proven by Rubinstein and Sarnak in the 90s, but only conditionally on the Riemann hypothesis being true. Interesting, right? There is this fundamental bias, and there is no unconditional proof of it; it's conditional on the Riemann hypothesis that this is true. What is interesting is that the murmuration phenomenon is a generalization of this Chebyshev bias to the whole L-function world: a generalization from biases in the primes to biases in all L-functions, because of this underlying oscillatory behavior. And that also has a deep relation with the BSD conjecture. So that's this whole world, and that's why it passes I, it passes N, and people are still working on it, but it doesn't pass A, because human expertise was constantly involved, in interpreting, in choosing PCA. Anyhow, to wrap up this whole story: where are we, right? In terms of mathematical conjectures, of formulating problems in this top-down,
Starting point is 01:44:46 guided mathematical discovery over the centuries: I would say that in the 19th century, the eyes of Gauss were good enough to come up with such conjectures. By the 20th century, BSD already needed a computer to come up with a groundbreaking conjecture. And we are now just at the cusp, and that's why it's so exciting: AI-guided human intuition led to this Deep-
Starting point is 01:45:10 Mind paper, led to the murmuration result, and to the new matrix-multiplication algorithms. So obviously where we're going is a combination of these three directions. And as I mentioned earlier, in these very rooms where I'm talking, where Faraday would have had conversations with Maxwell about 150 years ago, our institute
Starting point is 01:45:43 has devoted one of its four themes to AI for mathematical physics, because this is really a paradigm shift in how science is done. So, I promise this is the last slide: what is the current state of the art? Where are we with this AI-guided thing? Let's drop the human-guided intuition for the time being. AlphaGeometry 2, from DeepMind, has now reached silver-medal level. I think we had a long joke and discussion about how to beat the 12-year-old Terry Tao.
Starting point is 01:46:20 When you beat the 12-year-old Terry Tao, it's game over; that's a new age of scientific experiment. And we as a field, as a community, are making progress every couple of months. It's so fast; it's exponentiating. DARPA, the Defense Advanced Research Projects Agency, the US military research wing, and now that the US is defunding everything, it will not defund the military, I was just at a meeting with them two weeks ago. They just launched expMath, where they are literally saying: exponentiating mathematics. You can Google this. The expMath project that DARPA is funding now
Starting point is 01:47:05 is about how to benchmark the acceleration of mathematics through proving. Unfortunately, they're not funding the AI-guided-discovery area, which I think should be funded; the way forward is the combination of all three directions. Anyhow, so that's AlphaGeometry 2. AlphaProof, again DeepMind, is proving theorems at an almost-research level. I think we even joked last time that, at silver-medal level, AlphaGeometry 2 is high-school level, and AlphaProof is maybe college level, right? But this is the interesting one, just to wrap up: the Epoch AI FrontierMath project, which you can Google, is really about professional mathematical problems.
Starting point is 01:48:01 The kind of problems you would give to a colleague, or to a very advanced postdoc or graduate student. And they're benchmarking this now. In fact, I'm flying to Berkeley on Thursday to help; there's a whole team of us flying in to benchmark their tier-four problems, and this is happening this weekend, actually. As of December 2024, AI models could solve only about 2% of these advanced research-level mathematical problems. But by March they had finished their tier-one to tier-three problems, with that division running up to graduate-level problems.
Starting point is 01:48:45 You can go to the website to see how hard these problems are. Terry Tao contributed some problems, and so did Ken Ono; these are all really research problems. And I gave a problem for tier four, which is about to appear, and they're still soliciting more. Just go to the link and look: oh my God, this is the kind of problem
Starting point is 01:49:07 I would give to my research student, or the kind of problem I would work on with a collaborator, that we would write papers about. And the benchmark scores are about 10 to 25% on tiers one to three. Their next benchmark is tier four, and tier four is so hard that individual humans are not likely to solve these problems, not because they're tricky like Olympiad problems, but because they're actual research-level problems, the kind that we, together as
Starting point is 01:49:41 a community, are attacking. So this is where the state of the art is. In terms of where the future of mathematics is going, I try to summarize it in this picture; I'm using an old picture of Terry Tao because he's the best human mathematician. How would it go? You have the corpus of literature from scientific papers; human and AI process it together, going back and forth, and then use top-down mathematics to formulate conjectures from platonic data gathered from the literature, or processed directly from it, and so formulate the problems. Once a problem is formulated, you go to auto-formalization. It's only a matter of time before Lean, the Lean community, gets there. By
Starting point is 01:50:41 auto-formalization, I mean you take a math paper in LaTeX, hit return, and it translates into Lean format. We're very far from that right now; from conversations with Buzzard, it's not because the technology isn't there, but because there's not enough Lean data to train a large language model on. Interesting. We only have millions of lines of Lean available so far. Just millions; we need billions. And those millions exist because these poor guys type everything in by hand, right? Yes. So: conjecture formulation, then auto-formalization, and then you go through
Starting point is 01:51:21 and find pathways through mathlib in a bottom-up approach, where a combination of LLMs would generate your proof. Then you feed that to the human, who actually interprets what it is, goes back to writing the paper with AI, and feeds it back into the literature. This loop, I think, is where we are already heading. It's not that all of this will be automated in the immediately foreseeable future, but what is remarkable is that it's within reach. I can't put a date on it, maybe five to ten years, because at every single step of this we are being helped by AI.
Starting point is 01:52:08 For example, the murmurations, which took available data and formulated a conjecture in this way. And just last week I was meeting people who are actually coming up with proof pathways, even via ChatGPT, and then verifying them by hand. This is the brave new world of mathematics, of theory, of new discovery. And we're so lucky to be in this age, where AI has advanced enough that it can actually help us with genuine new discoveries. Yang, thank you so much for bringing us to literally the frontier,
Starting point is 01:52:44 the bleeding edge of math, and also to the future. Oh, it's a great pleasure talking to you. Thank you for listening. I get very excited about this. I'm parking everything else so I can devote myself to this new community, and it's a growing community of mathematicians who believe in this. There was a recent Quanta report, not one I'm involved in, with experts like Andrew Granville, who talked about what a beautiful proof is
Starting point is 01:53:17 and how AI can help us with it. They are also amazed at how fast this is going. Hmm, interesting. Thank you. Thank you very much. I've received several messages, emails, and comments from professors saying that they recommend Theories of Everything to their students, and that's fantastic. If you're a professor or a lecturer and there's a particular standout episode that your students
Starting point is 01:53:39 can benefit from, please do share. And as always, feel free to contact me. New update! I started a Substack. Writings on there are currently about language and ill-defined concepts, as well as some other mathematical details. Much more is being written there. This is content that isn't anywhere else.
Starting point is 01:53:56 It's not on Theories of Everything. It's not on Patreon. Also, full transcripts will be placed there at some point in the future. Several people ask me: hey Curt, you've spoken to so many people in the fields of theoretical physics, philosophy, and consciousness; what are your thoughts? While I remain impartial in interviews, this Substack is a way to peer into my present deliberations on these topics. Also, thank you to our partner, The Economist. Firstly, thank you for watching, thank you for listening. If you haven't subscribed or clicked that like button, now is the time to do so.
Starting point is 01:54:33 Why? Because each subscribe, each like helps YouTube push this content to more people like yourself, plus it helps out Curt directly, aka me. I also found out last year that external links count plenty toward the algorithm, which means that whenever you share on Twitter, say on Facebook, or even on Reddit, etc., it shows YouTube: hey, people are talking about this content outside of YouTube, which in turn greatly aids the distribution on YouTube. Thirdly, you should know this podcast is on iTunes, it's on Spotify, it's on all of the audio platforms. All you have to do is type in Theories of Everything and you'll find it.
Starting point is 01:55:10 Personally, I gain from rewatching lectures and podcasts. I also read in the comments that, hey, TOE listeners also gain from replaying. So how about you re-listen on those platforms, like iTunes, Spotify, Google Podcasts, whichever podcast catcher you use? And finally, if you'd like to support more conversations like this, more content like this, then do consider visiting patreon.com slash CurtJaimungal and donating whatever you like. There's also PayPal, there's also crypto, there's also just joining on YouTube. Again, keep in mind it's support from the sponsors and you that allows me to work on TOE full time.
Starting point is 01:55:46 You also get early access to ad-free episodes, whether it's audio or video; it's audio in the case of Patreon, video in the case of YouTube. For instance, this episode that you're listening to right now was released a few days earlier. Every dollar helps far more than you think. Either way, your viewership is generosity enough. Thank you so much.
