Y Combinator Startup Podcast - François Chollet: The ARC Prize & How We Get to AGI

Episode Date: July 3, 2025

François Chollet on June 16, 2025 at AI Startup School in San Francisco. François Chollet is a leading voice in AI. He's the creator of the Keras library, author of Deep Learning with Python, and the founder of the ARC Prize, a global competition aimed at measuring true general intelligence. He's spent years thinking deeply about what intelligence actually is, and why scaling up today's AI models isn't enough to reach it. In this talk, he walks through the limits of pretraining and memorized skills, and lays out a path toward true general intelligence: AI that can adapt on the fly, reason in new situations, and invent novel solutions. He explains why abstraction and compositionality matter, how ARC became the benchmark for progress, and what his team at a new research lab called Ndea is building next.

Transcript
Starting point is 00:00:00 Hi everyone, I'm François. I'm super excited to share with you some of my ideas about AGI and how we're going to get there. This chart right there is one of the most important facts about the world. The cost of compute has been consistently falling by two orders of magnitude every decade since 1940. There's no sign that it's stopping anytime soon. And in AI, compute and data have long been the primary bottleneck to what we could achieve. And in the 2010s, as you all know, with the abundance of GPU-based compute and large datasets,
Starting point is 00:00:34 deep learning really started to work. And all of a sudden, we were making fast progress on problems that had long seemed intractable across computer vision and natural language processing. And in particular, self-supervised text modeling started to work. And the dominant paradigm of AI became scaling up LLM pre-training. And this approach was crushing almost all benchmarks.
Starting point is 00:00:59 And remarkably, it was getting predictably better benchmark results as we scaled up model size and training data size with the exact same architecture and the exact same training process. That's the scaling laws that Jared told you about a few minutes ago. So it really seemed like we had it all figured out. And many people extrapolated that more scale was all that was needed to solve everything and get to AGI. Our field became obsessed with the idea that
Starting point is 00:01:29 general intelligence would spontaneously emerge by cramming more and more data into bigger and bigger models. But there was one problem. We were confused about what these benchmarks really meant. There's a big difference between memorized skills, which are static and task-specific, and fluid general intelligence, the ability to understand something you've never seen before on the fly. And back in 2019, before the rise of LLMs, I released an AI benchmark to highlight this difference. It's called the Abstraction and Reasoning Corpus, or ARC 1.
Starting point is 00:02:09 And from that time back in 2019 to now, with a model like GPT-4.5 for instance, there's been a roughly 50,000x scale-up of base LLMs. And we went from 0% accuracy on that benchmark to roughly 10%, which is not a lot. It's very close to zero, if you take into account the fact that any one of you in this room would score well above 95%. So to crack general fluid intelligence, it turns out we need new ideas
Starting point is 00:02:42 beyond just scaling up pre-training and doing static inference. This benchmark was not about regurgitating memorized skills. It was really about making sense of a new problem that you've never seen before on the fly. But then last year, in 2024, everything changed. The AI research community started pivoting
Starting point is 00:03:02 to a new and very different paradigm: test-time adaptation, creating models that could change their own state at test time to adapt to something new. So this wasn't about carrying preloaded knowledge anymore. It was really about the ability to learn and adapt at inference time. And suddenly, we started seeing significant progress on ARC.
Starting point is 00:03:23 So finally, we had AI that was showing genuine signs of fluid intelligence. In particular, in December last year, OpenAI previewed its o3 model, and they used a version of it that was fine-tuned specifically on ARC, and that showed human-level performance on that benchmark for the first time. And today in 2025, we have solidly moved on from the pre-training scaling paradigm, and we are now fully in the era of test-time adaptation. So test-time adaptation is all about the ability of a model to modify its own behavior dynamically
Starting point is 00:03:58 based on the specific data it encounters during inference. So that covers techniques like test-time training, program synthesis, and chain-of-thought synthesis, where the model tries to reprogram itself for the task at hand. And today, every single AI approach that performs well on ARC is using one of these techniques. So today, I want to answer the following questions. First, why did the pre-training scaling paradigm
Starting point is 00:04:26 not get us to AGI? If you look back just two years ago, this was the standard dogma. Everybody was saying this. And today, almost no one believes this anymore. So what happened? And next, does test-time adaptation get us to AGI this time? And if that's the case, maybe AGI is already here.
Starting point is 00:04:46 Some people believe so. And finally, besides test-time adaptation, what else might be next for AI? And to answer these questions, we have to go back to a more fundamental question. What even is intelligence? What do we mean when we say we're trying to build AGI? If you look back over the past decades, there have been two lines of thought to define intelligence and to define the goals of AI. There's the Minsky-style view.
Starting point is 00:05:16 AI is about making machines that are capable of performing tasks that would normally be done by humans. And this echoes very closely the current mainstream corporate view that AGI would be a model that could perform most economically valuable tasks. Like 80% is often quoted as the number. But then there's the McCarthy view that AI is about getting machines to handle problems they have not been prepared for. It's about getting AI to deal with something new. And my view is more like the McCarthy view. Intelligence is a process, and skill is the output of that process. So skill itself is not intelligence, and displaying skill at any number of tasks does not show intelligence.
Starting point is 00:06:04 This is like the difference between a road network and a road-building company. If you have a road network, then you can go from A to B for a specific, predefined set of A's and B's. But if you have a road-building company, then you can start connecting new A's and new B's on the fly as your needs evolve. So intelligence is the ability to deal with new situations. It's the ability to blaze fresh trails and build new roads. So attributing intelligence to a crystallized behavior program, a skill program, that's a category error.
Starting point is 00:06:40 You are confusing the process and its output. So don't confuse the road and the process that created the road. So to formalize this a bit, I see intelligence as the conversion ratio between the information you have, mostly your past experience, but also any developer-imparted priors that the system might have, and your operational area over the space of potential future situations that you might encounter. And those are going to feature high novelty and uncertainty. So intelligence is the efficiency with which you operationalize past information in order to deal with the future. It's an efficiency ratio. And that's the reason why using exam-like benchmarks to evaluate AI models is a bad idea.
Starting point is 00:07:27 They're not going to tell you how close we are to AGI. Because human exams weren't designed to measure intelligence. They were designed to measure task-specific skill and knowledge. They were designed according to assumptions that are sensible for humans, but not for machines. Like, for instance, most exams assume that you haven't read and memorized all the exam questions and the answers beforehand. So if you want to rigorously define and measure intelligence, here are some key concepts that you have to take into account.
Starting point is 00:07:59 The first is the distinction between static skills and fluid intelligence. So between having access to a collection of static programs to solve known problems versus being able to synthesize brand new programs on the fly to face a problem you've never seen before. And of course, it's not a binary, it's not one or the other, there's a spectrum between the two.
Starting point is 00:08:24 The second concept is operational area for a given skill. There's a big difference between being skilled only in situations that are very close to what you've seen before and being skilled in any situation within a very broad scope. For instance, if you know how to drive, you should be able to drive in any city, not just in a specific geo-fenced area. You can learn to drive in San Jose,
Starting point is 00:08:49 move to Sacramento, and you can still drive. Right? Again, there's a spectrum there. It's not binary. And lastly, you should look at information efficiency. For a given skill, how much information, how much data, how much practice did you need to acquire that skill?
Starting point is 00:09:06 And of course, higher information efficiency means higher intelligence. And the reason these definitions matter a lot is that as engineers, we can only build what we measure. So the way we define and measure intelligence is not a technical detail. It really reflects our understanding of the problem of cognition. It scopes out the questions we're going to be asking,
Starting point is 00:09:30 and so it determines the answers that we're going to be getting. It's the feedback signal that drives us towards our goals. And a phenomenon you see constantly in engineering is the shortcut rule. It's the fact that when you focus on achieving a single measure of success, you may succeed, but you will do that at the expense of everything else that was not captured by your measure. So you hit the target, but you miss the point. And you see this all the time on Kaggle, for instance.
Starting point is 00:10:03 We saw it with the Netflix Prize, where the winning system was extremely accurate, but it was way too complex to ever be used in production. So it ended up never being used. It was effectively pointless. We also saw it in AI with chess-playing AI. The reason the AI community set out to create programs that could play chess back in the 70s was because people expected this would teach us about human intelligence.
Starting point is 00:10:30 And then a couple decades later, we achieved the goal when Deep Blue beat Kasparov, the world champion. And in the process, we had really learned nothing about intelligence. So you hit the target, but you miss the point. And for decades, AI has chased task-specific skill, because that was our definition of intelligence. But this definition only leads to automation, which is exactly the kind of system that we have today. But we actually want AI that's capable of autonomous invention.
Starting point is 00:11:00 We don't want to stop at automating known tasks. We want AI that could tackle humanity's most difficult challenges and accelerate scientific progress. That's what AGI is meant to be. And to achieve that, we need a new target. We need to start targeting fluid intelligence itself, the ability to adapt and invent. So one definition of AGI only unlocks automation.
Starting point is 00:11:26 So it increases economic productivity. Obviously, it's extremely valuable. Maybe it also increases unemployment. But the other definition unlocks invention and the acceleration of the timeline of science. And it's by measuring what you really care about that we'll be able to make progress. So we need a better target, we need a better feedback signal.
Starting point is 00:11:48 What does that look like? My first attempt at creating a way to measure intelligence in AI systems was the ARC-AGI benchmark. So I released ARC 1 back in 2019. It's like an IQ test for machines and also humans. So ARC 1 contains 1,000 tasks like this one here. And each task is unique. That means that you cannot cram for ARC.
Starting point is 00:12:13 You have to figure out each task on the fly by using your general intelligence rather than your memorized knowledge. And of course, solving any problem always requires some knowledge. And in the case of most benchmarks, the knowledge priors that you need are typically left implicit. In the case of ARC, we made them explicit. So all ARC tasks are built entirely on top of core knowledge priors, which are things like objectness, elementary physics, basic geometry, topology, counting.
Starting point is 00:12:48 So concepts that any four-year-old child has already mastered. And solving ARC requires very little knowledge, and it's knowledge that is very much not specialized. So you don't need to prepare for ARC in order to solve it. What makes ARC unique is that you cannot solve it purely by memorizing patterns. It really requires you to demonstrate fluid intelligence. And meanwhile, pretty much every other benchmark out there is targeting fixed, known tasks, so they can actually be solved or hacked via memorization alone. That's what makes ARC fairly easy for humans, but very challenging for AI.
Starting point is 00:13:24 And when you see a problem like this, where a human child can perform really well, but the most advanced, the most sophisticated AI models struggle, that's like a big red flashing light telling you that we're missing something, that new ideas are needed. One thing I want you to keep in mind is that ARC is not going to tell you whether a system is already AGI or not. That's not its purpose. ARC is really a tool to direct the attention of the research community towards what we see as the most important unsolved bottlenecks on the way to AGI. So ARC is not the destination.
Starting point is 00:14:01 And solving ARC is not the goal. ARC is really just an arrow pointing in the right direction. And ARC has completely resisted the pre-training scaling paradigm. Even after a 50,000x scale-up of pre-trained base LLMs, their performance on ARC stayed near zero. So we can decisively conclude that fluid intelligence does not emerge from scaling up pre-training. You absolutely need test-time adaptation in order to demonstrate genuine fluid intelligence. And importantly, when the arrival of test-time adaptation happened last year, ARC was really
Starting point is 00:14:36 the only benchmark at the time that provided a clear signal about the profound shift that was happening. Other benchmarks were saturated. So they could not distinguish between a true IQ increase and just brute force scaling. So now you see this graph and you're probably asking, well, clearly at this point, ARC 1 is also saturating. So does that mean we have human-level AI now?
Starting point is 00:15:00 Well, not yet. What you see on this graph is that ARC 1 was a binary test. It was a minimal reproduction of fluid intelligence. So it only really gives you two possible modes. Either you have no fluid intelligence, in which case you will score near zero, like base LLMs, or you have non-zero fluid intelligence,
Starting point is 00:15:22 in which case you will instantly score very high, like the o3 model from OpenAI, for instance. And of course, every one of you in this room would score within noise distance of 100%. So ARC 1 saturates well below human-level fluid intelligence. And so now we are in need of a better tool, a more sensitive tool,
Starting point is 00:15:48 that would provide more useful bandwidth and better comparison with human intelligence. And that tool is ARC-AGI-2, which we released in March this year. So back in 2019, ARC 1 was meant to challenge the deep learning paradigm, where models are big parametric curves used for static inference. And today, ARC 2 challenges reasoning systems.
Starting point is 00:16:11 It challenges the test-time adaptation paradigm. The benchmark format is still the same, but there's a much greater focus on probing compositional generalization. So the tasks are still very feasible for humans, but they're much more sophisticated. And as a result, ARC 2 is not easily brute-forceable. In practice, what this means is that in ARC 1, for many tasks, you could just look at it and instantly see the solution,
Starting point is 00:16:37 without having to think too much about it. With ARC 2, all tasks require some level of deliberate thinking. But they still remain very feasible for humans. And we know this because we tested 400 people firsthand, in person, in San Diego over several days. And we are not talking about people who have physics PhDs here. We recruited random folks: Uber drivers, UCSD students, people who are unemployed. So basically anyone trying to make some money on the side.
Starting point is 00:17:10 And all tasks in ARC 2 were solved by at least two of the people that saw them, and each task was seen on average by about seven people. And so what that tells you is that a group of 10 random people with majority voting would score 100% on ARC 2. So we know these tasks are completely doable by regular folks with no prior training. So how well do AI models do? Well, if you take base LLMs, models like GPT-4.5 or Llama 4, it's simple: they get 0%.
Starting point is 00:17:43 There is simply no way to do these tasks via memorization. Next, if you look at static reasoning systems, so systems that use a single chain of thought that they generate for the task, they don't do much better. They do on the order of 1 to 2%. So very much within noise distance of 0.
Starting point is 00:18:02 So what it tells you is that to solve ARC 2, you really need test-time adaptation. All systems that do meaningfully above zero are using TTA. But even then, they're still far below human level. So compared to ARC 1, ARC 2 enables much more granular evaluation of TTA systems, systems like o3 for instance. And that's where you see that o3 and other systems like it are still not quite human
Starting point is 00:18:28 level. And in my view, as long as it's easy to come up with tasks that any one of you can do, that are easy for humans, but that AI cannot figure out, no matter how much compute you throw at it, we don't have AGI yet. And you will know that we are close to having AGI when it becomes increasingly difficult to come up with such tasks. We are clearly not there yet. And to be clear, I don't think ARC 2 is the final test.
Starting point is 00:18:54 We're not going to stop at ARC 2. We've started development on ARC-AGI-3. And ARC 3 is a significant departure from the input-output pair format of ARC 1 and 2. We are assessing agency, the ability to explore, to learn interactively, to set goals and achieve goals autonomously. So your AI is dropped into a brand new environment
Starting point is 00:19:20 where it doesn't know what the controls do. It doesn't know what the goal is. It doesn't know what the gameplay mechanics are. It has to figure out everything on the fly, starting with what it is even supposed to do in the game. And every single game is entirely unique. They're all built on top of core knowledge priors only, just like in ARC 1 and 2.
Starting point is 00:19:43 So we'll have hundreds of interactive reasoning tasks like this one. And efficiency is central to the design of ARC 3. So models won't just be graded on whether they can solve a task, but on how efficiently they solve it. And we are establishing a strict limit on the number of actions that a model can take. And we are targeting the same level of action efficiency
Starting point is 00:20:07 as we observe in humans. So we're going to launch this in early 2026, and next month in July, we're gonna release a developer preview so you can start playing with it. What's it gonna take to solve ARC 2, and we're still very far from it today, then solve ARC 3, and we're even further away from that?
Starting point is 00:20:26 Maybe in the future, solve ARC 4, eventually get to AGI. What are we still missing? So I've said that intelligence is the efficiency with which you operationalize the past to face a constantly changing future. But of course, if the future you face had really nothing in common with the past, no common ground with anything you've seen before,
Starting point is 00:20:49 you could not make sense of it, no matter how intelligent you were. But here's the thing. Nothing is ever truly novel. The universe around you is made of many different things that are all similar to each other, like one tree is similar to another tree, is also similar to a neuron,
Starting point is 00:21:05 or electromagnetism is similar to hydrodynamics, is also similar to gravity. So we are surrounded by isomorphisms. I call this the kaleidoscope hypothesis. Our experience of the world seems to feature never-ending novelty and complexity, but the number of unique atoms of meaning that you need to describe it is actually very small, and everything around you is a recombination of these atoms. And intelligence is the ability to mine your experience
Starting point is 00:21:37 to identify these atoms of meaning that can be reused across many different situations, across many different tasks. And this involves identifying invariance, structure, principles that seem to be repeated. And these building blocks, these atoms, are called abstractions. And whenever you encounter a new situation, you're going to make sense of it by recombining on the fly abstractions from your collection to create a brand new model that's adapted to the situation. So implementing intelligence is going to have two key parts.
Starting point is 00:22:15 First, there's abstraction acquisition. You want to be able to efficiently extract reusable abstractions from your past experience, from a feed of data, for instance. And then there's on-the-fly recombination. You want to be able to efficiently select and recombine these building blocks into models that are fit for the current situation. And the emphasis on efficiency here is crucial.
Starting point is 00:22:42 How intelligent you are is not just determined by whether you can do something. It's determined by how efficiently you can acquire good abstractions from real experience, and how efficiently you can recombine them to navigate novelty. So if you need hundreds of thousands of hours to acquire a simple skill, you're not very intelligent. Or if you need to enumerate every single move on the chessboard to find the best move, you're not very intelligent. So intelligence is not just demonstrating high skill,
Starting point is 00:23:13 it's really the efficiency with which you acquire and deploy these skills. It's both data efficiency and compute efficiency. And at this point, you start to see why simply making our AI models bigger and training them on more data didn't automatically lead to AGI. We are missing a couple of things.
Starting point is 00:23:32 First, these models lacked the ability to do on-the-fly recombination. So at training time, they were learning a lot. They were acquiring many useful abstractions. But then at test time, they were completely static. You could only use them to fetch and apply pre-recorded templates. And that is a critical problem that test-time adaptation is addressing. TTA adds on-the-fly recombination capabilities to our AI.
Starting point is 00:23:59 And that's actually a huge step forward that gets us much, much closer to AGI. But that's not the only problem. Recombination is not the only thing missing. The other problem is that these models are still incredibly inefficient. If you take gradient descent, for instance, gradient descent requires vast amounts of data to distill simple abstractions, many orders of magnitude more data than what humans need,
Starting point is 00:24:25 roughly three to four orders of magnitude more. And if you look at recombination efficiency, even the latest state-of-the-art TTA techniques still need thousands of dollars of compute to solve ARC 1 at human level. And that doesn't even scale to ARC 2. And the fundamental issue here is that deep learning models are missing compositional generalization.
Starting point is 00:24:51 And that's the thing that ARC 2 is trying to measure. And the reason why is that there's more than one kind of abstraction. And this is really important. I said that intelligence is about mining abstractions from data and then recombining them. There's really two kinds of abstraction. There's type 1 and type 2. They're pretty similar to each other. They mirror each other.
Starting point is 00:25:11 They mirror each other. So both are about comparing things, comparing instances, and merging individual instances into common templates by eliminating certain details about the instances. So basically, you take a bunch of things, you compare them, you drop the details that don't matter, and what you're left with is an abstraction. And the key difference between the two
Starting point is 00:25:35 is that one operates over a continuous domain and the other operates over a discrete domain. So type 1 or value-centric abstraction is about comparing things via a continuous distance function. And that's the kind of abstraction that's behind perception, pattern recognition, intuition, and also of course modern machine learning. And type 2 or program-centric abstraction
Starting point is 00:25:59 is about comparing discrete programs, which is to say graphs. And instead of trying to compute distances between them, you're going to be looking for exact structure matching. You're going to be looking for exact isomorphisms, subgraph isomorphisms. And this is what underlies much of human reasoning. It's also what software engineers do when they are refactoring some code. So if you hear a software engineer talk about abstraction, they mean this kind of abstraction.
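As a rough illustration of the two kinds of comparison described above, the following Python sketch contrasts a continuous distance between feature vectors (type 1) with an exact structural match between small programs written as nested tuples (type 2). The tuple encoding, the function names `euclidean`, `canonical`, and `same_structure`, and the variable-renaming scheme are illustrative assumptions, not something from the talk. The structural check only handles whole-tree matching up to variable renaming, which is far simpler than general subgraph isomorphism.

```python
import math

def euclidean(a, b):
    """Type 1 (assumed example): continuous distance between feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def canonical(tree, names=None):
    """Rename variables in order of first appearance (v0, v1, ...)."""
    if names is None:
        names = {}
    if isinstance(tree, tuple):          # (operator, arg1, arg2, ...)
        op, *args = tree
        return (op, *(canonical(arg, names) for arg in args))
    if isinstance(tree, str):            # a variable name
        return names.setdefault(tree, f"v{len(names)}")
    return tree                          # a constant

def same_structure(p, q):
    """Type 2 (assumed example): exact match up to renaming, no notion of 'close'."""
    return canonical(p) == canonical(q)

# Type 1: nearby points count as similar under a (hypothetical) threshold.
print(euclidean((1.0, 2.0), (1.1, 2.1)) < 0.5)      # True

p = ("add", ("mul", "x", "x"), "y")
q = ("add", ("mul", "a", "a"), "b")      # same program, different names
r = ("add", ("mul", "a", "b"), "b")      # one variable differs
print(same_structure(p, q), same_structure(p, r))   # True False
```

Note the asymmetry: the type 1 comparison degrades gracefully as points move apart, while the type 2 comparison flips from match to no match on a single changed symbol.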
Starting point is 00:26:27 So two kinds of abstraction, both driven by analogy making, either value analogy or program analogy. And all cognition arises from a combination of these two forms of abstraction. You can remember them via the left brain versus right brain metaphor, one half for perception and intuition, and the other half for reasoning, planning, rigor. And transformers are great at type 1 abstraction. They can do everything that type 1 is effective for. Perception, intuition, pattern recognition, they all work well. So in that sense, transformers are a major breakthrough in AI,
Starting point is 00:27:04 but they're still not a good fit for type 2. And this is why you will struggle to train one of these models to do very simple type 2 things like sorting a list or adding digits provided as a sequence of tokens. So how are we going to get to type 2? You have to leverage discrete program search as opposed to purely manipulating continuous interpolative embedding spaces
Starting point is 00:27:29 learned with gradient descent. Search is what unlocks invention beyond just automation. All known AI systems today that are capable of some kind of invention, some kind of creativity, they rely on discrete search. Even back in the 90s, we were already using genetic search to come up with new antenna designs. Or you can take AlphaGo with move 37. That was discrete search. Or more recently, the AlphaEvolve system from DeepMind. All discrete search systems. So deep learning doesn't invent, but search does. So what's discrete program search?
Starting point is 00:28:07 It's basically combinatorial search over graphs of operators taken from some language, some DSL. And to better understand it, you can try to draw an analogy between program synthesis and the machine learning techniques you already know about. In machine learning, your model is a differentiable parametric function, so it's a curve. In program synthesis, it's going to be a discrete graph, a graph of ops, symbolic ops from some language. In ML, your learning engine, the way you create models,
Starting point is 00:28:41 is gradient descent, which is very compute-efficient, by the way. Gradient descent will let you find a model that fits the data very quickly, very efficiently. In program synthesis, the learning engine is search, it's combinatorial search, which is extremely compute-inefficient, obviously. In machine learning, the key obstacle that you run into
Starting point is 00:29:00 is data density. In order to fit a model, you need a dense sampling of the data manifold. You need a lot of data. And program synthesis is the exact reverse. Program synthesis is extremely data-efficient. You can fit a program using only two or three examples. But in order to find that program,
Starting point is 00:29:19 you have to sift through a vast space of potential programs. And the size of that space grows combinatorially with problem complexity. You run into this combinatorial explosion wall. I said earlier that intelligence is a combination of two forms of abstraction, type 1 and type 2. And I really don't think that you're going to go very far.
Starting point is 00:29:43 If you go all in on just one of them, like all in on type 1 or all in on type 2, I think that if you want to really unlock their potential, you have to combine them together. And that's what human intelligence is really good at. That's really what makes us special. We combine perception and intuition together with explicit step-by-step reasoning. We combine both forms of abstraction
Starting point is 00:30:03 in all our thoughts, all our actions everywhere. For instance, when you're playing chess, you're using type 2 when you calculate, when you unfold some potential moves step by step in your mind. But you're not going to do this for every possible move, of course, because there are too many of them, right? You're only going to be doing it for a couple of different options,
Starting point is 00:30:24 right? Like here you're going to look at the knight, the queen, and the way you narrow down these options is via intuition, is via pattern recognition on the board. And you build that up very much through experience, right? You've mined your past experience unconsciously to extract these patterns. And that's very much type 1. So you're using type 1 intuition to make type 2 calculation tractable. So how is the merger between type 1 and type 2 going to work?
Starting point is 00:30:49 Well, the key System 2 technique is discrete search over a space of programs. And the blocker that you run into is combinatorial explosion. And meanwhile, the key System 1 technique is curve-fitting and interpolation on the curve. So you take a lot of data, you embed it on some kind of interpolative manifold that enables fast but approximate judgment calls about the target space. And the big idea is going to be to leverage these fast but approximate judgment calls to fight combinatorial explosion and make program search tractable. A simple analogy to understand this would be drawing a map.
Starting point is 00:31:27 So you take a space of discrete objects with discrete relationships that would normally require combinatorial search, like pathfinding on a subway system, for instance, and you embed these objects into a latent space where you can use a continuous distance function to make fast but approximate guesses about discrete relationships. And this enables you to keep combinatorial explosion in check while doing search. And this is what the full picture looks like.
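The map analogy can be sketched as follows, assuming a hypothetical one-line subway with made-up station names and coordinates. Hand-placed coordinates stand in for the learned latent space, and A* search stands in for guided program search: the straight-line distance to the goal is the fast, approximate guess that decides which discrete option to explore first. With `use_map=False` the same routine falls back to blind uniform-cost search.

```python
import heapq
import math

# Hypothetical subway line: three stations west of S, three to the east.
LINE = ["W3", "W2", "W1", "S", "E1", "E2", "G"]
COORDS = {name: (i - 3, 0) for i, name in enumerate(LINE)}  # the "map"
GRAPH = {name: [] for name in LINE}
for a, b in zip(LINE, LINE[1:]):
    GRAPH[a].append(b)
    GRAPH[b].append(a)

def dist(a, b):
    """Continuous distance between two stations on the map."""
    return math.dist(COORDS[a], COORDS[b])

def find_path(start, goal, use_map=True):
    """Return (path, number of stations expanded during the search)."""
    frontier = [(0.0, 0.0, start, [start])]  # (priority, cost, node, path)
    best = {start: 0.0}
    expanded = 0
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if cost > best[node]:
            continue                          # stale queue entry
        expanded += 1
        if node == goal:
            return path, expanded
        for nxt in GRAPH[node]:
            new_cost = cost + dist(node, nxt)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                # The approximate guess: straight-line distance left to go.
                guess = dist(nxt, goal) if use_map else 0.0
                heapq.heappush(
                    frontier, (new_cost + guess, new_cost, nxt, path + [nxt])
                )
    return None, expanded

print(find_path("S", "G", use_map=True))   # (['S', 'E1', 'E2', 'G'], 4)
print(find_path("S", "G", use_map=False))  # (['S', 'E1', 'E2', 'G'], 6)
```

Both searches return the same route, but the guided one never wastes effort on the westbound stations. On a toy line the saving is two expansions. The argument in the talk is that the gap widens as the discrete space grows.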
Starting point is 00:31:56 This is the system that we are currently working on. AI is going to move towards systems that are more like programmers that approach a new task by writing software for it. And when faced with a new task, your programmer-like meta-learner will synthesize on the fly a program or model that is adapted to the task.
Starting point is 00:32:16 And this program will blend deep learning submodules for type 1 sub-problems, like perception, for instance, and algorithmic modules for type 2 sub-problems. And these modules are going to be assembled by a discrete program search system that is guided by deep learning-based intuition about the structure of program space. And this search process isn't done from scratch.
Starting point is 00:32:40 It's going to leverage a global library of reusable building blocks, of abstractions. And that library is constantly evolving as it's learning from incoming tasks. So when a new problem appears, the system is going to search through this library for relevant building blocks. And whenever, in the course of solving a new problem,
Starting point is 00:33:00 you're synthesizing a new building block, you're going to be uploading it back to the library, much like, as a software engineer, if you develop a useful library for your own work, you're going to put it on GitHub so that other people can reuse it. And the ultimate goal here is to have an AI that can face a completely new situation and use its rich abstraction library
Starting point is 00:33:21 to quickly assemble a working model, much like a human software engineer can quickly create a piece of software to solve a new problem by leveraging existing tools, existing libraries. And this AI is going to keep improving itself over time, both by expanding its library of abstractions and also by refining its intuition about the structure of program space. This system is what we are building at Ndea, our new research lab.
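A minimal sketch of the library idea described above, under strong simplifying assumptions: the DSL is a hypothetical set of three integer operations, programs are plain sequences of operation names, and the search is blind enumeration with no learned guidance (the deep-learning intuition from the talk is left out). It is not Ndea's system. It only shows how storing a solved program as a new building block can bring a later task within a fixed search budget.

```python
from itertools import product

# Hypothetical starting library: name -> unary integer operation.
BASE_LIBRARY = {
    "inc": lambda x: x + 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
}

def run(program, library, x):
    """Apply a sequence of named operations to x, left to right."""
    for name in program:
        x = library[name](x)
    return x

def search(examples, library, max_len=3):
    """Enumerate programs by increasing length; return the first that fits."""
    for length in range(1, max_len + 1):
        for program in product(list(library), repeat=length):
            if all(run(program, library, x) == y for x, y in examples):
                return program
    return None

def solve_and_learn(name, examples, library, max_len=3):
    """Solve a task and, if the solution is composite, store it for reuse."""
    program = search(examples, library, max_len)
    if program is not None and len(program) > 1:
        steps = [library[op] for op in program]  # freeze current definitions

        def block(x, steps=steps):
            for step in steps:
                x = step(x)
            return x

        library[name] = block
    return program

task_a = [(1, 4), (2, 6), (3, 8)]      # target: (x + 1) * 2
task_b = [(0, 5), (1, 17), (2, 37)]    # target: ((x + 1) * 2) ** 2 + 1

lib = dict(BASE_LIBRARY)
print(search(task_b, lib))                              # None
print(solve_and_learn("inc_then_double", task_a, lib))  # ('inc', 'double')
print(search(task_b, lib))          # ('inc_then_double', 'square', 'inc')
```

With only the base operations, the second task needs four steps and falls outside the three-step budget. Once the solution to the first task is stored as a block, the same budget is enough. The cost is that every added block also widens the branching factor, which is one reason the talk pairs the library with learned guidance over program space.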
Starting point is 00:33:49 We started Ndea because we believe that in order to dramatically accelerate scientific progress, we need AI that's capable of independent invention and discovery. We need AI that could expand the frontiers of knowledge, not just operate within them. And we really believe that a new form of AI is going to be key to this acceleration. Deep learning is great at automation. It's incredibly powerful for automation, but scientific discovery requires something more. And our approach at Ndea is to leverage deep learning-guided program search to build this
Starting point is 00:34:25 programmer-like meta-learner. And to test our progress, our first milestone is going to be to solve ARC-AGI using a system that starts out knowing nothing at all about ARC-AGI. And we ultimately want to leverage our system for science, to empower human researchers and help accelerate the timeline of science.
