a16z Podcast - How Foundation Models Evolved: A PhD Journey Through AI's Breakthrough Era

Episode Date: January 16, 2026

The Stanford PhD who built DSPy thought he was just creating better prompts—until he realized he'd accidentally invented a new paradigm that makes LLMs actually programmable. While everyone obsesses over whether LLMs will get us to AGI, Omar Khattab is solving a more urgent problem: the gap between what you want AI to do and your ability to tell it, the absence of a real programming language for intent. He argues the entire field has been approaching this backwards, treating natural language prompts as the interface when we actually need something between imperative code and pure English, and the implications could determine whether AI systems remain unpredictable black boxes or become the reliable infrastructure layer everyone's betting on.

Follow Omar Khattab on X: https://x.com/lateinteraction
Follow Martin Casado on X: https://x.com/martin_casado

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

Transcript
Starting point is 00:00:00 Nobody wants intelligence, period. I want something else, right? And that something else is always specific, or at least more specific. There is this kind of observed phenomenon where if you over-engineer intelligence, you regret it because somebody figures out a more general and maybe potentially simpler method that scales better. And a lot of the hard-coded decisions you made are things you end up regretting. So I think it's fair to assume that, like, models will get better and algorithms will get better,
Starting point is 00:00:27 and a lot of that stuff will improve. Then the question we really ask is, intelligence is great, but what problems are you actually trying to solve? That idea that scaling model parameters and scaling just pre-training data is all you need, exists nowhere anymore. Nobody thinks that. Actually, people deny they ever thought that at this point. Now you see this massively human-designed and very carefully constructed pipelines for post-training, where we really encode a lot of the things we want to do.
Starting point is 00:00:51 You see massive emphasis on retrieval and web search and tool use and agent training. There is clearly a sense in which the labs have already recognized that the old playbook doesn't work. The question is, is that actually sufficient for making the best use and the most use of these language models? It's not a problem of capabilities. It's a problem of actually we don't necessarily just need models. We want systems. The conventional wisdom says we're racing toward AGI by making language models bigger and bigger. But what if the entire framing is wrong? On today's episode, you'll hear from a16z general partner Martin Casado and guest Omar Khattab, assistant professor at MIT and creator of DSPy. Omar doesn't think we need artificial
Starting point is 00:01:32 general intelligence. He thinks we need artificial programmable intelligence, and the difference matters more than you think. Here's the paradox. Khattab has built one of the most widely used frameworks for working with LLMs, DSPy, but he's skeptical that raw model capabilities will solve our problems. While others obsess over scaling laws and parameter counts, he's asking a more fundamental question. Even if models become infinitely scalable, infinitely capable, how do humans actually specify what they want? Natural language is too ambiguous, code is too rigid. We need something in between, a new abstraction layer that lets us declare intent without drowning in implementation details. Think of it as the jump from assembly to C, but for AI systems.
Starting point is 00:02:15 The stakes are higher than prompt engineering. This is about whether AI becomes a programmable tool we can reason about and compose, or just an inscrutable oracle we prompt and pray. We get into the three irreducible pieces of an AI system, why the God model is a dead end, and what it actually means to build software when intelligence is cheap but specification is hard. Well, listen, Omar, it's great to have you,
Starting point is 00:02:39 and congratulations on everything. Just so, for everybody that's listening, Omar is doing some of, in my opinion, the more interesting technical work in building frameworks around LLMs and models. And a lot of this has consequences on things like, you know, AGI and capabilities and everything else. And a lot of, like, your comments on social media, to me, have been kind of some of the most insightful.
Starting point is 00:03:00 So I've been really looking forward to having you on the podcast. Thank you for having me, Martin. I've been looking forward to the chat as well. Awesome. So listen, maybe let's just start with your background, you know, since we have some shared roots, and then we'll go from there to a general conversation. So, I mean, I'm now an assistant professor at MIT. I started a few months ago in electrical engineering and computer science, and I'm part of CSAIL.
Starting point is 00:03:22 I did my PhD at Stanford, where I think the timing was really interesting. I started in 2019 and I graduated about a year ago. That timing was really great because foundation models as a concept didn't even necessarily have that name yet. We hadn't coined it at Stanford yet, but the idea was starting to take shape. You know, BERT had been around for about a year at the time. But people sort of hadn't really figured out how to make them work. But I would say as importantly, how to make use of them to build different types of systems and applications, which is basically what I did throughout my whole PhD.
Starting point is 00:03:55 So I mean, you're the, I presume the primary person behind DSPy. Is that correct? That's, you could say that, yeah. Yeah, yeah. So for those of you that don't know, DSPy is widely used, and we're going to be talking about it. It's one of the most widely used, I would say, open source projects around prompt optimization for LLMs. So maybe let's just go ahead and start. You know, you have tweeted, you know, about, you know, whether LLMs will get to AGI or
Starting point is 00:04:18 not. I know it's a kind of very fluffy, high-level place to start, but we'd love to hear your thoughts on, you know, are we headed towards AGI in the near term? Is this an apt goal? Like, where do you come down? And it is particularly timely right now, given the conversation that Andrej Karpathy just had on the Dwarkesh podcast, where he was like, well, you know, maybe 10 years if you're optimistic. Where do you weigh in on this? So, I mean, I think, honestly, it's a surprising position because I feel like I'm not sure, but I'm less sort of, say, bearish than Karpathy, necessarily.
Starting point is 00:04:53 Oh, you are? Yeah, which is very surprising. You're less bearish than Karpathy on AGI. Right, which is very strange to me. Let me tell you what I think. So back when I started my PhD, basically you can look at a lot of sort of the work that we've done, you know, with my advisors and collaborators and others over the past six years or so
Starting point is 00:05:11 as pushing back on this perspective that scaling model size and maybe doing a little bit more pre-training, and, you know, especially at the time, it really was about model size. and just sort of doing more uniform scaling of that nature is just going to solve all of your problems. And the pushback has to, you know, has two sides. One side is this is an incredibly inefficient way to build capabilities that you care about. If you know what you want,
Starting point is 00:05:36 that's just waiting for everything to emerge is just incredibly inefficient and the diminishing returns just speak for themselves. The other problem is really a problem of specification or like abstractions. Scaling language models makes this I think realistic bet that anything people want to build with these models is just a few keywords away or a few words away
Starting point is 00:05:59 and that people know how to actually think of what these words should be. I think it's an incredibly limiting abstraction. But the reason I'm less bearish than maybe Karpathy sounded, although again, I'm not really sure, is, you know, I mean, I think we're seeing very rapid,
Starting point is 00:06:15 I would say, improvement in the perspective that we see out of the frontier labs. Like, that idea that scaling model parameters and scaling, you know, just pre-training data is all you need exists nowhere anymore. Nobody thinks that. Actually, people deny they ever thought that at this point. And now you see these massively human-designed
Starting point is 00:06:35 and very carefully sort of constructed pipelines for post-training where, like, we really encode a lot of the things we want to do. You see massive emphasis on retrieval and web search and tool use and sort of agent training. And you see all of this emphasis on, you know, OpenAI, their latest thing was building this agent builder, and they have products like Codex and others. So there is clearly a sense in which the labs have already recognized that the old playbook doesn't work, or at least that it's not complete.
Starting point is 00:07:02 And so if by AGI we just mean this thing that, you know, you can ask a very large set of problems, and as long as you give it enough context, it's able to handle them, you know, the models are increasingly powerful and reliable. The question is, is that actually sufficient for making the best use and the most use of these language models? And I think that's where my fundamental pushback doesn't go anywhere, because I think the problem is just, it's not a problem of capabilities. It's a problem of actually we don't necessarily just need models. We want systems.
Starting point is 00:07:35 And I can speak a little more about that. Yeah, so it's actually one. Yeah. So it's just a little bit. So there is a view of the world that, like, kind of like some variant of the, you know, transformer architecture is going to get us there. And then the end-to-end argument kind of suggests that,
Starting point is 00:07:55 you know, you put all the data into one model, and you have one model that will just become, you know, so good, because scaling laws hold, that it solves all of reasoning, right? That's kind of this, you know, absolutist end-to-end argument. I think nobody believes that anymore anyway. I think people do in video, maybe not in LLMs, but in video, I think a lot of people are like, listen, there will be one video model that you put everything in
Starting point is 00:08:17 It does everything. It does 3D. It does physics. It does whatever. So maybe in LLMs, you know, people don't believe that anymore because they've been around for long enough
Starting point is 00:08:24 to suggest it's not true. There's another view, which is like, LLMs are totally a dead end. You know, Karpathy called them ghosts, which I thought was so beautiful, which is,
Starting point is 00:08:33 you know, they can kind of, you know, do some sort of linear interpolation of stuff that they've heard in the past, but like, they can't do planning.
Starting point is 00:08:41 And so you need an entirely new architecture. And you're saying that you're not in that camp of getting an entirely new architecture. It depends, because I've been arguing for a different architecture for years, but that different architecture is built around having these models. No, no, 100%. Yeah, that was the third of what I was going to say. So the first one is like one model rules them all.
Starting point is 00:08:58 The second one is this is the wrong path, and there is no kind of system you could build with these models. You know, you've got to do something totally different, right? I would say, like, Yann LeCun would say that with JEPA or whatever. Like you need to do something fundamentally different. And then you're in this third spot, which is you can build some sort of system with these models and you can get to, I mean, AGI is such a loose word,
Starting point is 00:09:21 but you can actually get to what we're trying to achieve, which is pretty generalized intelligence to tackle any sort of problems. Is that a fair characterization? I think so. I mean, I think you could think of it as, I think AGI is fairly irrelevant. Like, it's not the thing I'm interested in.
Starting point is 00:09:34 I'm interested, I joke sometimes, in API, or artificial programmable intelligence. And the reason I say this is, why are we building AI? Why are we seeking to build AGI? I think we're not, and you can take a step back and ask, you know, well, maybe it's a scientific question or maybe it's just like a dream people have. But I think fundamentally, it's in my opinion a way of improving and expanding the set of software systems we can build, or just systems we can build in the world.
Starting point is 00:10:03 And if you think about why people build systems, software systems as an example, but really any engineering endeavor, it's not really that we lack general intelligences. There are billions of general intelligences out there. There are 8 billion people. We build the systems because we want them to be processes that are reliable, that are interpretable, that are, you know, easy, you know, that we can iterate on, that are modular, that we can study, right? So there are all these properties, that are scalable and efficient. There is a reason we care about systems.
Starting point is 00:10:34 And that is not like, you know, it's not that we lack intelligence. So the question that I think is most important is, how do we build programmable intelligences? And I think the alignment folks get some of this right. You could have a very powerful model that doesn't listen to what you say. And a lot of pre-trained models could be perceived that way. You know, they have a lot of latent capabilities, presumably. And the question is, you know, could you make it do what you want? But I think what alignment fails to do, at least as a general sort of way of thinking,
Starting point is 00:11:00 is it sort of omits to think about, well, what is actually the shape of the intent that people want to communicate to these models? How can I get people to actually express what it is that they want to happen? And with that bottleneck being, you know, as narrow and tight as it is, it's not a question of are the models capable enough or not. So that's what I'm saying. I might be even less bearish than Carpathy about whether the models will get so good such that given all the right context and the right instructions and the right tools and the right,
Starting point is 00:11:30 I see. Yeah. Yeah, like maybe. I think this is very aligned. Again, we don't, you know, not to refer to another discussion that's not on here, but just in general, a few of us take issue with the definition of AGI as being like the same thing as an animal or a human,
Starting point is 00:11:48 which is, that's not actually particularly interesting, given a bunch of animals and some humans, but like, we actually want smarter software systems. And then you think, like, a systems-based approach to models is the right way to get there. Is that fair enough? So like, it's not going to be one model. It's going to be a system.
Starting point is 00:12:06 Then can you maybe roughly sketch out what you think is the right way to build a system to do this? What are the components that are meaningful? So I would say like the first inspiring sort of concept here or like the starting point for this conversation is, look, to be honest, I have no idea what the capabilities of the models, the core capabilities of the models will be today, tomorrow, in a year, in 10 years. I just don't really know. And I, you know, I'm invested in getting a sense of how that will happen and progress.
Starting point is 00:12:34 And I think like it's kind of, it's easy to sort of like model different paths based on how you think the progress that's been happening has been happening. But in any case, like, there is kind of a bitter lesson to keep in mind, and I don't mean necessarily Rich Sutton's own interpretation of his great essay. I just mean like it is true that there is this kind of observed phenomenon where, you know, if you over engineer intelligence in AI, you regret it because somebody figures out a more general and maybe potentially simpler method that scales better. And a lot of the hard-coded decisions you made are things you end up regretting. So I think it's fair to assume that like models will get better and algorithms will get better and a lot of that stuff will improve. And then the question we really ask is, well, you know, intelligence is great, but what problems are you actually trying to solve? Like what is the, you know, what is the application that you want to improve?
Starting point is 00:13:28 Are you trying to, I don't know, approach helping doctors do medicine? Are you trying to improve doing certain types of research, you know, cure cancer maybe? Are you trying to build the next Codex or Cursor or, you know, one of these types of coding applications? Are you building, you know, so the question is like, what are you actually trying to solve? And I would argue that intelligence is this really amazing, powerful concept, but precisely because it's a foundation for a lot of applications. And sort of the analogy I like to draw here is improvements in chip manufacturing and increasing numbers of transistors in, in sort of, CPUs. Nobody thinks that more general purpose and more powerful general purpose computers
Starting point is 00:14:07 make software obsolete or make us forget about systems. The thing you think about is they make software possible, but you kind of need to have a stack. So back to your question, what should the stack look like? I think the first thing we need to agree on is like, what is the language, so to speak? What is the medium of encoding of intent and of structure with which we can specify our systems to that computational substrate?
Starting point is 00:14:20 So could we approach exactly this question? I have a line of inquiry on exactly this question, which is like, what is the right language to specify? So I'd love for you to tell me why this is the wrong approach.
Starting point is 00:14:42 So let's assume I'm an advocate for the God model, right? So models just keep getting better. There's one model that keeps getting better. Let's say that my task is software. And I want to build, do you know the game Core Wars? What is it called? Core War. Okay, this is a very old hacker game from like the 1970s where, like, you would write software
Starting point is 00:15:03 and that would try to kill each other. So let's say here, so I want to build an online multiplayer version of Core War. So what is wrong with the following approach? I have a prompt that says, I want to build a multiplayer version of Core War that's online, and that's my prompt, and then I just sit and I just wait for models to get better.
Starting point is 00:15:23 Why is that not the right approach to this? Wait, so actually, so something about what you said is great. What you said is, you just said it, you express the thing you want, and you were so lucky that the thing you wanted, was easy enough to express. Like, you know, you're assuming that the speaker, you know, in this abstract hypothetical scenario is being honest,
Starting point is 00:15:41 what they want is to build that particular software you mentioned. They fully specified it in a single sentence, and they're not doing anything else. They're waiting for the models to get better. The only issue I have with this, by the way, is that, well, I don't know how long you're going to wait, but if you're comfortable with it, that actually, that's endorsed by me. The problem is, as you're probably actually trying to hint at maybe,
Starting point is 00:15:59 is, well, most things people want, especially most things that don't exist yet, there is no five-word statement that even the best intelligence in the world is going to do for you. And there is really a nontrivial amount of an alignment-ish element. It's such a loaded, it's such an important statement for this discussion, which is, like, there is no, say, simple way to describe what you want. There's multiple ways to interpret why that is. One of them is, well, yeah, one of them is, I don't know what I want. Another is my wants are complex, so I want to use a lot of words.
Starting point is 00:16:36 And another one is there's actually fundamental tradeoffs. So like my wants would be ambiguous. Are you talking about all three of those or? Yeah, I'm talking about all three. If when it comes to like actually getting people to express what they want from a system, I mean, the kind of the premise I start from is people want systems. Nobody wants intelligence, period. I think this is a great, this is such a great point.
Starting point is 00:16:58 I actually really like how you said that. I hadn't thought about it that way. I don't want better GPUs, right? I want maybe a neural network. I want something else, right? And that's something else is always specific. Or at least more specific. And so the question is,
Starting point is 00:17:10 what is the number of things people can want? So if the vision of AGI, and by the way, this, like, the reason I said, like, I'm pretty, I'm not necessarily super pessimistic in practice is the frontier AI labs have been, like, they kind of tackle them one at a time, but they've had enough of a track record for me
Starting point is 00:17:25 to believe that, like, when they reach a bottleneck, they kind of go to the next thing and then unblock themselves. Sure. Right. So that's great. But at some point,
Starting point is 00:17:34 like, there's a view of AGI, which is GPT-3, the original GPT-3, but scaled one million times or one billion times, and you get, you know, a GPT-10.
Starting point is 00:17:44 And there's that GPT-10 that you go to, and in order to build, you know, a complex system, sorry, in order to not build a system, right, in order to just treat it
Starting point is 00:17:52 as the end user-facing system. Every time you go, you juggle your context and you juggle your prompting, which might, you know, maybe because the model is so good, it might not be, the prompting part might not be that hard. And you like, you ask it from scratch every time or some ridiculous thing like that. And I think in the grand scheme of things, people are slowly realizing, obviously that's not what you want. And so this is the argument for systems is that it's just all of this
Starting point is 00:18:14 decision making that happens in making a concrete application or product or system that encodes taste and knowledge about the world, and also knowledge about human preferences, or some substrate of the complete story that you want. And it kind of systematizes it, encodes it, makes it sort of maintainable and portable and modular, because that's all the stuff that we like to have in building systems. And the moment you start thinking that way, you don't want that to be like a blur, like a string blob. So let's, I mean, I don't want to get too philosophical,
Starting point is 00:18:47 but for me, this always begs this very interesting question. So let's just take what you're saying at face value, which is I have a lot of complex wants, and those shift over time. And so, like, a string will never encapsulate it. And so, you know, I want to say a whole bunch of stuff and, like, maybe pull some context in. But it could be the case that these models are so powerful that I just start to abdicate want. Do you ever think about that? I'll just want less.
Starting point is 00:19:11 I'll be like, I want whatever the model gives me. Do you think that there's any direction in the future where, like, we just are less picky about our actual wants? And we do converge to, like, these high-level things? Or are you really convicted that? No, this is totally possible. I mean, recommendation algorithms versus like search algorithms. Recommendation algorithms are like, give me what I want. Yeah.
Starting point is 00:19:30 So literally, like, my universal prompt that I could just, like, I can just go to the beach. And every time there's like a new model, I just go use it on that new model, rather than building a complex system, would be: give me what I want right now. Right. And, you know, over time that model can train you like a recommendation feed, right? Like you just open the For You tab. Exactly. That's right.
Starting point is 00:19:51 What it gives you. But I mean, hope it doesn't, you know. I mean, but that requires such a fundamental. That's a choice we can make. And a different choice we can make is, well, actually, no, we do care about, you know, building systems and encoding knowledge into them. One thing that's been growing on me for a while is kind of, to make this slightly less philosophical, although maybe not much, you know, the idea in machine learning that, you know,
Starting point is 00:20:14 like, it's kind of a fundamental and old and known idea, but that there is no free lunch. And there's like a lot of interpretations of the same theorem. Like, the theorem is true, it's a mathematical statement, and it basically just says, if you assume nothing about the world, all learning algorithms you can build and all learners you build are equally bad, pretty much. And once you sort of understand the mathematical version of that, it's kind of, it's almost a, it's a really simple statement. And what that really means, and I think like it's a, it's something that comes up time and,
Starting point is 00:20:43 you know, time and time again, is that something fundamental about intelligence, as we call it, is actually about knowing our world and knowing what, because we're humans, what humans are like and what humans are interested in. And you know, like, you can't kind of scale your way into that. Now, if humans themselves change their preferences to be simpler, yeah, that's the future that's possible. I actually agree. I think, like, there are, like, sometimes we want real solutions to problems.
Starting point is 00:21:08 There are fundamental tradeoffs. We have to articulate those tradeoffs, right? I mean, the only, like, there is no simplified version of the answer, given what you want to accomplish. So let's assume we're in that world. So I can't go to the beach with my one prompt. Instead, I have to, like, describe it. So you've done work on DSPy, which I think is, in my opinion,
Starting point is 00:21:27 just the most systematic approach to making the prompt more powerful. So maybe you can describe DSPy and how it works and how it addresses this problem. Yeah. Very specifically, the problem is like, we've decided that, like, my one prompt and just waiting for the model to get better is not going to be sufficient for whatever reason. So now I need a better way to think about prompting. Yeah. So actually, back to your example, suppose that what you wanted to build was a bit more complex.
Starting point is 00:21:49 So there was more specification involved. But suppose also that you were in more of a rush, because, again, applications don't, like, they don't want to wait. Nobody, but I'm building a system. I want to use the best, you know, intelligence, so to speak, that exists now. But I do need to progress. I do need to proceed.
Starting point is 00:22:05 And so the question is, what are you going to do? And when I started, you know, one of the hardest things that makes communicating DSPy stuff difficult is we've been doing this, some version or another of this, for something like six years. And DSPy itself is three years old. Like, a lot of this was codified
Starting point is 00:22:20 before a lot of the changes in the field. So it kind of makes some of the conversations slightly trickier. But what people did for the longest time. So that's what they did in 2022, when people were thinking with early models. That's what they did in 2023. Only in 2024 did there start to be some slight change to this. But fundamentally, to this day,
Starting point is 00:22:37 the biggest hurdle in using a model is prompt engineering, which, basically, at least my understanding of it, and really I think the most canonical understanding of it, is changing the way in which you express what you want such that it evokes the model's capabilities in the right ways. And so this is less about, I would argue, things that are much more timeless and important. And it's really about the belief that there is, like, a slightly different wording of what you
Starting point is 00:23:04 ask that could get the model to behave a lot better. And the problem is that this is actually true. This is true for the latest models. This is why, you know, OpenAI and others, you know, and Anthropic and others, they release, like, even for the latest models, they release prompting guides and whatnot. And they say, well, you're not holding it right, right? You're not. And they're correct.
Starting point is 00:23:19 But for the most part, the argument that early DSPy was making was... How do you pronounce it? DSPy? Yeah, DSPy, like NumPy or... Oh, I love that. Oh, I always thought D-S-P-Y. The argument that we were making was the models keep getting better, but in any case, they keep changing.
Starting point is 00:23:37 And the thing you want to build changes a lot more slowly. I'm not saying it doesn't change, but there is actually a conceptual separation between what you were trying to build and LLMs and, you know, VLMs, vision-language models. Like, that space is basically separate. And so what if we could try to capture your intent in some kind of purer form, you know, and that intent has to go through language. That's why you're trying to do AI, is that there's some inherent underspecification and fuzziness. You're trying to defer some decision making. You know, I don't know how this function should exactly behave in every edge case, but please be reasonable, is what you're trying to communicate,
Starting point is 00:24:13 right, with these types of programs you're building. So, so DSPy says basically there is a number of ideas that you need, and you need them together, which is the thing that I think is a little trickier to a lot of people. You need these five things. There's five bets that DSPy makes, and you need them together and they need to be seamlessly composable. And actually, in order to get all five, you don't need five concepts. You basically fundamentally need one concept. So the idea is, we have Python, or we have programming languages. These programming languages encode a lot of things that are highly nontrivial. First of all, they have control flow. And control flow means that I can get modular pieces really easily because I can define separate functions. These separate functions and modules, the nice
Starting point is 00:24:51 thing about them is that they really give you a bunch of stuff. So they create like a notion of separation of concerns where contracts of different functions can be described without you knowing everything inside the function or caring about everything inside the function. If you trust that it was built properly, you could just invoke it and it does its job and then you can reason about how you can compose these things. But you can also compose algorithms over functions. Like I can have a, you know, a more general sort of, you know, processor or function or something that takes these functions and applies things on top of them that are sort of higher level of concerns. I can refer to variables and objects and mutate them or, you know, pass them around.
Starting point is 00:25:27 When I say, if this, then that, and I really mean it, I don't have to go to the model to reassure it that if it does listen to that if statement, because I really mean it, I will tip it a thousand dollars. You know, and one thing here is, this is a really limiting paradigm. Conventional programming is a really limiting paradigm. Why would we want to go back to it? And I think the answer is, like, all of the things I mentioned now, like all these symbolic benefits from a specification standpoint,
Starting point is 00:25:52 this is not about capabilities, are really hard to encode in natural language. You can reinvent them. You can tell the model, you know, if you see this, then do that. And the model might reasonably say, well, you know, he didn't actually really mean, like, 100% of the time. I think the reasonable thing this time is an exception, right? Well, you actually can't.
Starting point is 00:26:09 I mean, you actually can't do that with natural language, which is, without implicitly recreating a formal language. I mean, the most obvious version of this is ambiguity, right? So, the dog brought me the ball and I kicked it. That's fundamentally ambiguous. You don't know if I kicked the dog or if I kicked the ball. And both are totally reasonable depending on the person, right? And so at some level, English doesn't do the job.
Starting point is 00:26:34 Right. So, but programming languages, I agree, right? But programming languages are also really fundamentally limited in that you have to over-specify what you want. 100%. You kind of have to go above and beyond what you actually want because no ambiguity is allowed. And that forces you to think through things,
Starting point is 00:26:52 you know, maybe you don't even know how to do. Like, how do you write a function that generates good search queries or that, you know, plays a game well or, you know,
Starting point is 00:27:00 it's very difficult to do something. Yeah, I don't want to get too wonky here, because I know where you're going, and I just have to say this because it just helps frame this conversation. I mean, by the way, what you said on X,
Starting point is 00:27:08 which we're getting to, really kind of changed my brain, which is, so for imperative programming, that's absolutely the case, right? Which is, you need to know everything that possibly happens, right? Or if you don't know, you make it, you're making, like, the language is going to make a very fixed assumption. Yeah, it's going to make some basic assumption, right? So it's almost like if you're managing a state machine, you've got to know every state machine transition. That's imperative language. Declarative languages are quite different, right?
Starting point is 00:27:32 So in declarative languages, you actually specify what you want formally, right? And then the system kind of figures out how to get to that end state. Right? But the problem is you have to be able to specify every aspect of that end state perfectly, which again, like for some problems is very complex. So that's also limited and you just have to know the end state. Yeah. So now, you know, you're working on DSPy.
Starting point is 00:27:59 And I would love, you know, for you to talk about how using LLMs with a bit more formalism pushes it to, like, yet another level. Right. So the only new abstraction in DSPy, and it's incredibly simple, is just this notion of signatures. It's just borrowed from the word for function signatures. Our most fundamental idea, which is just so basic and simple, is that interactions with language models in order to build AI software should decompose the role of ambiguity,
Starting point is 00:28:28 should isolate ambiguity into functions. And what do you want to specify a function? Like, how do you declaratively specify a function? I think the first, like, the most fundamental thing is it takes a bunch of objects. They better be typed and, you know, like, they better take, you know, they better have, like, interesting and meaningful names. It does a transformation to them and you get back some structured object, you know, potentially carrying multiple pieces. And when you do this, it's your job to, it's your job. And this is not easy, but it is your job to describe exactly what you want without thinking particularly about the specific model or compositions you're thinking of.
Starting point is 00:29:06 And this is actually a lot harder than it sounds to most people. So for example, you would not, you know, so there is a class of, there is a class of problems for which some people actually write prompts that are almost signatures. So these are cases where you only have one input. Your output is just a response. You're not trying to, like, you know, like, you basically take a chatbot, you know, because the APIs, or the models, are usually structured such that this is a very natural use case.
Starting point is 00:29:28 And people like, they try to prompt minimally, right? So they don't encode a lot of, you know, they don't say, I don't know, think step by step or you're an agent that's supposed to do this or they just kind of, just say what they want. So there's a class of people that almost implicitly write signatures. But there's something wrong with the fundamental shape
Starting point is 00:29:44 of the API that usually exists. And so signatures are just saying here is a better shape and we made every decision here slightly more carefully. Now, once you have signatures, every other part of DSPy from an abstraction standpoint
Starting point is 00:29:54 falls off of it. There's really nothing else. Once you have signatures, you could ask, I have a function declaration. It's just declaring a function. It doesn't do anything. One of the hardest things about
Starting point is 00:30:03 people wrapping their head around DSPy and signatures is that a signature does absolutely nothing. And it's entirely their job to build it. We actually can't help them at all build the signatures. A lot of the time, people are like, well, couldn't you generate the signature from this or that? The signature is encoding your intent. I know nothing about your intent up front. That's the whole point. Wait, what are these? To be very clear, what are the signatures written in?
Starting point is 00:30:21 I mean, fundamentally, it could be a drag-and-drop thing. It could be a, but usually, like, it could be whatever. But the point is, it is a Python class, usually. It's Python. It's formal. It's formal. It's not English, right? It's, well, it's a formal structure in which almost every piece is a fuzzy English-based description. So you could say something like, I want a signature that takes a list of documents, and the list of documents is a typed object. You could actually say list of document,
Starting point is 00:30:46 and you have to define what the type document means. And the fact that this type is document, maybe the name matters. Like, a list of documents is not necessarily the same as a list of images, right? There are different things, and they're like semantically and fuzzily different. And basically, like, it says in English,
Starting point is 00:31:02 given these inputs, you have several of them. I want to get these outputs and you have several of them and maybe the order matters. So it's really just like, I argue, it's what a prompt is supposed to be or what a prompt wants to be when it grows up. It really is just a cleaner prompt. Now, if you grant me that, which I argue, like a really small,
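To make that concrete, here is a minimal sketch of what such a signature could look like, assuming the open-source dspy Python package; the class name, field names, and descriptions below are illustrative, not taken from the episode:

```python
import dspy

# A hypothetical signature: typed, named inputs and outputs, where the short
# English descriptions carry the fuzzy part of the intent.
class AnswerFromDocs(dspy.Signature):
    """Answer the question using only the provided documents."""

    documents: list[str] = dspy.InputField(desc="retrieved passages")
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc="a short answer grounded in the documents")
```

Declaring the class does nothing by itself, which matches the point above: a signature only encodes intent, and something else has to execute it.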
Starting point is 00:31:18 it's a very simple contribution. There is really not a lot of richness to this, but that's the point. You get everything else that makes programming great while being able to build really powerful AI systems because you can now isolate your ambiguity at the right joints. You have a notion of where you want the joints to be. And the rest of your systems, the rest of your programs can be very modular. You can have multiple signatures.
Starting point is 00:31:37 So now you get what people call multi-agent systems. Multi-agent systems are just AI programs in which you have multiple functions. You know, it's not really that. It's really nothing. It's not really a complicated idea once you take this. You get things like inference-time strategies. People are like, chain of thought, you know, you have to write your prompt in this way, or we have to train the model in a certain way, or ReAct agents or, you know,
Starting point is 00:31:55 program of thought. We recently released this thing called recursive language models. You know, the thing is, when you're solving a task, none of these inference strategies should be of your concern. It's, if and when you want to, you know, like, this is just a thing that should be compositional. And signatures have the shape such that, like, we can use programmatic sorts of constructs
Starting point is 00:32:13 to compose over these types of, you know, constructs. Do you, when you think of, you know, DSPy, when you were originally creating it and now as you think about it, do you think about it as something that will fundamentally only be consumed by humans or for humans? Not at all, no. I can imagine, I can imagine cases
Starting point is 00:32:34 where you bridge the gap. And the reason I ask is, there's this obvious question, which is, if the interface to LLMs is going to be all automated anyway, do we need to enforce these restrictions that are primarily to keep natural language speakers within certain boundaries? As opposed to, you know, like, whatever, if it's an agent calling it, we may not need to do that. So I think it's just, I think the argument in DSPy is intent should be expressed in its most natural form. So that's the declarative part. And the second part is, unfortunately, or fortunately, in the general case, that cannot be reduced below three forms. Some things are really best expressed as code. And no amount of automation can remove that. There's no amount of automation
Starting point is 00:33:15 that can remove the fact that I actually want to think about three pieces because they're separate to me and I want to maintain them separately. No amount of automation is going to remove the natural language piece. Nobody wants to write Python to describe a really complicated AI system like from scratch. And no amount of automation is going to remove the fact that for some classes of problems, you really need a more RL-like standpoint where you have a distribution of initial states or inputs and you have a way of judging them or like metadata about what correctness looks like because that really captures the wonky and sort of like exceptional long-tail
Starting point is 00:33:47 set of problems that actually vary by implementation or by model. Yeah, you may also just want diversity. Give me something that may solve this problem, right? Like, it may just be that, like, there is no formal specification. Yeah, totally. Right, right. So there is a machine learning piece. Like, people associate DSPy a lot with the one that is most different from what they usually see, which is optimization. So a lot of new users and a lot of people that look at the paradigm and try to critique it conceptually, they miss the fact that you have to have these three pieces, or, like, in the general case, you can't get away without any. Now, by the way,
Starting point is 00:34:20 there's a lot of applications where you do not need all three. If you're building yet another RAG app and the model has been post-trained to death to take a context and answer a question about it, you don't really need a lot of, you know, a lot of that to express your intent, because it's just so close to what the model is good at. Anyway, a lot of people associate DSPy with the third one, which is the data-based optimization. And actually, a lot of well-intending users would write overly simplistic and general programs and try to distill their intent through data or through kind of this process of trial and error. And that's a really, like, bad sort of, it's like a misuse of the power of the models and the power of the paradigm, because if you know what you want,
Starting point is 00:34:57 nothing can express it better than just you saying what you want. The data-based optimization is there to smooth the rough edges. It's for you not to have to maintain laundry lists of exceptions. I'll wrap this up quickly, Martin. The other part of DSPy is, the reason we built all of these abstractions, and we haven't been changing them, these abstractions are basically three years old, they've basically not been changing, and what we spend a lot of our research time on is building algorithms.
Starting point is 00:35:28 The thing about those algorithms is I'm not wedded to any of them. I rarely go out, I mean, we usually get excited about one for a month or something, but I rarely go out and get particularly excited about getting anyone to pick one of them over the others. We recently released an amazing genetic optimizer for prompts called GEPA. Before that, we had another one called SIMBA that was just a reflective method. And we had MIPRO before that. We have a lot of these algorithms, and they're really clever and cool. But the thing that I'm interested in is we build these algorithms to expire. As models get better, we can actually come up with better algorithms that, you know,
Starting point is 00:35:58 fit turning the abstractions into higher-quality systems. And what we want to happen over time is that our algorithms expire. We build better ones, but the abstractions that we promised and the systems that people expressed in those abstractions remain as unchanged as possible. So that's kind of like a, that's something that's kind of unusual to a lot of, a lot of sort of folks in the space. It may also help just to kind of pencil out where this sits in the software development lifecycle, right?
Starting point is 00:36:24 There's two places you could put it. You could just be like, I am writing my software, I want to know what's the best prompt to use, you know, and then you could use it there. Or you could be like, actually, the best prompt is determined at runtime, and so maybe you could invoke this, you know, actually. So do you have, is there a standard use?
Starting point is 00:36:44 Do you do this like basically before the software's deployed, or are there actual runtime uses where you're, you know, trying to find the right one? So the two sort of, like, concepts that exist in DSPy for this, and I don't know how technical we want this to be, but like, we have the notion of modules. This is borrowed directly from, like, neural network layers or PyTorch modules, which is just saying, once I have the shape of the input and the shape of the output, which is a signature, I can actually build a learnable piece that has some inherent structure, like what a machine learning person would call an inductive bias, and I want it to take that shape and implement it for me, but carry some parameters internally about what it could learn. So that's a module.
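As a rough illustration of modules, here is a minimal sketch, again assuming the open-source dspy package; the model name, string signatures, and program structure are illustrative, not from the episode:

```python
import dspy

# Hypothetical model configuration; any LM supported by dspy could be used here.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class DraftThenRefine(dspy.Module):
    """A tiny two-step program: each signature is wrapped by a module that
    chooses the inference-time strategy (chain of thought vs. plain predict)."""

    def __init__(self):
        super().__init__()
        self.draft = dspy.ChainOfThought("question -> draft_answer")
        self.refine = dspy.Predict("question, draft_answer -> final_answer")

    def forward(self, question):
        draft = self.draft(question=question).draft_answer
        return self.refine(question=question, draft_answer=draft)

program = DraftThenRefine()
# result = program(question="...")  # result.final_answer holds the output
```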
Starting point is 00:37:25 And a module is entirely an inference-time object in the sense that it modifies behavior when it's being invoked. So things like agents, and different types of agents, and code-based or tool-based agents, or chain-of-thought reasoning, all of these are inference-time strategies that are modules. And the aspect in DSPy here is that these must be decoupled from your signature. Your signature should know nothing about the
Starting point is 00:37:49 inference-time techniques that you're using. The other aspect of DSPy is optimizers, which are, again, they're just functions, like modules are just functions, but they're functions that take your whole program, like an actual complete piece of software that has potentially many pieces. And they think holistically, how do I use language models in order to get this thing to perform its intended goal, which might be maximizing a score on a test set, but in principle, it could just be like, do what the model understands from the instructions it should be doing. And this could be, people do this at inference time sometimes, in the sense that it happens while the user waits, so to speak, the user of a system. But it's a fundamentally different contract because it sees the whole, there's extra information.
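For a rough sense of what that contract looks like in code, here is a minimal sketch of handing a whole program, a small dataset, and a metric to an optimizer, assuming the open-source dspy package; the examples, metric, and optimizer settings are purely illustrative:

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # hypothetical model choice

# A tiny "whole program": one signature wrapped in a chain-of-thought module.
program = dspy.ChainOfThought("question -> answer")

def exact_match(example, prediction, trace=None):
    # A toy notion of reward: did the program hit the expected answer?
    return example.answer.lower() == prediction.answer.lower()

trainset = [
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
    # ... more examples covering the long tail ...
]

# The optimizer sees the program, the data distribution, and the metric, and
# rewrites prompts and demonstrations so the same signatures score better.
optimizer = dspy.MIPROv2(metric=exact_match, auto="light")
better_program = optimizer.compile(program, trainset=trainset)
```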
Starting point is 00:38:28 I see the whole system when I'm an optimizer. I don't just see, like, an isolated module. I can see sort of all of the pieces. I can see a data distribution. I can see the notion of reward. And so, like, I have a much richer space, because there's strictly more information than any inference-time technique, any LLM, sort of, is able to capture, just from an information-flow standpoint. You know, it's interesting because, I mean, a lot of people think of DSPy as basically a prompt
Starting point is 00:38:55 amplifier, which is, you know, here's my, whatever, my prompt template, tell me what prompt would be the best. But to hear you describe it, it's almost, you know, it's like this kind of, you know, declarative, you know, runtime-y type thing. Do you know what the standard... I don't even know if you have visibility into this stuff because it's an open source project, but do you know what the standard
Starting point is 00:39:22 uses are? Is it the naive use case, which is largely prompt optimization? Or are people actually using it in some more sophisticated ways? I think one reason where, okay, so I am very, I'm very loud about the abstractions. I talk about them all the time. I give talks. You scolded me on X about this. So, I was inexact. No, it was fantastic. It was great. I know. You really corrected me. Listen, I was one of the few people that really thought about it as the prompt optimizer. I really thought, listen, I'm going to write my prompt.
Starting point is 00:39:41 I'm going to do some template magic. I'm going to give it to DSPy. And then it's going to give me, like, what's the best thing for what I want to accomplish? And then I'll just go stick that in my program. That's the way that I thought about it until you made the point that it's actually more of a set of abstractions that will evolve with your program. So I tried to learn from what happened historically in computer science. Like, you had these machines and, you know, you got general purpose chips, and people were programming those directly in whatever language they spoke, right, machine code. And maybe you could abstract it slightly with assembly.
Starting point is 00:40:10 But then there was this amazing time where a lot of languages culminating maybe most popularly in C, but various others before it, got this idea that there's actually a general purpose programming model. You could build a model of a computer without thinking about any specific computer. And actually, that's a bit of an illusion because every specific computer is a lot more complicated.
Starting point is 00:40:28 And I know it's funny to view C as close to how humans think, but it really was a fundamental jump. Once you have C, it's important to ask, why do people use C instead of writing assembly?
Starting point is 00:40:42 And it's really weird to me that anyone would use C because it's faster than assembly, like, the code runs faster. So to me, when someone says they use DSPy because the quality of the system is higher, which by the way is very often the case, that's not really the answer, because you're jumping, in my opinion, to a higher-level abstraction, such that actually I would be willing to give up some speed
Starting point is 00:41:06 in order to have the portability, the maintainability, the closeness to how I wish to think about the system and how I want to manage its pieces. There is a tradeoff I'm willing to accept. Now, the reason people actually have universalized C and they don't regret it is this amazing compiler ecosystem, where people build all of these optimization algorithms and passes and sorts of infrastructure.
Starting point is 00:41:26 You know, you inline functions, so you break the modularity, right? People are writing modular code, but you're actually breaking a lot of that modularity when it's being turned into an executable artifact. You eliminate dead code. You have all these heuristics, different heuristics for different machines sometimes.
Starting point is 00:41:41 And so my vision here is, if AI software is a thing, and AI engineering is a thing that needs to exist irrespective of model capability because we want to have this diverse space of systems, what is the abstraction to capture it? And if natural language is too ambiguous to be the only, like, the complete specification of these systems, and it's too mushy and we kind of want to have more structure,
Starting point is 00:42:02 well, what would that structure look like? And if we know what that structure looks like, well, if we do it naively, you would actually lose a lot of quality. If we build DSPy poorly, you might have a really elegant program that sucks, right? When you use it with an LLM, it sucks.
Starting point is 00:42:16 So the reason I build optimizers or like we build optimizers as a team is not so much that I think people can't write prompts and I want to write better prompts for them. What a boring reason. I don't care about that. People can write prompts. People can iterate on prompts.
Starting point is 00:42:28 That's not an issue. The thing I'm trying to say is I want them to express their intent that is less model specific and not worry that they're losing or leaving a lot on the table. Yeah.
Starting point is 00:42:38 Yeah. Honestly, this is where, like, you changed the way that I think about this whole thing. And so I'm going to try something,
Starting point is 00:42:43 and I alluded to this previously in our talk, but I want to try it again on this, because this is kind of how it changed my thinking. You can tell me where I'm right or I'm wrong, which is, so you said assembly and C, but we've actually had a lot of paradigm shifts since then. So let's,
Starting point is 00:42:55 so C, let's just say this is kind of like an imperative language where, like, for every event that happens, you have to know how to handle it. Right. Right. So traditionally in distributed systems, like imperative,
Starting point is 00:43:06 imperatives have not been a good approach, because you could have some event, you know, show up at a node, and then you don't know what state the node is in. And so you, I mean, it's just, the state space is so huge. And so you had actually a big abstraction shift with declarative languages, to be like, okay, listen, we're going to tell you what the end state of the system is, and then the system will figure out all of the state transitions to get there, right? This was a higher level of abstraction, for people not to have to worry about everything that kind of comes in and every event. And you can actually declare, this is like Datalog or something, you're like, here are all of the conditions that exist. And then I just want to make sure that the
Starting point is 00:43:48 system is always in that state. And then the thing you kind of give up in that case is like, you can't bound the amount of computation needed. Like you don't know how long it's going to take to get there, but it'll always get you to like that state. So it's easier for a programmer. So like you can actually now build programs easier for certain classes of programs. Now when I look at DSPY, I feel like it's the same type of leap between like imperative and declarative, but for LLMs where there's certain declarative, like you can't write a declarative program that's going to solve the same problems that an LLM can because there's no fuzzy this and that and you can't really integrate them, right? And so like,
Starting point is 00:44:24 you want the same type of shift, because you've got a new problem domain. And so DSPy kind of gives you that with LLMs. So you can kind of formally specify it in a way that's kind of natural but also safe, and then it decouples you from the actual implementation below it. So is that a fair way to think about it, or is that just a Martin-ism? No, I think that's a fair way to think about it. And I think one funny thing is,
Starting point is 00:44:46 and I think you'd probably agree with this, I don't know that declarative is better or imperative is better per se. It's more that. It depends on the problem, right? Like, declarative is better for ones where, like, you've got a very complex system with a lot of asynchronous events, because you don't need to maintain a state machine. Yeah.
Starting point is 00:45:01 You know, all of these things have tradeoffs. All of them do, right? You wouldn't use an LLM to, like, add two numbers. Right. And so I think, like, a really good shape for this is you want an imperative shell. DSPy, actually, compared to, there's a lot of sort of folks that create graph abstractions or whatever, like, things that are very declarative. I'm willing to go on the record saying graph abstractions generally are a bad idea, in my opinion, for basically anything in computer science.
Starting point is 00:45:26 But go ahead. Right. And exactly. And I think it's because humans, when we think top down, we actually think imperatively. And so, like, DSPy is just Python, which is a, you know, I mean, it's a complicated language, but it's fairly imperative in that you're just like, you do this, you do this, you do this.
Starting point is 00:45:40 But at the leaves, where you were going to potentially have a fuzzy task, what were you going to do? I think you were going to write a prompt. I think the issue with prompts is, fundamentally, they're actually so declarative, they're too declarative, that you're forced to break the contract of declarativeness, because you're like, well, if I just say what I want, the model is never going to be able to fit into my bigger program.
Starting point is 00:46:01 You know, one reason, by the way, people forget this, is if you work with a chatbot that is tuned for human responses, you're doing most of the work that DSPy has to do in a program. In a program, if I have a function and I want to give it inputs and I want to get back outputs, those have to actually go into variables. You know, the output has to, so to speak, parse in a certain way, and I have to funnel things through this. If you're a human who's just asking the model questions, no matter what form it gives you, you're smart enough to, you know, bridge the fuzziness in the shape of whatever the model gives you.
Starting point is 00:46:24 And I have to funnel things through this. If you're a human who's just asking the model questions, no matter what forming gives you, you're smart enough to be able to like, you know, bridge the fuzziness in how the shape of the model. It's almost like the imperative is I know every step to the solution, so do every step, right? And declarative is I actually don't know all of the steps,
Starting point is 00:46:46 but I know the solution, so give me the solution. And, like, DSPy is almost like, I kind of know how to frame the solution, you do the rest of the work. It's this kind of fuzzier middle. Right. And I just, listen, there's tradeoffs to each, right? I mean, like, with DSPy, you have to, like, whatever, have the overhead of a SOTA model, which, whatever, took a billion dollars to train and is expensive at inference.
Starting point is 00:47:01 So these are just different points in the declarative space, the performance space, the cost space, etc. By the way, I'm totally bought in. I actually think this is such a nice way to frame it. Even independent of DSPy, like, just the core framing, this is how you should think about interacting with LLMs formally. I really think that you've kind of nailed that abstraction.
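To make the "imperative shell, declarative leaves" idea concrete, here is a minimal sketch in DSPy's own language, Python. Everything named here is illustrative rather than taken from the conversation: the model string is just an example, my_search is a hypothetical retriever, and exact API details may differ across DSPy versions. The control flow is ordinary imperative Python, while the fuzzy leaf task is declared as a natural-language signature instead of a hand-written prompt.

```python
import dspy

# Point DSPy at some language model; the model name here is only an example.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# The "declarative" leaf: a signature states the fuzzy task in natural language,
# with typed inputs and outputs, instead of a hand-tuned prompt string.
class GenerateAnswer(dspy.Signature):
    """Answer the question using the provided context."""
    context: str = dspy.InputField()
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

def my_search(question: str) -> str:
    # Hypothetical retriever; swap in whatever index or web search you already have.
    return "...retrieved passages..."

# The "imperative shell": ordinary Python control flow around the fuzzy leaves.
class RAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question: str):
        context = my_search(question)
        return self.generate(context=context, question=question)

prediction = RAG()(question="Who created DSPy?")
print(prediction.answer)
```

The signature is what absorbs the work a human does implicitly when chatting with a model: formatting the request, parsing the reply into the answer field, and funneling it into the surrounding program.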
Starting point is 00:47:22 So let's just take that as a given. So what are the hard problems now, or what are the next set of problems, to make that more pragmatic, like the optimizations under the covers, or whatever you need to do? Yeah. So everything we talked about today, I do almost nothing on this anymore, because this is work we did three years ago and I'm just out there telling people about it. You know, we're not changing these abstractions. What we actually do is the following set of
Starting point is 00:47:50 questions. We ask: right, someone wrote the program, and we assume they did a reasonable job describing what they want. And maybe that means they wrote the control flow, they have the signatures, and they have some data. These are the three pieces that you might want to have. Or they have some, not all, of these. How do we actually do a good job at optimizing this? So, you know, it's actually a really interesting progression to see how we went from the very early optimizers in 2022 to the latest ones. The very early ones had to work with models that basically didn't work, right? They had essentially no instruction-following capability and were hit-and-miss for their tasks. So what we did looks like what,
Starting point is 00:48:26 you know, the reinforcement learning people do on LLMs, which is: you take the program and you do what we call bootstrapping examples, which is just another way of saying, you just sample, you just run the program, maybe with high temperature or something, a lot of times, you see which things actually work, and you keep traces of all of these over time. And then those traces, you know, which are generated by the model, they can become few-shot examples. And if you just do that, sometimes it improves a lot, sometimes it becomes a lot worse. So you just do some kind of discrete search on top to find which ones actually improve on average. That was when models were really bad.
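As a rough sketch of the bootstrapping loop described here: DSPy exposes optimizers along these lines, and the snippet below uses BootstrapFewShot with a toy exact-match metric and the hypothetical RAG program from the earlier sketch. The specific arguments are assumptions and may differ across DSPy versions.

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# The metric decides which sampled traces count as "actually working".
def exact_match(example, prediction, trace=None):
    return prediction.answer.strip().lower() == example.answer.strip().lower()

# A handful of labeled examples; inputs are marked so DSPy knows what to feed in.
trainset = [
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
    # ... more examples
]

# Run the program on the trainset, keep the traces that pass the metric,
# and attach the surviving traces as few-shot demonstrations.
optimizer = BootstrapFewShot(metric=exact_match, max_bootstrapped_demos=4)
compiled_rag = optimizer.compile(RAG(), trainset=trainset)  # RAG from the earlier sketch
```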
Starting point is 00:48:59 As models have been getting better, we've moved a lot, basically all the way, to reflective prompt optimization methods where, like, you actually go to the model and you're like: here is my program, here are all the pieces, here is what this language means. Here are the initial instructions I came up with from, like, just the declarative signatures. By the way, here are some rollouts generated from this program, and here is how well they perform. Let's debug this. Let's kind of iterate on the system to debug this. And obviously, there's a lot of scaffolding to make sure that search is actually, like, a formal algorithm that is going to lead to improvement. But increasingly, more and more of it is actually carried out by the models. One thing we also do a lot of is we ask, all right, conventional policy gradient reinforcement learning methods like GRPO: nothing about them cannot be applied to a DSPy program, because the DSPy program says nothing about how the optimization should happen. So actually, for a very long time, since February of 2023, you could run offline RL.
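A reflective or instruction-proposing optimizer plugs into the same seam: hand it the program, a metric, and some data, and it iterates on the prompts underneath while the program itself stays fixed. A minimal sketch, assuming a recent DSPy release where MIPROv2 is available (the auto setting and compile arguments vary by version); the RL-based optimizers mentioned here would be swapped in the same way, by replacing the optimizer object rather than rewriting the program.

```python
import dspy

# Same program, metric, and trainset as in the earlier sketches; only the optimizer changes.
optimizer = dspy.MIPROv2(metric=exact_match, auto="light")

# The optimizer proposes candidate instructions and demos for each predictor,
# evaluates rollouts against the metric, and keeps what improves on average.
optimized_rag = optimizer.compile(RAG(), trainset=trainset)

optimized_rag(question="Who created DSPy?")
```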
Starting point is 00:49:51 And since May of 2025, you can run online RL or GRPO on any DSPy program that you write. People think that it's limited to prompt optimization, but I think the only notion in DSPy that is fundamentally about prompts is that natural language is an irreducible part of your program, and that prompt is human-facing. It's how you say what you want. How it gets turned into the artifact may well use reinforcement learning with gradients or natural language sorts of learning. So we spend a lot of time on optimization. We also spend a lot of time on inference techniques. Like, you just declared that you want your signature, which processes lists of books. Well, guess what? No model has long enough context to work with lists of books. So last week, my PhD student Alex and I released this idea called recursive language models, which sort of takes any model that is good enough and figures out a structure in which it can handle, you know, or scale to essentially unbounded lengths of context, and we were able to push it to 10 million tokens and see essentially no degradation. And the reason we build these types of algorithms is we really want to back your signature
Starting point is 00:50:56 by whatever it takes to sort of bridge the gap between whatever the current capability limit of the model is and the intent you specified. And the last thing we think a lot about is, well, we've made this argument conceptually, and sort of tried to demonstrate it empirically, that you need these three irreducible pieces, you know, signatures in natural language, structured control flow, and data, to fully specify your intent, at least ergonomically enough. The question, though, is that this is a very large space of programming, where, like, you need to figure out, okay, I have a concrete problem, how do I map it into these pieces, knowing that maybe I need all of them. And so we spend a lot of time, and this is why it's a big open source project, because we want to see what people actually build and learn from that what the AI software engineering practices are that we should encourage and support. So these are the types of questions we think about.
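The recursive-language-models work is only gestured at in the conversation, so the snippet below is not that method; it is just a toy sketch of the recursion idea it points to, where a context too long for one call is split, each piece is reduced to notes by a sub-call, and the loop repeats on the notes (assuming the notes come back much shorter than the raw context). All names and the character-based budget are illustrative.

```python
import dspy

MAX_CHARS = 8_000  # crude stand-in for a real per-call token budget

# String signatures: natural-language field names stand in for prompts.
take_notes = dspy.Predict("question, context -> notes")
answer_q = dspy.Predict("question, context -> answer")

def reduce_context(question: str, context: str) -> str:
    """Recursively split an over-long context and reduce each piece to notes."""
    if len(context) <= MAX_CHARS:
        return take_notes(question=question, context=context).notes
    mid = len(context) // 2
    return (reduce_context(question, context[:mid]) + "\n" +
            reduce_context(question, context[mid:]))

def recursive_answer(question: str, context: str) -> str:
    # Keep reducing until the (hopefully shrinking) notes fit in a single call.
    while len(context) > MAX_CHARS:
        context = reduce_context(question, context)
    return answer_q(question=question, context=context).answer
```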
Starting point is 00:51:49 And I think one reason this has to have the structure of this, like, open source project, this large fuzzy space, is I don't want to be the only group, or, you know, a small number of teams, working on any of these pieces. I think it's a space where the more academics and researchers and people work on optimizers, all programs benefit. The more people work on modules, all programs benefit. The more people build better models, especially programmable models, whatever that might mean in the future, sort of models that understand that they're going to get used in this structure, everyone benefits. And it reminds me sort of of the way in which deep learning really took off,
Starting point is 00:52:26 which was: some people iterated on the architectures, some people iterated on the optimizers, you know, you got things like Adam and other methods. And I think that is what we're really trying to push a community towards. All right. So one last thing, just to kind of finish off, getting a little bit more philosophical, but, you know, I think AI allows us to do this. What you're addressing is, again, the ability to declare intent, you know, for these models in a way that hits the abstraction right. If you can guess prophetically: in the future, are these models going to have independent agency, like agents, or is it going to be humans guiding them?
Starting point is 00:53:07 Do you have any opinion on, like, the direction this goes? I asked this question a bit earlier, but I kind of want to ask it a bit more directly. Do you think that the need for a human to declare things formally is going to go away, and over time, like, we treat these like grad students or whatever, and, you know, this all just becomes the inner workings of an agent? Or do you think that these things are formal software systems, this is a language like any other language, and, like, we will expect DSPy to be the interface?
Starting point is 00:53:37 Something like it, something like that, will need to be an interface that's exposed to humans, you know, for the foreseeable future. I think you need some amount of grounding in the world when you build these systems. Like, I just think people in AI talk a lot about, we can talk about AGI, but it's this kind of just ethereal intelligence, it just is so smart. But the problem is, you know, the intelligence that we care about, as far as I can tell, is really almost about the things you might want to ask, or the way things are in the world.
Starting point is 00:54:08 It's very world-oriented. It's not really, you know, this very abstract thing. So as models get smarter and smarter, I imagine that a lot of the problems people write programs for today could get a lot simpler, because that use case, it's kind of like RISC versus CISC architectures. With CPUs, you know, if you believe in sort of complex instruction sets,
Starting point is 00:54:32 it's possible that, like, you had to jump through all these hoops to do a fast square root before, but then somebody just gives you an instruction for that. Like, models can keep absorbing, with keywords or, like, in their language, more use cases. But philosophically, as you say, the human condition is that we will just want more complex things. And once you want these complex things, you know, in a repeatable way, you've got to build a system. And if you want to build a system, I don't really see that not having a structure. Like, I don't see that having the structure of LLM APIs today.
Starting point is 00:55:06 I see it as maybe something nicer, faster, say, on top of that. Maybe I could ask you a pointed question. You've got grad students, right? Yeah. Do you ever wish that you had a DSPy interface to them? Right? Like, in the limit, it's a very structured way to make asks, right?
Starting point is 00:55:24 And if not, wouldn't that argue that you wouldn't want that for an LLM in the limit either? So it's a gotcha question, but I actually mean it seriously. Is it that humans just aren't capable of doing this stuff, and so that's the reason we don't have formalism? Or is it just totally different? I promise this is an actual answer to your question about grad students. But here's the answer. The answer is, the question sounds to me like: don't you have chairs at home?
Starting point is 00:55:49 Don't you wish that they all looked like tables? I need both. I really want to have both. There's a software system, there's a grad student, they're totally different things. And there's nothing saying that AI that operates as a chatbot, as an agent, as an employee-like agent, is a problem. I think we need it.
Starting point is 00:56:07 I love it. That's wonderful. That's wonderful. So sometimes you want to specify something to a machine that has an LLM, and sometimes you just want to talk to something. Those are two different solutions to that. I love that. This is a great answer.
Starting point is 00:56:20 It's a great way to end this. Thank you so much for your time. This has been fantastic. Thank you, Martin. Thanks for listening to the A16Z podcast. If you enjoyed the episode, let us know by leaving a review at ratethispodcast.com slash A16Z. We've got more great conversations coming your way.
Starting point is 00:56:37 See you next time. As a reminder, the content here is for informational purposes only. It should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see A16Z.com forward slash disclosures.
