The a16z Show - How Foundation Models Evolved: A PhD Journey Through AI's Breakthrough Era
Episode Date: January 16, 2026
The Stanford PhD who built DSPy thought he was just creating better prompts, until he realized he'd accidentally invented a new paradigm that makes LLMs actually programmable. While everyone obsesses over whether LLMs will get us to AGI, Omar Khattab is solving a more urgent problem: the gap between what you want AI to do and your ability to tell it, the absence of a real programming language for intent. He argues the entire field has been approaching this backwards, treating natural language prompts as the interface when we actually need something between imperative code and pure English, and the implications could determine whether AI systems remain unpredictable black boxes or become the reliable infrastructure layer everyone's betting on.
Follow Omar Khattab on X: https://x.com/lateinteraction
Follow Martin Casado on X: https://x.com/martin_casado
Follow our host: https://twitter.com/eriktorenberg
Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
Transcript
Nobody wants intelligence, period.
I want something else, right?
And that something else is always specific, or at least more specific.
There is this kind of observed phenomenon where if you over-engineer intelligence,
you regret it because somebody figures out a more general and maybe potentially simpler method
that scales better.
And a lot of the hard-coded decisions you made are things you end up regretting.
So I think it's fair to assume that, like, models will get better and algorithms will get better,
and a lot of that stuff will improve.
Then the question we really ask is, intelligence is great, but what problems are you actually trying to solve?
That idea that scaling model parameters and scaling just pre-training data is all you need
exists nowhere anymore.
Nobody thinks that.
Actually, people deny they ever thought that at this point.
Now you see these massively human-designed and very carefully constructed pipelines for post-training,
where we really encode a lot of the things we want to do.
You see massive emphasis on retrieval and web search and tool use and agent training.
There is clearly a sense in which the labs have already recognized that the
old playbook doesn't work. The question is, is that actually sufficient for making the best use
and the most use of these language models? It's not a problem of capabilities. The problem is
actually that we don't necessarily just need models. We want systems. The conventional wisdom says
we're racing toward AGI by making language models bigger and bigger. But what if the entire framing
is wrong? On today's episode, you'll hear from a16z general partner Martin Casado and guest
Omar Khattab, assistant professor at MIT and creator of DSPy.
Omar doesn't think we need artificial general intelligence.
He thinks we need artificial programmable intelligence,
and the difference matters more than you think.
Here's the paradox.
Khattab has built one of the most widely used frameworks for working with LLMs,
DSPy, but he's skeptical that raw model capabilities will solve our problems.
While others obsess over scaling laws and parameter counts,
he's asking a more fundamental question.
Even if models become infinitely scalable, infinitely capable,
how do humans actually specify what they want?
Natural language is too ambiguous.
Code is too rigid.
We need something in between,
a new abstraction layer that lets us declare intent
without drowning in implementation details.
Think of it as the jump from assembly to C,
but for AI systems.
The stakes are higher than prompt engineering.
This is about whether AI becomes a programmable tool
we can reason about and compose,
or just an inscrutable oracle we prompt and pray.
We get into the three irreducible pieces of an AI system,
why the God model
is a dead end, and what it actually means to build software when intelligence is cheap
but the specification is hard.
Well, listen, Omar, it's great to have you, and congratulations on everything.
Just so for everybody that's listening, Omar is doing, in my opinion, some of the more
interesting technical work in building frameworks around LLMs and models.
And a lot of this has consequences on things like, you know, AGI and capabilities and everything
else.
And a lot of your comments on social media to me have been kind of some of the most insightful.
So I've been really looking forward to having you on the podcast.
Thank you for having me, Martin.
I've enjoyed many of your chats as well.
Awesome.
So listen, maybe let's just start with your background, you know,
since we have some shared roots and then we'll go from there to a general conversation.
So, I mean, I'm now an assistant professor at MIT.
I started a few months ago in electrical engineering and computer science, and I'm part of CSAIL.
I did my PhD at Stanford where I think the timing was really interesting.
I started in 2019 and I graduated about
a year ago. That timing was really great because foundation models as a concept (it didn't even
necessarily have that name, we hadn't coined it at Stanford yet) was starting to take shape.
You know, BERT had been around for about a year at the time. But people hadn't really
figured out how to make them work or, I would say as importantly, how to make use of them to build
different types of systems and applications, which is basically what I did throughout my whole PhD.
So, I mean, you're, I presume, the primary person behind DSPy. Is that correct?
You could say that, yeah.
Yeah, yeah.
So for those of you who don't know, DSPy is a widely used,
I would say, open source project around
prompt optimization for LLMs.
So maybe let's just go ahead and start.
You have tweeted, you know, about, you know,
whether LLMs will get to AGI or not.
I know it's kind of a very fluffy,
high-level place to start,
but we'd love your thoughts
on, you know, are we headed towards AGI in the near term?
Is this an apt goal?
Like, where do you land?
And it is particularly timely right now,
given the conversation that Andrej Karpathy just had on the Dwarkesh podcast,
where he was like, well, you know, maybe 10 years if you're optimistic.
Where do you weigh in on this debate?
So, I mean, I think honestly it's a surprising position
because I feel like, I'm not sure, but I'm less, sort of,
say, bearish than Karpathy necessarily.
Oh, you are?
Oh, you are.
You're less bearish than Karpathy on AGI.
Right, which is very strange to you.
But let me tell you what I think.
So back when I started my PhD,
like basically you can look at a lot of sort of the work that we've done,
you know, with my advisors and collaborators and others
over the past six years or so as pushing back on this perspective
that scaling model size and maybe doing a little bit more pre-training,
and, you know, especially at the time, it really was about model size.
And just sort of doing more uniform scaling of that nature is just going to solve all of your problems.
And the pushback has, you know, two sides.
One side is this is an incredibly inefficient way to build capabilities that you care about.
If you know what you want, just waiting for everything to emerge is just incredibly inefficient.
And the diminishing returns just speak for themselves.
The other problem is really a problem of specification or like abstractions.
Scaling language models makes this, I think, unrealistic bet that anything people want to build with these models is just a few keywords away or a few words away,
and that people know how to actually think of what these words should be.
I think it's an incredibly limiting abstraction.
But the reason I'm less bearish than maybe Karpathy sounded,
although again, I'm not really sure, is, you know,
I mean, I think we're seeing very rapid,
I would say, improvement in the perspective that we see out of the frontier labs.
Like, that idea that scaling model parameters and scaling, you know,
just pre-training data is all you need exists nowhere anymore.
Nobody thinks that. Actually, people deny they ever thought that at this point. And now you see these
massively human-designed and very carefully sort of constructed pipelines for post-training,
where, like, we really encode a lot of the things we want to do. You see massive emphasis on
retrieval and web search and tool use and sort of agent training. And you see all of this
emphasis on, you know, OpenAI, their latest thing was building this agent builder.
And they have products like Codex and others. So there is clearly a sense in which the labs have
already recognized that the old playbook doesn't work, or at least
is not, like, complete. And so if by AGI we just mean this thing that, you know, for a very large
set of problems, you can ask it sort of problems and, as long as you give it enough context,
it's able to handle them, you know, the models are increasingly powerful and reliable.
The question is, is that actually sufficient for making the best use and the most use of these
language models? And I think that's where my fundamental pushback doesn't go anywhere.
because I think the problem is just,
it's not a problem of capabilities.
It's a problem of actually we don't
necessarily just need models. We want systems.
And I can speak a lot more about that.
Yeah, it's actually one of, yeah, so
it's just a little bit.
So there is a view of the world that, like,
some variant of the, you know,
transformer architecture is going to get us there.
And then the end of that argument
kind of suggests that, you know,
you put all the data into one model and you have one model that will just become, you know,
so good, because scaling laws hold, that it solves all of reasoning, right? That's kind of the
absolutist end of the argument. I think nobody believes that anymore anyway. I think people do in
video, maybe not in LLMs, but in video, I think a lot of people are like, listen, there will be
one video model that you put everything in. It does everything. It does 3D. It does physics. It does whatever.
So maybe with LLMs, you know, people don't believe that anymore because they've been around long enough to suggest it's not true.
There's another view which is like LLMs are totally a dead end.
You know, Karpathy called them ghosts, which I thought was so beautiful,
which is, you know, they can kind of, you know, do some sort of linear interpolation of stuff that they've heard in the past,
but, like, they can't do planning.
And so you need an entirely new architecture.
And you're saying that you're not in that camp of needing an entirely new architecture.
It depends, because I've been arguing for a different architecture for years,
but that different architecture is built around having these models.
No, no, 100%.
Yeah, that was the third one I was going to say.
So the first one is like one model rules them all.
The second one is this is the wrong path,
and there is no kind of system you could build with these models.
You know, you've got to do something totally different, right?
I would say, like, Yann LeCun would say that with JEPA or whatever.
Like, you need to do something fundamentally different.
And then you're in the third spot, which is you can build some sort of system with these models
and you can get to.
I mean, AGI is such a loose word, but you can actually get to what we're trying to achieve,
which is pretty generalized intelligence
to tackle any sort of problems.
Is that a fair characterization?
I think so.
I mean,
I think you could think of it as,
I think AGI is fairly irrelevant.
Like,
it's not the thing I'm interested in.
I'm interested,
I joke sometimes,
I'm interested in API
or artificial programmable intelligence.
And the reason I say this is,
why are we building AI?
Why are we seeking to build AGI?
I think we're not like,
you know,
and you can take a step back and ask,
you know,
well, maybe it's a scientific question
or maybe it's just, like, a dream people
have. But I think fundamentally it's, in my opinion, a way of improving and expanding the set of
software systems we can build, or just systems we can build in the world. And if you think about why
people build systems, software systems as an example, but really any engineering endeavor,
it's not really that we lack general intelligences. There are a billion
general intelligences out there. Like, there are a billion people. We build the systems because,
you know, we want them to be processes that are reliable, that are interpretable,
that are easy, you know, that we can iterate on, that are modular, that we can study, right?
So there are all these properties, that are scalable and efficient.
There is a reason we care about systems.
And that is not like, you know, it's not that we lack intelligence.
So the question that I think is most important is, how do we build programmable intelligences?
And I think the alignment folks get some of this right.
You know, you could have a very powerful model that doesn't listen to what you say.
And a lot of pre-trained models could be perceived that way.
they have a lot of latent capabilities, presumably.
And the question is, you know, could you make it do what you want?
But I think what alignment fails to do, at least as a general sort of way of thinking,
is it sort of omits to think about, well, what is actually the,
what is the shape of the intent that people want to communicate to these models?
How can I get people to actually express what it is that they want to happen?
And with that bottleneck being, you know, as narrow and tight as it is,
it's not a question of are the models capable enough or not.
So that's what I'm saying.
I might be even less bearish than Karpathy about whether the models will get so good such that,
given all the right context and the right instructions and the right tools and the right, they become.
I think this is very aligned.
Again, you know, not to refer to another discussion that's not on here,
but just in general,
you more take issue with the definition of AGI as being, like, the same thing as an animal or a human,
which is not actually particularly interesting,
seeing as we've got a bunch of animals and humans. But, like, we actually want smarter software systems.
Yeah.
And then you think, like, a systems-based approach to models is the right way to get there.
Is that fair enough?
So, like, it's not going to be one model.
It's going to be a system. Then can you maybe roughly sketch out what you think is the right way
to build a system to do that?
What are the components that are meaningful in this?
So I would say, like, the first inspiring sort of concept here, or like the starting point
for this conversation is, look, to be honest, I have no idea what the capabilities of the models,
the core capabilities of the models will be today, tomorrow, in a year, in 10 years. I just don't really
know. And I'm invested in getting a sense of how that will happen and progress. And I think,
like, it's easy to sort of like model different paths based on how you think the progress
that's been happening has been happening. But in any case, like, there is kind of a bitter lesson
to keep in mind. And I don't mean necessarily Rich Sutton's own interpretation of his great essay.
I just mean like, it is true that there is this kind of observed phenomenon where, you know,
if you over-engineer intelligence in AI, you regret it because somebody figures out a more
general and maybe potentially simpler method that scales better.
And a lot of the hard-coded decisions you made are things you end up regretting.
So I think it's fair to assume that like models will get better and algorithms will get better.
and a lot of that stuff will improve.
And then the question we really ask is, well, you know, intelligence is great,
but what problems are you actually trying to solve?
Like, what is the, you know, what is the application that you want to improve?
Are you trying to, I don't know, approach helping doctors do medicine?
Are you trying to improve doing certain types of research, you know,
cure cancer, maybe? Are you trying to build the next Codex or Cursor or, you know,
one of these types of coding applications?
Are you building, you know, so the question is like,
what are you actually trying to solve?
And I would argue that intelligence is this really amazing, powerful concept,
but precisely because it's a foundation for a lot of applications.
And sort of the analogy I like to draw here is improvements in chip manufacturing
and increasing numbers of transistors in sort of CPUs,
nobody thinks that more general purpose and more powerful general purpose computers
make software obsolete or, you know, make us forget about systems.
The thing you think about is, like, they make software possible,
but you kind of need to have a stack.
So back to your question, what should the stack look like?
I think the first thing we need to agree on is like, what is the language, so to speak?
Like, what is the medium of encoding of intent and of like structure with which we can specify our systems to that computational?
So yeah, could we approach exactly this question? I just have a line of inquiry on exactly this question, which is, like, what is the right language to specify?
So I'm going to, you know, I'd love you to tell me why this is the wrong approach.
So let's assume I'm an advocate
for the God model, right? So models just keep getting better. There's one model that keeps
getting better. Let's say that my task is software, and I want to build, do you know the game
Core Wars? What is it called? Core War. Okay, this is a very old hacker game from, like, the 1970s,
where, like, you would write software that would try to kill each other. So let's say here,
I want to build an online multiplayer version of Core War. So what is wrong with
the following approach? I have a prompt that
says, I want to build a multiplayer version of Core War that's online.
And that's my prompt.
And then I just sit and I just wait for models to get better.
Why is that not a right approach to this?
So actually, so something about what you said is great.
What you said is, you just said it, you expressed the thing you want.
And you were so lucky that the thing you wanted was easy enough to express.
Like, you know, you're assuming that the speaker, you know, in this abstract hypothetical scenario is being honest.
What they want is to build that particular software
you mentioned, they fully specified it in a single sentence,
and they're not doing anything else,
they're waiting for the models to get better.
The only issue I have with this, by the way,
is that, well, I don't know how long you're going to wait,
but if you're comfortable with it, that actually, that's endorsed by me.
The problem is, as you're probably actually trying to hint at maybe,
is, well, most things people want,
especially most things that don't exist yet,
there is no five-word statement that even a math,
you know, even the best intelligence in the world is going to do for you.
And there is really a nontrivial amount of alignment-ish, like,
it's such a loaded, it's such an important statement for this discussion,
which is like there is no, say, simple way to describe what you want.
There's multiple ways to interpret why that is.
One of them is, well, yeah, one of them is I don't know what I want.
Another is my wants are complex, so I want to use a lot of words.
And another one is there's actually fundamental tradeoffs.
So, like, my wants would be ambiguous.
Are you talking about all three of those, or...
Yeah, I'm talking about all three.
When it comes to, like, actually getting people to express what they want from a system,
I mean, the kind of the premise I start from is people want systems.
Nobody wants intelligence, period.
I think this is a great, this is such a great point.
I actually really like how you said that.
I hadn't thought about it that way.
I don't want better GPUs, right?
I want maybe a neural network.
I want something else, right?
And that something else is always specific.
Or at least more specific.
And so the question is, what is the number of things people can want?
So if the vision of AGI, and by the way, the reason I said I'm not necessarily super pessimistic in practice is the frontier AI labs have been, like, they kind of tackle them one at a time, but they've had enough of a track record for me to reasonably expect that when they reach a bottleneck, they kind of go to the next thing and then unblock themselves.
And right, so that's great.
But, you know, at some point, like, there's a view of AGI, which is GPT-3, the original GPT-3,
but scaled one million times or one billion times.
And you get, you know, a GPT-10.
And there's that GPT-10 that you go to.
And in order to build, you know, a complex system,
sorry, in order to not build a system, right,
in order to just treat it as the end user facing system,
every time you go, you juggle your context
and you juggle your prompting,
which might, you know, maybe because the model is so good,
it might not be, the prompting part might not be that hard.
And you, like, you ask it from scratch every time
or some ridiculous thing like that.
And I think in the grand scheme,
of things people are slowly realizing, obviously that's not what you want. And so this is the argument
for systems, is that it's just all of this decision making that happens in making a concrete
application or product or something that encodes taste and knowledge about the world and also knowledge
about human preferences or some substrate of a complete story that you want. And it kind of
systematizes it, encodes it, makes it sort of maintainable and portable and modular. There's all
this stuff that we like to have in building systems. And the moment you start thinking that way,
You don't want that to be like a blurb, like a string blurb.
Yeah.
So let's say, I mean, I'm going to, I don't want to get too philosophical, but for me,
this always begs this very interesting question.
So let's just take what you're saying at face value, which is I have a lot of complex wants
and those shift over time.
And so like a string will never encapsulate it.
And so, you know, I want to say a whole bunch of stuff and like maybe pull some context in.
But it could be the case that these models are so powerful that I just start to
abdicate wants.
Do you ever think about that?
I'll just want less.
I'll be like, I want whatever the model gives me.
Do you think that there's any direction in the future where like we just are less
picky about our actual wants and we do converge to like these high level things?
Or are you really convinced that?
No, this is totally possible.
I mean, recommendation algorithms versus like search algorithms.
Recommendation algorithms are like, give me what I want.
Yeah.
So literally, like, my universal prompt that I could just, like, I can just go to the beach and every time
there's like a new model, I just go use it on that new model.
rather than building a complex system
would be,
give me what I want right now.
Right. And over time,
that model can train you,
like a recommendation feed, right?
Like, you just open the For You tab.
Exactly.
That's right.
But I mean, hope it doesn't,
you know,
I mean,
but that requires such a fundamental.
That's a choice we can make.
And a different choice we can make is,
well,
actually, no,
we do care about,
you know,
building systems and encoding knowledge into them.
One thing that's been growing on me
for a while.
It's kind of,
to make this
slightly less philosophical,
although maybe not much.
You know,
the idea in machine learning
that,
you know,
like,
it's kind of a fundamental
and old and known idea,
but that there is no free lunch.
And there's like a lot of interpretations
of the same theory.
Like the theory is true.
It's a mathematical statement.
And it basically just says,
if you assume nothing about the world,
all learning algorithms you can build
and all learners you build are equally bad,
pretty much.
And once you sort of understand
the mathematical version of that,
it's kind of, it's almost a, it's a really simple statement. And what that really means, and I think
like it's a, it's something that comes up time and, you know, time and time again, is that something
fundamental about intelligence, as we call it, is actually about knowing our world and knowing
what, because we're humans, what humans are like and what humans are interested in. And, you know,
like, you can't kind of scale your way into that. Now, if humans themselves change their preferences
to be simpler, yeah, that's the future that's possible. I actually agree. I think, like, there are, like,
But sometimes we want real solutions to problems.
There are fundamental tradeoffs.
We have to articulate those tradeoffs, right?
I mean, the only, like, there is no simplified version of the answer, given what you
want to accomplish.
So let's assume we're in that world.
So I can't go to the beach with my one prompt.
Instead, I have to, like, describe.
So you've done work on DSPy, which I think, in my opinion, is just the most systematic approach
to making the prompt more powerful.
So maybe you can describe DSPy, why it works, and how it addresses
this problem.
Yeah.
Very specifically, the problem is, like, we've decided that my one prompt and just waiting
for the model to get better is not going to be sufficient for whatever reason.
So now I need a better way to think about prompting.
Yeah.
So actually, back to your example, suppose that what you wanted to build was a bit more complex.
So there was more specification involved.
But suppose also that you were in more of a rush because, again, applications, like,
they don't want to wait.
Nobody, but I'm building a system.
I want to use the best, you know, intelligence, so to speak, that exists now.
But I do need to progress.
I do need to proceed.
And so the question is, what are you going to do?
And when I started, one of the things that makes communicating DSPy stuff difficult
is we've been doing some version or another of this for something like six years.
And DSPy itself is three years old.
Like, a lot of this was codified before a lot of the changes in the field.
So it kind of makes some of the conversations slightly trickier.
But what people did for the longest time.
So that's what they did in 2022, when people, you know, were thinking with early models.
That's what they did in 2023.
Only in 2024 did there start to be some slight change to this.
But fundamentally, to this day, the biggest hurdle in using a model is prompt engineering,
which is basically, at least in my understanding of it,
And really, I think the most canonical understanding of it is changing the way in which you express what you want,
such that it evokes the model's capabilities in the right ways.
And so this is less about, I would argue, things that are much more timeless and important.
and it's really about the belief that there is like a slightly different wording of what you ask
that could get the model to behave a lot better.
And the problem is that this is actually true.
This is true for the latest.
This is why, you know, OpenAI and others, you know, Anthropic and others,
they release, like, even for the latest models,
they release prompting guides and whatnot.
And they say, well, you're not holding it right, right?
And they're correct.
But for the most part, the argument that early DSPy was making was...
Is that how you pronounce it?
DSPy?
Yeah, DSPy, like NumPy or...
Oh, I love that.
Oh, I always thought.
The argument that we were making was the models keep getting better, but in any case, they keep
changing.
And the thing you want to build changes a lot more slowly.
I'm not saying it doesn't change, but there is actually a conceptual separation between what
you are trying to build and LMs and, you know, VLMs, vision language models.
Like that space is basically separate.
And so what if we could try to capture your intent in some kind of purer form, you know,
And that intent has to go through language.
That's why you're trying to do AI,
is that there's some inherent under specification and fuzziness.
You're trying to defer some decision-making.
I don't know how this function should exactly behave in every edge case,
but please be reasonable is what you're trying to communicate, right,
with these types of programs you're building.
So DSPy says, basically, there is a number of ideas that you need,
and you need them together,
which is the thing that I think is a little trickier to a lot of people.
You need these five things.
There's five bets that DSPy makes,
and you need them together,
and you need them to be seamlessly composable.
And actually, in order to get all five, you don't need five concepts.
You basically fundamentally need one concept.
So the idea is we have Python or we have programming languages.
These programming languages encode a lot of things that are highly nontrivial.
First of all, they have control flow.
And control flow means that I can get modular pieces really easily
because I can define separate functions.
These separate functions and modules,
the nice thing about them is that they really give you a bunch of stuff.
So they create like a notion of separation of concerns
where contracts of different functions can be described
without you knowing everything inside the function
or caring about everything inside the function,
if you trust that it was built properly,
you could just invoke it and it does its job,
and then you can reason about sort of how you can compose these things.
But you can also compose algorithms over functions.
Like I can have a more general sort of processor or function
or something that takes these functions
and applies things on top of them
that are sort of higher level concerns.
I can refer to variables and objects
and mutate them or pass them around.
When I say, if this, then that, and I really mean it, I don't have to go to the model to reassure it that, you know, if it does listen to that if statement, because I really mean it, I will tip it $1,000.
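A minimal sketch of the point about control flow, modules, and functions over functions, in plain Python. Everything here is hypothetical: `keyword_classifier` stands in for a model call, and `route` and `with_retries` are illustrative names, not part of DSPy or any other library. The fuzzy judgment lives in one swappable function, while the "if this, then that" is ordinary code that holds on every call.

```python
from typing import Callable, Set

# Hypothetical stand-in for a model-backed step; a real system would call an LLM here.
def keyword_classifier(text: str) -> str:
    """Toy 'fuzzy' step: label a message as 'urgent' or 'routine'."""
    return "urgent" if "outage" in text.lower() else "routine"

def route(message: str, classify: Callable[[str], str]) -> str:
    """The branch below is plain control flow, so it applies on every call."""
    label = classify(message)
    if label == "urgent":
        return "page-oncall"
    return "ticket-queue"

def with_retries(step: Callable[[str], str], valid: Set[str],
                 attempts: int = 3) -> Callable[[str], str]:
    """Higher-order wrapper: re-run a step until its output is one of `valid`."""
    def wrapped(text: str) -> str:
        for _ in range(attempts):
            out = step(text)
            if out in valid:
                return out
        raise ValueError(f"no valid output after {attempts} attempts")
    return wrapped

# Compose the pieces without either one knowing the other's internals.
safe_classify = with_retries(keyword_classifier, {"urgent", "routine"})
print(route("Database outage in region A", safe_classify))
print(route("Please update my email", safe_classify))
```

Swapping `keyword_classifier` for a real model call would leave `route` and `with_retries` untouched, which is the separation of concerns being described.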
You know, and one thing here is this is a really limiting paradigm. Conventional programming is a really limiting paradigm.
Why would we want to go back to it? And I think the answer is, like, all of the things I mentioned now, like, all these symbolic benefits from a specification standpoint
(now, this is not about capabilities) are really hard to encode in natural language. You can reinvent them.
You can tell the model, you know, if you see this, then do that.
And the model might reasonably say, well, you know, he didn't actually really mean like 100% of a time.
I think the reasonable thing this time is an exception, right?
Well, you actually can't.
I mean, you actually can't do that with natural languages without implicitly creating a formal language.
I mean, the most obvious version of this is ambiguity, right?
So the dog brought me the ball and I kicked it.
That's fundamentally ambiguous.
You don't know if I kicked the dog or if I kicked the ball.
And both are totally reasonable
depending on the person, right?
And so at some level,
English actually doesn't do the job.
Right.
So, but programming languages, I agree, right?
But programming languages are also really fundamentally limited
in that you have to over-specify what you want.
Like, you kind of have to go above and beyond what you actually want
because no ambiguity is allowed.
And that forces you to think through things, you know,
maybe you don't even know how to do.
Like, how do you write a function that generates
good search queries or that, you know, plays a game well? It's very difficult to do something like that.
Yeah, I don't want to get too wonky here because I know where you're going and I just have to
say this because it just helps frame this conversation. I mean, by the way, what you said on X,
which is what we're getting to, really kind of changed my brain, which is, so for imperative programming,
that's absolutely the case, right? Which is you need to know everything that possibly happens,
right? Or if you don't know, you make it, you're making, like, the language is going to make a very
fixed assumption. Yeah, it's going to make some basic assumption, right? And so it's almost
like if you're managing a state machine, you've got to know every state machine transition.
That's imperative languages. Declarative languages are quite different, right? So in declarative languages,
you actually specify what you want formally, right? And then the system kind of figures out how to
get to that end state. Right? But the problem is you have to be able to specify every aspect of that
end state perfectly, you know, which again, like for some problems is very complex. So that's also
limited and you just have to know the end state. Yeah.
So now, you know, you're working on DSPy, and I would love, you know, you talk about how using LLMs with a bit more formalism pushes it to like yet another level.
Right.
So the only new abstraction in DSPy, and it's incredibly simple.
It's just this notion of signatures.
It's just borrowed from the word like for function signatures.
Our most fundamental idea, which is just so basic and simple, is that interactions with language models in order to build AI software should
decompose the role of ambiguity,
should isolate ambiguity into functions.
And what do you want to specify a function?
Like, how do you declaratively specify a function?
I think the most fundamental thing is it takes a bunch of objects.
They better be typed and, you know,
like they better take, you know,
they better have like interesting and meaningful names.
It does a transformation to them and you get back some structured object,
you know, potentially carrying multiple pieces.
And when you do this, it's your job,
and this is not easy, but it is your job, to describe exactly what you want
without thinking particularly about the specific model or compositions you're thinking of.
And this is actually a lot harder than it sounds to most people.
So for example, you would not, you know, so there is a class of problems for which some people
actually write prompts that are almost signatures.
So these are cases where you only have one input.
Your output is just a response.
You're not trying to, you know, it's basically like a chatbot, because
the APIs are usually, or the
models are, structured such that this is a very natural use case. And people, like, they try to
prompt minimally, right? So they don't encode a lot of, you know, they don't say, I don't know,
think step by step, or you're an agent that's supposed to do this, or they just kind of just say
what they want. So there's a class of people that almost implicitly write signatures. But there's
something wrong with the original, like, shape of the API that usually exists. And so signatures
are just saying, it is a better shape and we made every decision here slightly more carefully.
Now, once you have signatures, every other part of DSPy from an abstraction standpoint,
falls off of it.
There's really nothing else.
Once you have signatures,
you could ask,
I have a function declaration.
It's just declaring a function.
It doesn't do anything.
One of the hardest things about,
like, people wrapping their head
around DSPy and signatures
is a signature does absolutely nothing.
And it's entirely their job to build it.
We actually can't help them at all build the signatures.
A lot of the time, people are like,
well, couldn't you generate the signature from this or that?
The signature is encoding your intent.
I know nothing about your intent up front.
That's the whole point.
Wait, what are this?
To be very clear,
what are the signatures written in?
I mean, fundamentally,
it could be a drag-and-drop thing.
It could be a, but usually like, it could be whatever.
But the point is, it is a Python class, usually.
It's a Python.
It's formal.
It's formal.
It's not English, right?
It's, well, it's a formal structure in which almost every piece is a fuzzy English-based description.
So you could say something like, I want a signature that takes a list of documents.
And the list of documents is the typed object.
You could actually say list of document.
And you have to define what the type document means.
And the fact that it, you know, the fact that this type is document, maybe the name matters.
Like, a list of documents is
not necessarily the same as a list of images, right?
They're different things.
And they're like semantically and fuzzily different.
And basically like it says in English, given these inputs, you have several of them.
I want to get these outputs and you have several of them and maybe the order matters.
So it's really just like I argue.
It's like it's what a prompt is supposed to be or what a prompt wants to be when it grows up.
It really is just a cleaner prompt.
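As a rough illustration of the idea described above (typed, meaningfully named inputs and outputs, each with a fuzzy English description), here is a minimal sketch in plain Python. It is not DSPy's actual API: the `Field` and `Signature` classes, the `render` method, and the example names are all hypothetical and exist only to show the shape of a declaration that, by itself, does nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One named, typed slot with a fuzzy English description (hypothetical)."""
    name: str
    type_name: str
    description: str


@dataclass(frozen=True)
class Signature:
    """A declaration only: it states intent and does nothing by itself."""
    instructions: str
    inputs: tuple
    outputs: tuple

    def render(self) -> str:
        """Render one possible prompt skeleton from the declaration."""
        lines = [self.instructions, "Inputs:"]
        lines += [f"- {f.name} ({f.type_name}): {f.description}" for f in self.inputs]
        lines.append("Outputs:")
        lines += [f"- {f.name} ({f.type_name}): {f.description}" for f in self.outputs]
        return "\n".join(lines)


# Illustrative example: the field names and descriptions are assumptions.
summarize = Signature(
    instructions="Given these documents, summarize them and list key topics.",
    inputs=(Field("documents", "list[Document]", "the documents to read"),),
    outputs=(
        Field("summary", "str", "a two-sentence summary"),
        Field("topics", "list[str]", "key topics, most important first"),
    ),
)
print(summarize.render())
```

The point of the sketch is that the declaration carries only intent; how it gets turned into a prompt, or which model runs it, is left to other pieces.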
Now, if you grant me that, which I argue is a really small thing, it's a very simple contribution.
There is really not a lot of richness to this, but that's the point.
You get everything else that makes programming great while being
able to build really powerful AI systems because you can now isolate your ambiguity at the right
joints. You have a notion of like where you want the joints to be. And the rest of your systems,
the rest of your programs can be very modular. You can have multiple signatures. So now you get what
people call multi-agent systems. Multi-agent systems are just AI programs in which you have multiple
functions. You know, it's not really that. It's really nothing. It's not really a complicated idea
once you take this. You get things like inference time strategies. People are like chain of thought,
you know, you have to write your prompt in this way or we have to train the model in a certain way or
ReAct agents or, you know, program of thought.
We recently released this thing called recursive language models.
You know, the thing is when you're solving a task,
none of these inference strategies should be of your concern.
If and when you want them, you know, this is just a thing that should be compositional.
And signatures have the shape where, like, we can use programmatic sort of constructs
to compose over these types of, you know, constructs.
Do you, when you think of, you know, DSPy, when you were originally creating it
and now as you think about it, do you think about it as something that will
fundamentally only be consumed by humans or for humans?
Not at all, no.
I can't imagine cases where you bridge the gap.
And the reason I ask is there's this obvious question: if the interface to
LLMs is going to be all automated anyways, do we need to enforce these restrictions
that are primarily to keep natural language speakers within certain boundaries as opposed
to if it's whatever, and it's an agent calling it, we may not need to do that.
So I think the argument in DSPy is intent should be expressed in its most natural form.
So that's the declarative part.
And the second part is unfortunately, or fortunately, in the general case, that cannot be reduced below three forms.
Some things are really best expressed as code.
And no amount of automation can remove that.
There's no amount of automation that can remove the fact that I actually want to think about three pieces because they're separate to me and I want to maintain them separately.
no amount of automation is going to remove the natural language piece.
Nobody wants to write Python to describe a really complicated AI system from scratch.
And no amount of automation is going to remove the fact that for some classes of problems,
you really need a more RL-like standpoint where you have a distribution of initial states or inputs
and you have a way of judging them or like metadata about what correctness looks like
because that really captures the wonky and sort of like exceptional long-tail set of problems
that actually vary by implementation or by model.
Yeah, you may also just want diversity.
Give me something that may solve this problem, right?
Like, it may just be that, like,
there is no formal specification.
Yeah, totally.
Right, right.
So there is a machine learning.
People associate DSPy a lot with the one that is most different
to what they usually see, which is optimization.
So a lot of new users and a lot of people that look at the paradigm
and try to critique it conceptually,
they miss the fact that you have to have these three pieces for it to
work. Like, in the general case, you can't get away without any of them. Now, by the way, there's a lot of
applications where you do not need all three. If you're building yet another rag app and the model
has been post-trained to death to take a context and answer a question about it, you don't really
need a lot of, you know, a lot of that to express your intent because it's just close to what the model
is good at. Anyway, a lot of people associate DSPy with the third one, which is the data-based
optimization. And actually, a lot of well-intentioned users would write overly simplistic and
general programs and try to distill their intent through data or through kind of this process
of trial and error.
And that's a really bad sort of, that's like a misuse of the power of the models and the power
of the paradigm, because if you know what you want, nothing can express it better than just
you saying what you want.
The data-based optimization is there to smooth out the rough edges.
It's for you not to have to maintain laundry lists of exceptions.
I'll wrap this up quickly, Martin.
The other part of DSPy is the reason we built all of these abstractions and we haven't
been changing them. These abstractions are basically three years old.
They've basically not been changing. And what we spend all of our, a lot of our research time on is
building algorithms. And the thing about those algorithms is I'm not wedded to any of them.
I rarely go out. I mean, we usually get excited about one for, you know, a month or something.
But I rarely go out and get particularly excited about getting anyone to pick one of them
over the other. We recently released an amazing genetic optimizer for prompts called GEPA.
Before that, we had another one called SIMBA that was just a reflective method, and we had MIPRO before that.
We have a lot of these algorithms, and they're really clever and cool.
But the thing that I'm interested in is we build these algorithms to expire.
As models get better, we can actually come up with better algorithms that, you know, fit turning the abstractions into higher quality systems.
And what we want to happen over time is that our algorithms expire, we build better ones, but the abstractions that we promised and the systems that people express in
those abstractions remain as unchanged as possible.
So that's kind of like a,
that's something that's kind of unusual to a lot of,
a lot of folks in the space.
It may also help just to kind of pencil up
where this sits on the software development lifecycle, right?
There's two places you could put it.
You could just be like, I am writing my software.
I want to know what's the best prompt to use, you know,
and then you could use it there.
Or you could be like, actually,
the best prompt is determined at runtime.
And so maybe you could invoke this, you know,
Actually, so do you have, is there a standard use?
Do you do this, like, basically before the software is deployed, or are there actual runtime uses where you're, you know, trying to find the right?
So the two sort of like concepts that exist in DSPy for this and I don't know how technical we want this to be, but like we have the notion of modules.
This is borrowed directly from, like, neural network layers or PyTorch modules, which is just saying once I have the shape of the input and the shape of the output, which is a signature,
I can actually build a learnable piece that has some inherent structure, like what a machine
learning person would call an inductive bias.
And I want it to take that shape and implement it for me, but carry some parameters internally
about what it could learn.
So that's a module.
And a module is entirely an inference time object in the sense that it modifies behavior
at, you know, when it's being invoked.
So things like agents and different types of agents and code-based or tool-based agents or chain
of thought reasoning, like all of these are inference time strategies that are modules.
And the aspect in DSPy here is that these must be decoupled from your signature.
Your signature should know nothing about the inference time techniques that you're using.
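To make the decoupling concrete, here is a small sketch in plain Python, not DSPy's real module classes. The `predict` and `chain_of_thought` helpers, the `SIGNATURE` string, and the `fake_lm` stub are all hypothetical; the only thing the sketch tries to show is that two inference-time strategies can implement the same declaration without the declaration mentioning either of them.

```python
from typing import Callable

# Stand-in for a language model call: prompt text in, completion text out.
LM = Callable[[str], str]

# Hypothetical one-line declaration; it says nothing about how to run it.
SIGNATURE = "question -> answer"


def predict(signature: str, lm: LM) -> Callable[[str], str]:
    """Simplest strategy: ask for the output directly."""
    def run(question: str) -> str:
        return lm(f"[{signature}]\nquestion: {question}\nanswer:").strip()
    return run


def chain_of_thought(signature: str, lm: LM) -> Callable[[str], str]:
    """Same signature, different strategy: request reasoning, keep only the answer."""
    def run(question: str) -> str:
        raw = lm(
            f"[{signature}]\nquestion: {question}\n"
            "Think step by step, then write 'answer:'."
        )
        return raw.rsplit("answer:", 1)[-1].strip()
    return run


def fake_lm(prompt: str) -> str:
    """Deterministic stub so the sketch runs without a real model."""
    if "step by step" in prompt:
        return "2 plus 2 is 4. answer: 4"
    return " 4"


direct = predict(SIGNATURE, fake_lm)
reasoned = chain_of_thought(SIGNATURE, fake_lm)
print(direct("What is 2 + 2?"), reasoned("What is 2 + 2?"))
```

Swapping `predict` for `chain_of_thought` changes how the call is carried out, while the caller and the declaration stay the same.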
The other aspect of DSPy is optimizers, which are, again, they're just functions, like modules are just functions,
but they're functions that take your whole program, like an actual complete piece of software that has potentially many pieces.
And they think holistically, how do I use language models in order to get this thing to perform its intended goal,
which might be maximizing a score on a test set,
but in principle it could just be like,
do what the model understands
from the instructions that it should be doing.
And this could be,
people do this at inference time sometimes
in the sense that like it happens
while the user waits, so to speak,
user of a system.
But it's a fundamentally different contract
because it sees the whole,
there's extra information.
I see the whole system
when I'm an optimizer.
I don't just see like an isolated module.
I can see sort of all of the pieces.
I can see a data distribution.
I can see the notion of reward.
And so, like, I have much more, like a much richer space because there's strictly more information that no inference technique, no LLM sort of is able to capture just from an information like flow standpoint.
You know, it's interesting because I mean, a lot of people think of DSPy as basically a prompt amplifier, which is, you know, here's my, whatever, my prompt template, tell me what prompt would be the best.
But to hear you describe it, it's almost like, you know, it's like this kind of, you know, declarative, you know,
runtime-y type thing. Do you know
what the standard, I don't even know if you have
visibility of this stuff because it's an open source project,
but do you know what the standard use is? Is it the
naive use case, which is largely
prompt optimization? Or are people
actually using this in some more sophisticated ways?
I think one reason where, okay,
so I am very
loud about the abstraction. I talk about them all the time.
I give talks. You scolded me on X about this.
So I was in a exact. No, it was fantastic.
It was great. You really corrected me.
I was one of the few people that really thought
about it as a prompt. I really thought, listen, I'm going to write my prompt. I'm going to do
some template magic. I'm going to give it to DSPy. And then it's going to give me, like, what's the
best thing for what I want to accomplish? And then I'll just go to stick that in my program. That's the
way that I thought about it until you made the point that it's actually more of a set of abstractions
that will evolve with your program. So I tried to learn from what happened historically in computer
science. Like, you had these machines and, you know, you got general purpose chips. And
people were programming those directly
in whatever language they spoke, right, machine code,
and maybe you could abstract it slightly with assembly.
But then there was this amazing time
where a lot of languages culminating maybe most popularly
in C, but various others before it,
got this idea that there's actually a general purpose programming model.
Like, you could build a model of a computer
without thinking about any specific computer.
And actually, that's a bit of an illusion
because every specific computer is a lot more complicated.
But, like, you could create this illusion
that is much more portable
and it's much closer to how humans think.
And I know it's funny to view C
as close to how humans think,
but it really was a fundamental jump.
Once you have C,
it's important to ask why do people use C
instead of writing assembly.
And it's really weird to me
that anyone would use C
because it's faster than assembly,
like the code runs faster.
So to me, when someone says
they use DSPy because the quality of the system is higher,
which by the way is very often the case,
it's not really the right answer
because you're jumping, in my opinion,
to a higher level abstraction
such that actually,
I would be willing to give up some speed
in order to have the portability,
the maintainability, the closeness to how I think about the system
and I want to manage its pieces.
There is a trade that I'm willing to accept.
Now, the reason people actually have universalized C
and they don't regret it is this amazing compiler ecosystem
where people build all of these optimization algorithms
and passes and sorts of infrastructure.
You inline functions.
So you break the modularity, right?
People are writing modular code,
but you're actually breaking a lot of that modularity
when it's being turned into an executable artifact,
you eliminate dead code.
You have all these heuristics,
different heuristics for different machines sometimes.
And so my vision here is if AI software is a thing,
and AI engineering is a thing that needs to exist
irrespective of model capability because we want to have
this diverse space of systems,
what is the abstraction to capture it?
And if natural language is too ambiguous
to be the only, like, the complete specification of these systems
and it's too mushy and we kind of want to have more structure,
well, what would that structure look like?
And if we know what that structure looks like,
well, if we do it naively,
you would actually lose a lot of quality.
If we build DSPy poorly,
you might have a really elegant program that sucks, right?
Like when you use this and an LLM, it sucks.
So the reason I build optimizers or like we build optimizers as a team
is not so much that I think people can't write prompts
and I want to write better prompts for them.
What a boring reason.
I don't care about that.
Like, people can write prompts.
People can iterate on prompts.
That's not an issue.
The thing I'm trying to say is I want them to express their
intent in a way that is less model specific and not worry that they're losing or leaving a lot on the table.
Yeah.
Yeah.
Honestly, this is where, like, you changed the way that I think about this whole thing.
And so I'm going to try something.
I alluded to it previously in our talk, but I want to try it again on this.
Because this is kind of how it changed my thinking.
You can tell me where I'm right or I'm wrong, which is, so you said assembly and C,
but we've actually had a lot of paradigm shifts since then.
So C, let's just say, is kind of like an imperative language where, like,
for every event that happens, you have to know how
to handle it.
Right.
Right.
So traditionally in distributed systems,
like imperatives have not been a good approach because you could have some event,
you know, show up at the node.
And then you don't know what state the node is in.
And so you, I mean, it's just there's so, the state space is so huge.
And so you had actually a big abstraction shift with declarative languages or declarative
language to be like, okay, listen, we're going to tell you what the end state of the system is.
And then the system will figure out all of the state transitions to get there, right?
So it was a higher level abstraction for people not to have to worry about everything that kind of comes in and every event.
And you can actually declare, this is like Datalog or something.
You're like, here are all of the conditions that exist.
And then I just want to make sure that the system is always in that state.
And then the thing you kind of give up in that case is like you can't bound the amount of computation needed.
Like you don't know how long it's going to take to get there, but it will always get you to like that state.
So it's easier for a programmer.
So, like, you can actually now build programs easier for certain classes of programs.
Now when I look at DSPy, I feel like it's the same type of leap between, like,
imperative and declarative, but for LLMs, where there's certain declarative,
like, you can't write a declarative program that's going to solve the same problems that an LLM can
because there's no fuzzy this and that, and you can't really integrate them, right?
And so, like, you want the same type of shifts that you went because we've got a new problem domain
that you have.
So DSPy kind of gives you that with LMs.
So you can kind of formally specify it
in a way that's kind of natural but also safe.
And then it decouples you
from the actual implementation below it.
So is that a fair way to think about it
or is that like just a martinism?
No, I think that's a fair way to think about it.
And I think one funny thing is,
and I think you would probably agree with this.
I don't know that declarative is better
or imperative is better per se.
It's more that...
Because of the problem to me.
Like, declarative is better for ones
where like you've got a very complex system
with a lot of asynchronous events
because you don't need to maintain a state machine.
Yeah.
You know, all of these things have tradeoffs.
All of them do, right?
Like, you wouldn't use an LLM to, like, add two numbers.
Right.
And so I think, like, a really good shape for this is you want an imperative shell.
DSPy actually, compared to, there's a lot of sort of folks that create graph abstractions
or whatever, like things are very declarative.
I was going to go on the record saying graph abstractions generally are a bad idea, in my opinion,
for basically anything in computer science.
But go ahead.
Right.
Exactly. And I think it's because humans, when they think top down, we actually think
imperatively. And so like DSPy is just Python, which is a, you know, I mean, it's a complicated
language, but it's fairly imperative in that you're just like, do this, you do this, do this.
But at the leaves, where you're going to, where you were going to potentially have a fuzzy task,
what were you going to do? I think you were going to write a prompt. I think the issue with
prompts is actually, fundamentally, they're actually so declarative. They're too declarative
that you're forced to break the contract of declarativeness because you're like,
Well, if I just say what I want, the model is never going to be able to fit in my bigger program.
You know, one reason, by the way, people forget this is if you work with a chatbot that is tuned for human responses,
you're doing most of the work that DSPy has to do in a program.
In a program, if I have a function and I want to give it inputs and I want to get back outputs,
like those have to actually go into variables.
Those actually have to go like, there's, you know, like the output has to, so to speak, parse in a certain way.
And I have to funnel things through this.
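The "imperative shell with fuzzy leaves" shape described here can be sketched in a few lines of plain Python. Everything in this example is hypothetical (the `classify_ticket` leaf, the `route` shell, the JSON output convention, and the `fake_lm` stub); it is only meant to show that the model's output has to parse into named variables, and that the surrounding `if` is ordinary code the model cannot reinterpret.

```python
import json
from typing import Callable

# Stand-in for a language model call: prompt text in, completion text out.
LM = Callable[[str], str]


def classify_ticket(text: str, lm: LM) -> dict:
    """Fuzzy leaf: the model decides, but the result must parse into named fields."""
    raw = lm(
        "Return JSON with keys 'urgent' (bool) and 'topic' (str).\n"
        f"Ticket: {text}"
    )
    parsed = json.loads(raw)
    return {"urgent": bool(parsed["urgent"]), "topic": str(parsed["topic"])}


def route(text: str, lm: LM) -> str:
    """Imperative shell: a plain if/else that always means what it says."""
    result = classify_ticket(text, lm)
    if result["urgent"]:
        return "page-on-call"
    return f"queue:{result['topic']}"


def fake_lm(prompt: str) -> str:
    """Deterministic stub so the sketch runs without a real model."""
    urgent = "outage" in prompt.lower()
    return json.dumps({"urgent": urgent, "topic": "billing"})


print(route("Total outage in prod", fake_lm))
print(route("Invoice question", fake_lm))
```

A human reading a chatbot reply can cope with any output format; a program cannot, which is why the leaf has to commit to a parseable shape.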
If you're a human who's just asking the model questions,
no matter what form it gives you,
you're smart enough to be able to, like, you know,
bridge the fuzziness
in the shape of the model's output.
It's almost like the imperative is,
I know every step to the solution,
so do every step, right?
And declarative is,
I actually don't know all of the steps,
but I know the solution.
So give me the solution.
And, like, DSPy is almost,
I kind of know how to frame the solution,
do the rest of the work.
It's like this kind of fuzzier.
Right.
And I just, listen, there's tradeoffs to each, right?
I mean, like, DSPy, like, you have to, like, whatever, have the overhead of a SOTA model,
which whatever took a billion dollars to train and is expensive at inference.
So these are just different points in the declarative space, the performance space, the cost space, etc.
By the way, so I saw it, I'm totally bought in.
I actually think this is such a nice way.
Even independent of DSPy, like, just the core workup, this is how you should think about
interacting with LLMs formally.
I really think that you've kind of nailed that abstraction.
So let's just take that as a given.
So what are the hard problems now, or what are the next set of problems now to make that more pragmatic?
Like the optimizations under the covers or like whatever you need to do to kind of.
Yeah.
So everything we talked about today, I do almost nothing about this because this is work we did three years ago.
And I'm just out there telling people about it.
But I'm not really, you know, we're not changing these abstractions.
What we actually do is the following set of questions.
We ask, all right, someone wrote the program and we assume they did a reasonable job describing what they want.
and maybe that means they wrote the control flow,
they have the signatures, and they have some data.
These are the three pieces that you might want to have,
or they have some, not all of these.
How do we actually do a good job at optimizing this?
So, you know, we've been,
it's actually a really, I think, an interesting progression
to sort of see how we progress from the very early optimizers in 2022
to the latest ones.
Very early ones had to work with models that basically didn't work, right?
They had essentially no instruction-following capability
and were hit-and-miss for their task.
So what we did looks like what, you know, the reinforcement learning people do on LLMs, which is you take the program and you do what we call, like, we bootstrap examples, which is just another way of saying you just sample, like you just run the program, maybe with high temperature or something, a lot of times. You see which things actually work. And you keep traces of all of these over time. And then those traces, you know, which are generated by the model, they can become few-shot examples. And if you just do that, sometimes it improves a lot. Sometimes it becomes a lot worse. So you just do some kind of discrete search on top to find which ones
actually improve on average. That was like when models were really bad. As models have been getting
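The bootstrapping step described here (sample the program many times, keep the traces that a metric accepts, reuse them as few-shot examples) can be sketched roughly as follows. This is not the DSPy optimizer itself: `bootstrap_demos`, `noisy_upper`, and `is_upper` are hypothetical names, the "program" is a toy stand-in for a sampled model call, and the discrete search over which examples help on average is left out.

```python
import random
from typing import Callable

Example = tuple[str, str]  # (input, output) trace produced by the program


def bootstrap_demos(
    program: Callable[[str, random.Random], str],
    metric: Callable[[str, str], bool],
    train_inputs: list[str],
    samples_per_input: int = 4,
    seed: int = 0,
) -> list[Example]:
    """Run the program several times per input and keep traces the metric accepts."""
    rng = random.Random(seed)
    kept: list[Example] = []
    for x in train_inputs:
        for _ in range(samples_per_input):
            y = program(x, rng)
            if metric(x, y):
                kept.append((x, y))
                break  # one accepted trace per input is enough for this sketch
    return kept


def noisy_upper(x: str, rng: random.Random) -> str:
    """Toy stand-in for a sampled LM program: correct only some of the time."""
    return x.upper() if rng.random() < 0.5 else x


def is_upper(x: str, y: str) -> bool:
    """Toy metric: the output should be the upper-cased input."""
    return y == x.upper()


demos = bootstrap_demos(noisy_upper, is_upper, ["a", "b", "c"])
print(demos)
```

Each kept trace is something the program itself produced and the metric accepted, which is what makes it usable as a few-shot example for later runs.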
better, we've been moving, you know, we've moved a lot, basically all the way to reflective
prompt optimization methods where like, you actually go to the model and you're like, here is my
program, here are all the pieces, here is what this language means. Here are the initial instructions
I came up with from like just the declarative signatures. By the way, here are some rollouts
that are generated from this program. Here are how well they perform. Let's debug this. Let's
kind of iterate on the system to debug this. And obviously, there's a lot of scaffolding to make
sure that search is actually like a formal algorithm that is going to lead to improvement. But
increasingly more and more of it is actually carried off by the models. One thing we also do a lot of
is we ask, all right, conventional, like, policy gradient reinforcement learning methods like GRPO.
Nothing about them cannot be applied to a DSPy program because the DSPy program says nothing
about how the optimization should happen. So actually, for a very long time, from
February of 2023, you could actually run offline RL,
and since May of 2025, you can run online RL, or GRPO, on any DSPy program that you write.
You know, people think that it's limited to prompt optimization.
But, you know, I think the only notion that is fundamental in DSPy to prompt is that natural
language is an irreducible part of your program.
But that prompt is human-facing.
It's how you say what you want.
How it turns, how it gets turned into the artifact may well use reinforcement learning with gradients
or natural language sort of learning.
So we spend a lot of time on optimization.
We also spent a lot of time on inference techniques.
You just declared that you want your signature, which processes lists of books.
Well, guess what?
No model has long enough context to work with lists of books.
So last week, my PhD student Alex, and I released this idea called recursive language models,
which sort of takes any model that is good enough and sort of figures out a structure
in which it can handle or scale to essentially unbounded lengths of context,
and we were able to push it to 10 million tokens
and see essentially no degradation.
And the reason we build these types of algorithms
is we really want to back your signatures
by whatever it takes to sort of bridge the gap
between whatever the current capability limit of the model is
and the intent you specified.
And the last thing we think a lot about is,
well, we've made this argument conceptually
and sort of tried to demonstrate it empirically
that you need this irreducible,
you need these three pieces, you know, signatures
in natural language, structured
control flow, and data to fully specify your intent, at least ergonomically enough.
The question, though, is this is a very large space of programming where, like, you need to
figure out, okay, I have a complete problem. How do I map it into these pieces, knowing that maybe
I need all of them. And so we spend a lot of time, this is why it's a big open source project,
because we want to see what people actually build and learn, like, from that sort of what are the
software engineering, like what are the AI software engineering practices that we should
encourage and support.
So these are the types of questions we think about.
And I think one reason this has to have the structure of this like open source project.
It's just like this large fuzzy space is I don't want to be the only, you know, group or, you know,
set of a small number of teams working on any of these pieces.
I think it's a space of the more academics and the more researchers and the more people work
on optimizers, all programs benefit.
The more people work on modules, all programs benefit.
The more people build better models, especially programmable models, whatever
that might mean in the future, sort of models that understand that they're going to get used
in this structure that everyone benefits. And, you know, it reminds me sort of of the way in which
deep learning sort of really took off, which was some people iterated on the architectures,
some people iterated on the optimizers, you know, you got things like Adam, other methods.
And I think like that is what we're trying to really push a community towards.
All right. So one last thing, just to kind of finish off is getting a little bit more philosophical.
But, you know, I think in AI it allows us to do this.
And what you're addressing is, again, the ability to declare intent, you know, for these models in a way that hits the abstraction right.
If you could guess prophetically whether, in the future, these models are going to have independent agency, like agents, or whether it's going to be humans guiding them,
do you have any opinion on, like, the direction this goes?
I asked this question a bit earlier, but I kind of want to ask it a bit more directly.
Do you think that the need for a human to declare things formally is going to go away?
And over time, like, we treat these like grad students or whatever.
And, you know, this all just becomes the inner working of an agent.
Or do you think that these things are formal software systems?
This is a language like any other language.
And, like, we will expect DSPy to be, like, the interface.
Something like it will need to be an interface that's exposed to humans, you know, for the foreseeable future.
I think you need some amount of grounding into the world when you build these systems.
Like, I just think people in AI talk a lot about, we can talk about AGI, but this kind of just
ethereal intelligence. It's just so smart. But the problem is, you know, like, intelligence
that we care about, as far as I can tell, is really about, it's almost about the things you
might want to ask or the way things are in the world. It's very world-oriented. It's not really
this, you know, it's not this very abstract thing. So as models get smarter and smarter,
I imagine that a lot of the problems people write programs for today could get a lot simpler
because that use case, it's kind of like RISC versus CISC architectures in CPUs, you know,
if you believe in sort of complex instruction sets, it's possible that, like, you had to
jump through all these hoops to do a fast square root before, but now somebody just
gives you an instruction for that.
Like, models can keep absorbing more use cases as keywords, or, like, into their language.
But the human, you know, philosophically, as you say, like, the human condition is that we will
just want more complex things.
And once you want these complex things, you know, in a repeatable way, you've got to build
the system.
And if you want to build a system, I don't really see that not having a structure, like,
I don't see that having the structure of LLM APIs today.
I see it as maybe some nicer facade on top of the
Maybe I could ask you a pointed question.
You've got grad students, right?
Yeah.
Do you ever wish that you had a DSPy interface to them?
Right?
In the limit, it's a very structured way to make asks, right?
And if not, wouldn't that argue that you wouldn't want that for an LLM in the limit either?
So I...
I've got to go ahead.
But I actually mean it seriously: is it that humans just aren't capable of doing this stuff,
and so that's the reason that we don't have a formalism to talk to them?
Or is it just a totally different question?
I promise this is an actual answer to your question about grad students.
But to me, so here's the answer. The question sounds to me like, "Don't you have chairs
at home? Don't you wish that they all looked like tables?" I need both. I really want to have both.
There's a software system. There's a grad student. They're totally different things.
And there's nothing saying that AI that operates as a chatbot,
as an agent, as an employee-like agent, is a problem. I think we need it. I love it.
That's wonderful. That's wonderful.
So sometimes you want to specify something to a machine that has an LLM.
And sometimes you just want to talk to something.
And those are two different solutions.
I love that.
This is a great answer.
It's a great way to end this.
Thank you so much for your time.
This has been fantastic.
Thank you, Martin.
Thanks for listening to the a16z podcast.
If you enjoyed the episode, let us know by leaving a review at ratethispodcast.com/a16z.
We've got more great conversations coming your way.
See you next time.
As a reminder, the content here is for informational purposes only,
should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security,
and is not directed at any investors or potential investors in any a16z fund.
Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast.
For more details, including a link to our investments, please see a16z.com/disclosures.
