a16z Podcast - How Foundation Models Evolved: A PhD Journey Through AI's Breakthrough Era
Episode Date: January 16, 2026

The Stanford PhD who built DSPy thought he was just creating better prompts—until he realized he'd accidentally invented a new paradigm that makes LLMs actually programmable. While everyone obsesses over whether LLMs will get us to AGI, Omar Khattab is solving a more urgent problem: the gap between what you want AI to do and your ability to tell it, the absence of a real programming language for intent. He argues the entire field has been approaching this backwards, treating natural language prompts as the interface when we actually need something between imperative code and pure English, and the implications could determine whether AI systems remain unpredictable black boxes or become the reliable infrastructure layer everyone's betting on.

Follow Omar Khattab on X: https://x.com/lateinteraction
Follow Martin Casado on X: https://x.com/martin_casado
Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

Stay Updated:
Find a16z on X
Find a16z on LinkedIn
Listen to the a16z Show on Spotify
Listen to the a16z Show on Apple Podcasts
Follow our host: https://twitter.com/eriktorenberg

Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
Transcript
Nobody wants intelligence, period.
I want something else, right?
And that something else is always specific, or at least more specific.
There is this kind of observed phenomenon where if you over-engineer intelligence,
you regret it because somebody figures out a more general and maybe potentially simpler method
that scales better.
And a lot of the hard-coded decisions you made are things you end up regretting.
So I think it's fair to assume that, like, models will get better and algorithms will get better,
and a lot of that stuff will improve.
Then the question we really ask is, intelligence is great, but what problems are you actually trying to solve?
That idea that scaling model parameters and scaling just pre-training data is all you need,
exists nowhere anymore.
Nobody thinks that.
Actually, people deny they ever thought that at this point.
Now you see this massively human-designed and very carefully constructed pipelines for post-training,
where we really encode a lot of the things we want to do.
You see massive emphasis on retrieval and web search and tool use and agent training.
There is clearly a sense in which the labs have already recognized that the overall,
old playbook doesn't work. The question is, is that actually sufficient for making the best use
and the most use of these language models? It's not a problem of capabilities. It's a problem of
actually we don't necessarily just need models. We want systems. The conventional wisdom says
we're racing toward AGI by making language models bigger and bigger. But what if the entire framing
is wrong? On today's episode, you'll hear from a16z general partner Martin Casado and guest
Omar Khattab, assistant professor at MIT and creator of DSPy. Omar doesn't think we need artificial
general intelligence. He thinks we need artificial programmable intelligence, and the difference matters
more than you think. Here's the paradox. Khattab has built one of the most widely used frameworks
for working with LLMs, DSPy, but he's skeptical that raw model capabilities will solve our problems.
While others obsess over scaling laws and parameter counts, he's asking a more fundamental question.
Even if models become infinitely scalable, infinitely capable, how do humans actually specify what they want?
Natural language is too ambiguous, code is too rigid.
We need something in between, a new abstraction layer that lets us declare intent without drowning in implementation details.
Think of it as a jump from assembly to C, but for AI systems.
The stakes are higher than prompt engineering.
This is about whether AI becomes a programmable tool we can reason about and compose,
or just an inscrutable oracle we prompt and pray.
We get into the three irreducible pieces of an AI system,
why the God model is a dead end
and what it actually means to build software
when intelligence is cheap, but the specification is hard.
Well, listen, Omar is great to have you,
and congratulations on everything.
Just so for everybody that's listening,
Omar is doing some, in my opinion,
of the more interesting technical work
in building frameworks around LLMs and models.
And a lot of this has consequences on things like,
like, you know, AGI and capabilities and everything else.
And a lot of, like your comments on social media to me, have been kind of some of the most insightful.
So I've been really looking forward to having you on the podcast.
Thank you for having me, Martin.
And it's great to get to chat as well.
Awesome.
So listen, maybe let's just start with your background, you know, since we have some shared roots,
and then we'll go from there to a general conversation.
So, I mean, I'm now an assistant professor at MIT.
I started a few months ago in electrical engineering and computer science, and I'm part of CSAIL.
I did my PhD at Stanford where I think the timing was really interesting. I started in 2019 and I graduated about a year ago.
That timing was really great because foundation models as a concept didn't even necessarily have that name.
We hadn't coined it at Stanford yet.
The whole thing was just starting to take shape.
You know, BERT had been around for about a year at the time.
But people sort of hadn't really figured out how to make them work.
But I would say as importantly, how to make use of them to build different types of systems and applications.
which is basically what I did throughout my whole PhD.
So I mean, you're the, I presume, the primary person behind DSPy.
Is that correct?
That's, you could say that, yeah.
Yeah, yeah.
So for those of you that don't know, DSPy is widely used, which is why we're going to be talking about it.
It's one of the most widely used, I would say, open source projects around prompt optimization for LLMs.
So maybe let's just go ahead and start.
You know, you have tweeted, you know, about whether LLMs will get to AGI or
not.
I know it's a kind of very fluffy, high-level place to start, but we'd love your
thoughts on, you know, are we headed towards AGI in the near term? Is this an apt goal? Like,
where do you land? And it is particularly timely right now, given the conversation that
Andrej Karpathy just had on the Dwarkesh podcast where he was like, well, you know, maybe 10 years
if you're optimistic. Where do you weigh in on this? So, I mean, I think, honestly,
it's a surprising position because I feel like I'm not sure, but I'm less sort of, say, bearish than
Karpathy necessarily.
Oh, you are?
Yeah, which is very surprising.
You're less bearish than Karpathy on AGI.
Right, which is very strange to me.
Let me tell you what I think.
So back when I started my PhD,
basically you can look at a lot of sort of the work that we've done,
you know, with my advisors and collaborators and others over the past six years or so
as pushing back on this perspective that scaling model size and maybe doing a little bit more
pre-training, and, you know, especially at the time, it really was about model size.
and just sort of doing more uniform scaling of that nature
is just going to solve all of your problems.
And the pushback has to, you know, has two sides.
One side is this is an incredibly inefficient way
to build capabilities that you care about.
If you know what you want,
that's just waiting for everything to emerge
is just incredibly inefficient
and the diminishing returns just speak for themselves.
The other problem is really a problem of specification or like abstractions.
Scaling language models makes this
I think realistic bet
that anything people want to build with these models
is just a few keywords away or a few words away
and that people know how to actually think
of what these words should be.
I think it's an incredibly limiting abstraction.
But the reason I'm less bearish
than maybe Karpathy sounded,
although again, I'm not really sure,
is, you know, I mean,
I think we're seeing very rapid,
I would say improvement in the perspective
that we see out of the frontier labs,
like that idea that scaling model parameters
and scaling, you know, just pre-training data is all you need,
exists nowhere anymore.
Nobody thinks that.
Actually, people deny they ever thought that at this point.
And now you see this massively human designed
and very carefully sort of constructed pipelines for post-training
where, like, we really encode a lot of the things we want to do.
You see massive emphasis on retrieval and web search and tool use
and sort of agent training.
And you see all of this emphasis
on, you know, OpenAI, their latest thing was building this agent builder, and they have
products like Codex and others. So there is clearly a sense in which the labs have already recognized
that the old playbook doesn't work, or at least is not complete.
And so if by AGI we just mean this thing that, you know, there's a very large set of problems
you can ask it, and as long as you give it enough context, it's able to handle them,
you know, the models are increasingly powerful and reliable. The question is, is that actually
sufficient for making the best use and the most use of these language models.
And I think that's where my fundamental pushback doesn't go anywhere because I think the problem
is just, it's not a problem of capabilities.
It's a problem of actually we don't necessarily just need models.
We want systems.
And I can speak a little more about that.
Yeah, so actually, one second.
Yeah.
So let me back up just a little bit.
So there is a view of the world that like kind of like some variant of the, you know,
transformer architecture.
is going to get us there.
And then the end-to-end argument kind of suggests that,
you know, you put all the data into one model,
and you have one model that will just become, you know,
so good, because scaling laws hold, that it solves all of reasoning, right?
That's kind of this, you know, absolutist end-to-end argument.
I think nobody believes that anymore anyway.
I think people do in video, maybe not in LLMs,
but in video, I think a lot of people are like,
listen, there will be one video model that you put everything in
It does everything.
It does 3D.
It does physics.
It does whatever.
So maybe with LLMs,
you know,
people don't believe that anymore
because they've been around long enough
to suggest it's not true.
There's another view,
which is like,
LLMs are totally a dead end.
You know,
what Karpathy called ghosts,
which I thought was so beautiful,
which is,
you know,
they can kind of,
you know,
do some sort of
linear interpolation of stuff
that they've heard in the past,
but like,
they can't do planning.
And so you need an entirely new architecture.
And you're saying that you're not in that camp
of getting an entirely new architecture.
It depends, because I've been arguing for a different architecture for years,
but that different architecture is built around having these models.
No, no, 100%.
Yeah, that was the third of what I was going to say.
So the first one is like one model rules them all.
The second one is this is the wrong path,
and there is no kind of system you could build with these models.
You know, you've got to do something totally different, right?
I would say like Yann LeCun would say that with JEPA or whatever.
Like you need to do something fundamentally different.
And then you're in this third spot, which is you can build some sort of system
with these models and you can get to,
I mean, AGI is such a loose word,
but you can actually get to what we're trying to achieve,
which is pretty generalized intelligence
to tackle any sort of problems.
Is that a fair characterization?
I think so.
I mean, I think you could think of it as,
I think AGI is fairly irrelevant.
Like, it's not the thing I'm interested in.
I'm interested, I joke sometimes,
I'm interested in API or artificial programmable intelligence.
And the reason I say this is,
why are we building AI?
Why are we building seeking to build AGI?
I think we're not,
And you can take a step back and ask, you know, well, maybe it's a scientific question or maybe it's just like a dream people have.
But I think fundamentally, it's in my opinion a way of improving and expanding the set of software systems we can build or just systems we can build in the world.
And if you think about why people build systems, software systems as an example, but really any engineering endeavor, it's not so much so that it's not really about that we lack general intelligences.
There are a billion general intelligences out there.
They're 8 billion people.
We build the systems because we want them to be processes that are reliable, that are
interpretable, that are, you know, easy, you know, that we can iterate on, that are modular,
that we can study, that we can, right?
So there is all these properties that are scalable and efficient.
There is a reason we care about systems.
And that is not like, you know, it's not that we lack intelligence.
So the question that I think is most important is, how do we build programmable intelligences?
And I think the alignment folks get some of this right.
You could have a very powerful model that doesn't listen to what you say.
And a lot of pre-trained models could be perceived that way.
You know, they have a lot of latent capabilities, presumably.
And the question is, you know, could you make it do what you want?
But I think what alignment fails to do, at least as a general sort of way of thinking,
is it sort of omits to think about, well, what is actually the shape of the intent
that people want to communicate to these models?
How can I get people to actually express what it is that they want to happen?
And with that bottleneck being, you know, as narrow and tight as it is,
it's not a question of are the models capable enough or not.
So that's what I'm saying.
I might be even less bearish than Carpathy about whether the models will get so good
such that given all the right context and the right instructions and the right tools and the right,
they become.
I see.
Yeah.
Yeah, like maybe.
I think this is very aligned.
Again, we don't, you know, not to refer to another discussion that's not on here,
but just in general, I for one take issue with the definition
of AGI as being like the same thing as an animal or a human,
which is that's not actually particularly interesting,
given a bunch of animals and some humans,
but like we actually want smarter software systems.
And then you think a comp,
like a systems based approach to models is the right way to get there.
Is that fair enough?
So like it's not going to be one model.
It's going to be a system. So
then can you maybe roughly sketch out what you think is the right way
to build a system to do this?
What are the components that are meaningful?
So I would say like the first inspiring sort of concept here or like the starting point for this
conversation is, look, to be honest, I have no idea what the capabilities of the models,
the core capabilities of the models will be today, tomorrow, in a year, in 10 years.
I just don't really know.
And I, you know, I'm invested in getting a sense of how that will happen and progress.
And I think like it's kind of, it's easy to sort of like model different paths based on how
you think the progress that's been happening has been happening.
But in any case, like, there is kind of a bitter lesson to keep in mind, and I don't mean necessarily Rich Sutton's own interpretation of his great essay.
I just mean like it is true that there is this kind of observed phenomenon where, you know, if you over engineer intelligence in AI, you regret it because somebody figures out a more general and maybe potentially simpler method that scales better.
And a lot of the hard-coded decisions you made are things you end up regretting.
So I think it's fair to assume that like models will get better and algorithms will get better and a lot of that stuff will improve.
And then the question we really ask is, well, you know, intelligence is great, but what problems are you actually trying to solve?
Like what is the, you know, what is the application that you want to improve?
Are you trying to, I don't know, approach helping doctors do medicine?
Are you trying to improve doing certain types of research, you know, cure cancer maybe? Are you trying to build the next
Codex or Cursor or, you know, one of these types of coding applications? Are you building, you know,
so the question is like, what are you actually trying to solve? And I would argue that intelligence
is this really amazing, powerful concept, but precisely because it's a foundation for a lot of
applications. And sort of the analogy I like to draw here is improvements in chip manufacturing
and increasing numbers of transistors in, in sort of CPUs. Nobody thinks that more general purpose
and more powerful general purpose computers
make software obsolete or
make us forget about systems.
The thing you think about is they make software possible,
but you kind of need to have a stack.
So back to your question,
what should the stack look like?
I think the first thing we need to agree on is like,
what is the language, so to speak?
What is the medium of encoding of intent
and of structure with which we can specify
our systems to that computational substrate?
So could we approach exactly this question?
I just have a line of inquiry on exactly this question,
which is like,
What is the right language to specify?
So I'd love for you to tell me why this is the wrong approach.
So let's assume I'm an advocate for the God model, right?
So models just keep getting better.
There's one model that keeps getting better.
Let's say that my task is software.
And I want to build, do you know the game, Core Wars?
What is it called?
Core War.
Okay, this is a very old hacker game from like the 1970s where like you would write software
and that would try to kill each other.
So let's say here,
so I want to build an online multiplayer version of Core War.
So what is wrong with the following approach?
I have a prompt that says,
I want to build a multiplayer version of Core War that's online,
and that's my prompt,
and then I just sit and I just wait for models to get better.
Why is that not the right approach to this?
Wait, so actually, so something about what you said is great.
What you said is, you just said it,
you express the thing you want,
and you were so lucky that the thing you wanted,
was easy enough to express.
Like, you know, you're assuming that the speaker, you know,
in this abstract hypothetical scenario is being honest,
what they want is to build that particular software you mentioned.
They fully specified it in a single sentence,
and they're not doing anything else.
They're waiting for the models to get better.
The only issue I have with this, by the way, is that,
well, I don't know how long you're going to wait,
but if you're comfortable with it, that actually, that's endorsed by me.
The problem is, as you're probably actually trying to hint at maybe,
is, well, most things people want,
especially most things that don't exist yet,
There is no five-word statement that even the best intelligence in the world is going to do for you.
And there is really a nontrivial amount of an alignment-ish element.
It's such a loaded, it's such an important statement for this discussion, which is like there is no, say, simple way to describe what you want.
There's multiple ways to interpret why that is.
One of them is, well, yeah, one of them is, I don't know what I want.
Another is my wants are complex, so I want to use a lot of words.
And another one is there's actually fundamental tradeoffs.
So like my wants would be ambiguous.
Are you talking about all three of those or?
Yeah, I'm talking about all three.
If when it comes to like actually getting people to express what they want from a system,
I mean, the kind of the premise I start from is people want systems.
Nobody wants intelligence, period.
I think this is a great, this is such a great point.
I actually really like how you said that.
I hadn't thought about it that way.
I don't want better GPUs, right?
I want maybe a neural network.
I want something else, right?
And that's something else is always specific.
Or at least more specific.
And so the question is,
what is the number of things people can want?
So if the vision of AGI,
and by the way, this, like,
the reason I said, like, I'm pretty,
I'm not necessarily super pessimistic in practice
is the frontier AI labs have been,
like, they kind of tackle them one at a time,
but they've had enough of a track record for me
to reasonably expect that, like,
when they reach a bottleneck,
they kind of go to the next thing
and then unblock themselves.
Sure.
Right.
So that's great.
But at some point,
like,
there's a view of AGI,
which is GPT-3,
the original GPT-3,
but scaled one million times
or one billion times
and you get, you know,
a GPT-10.
And there's that GPT-10 that you go to
and in order to build,
you know,
a complex system,
sorry,
in order to not build a system,
right,
in order to just treat it
as the end user-facing system.
Every time you go,
you juggle your context
and you juggle your prompting,
which might, you know, maybe because the model is so good, it might not be, the prompting part
might not be that hard. And you like, you ask it from scratch every time or some ridiculous thing
like that. And I think in the grand scheme of things, people are slowly realizing, obviously that's
not what you want. And so this is the argument for systems is that it's just all of this
decision making that happens in making a concrete application or product, something that
encodes taste and knowledge about the world, and also knowledge about human preferences, or
some substrate of the complete story that you want. And it kind of systematizes it,
encodes it, makes it, makes it sort of maintainable and portable and modular.
Because all this stuff that we like to have in building systems,
and the moment you start thinking that way,
you don't want that to be like a blur, like a string blurb.
So let's, I mean, I don't want to get too philosophical,
but for me, this always begs this very interesting question.
So let's just take what you're saying at face value,
which is I have a lot of complex wants, and those shift over time.
And so, like, a string will never encapsulate it.
And so, you know, I want to say a whole bunch of stuff and, like, maybe pull some context in.
But it could be the case that these models are so powerful that I just start to abdicate want.
Do you ever think about that?
I'll just want less.
I'll be like, I want whatever the model gives me.
Do you think that there's any direction in the future where, like, we just are less picky about our actual wants?
And we do converge to, like, these high-level things?
Or are you really convicted that?
No, this is totally possible.
I mean, recommendation algorithms versus like search algorithms.
Recommendation algorithms are like, give me what I want.
Yeah.
So literally like my universal prompt that I could just, like, I can just go to the beach.
And every time there's like a new model, I just go use it on that new model,
rather than building a complex system would be give me what I want right now.
Right.
And, you know, over time that model can train you like a recommendation feed, right?
Like you just open the For You tab.
Exactly.
That's right.
You want what it gives you.
But I mean, hope it doesn't, you know.
I mean, but that requires such a fundamental.
That's a choice we can make.
And a different choice we can make is, well, actually, no, we do care about, you know,
building systems and encoding knowledge into them.
One thing that's been growing on me for a while is kind of, to make this slightly less
philosophical, although maybe not much, you know, the idea in machine learning that, you know,
like, it's kind of a fundamental and old and known idea, but that there is no free lunch.
And there's like a lot of interpretations of the same theorem.
Like, the theorem is true.
It's a mathematical statement, and it basically just says, if you assume nothing about the world,
all learning algorithms you can build and all learners you build are equally bad, pretty much.
And once you sort of understand the mathematical version of that, it's kind of, it's almost a,
it's a really simple statement.
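One common way to state that result, loosely following Wolpert's no-free-lunch theorems for supervised learning (the formalization below is a loose gloss, not his exact wording): for any fixed training set $d$ and any two learning algorithms $A_1$ and $A_2$, averaging uniformly over all possible target functions $f$ gives

$\sum_{f} P(\text{error} \mid f, d, A_1) \;=\; \sum_{f} P(\text{error} \mid f, d, A_2),$

so if you refuse to assume anything about which targets are likely, no learner beats any other on average.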
And what that really means, and I think like it's a, it's something that comes time and,
you know, time and time again, is that something fundamental about intelligence, as we call it,
is actually about knowing our world and knowing what, because we're humans, what humans are like
and what humans are interested in.
And you know, like, you can't kind of scale your way into that.
Now, if humans themselves change their preferences to be simpler,
yeah, that's the future that's possible.
I actually agree.
I think, like, there are, like, sometimes we want real solutions to problems.
There are fundamental tradeoffs.
We have to articulate those tradeoffs, right?
I mean, the only, like, there is no simplified version of the answer,
given what you want to accomplish.
So let's assume we're in that world.
So I can't go to the beach with my one problem.
Instead, I have to like describe.
So you've done work on DSPY, which I think is, in my opinion,
is just the most systematic approach to making the prompt more powerful.
So maybe you can describe DSPY and how it works and how it addresses this problem.
Yeah.
Very specifically the problem is like, we've decided that like my one prompt and just waiting for the model to get better
is not going to be sufficient for whatever reason.
So now I need a better way to think about prompting.
Yeah.
So actually, back to your example, suppose that what you wanted to build was a bit more complex.
So there was more specific,
more specification involved.
But suppose also that you were in more of a rush
because, again, applications don't, like,
they don't want to wait. Nobody does. I'm building a system.
I want to use the best, you know,
intelligence, so to speak, that exists now.
But I do need to progress. I do need to proceed.
And so the question is, what are you going to do?
And when I started, you know,
one of the hardest things that makes communicating
DSPy stuff difficult is
we've been doing this, some version
or another of this for something like six years.
And DSPy itself is three years old.
Like, a lot of this is codified
before a lot of the changes in the field.
So it kind of makes some of the conversations slightly trickier.
But what people did for the longest time.
So that's what they did in 2022,
when people were tinkering with early models.
That's what they did in 2023.
Only in 2024 did there start to be some slight change to this.
But fundamentally, to this day,
the biggest hurdle in using a model is prompt engineering,
which is basically at least my understanding of it
and really I think the most canonical understanding of it
is changing the way in which you express what you want,
such that it evokes the model's capabilities in the right ways.
And so this is less about, I would argue,
things that are much more timeless and important.
And it's really about the belief that there is like a slightly different wording of what you
ask that could get the model to behave a lot better.
And the problem is that this is actually true.
This is true for the latest.
This is why, you know, OpenAI, you know, and Anthropic and others,
they release like even for the latest models, they release prompting guides and why not.
And they say, well, you're not holding it right, right?
You're not.
And they're correct.
But for the most part, the argument that early DSPy was making was...
How do you pronounce it?
DS-pie?
Yeah, DSPy, like NumPy or...
Oh, I love that.
Oh, I always thought it was D-S-P-Y.
The argument that we were making was the models keep getting better, but in any case,
they keep changing.
And the thing you want to build changes a lot more slowly.
I'm not saying it doesn't change, but it is actually a conceptual separation between what
you were trying to build, and LLMs and, you know, VLMs, vision
language models, like that space is basically separate. And so what if we could try to capture
your intent in some kind of purer form, you know, and that intent has to go through language.
That's why you're trying to do AI, is that there's some inherent under specification and fuzziness.
You're trying to defer some decision making. You know, I don't know how this function should
exactly behave in every edge case, but please be reasonable is what you're trying to communicate,
right, with these types of programs you're building. So DSPy says basically there is a number of
ideas that you need and you need them together, which is the thing that I think is a little trickier
to a lot of people, you need these five things. There's five bets that DSPy makes, and you need them
together and they need to be seamlessly composable. And actually, in order to get all five, you don't need five
concepts. You basically fundamentally need one concept. So the idea is we have Python or we have
programming languages. These programming languages encode a lot of things that are highly not trivial.
First of all, they have control flow. And control flow means that I can get modular pieces really
easily because they can define separate functions. These separate functions and modules, the nice
thing about them is that they really give you a bunch of stuff. So they create like a notion of
separation of concerns where contracts of different functions can be described without you knowing
everything inside the function or caring about everything inside the function. If you trust that it
was built properly, you could just invoke it and it does its job and then you can reason about
how you can compose these things. But you can also compose algorithms over functions. Like I can
have a, you know, a more general sort of, you know, processor or function or something that
takes these functions and applies things on top of them that are sort of higher level of concerns.
I can refer to variables and objects and mutate them or, you know, pass them around.
When I say, if this, then that, and I really mean it, I don't have to go to the model to
reassure it that if it doesn't do this or if it, you know, if it does listen to that if statement,
because I really mean it, I will tip it a thousand dollars.
You know, and one thing here is this is a really limiting paradigm.
Conventional programming is a really limiting paradigm.
Why would we want to go back to it?
And I think the answer is like all of the things I mentioned now,
like all these symbolic benefits from a specification standpoint,
this is not about capabilities,
are really hard to encode in natural language.
You can reinvent them.
You can tell the model, you know, if you see this, then do that.
And the model might reasonably say, well, you know,
he didn't actually really mean it like 100% of the time.
I think the reasonable thing this time is an exception, right?
Well, you actually can't.
I mean, you actually can't do that with natural language.
that is, without implicitly recreating a formal language.
I mean, the most obvious version of this is ambiguity, right?
So the dog brought me the ball and I kicked it.
That's fundamentally ambiguous.
You don't know if I kicked the dog or if I kicked the ball.
And both are totally reasonable depending on the person, right?
And so at some level, English doesn't do the job.
Right.
So, but programming languages, I agree, right?
But programming languages are also really fundamentally limited in that
you have to over-specify what you want.
100%.
You kind of have to go be above and beyond what you actually want
because no ambiguity is allowed.
And that forces you to think through things,
you know,
maybe you don't even know how to do.
Like,
how do you write a function that generates good search queries
or that,
you know,
plays a game well or,
you know,
it's very difficult to do something.
Yeah,
I don't want to get too wonky here
because I know where you're going
and I just have to say this because it just helps frame this conversation.
I mean,
by the way,
what you said on X,
which we're getting to, really kind of changed my brain,
which is, so for imperative programming, that's absolutely the case, right?
Which is, you need to know everything that possibly happens, right?
Or if you don't know, you make it, you're making, like, the language is going to make a very fixed assumption.
Yeah, it's going to make some basic assumption, right?
So it's almost like if you're managing a state machine, you've got to know every state machine transition.
That's imperative language.
Declarative languages are quite different, right?
So in declarative languages, you actually specify what you want formally, right?
And then the system kind of figures out how to get to that end state.
Right?
But the problem is you have to be able to specify every aspect of that end state perfectly,
which again, like for some problems is very complex.
So that's also limited and you just have to know the end state.
Yeah.
So now, you know, you're working on DSPy.
And I would love, you know, you talk about how using LLMs with a bit more formalism
pushes it to like yet another level.
Right.
So the only new abstraction in DSPy, and it's incredibly simple.
It's just this notion of signatures.
It's just borrowed from the word for function signatures.
Our most fundamental idea is that, which is just so basic and simple, is that interactions
with language models in order to build AI software should decompose the role of ambiguity,
should isolate ambiguity into functions.
And what do you want to specify a function?
Like, how do you declaratively specify a function?
I think the first, like, the most fundamental thing is it takes a bunch of objects.
They better be typed and, you know, like, they better take, you know, they better have, like, interesting and meaningful names.
It does a transformation to them and you get back some structured object, you know, potentially carrying multiple pieces.
And when you do this, it's your job to, it's your job.
And this is not easy, but it is your job to describe exactly what you want without thinking particularly about the specific model or compositions you're thinking of.
And this is actually a lot harder than it sounds to most people.
So for example, you would not, you know, so there is a class of, there is a class of problems
for which some people actually write prompts that are almost signatures.
So these are cases where you only have one input.
Your output is just a response.
You're not trying to like, you know, like you basically take a chat bot, you know, because
the APIs are usually, or the models are, structured such that
this is a very natural use case.
And people like, they try to prompt minimally, right?
So they don't encode a lot of, you know, they don't say, I don't know, think step by step
or you're an agent that's supposed to do this or they just kind of,
just say what they want.
So there's a class of people
that almost implicitly write signatures.
But there's something wrong
with the fundamental shape
of the API that usually exists.
And so signatures are just saying
here is a better shape
and we made every decision here
slightly more carefully.
Now, once you have signatures,
every other part of DSPy
from an abstraction standpoint
falls off of it.
There's really nothing else.
Once you have signatures,
you could ask,
I have a function declaration.
It's just declaring a function.
It doesn't do anything.
One of the hardest things about
people wrapping their head
about DSPI and signatures
is a signature
does
absolutely nothing. And it's entirely their job to build it. We actually can't help them at all build
the signatures. A lot of the time, people are like, well, couldn't you generate the signature from this
or that? The signature is encoding your intent. I know nothing about your intent up front. That's the
whole point. Wait, what are these? To be very clear, what are the signatures written in?
I mean, fundamentally, it could be a drag-and-drop thing. It could be a, but usually, like, it could
be whatever. But the point is, it is a Python class, usually. It's Python. It's formal. It's formal.
It's formal. It's not English, right? It's, well, it's a formal structure in which
almost every piece is a fuzzy English-based description.
So you could say something like,
I want a signature that takes a list of documents,
and the list of documents is the typed object.
You could actually say list of document,
and you have to define what the type document means.
And the fact that this type is document,
maybe the name matters.
Like, a list of documents is not necessarily the same
as a list of images, right?
There are different things,
and they're like semantically and fuzzily different.
And basically, like, it says in English,
given these inputs,
you have several of them.
I want to get these outputs and you have several of them
and maybe the order matters.
So it's really just like, I argue, it's what a prompt is supposed to be
or what a prompt wants to be when it grows up.
It really is just a cleaner prompt.
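To make that concrete, here is a rough sketch of what such a signature could look like in DSPy; the class name, field names, and docstring are made up for illustration, and the exact field syntax differs a bit across DSPy versions:

import dspy

class AnswerFromDocs(dspy.Signature):
    """Answer the question using only the supplied documents."""

    documents: list[str] = dspy.InputField(desc="source documents to ground the answer in")
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc="a concise, grounded answer")

# The signature alone does nothing; a module such as dspy.Predict makes it runnable.
qa = dspy.Predict(AnswerFromDocs)
result = qa(documents=["Core War is a programming game where programs battle in memory."],
            question="What is Core War?")
print(result.answer)

As he said a moment ago, the declaration itself does nothing until a module implements it.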
Now, if you grant me that, which I argue, like a really small,
it's a very simple contribution.
There is really not a lot of richness to this, but that's the point.
You get everything else that makes programming great
while being able to build really powerful AI systems
because you can now isolate your ambiguity at the right joints.
You have a notion of where you want the joints to be.
And the rest of your systems, the rest of your programs can be very modular.
You can have multiple signatures.
So now you get what people call multi-agent systems.
Multi-agent systems are just AI programs in which you have multiple functions.
You know, it's not really that.
It's really nothing.
It's not really a complicated idea once you take this.
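As a sketch of that point (everything here is illustrative: the two signatures, the SimpleRAG name, and the search_fn helper are hypothetical, not part of DSPy itself), a "multi-agent" program really is just ordinary Python composing a couple of signatures:

import dspy

class SearchQuery(dspy.Signature):
    """Write a web search query that would help answer the question."""

    question: str = dspy.InputField()
    query: str = dspy.OutputField()

class GroundedAnswer(dspy.Signature):
    """Answer the question using the retrieved passages."""

    question: str = dspy.InputField()
    passages: list[str] = dspy.InputField()
    answer: str = dspy.OutputField()

class SimpleRAG(dspy.Module):
    def __init__(self, search_fn):
        super().__init__()
        self.gen_query = dspy.ChainOfThought(SearchQuery)  # one "agent": writes the query
        self.answer = dspy.ChainOfThought(GroundedAnswer)  # another "agent": answers from passages
        self.search_fn = search_fn                         # plain Python function: str -> list[str]

    def forward(self, question):
        query = self.gen_query(question=question).query
        passages = self.search_fn(query)
        return self.answer(question=question, passages=passages)

The control flow is plain imperative Python; the ambiguity lives only inside the two signatures.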
You get things like inference time strategies.
People are like chain of thought, you know, you have to write your prompt in this way,
or we have to train the model in a certain way, or react agents or, you know,
program of thought.
We recently released this thing called recursive language models.
You know, the thing is when you're solving a task,
none of these inference strategies should be of your concern.
It's, if and when you want it to, you know,
like this is just a thing that should be compositional.
And signatures have the shape such that, like,
we can use programmatic sorts of constructs
to compose over these types of, you know, constructs.
Do you, when you think of, you know, DSPy,
when you were originally creating it and now as you think about it,
do you think about it is something that will fundamentally
only be consumed by humans or for humans?
Not at all, no.
I can imagine, I can imagine cases.
where you bridge the gap.
And the reason I ask is there's this obvious question is if the interface
to LLMs is going to be all automated anyways, do we need to enforce these restrictions
that are primarily to keep natural language speakers within certain boundaries as opposed
to, you know, like whatever, and it's an agent calling it, we may not need to do that.
So I think it's just, I think the argument in DSPy is intent should be expressed in its most
natural form. So that's the declarative part. And the second part is, unfortunately, or fortunately,
in the general case, that cannot be reduced below three forms. Some things are really best
expressed as code. And no amount of automation can remove that. There's no amount of automation
that can remove the fact that I actually want to think about three pieces because they're
separate to me and I want to maintain them separately. No amount of automation is going to remove
the natural language piece. Nobody wants to write Python to describe a really complicated AI
system like from scratch.
And no amount of automation is going to remove the fact that for some classes of
problems, you really need a more RL-like standpoint where you have a distribution of initial
states or inputs and you have a way of judging them or like metadata about what correctness
looks like because that really captures the wonky and sort of like exceptional long-tail
set of problems that actually vary by implementation or by model.
Yeah, you may also just want diversity.
Give me something that may solve this problem.
right? Like, it may just be that, like, there is no formal specification. Yeah, totally.
Right, right. So there is a machine learning piece, like, people associate DSPy a lot with the one that is most
different to what they usually see, which is optimization. So a lot of new users and a lot of people that
look at the paradigm and try to critique it conceptually, they miss the fact that you have to have
these three pieces or, like, in the general case, you can't get away without any. Now, by the way,
there's a lot of applications where you do not need all three. If you're building yet another
RAG app and the model has been post-trained to death to take a context and answer a question
about it, you don't really need a lot of, you know, a lot of that to express your intent, because
it's just close to what the model is good at. Anyway, a lot of people associate DSPy with the third one,
which is the data-based optimization. And actually a lot of well-intending users would write overly
simplistic and general programs and try to distill their intent through data or through
kind of this process of trial and error. And that's a really like bad sort of, it's like a misuse of
the power of the models and the power of the paradigm, because if you know what you want,
nothing can express it better than just you saying what you want. The data-based optimization
is there to smooth the rough edges. It's for you not to have to maintain laundry lists of
exceptions. I'll wrap this up quickly, Martin. The other part of DSPy is the reason we built
all of these abstractions, and we haven't been changing them. This has been the case for these
abstractions are basically three years old. They've basically not been changing. And what we
spend all of our, a lot of our research time on is building algorithms. And the thing
The thing about those algorithms is I'm not wedded to any of them.
I rarely go out,
I mean, we usually get excited about one for a month or something,
but I rarely go out and get particularly excited about getting anyone to pick one of them over the other.
We recently released an amazing genetic optimizer for prompts called GEPA.
Before that, we had another one called SIMBA that was just a reflective method.
And we had MIPRO before that.
We have a lot of these algorithms, and they're really clever and cool.
But the thing that I'm interested in is we build these algorithms to expire.
As models get better, we can actually come up with better algorithms that, you know, fit the job of
turning the abstractions into higher quality systems.
And what we want to happen over time is that our algorithms expire.
We build better ones, but the abstractions that we promised and the systems that people
expressed in those abstractions remain as unchanged as possible.
So that's kind of like a, that's something that's kind of unusual to a lot of, a lot of sort of
folks in the space.
It may also help just to kind of pencil out where
this sits in the software development lifecycle, right?
There's two places you could put it.
You could just be like, I am writing my software,
I want to know what's the best prompt to use,
you know, and then you could use it there,
or you could be like, actually,
the best prompt is determined at runtime,
and so maybe you could invoke this, you know, actually.
So do you have, is there a standard use?
Do you do this like basically before the software's deployed,
or are there actual runtime uses where you're, you know,
trying to find the right?
So the two sort of like concepts that exist in DSPy for this, and I don't know how technical we want this to be, but like we have the notion of modules.
This is borrowed straight directly from like neural network layers or PyTorch modules, which is just saying once I have the shape of the input and the shape of the output, which is a signature, I can actually build a learnable piece that has some inherent structure, like what a machine learning person would call an inductive bias.
and I wanted to take that shape and implement it for me,
but carry some parameters internally about what it could learn.
So that's a module.
And a module is entirely an inference time object
in the sense that it modifies behavior when it's being invoked.
So things like agents and different types of agents
and code-based or tool-based agents or chain of thought reasoning,
all of these are inference time strategies that are modules.
And the aspect in DSPy here is that these must be decoupled from your signature.
Your signature should know nothing about the
inference time techniques that you're using.
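A tiny sketch of that decoupling, reusing the hypothetical AnswerFromDocs signature from the earlier sketch (the module names are real DSPy modules, though arguments and availability vary by version):

import dspy

# The same declared intent under different inference-time strategies.
plain = dspy.Predict(AnswerFromDocs)            # direct prediction
reasoned = dspy.ChainOfThought(AnswerFromDocs)  # adds an intermediate reasoning step
# agentic = dspy.ReAct(AnswerFromDocs, tools=[...])  # a tool-using loop, given some tools

The signature never mentions chain of thought or ReAct; swapping the strategy leaves the specification untouched.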
The other aspect of DSPy is optimizers, which are, again, they're just functions, like modules
are just functions, but they're functions that take your whole program, like an actual complete
piece of software that has potentially many pieces.
And they think holistically, how do I use language models in order to get this thing to
perform its intended goal, which might be maximizing a score on a test set, but in principle,
it could just be like, do what the model understands from the instructions it should be doing.
And this could be, people do this at inference time sometimes in the sense of like it happens while the user waits, so to speak, user of a system.
But it's fundamentally different contract because it sees the whole, there's extra information.
I see the whole system when I'm an optimizer.
I don't just see like an isolated module.
I can see sort of all of the pieces.
I can see a data distribution.
I can see the notion of reward.
And so, like, I have a much richer space, because there's strictly more information that no inference, you know,
inference-time technique, no LLM, sort of is able to capture, just from an information flow standpoint.
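A minimal sketch of that contract, assuming the hypothetical SimpleRAG program from earlier, a metric we define ourselves, and an optimizer along the lines of dspy.MIPROv2 (optimizer names and keyword arguments differ across DSPy releases, including the GEPA, SIMBA, and MIPRO families mentioned earlier):

import dspy

# The optimizer sees the whole program, a data distribution, and a notion of reward (the metric).
def grounded_match(example, prediction, trace=None):
    return example.answer.lower() in prediction.answer.lower()

trainset = [
    dspy.Example(question="What is Core War?", answer="a programming game").with_inputs("question"),
    # ... more examples ...
]

program = SimpleRAG(search_fn=my_search)                       # my_search is a placeholder retriever
optimizer = dspy.MIPROv2(metric=grounded_match, auto="light")  # or GEPA, SIMBA, BootstrapFewShot, ...
optimized_program = optimizer.compile(program, trainset=trainset)

The inference-time modules stay the same; only the instructions and demonstrations inside them get tuned against the metric.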
You know, it's interesting because, I mean, a lot of people think of DSPy as basically a prompt
amplifier, which is, you know, here's my, whatever, my prompt template, tell me what prompt would be the best.
But to hear you describe it, it's almost, you know, it's like this kind of, you know,
declarative, you know, runtime-y type thing.
Do you know what the standard use is?
I don't even know if you have visibility into this stuff because it's an open source project,
but do you know what the standard
use is? Is it the naive use case, which is largely prompt optimization?
Or are people actually using some more sophisticated ways?
I think one reason where, okay, so I am very, I'm very loud about the abstraction.
I talk about them all the time. I give talks.
You scolded me on X about this.
So, I was, yeah, exactly.
No, it was fantastic. It was great.
I know. You really corrected me.
Listen, I was one of the few people that really thought about it as the prompt optimizer.
I really thought, listen, I'm going to write my prompt.
I'm going to do some template magic.
I'm going to give it to DSPI.
And then it's going to give me like, what's the best thing for what I want to accomplish?
And then I'll just go to stick that in my program.
That's the way that I thought about it until you made the point that it's actually more of a set of abstractions that will evolve with your program.
So I tried to learn from what happened historically in computer science.
Like you had these machines and, you know, you got general purpose chips and people were programming those directly in whatever language they spoke, right, machine code.
And maybe you could abstract it slightly with assembly.
But then there was this amazing time
where a lot of languages culminating
maybe most popularly in C, but various others before it,
got this idea that there's actually a general purpose programming model.
You could build a model of a computer
without thinking about any specific computer.
And actually, that's a bit of an illusion
because every specific computer is a lot more complicated.
But you could create this illusion
that is much more portable
and it's much closer to how humans think.
And I know it's funny to call C
as close to how humans think,
but it really was a fundamental jump.
Once you have C, it's important to ask,
why do people use C instead of writing assembly?
And it would be really weird to me if anyone said they use C
because it's faster than assembly, like the code runs faster.
So to me, when someone says they use DSPy
because the quality of the system is higher,
which by the way is very often the case,
that's not really the answer I want. Because you're jumping, in my opinion,
to a higher level abstraction such that actually
I would be willing to give up some speed
in order to have the portability,
the maintainability, the closeness to how I
think about the system and how I want to manage its pieces.
There is a tradeoff I'm willing to accept.
Now, the reason people actually have universalized C
and they don't regret it is this amazing compiler ecosystem
where people build all of these optimization algorithms
and passes and sorts of infrastructure.
You know, you inline functions.
So you break the modularity, right?
People are writing modular code,
but you're actually breaking a lot of that modularity
when it's being turned into an executable artifact.
You eliminate dead code.
You have all these heuristics.
different heuristics for different machines sometimes.
And so my vision here is if AI software is a thing,
and AI engineering is a thing that needs to exist
irrespective of model capability because we want to have
these diverse space of systems,
what is the abstraction to capture it?
And if natural language is too ambiguous
to be the only, like the complete specification of these systems,
and it's too mushy and we kind of want to have more structure,
well, what would that structure look like?
And if we know what that structure looks like,
well, if we do it naively,
you would actually lose a lot of quality.
If we build DSPy poorly,
you might have a really elegant program
that sucks, right?
When you run it with an LLM, it sucks.
So the reason I build optimizers
or like we build optimizers as a team
is not so much that I think people can't write prompts
and I want to write better prompts for them.
What a boring reason.
I don't care about that.
People can write prompts.
People can iterate on prompts.
That's not an issue.
The thing I'm trying to say is
I want them to express their intent
that is less model specific
and not worry that they're losing
or leaving a lot
on the table.
Yeah.
Yeah.
Honestly,
this is where,
like,
you changed the way
that I think about
this whole thing.
And so I'm going to try something,
and I alluded previously in our talk,
but I want to try it again on this,
because this is kind of how it changed my thinking.
You can tell me where I'm right or I'm wrong,
which is,
so you said assembly and C,
but we've actually had a lot of paradigm shifts since then.
So let's,
so see,
let's just say this is kind of like an imperative language where,
like for every event that happens,
you have to know how to handle it.
Right.
Right.
So traditionally in distributed systems,
like imperative,
imperatives have not been a good approach because you could have some event, you know, show up at the node.
And then you don't know what state the node is in.
And so you, I mean, it's just there's so, the state space is so huge.
And so you had actually a big abstraction shift with declarative languages or declarative languages to be like, okay, listen, we're going to tell you what the end state of the system is.
And then the system will figure out all of the state transitions to get there, right?
this was a higher level of abstraction for people not to have to worry about everything that
kind of comes in and every event. And you can actually declare, this is like data log or something.
You're like, here are all of the conditions that exist. And then I just want to make sure that the
system is always in that state. And then the thing you kind of give up in that case is like,
you can't bound the amount of computation needed. Like you don't know how long it's going to
take to get there, but it'll always get you to like that state. So it's easier for a programmer.
So like you can actually now build programs easier for certain classes of
programs. Now when I look at DSPy, I feel like it's the same type of leap between like
imperative and declarative, but for LLMs, where there's certain, declarative, like you can't
write a declarative program that's going to solve the same problems that an LLM can because
there's no fuzzy this and that and you can't really integrate them, right? And so like,
you want the same type of shift, because you've got a new problem domain that
you have. And so DSPy kind of gives you that with LLMs. So you can kind of formally specify it in a way
that's kind of natural but also safe,
and then it decouples you from the actual implementation below it.
So is that a fair way to think about it,
or is that like just a Martin-ism?
No, I think that's a fair way to think about it.
And I think one funny thing is,
I'm, and I think you would probably agree with this,
I don't know that declarative is better or imperative is better per se.
It's more that.
It depends on the problem, like,
declarative is better for ones where, like,
you've got a very complex system with a lot of asynchronous events
because you don't need to maintain a state machine.
Yeah.
You know, all of these things have tradeoffs.
All of them do, right?
You wouldn't use an LLM to, like, add two numbers.
Right.
And so I think, like, a really good shape for this is you want an imperative shell.
DSPy, actually, compared to, there's a lot of sort of folks that create graph abstractions or whatever.
Like, those things are very declarative.
I'm willing to go on the record saying graph abstractions generally are a bad idea, in my opinion, for basically anything in computer science.
But go ahead.
Right.
And exactly.
And I think it's because humans, when they think top down, we actually think imperatively.
And so, like, DSPy is just Python,
which is a, you know, I mean, it's a complicated language,
but it's fairly imperative in that you're just like,
you do this, you do this, do this.
But at the leaves, where you're going to,
where you were going to potentially have a fuzzy task,
what were you going to do? I think you were going to write the prompt.
I think the issue with prompts is actually,
fundamentally, they're actually so declarative.
They're too declarative that you're forced to break the contract of declarativeness
because you're like, well, if I just say what I want,
the model is never going to be able to fit in my bigger program.
You know, one reason, by the way, people forget this,
is if you work with a chatbot that is tuned for human responses,
you're doing most of the work that DSPy has to do in a program.
In a program, if I have a function and I want to give it inputs
and I want to get back outputs,
like those have to actually go into variables.
Those actually have to go like, there's, you know,
like the output has to, so to speak, parse in a certain way.
And I have to funnel things through this.
If you're a human who's just asking the model questions,
no matter what forming gives you,
you're smart enough to be able to like, you know,
bridge the fuzziness in the shape of whatever the model gives you.
It's almost like the imperative is I know every step to the solution,
so do every step, right?
And declarative is I actually don't know all of the steps,
but I know the solution.
So give me the solution.
And, like, DSPy is almost like, I kind of know how to frame the solution, do the rest of the work for me. It's this kind of fuzzier thing.
Right.
And I just, listen, there's tradeoffs to each, right?
I mean, like, with DSPy you have, whatever, the overhead of a SOTA model, which, whatever, took a billion dollars to train and is expensive at inference.
So these are just different points in the declarative space,
the performance space, the cost space, etc.
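To make the "imperative shell, declarative leaves" shape concrete, here is a minimal sketch using DSPy's public API. The task, field names, and retriever are made up for illustration; treat it as a sketch of the idea, not canonical usage from the episode.

```python
import dspy

# Assumed model name, purely for illustration; point this at whatever LM you use.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class GenerateAnswer(dspy.Signature):
    """Answer the question using the provided context."""
    context: str = dspy.InputField(desc="passages that may contain the answer")
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc="a short, factual answer")

class RAG(dspy.Module):
    def __init__(self, retriever):
        super().__init__()
        self.retriever = retriever                         # ordinary Python dependency (placeholder)
        self.answer = dspy.ChainOfThought(GenerateAnswer)  # the declarative "leaf"

    def forward(self, question):
        # Imperative shell: plain Python control flow around the fuzzy leaf.
        passages = self.retriever(question)
        return self.answer(context="\n".join(passages), question=question)
```

The signature only says what goes in and out, and in what rough shape; how that becomes an actual prompt, few-shot demos, or even finetuned weights is left to the optimizers discussed below.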
By the way, so I'm totally bought in.
I actually think this is such a nice way of looking at it. Even independent of DSPy, like just the core framing, this is how you should think about interacting with LLMs formally.
I really think that you've kind of nailed that abstraction.
So let's just take that as a given.
So what are the hard problems now,
or what are the next set of problems now to make
that more pragmatic, like the optimizations under the covers or like, whatever you need to do
to kind of...
Yeah. So everything we talked about today, I do almost nothing on this anymore, because this is work we did three years ago and I'm just out there telling people about it. We're not really changing these abstractions. What we actually do is ask the following set of questions. Someone wrote the program, and we assume they did a reasonable job describing what they want. Maybe that means they wrote the control flow, they have the signatures, and they have some data. These are the three pieces that you might want to have, or they have some, but not all, of these. How do we actually do a good job of optimizing this?
You know, it's actually a really interesting progression to see how we went from the very early optimizers in 2022 to the latest ones. The very early ones had to work with models that basically didn't work, right? They had essentially no instruction-following capability and were hit and miss on their tasks. So what we did looks like what the reinforcement learning people do on LLMs, which is you take the program and you do what we call bootstrapping examples, which is just another way of saying you just run the program, maybe with high temperature or something, a lot of times, you see which things actually work, and you keep traces of all of these over time. And then those traces, you know, which are generated by the model, can become few-shot examples. And if you just do that, sometimes it improves a lot, sometimes it becomes a lot worse. So you just do some kind of discrete search on top to find which ones actually improve on average. That was when models were really bad.
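A rough sketch of that bootstrap-and-search loop, using DSPy's built-in optimizer for it. The training set, metric, and the RAG program and retriever from the earlier sketch are placeholders, and the option values are illustrative.

```python
import dspy
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

# Placeholder training data: a handful of inputs with known good outputs.
trainset = [
    dspy.Example(question="...", answer="...").with_inputs("question"),
    # ... more examples
]

# Placeholder metric: "did this trace actually work?"
def exact_match(example, prediction, trace=None):
    return example.answer.lower() == prediction.answer.lower()

# Run the program many times, keep traces that pass the metric as few-shot
# demos, then do a discrete random search over candidate sets of demos.
optimizer = BootstrapFewShotWithRandomSearch(
    metric=exact_match,
    max_bootstrapped_demos=4,
    num_candidate_programs=8,
)
compiled_rag = optimizer.compile(RAG(retriever=my_retriever), trainset=trainset)
```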
As models have been getting better, we've moved basically all the way to reflective prompt optimization methods, where you actually go to the model and you're like: here is my program, here are all the pieces, here is what this language means, here are the initial instructions I came up with from just the declarative signatures, and by the way, here are some rollouts generated from this program and here is how well they perform. Let's debug this. Let's kind of iterate on the system. And obviously, there's a lot of scaffolding to make sure that search is actually a formal algorithm that is going to lead to improvement.
But increasingly, more and more of it is actually carried out by the models.
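DSPy ships optimizers in this reflective style, for example MIPROv2 and, more recently, GEPA, which use an LM to propose and refine instructions from the program's structure and its rollouts. A hedged sketch with MIPROv2, reusing the placeholder program and metric from above; exact options vary across DSPy versions.

```python
import dspy

# MIPROv2 asks an LM to propose candidate instructions from the signatures and
# bootstrapped rollouts, then searches over instruction/demo combinations.
optimizer = dspy.MIPROv2(metric=exact_match, auto="light")
reflective_rag = optimizer.compile(RAG(retriever=my_retriever), trainset=trainset)
```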
One thing we also do a lot of is we ask, all right, conventional policy-gradient reinforcement learning methods like GRPO: nothing about them cannot be applied to a DSPy program, because the DSPy program says nothing about how the optimization should happen.
So actually, for a very long time, from February of 2023, you could actually run offline RL. And since May of 2025, you can run online RL, or GRPO, on any DSPy program that you write.
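As a side note on what GRPO does, independent of DSPy: it scores a group of rollouts for the same input and weights each one by its group-relative advantage. A toy illustration of just that statistic; this is not DSPy's RL implementation.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each rollout's reward by the mean and
    standard deviation of its own group of rollouts for the same input."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four rollouts of the same program on one input, scored 0/1 by a metric.
# Rollouts that beat the group average get positive weight in the policy gradient.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # approx. [1.0, -1.0, -1.0, 1.0]
```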
People think that it's limited to prompt optimization, but I think the only notion about prompts that is fundamental in DSPy is that natural language is an irreducible part of your program, but that the prompt is human-facing. It's how you say what you want. How it gets turned into the actual artifact may well use reinforcement learning with gradients, or natural-language learning. So we spend a lot of time on optimization. We also spend a lot of time on inference techniques. Like, you just declared that you want your signature, which processes lists of books. Well, guess what? No model has a long enough context to work with lists of books.
So last week, my PhD student Alex and I released this idea called recursive language models, which sort of takes any model that is good enough and figures out a structure in which it can handle, you know, scale to essentially unbounded lengths of context. And we were able to push it to 10 million tokens and see essentially no degradation.
And the reason we build these types of algorithms is we really want to back your signature by whatever it takes to sort of bridge the gap between whatever the current capability limit of the model is and the intent you specified.
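The recursive-language-models work is its own paper; as a very loose, hypothetical sketch of the flavor, split context that doesn't fit and recurse over the pieces. This is illustrative only, not their actual method, and the threshold and string signature below are made up for the example.

```python
import dspy

# A single declarative leaf, written with DSPy's string-signature shorthand.
answer_over = dspy.Predict("context, question -> answer")

def recursive_answer(context: str, question: str, max_chars: int = 20_000) -> str:
    """Toy recursion: if the context fits, answer directly; otherwise split it,
    answer over each half, then answer over the combined partial answers."""
    if len(context) <= max_chars:
        return answer_over(context=context, question=question).answer
    mid = len(context) // 2
    left = recursive_answer(context[:mid], question, max_chars)
    right = recursive_answer(context[mid:], question, max_chars)
    return answer_over(context=left + "\n" + right, question=question).answer
```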
And the last thing we think a lot about is, well, we've made this argument conceptually, and sort of tried to demonstrate it empirically, that you need these three irreducible pieces, you know, signatures in natural language, structured control flow, and data, to fully specify your intent, at least ergonomically enough.
The question, though, is this is a very large space of programming, where you need to figure out, okay, I have a concrete problem, how do I map it into these pieces, knowing that maybe I need all of them?
And so we spend a lot of time, and this is why it's a big open-source project, because we want to see what people actually build, and learn from that what the software engineering, like, what the AI software engineering practices are that we should encourage and support.
So these are the types of questions we think about.
And I think one reason this has to have the structure of this, like, open-source project, it's just this large, fuzzy space, is I don't want to be the only group, or a small number of teams, working on any of these pieces. I think it's a space where the more academics and researchers and people work on optimizers, all programs benefit. The more people work on modules, all programs benefit. The more people build better models, especially programmable models, whatever that might mean in the future, sort of models that understand that they're going to get used in this structure, everyone benefits.
And it reminds me sort of of the way in which deep learning really took off, which was some people iterated on the architectures, some people iterated on the optimizers, you know, you got things like Adam and other methods. And I think that is what we're really trying to push the community towards.
All right.
So one last thing, just to kind of finish off, is getting a little bit more philosophical. But, you know, I think AI allows us to do this. And what you're addressing is, again, the ability to declare intent, you know, for these models in a way that hits the abstraction right. If you can guess prophetically: in the future, is the intent going to come from these models having independent agency, like agents, or is it going to be humans guiding them?
I asked this question a bit earlier, but I kind of want to ask it a bit more directly.
Do you think that the need for a human to declare things formally is going to go away, and over time, like, we treat these like grad students or whatever, and, you know, this all just becomes the inner workings of an agent? Or do you think that these things are formal software systems, this is a language like any other language, and, like, we will expect DSPy to be the interface? Something like it, something like that, will need to be an interface that's exposed to humans, you know, for the foreseeable future.
I think you need some amount of grounding in the world when you build these systems. Like, I just think people in AI talk a lot about, we can talk about AGI, but this kind of just ethereal intelligence that is just so smart. But the problem is, you know, the intelligence that we care about, as far as I can tell, is really about the things you might want to ask, or the way things are in the world. It's very world-oriented. It's not this very abstract thing.
So as models get smarter and smarter, I imagine that a lot of the problems people write programs for today could get a lot simpler, because that use case, it's kind of like RISC versus CISC architectures. In CPUs, you know, if you believe in sort of complex instruction sets, it's possible that you had to jump through all these hoops to do a fast square root before, but then somebody just gives you an instruction for that. Like, models can keep absorbing more use cases, with keywords or, like, in their language.
But the human, you know, philosophically, as you say, the human condition is that we will just want more complex things. And once you want these complex things, you know, in a repeatable way, you've got to build a system. And if you want to build a system, I don't really see that not having a structure. Like, I don't see that having the structure of LLM APIs today. I see it maybe as something nicer, faster, say, on top of that.
Maybe I could ask you a pointed question. You've got grad students, right?
Yeah.
Do you ever wish that you had a DSPy interface to them? Right? Like, in the limit, it's a very structured way to make asks, right? And then if not, wouldn't that argue that you wouldn't want that for an LLM in the limit either? So I ask it as kind of a gotcha question, but I actually mean it seriously. Is it that humans just aren't capable of doing this stuff, and so that's the reason that we don't have a formalism? Or is it just totally different?
I promise this is an actual answer to your question about grad students.
But here's the answer.
The answer is, the question sounds to me like: don't you have chairs at home, don't you wish they all looked like tables? I need both. I really want to have both. There's a software system. There's a grad student. They're totally different things.
And there's nothing saying that AI that operates as a chatbot, as an agent, as an employee-like agent, is a problem.
I think we need it.
I love that.
That's wonderful.
That's wonderful.
So sometimes you want to specify something to a machine that has an LLM in it, and sometimes you just want to talk to something. Those are two different solutions to that.
I love that.
This is a great answer.
It's a great way to end this.
Thank you so much for your time.
This has been fantastic.
Thank you, Martin.
Thanks for listening to the A16Z podcast.
If you enjoyed the episode, let us know by leaving a review at ratethispodcast.com slash a16z.
We've got more great conversations coming your way.
See you next time.
As a reminder, the content here is for informational purposes only.
It should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security,
and is not directed at any investors or potential investors in any A16Z fund.
Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast.
For more details, including a link to our investments, please see A16Z.com forward slash disclosures.
