The a16z Show - What Is an AI Agent?
Episode Date: May 22, 2025

What exactly is an AI agent, and does anyone actually agree? In this episode, taken from AI + a16z, General Partner Guido Appenzeller and partners Matt Bornstein and Yoko Li break down one of the most hyped and most hotly debated concepts in AI right now: agents. Are agents just clever wrappers around LLMs? Tools that can reason and act? Or simply a fresh label for familiar tech? Whether you're building, investing, or just trying to make sense of the buzz, this episode is for you.

Resources:
Find Guido on X: https://x.com/appenz
Find Yoko on X: https://x.com/stuffyokodraws
Find Matt on X: https://x.com/BornsteinMatt

Stay Updated:
Let us know what you think: https://ratethispodcast.com/a16z
Find a16z on Twitter: https://twitter.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Subscribe on your favorite podcast app: https://a16z.simplecast.com/
Follow our host: https://x.com/eriktorenberg

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

Hosted by Simplecast, an AdsWizz company.
See pcm.adswizz.com for information about our collection and use of personal data for advertising.
Transcript
Today, we're discussing one of the buzziest and most confusing terms in AI right now: agents.
Are they just fancy wrappers around LLMs, full-blown autonomous workers, or something in between?
a16z infra partners Guido Appenzeller, Matt Bornstein, and Yoko Li
break down the technical definitions, pricing models, use cases, and why the term agent
means so many different things to different people.
If you're building, buying, or just curious about what agents are and aren't, this episode is for you.
Let's get into it.
As a reminder, the content here is for informational purposes only.
It should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security,
and is not directed at any investors or potential investors in any a16z fund.
Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast.
For more details, including a link to our investments,
please see a16z.com/disclosures.
So I think one thing that's probably easy to say
is that there's a good amount of disagreement about what an agent is.
We've heard a lot of different definitions of it,
both on the technical side and,
I'd say, on the marketing and sales side in some cases,
because there are some sales models associated with it.
So let's start with the technical side.
I think there's sort of a continuum here, you know.
The simplest thing that I've heard called an agent is basically just a clever prompt
on top of some kind of knowledge base or some kind of context, with some kind of chat-type
interface.
So from a user's perspective, this looks like what a human agent would look like, right?
So, for example, I ask it, hey, I have a technical problem with my product, XYZ.
It looks at the knowledge base and comes back with a canned response.
Well, there doesn't have to be a knowledge base, right?
It doesn't even have to be a knowledge base.
I see, got it.
So maybe it's just a trained model.
All the knowledge is in the model weights;
it's even simpler.
So an agent could just be an LLM.
Right.
With a chat interface or something like that, by some definition, right?
I think on the other end of the spectrum, there are some people who basically say,
for something to be a real agent, it has to be something fairly close to AGI, right?
It needs to persist over long periods of time.
It needs to be able to learn.
It needs to have a knowledge base.
It needs to work independently on problems.
If you take the most expansive definition, is it fair to say that doesn't work yet?
I think so.
It doesn't work yet, although.
Will it ever work?
That's a philosophical question.
All right.
Fair.
Very fair.
Very fair.
So if we take that continuum in between,
is there at least a way to chop it up into a couple of categories, maybe degrees of agentic behavior?
And different types of agents.
There are some artsy agents that help artists come up with new Bézier curves.
There are coding agents, which we like to talk about as the agent.
Which we use, yeah.
Yeah, which we use.
There are agents that are just wrappers on top of LLMs.
That's right, yeah.
I may be the contrarian in this group.
All right.
Look, I kind of think "agent" is just a word for AI applications, right?
Anything that uses AI kind of can be an agent now.
Before we started this talk, I actually went online just to refresh myself about some of the more interesting AI agent perspectives out there.
I found a really cool talk from Karpathy that he gave a couple of years ago about agents,
which I can describe a little bit, but the really funny part was the YouTube recommended videos to watch next.
It's like "AI agents are going to revolutionize your lifestyle" and "the rise of superintelligent AI."
You know, it's just kind of like marketing.
And so I actually do think that's what's going on in a lot of ways.
The cleanest definition I've seen of an agent is just something that does complex planning
and something that interacts with outside systems.
The problem with that definition is all LLMs now do both of those things.
They have built-in planning in many cases,
and they at least consume information,
at least from the Internet,
maybe from some servers that expose information
through MCP or some other protocol.
So the line really is very blurry.
And, you know, what was so interesting
about the Karpathy talk is that he basically
compared it to autonomous vehicles and said,
AI agents are a real problem,
but it's like a 10-year problem.
It's like a decade problem that we need to work on.
And I think most of what we're seeing in the market now
is not the decade version of this problem.
It's like the weekend demo version of this problem.
And this is why we sort of generate so much confusion.
You have this kind of poorly defined, nebulous thing that LLMs are kind of absorbing into themselves over time.
And so I don't think anything we have is actually an agent; "agent" itself may be a poorly defined and overloaded term.
But if someone's willing to do the hard work and define exactly what it's like to kind of be a human but in digital form and spend 10 years to make it actually work, you know, that's sort of what I'm excited to see.
Okay.
So defining agents is a difficult job.
Maybe it's easier to talk about how people use the tools they call agents and what are the different degrees of agentic behavior.
I wonder if part of the conversation is redefining agents, because we all know that "agent" is not a great term.
It means so many things to so many people.
It's interesting to dissect: what do we mean?
What do different people mean when they say agents, and what are the different ways
we could utilize this thing we call agents?
So here's how it seems to me.
If we're trying to define agents, or maybe even degrees of agentic behavior, which might be a little
easier, there's something like a user interface aspect to it, right? Something that's a pure
copilot, where the user basically goes back and forth with an LLM to work on a particular task,
is often not called an agent. Is that fair? There's a bit of a copilot-versus-agent split in
UI models. Yeah, I guess, what are the elements we think go into agentic behavior?
Like Matt mentioned, planning could be one. There could be decisions
made by the agent.
There has to be an LLM somewhere,
but I'm curious about your take.
So I think another definition
we heard from Anthropic recently
was this idea that an agent
is an LLM running in a loop with tool use, right?
There are two important parts to that.
One is this notion that it's not just a single prompt
and not even just a single static sequence of prompts,
right?
But something where the LLM takes the output of a prompt,
feeds it back into itself,
and based on that decides what the next prompt should be,
and likely also when to abort
or when it has completed a task.
I think that for the real agents or the more agentic behaviors,
I think that's a reasonably good definition.
I think the other thing is...
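The "LLM running in a loop with tool use" definition fits in a few lines of Python. This is a minimal sketch under stated assumptions, not Anthropic's code or any vendor's API: `fake_llm`, `TOOLS`, `search_kb`, and `run_agent` are hypothetical names, and the model is a scripted stub so the example runs offline. It shows the two parts named above: output fed back into the next prompt, and the model (or the loop's step budget) deciding when to stop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    kind: str       # "tool" to call a tool, "final" to finish
    name: str = ""  # tool name when kind == "tool"
    arg: str = ""   # tool argument, or the final answer text


def fake_llm(history: List[str]) -> Action:
    """Hypothetical stand-in for a real model call, scripted so the sketch runs offline."""
    if not any(line.startswith("tool:") for line in history):
        return Action("tool", "search_kb", "reset password")
    return Action("final", arg="Follow the reset steps from the knowledge base.")


# Hypothetical tool registry; a real agent might expose tools via MCP or an HTTP API.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search_kb": lambda query: f"KB article for '{query}': Settings > Reset password.",
}


def run_agent(task: str, llm: Callable[[List[str]], Action] = fake_llm,
              max_steps: int = 5) -> str:
    history = [f"user: {task}"]
    for _ in range(max_steps):
        action = llm(history)                    # the model picks the next step
        if action.kind == "final":
            return action.arg                    # the model decided the task is complete
        result = TOOLS[action.name](action.arg)  # tool use
        history.append(f"tool: {result}")       # output fed back into the next prompt
    return "aborted: step budget exhausted"      # the loop itself enforces an abort
```

A single prompt with no loop would be one `llm(...)` call with no `history` growth; the loop and the stopping decision are what this definition treats as agentic.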
But just by that definition, isn't every chatbot effectively an agent in this world, right?
Like if I just go to chatgpt.com and use their latest reasoning model
with web search,
right?
Isn't it using tools and, like, feeding its outputs into a new
prompt in order to do a kind of chain of thought?
Chain of thought is a little bit in between.
If it's just a single prompt that comes back with a result,
then it wouldn't have this notion of planning
and working toward a longer-term goal
and deciding itself when it is complete, right?
If you have a chain of thought reasoning
where I'm giving a more complex task,
that's starting to look agentic, I agree.
I just think it's really tough
to define a system based on what someone says to it, right?
Because these are by design, unstructured inputs.
These systems will accept literally anything.
And so sure, if you tell it,
you know, what's today's weather.
I would agree that's not agentic, right?
That's just fetching, you know, from an API.
If you ask it, define a new philosophy of weather, right?
It'll happily go do it, right?
So it's like an agent if you ask it one thing,
but not an agent if you ask it another thing.
I think that's kind of a lot of the confusion in the market around this.
And, you know, if we spoke in the terms that you're talking about,
Guido, of like, hey, this is an LLM in a loop with a tool.
Like, that's actually a much more productive way to talk about it, I think.
Yeah, yeah.
I mean, it seems like we've seen, to some degree, a specialization
of user interfaces in two directions, right?
There's, let's say, Cursor or something like that,
which really emphasizes the tight loop between the user,
the tight feedback loop between the user and the LLM
and the thing I'm working on, right?
So I want immediate gratification when I do something,
you know, and sort of response time matters.
Then there's sort of more the backend as, you know,
source code management system type plugins,
where it's more about throwing something over the wall,
maybe after answering a couple of questions,
and then you try to maximize the amount of time
the agent can work independently.
So I think you're right that there's no clean, systems-level definition
that splits the two.
But there seems to be a little bit of user interface specialization.
Is that a fair statement?
It almost feels like for all the use cases we've described, there's one element that all agents have,
which is reasoning and decision-making.
Would you call a single call to an LLM that says "translate this text to JSON" an agent?
That's probably not an agent.
But if you ask the LLM, hey, decide
where this response should go and route it for me,
it feels more like an agent.
So it almost feels like planning.
I'm actually not sure whether the agent needs to plan
or needs to decide; maybe both.
I actually feel like it's multi-step LLM calls chained
with a decision tree.
A dynamic decision tree.
Yeah, I think that's fair.
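The contrast between "translate this text to JSON" and "decide where this goes and route it" can be sketched as follows. All names are hypothetical, and the two "LLM" functions are keyword stubs standing in for model calls so the example runs offline. The point is structural: in `to_json` the output never changes what runs next, while in `route` each model answer selects the next branch, and one branch contains a further decision.

```python
import json
from typing import Callable, Dict


def to_json(text: str) -> str:
    # A single, fixed transformation: the output never changes what happens next.
    return json.dumps({"message": text})


def fake_llm_classify(text: str) -> str:
    """Hypothetical stand-in for an LLM routing call; a keyword rule keeps it runnable."""
    lowered = text.lower()
    if "refund" in lowered:
        return "billing"
    if "error" in lowered or "crash" in lowered:
        return "tech_support"
    return "general"


def fake_llm_severity(text: str) -> str:
    """Second hypothetical LLM decision, only reached on the tech-support branch."""
    return "urgent" if "production" in text.lower() else "normal"


def route(text: str) -> str:
    # Step 1: the model's answer picks the branch.
    queue = fake_llm_classify(text)
    if queue == "tech_support":
        # Step 2: a further decision that exists only on this branch,
        # which is what makes the tree "dynamic".
        return f"{fake_llm_severity(text)} ticket for engineering: {text}"
    handlers: Dict[str, Callable[[str], str]] = {
        "billing": lambda t: f"billing ticket: {t}",
        "general": lambda t: f"auto-reply: {t}",
    }
    return handlers[queue](text)
```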
I think we've all just been nerd-sniped.
I just think, you know, humanities people love classifying,
and, you know, they draw
fine distinctions between different types of
things, entities, whatever. We're computer scientists. Not that there's anything wrong with
humanities, but we're just not that. So I think we're not well equipped for when a bit isn't just
zero or one. It's maybe something in between, and we talk about it a lot and try to coerce it
to one value or the other. Yeah. Of course, agents are more than pure technology. They're also
becoming products, which means they need to be marketed, and how someone positions their product
has a major effect on how they price it. What's more, the ultimate
value of any given agent, which is still to be determined for the vast majority of them,
depends on the degree to which they can actually replace or simply augment human workers.
One very interesting point is that I think there's a marketing angle to agents.
I've heard this narrative from a couple of startups that are basically saying, like,
hey, you know, we can price the software that we're building much, much higher because this
is an agent.
So we can go to a company and say, you're replacing a human worker with this agent, the human
worker makes, I don't know, $50,000 a year.
And therefore you can get this agent for only
$30,000 a year.
This sounds really compelling at first glance.
And actually, I mean, there's some value to it
in the very early days because
it essentially, it's very easy to understand
comparative pricing for somebody
who has to make a buying decision, right?
On the flip side, we all know that the cost of a
product over time converges towards the marginal
cost of production, right? And so, if I used to use a translator
to translate a page of text, today I use
ChatGPT. I do not pay
ChatGPT what I paid my translator.
I pay a tiny fraction of a cent,
right, which is the API price, which reflects the actual cost.
So I sort of wonder how much of the agent debate is driven by marketing and pricing.
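The gap between the value-based pitch and the marginal cost of production is simple arithmetic. In this sketch the $50,000 salary and $30,000 agent price come from the conversation; the 20,000 tasks per year and $0.01 of API spend per task are hypothetical placeholders, not measured figures, and `pricing_gap` is an illustrative name.

```python
def pricing_gap(human_salary: float, agent_price: float,
                tasks_per_year: int, cost_per_task: float) -> dict:
    """Compare a value-based list price against the marginal cost of running the agent."""
    marginal_cost = tasks_per_year * cost_per_task           # what it costs to serve the buyer
    return {
        "buyer_savings": human_salary - agent_price,          # the value-based sales pitch
        "marginal_cost": marginal_cost,                       # the floor competition pushes toward
        "gross_margin": (agent_price - marginal_cost) / agent_price,
    }


# $50,000 and $30,000 are from the example above; the last two arguments are assumptions.
summary = pricing_gap(50_000, 30_000, tasks_per_year=20_000, cost_per_task=0.01)
print(summary)
```

Under these made-up inputs the price sits far above the marginal cost of a couple of hundred dollars, which is exactly the room that competition would be expected to erode over time.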
I just actually think this is a really interesting topic.
What fields can you think of that are actually suffering complete replacement from AI or AI agents?
And this is a setup.
I'll warn you.
I have another extreme point of view that I'll say afterward.
But can you think of fields where this is actually happening?
Not completely, but definitely partially, because there are a lot of, for example, voice agents
that have replaced receptionists.
I don't know if we should name names.
They replace people who would, you know, get back to customers.
So there's definitely a lot of workloads that have been offloaded from the folks who traditionally did the job.
But I don't think they're, you know, 100% replaced.
They can, you know, they can do something else.
But we are seeing headcount growth slowing in some areas.
So it's not that existing jobs are being replaced.
It's more like they're hiring net new humans slower.
I think it's exactly right.
I mean, I think in a few cases, humans will get replaced by AI.
In most cases, you know, two humans will get replaced by one human who's more productive with AI.
Or, yeah, or maybe they keep the two employees.
Maybe they go to three employees because now they're more productive.
Yeah, right, right.
It's just a really interesting question.
And the reason I think it's really relevant to agents is I think part of the ethos and part of the confusion around agents is this idea that we actually will develop human replacements.
Right.
And that this thing we call an
agent, which by the way is a name for a person, right? Before we had AI, we had people called agents,
and we still have all kinds of people called agents. And it just doesn't seem like that's
happening, right? Not in the replacement sense, right? You mentioned voice agents, Yoko. We've always
had, you know, customer support automation. We've had 1-800 numbers where it's, like,
press 1 for sales, you know. That's existed for a long time, and this is a much better form
of that, obviously. Translation is a great example, too, Guido. These systems can perform
translation extremely well, but you're probably not going to just stick something into ChatGPT
and then publish it on your website, right? There is actually work that needs to take place.
And I think the reason for this is there's just fundamental creative work in most things
that humans do, right? I think from our kind of perch in Silicon Valley, we can forget that
sometimes that people all over the country doing all sorts of jobs actually have hard jobs,
and not just hard in the sense of someone's-got-to-do-it jobs, but hard in the sense that they
take thinking and human decision-making, and I just don't know that AI has what we would
think of as decision-making or intent, right? It's a system where somebody still has to push the
button, right? It may be running somewhere. It may do a great job of whatever. Someone still
has to give it a prompt and hit go. And to me, that's a lot of the confusion around agents.
We're all thinking at some point a human person with intent and creativity and thinking is going
to be replaced. I'm just not sure that's even theoretically possible, right? It's almost
a catch-22 to say an AI system is thinking for itself, right, because somebody has to have
created it. You know, this is old sci-fi philosophy I'm getting into now, but I actually do think
it's a big reason for the confusion that, you know, we're experiencing now.
It's interesting because there are two types of agents we're already talking about. There's one
type where the agent replaces humans, works with humans, does things humans can do.
The other type of agent is more like a low-level system process. They work with each other.
They hand off tasks to each other. To some extent, agents are technical details in the system
in that way, but we mean both when we talk about agents.
In that case, is there actually a difference between an agent and a function?
I think so.
I think an agent would be multiple functions with LLMs in the middle.
If I have a low-level agent, and I'm giving this low-level agent a task,
and I get back a task result, it looks a little bit like a classic API call.
But with an LLM in the middle to make decisions on what to do for that API call.
Understood, but that's just how this function works internally.
Yes.
To some degree?
Yes.
Right?
Yeah.
So from the outside, would I care?
You wouldn't care.
Most of the time, when we see AI SDRs,
when we talk about AI SDR agents,
what we mean is that the agent can go to the CRM, pull something out,
and then filter the list, draft an email, and send the email.
So that feels very process level instead of human level.
Yeah, so that's what I meant.
If you don't know how this thing works internally,
a classic function and an agent become indistinguishable.
Totally.
I absolutely agree, but when you, as a programmer, define the function, you're defining the agent; that's the interface.
Implementation.
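The point that a classic function and an agent look identical from the outside can be shown with the AI SDR example: both implementations below share one signature, and the caller cannot tell which it received. Everything here is hypothetical (`Lead`, `draft_email`, the score threshold, and the stubbed "LLM" judgment); the CRM fetch and the actual email send are left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Lead:
    name: str
    company: str
    score: int


Outreach = Callable[[List[Lead]], List[str]]  # the interface the caller sees


def draft_email(lead: Lead) -> str:
    return f"Hi {lead.name}, quick note about {lead.company}."


def rule_based_outreach(leads: List[Lead]) -> List[str]:
    # Classic function: a fixed threshold decides who gets an email.
    return [draft_email(lead) for lead in leads if lead.score >= 50]


def fake_llm_should_contact(lead: Lead) -> bool:
    """Hypothetical LLM judgment, stubbed so the sketch runs offline."""
    return lead.score >= 50 or "AI" in lead.company


def agent_outreach(leads: List[Lead]) -> List[str]:
    # "Agent": same signature, but a (stubbed) model decides whom to contact.
    return [draft_email(lead) for lead in leads if fake_llm_should_contact(lead)]


def run_campaign(step: Outreach, leads: List[Lead]) -> List[str]:
    # The caller only depends on the interface, not on what sits inside it.
    return step(leads)
```

Swapping `rule_based_outreach` for `agent_outreach` changes the implementation, not the interface, which is the distinction drawn in the exchange above.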
We'll get back to pricing shortly.
But first, let's dive a little deeper into this discussion of how interacting with an agent is different than, or similar to, traditional software-based functions.
So here's one interesting thing to think about on that topic.
I totally agree with you, Guido, and I think you sort of agreed too:
it's really a function if you just look at it that way.
Shareable, reproducible functions have never really been a thing.
This has been one of these long-time goals that people in the market have tried to say,
oh, I can just write a function and then anybody on Earth can use it, right?
Like, you know, we have packages, right, that you can download a whole package with various functionality,
but literally just one function that you can share.
If you kind of squint a little bit, that kind of exists now with AI, right?
Because you have these models that are trained by somebody;
somebody else may download one, fine-tune it, train a LoRA, package it up in some new and interesting way.
And then it's immediately available for someone else to use on hosting services or Hugging Face or something like that.
So while it does seem to be just an implementation detail, whether you're using an LLM or not,
there is this interesting thing where the model itself takes up so much of that functionality in the function.
And it's just a different kind of animal compared to normal code.
It's actually more, it's kind of shared by default in a way because nobody's going in and training their own model every time they're
writing code. You know, it's obviously heavy, right? It's harder to move around. There are all
these different characteristics from normal functions, some of which are actually very desirable.
Some are kind of, you know, bad, right, characteristics you don't want, but many of them are
kind of interesting. And I think we'll actually see new infrastructure, new dev tools, kind of built
around this in the long run. I think that would make sense. I mean, if we go back in time,
the last time we invented a major new component for building systems was probably
networking, right? How we thought about calling a function changed a lot from before networking to after;
the complexities of APIs and the infrastructure around them are completely different today.
This is such a good point, because now that I think about it, I feel like humans are just functions too.
If you do a thought experiment and replace the LLM in the program with a human,
the kind of answers they'd give the program are not that different from what an LLM would give the program.
So if we actually all get hooked up to servers one day and can be called as a function from Lambda,
then I will agree that agents have been created.
That's what an agent is.
Isn't Mechanical Turk exactly that or maybe even your email inbox?
There was an Amazon Go supermarket a while back in SoMa.
I think they were advertising that it was computer vision models behind the scenes,
identifying what you took from the supermarket.
But then people found that they hired a lot of people behind the scenes
to actually label the data in real time.
So the humans in that case are the functions that today may be...
Secret agents.
Right. Replaced by LLMs...
Well, but this was exactly my point, though, right?
There actually is important creative work.
Even for a grocery store checkout clerk, right?
You could naively think, oh, this is an easy job.
Actually, it's not an easy job at all.
Right, yeah.
And so you can take this work and kind of shift it, right?
And you can squeeze it down with automation and stuff.
But it never really goes away.
Oh, yeah, absolutely.
Yeah.
All right.
So given all of this, how should companies
think about pricing their agents?
Per seat, per token, per task?
Hint, it might be too early to truly tell.
Usually, if you introduce a brand new product category, right,
you often initially put a pricing that prices against the status quo, right,
whatever you replace or augment in some cases.
But let's assume we have a direct replacement, right?
So that's, I think, where this idea comes from: oh, this replaces a human, which it doesn't.
But if it would, right, then you could charge X amount for it.
Usually, over time, competition kicks in, right,
and you're effectively priced by how much your competitors are charging.
And you start to see price erosion.
Then it depends on many things, like how much of a moat you have,
whether you have customer lock-in, right, and so on.
Long term, price converges toward the marginal cost of production, right?
Which, I mean, look, if I look at most agents today, is probably very low.
Any agent you can purely model in software
with a couple of LLM calls, you can run at a very, very
low cost. And the cost is decreasing
over time. And I would sort of argue
that's kind of already
what's happening, that in practice
most AI
applications, and in particular
if we want to call them AI agent applications,
you know, they have their sales
pitch around, you should pay us
X because we're saving you. You know, it's like a
classic ROI calculation. Establish value.
Yeah, exactly. Yeah, exactly. Value based
pricing, you know, but in practice, I think
most buyers are actually pretty sophisticated
about what's going on under the hood. And to your
point, they know it's pretty simple stuff happening. And so it's like, hey, what does it cost you to run all these
GPUs? We'll pay you some premium over that. And I think that's how a lot of vendors are pricing
in practice these days. I mean, long term you'd expect pretty healthy margins, just like in SaaS, right?
Software traditionally has very good margins. It's so funny, because we always advise companies to
not price based on the margin, but price based on the value you add, whatever that could be.
It could be compared to other vendors on the market. It could be compared to just, you know, what it costs to
build in-house. And traditionally for infra, a rule of thumb, though not always the case, is that
if the service is used by a human, it's per-seat pricing, and if the service is used by other
machines, it's usage-based pricing. And I actually don't know where to put agents here.
Well, it could be used by either, right?
It could be used by either.
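The infra rule of thumb (per-seat for human users, usage-based for machine callers) and why agents straddle it can be made concrete with a break-even calculation. The $30-per-seat and 2-cents-per-task prices below are hypothetical, chosen only for illustration; amounts are kept in integer cents to avoid floating-point rounding.

```python
def per_seat_bill_cents(seats: int, seat_price_cents: int) -> int:
    # Human-facing pricing: pay per person with access.
    return seats * seat_price_cents


def usage_bill_cents(tasks: int, price_per_task_cents: int) -> int:
    # Machine-facing pricing: pay per unit of work done.
    return tasks * price_per_task_cents


def break_even_tasks_per_seat(seat_price_cents: int, price_per_task_cents: int) -> float:
    # Above this many tasks per seat, usage pricing costs the buyer more than a seat would.
    return seat_price_cents / price_per_task_cents


# Hypothetical prices: $30 per seat per month, 2 cents per agent task.
print(per_seat_bill_cents(10, 3_000))       # 10 people using the agent as a copilot
print(usage_bill_cents(40_000, 2))          # other systems calling the agent 40,000 times
print(break_even_tasks_per_seat(3_000, 2))  # tasks per seat where the two models cost the same
```

An agent that is mostly driven by a person lands on one side of the break-even point; one that is mostly called by other systems lands on the other, which is why neither model fits agents cleanly.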
Look, I think your analysis is exactly right. And the reality is most AI companies don't know what
value they're generating yet. This is so new and so nascent that it's like, hey, we're just
going to charge something that we're not going to lose money on. And, you know, in the case of
OpenAI, they have how many millions of users; they probably don't have a very strong sense of what
they're all using it for. And once they do, right, and you see this more, they're trying to
verticalize a bit more and have specific products for specific use cases, code obviously
being the big one, then you'll see the pricing catch up. That's my
hypothesis. The OpenAI point you brought up reminds me of something very important. I was
thinking about AI companions, because that's the closest to per-seat, per-human pricing.
Like, you can't charge someone for every sentence they say to their companion, although some of the foundation models...
There are services that will charge you per response.
I haven't used them, but they do exist.
I see.
Wow.
Okay.
So usually, it's kind of weird to charge someone by tokens for how much they talk to their companion, rather than a flat monthly fee.
It doesn't feel like a true friend.
Right, exactly.
It's very transactional.
This is, look, this is all theory, right?
People love sitting around and talking, oh, we're going to charge per person, per task, per, you know, world economy that we rescue.
You know, it's like, it's all made up, right?
I think Guido's thing was exactly right.
Let's look at the actual technology underlying what we're calling agents right now, where they're being deployed and why.
And honestly, the pricing, the marketing, the sales tactic, all of this kind of follows from what they're actually selling.
If I'm selling something that looks like an agent,
but I haven't truly figured out the value I'm providing to my users.
How do I justify the jump to a higher price point when I do figure out that value?
You just need to be selling a solution rather than a product, right?
This is really well-worn expertise in enterprise go-to-market.
In code, you can somewhat see the decoupling of price from the underlying technology now,
because it really works.
There's a very clear ROI to people who use it.
And so as a VP of Engineering or a CTO, you can look at this and say,
okay, I'm actually saving a lot of money
and my guys are getting a lot more productive.
I can do a normal.
And they're happier?
Yeah, so you're kind of buying a solution, right?
You're buying from a vendor
something that solves a problem for you,
which, again, Microsoft, Oracle, and Salesforce people
have been doing forever.
Once we start to see more of that,
it's going to be these things
that become real products
and kind of decouple pricing
and look kind of like real businesses, I think.
I think it's dictated by the high-level application.
So I'll give you an example.
So I'm a Pokemon Go player.
So for those who have played Pokemon Go,
once you've collected enough Pokemon,
you run out of storage in your pocket.
So you need to pay extra to buy a new bag, a virtual bag,
that you can put more Pokemon in.
And as an infrastructure investor, I invest in storage businesses.
And then when I looked at how much I needed to pay for, like, 30 extra Pokemon,
it was thousands of times more expensive than what storage costs.
So it actually reminded...
I'm surprised it's only thousands.
It's 10 to the 15 or so.
There's a whole price curve on Pokemon storage.
it turns out.
Because this is one JSON blob, basically.
It's one JSON blob.
I know.
And they charge you like $5.
Yeah.
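A rough back-of-envelope check on the Pokemon storage markup is below. Every constant except the 30 slots and the roughly $5 price is an assumption: the per-record size is hypothetical, the storage rate is an approximate S3 Standard list price rather than a quote, and request fees are ignored. Treat the result as an order of magnitude only.

```python
# Assumed inputs for an order-of-magnitude estimate; not figures from the game or from AWS.
S3_USD_PER_GB_MONTH = 0.023   # assumed: roughly the S3 Standard list price in one US region
BYTES_PER_POKEMON = 1_000     # hypothetical size of one Pokemon record in a JSON blob
EXTRA_SLOTS = 30              # from the conversation
IN_GAME_PRICE_USD = 5.0       # from the conversation ("like $5")
YEARS = 10                    # assumed storage horizon


def storage_markup(price_usd: float = IN_GAME_PRICE_USD, years: int = YEARS) -> float:
    gigabytes = EXTRA_SLOTS * BYTES_PER_POKEMON / 1e9
    raw_storage_cost = gigabytes * S3_USD_PER_GB_MONTH * 12 * years  # ignores request fees
    return price_usd / raw_storage_cost


print(f"{storage_markup():,.0f}x")
```

With these inputs the markup comes out in the tens of thousands, consistent in spirit with "thousands of times"; the exact figure moves a lot with the assumed record size and time horizon.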
And then the Pokemon, normal Pokemon players,
they wouldn't think about this,
like how much storage costs, right?
Like a normal Pokemon player would be like,
oh, for this capability, I'd
happily pay thousands of times more
than I would for an S3 bucket
somewhere. So part of it is
monopoly. It's an application-layer
monopoly: you wouldn't be able to
store the Pokemon anywhere else.
And two, it's a different use case for a different audience. They wouldn't be asking these questions.
They'd be thinking about what the net new value is, and what net new cost they'd be willing to foot the bill for to get this value. Is it a fun game? It's a fun game. Take 100 more dollars.
Yeah, I think that's exactly right. And implicit in what you're saying is this idea that the product or the solution has to actually work for them, right, for the less technical person, the person who's not going to try to provision their own storage bucket to self-host.
Just kind of bring your own S3 for Pokemon, yeah.
And it's quite defensible, differentiated, too, because, you know, Pokemon Go is not open source.
There's no other replacement of Pokemon Go.
There's only one Pokemon Go.
So there's only one place where you would be willing to pay so much money for Pokemon storage.
Plus, very strong brand.
Plus, you have a little bit of network effect because you can play together.
Yeah, and then we'll see the AI agent version of this.
I can't wait to see the AI companion version of this.
Paying for storage for your AI companion's wardrobe.
As the AI market continues to shake out and evolve,
where will agent capabilities ultimately live?
For example, can they live inside LLMs,
or must they call external tools?
And who's ultimately in the best position to influence this?
Super interesting question, right?
What's the systems perspective on how an agent is built?
And I personally think that architecturally,
there really is no difference between your typical SaaS software
and an agent in terms of how you build it, right?
And let me explain why.
So an agent, you have sort of an overall loop with an LLM and prompts that feeds into itself, plus external tool use.
The LLM itself, you probably want to run on separate infrastructure just because it's highly specialized.
You need these vast GPU farms.
You can't easily run today's large LLMs on a single GPU.
So that's very specialized infrastructure that lives externally.
So the LLM call is external.
The state management, well, today in SaaS applications, we do all the state management externally in databases or something like that.
So you probably also want to externalize that, right?
And then what remains is fairly lightweight logic, right?
where basically I'm taking context that I retrieve somehow from databases.
I assemble that into a prompt.
I run the prompt.
And then I occasionally invoke tools.
Maybe I do that with MCP or something like that with an external server.
But the core loop is actually pretty lightweight, right?
And I can run a gazillion agents on a single server.
Well, not a gazillion, but many agents on a single server.
I don't need a lot of compute performance for that.
Does that sound about right?
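The lightweight loop described above can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not how any particular agent framework works: `fake_llm`, `TOOLS`, `lookup_order`, and `STATE_DB` are hypothetical stand-ins for the external LLM endpoint, an external tool server (for example one reached over MCP), and an external database. Only the orchestration logic lives in the agent process itself.

```python
import json
from typing import Callable, Dict, List

# Hypothetical stand-in for the external LLM endpoint (in practice a remote,
# GPU-backed API). It returns JSON: either a tool call or a final answer.
def fake_llm(prompt: str) -> str:
    if "TOOL_RESULT" not in prompt:
        return json.dumps({"tool": "lookup_order", "args": {"order_id": "A1"}})
    return json.dumps({"answer": "Your order A1 ships tomorrow."})

# Hypothetical external tools (in practice these might sit behind an MCP server).
TOOLS: Dict[str, Callable[..., str]] = {
    "lookup_order": lambda order_id: f"order {order_id}: ships tomorrow",
}

# Stand-in for externalized state (in practice a database, not process memory).
STATE_DB: Dict[str, List[str]] = {}

def run_agent(session_id: str, user_msg: str,
              llm: Callable[[str], str] = fake_llm, max_steps: int = 5) -> str:
    history = STATE_DB.setdefault(session_id, [])  # retrieve stored context
    history.append(f"USER: {user_msg}")
    for _ in range(max_steps):
        prompt = "\n".join(history)                # assemble the prompt
        reply = json.loads(llm(prompt))            # run the prompt (external call)
        if "answer" in reply:
            history.append(f"ASSISTANT: {reply['answer']}")
            return reply["answer"]
        # Otherwise the model asked for a tool: invoke it and feed the result back.
        result = TOOLS[reply["tool"]](**reply["args"])
        history.append(f"TOOL_RESULT: {result}")
    return "Stopped: step limit reached."
```

Calling `run_agent("s1", "Where is order A1?")` makes two model calls and one tool call, then returns the final answer. Because the loop itself is just string assembly and dictionary lookups, many such sessions could plausibly share one modest server; the expensive work sits behind the `llm` call.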
Yeah, yeah, I totally agree. The interesting architectural question for me has always been,
how do you handle the kind of nondeterminism that may come? Many of the successful AI applications
that we all use and love really just spit model outputs back out to the user, right? Like a chatbot
or image generator, it's like, hey, I called the LLM, here's what I got, you know, good luck.
When you try to actually incorporate the output from an LLM into the control flow of your program,
that is actually a very hard, very unsolved problem. To your point,
there are relatively minor architectural differences today, but this may actually drive more significant changes in the future.
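One common way to reduce the risk described above, sketched here under assumptions, is to treat model output as untrusted input before it touches control flow: parse it, check it against an allow-list, retry a bounded number of times, and fall back to a safe default. The action names (`refund`, `escalate`, `close`), the JSON shape, and the retry policy are all hypothetical, and this mitigates the nondeterminism rather than solving it.

```python
import json
from typing import Callable, Optional

# Hypothetical allow-list of actions the program is prepared to branch on.
ALLOWED_ACTIONS = {"refund", "escalate", "close"}

def parse_action(raw: str) -> Optional[str]:
    """Return a validated action name, or None if the model output is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed output: not JSON at all
    action = data.get("action") if isinstance(data, dict) else None
    if isinstance(action, str) and action in ALLOWED_ACTIONS:
        return action
    return None  # valid JSON, but not an action we allow

def decide(llm: Callable[[], str], retries: int = 3, fallback: str = "escalate") -> str:
    """Ask the model up to `retries` times; use a safe default if it never complies."""
    for _ in range(retries):
        action = parse_action(llm())
        if action is not None:
            return action  # only validated values reach the program's branches
    return fallback
```

The design choice is to keep the branch points deterministic: whatever the model says, the program only ever sees one of a fixed set of values.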
I actually think the winners will be the specialists, not the foundational models.
It's the people who will build on top of the foundational models or fine-tune the foundational models.
So a very artistic example of this: I've spent the last two weeks just prompting GPT-4o, OpenAI's image model.
It's very good at cartooning, so it's very good at manga.
It can spell, so it can carry a storyline.
But then I realized that there are only two or three styles it's really good at.
So it's good at Ghibli.
It's good at manga.
And then there are variations of a style in that realm.
So where art comes in is that the market likes out-of-distribution art.
People don't want to see the same things over and over again, because that's how they value art: something that's different.
Ideally, maybe.
Is it a reasonable summary to define art as out-of-distribution samples?
Yeah, art can be in distribution.
That's pop art, right?
It could also be out of distribution.
That's like when Impressionism came up many years ago.
Everyone was drawing Impressionism,
and at the time, the painters before
were like, what's wrong with your eyes?
Why are you drawing blurry images?
So styles come and go.
But because of that, I think it's a question of pushing the distribution.
The foundational model will never cover 100% of everything.
So it's really up to the humans
and specialists of the next wave to come up with the new data, new workflows, new aesthetics
to push that distribution.
Of course, at the end of the day, agents are only as useful as the tools and data to which they have access.
So what happens if major web platforms decide they want to keep agents from accessing their data?
It seems like one of the hardest things about agents today is data moats.
In some cases, that's just because it's technically difficult: agents try to access data,
and it's just very hard to integrate with that system.
In some cases, it's very deliberate, right?
On my iPhone, the photos are not accessible via any API because it's a walled garden.
So it's sort of data silos you're talking about.
So is that something that's holding back agents or making them more difficult? Or, to put it even more strongly:
consumer companies have traditionally been opposed to offering automated access to their services,
because they want user engagement; they want the time to advertise to the user.
Will that limit how much we can deploy agents?
And would that be changed once we have the browser native agents
that can browse the web and browse our phone?
Great question, yes.
I think Yoko is totally right.
You know, it's like there's strong incentives for people who own data
about, you know, physical entities, you know, people, businesses, et cetera,
to keep it to themselves, right?
Especially because they may be scared what AI is going to do to them, by the way.
So they're kind of clinging tight to what they have.
And these problems are rarely solved
by defining a new protocol and just saying,
hey, if we make it easy for people to give away their core assets,
they'll just do it.
You know, obviously, you know, that's very unlikely to work.
But someone eventually will solve this by saying,
hey, if your data is publicly visible, we're going to get it.
You know, it's like, by the way, it's not actually your data.
It's not about me.
Actually, I feel like new advancements in models may just change the data moat.
Kind of to the point: today, web browsing using an agent
doesn't work super well.
It's very slow.
It's very clunky.
You have to try it multiple times for it to do any task.
But imagine if we had foundational model capability that gave an agent the ability to go to any website,
log in as a human (we'll table that one;
I don't know how agent identity works yet),
or SSH into a server and execute certain commands, or spin up a virtual machine for mobile,
or access a device farm, say, a device farm to play Pokemon Go.
Maybe data that was traditionally only available to humans under that account would now be available to agents.
There's also the opposite that could happen, right?
Basically, all the consumer sites start adding more and more complex anti-agent CAPTCHAs to keep agents out, because they only want humans, and their attention, to come to those sites.
I mean, I recently used one of these deep research tools from one of the major LLM providers.
And if you look through all the steps it went through, one of them was, you know, trying to
see how it could get around the CAPTCHA mechanism for a site. That was an actual reasoning step,
right? Basically, it knew what information I wanted, and it was blocked from accessing
it. So, you know, how dystopian is the future going to be here? It solved it, actually.
I mean, it's so interesting. So here's a really early machine learning example of this. I don't know if
you guys remember when Gmail first implemented ads. It was a big controversy because they basically
said, okay, we're not going to read your emails, but our algorithms are going to read your emails.
and we're going to suggest ads that you should watch,
or click on based on that.
We all sort of, I think, just forgot and got used to it.
I still think we don't love the idea,
but we kind of lived with it.
But some of the data providers reacted by removing data from email, right?
So Amazon famously now when you order something,
they send you a confirmation email that says,
hey, you just ordered something.
Click here to find out what you ordered,
when it's going to arrive,
or any information you might want to know.
And so that actually did happen in practice in that example
that the major data holders kind of found
ways to withhold it. It'll be interesting to see whether that's
possible now or not. But that same data
is scraped on the client side
by the ad network scripts they
install. Oh, sure. Yeah, yeah.
Yeah, there's always some other way.
Yeah, maybe not exactly the same, but a pretty good
proxy. Yeah, yeah. It may be that
it's much harder to
tell the difference between an LLM and a human
than between a classic, you know, sort of
API call mechanism and a human.
That may change the dynamics.
Finally, Guido, Matt, and Yoko
answer an obvious question on the longest
timeline into which we might have clear visibility. What needs to happen to make agents a truly
game-changing innovation within the next, say, two years? I think the positive vision is that in two
years, we figured out how an agent working on my behalf can use most of the tools that I have
access to. I think it's also clear what are all the pieces that are missing for that, right? We have
not figured out security authentication access control for agents working on my behalf yet. We have
not figured out how data retention works.
We have not figured out the relationship with consumer websites that potentially want
to block that agent.
But if you had that, it could make many tasks much, much easier.
Today, if I have data sitting in, say, my Google Drive, how easily I can reason
about that data versus other data that's in more fragmented sources makes an incredible
difference.
So I think that's the bull case, right, where you have agents that can access all the data
that you can access, act on your
behalf, and perform tasks on your behalf, right, and save you a ton of time. It could make you,
depending on what you do, like, you know, multiple times as productive as you are today.
My answer to that is actually different modalities for the foundational model. Today it's still
very much text-based, and that has worked really well for coding and text-based tasks. But for
more visual-first tasks, there's just no one-to-one mapping. Even web browsing is a very
clunky experience of taking a screenshot every couple of seconds and sending it back to the foundational model.
So I would actually bet on multimodality: if we train the model on traces of clicking buttons on websites, navigating the web, using different devices, drawing, producing vector art,
I think there will be new things the model could unlock at the agent level.
You can probably guess my answer.
If we don't use the word agent two years from now or five years from now, I think that's a huge win.
There's actually a fun paper put out by some folks at Columbia, I think, called AI
as Normal Technology.
And they sort of make the argument
that there's a false dichotomy out there.
It's like,
AI is either going to bring about utopia or dystopia,
meaning everything's going to be amazing
because we have AI or everything's going to be terrible.
This is kind of the national discourse.
But if you just think of it as normal, right,
like water or electricity or the internet or things like that,
I think that's the world we're kind of headed towards,
and agents are one way to help us get there.
And so that's my goal.
I mean, this stuff is just incredibly powerful.
We understand how to use it,
we understand the use cases, and we're kind of, you know, putting it to use for us.
Thanks for listening to the a16z podcast. If you enjoyed the episode, let us know by leaving a review
at ratethispodcast.com/a16z. We've got more great conversations coming your way.
See you next time.
