The a16z Show - What Is an AI Agent?

Starting point is 00:00:03 Today, we're discussing one of the busiest and most confusing terms in AI right now, agents. Are they just fancy rappers around LLMs, full-blown autonomous workers, or something in between? A16Z Info Partners, Guido Appenzeller, Matt Bornstein, and Yoko Lee, break down the technical definitions, pricing models, use cases, and why the term agent means so many different things to different people. If you're building, buying, or just curious about what agents are and aren't, this episode is for you. Let's get into it. As a reminder, the content here is for informational purposes only.

Starting point is 00:00:41 Should not be taken as legal business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16Z.com forward slash disclosures. So I think there's some things which are probably kind of easy to say, there's a good amount of disagreement what is an agent. We've heard a lot of different definitions of it,

Starting point is 00:01:17 both on the technical side as well, I'd say on the marketing and sales side in some cases because there's some sales models associated with it. So let's start with the technical side. I think there's sort of continuum here, you know, The simplest thing that I've heard being called an agent is basically just a clever prompt on top of some kind of knowledge base or some kind of context that has this of a chat type interface.

Starting point is 00:01:41 So from a user's perspective, this looks like an human agent would look like, right? So, for example, I ask it, hey, I have a technical problem with my product, XYZ. It looks at the knowledge base and comes back with a canned response. Well, there doesn't have to be a knowledge base, right? It doesn't even have to be a knowledge base. I see, got it. So maybe it's just a train model. It's all in the model way it's the knowledge.

Starting point is 00:02:03 it's even simpler. So an agent could just be an LLM. Right. But the chat interface or something like that, by some definition, right? I think on the other end of the spectrum, there are some people who basically say, for something to be a real agent, it has to be something fairly close to AGI, right? It needs to persist over long periods of time. It needs to be able to learn.

Starting point is 00:02:21 It needs to have a knowledge base. It needs to work independently on problems. If you take them the most extensive definition, is it fair to say that doesn't work yet? I think so. It doesn't work yet, although. Will it ever work? That's a philosophical question. All right.

Starting point is 00:02:38 Fair. Very fair. Very fair. So if we take that continuum in between, is that at least a way to chop that up into a couple of categories of maybe degrees of agentic behavior? And different types of agent. There's some artsy agent that help artists to come up with new bezier curves. There's coding agent, which we like to talk about as the agent.

Starting point is 00:03:03 Which we use, yeah. Yeah, which we use. There's Agent that's just a wrap around on top of LLMs. That's right, yeah. I may be the contrarian in this group. All right. Look, I kind of think Agent is just a word for AI applications, right? Anything that uses AI kind of can be an agent now.

Starting point is 00:03:21 Before we started this talk, I actually went online just to refresh myself about some of the more interesting AI agent perspectives out there. I found a really cool talk from Carpathie that he gave a couple of years ago about A. which I can describe a little bit, but the really funny part was on the YouTube recommended videos to watch next. It's like AI agents are going to revolutionize your lifestyle and the rise of super intelligent AI. You know, it's just kind of like marketing. And so I actually do think that's what's going on in a lot of ways. The cleanest definition I've seen of an agent is just something that does complex planning and something that interacts with outside systems.

Starting point is 00:03:59 The problem with that definition is all LLMs now do both of those things. They have built-in planning in many cases, and they at least consume information, at least from the Internet, maybe from some servers that expose information through MCP or some of their protocol. So the line really is very blurry. And, you know, what was so interesting

Starting point is 00:04:17 about the Carpathie talk is he basically, he related to autonomous vehicles and said, AI agents are a real problem, but it's like a 10-year problem. It's like a decade problem that we need to work on. And I think most of what we're seeing in the market now is not the decade version of this problem. It's like the weekend demo version of this problem.

Starting point is 00:04:36 And this is why we sort of generate so much confusion. You have this kind of poorly defined nebulous thing that LLMs are kind of consuming themselves over time. And so I don't think anything we have are actually agents is kind of an agent itself may be a poorly defined and kind of overloaded term. But if someone's willing to do the hard work and define exactly what it's like to kind of be a human but in digital form and spend 10 years to make it actually work, you know, that's sort of what I'm excited to see. Okay. So defining agents is a difficult job. Maybe it's easier to talk about how people use the tools they call agents and what are the different degrees of agentic behavior. I wonder if part of the conversation is redefining agents because we all know that agent as a term, which is not a great term.

Starting point is 00:05:16 It means so many things to so many people. If it's interesting to dissect, like, what do we mean? What do different people mean when they say agents where are different ways? We could utilize this process we call agents. So it seems to me this. If we're trying to define agents or maybe even degrees of agentic behavior, which might be a little. easier. There's something like a user interface aspect to it, right, where something that's a pure copilot, where basically user goes back and forth with an LM to work in a particular task

Starting point is 00:05:46 that's often not called an agent. Is that fair? There's a little bit the copilot's versus agents' UI models. Yeah, I guess like what are the elements we will think that goes into agentic behavior? Like Matt mentioned planning could be one. There could be decisions. made by the agent. There has to be LLM somewhere, but curious about your take. So I think another definition we heard from Anthropic recently

Starting point is 00:06:11 was this idea that an agent is an LLM running in a loop with tool use, right? Which there's two important parts of that. One is this notion that it's not just a single prompt and not even just a single static sequence of prompts, right? But something where the LLM takes the output of a prompt, feeds it back into itself,

Starting point is 00:06:30 and based on that makes decisions on what the next prompt and likely also went to abort. You went to complete a task. I think that for the real agents or the more agentic behaviors, I think that's a reasonably good definition. I think the other thing is... But just by that definition, isn't every chatbot effectively an agent in this world, right? Like if I go just to chat gpt.com and use their latest reasoning model

Starting point is 00:06:57 with web search. Right? Isn't using tools and, like, feeding its outputs into a new, prompt in order to do kind of chain of thought. Chain of thought is a little bit in between. If it's just a single prompt that comes back with a result, then it wouldn't have this notion of planning and doing a more long-term concept

Starting point is 00:07:12 and deciding itself when it is complete, right? If you have a chain of thought reasoning where I'm giving a more complex task, that's starting to look agetic, I agree. I just think it's really tough to define a system based on what someone says to it, right? Because these are by design, unstructured inputs. These systems will accept literally anything.

Starting point is 00:07:31 And so sure, if you tell it, you know, what's today's weather. I would agree that's not agentic, right? That's just fetching, you know, from an API. If you ask it, define a new philosophy of weather, right? It'll happily go do it, right? So it's like an agent if you ask it one thing, but not an agent if you ask it another thing.

Starting point is 00:07:47 I think that's kind of a lot of the confusion in the market around this. And, you know, if we spoke in the terms that you're talking about, Guido, of like, hey, this is an LLM in a loop with a tool. Like, that's actually a much more productive way to talk about it, I think. Yeah, yeah. I mean, that's it seems like we've seen, to some degree a specializes, of user interfaces in sort of two directions, right? There's, let's say, a cursor or something like that,

Starting point is 00:08:09 which really emphasizes the tight loop between the user, the tight feedback loop between the user and the LLM and the thing I'm working on, right? So I want immediate gratification when I do something, you know, and sort of response time matters. Then there's sort of more the backend as, you know, source club management system type plugins where it's more about throwing something over the wall

Starting point is 00:08:27 by maybe answering a couple of questions, and then you try to maximize the amount of time the agent can work independent. So it seems like, I think you're right that there's no clean system, like definition, split between the two. But there seems to be a little bit of user interface specialization. Is that a fair statement? It almost feel like for all the use cases we've described, there's one element that all agents have,

Starting point is 00:08:49 which is reasoning and decision. Would you call just a call to LLM to say, translate this text to JSON? That's probably not the agent. But then if you ask LLM to say, hey, this side. where, you know, this response goes and routed for me. It feels more like an agent than before. So it almost felt like planning. I'm actually not sure does the agent need to plan

Starting point is 00:09:12 or does it need to decide maybe both. I actually feel like it's like multi-step L-I'm chained with a decision tree. A dynamic decision tree. Yeah, I think that's fair. I think we've all just been nerd-sniped. I just think, you know, it's like humanities people love classifying and, you know, they draw kind of like

Starting point is 00:09:30 fine distinctions between different types of, things, entities, whatever. We're computer scientists. Like, we're, you know, no, there's anything wrong with humanities, but we're just not that. So I think we're not well equipped when it's a bit isn't just zero or one. It's maybe something in between and we just talk about it a lot. We try to like coerce it to one value or the other. Yeah. Of course, agents are more than pure technology. They're also becoming products, which means they need to be marketed and how someone positions their product has a major effect on how they price it. What's more, the ultimate value of any given agent, which is still to be determined for the vast majority of them,

Starting point is 00:10:07 is to what degree they can actually replace or simply augment human workers. A very interesting point, which is, I think there is a marketing angle to agents. I've heard this narrative from a couple of startups that they're busy saying, like, hey, you know, we can price the software that we're building much, much higher because this is an agent. So we can go to a company and say, you're replacing a human worker with this agent, the human worker makes, I don't know, $50,000 a year. And therefore, this agent, you can get phone.

Starting point is 00:10:32 the $30,000 a year. This sounds really compelling from a first glance. And actually, I mean, there's some value to it in the very early days because it essentially, it's very easy to understand comparative pricing for somebody who has to make a buying decision, right? On the flip side, we all know that the cost of a

Starting point is 00:10:48 product over time converges towards the marginal cost of production, right? And so today if I used to use a translator, maybe to translate a page of text, today you use chat GPT. I do not pay chat GDPD like I paid my translator. I paid a tiny fraction of a cent, right, which is the API, which is the actual cost.

Starting point is 00:11:06 So I sort of wonder how much of the agent debate is different by marketing and pricing. I just actually think this is a really interesting topic. What fields can you think of that are actually suffering complete replacement from AI or AI agent? And this is a setup. I'll warn you. I have another extreme point of view that I'll say afterward. But can you think of fields where this is actually happening? Not completely, but definitely partially, because there's a lot of, for example, voice agents

Starting point is 00:11:32 I replaced receptionists. I don't know if we should name. Replace people who would, you know, get back to customers. So there's definitely a lot of workloads that have been offloaded from the folks who traditionally did the job. But I don't think they're, you know, 100% replaced. They can, you know, they can do something else. But we are seeing headcount growth in some areas are slowing. So it's not that existing jobs are being replaced.

Starting point is 00:11:58 It's more like they're hiring net new humans slower. I think it's exactly right. I mean, I think in few cases, humans will get replaced by AI. In most cases, you know, two humans will get replaced, one human by one human that's more productive with AI. Or, yeah, or maybe they keep the two employees. Maybe they go to three employees because now they're more productive. Yeah, right, right. It's just a really interesting question.

Starting point is 00:12:20 And the reason I think it's really relevant to agents is I think part of the ethos and part of the confusion around agents is this idea that we actually will develop human replacements. Right. And that this thing we called an agent. agent, which by the way is a name for a person, right? Before we had AI, we had people called agents, and we still have all kinds of people called agents. And it just doesn't seem like that's happening, right? Not in the replacement sense, right? You mentioned Yoko with agents. We've always had, you know, customer support automation. You know, we've had 1,800 numbers where, like, press 1 for sales, plus, you know, if that's existed for a long time, this is a much better form

Starting point is 00:12:54 of that, obviously. Translation is a great example, too, Guido. These systems can perform translation extremely well, but you're probably not going to just stick something to chat GPT and then publish it on your website, right? There is actually work that needs to take place. And I think the reason for this is there's just fundamental creative work in most things that humans do, right? I think from our kind of perch in Silicon Valley, we can forget that sometimes that people all over the country and doing all sorts of jobs actually have hard jobs and not just hard in the sense of someone's got to do it jobs, but hard in the sense of it does take thinking and human decision making, which I just don't know that AI kind of has what we would

Starting point is 00:13:31 think of as decision making or intent, right? It's a system that still somebody has to push the button, right? It may be running somewhere. It may do a great job of whatever. Someone still us to give it a prompt and hit go. And to me, that's a lot of the confusion around agents. We're all thinking at some point a human person with intent and creativity and thinking is going to be replaced. I'm just not sure that even is theoretically possible, right? It's almost just like a catch-22 to say an AI system is thinking for itself, right, because somebody has to have sort created. You know, this is old sci-fi philosophy I'm getting it to now, but like I actually do think it's a big reason for the confusion that, you know, we sort of experience now.

Starting point is 00:14:04 It's interesting because there's two types of agent we're already talking about. There's one type where the agent is replacing humans, work with humans, do you things humans can do? There's the other type of agents as more low-level system processes. They work with each other. They hand off tasks to each other. To some extent, agents are like technical details in the system in that way, but we mean both when we talk about agents. In that case, is there actually a difference between an agent and a function? I think so. I think agent will be multiple functions with LMs in the middle.

Starting point is 00:14:35 If I have a low-level agent, and I'm giving this low-level agent a task, and I get back a task result, it looks a little bit like a classic API call. But with the LM in the middle to make decisions on what to do for that API call. So I understood, but that's sort of how this function works internally. Yes. To some degree? Yes. Right?

Starting point is 00:14:53 Yeah. So from the outside, would I care? You wouldn't care. It's like most of the time when we see AISDRs, what we talk about AISDR agents. What we mean by that is when the agent can go to the CRM, pull something out, and then filter the list, draft an email, and send the email. So that feels very process level instead of human level.

Starting point is 00:15:18 Yeah, so that's what I meant. If you don't know how this thing works internally, a classic function and an agent become indistinguishable. Totally. I absolutely agree, but when you, as a programmer, when you find, right, the function, you will define agent that that's this thing of face. Implementation. We'll get back to pricing shortly. But first, let's dive a little deeper into this discussion of how interacting with an agent is different than, or similar to, traditional software-based functions.

Starting point is 00:15:47 So here's one interesting thing to think about on that topic. I totally agree with you, Guido, and I think you sort of agreed to. it's really a function if you kind of just look at it that way. Shareable, reproducible functions have never really been a thing. This has been one of these long-time goals that people in the market have tried to say, oh, I can just write a function and then anybody on Earth can use it, right? Like, you know, we have packages, right, that you can download a whole package with various functionality, but literally just one function that you can share.

Starting point is 00:16:14 If you kind of squint a little bit, that kind of exists now with AI, right? Because you have these models that's trained by somebody. somebody else may download it, fine tune it, train a Laura, package it up into some new and interesting way. And then it's actually immediately available for someone else to use on hosting services or hugging face or something like that. So while it does seem to be just an implementation detail, whether you're using an LLM or not, there is this interesting thing where the model itself takes up so much of that functionality in the function. And it's just a different kind of animal compared to normal code. It's actually more, it's kind of shared by default in a way because nobody's going in and training their own model every time they're

Starting point is 00:16:50 writing code. You know, it's obviously heavy, right? It's harder to move around. There are all these different characteristics from normal functions, some of which are actually very desirable. Some are kind of, you know, bad, right, characteristics you don't want, but many of them are kind of interesting. And I think we'll actually see new infrastructure, new dev tools, kind of built around this in the long run. I think it would make sense. I mean, when, if we go back in time, the last time we sort of invented a major new component for building systems, which was probably networking, right? How we thought about calling a function before networking afterwards changed a lot, I don't need the complexities of APIs and the infrastructure on that is completely different today.

Starting point is 00:17:25 This is such a good point because now I think about it, I feel like humans are just functions too. Like if you have a thought experiment and then replace LLMs in the program to a human, like the kind of answers will give to the program is not that different from whether LM will give to the program. So if we actually all get hooked up to servers one day and can be called as a function from Lambda, then I will agree that agents have been created. That's what an agent is. Isn't Mechanical Turk exactly that or maybe even your email inbox? There's an Amazon Go supermarket a while back in SoMal.

Starting point is 00:18:01 I think they were advertising that it's computer vision models behind the scenes, identifying what you took from the supermarket. But then people found that they hired a lot of people behind the scenes to actually label the data in real time. So the humans in that case are the functions that today may be... Secret agents. Right. Replaced by all ends was... Well, but this was exactly my point, though, right?

Starting point is 00:18:24 There actually is important creative work. Even in a grocery store checkout clerk, right? You could naively think, oh, this is an easy job. Actually, it's not an easy job at all. Right, yeah. And so you can take this work and kind of shift it, right? And you can squeeze it down with automation and stuff. But it never really goes away.

Starting point is 00:18:41 Oh, yeah, absolutely. Yeah. All right. So given all of this, how should company he's thinking about pricing their agents. Per seat, per token, per task? Hint, it might be too early to truly tell. Usually, if you introduce a brand new product category, right,

Starting point is 00:18:59 you often initially put a pricing that prices against the status quo, right, whatever you replace or augment in some cases. But let's assume we have a direct replacement, right? So that's, I think, where this idea from, oh, this replaces a human, which it doesn't. But if it would, right, then you could charge X amount for it. usually over time competition kicks in, right, and you're effectively priced by how much your competitors are charging. And you start sort of an erosion.

Starting point is 00:19:25 Then it depends on many things like how much of a mode do you have, do you have customer lock in, right, and so on. Long term converge against the marginal cost of production, right? Which, I mean, look, if I look at most agents today, is probably very low. Any agent you can purely model in software, with a couple of LLMs calls, you can run at a very very, low cost. The cost is decreasing over time. And I would sort of argue

Starting point is 00:19:49 that's kind of already what's happening, that in practice most AI applications, and in particular if we want to call them AI agent applications, you know, they have their sales pitch around, you should pay us X because we're saving you. You know, it's like a

Starting point is 00:20:05 classic ROI calculation. Establish value. Yeah, exactly. Yeah, exactly. Value based pricing, you know, but in practice, I think most buyers are actually pretty sophisticated about what's going on under the hood. And to your point, they know it's a pretty simple stuff happening. And so it's like, hey, what does it cost you to run all these GPUs and we'll pay you some premium over that? And I think that's how a lot of vendors are pricing in practice these days. I mean, long time you'd expect pretty healthy margins, just like in SaaS, right?

Starting point is 00:20:30 Which software traditionally has very good margins. It's so funny because we always advise companies to not price based on the margin, but price based on the value you add, whatever that could be. It could be compared to other vendors on the market. It could be compared to just, you know, what it is building in-house. And traditionally for infra, a rule of thumb, not always the case is that if the surface is used by a human, it's a per se pricing. And if it's a service is used by other machines, it's a usage-based pricing. And I actually don't know where to put agents here. Well, it could be used by either, right? It could be used to by either.

Starting point is 00:21:06 Look, I think your analysis is exactly right. And the reality is most AI companies don't know what value they're generating yet this is so new and so nascent that it's like, hey, we're just going to charge something that we're not going to lose money on. And, you know, in the case of Open AI, they have how many millions of users, they probably don't have a very strong sense of what they're all using it for. And once they do, right, and you see this more, they're trying to verticalize a bit more and have kind of specific products for a specific use cases, code, obviously being the big one. You know, then you'll be able to see the pricing kind of catch up is kind of my hypothesis. This reminds me the Open AI point you brought up. I was a very important. I was

Starting point is 00:21:43 I was thinking about AI companions, because that's the closest to per seat, per seat human pricing. Like, you can't charge someone every sentence they talk to their companion, although some of the foundation of models. There are services that will charge you per response. I haven't used them, but they do exist. I see. Wow. Okay. So usually, it's kind of weird to charge someone, like, buy tokens of how much they talk to the companion, whether than like a flat monthly fee.

Starting point is 00:22:12 It doesn't feel like a true friend. Right, exactly. It's very transactional. This is, look, this is all theory, right? People love sitting around and talking, oh, we're going to charge per person, per task, per, you know, world economy that we rescue. You know, it's like, it's all made up, right? I think Guido's thing was exactly right. Let's look at the actual technology underlying what we're calling agents right now, where they're being deployed and why.

Starting point is 00:22:35 And honestly, the pricing, the marketing, the sales tactic, all of this kind of follows from what they're actually selling. If I'm selling something that looks like an agent, but I haven't truly figured out the value I'm providing to my users. How do I justify the jump to a higher price point when I do figure out that value? You just need to be selling a solution rather than a product, right? This is really well-worn expertise in enterprise go-to-market. Code, you can somewhat see the decoupling of price from the underlying technology now, because it really works.

Starting point is 00:23:04 There's a very clear ROI to people who use it. And so as a VP of Engineering or a CTO, you can look at this and say, okay, I'm actually saving a lot of money and my guys are getting a lot more productive. I can do a normal. And they happier? Yeah, so you're kind of buying a solution, right? You're buying from a vendor

Starting point is 00:23:17 something that solves a problem for you, which, again, Microsoft, Oracle, salesworth people have been doing forever. Once we start to see more of that, it's going to be these things that become real products and kind of decouple pricing and look kind of like real businesses, I think.

Starting point is 00:23:28 I think it's dictated by the high-level application. So I'll give you an example. So I'm a Pokemon Go player. So for those who have played Pokemon Go, once you're collecting enough Pokemon's, you are out of storage in your pocket. So you need to pay extra to buy a new bag, virtual bag, that you can put more Pokemon in.

Starting point is 00:23:49 And as an infrastructure investor, I invest in storage businesses. And then when I look at how much I need to pay for like 30 extra Pokemon, it was thousands of types more expensive than what storage is. So it actually reminded... I'm surprised at something thousands. It's 10 to the 15 or so. There's a whole price curve on Pokemon storage. it turns out.

Starting point is 00:24:10 Because this is one JSON blob, basically. It's one JSON blob. I know. And they charge you like $5. Yeah. And then the Pokemon, normal Pokemon players, they wouldn't think about this, like how much do you storage costs, right?

Starting point is 00:24:22 Like a normal Pokemon player would be like, oh, this capability, I would be happily paying thousands of more than I were to have an S3 bucket somewhere. So one of it is monopoly. So it's an application layer monopoly that you wouldn't have been able to store the Pokemon anywhere else.

Starting point is 00:24:38 And two, it's a use case. It's for a different audience. It wouldn't be asking these questions. It would be thinking about what is the net new value. What's the net new cost I will be willing to, you know, for the bill for if I were to get this value. Is it a fun game? It's a fun game. Take 100 more dollars. Yeah, I think that's exactly right. And implicit is what you're saying is this idea that the product or the solution has to actually work for them, right, for less technical person who's, you know, the person who's not going to try to provision their own storage bucket to self-hosts. Just kind of bring more on S3 for Pokemon, yeah. And it's quite defensible, differentiated, too, because, you know, Pokemon Go is not open source. There's no other replacement of Pokemon Go. There's only one Pokemon Go. So there's only one place where you would be willing to pay so much money for Pokemon storage.

Starting point is 00:25:25 Plus, very strong brand. Plus, you have a little bit of network effect because you can play together. Yeah, and then we'll see the AI agent version of this. I can't wait to see the AI companion version of this. Paying storage for AI Companions, wardrobe. As the AI market continues to shake out and evolve, where will agent capabilities ultimately live? For example, can they live inside LLMs,

Starting point is 00:25:46 or must they call external tools? And who's ultimately in the best position to influence this? Super interesting question, right? What's the system's perspective of how an agent is built? And I personally think that architecturally, there really is no difference between your typical SaaS software to do an agent in terms of how you build it, right? And let me explain why.

Starting point is 00:26:07 So an agent, you have sort of an overall loop with an LLM and prompts that feeds into itself, plus external tool use. The LLM itself, you probably want to run a separate infrastructure just because it's highly specialized. You need these vast GPU farms. You can't easily run today's large LLMs and a single GPU. So that's a very specialized infrastructure that's externally. So the LLM call is external. The state management, well, today in SaaS applications, we do all the state management externally in databases or something like that. So you probably also want to externalize that, right?

Starting point is 00:26:38 And then what remains is fairly lightweight logic, right? Where I basically I'm taking context that I retrieve somehow from databases. I assemble that into a prompt. I run the prompt. And then I occasionally invoke tools. Maybe I do that with MCP or something like that with an external server. But the core loop is actually pretty lightweight, right? And I can run a gazillion agents on a single server.

Starting point is 00:26:59 Not exilient, but many agents on a single server. I don't need a lot of compute performance for that. Does that sound about right? Yeah, yeah, I totally agree. The interesting architectural question for me has always been, how do you handle the kind of nondeterminism that may come? Many of the successful AI applications that we all use and love really just spit model outputs back out to the user, right? Like a chatbot or image generator, it's like, hey, I called the LLM, here's what I got, you know, good luck. When you try to actually incorporate the output from an LLM into the control flow of your program,

Starting point is 00:27:32 that is actually a very hard, very unsolved problem. To your point, to your point, there are relatively minor architectural differences today, but this may actually drive more significant changes in the future. I actually think the winners will be the specialists, not the foundational models. It's the people who will build on top of the foundational models or fine-tune the foundational models. So like a very artistic example of this is that I've been spending the last two weeks just prompting GP4O, their image model. It's very good at cartooning, so it's very good at manga. It can spell, so it has a storyline. But then I realized that there's only top two or three style is good at.

Starting point is 00:28:09 So it's good at Jibli. It's good at manga. And then there's variations of a style in that realm. So now where art comes in it is that the market likes out of distribution art. Everyone doesn't want to see the same things over and over again because that's how they value art, something that's different. Ideally, maybe. Did in summary reason to define art as out-of-distribution samples? Yeah, art can be in distribution

Starting point is 00:28:37 That's pop art, right? It could also be out of distribution That's like when Impressionism came up many years ago Everyone was drawing Impressionism And at the time, the painters before They were like, what's wrong with your eyes? Why are you drawing blurry images? So styles come and go.

Starting point is 00:28:54 But because of that, I think it's a pushing distribution question How the foundational model will never cover 100% of everything. So it's really up to the humans. and specialists of the next wave to come up with the new data, new workflows, new aesthetics to push that distribution. Of course, at the end of the day, agents are only as useful as the tools and data to which they have access. So what happens if major web platforms decide they want to keep agents from accessing their data? It seems like one of the hardest things about agents today are data modes.

Starting point is 00:29:27 In some cases, just because they're technically difficult, and agents trying to access data, and agents trying to access data and it's just very hard to integrate with that system. In some cases, it's very deliberate, right? My iPhone, the photos are not accessible via any API because it's a walled garden. So it's sort of data silos you're talking about. So is that something that's holding back agents or is making them more difficult or to make it even stronger? Consumer companies traditionally often were opposed to offering automated access to their services because they want their user engagement, they want the time to advertise to the user.

Starting point is 00:30:01 Will that limit how much we can deploy agents? And would that be changed once we have the browser native agents that can browse the web and browse our phone? Great question, yes. I think that I think Yoko is totally right. You know, it's like there's strong incentives for people who own data about, you know, physical entities, you know, people, businesses, et cetera, to keep it to themselves, right?

Starting point is 00:30:24 Especially because they may be scared what AI is going to do to them, by the way. So they're kind of clinging tight to what they have. And these problems are rarely solved. by defining a new protocol and just saying, hey, if we make it easy for people to give away their core assets, they'll just do it. You know, obviously, you know, that's very unlikely to work. But someone eventually will solve this by saying,

Starting point is 00:30:42 hey, if your data is publicly visible, we're going to get it. You know, it's like, by the way, it's not actually your data. It's not actually your data. It's not about me. Actually, I feel like the new advancement in models may just change the data mode. Kind of to the point of today, web browsing, using an agent, doesn't work super well. It's very slow.

Starting point is 00:31:00 It's very clunky. You have to try it multiple times for it to do any task. But imagine if we have foundational model capability of giving an agent ability to go to any website, logging as a human, we'll table that one. I don't know how agent identity works yet. Or go SSH into a server, like execute certain commands, or like spin up a virtual machine for mobile or access the device farm, devising of device farm to play Pokemon Go. Like maybe those are the data traditionally only available to humans under that account now may be available to agents.

Starting point is 00:31:37 There's also the opposite that could happen, right? That basically all the consumer sites are starting with more and more complex anti-agent captas trying to keep out their agents because they only want the humans that have attention to come to those sides. I mean, I recently did use one of these deep research tools, one of the major LLMs. And one of the steps, if you look through it, all the steps that went through was like, you know, trying to do. see how it can get around the capture mechanism for a site. That was an actually reasoning step, right? Where basically it felt it'd know what information I wanted and it was blocked from accessing it. So is that, you know, how dystopian is the future going to be here? It solved it, actually. I mean, it's so interesting. So here's a really early machine learning example of this. I don't know if

Starting point is 00:32:16 you guys remember when Gmail first implemented ads. It was a big controversy because they basically said, okay, we're not going to read your emails, but our algorithms are going to read your emails. and we're going to suggest ads that you should watch, or click on based on that. We all sort of, I think, just forgot and got used to it. I still think we don't love the idea, but we kind of lived with it. But some of the data providers reacted by removing data from email, right?

Starting point is 00:32:41 So Amazon famously now when you order something, they send you a confirmation email that says, hey, you just ordered something. Click here to find out what you ordered, when it's going to arrive, or any information you might want to know. And so that actually did happen in practice in that example that the major data holders kind of found

Starting point is 00:32:56 ways to withhold it. It'll be interesting to see whether that's possible now or not. But that same data is script on the client's side from the ad network that install. Oh, sure. Yeah, yeah. Yeah, there's always some other way. Yeah, not maybe exactly the same, but pretty good proxy. Yeah, yeah. It may be that

Starting point is 00:33:12 it's much harder to tell the difference between an LOM and a human than a classic, you know, so the API call mechanism at a human. That may change the dynamics. Finally, Guido, Matt, and Yoko answer an obvious question on the longest timeline into which we might have clear visibility. What needs to happen to make agents a truly

Starting point is 00:33:31 game-changing innovation within the next, say, two years? I think the positive vision is that in two years, we figured out how an agent working on my behalf can use most of the tools that I have access to. I think it's also clear what are all the pieces that are missing for that, right? We have not figured out security authentication access control for agents working on my behalf yet. We have not figured out how data retention works. We have not figured out the relationship with consumer websites that potentially want to block that agent. But if you had that, it could make many tasks much, much easier.

Starting point is 00:34:06 Today, if I have data sitting, say, my Google Drive or so, right, how easy I can reason about that data versus other data that's in more fragmented sources, it makes an incredible difference. So I think that's the bulk case, right, where you have agents that can take all the data that you can access, they can access it, and you'll be. behalf and perform tasks on your behalf, right, and save you a ton of time. It could make you, depending what you do, like, you know, multiple times as productive as you are today. My answer to that is actually different modalities on the foundational model. Today is still

Starting point is 00:34:36 very much text-based, and that worked really well for coding and text-based tests. But then for more visual first tests, there's just no one-to-one mapping. Even for web browsing, it's like a very clunky experience of take screenshot every couple of seconds and send it back to the foundational model. So I will actually bet on multi-modality when it comes to if we train the model with different traces of clicking on buttons on the website, navigating the web, using different devices, drawing, producing vector art. I think there will be new things that the model could unlock on the agent level. You can probably guess my answer. If we don't use the word agent two years from now or five years from now, I think that's a huge win. There's actually a fun paper put out by some folks at Columbia, I think, called AI.

Starting point is 00:35:22 as normal technology. And they sort of make the argument that there's a false dichotomy out there. It's like, AI is either going to bring about utopia or dystopia, meaning everything's going to be amazing because we have AI or everything's going to be terrible. This is kind of the national discourse.

Starting point is 00:35:37 But if you just think of it as normal, right, like water or electricity or the internet or things like that, I think that's the world we're kind of headed towards an agent is this kind of way to help us get there. And so that's my goal. I mean, this stuff is just incredibly powerful. We understand how to use it. understand any use cases and we're kind of, you know, we're kind of putting it to use for us.

Starting point is 00:35:58 Thanks for listening to the A16Z podcast. If you enjoy the episode, let us know by leaving a review at rate thispodcast.com slash A16Z. We've got more great conversations coming your way. See you next time.

The a16z Show - What Is an AI Agent?

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.