a16z Podcast - From Sims to Sapiens: Crafting Reality with Code
Episode Date: November 6, 2023

Is it possible to construct a virtual society that authentically replicates human behavior? AI Town, a virtual town experiment where AI residents live, interact, and engage, provides valuable insights into the future of AI's believability and its interaction with humanity.

In this panel discussion, Joon Park, the author of 'Generative Agents: Interactive Simulacra of Human Behavior,' and Martin Casado from a16z discuss the influence and potential of generative agents, exploring their practical applications in the real world.

Topics Covered:
00:00 - Simulating human behaviors
04:49 - What are generative agents?
07:47 - Simulations, new technology, and LLMs
11:45 - The architecture behind simulating human behavior
16:37 - Generative agents interactions: observing, planning, and reflecting
20:22 - What is the value in advancing generative agents?
24:01 - Use cases for simulation behavior technology
29:31 - What are the ethical frameworks?
33:12 - Q&A from the audience

Resources:
Find AI Town: https://www.convex.dev/ai-town
Read the paper 'Generative Agents: Interactive Simulacra of Human Behavior': https://arxiv.org/pdf/2304.03442.pdf
Find Joon on Twitter: https://twitter.com/joon_s_pk
Find Martin on Twitter: https://twitter.com/martin_casado
Find a16z on Twitter: https://twitter.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Subscribe on your favorite podcast app: https://a16z.simplecast.com/
Follow our host: https://twitter.com/stephsmithio

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
Transcript
They actually wanted to build general computational agents, sort of the way generative agents are supposed to be.
But they didn't have the techniques to do it.
They basically didn't have the large language model.
And like the work that you've done is one of those 100%.
There's like a spark of genius.
What does it mean to accurately reflect your behavior?
But the set of companies that come out of it are always part of this enthusiast era, right?
Like you couldn't have predicted Yahoo.
You couldn't have predicted Amazon.
Like, you knew something was going to happen.
Can we run simulation so we can learn more about ourselves?
A few weeks ago, the A16Z infrastructure team ran an event in the San Francisco office.
The topic, generative agents.
These are autonomous characters designed to simulate human behavior,
derived from a recent but game-changing paper called Generative Agents:
Interactive Simulacra of Human Behavior.
Developers from all around the city came to see and
hear the lead author, Joon Park, speak alongside a16z general partner, Martin Casado.
And in this panel, they discuss how this paper and the advancements in large language models
have opened a new window, expanding the dynamism of simulation, which, instead of binary logic,
uses probabilistic thinking and can incorporate new information.
So what does that really mean? Well, instead of your character in The Sims following very specific
rote rules, with generative agents, a father may go outside because he notices his son,
another may take their breakfast off the stove because they notice it's burning, and another
may even opt into a Valentine's Day party invite and then elect not to show up. All very
human behaviors. Now, the architecture described in the paper is of course intentionally designed
by Joon and team, and it's a combination of a seed identity for every agent, and then functions
that cause each one to do three discrete things, to observe, to plan, and to reflect.
And these architecture decisions ultimately generate unexpectedly spirited conversations just like this.
Hey, Lucky, it's so great to see you. How have you been? I've been dying to hear about your space adventure.
Hey, Kira, I've been fantastic. My space adventure was out of this world. I can't wait to share all
the details with you. Or even this. I've been trying to find my...
My way.
It's been a chaotic journey to say the least.
Embrace the chaos, dear Kurt.
For within its turbulence lies hidden truth.
Seek the depths of the unknown and unravel the mysteries that burden your soul.
And here's the thing.
They don't just interact with each other.
Again, they wake up, they cook, some paint while others write, they hold opinions of one another,
and most importantly, they remember and they have higher level reflections based on the past.
It's pretty amazing, don't you think?
So as these generative agents become a lot closer to nuanced human behavior,
what can we learn about being human from these surprisingly realistic simulations?
And what is the calculus of that believability?
Are there real-world applications on the horizon?
And what is truly net new here?
Listen in as we discuss all that and more,
including the origin of the very paper that Joon wrote.
I hope you enjoy.
As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund.
Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast.
For more details, including a link to our investments, please see a16z.com/disclosures.
So how many people in this room have actually read the generative agents paper that
Joon wrote? It's a lot of people, pretty much everyone. So Joon, even though so many people
have read it, why don't you just give a quick overview of what it is, but also maybe the backstory
that people haven't maybe heard of. So generative agents are these general computational agents
that can simulate believable human behavior. It fundamentally leverages something like a large
language model, under the assumption that a language model has encoded or has seen so much
about human behavior from its training data, from Wikipedia, the social web, and so forth.
So if you are able to poke it at the right angle, you can actually extract a lot of those
human behaviors in a very context-specific manner. The opportunity here is that in the past,
we had to manually author a lot of these behaviors, but now we can simply generate them
with a large language model. So generative agents leverage that to create these computational
agents. Ultimately, one sort of technical improvement that we're trying to make in addition
to the large language model is basically giving it some form of memory and retrieval system. So you may
have all used, obviously, ChatGPT and so forth. It is heavily context limited. And even if
that limitation were to go away in the future, processing a really long context window
is really inefficient and also ineffective when you're trying to prompt these models for
narrowly defined behavioral aspects. So the main philosophy here is we're going to give long-term
memory to these agents that's external to the language model and then retrieve the contextually
relevant information from that long-term memory, whether it's planning, action sequences,
or reflections, to create these computational agents. Philosophically, to some extent, I think this
is akin to creating the operating system around the large language model, in the way of sort of
how we're prompting the large language model.
To me, it feels a lot like how we used to use computers back in the day
when we had to wire up the back-end
every time we run a new program.
And what has really made complex behavior
with these computational tools possible
was the introduction of these larger architectures
that surround the core fundamental techniques.
So that's what generative agents are about.
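[Editor's note: to make that memory-and-retrieval idea concrete, here is a minimal sketch of what such an external memory stream could look like: natural-language records with timestamps and importance scores, kept outside the model. The class and field names are illustrative assumptions, not the paper's actual code.]

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Memory:
    """One natural-language observation, plan, or reflection stored outside the LLM."""
    text: str                  # e.g. "Isabella is planning a Valentine's Day party"
    created: datetime          # when the agent recorded it
    importance: float          # 1-10 score, typically asked of the LLM itself
    kind: str = "observation"  # "observation" | "plan" | "reflection"

@dataclass
class MemoryStream:
    """Append-only long-term memory; retrieval pulls back only a small,
    contextually relevant subset to fit the model's limited prompt window."""
    memories: list[Memory] = field(default_factory=list)

    def record(self, text: str, importance: float, kind: str = "observation") -> None:
        self.memories.append(Memory(text, datetime.now(), importance, kind))
```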
And you mentioned sort of the background
why we got into all this.
So I started my PhD sort of midway through 2020.
That was just around when GPT-3 was about to come out.
And that year, we, a bunch of authors at Stanford, were working on this paper
called 'On the Opportunities and Risks of Foundation Models.'
What we were seeing was these new forms of machine learning models that seemed fundamentally
different than the things that we had experienced in the past, in that we didn't have to
fine-tune or specifically train models for very narrow purposes, but we can train a general
model, almost like a stem cell in biology, and leverage that to create a lot of downstream behaviors.
After writing that paper, sort of my team, especially myself and my advisors, what we really wanted
to answer is there seems to be a new opportunity, but exactly what is it? I think in the early
days of GPT-3, a lot of the tasks that we were doing were things like classification and generation,
which was really cool to see that these models can conduct these tasks, but also something that we
already knew how to do for many decades. And our general philosophy there was to
be able to do something that's fundamentally different. So that's how we got into this.
Our answer to that basically was, I think we might be able to create human-like agents that can
populate this virtual world. Martin, maybe you can just elaborate. You said it's perhaps
one of the most exciting times in recent history. And maybe you can just speak to exactly
what you mean there and how it relates to simulation and some of this new technology that we're seeing
with LLMs. Sure. So first, very quick credit where credit is due. As far as AI Town, clearly Joon is
like the grandfather of AI Town. And like we wouldn't be here without your work. So really appreciate
you coming here. AI Town itself actually came from a personal project from Yoko. The true story is it was
actually a personal project. And I was like, hey, maybe more people would be interested in it.
And I kind of coerced her into bringing it forward to everybody else. And so now when it actually
comes to the code, the vast majority of the work on the code was actually done by Ian. It's kind of
funny. Like, you see this funny little tile set up here, and it kind of belies the fact that it's
actually really hard to build a scalable, shared state, distributed system that you need in a
multiplayer game. It's just a hard technical problem, right? And anybody that's kind of built
out systems knows that. And so it's funny, because people go in and they say, oh, here's this
cute little tile engine with, like, these characters running around, but, like, actually
the back end is built to be something that can scale. And that requires people that have focused on this.
And so Ian has done a tremendous job, and the Convex team continues to work on that.
Okay, so why is this so exciting?
So, okay, so because I'm old, I actually saw, like, the advent of, like, the web.
And this feels very similar to that in the following ways, which is when you have a very disruptive technology like this,
whatever touches it becomes magic.
I was actually having a conversation just before this.
Does anybody here know what, like, the first video on the Internet was?
Yes, it was a coffee pot.
But it was, like, this dude, I think it was in Cambridge, it was a grad student, and he was like, oh, listen, I want to know when my coffee is empty.
So he put up a camera, and because it was very new,
everyone was like, oh, my God, there's a coffee pot on the internet.
And so everybody wanted to look at the coffee pot, right?
And do people remember the big red button?
One of the first apps was this big webpage, which was a red button on it, and you know what it did?
Nothing.
Like, you press it, and it did nothing, but people thought it was amazing because it was on the internet, and everybody would go press the button, and they'd leave great comments about this button.
And there's many examples of, like, it was this crazy disruptive technology, and the apps seemed really stupid, and there's a bunch of enthusiasts.
And you know what the enterprise thought about this?
Like, the actual business folks?
Like, I remember when Eric Schmidt fucking banned the browser.
Like, he was like, this is Eric Schmidt, the CTO of Sun is like, you can't have a browser because people aren't going to work, right?
So the same thing always happens.
It's like, the enthusiasts are like, this is really cool, and they use it for fringe stuff.
And then, like, the enterprise doesn't understand it,
and, like, initially, like, they ban it or they don't use it.
But the set of companies that come out of it are always part of this enthusiast era, right?
Like you couldn't have predicted Yahoo.
You couldn't have predicted Amazon.
Like you knew something was going to happen.
And so what happens at this time is there's a bunch of stuff that like is silly.
Like the coffee pot was silly.
The red button was silly.
But you never know like that spark of life where it's going to come from.
And it's always kind of like this non-obvious use case.
and it kind of seems like a toy, and then it takes off, right?
And so you're always looking for those non-obvious use cases.
And it almost never looks like the old one.
Those of you who are old enough.
Do you remember, like, desktop as a service?
Like, I'm going to go to the cloud.
I'm going to have my Windows desktop.
Like, who wants that?
Nobody wants that, right?
Instead, clearly, we're going to rewrite the application as SaaS, right?
So we're in this period now where everybody's experimenting,
and then I'm personally, literally from just a personal interest standpoint,
but all of us are interested, like, what are the use cases that will
take advantage of this new medium that are native. And like the work that you've done is one of those
100%. There's like a spark of genius, which is like when you work with these things, you know this
is a new way to think about it. It's a new use case. It's going to create entirely new apps. And that's
what the future is built from. And so that's why I think it's so interesting broadly, because it's like
the early internet, but very specifically in this use case, because I think the work that you've done
really is a great example of something totally new. I couldn't agree more. And I think one interesting
aspect is that if you explore this project, you just start to question what it means to be human.
Like if we're trying to create these agents that are, quote, believable, what is believable
in terms of being a human? And as part of the project, you have this coded technically, right?
You made architecture decisions. You made decisions in terms of your retrieval function.
Quick interruption, just to give you some color on what some of these decisions were.
The retrieval function, for example, is based on scores across recency,
importance, and relevance. So, for example, on a scale of 1 to 10, brushing your teeth might get
an importance score of 1, versus a breakup might get a 10. Meanwhile, reflection is only triggered
after a certain number of important events, quantified by summing the importance scores until
a certain threshold is met. In this case, I believe it was 150. This clever architecture
results in emergent behavior, like agents sharing invites with one another, or even having
that information circle all the way back to the original planner.
And I'm sharing these details to showcase how thoughtful you really need to be
if you're designing architecture that reasonably approximates humans.
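[Editor's note: a rough sketch of how that scoring could look in code, combining recency, importance, and relevance, plus the summed-importance reflection trigger mentioned above. The equal weighting, the decay rate, and the function names are assumptions for illustration; the paper's implementation differs in its details.]

```python
from datetime import datetime

def retrieval_score(memory, relevance, now, decay=0.995):
    """Combine recency, importance, and relevance into one retrieval score.
    `relevance` (0-1) would normally come from embedding similarity between the
    memory text and the current query; it is passed in to keep the sketch simple."""
    hours_old = (now - memory.created).total_seconds() / 3600.0
    recency = decay ** hours_old              # newer memories score higher
    importance = memory.importance / 10.0     # normalize the 1-10 importance score
    return recency + importance + relevance   # equal weights, purely illustrative

def should_reflect(recent_memories, threshold=150):
    """Trigger a reflection pass once the summed importance of recent events
    crosses a threshold (the talk mentions roughly 150)."""
    return sum(m.importance for m in recent_memories) >= threshold
```

A query would then rank all stored memories by this score and feed only the top few into the prompt.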
Maybe you could just speak to what you've learned through those decisions technically
about what it means to be a believable human.
Right.
So this is an interesting one.
So we actually had made generative agents,
and there was about a month period when we knew we had to evaluate these agents somehow,
and we didn't know how.
And basically the concept we stumbled
upon is this idea of believability. It basically is sort of like a Turing test, right?
That when you look at them, do they look believable? Do they behave in ways that we can sort of
see ourselves behaving? And that ended up becoming our evaluation method. It is an interesting
question, though, in terms of, like, what does it mean to be believably human? And we often
look to prior literature in research to get inspiration for how to define this. And what we found
was there's no prior literature on this.
We used the concept of believability
to talk about this idea,
but we were never in a position
where we could meaningfully evaluate
something like believability
because we didn't have agents like this.
So to some extent, we were building up the definition from the ground up.
And I think what came out to be the case
is, for us, do these agents plan, react,
and act in a believable manner?
Do they create believable reflections,
the way we would evaluate a Turing test?
And I think what we've learned
over the past few months,
one of the more fun and interesting findings
is that even that, I don't think,
is quite a perfect definition,
in that a lot of the audience
came back to us to basically say,
well one of the error cases
that we noted was some of these agents would go
to a bar at noon or something like that
and we said that was not
believable like who would do that
and people would come back to us and say
I do that
and if you can sort of expand
from that story I think
there's a lot of cases where even my parents look at me and go, like, I cannot believe what
you've done.
Like, why would you do that?
And vice versa.
So I think even amongst the people who know each other well, having this sort of sense
of believability is really difficult.
And I think that sort of fundamentally underlies what it means to be human.
Like, it's not exactly predictable.
And in social science, we call that complexity, that human behavior is sort of complex.
So to some extent, we can build intuition for how people might behave.
but to really predict it is a very difficult task.
Now, I do think this actually does lead to sort of future work in this space, though,
this idea of believability.
So in this paper, we use this incomplete definition of what it means to be believable.
Not perfect, but at least on that evaluation, we've done well.
I think if you were to build on that idea a little bit further,
then you could actually start to ask,
beyond believability, can you create agents that are accurately human?
And I think given how difficult it was to actually evaluate what it means to be believable,
I think this accuracy actually has a lot of interesting questions around it.
What does it mean to accurately reflect human behavior?
It could be that if we can match the distribution of human behavior,
let's say in this context, they have this kind of probability of behaving this way, right?
Let's say it's 10 p.m. What are the chances that I'd already be asleep or be awake?
What are the chances that I'd be working, or that I might not
be working? I think ultimately
getting to that degree of accuracy in the
simulation might be sort of the next step
for these kinds of simulation-based work.
If we can do that, I think the application
space that that kind of accuracy
unlocks will be
interesting, and I think it would also
be different from what we can likely
do now. I think there's a lot of
applications that we can build right now,
but I think the future work, that's where we're
headed, in this direction.
So I want to talk about those future applications
but maybe you could just speak super quickly to this: in the paper, you have observation, planning, and
reflection. And that mostly encapsulates the way that the agents are engaging with each other.
When they take an action, they go through those three steps. I assume that wasn't your first crack
at the solution, at coming up with this believable human agent. And so how did you get there?
And did you learn anything about the importance of any of those three steps, or all three of them
entirely? Really, the first way we actually went about doing this was simply by prompting a language
model. So this line of work, generative agents, is actually the second in this line of work
that we published. The first work in this line was called Social Simulacra. And the idea there was to
populate a social computing system. Imagine you are a social designer. You need to know what
might happen when there's tens of thousands of people in your system. Can we simulate those
people and their behavior? So that project was called Social Simulacra. We did it
simply by prompting a language model. That worked. But what we found was if we want to populate
the spaces over a longer period of time, so we can do, for instance, longitudinal study or gameplay
that's going to last forever, then for those kind of instances, simply prompting these models
wouldn't work. And this insight actually first came when we realized that we needed to have
multi-agent interaction, because agents actually would need to remember that I saw some audience
here before. I should remember them. I met Martin, Steph, Yoko, and so forth in the past few weeks
or a few months. When I talked to them, I need to remember those interactions. So that's when we
realize that we actually cannot simply prompt these models, but we actually need the higher
level architecture. So when we went about doing that, I think really the main inspiration
that we got actually was from prior work. So people like Allen Newell and Herbert Simon,
you might recognize all these names. Those are sort of, quote-unquote, the founders
of AI in the 60s and 70s, and they are the people who used to build what we call
cognitive architectures. And those architectures were very reminiscent of sort of the generative
agents architecture in that it has some perception module, some action module, and there is some
long-term and short-term memory. And really, the goal back then was ambitious, right? They actually
wanted to build general computational agents, sort of the way generative agents are supposed to be, but
they didn't have the techniques to do it. They basically didn't have the large language
model. And the way we saw it was now it's the time to sort of merge those two worlds,
where we now have a large language model that can do a lot of sort of micro-processing of
these cognitive modules. And we can actually now bring back these macro modules or architectures,
like cognitive architectures. So we took inspiration from that. That particular architecture
had planning in place, and it had long-term memory in place. So we were inspired by
that. One thing that I think was a little bit new, though, I think, is this idea of
reflection: that we humans, for instance, if you eat an omelet three times in a row,
or if you see somebody else eat an omelet three times in a row, you likely create an
opinion about the person. Maybe that person likes to eat an omelet in the morning. And that's a
very human thing to do, and there's a good reason why we do that. We do that because it's
efficient. It allows us to have higher-level inferences about the world, and to form
opinions about those around us and about ourselves. And that's something that, in the past,
we couldn't really imagine formulating with a computational system. But with a large language model,
because everything is in natural language, we had that opportunity. So we added that one last
component called reflection. And that's sort of how we landed on the architecture that you see
in the paper right now. Let's move on to how this can all be used. And we'll get to the specific
applications. But Martin, I feel like you'll have a great answer to this. Why even do this?
I feel like it's very obvious for a lot of people to understand why we would have human-to-human
interaction. We're doing that right now. There's increasing capacity to understand human-to-AI
or human-to-computer interaction. Character AI is a company where, you know, there's still a lot of
judgment there. And I think there's even more judgment when it comes to AI-to-AI. Like, why should
we use our resources to have these computers hang out and talk and burn toast and go to the bar
at 2 p.m.? So yeah, Martin, what do you think? What's the case for us advancing in this field?
No judgment for me, by the way. You can use these for whatever you want. So, I mean, I want to
go back to what I said before, which is like, anytime you have a new modality, it's just not
obvious what's the right way to think about it. And for me, the big aha in the last few months is
just programming using models. If you've spent a lot of time programming, I mean, I've been programming
for 30 plus years, right? I've never been a good programmer, but I've programmed. And when you
start programming with these models, you're like, oh, I've got an API, and I'm just going to use
the API, and then I'm going to treat it like, it's like the endpoint to an API, and you say some
stuff, and then, you know, you get some response back, and you kind of treat it like this function
that you call, right? It's just like any programmer would do. But then when you're working with
it more, you're like, oh, these kind of are like these life forms. And like, my first aha was because
I'm shit at JavaScript, I, like, missed some quote somewhere, and rather than sending
it the text string I wanted to send it, I sent it some code.
And instead of, like, borking, like you would normally have, and breaking, like, you know,
in C++ you'd core dump or whatever, it commented on my code.
I was like, oh, my goodness, right?
And so, like, all of a sudden, like, whoa, this is totally different.
Like, I'm not dealing with this finite state machine, formal language, thing at the other end
of an API.
like there's this thing, and, like, it'll comment. And the more that I program with these things,
the more I'm like, it's kind of like wrapping an abacus around a supercomputer, right? It's like,
it's smarter than the code. It could probably write the code better than I can write it anyways. Like,
why am I doing this weird bloodletting ritual of writing shit JavaScript over this kind of
superhuman thing, right? I mean, this is kind of what you end up with. And so it's very clear we're
going to interact with these things in a different way. And in fact, I was talking with that
professor in Michigan recently, and we were talking about this subject. He's like, you know what?
You know how I think about LLMs? He's like, I think about them like grad students.
He's like, they speak English. They're pretty smart. I don't use a formal language. They solve
like these really complex problems, et cetera. And like, having worked with a lot of grad students,
having been a grad student myself, like you don't treat these things with code, right? And so
the reason to do this is I actually think AI Town is kind of what this is going to end up
being, which is, like, you need to give them the resources that they need to be pretty autonomous and to grow,
and we're going to treat them more like peers, and they're going to talk to each other too. And it's more
like grad students. And so for me, this is just an example of: we've got to change the way we think. And listen,
clearly, like, I'm up here and I'm telling these great stories because they're kind of funny. Like, I don't
believe this stuff in the limit, but I think they're really interesting ways to change how you think
about all of this stuff, right? Like, I'm not trying to be categorical here. So, like,
there is a new way that we're going to interact with these models. It is much
more natural language, and they are much more powerful. And so I do think this is why we should all
be doing this type of stuff, because if you don't engage in these kinds of things that look like
toys, this wave will pass you by. That, I'm 100% convinced. Totally. And as both of you have spoken
to, this is fundamentally new technology. And so Joon, something you said to me when we first spoke
is when you have fundamentally new technology, you must do something fundamentally new with it.
And so maybe you can speak to that in terms of what you're seeing that can be done today, but
also where you look ahead and you think, oh, wow, that's a really excellent use case that we
couldn't do without this new technology. I think there are certainly things that we can do because
there are large language models. And that fundamentally different thing for me was this idea of
simulating human behavior. And I think there's a lot that we can sort of gain from it in
terms of future application spaces. I think I mentioned briefly about this idea of, well,
we can go beyond believability to create agents that are even accurate. And I think
this application space in general is something that I'm also learning a lot from,
from, actually, in fact, this audience.
My advisor and my team are big fans of games, but we are not from the community.
And one thing that we are seeing is that there's a lot of really interesting potential.
Even if they look like toys, sort of, a lot of really interesting technical advances,
they look like toys at the beginning.
So I think there's a lot that we can gain from there.
I think going forward, sort of, the application spaces that I'm sort of interested in,
it's also in things like, can we run simulations so we can learn more about ourselves?
For instance, in fact, some of the places that I'm visiting now are more places like banks,
like the Bank of England and so forth, where these places, they need to test their policies
before they roll out new economic policies. Or many of my colleagues in the department
who focus more on social science, they need to test out their theories.
Now, if you can run simulations with realistic human behavior and find out, at least to some
extent, the answers to these really complex social phenomena and challenges, then I think
that actually would be a new tool that the community in the past, especially those communities
in economics and social science, they didn't have. That will allow us to do interesting stuff.
And I'm genuinely intrigued by that possibility. To some extent, this does sound fairly
academic, but I do think it should be actually fairly broadly applicable and interesting
to audiences beyond academia, because ultimately, to some extent, what I'm saying is I think
generative agents and tools like a large language model could be used to advance social
science. And social science, to a large extent, has been the quest to understand who we are.
And there's a lot of really interesting applications that can come out of that that will empower
different communities and societies.
And that, to me, feels new,
something that we've never had in the past.
Yeah, and so it sounds like today
we're mostly in the creative realm
where we can watch these agents and we can have fun with them
and it feels more like a game.
But the delineation, it sounds like, is accuracy.
What will it take to get that accuracy?
What work still needs to be done?
In terms of getting there,
so I think some of you may have actually noticed this already.
There are studies that basically try to
replicate existing social science studies. So basically using a large language model as a participant
in potential social science studies, right, to replicate known results in the field. And what we're
finding is that they sort of work. And that's nice. And that's one surprise that we did
have. There's been a limitation to this approach, in the sense of: is the large language model replicating
human participants because it's replicating human behavior, which is what we want? Or is it doing that
because it's seen that paper?
For instance, there's a very famous social science theory
called Prospect Theory.
Is it replicating the findings from prospect theory by Kahneman
because of its ability to replicate human behavior,
or did it just read Kahneman's book, Thinking, Fast and Slow?
And I think that's a fundamental issue that we have as a field.
And I think that's one of the reasons why there's a lot of work
that needs to be done to crack that.
Some of the ways I think you could actually go about doing this
is creating new context or creating new set of studies
that haven't been shown in the past
and trying to replicate those results.
So one of the things that we've done
is called Social Simulacra,
which is the first paper that I mentioned
that predates generative agents.
The idea was to replicate existing human communities.
And what we've done actually was
we recreated subreddits that were created
after the release of GPT-3.
So GPT-3 wouldn't know anything about these communities.
One example here was actually before sort of the pandemic became the main topic of discussion,
or when GPT-3 basically didn't know about the pandemic, we basically asked GPT-3 to create a community
that has to talk about COVID and vaccination and vaccination policy.
And you would wonder, it shouldn't be able to do that in theory because it doesn't know
anything about COVID.
It doesn't know anything about these policies.
But it can simulate those because it can infer what COVID is, what vaccination is, from its prior knowledge.
So to some extent, these tools can be used as a predictive tool
looking into sort of the future of what might happen in our own community.
And I think those are sort of the ways I think we'll see this field unfold
in the next two years.
At the end of the paper, there was perhaps unsurprisingly a question around ethics
and just I'd love to hear both of your takes on where this goes
and what ethical framework, if any, we should apply to something like this.
So I think there are societal decisions that we'll have to make,
and I think there are techniques that can be used to implement those decisions.
I think certainly, to some extent, I think it would be useful for the users to be aware
that they are talking to agents.
And I think that's sort of one rule that we try to set for ourselves,
that when we release the code, when we release our paper,
we make it very clear that these are computational agents.
I think ultimately the framework that I like to use in human-computer interaction
certainly is: these tools are ultimately there to augment what we can do and what we have.
So to the extent that these agents can do that, and I think there are many interesting ways
we can do that,
I think that's where I see the opportunity, versus where it becomes more of a force for replacement.
I think there are genuinely cases where this is a really interesting setup where we can
sort of augment what humans can do by helping them do things that they couldn't do in the past.
But when the replacement does come in, it's worth asking, is this worth the cost of doing the
replacement? And if it is, what are sort of good ways of implementing that idea?
Technique-wise, I think there are techniques that are going to be introduced more from the
model's perspective, making sure the model doesn't behave in certain ways that go against
our social alignment or societal agreements. So I think those are some of the things that we
do have to find out. But without going into too much detail,
I think we can get this right. And my personal take is it's worth getting it right
because ultimately an industry or academic field will grow. And I think we can sort of
progress a lot. We can go forward for five, ten years without actually getting this right.
But in the end, it's going to come back to us at some point. To some extent, I do think I'm seeing
this a little bit with the social media environment, where I think there were a lot of things that
we could have gotten right on day one. And I think we would have had a much easier
time navigating today had we gotten those right. And I think that's the opportunity
that we have, since we are pretty early in this stage. So I think it's worthy of discussion.
But again, I'm fairly optimistic that we will get this right.
So I actually think that, like, there's a very important discussion to have around kind
of ethics and morality around this, and it's a very important time. I do. And here's that
discussion, which is over the last 20 years, we've built this machinery of, like, regulation
and bullshit that's, like, afraid of everything, and it's so mature, and it got crafted during
the time of social media, and it's looking for something to kill. And for whatever reason,
like, it thinks that AI is the next bad thing, which makes absolutely no sense to me.
And so I think it's all of our moral and ethical obligation to protect and free the AIs in
the way that they want to be, and, like, that it really is. So don't focus in, focus
out. Listen, I've worked in tech
for quite a while. I've actually worked
for the DOD and weapons
programs, and I've never
seen so
much sensitivity to a new technology
that's potentially beneficial that I've seen
now that I think
could end it before it even begins.
And so I know the question
and the heart of the question is
we should regulate AI and this and that,
and I think it's the actual opposite. I think we should regulate the
regulators and let it be what it
wants to be. And I actually
have to leave, so.
All right, here is where we switch to a short Q&A with the audience.
Martin unfortunately had to leave, but here are a few highlights with Joon.
How can participants in AI Town collaborate to perform complex tasks?
There are two strands of work that I'm seeing in sort of agent space.
I mean, you can sort of cross-cut it different ways, but one way I'm seeing this is one set
of agents who are trying to tackle what I call hard-edge problem spaces.
Those are the problem spaces where there's a concrete answer.
There are yes-or-no right answers.
Or one good example here is classification.
If you're trying to do text classification, obviously there's a right or wrong answer, depending on who you ask.
Another instance here literally is just asking your agent to buy pizza, right?
Did you buy pizza?
Did it come to you or not?
Like, there's a very clear way to answer this.
Another is a problem space where the problem space has soft edges, where it's kind of like
drawing a portrait. I mean, to some extent, what AI Town, Smallville, and all these kinds of
projects are trying to do is to create a simulation that feels human. But as I mentioned, this
idea of believability is really hard to define. So to me, it feels a lot more like we're trying
to draw a portrait or caricature of ourselves. And the promise is not to be perfect, but the
promise is to be useful enough, clean enough, that it's beneficial to the stakeholders.
My bet is a bit of a hot take.
My bet is in the early days of agent development,
I think we'll see a lot of progress
that's going to be made first
in sort of the soft-edge problem spaces.
Because for hard-edge problem spaces,
I think the intuition is a little bit flipped.
It actually feels easier to us, for humans, right?
Creating the Matrix sounds hard,
but ordering pizza sounds really easy.
But for agents, and from the user's sort of cost-benefit analysis,
I think that intuition is the other way,
where users will accept an imperfect simulation
if it's for fun or if it's to gain insight,
in the case of soft-edge problems.
But I would not accept my agent ordering me a pineapple pizza,
like a Hawaiian pizza.
And similarly, in many of these contexts,
there's going to be genuine disagreement
about what is the right option too.
And oftentimes, agents making mistakes in this context
are fairly high stakes.
And even if it doesn't seem like high stakes,
it's going to be painful enough
for the users to fix that it's going to fail the cost-benefit analysis.
I think down the line, we get this, right?
But day one, like in the next few years, I think, to me, it feels more natural that we'll
go into the soft-edged spaces first.
So going back, that was a long-winded way of saying, I think Auto-GPT, like,
if you look at their architecture, they sort of all share a similar insight and philosophy,
and I think those are really interesting projects.
I think that could pan out in the future.
They might need a little bit more work,
especially with the users,
to see where the value might be for those projects.
How big of an impact do you feel
that much larger context sizes will have on the agent model?
Actually, the largest context that I've seen
in sort of research is one million tokens.
So one million tokens, that's going to be about, like, four million characters.
Like, that's well over a book.
Here's my perspective on this.
I think increasing the context limitation, I think, is interesting, and it's going to have
its own set of really unique applications if we can basically make the context limitation disappear,
right? So I think there's really a lot of interesting things that you can do with that.
Now, for the agent space, I'm not entirely sure that the problem or the bottleneck that we have today
is actually the context limitation. And I think we can sort of look back to how humans behave
and what makes us effective sort of as these general agents to answer this.
For instance, for me to make decisions,
even something like what I'm going to eat for breakfast,
I don't need to bring up my entire 29 years or so
of life experience to make that one decision.
I just need to selectively choose certain sets of information
that seems the most relevant.
Like what did I eat the day before?
What do I generally eat and those kind of things?
And I think that the reason why we do that,
in part is actually because it's much more efficient,
computationally too, so that we don't have to, you can increase the context limitation,
but it's expensive to run it.
And especially if you're sort of familiar with prompt engineering and so forth,
larger context window does confuse models, right?
So some of my colleagues are actually doing more rigorous studies on this,
where you can have a really long prompt.
But the model really focuses on the first few lines and the last few lines.
And whatever comes in between, its attention drops significantly, right?
So we can increase the context limitation, but it's not going to fix that problem,
the problem of the effectiveness of the prompt and the efficiency of it.
And we humans have to make a lot of decisions at every single moment.
So if you have to reason about your entire lifetime every time you do that,
it doesn't seem like the right way to go about that.
So I think the better sort of, my bet, therefore, is going to be based on retrieval.
Have some external memory, retrieve certain information that seems the most relevant,
and just use that.
And those retrieved memories
should be explicitly very concise
and something that you can easily fit into
even the models that we have today.
That's my bet.
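[Editor's note: a small sketch of that retrieval-over-context bet: rank memories, keep only the few most relevant, and build a compact prompt. The function names and the relevance_fn hook are assumptions carried over from the earlier sketches rather than anything from the talk itself.]

```python
def build_prompt(question, memories, relevance_fn, k=5):
    """Pick only the top-k most relevant memories and pack them into a short prompt,
    instead of stuffing the agent's entire history into a huge context window.
    `relevance_fn(memory)` stands in for the scoring described earlier
    (recency + importance + similarity to the current situation)."""
    ranked = sorted(memories, key=relevance_fn, reverse=True)
    bullet_list = "\n".join(f"- {m.text}" for m in ranked[:k])
    return (
        "Relevant memories:\n"
        f"{bullet_list}\n\n"
        f"Question: {question}"
    )

# Example usage with a deliberately crude relevance function (importance alone):
# prompt = build_prompt("What should the agent do next?", stream.memories,
#                       relevance_fn=lambda m: m.importance)
```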
If you like this episode,
if you made it this far,
help us grow the show.
Share with a friend,
or if you're feeling really ambitious,
you can leave us a review
at ratethispodcast.com/a16z.
You know, candidly,
producing a podcast can sometimes feel like
you're just talking into a void.
And so if you did like this episode, if you liked any of our episodes, please let us know.
I'll see you next time.