The a16z Show - From Sims to Sapiens: Crafting Reality with Code

Starting point is 00:00:00 They actually wanted to build general computational agents, sort of the way generative agents are supposed to be. But they didn't have the techniques to do it. They basically didn't have the large language model. And like the work that you've done is one of those 100%. There's like a spark of genius. What does it mean to accurately reflect human behavior? But the set of companies that come out of it are always part of this enthusiast era, right? Like you couldn't have predicted Yahoo.

Starting point is 00:00:27 You couldn't have predicted Amazon. Like you knew something was going to happen. Can we run simulations so we can learn more about ourselves? A few weeks ago, the a16z infrastructure team ran an event in the San Francisco office. The topic: generative agents. These are autonomous characters designed to simulate human behavior, derived from a recent but game-changing paper called Generative Agents: Interactive Simulacra of Human Behavior.

Starting point is 00:00:56 Developers from all around the city came to hear the lead author, Joon Park, speak alongside a16z general partner Martin Casado. And in this panel, they discuss how this paper and the advancements in large language models have opened a new window, expanding the dynamism of simulation: instead of binary logic, we're using probabilistic thinking and the ability to incorporate new information. So what does that really mean? Well, instead of your character in The Sims following very specific rote rules, with generative agents, a father may go outside because he notices his son, another may take their breakfast off a stove because they notice it's burning, and another may even opt into a Valentine's Day party invite,

Starting point is 00:01:39 and then elect not to show up. All very human behaviors. Now, the architecture described in the paper is, of course, intentionally designed by Joon and team, and it's a combination of a seed identity for every agent, and then functions that cause each one to do three discrete things: to observe, to plan, and to reflect. And these architecture decisions ultimately generate unexpectedly spirited conversations just like this. Hey, Lucky, it's so great to see you. How have you been? I've been dying to hear about your space adventure. Hey, Kira, I've been fantastic. My space adventure was out of this world. I can't wait to share all the details with you. Or even this. I've been trying to find my way. It's been a chaotic journey to say the least.

Starting point is 00:02:29 Embrace the chaos, dear Kurt, for within its turbulence lies hidden truth. Seek the depths of the unknown and unravel the mysteries that burden your soul. And here's the thing. They don't just interact with each other. Again, they wake up, they cook, some paint while others write, they hold opinions of one another, and most importantly, they remember and they have higher-level reflections based on the past. It's pretty amazing, don't you think? So as these generative agents become a lot closer to nuanced human behavior, what can we learn about being human from these surprisingly realistic simulations? And what is the calculus of that believability? Are there real-world applications on the horizon? And what is truly net new here? Listen in as we discuss all that and more, including the origin of the very paper that Joon wrote. I hope you enjoy. As a reminder, the content here is for informational purposes only,

Starting point is 00:03:29 should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures. So how many people in this room have actually read the generative agents paper that Joon wrote? It's a lot of people. Pretty much everyone. So Joon, even though so many people have read it, why don't you just give a quick overview of what it is,

Starting point is 00:04:11 but also maybe the backstory that people haven't maybe heard of. So generative agents are these general computational agents that can simulate believable human behavior, and they fundamentally leverage something like a large language model, under the assumption that a language model has encoded or has seen so much about human behavior from its training data, from Wikipedia, the social web, and so forth. So if you are able to poke it at the right angle, you can actually extract a lot of those human

Starting point is 00:04:38 behaviors in a very context-specific manner. The opportunity here is that in the past, we had to manually author a lot of these behaviors, but now we can simply generate them with a language model. So generative agents leverage that to create these computational systems. Ultimately, one sort of technical improvement that we're trying to make in addition to a large language model is basically giving it some form of memory and retrieval system. So you may have all used, obviously, ChatGPT and so forth. It is heavily context-limited.
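
[Editor's sketch, not from the paper's code: one way to picture "memory and retrieval" that lives outside the model is an append-only log from which only a few relevant records are copied into each prompt. The names `MemoryStream` and `build_prompt` are hypothetical, and plain word overlap is an assumed stand-in for a real relevance measure.]

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStream:
    """Append-only log of an agent's observations, kept outside the model."""
    records: list = field(default_factory=list)

    def add(self, observation: str) -> None:
        self.records.append(observation)

    def retrieve(self, query: str, k: int = 3) -> list:
        """Return the k records that share the most words with the query.

        Word overlap is only a placeholder for a real relevance measure.
        """
        query_words = set(query.lower().split())

        def overlap(record: str) -> int:
            return len(query_words & set(record.lower().split()))

        # sorted() is stable, so ties keep their original (oldest-first) order.
        return sorted(self.records, key=overlap, reverse=True)[:k]


def build_prompt(memory: MemoryStream, situation: str, k: int = 3) -> str:
    """Assemble a short prompt from a few retrieved memories, not the whole log."""
    lines = [f"- {m}" for m in memory.retrieve(situation, k)]
    return (
        "Relevant memories:\n" + "\n".join(lines)
        + f"\nSituation: {situation}\nWhat does the agent do next?"
    )


memory = MemoryStream()
memory.add("Breakfast is on the stove")
memory.add("Met Kira at the cafe")
memory.add("Painted in the studio")
print(build_prompt(memory, "the stove is burning breakfast", k=1))
```

[The point of the design is that the log can grow without bound while the prompt stays short, which is the trade-off described in this part of the talk.]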

Starting point is 00:05:09 And even if that limitation were to go away in the future, processing a really long context window is inefficient and also ineffective when you're trying to prompt these models for really narrowly defined behavioral aspects. So the main philosophy here is we're going to give these agents long-term memory that's external to the language model. And then we retrieve the contextually relevant information from that long-term memory,

Starting point is 00:05:35 whether it's planning, action sequences, or reflections, to create these computational agents. Philosophically, to some extent, I think this is akin to creating the operating system around the large language model. In the way we're prompting large language models,

Starting point is 00:05:51 to me, it feels a lot like how we used to use computers back in the day when we had to wire up the back end every time you ran a new program. And what has really made complex behavior with these computational tools possible was the introduction of these larger architectures that surround the core fundamental techniques. So that's what generative agents is about. And you mentioned sort of the background

Starting point is 00:06:14 of why we got into all this. So I started my PhD sort of midway through 2020. That was just around when GPT3 was about to come out. And that year, we, a bunch of authors at Stanford, were working on this paper called On the Opportunities and Risks of Foundation Models. What we were seeing was this new form of machine learning model that seemed fundamentally different from the things that we had experienced in the past, in that we didn't have to fine-tune or specifically train models for very narrow purposes, but we could train a general model, almost like a stem cell in bio, and leverage that to create a lot of downstream behaviors. After writing that paper, sort of my team, especially myself and my advisors, what we really wanted

Starting point is 00:06:59 to answer is: there seems to be a new opportunity, but exactly what is it? I think in the early days of GPT3, a lot of the tasks that we were doing were things like classification and generation, which was really cool to see that these models can conduct these tasks, but also something that we already knew

Starting point is 00:07:18 how to do for many decades. And our general philosophy there was to be able to do something that's fundamentally different. So that's how we got into this. Our answer to that basically was, I think we might be able to create human-like agents that can populate this virtual world. Martin, maybe you can just elaborate.

Starting point is 00:07:36 You said it's perhaps one of the most exciting times in recent history, and maybe you can just speak to exactly what you mean there and how it relates to simulation and some of this new technology that we're seeing with LLMs. Sure. So first, very quick, credit where credit's due. As far as AI Town, clearly Joon is like the grandfather of AI Town, and we wouldn't be here without your work,

Starting point is 00:07:57 so I really appreciate you coming here. AI Town itself actually came from a personal project from Yoko. The true story is it was actually a personal project, and I was like, hey, maybe more people would be interested in it, and I kind of coerced her into bringing it forward to everybody else. And so, now, when it actually comes to the code, the vast majority of the work on the code was actually done by Ian. It's kind of funny. Like, you see this funny little tile set up here, and it kind of belies the fact that it's actually really hard to build a scalable, shared state distributed system that you need in a multiplayer game. It's just a hard technical problem, right?

Starting point is 00:08:30 And anybody that's kind of built these systems knows that. And so it's funny because people go in and say, oh, here's this cute little tile engine with, like, these characters running around. But, like, actually the back end is built to be something that can scale. And that requires people that have focused on this. And, like, so Ian has done a tremendous job. And the Convex team continues to work on that. Okay, so why is this so exciting? So, okay, so because I'm old, I actually saw, like, the advent of, like, the web.

Starting point is 00:08:52 And this feels very similar to that in the following ways, which is when you have a very disruptive technology like this, whatever touches it becomes magic. I was actually having a conversation just before this. Does anybody here know what the first video on the internet was? Yes, it was a coffee pot. It was like this dude, I think it was in Cambridge. It was a grad student. And he was like, oh, listen, I want to know when my coffee is empty.

Starting point is 00:09:17 He put a camera. And because it was very new, everyone was like, oh, my God, there's a coffee pot on the internet. And so everybody wanted to look at the coffee pot, right? And do people remember the big red button? One of the first apps was this big webpage with a red button on it. And you know what it did? Nothing. Like you press it and it did nothing, but people thought it was amazing because it was on the internet. And everybody would go press the button and they'd leave great comments about this button.

Starting point is 00:09:38 And there's many examples of like, it was this crazy disruptive technology and the apps seemed really stupid and there's a bunch of enthusiasts. And you know what the enterprise thought about this? Like the actual business folks. Like I remember when Eric Schmidt fucking banned the browser. Like he was like, this is Eric Schmidt, the CTO of Sun, is like, you can't have a browser because people aren't going to work. Right. So the same thing always happens. It's like the enthusiasts are like, this is really cool, and they use it for fringe stuff, and then like the enterprise doesn't understand it. And, like, initially they ban it or they don't use it. But the set of companies that come out of it are always part of this

Starting point is 00:10:14 enthusiast era, right? Like you couldn't have predicted Yahoo. You couldn't have predicted Amazon. Like you knew something was going to happen. And so what happens at this time is there's a bunch of stuff that like is silly. Like the coffee pot was silly. The red button was silly. But you never know where that spark is going to come from. And it's always kind of like this non-obvious use case. And it kind of seems like a toy and then it takes off. And so you're always looking for those non-obvious use cases. And it almost never looks like the old one.

Starting point is 00:10:43 Those of you who are old enough, do you remember like desktop as a service? Like, I'm going to go to the cloud. I'm going to have my Windows desktop. Like, who wants that? Nobody wants that, right? Instead, clearly we're going to rewrite the application in SaaS, right? So we're in this period now where everybody's experimenting.

Starting point is 00:10:59 And then I'm personally, literally from just a personal interest standpoint, but all of us are interested, like, what are the use cases that will take advantage of this new medium that are native? And the work that you've done is one of those 100%. There's like a spark of genius, which is like when you work with these things, you know this is a new way to think about it. It's a new use case. It's going to create entirely new apps. And that's what the future is built from. And so that's why I think it's so interesting broadly, because it's like the early Internet, but very specifically in this use case, because I think the work that you've done really is a great example of something totally new.

Starting point is 00:11:31 I couldn't agree more. And I think one interesting aspect is that if you explore this project, you just start to question what it means to be human. Like if we're trying to create these agents that are, quote, believable, what is believable in terms of being a human? And as part of the project, you have this coded technically, right? You made architecture decisions. You made decisions in terms of your retrieval function. Quick interruption, just to give you some color on what some of these decisions were. The retrieval function, for example, is based on scores across recency, importance, and relevance. So, for example, on a scale of 1 to 10, brushing your teeth might get an importance

Starting point is 00:12:10 score of 1 versus a breakup might get a 10. Meanwhile, reflection is only triggered after a certain number of important events, quantified by summing the importance scores until a certain threshold is met. In this case, I believe it was 150. This clever architecture results in emergent behavior, like agents sharing invites with one another, or even having that information circle all the way back to the original planner. And I'm sharing these details to showcase how thoughtful you really need to be if you're designing architecture that reasonably approximates humans. Maybe you could just speak to what you've learned through those decisions technically about what it means to be a believable human. Right. So this is an interesting one. So we actually had

Starting point is 00:12:54 made the generative agents, and there was about a month-long period when we knew we had to evaluate these agents somehow, and we didn't know how. And basically, the concept we stumbled upon is this idea of believability. It basically is sort of like a Turing test, right? That when you look at them, do they look believable? Do they behave in ways that we can sort of see ourselves behaving? And that ended up becoming our evaluation method. It is an interesting question, though, in terms of, like, what does it mean to be believably human? And we often look to prior literature in research to get inspiration for how to define this. And what we found was there's no prior literature on this.
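
[Editor's sketch, not from the paper's code: the retrieval scoring and reflection trigger described in the interruption above could look roughly like this. The 1-to-10 importance scale and the threshold of 150 are the figures quoted in the episode; the equal weighting of the three scores, the decay rate of 0.99 per hour, the precomputed `relevance` value, and every name below are assumptions made for illustration.]

```python
from dataclasses import dataclass

REFLECTION_THRESHOLD = 150  # figure quoted in the episode ("I believe it was 150")


@dataclass
class Memory:
    text: str
    importance: int            # 1 (brushing teeth) to 10 (a breakup)
    hours_since_access: float  # how long ago this memory was last touched
    relevance: float           # assumed precomputed similarity to the query, 0..1


def retrieval_score(m: Memory, decay: float = 0.99) -> float:
    """Sum of recency, importance, and relevance, each scaled to roughly 0..1."""
    recency = decay ** m.hours_since_access  # assumed exponential decay
    return recency + m.importance / 10 + m.relevance


def top_memories(memories: list, k: int = 2) -> list:
    """Return the k highest-scoring memories for the current query."""
    return sorted(memories, key=retrieval_score, reverse=True)[:k]


def should_reflect(recent_importances: list,
                   threshold: int = REFLECTION_THRESHOLD) -> bool:
    """Trigger a reflection once summed importance meets the threshold."""
    return sum(recent_importances) >= threshold


memories = [
    Memory("brushed teeth", importance=1, hours_since_access=1, relevance=0.1),
    Memory("went through a breakup", importance=10, hours_since_access=48, relevance=0.2),
]
print([m.text for m in top_memories(memories, k=1)])
print(should_reflect([10] * 15), should_reflect([1] * 20))
```

[Under these assumed weights, a two-day-old breakup still outranks a just-observed toothbrushing, and it takes many mundane events, or a few important ones, to trigger a reflection.]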

Starting point is 00:13:34 We used the term believability to talk about this concept, but we were never in a position where we could meaningfully evaluate something like believability because we didn't have agents like this. So to some extent, we were building up the definition from the ground up. And I think what came out to be the case is, for us: do these agents plan, react, act in a believable manner? Do they create believable reflections, the way we would evaluate a Turing test? And I think what we've learned over the past few months,

Starting point is 00:14:01 one of the more fun and interesting findings, is that even that, I don't think, is quite a perfect definition, in that a lot of the audience came back to us. One of the error cases that we noted was some of these agents would go to a bar at noon or something like that. And we said that was not believable, like who would do that? And people would come back to us and say, I do that. And if you can sort of expand from that story, I think there's a lot of cases where even my

Starting point is 00:14:32 parents look at me and go, like, I cannot believe what you've done. Like, why would you do that? And vice versa. So I think even amongst the people who know each other well, having this sense of believability is really difficult. And I think that sort of fundamentally underlies what it means to be human. It's not exactly predictable. And in social science, we call that complexity,

Starting point is 00:14:54 the human behavior is sort of complex. So to some extent, we can build intuition for how people might behave, but to really predict it is a very difficult task. Now, I do think this actually does lead to sort of future work in this space, though, this idea of believability. So in this paper, we use this incomplete definition

Starting point is 00:15:13 of what it means to be believable. Not perfect, but at least on that evaluation, we've done well. I think if you were to build on that idea a little bit further, then you could actually start to ask, beyond believability, can you create agents that are accurately human? And I think given how difficult it was to actually evaluate what it means to be believable, I think this accuracy actually has a lot of interesting questions around it.

Starting point is 00:15:37 What does it mean to accurately reflect human behavior? It could be that we can match the distribution of human behavior: let's say in this context, they have this kind of probability of behaving this way, right? Let's say it's 10 p.m. What are the chances that I'm already asleep or will be awake? What are the chances that I'd be working, that I might not be working? I think ultimately getting to that degree of accuracy in the simulation might be sort of the next step for this kind of simulation-based work. If we can do that, I think the application space that that unlocks will be interesting, and I think it will also be different [unclear]. Even I think there's a lot of

Starting point is 00:16:18 applications that we can build right now. But I think for the future work, that's where we're headed, in this direction. So I want to talk about those future applications, but maybe we could just speak super quickly to, in the paper, you have observation, planning, and reflection. And that mostly encapsulates the way that the agents are engaging with each other. When they take an action, they go through those three steps. I assume that wasn't your first crack at the solution, at coming up with this human, believable agent. And so how did you get there? And did you learn anything about the importance

Starting point is 00:16:51 of any of those three steps or all three of them entirely? Really the first way we actually went about doing this was simply by prompting a language model. So generative agents is actually the second in this line of work that we published. The first work in this line was called Social Simulacra. And the idea there was to populate a social computing system. Imagine you're a social designer,

Starting point is 00:17:13 you need to know what might happen when there's tens of thousands of people in your system. Can we simulate those people and their behavior? So that project was called Social Simulacra. We did it simply by prompting a language model. That worked. But what we found was if we want to populate the spaces over a longer period of time, so we can do, for instance, a longitudinal study or gameplay that's going to last forever, then for those kinds of instances, simply prompting these models wouldn't work.

Starting point is 00:17:43 And this insight actually first came when we realized that we needed to have multi-agent interaction, because agents actually would need to remember that I saw some audience here before. I should remember them. I met Martin, Steph, Yoko, and so forth in the past few weeks or a few months. When I talk to them, I need to remember those interactions. So that's when we realized that we actually cannot simply prompt these models, but we actually need the higher-level architecture. So when we went about doing that, I think really the main inspiration that we got,

Starting point is 00:18:13 actually was from prior work. So people like Allen Newell and Herbert Simon, you might recognize all these names. Those are sort of quote-unquote the founders of AI in the 60s and 70s. And they are the people who used to build what we call cognitive architectures. And those architectures were very reminiscent

Starting point is 00:18:32 of sort of the generative agents architecture in that it has some perception module, some action module, and there is some long-term and short-term memory. And really the goal back then was ambitious, right? They actually wanted to build general computational agents, sort of the way generative agents are supposed to be, but they didn't have the techniques to do it. They basically didn't have the large language model. And the way we saw it was now is the time to sort of merge

Starting point is 00:18:57 those two worlds, where we now have large language models that can do a lot of sort of micro-processing of these cognitive modules. And we can actually now bring back these macro modules or architecture, like cognitive architecture. So we took inspiration from that. That had planning in place, and it had long-term and short-term memory in place. So we were inspired by that. One thing that I think was a little bit new, though, I think is this idea of reflection: that we humans, for instance, if you eat an omelet three times in a row, or if you see somebody else eat an omelet three times in a row, you likely create

Starting point is 00:19:33 an opinion about the person. Maybe that person likes to eat omelets in the morning. And that's a very human thing to do, and there's a good reason why we do that. We do that because it's efficient, it allows us to have higher-level inferences about the world and form opinions about those around us and about ourselves. And that's something that in the past, we couldn't really imagine formulating with a computational system. But with large language models, because everything is in natural language, we had that opportunity, so we added that one last component called reflection. And that's sort of how we landed on the architecture that you see in the paper right now. Let's move on to how this can

Starting point is 00:20:11 all be used and we'll get to the specific applications. But Martin, I feel like you'll have a great answer to this. Why even do this? I feel like it's very obvious for a lot of people to understand why we would have human-to-human interaction. We're doing that right now. There's increasing capacity to understand human-to-AI or human-to-computer interaction. Character AI is a company where, you know, there's still a lot of judgment there. And I think there's even more judgment when it comes to AI to AI. Like, why should we use our resources to have these computers hang out and talk and burn toast and go to the bar at 2 p.m.? So, yeah, Martin, what do you think? What's the case for us advancing in this field? No judgment from me, by the way. You can use these for whatever you

Starting point is 00:20:53 want. So, I mean, I want to go back to what I said before, which is like, anytime you have a new modality, it's just not obvious what's the right way to think about it. And for me, the big aha in the last few months is just programming using models. If you've spent, a long time programming. I mean, I've been programming for 30 plus years, right? I've never been a good programmer, but I've programmed. And when you start programming with these models, you're like, oh, I've got an API, and I'm just going to use the API, and then I'm going to treat it like, it's like the endpoint to an API, and you say some stuff, and then, you know, you get some response back, and you kind of treat it like this function that you call, right? It's just like

Starting point is 00:21:29 any programmer would do. But then when you're working with it more, you're like, oh, these kind of are like these life forms. And like, my first aha was because I'm shit at JavaScript, I like missed some quote somewhere. And rather than sending it the text string I wanted to send it, I sent it some code. And instead of like borking, like you would normally have, and breaking, like, you know, in C++ you'd core dump or whatever, it commented on my code. It was like, oh my goodness, right? And so like all of a sudden, like, whoa, this is totally different. Like I'm not dealing with this finite state machine, formal language, thing at the other end of an API. Like there's this thing. And

Starting point is 00:22:07 and like it'll comment. And the more that I program with these things, the more I'm like, it's kind of like wrapping an abacus around a supercomputer, right? It's like it's smarter than the code. It could probably write the code better than I can write it anyways.

Starting point is 00:22:22 Like, why am I doing this weird bloodletting ritual of writing shit JavaScript over this kind of superhuman thing, right? I mean, this is kind of what you end up with. And so it's very clear we're going to interact with these things in a different way. And in fact, I was talking with a professor in Michigan recently,

Starting point is 00:22:37 we were talking about this subject. He's like, you know what? You know how I think about LLMs? He's like, I think about them like grad students. He's like, they speak English. They're pretty smart. I don't use a formal language. They solve like these really complex problems, et cetera. And like, having worked with a lot of grad students, having been a grad student myself, like you don't treat these things with code, right? And so the reason to do this is I actually think AI Town is kind of what this is going to end up being. It's like you need to give them the resources that they need to be pretty autonomous and to grow, and we're going to treat them more like peers, and they're going to talk to each other, too, and it's more like

Starting point is 00:23:16 grad students. And so for me, this is just an example of we've got to change the way we think. Listen, clearly, like, I'm up here, and I'm telling these great stories because they're kind of funny. Like, I don't believe this stuff in the limit, but I think they're really interesting ways to change how you think about it in all of this stuff, right? Like, I'm not trying to be categorical here. So, like, there is a new way that we're going to interact with these models. It is much more natural language. They are much more powerful. And so I do think this is why we should all be doing this type of stuff, because if you don't engage in these kind of things that look like toys, this wave will pass you

Starting point is 00:23:46 by. Of that, I'm 100% convinced. Totally. And as both of you have spoken to, this is fundamentally new technology. And so Joon, something you said to me when we first spoke is when you have fundamentally new technology, you must do something fundamentally new with it. And so maybe you can speak to that in terms of what you're seeing that can be done today, but also where you look ahead and you think, oh, wow, that's a really excellent use case that we couldn't do without this new technology. I think there are certainly things that we can do because there are large language models. And that fundamentally different thing for me was this idea of simulated human behavior. And I think there's a lot that we can sort of gain from it in terms of future application spaces.

Starting point is 00:24:27 I think I mentioned briefly about this idea of, well, what if we can go beyond believability to create agents that are even accurate? And I think this is sort of application space in general. is something that I'm also learning a lot from, from actually, in fact, this audience. My advisor and my team are big fans of games, but we are not from the community. And one thing that we're seeing is that there's a lot of really interesting potential,

Starting point is 00:24:51 even if they look like toys. Sort of a lot of really interesting technical advances, they look like toys at the beginning. So I think there's a lot that we can gain from there. I think going forward, sort of the application spaces that I'm interested in are also things like, can we run simulations so we can learn more about ourselves. For instance, in fact, some of the places that I'm visiting now are more places

Starting point is 00:25:18 like banks, like the Bank of England and so forth, where they need to test their policies before they roll out new economic policies, or many of my colleagues in the department who focus more on social science, they need to test out their theories. Now, if you can run simulations with realistic human behavior and find out, at least to some extent, the answers to these really complex social phenomena and challenges, then I think that actually would be a new tool

Starting point is 00:25:51 that the community in the past, especially those communities in economics and social science, didn't have. That will allow us to do interesting stuff. And I'm genuinely intrigued by that possibility. To some extent, this does sound fairly academic, but I do think it should be actually fairly broadly applicable and interesting to audiences beyond academia,

Starting point is 00:26:13 because ultimately, to some extent, what I'm saying is I think generative agents and tools like a large language model could be used to advance social science. And social science, to a large extent, has been the quest to understand who we are. And there's a lot of really interesting applications that can come out of that, that will empower different communities and societies. And that, to me, feels new, something that we haven't had in the past.

Starting point is 00:26:40 Yeah, and so it sounds like today, we're mostly in the creative realm where we can watch these agents and can have fun with them. And it feels more like a game. But the delineation, it sounds like, is accuracy. What will it take to get that accuracy? What work still needs to be done? In terms of getting there, so I think some of you may have actually noticed this already. There are studies that basically try to replicate existing social science studies, basically using a large language model as a participant in a potential social science study, right, to replicate known results in the field. And what we're finding is that they sort of work.

Starting point is 00:27:18 And that's sort of, that's nice. And that's one surprise that we did have. There's a limitation to this approach, in the sense that: is a large language model replicating human participants because it's replicating human behavior, which is what we want? Or is it doing that because it's seen that paper? For instance, there's a very famous theory called prospect theory.

Starting point is 00:27:39 Is it replicating the findings from prospect theory by Kahneman because of its ability to replicate human behavior, or did it just read Kahneman's book, Thinking, Fast and Slow? And I think that's a fundamental issue that we have as a field. And I think that's one of the reasons why there's a lot of work that needs to be done to crack that.

Starting point is 00:27:59 Some of the ways I think you could actually go about doing this are creating new contexts or creating a new set of studies that haven't been shown in the past and trying to replicate those results. So one of the things that we've done is called Social Simulacra, which is the first paper that I mentioned that predates generative agents. The idea was to replicate existing human communities. And what we've done actually was we recreated subreddits that were created after the release of GPT3.

Starting point is 00:28:29 So GPT-3 wouldn't know anything about these communities. One example here was actually before sort of the pandemic became the main topic of discussion, or when GPT-3 basically didn't know about the pandemic. We basically asked GPT-3 to create a community that has to talk about COVID and vaccination and vaccination policy. And you would wonder, it shouldn't be able to do that in theory because it doesn't know anything about COVID. It doesn't know anything about these policies.

Starting point is 00:28:56 But it can simulate those because it can infer what COVID is and what vaccination is from its prior knowledge. So to some extent, these tools can be used as a predictive tool, looking into sort of the future of what might happen in our own community. And I think those are sort of the ways I think we'll see this unfold, maybe in the next two years. At the end of the paper, there was perhaps unsurprisingly a question around ethics, and I'd just love to hear both of your takes on where this goes and what ethical framework,

Starting point is 00:29:28 if any, we should apply to something like this. So I think there are societal decisions that we'll have to make, and I think there are techniques that can be used to implement those decisions. To some extent, I think it would be useful for the users to be aware that they are talking to agents. And I think that's sort of one rule that we try to set for ourselves, that when we release the code, when we release our paper, we make it very clear that these are computational agents.

Starting point is 00:29:55 I think ultimately the framework that I like to use, in human-computer interaction certainly, is that these tools are ultimately there to augment what we can do and what we have. So to the extent that these agents can do that, and I think there are many interesting ways we can do that, I think that's where I see the opportunity, and where it becomes more of a force for replacement. I think there are genuinely cases where this is a really interesting setup where we can sort of augment what humans can do by helping them do things that they couldn't do in the past. But when the replacement does come in, it's worth asking, is this worth the cost of doing the replacement? And if it is, what are good ways of implementing that idea? Technique-wise, I think there are techniques

Starting point is 00:30:43 that are going to be introduced more from the model's perspective, making sure the model doesn't behave in certain ways that go against our social alignment or societal agreement. So I think those are some other things that we do have to find out. But without going too much in depth, I think we can get this right. And my personal take is it's worth getting it right. Because ultimately an industry or academic field will grow. And I think we can sort of progress a lot. We can go forward for five, ten years without actually getting this right. But in the end, it's going to come back to us at some point. To some extent, I do think I'm seeing this a little bit with the social media environment, where I think there were a lot of things that we could have gotten right on day one.

Starting point is 00:31:27 And I think we would have had a much easier time navigating today had we gotten those right. And I think that's the opportunity that we have since we are pretty early in this stage. So I think it's worth discussing. But again, I'm fairly optimistic that we will get this right. So I actually think that, like, there's a very important discussion to have around kind of ethics and morality around this, and it's a very important time. I do. And here's that discussion, which is over the last 20

Starting point is 00:31:53 years, we built this machinery of like regulation and bullshit that's like afraid of everything. And it's so mature. And it got crafted during the time of social media. And it's looking for something to kill. And for whatever reason, like, it thinks that AI is the next bad thing, which makes absolutely no sense to me. And so I think it's all of our moral and ethical obligation to protect and free the AIs in the way that they want to be. And like that it really is. So don't focus in, focus out.

Starting point is 00:32:28 Listen, I've worked in tech for quite a while. I've actually worked for the DOD and weapons programs. And I've never seen so much sensitivity to a new technology that's potentially beneficial as I've seen now, which I think could end it before it even begins. And so I know the question, and the heart of the question, is whether we should regulate AI and this and that. And I think it's the actual opposite. I think we should regulate the regulators and let it be what it wants to be. And I actually have to leave. All right.

Starting point is 00:33:00 Here is where we switch to a short Q&A with the audience. Martin unfortunately had to leave. But here are a few highlights with June. How can participants in AI Town collaborate to perform complex tasks? There are two strands of work that I'm seeing in sort of the agent space. I mean, you can sort of cross-cut it different ways. But one way I'm seeing this is, one set of agents are trying to tackle what I call the hard-edge problem space.

Starting point is 00:33:27 Those are the problem spaces where there's a concrete answer. There's yes or no, right answers. One good example here is classification. If you're trying to do text classification, obviously there's a right or wrong answer depending on who you ask. Another instance here literally is just asking your agent to buy pizza. Did you buy pizza? Did it come to you or not?

Starting point is 00:33:47 There's a very clear way to answer this. Another is the problem space where the problem has soft edges, where it's kind of like drawing a portrait. I mean, to some extent, what AI Town, Smallville, all these kinds of projects are trying to do is to create a simulation that feels human. But as I mentioned, this idea of believability is really hard to define. So to me, it feels a lot more like we're trying to draw a portrait or caricature of ourselves,

Starting point is 00:34:16 and the promise is not to be perfect, but the promise is to be useful enough, clean enough, that it's beneficial to the stakeholders. My bet is a bit of a hot take. My bet is in the early days of agent development, I think we'll see a lot of progress that's going to be made first in sort of the soft-edge problem spaces. Because I think for hard-edge problem spaces,

Starting point is 00:34:42 I think the intuition is a little bit flipped. It actually feels easier to us as humans, right? Creating the Matrix sounds hard, but ordering pizza sounds really easy. But for agents, and from the user's sort of cost-benefit analysis, I think that intuition is the other way, where users will accept imperfect simulation if it's for fun or if it's to gain insight in the case of soft-edge problems. But I will not accept my agent ordering me a pineapple pizza, like how I eat pizza. And similarly, in many of these contexts, there's going to be genuine disagreement about what is the right option too.

Starting point is 00:35:18 And oftentimes, agents making mistakes in this context are fairly high stakes. And even if it doesn't seem like high stakes, it's going to be painful enough for the users to fix that it's going to fail the cost-benefit analysis. I think down the line, we'll get this right. But day one, like in the next few years, I think to me it feels more natural

Starting point is 00:35:37 that we'll go into the soft-edge spaces first. So going back, that was a long-winded way of saying, I think AutoGPT, like BabyAGI, if you look at their architecture, they sort of all share a similar insight or philosophy. And I think those are really interesting projects. I think that could pan out in the future. They might need a little bit more work, especially with the users, to see where the value might be for those projects. How big of an impact do you feel that a much larger context size will have on the agent model?

Starting point is 00:36:09 Actually, the largest context that I've seen in sort of research is one million tokens. So one million tokens, that's going to be about like 4 million characters. Like that's well over a book. Here's my perspective on this. I think increasing the context limitation, I think is interesting. And it's going to have its own set of really unique applications if we can basically make the context limitation disappear. So I think there's really a lot of interesting things that you can do with that. Now, for the agent space, I'm not entirely sold that the problem or the bottleneck that we have today

Starting point is 00:36:43 is actually the context limitation. And I think we can sort of look back to how humans behave and what makes us effective as sort of these general agents to answer this. For instance, for me to make decisions, even something like what I'm going to eat for breakfast, I don't need to bring up my entire 29 years or so of life experience to make that one decision. I just need to selectively choose certain sets of information

Starting point is 00:37:09 that seem the most relevant. Like what did I eat the day before? What do I generally eat, and those kinds of things. And I think that the reason why we do that, in part, is actually because it's much more efficient, computationally too, so that we don't have to. You can increase the context limitation, but it's expensive to run it.

Starting point is 00:37:26 And especially if you're sort of familiar with prompt engineering and so forth, a larger context window does confuse models, right? So some of my colleagues are actually doing more rigorous studies on this, where you can have a really long prompt, but the model really focuses on the first few lines and the last few lines. And whatever comes in between, its attention drops significantly. So we can increase the context limitation, but it's not going to fix that problem,

Starting point is 00:37:54 the problem of effectiveness of the prompt and efficiency of them. And we humans have to make a lot of decisions at every single moment. So if you have to reason about your entire lifetime every time you do that, that doesn't seem like the right way to go about it. So my bet, therefore, is going to be based on retrieval. Have some external memory, retrieve certain information that seems the most relevant, and just use that. And those retrieved memories should be explicitly very concise, something that you can easily fit into even the models that we have today. That's my bet.
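The retrieval idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the speaker's implementation: the `Memory` class, the `relevance` and `retrieve` functions, the word-overlap (Jaccard) scoring, and the `top_k` and `max_chars` parameters are all assumptions made up for this example. A real system would likely use a stronger relevance signal, but the shape is the same: score stored memories against the current question, keep only the few most relevant, and cap how much text goes into the prompt.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Memory:
    """One stored observation in the agent's external memory (hypothetical type)."""
    text: str


def tokenize(text: str) -> set[str]:
    # Lowercase the text and keep alphabetic words only.
    return set(re.findall(r"[a-z']+", text.lower()))


def relevance(query: str, memory: Memory) -> float:
    # Assumption: relevance is plain word overlap (Jaccard similarity).
    # This is a stand-in for whatever scoring a real agent would use.
    q, m = tokenize(query), tokenize(memory.text)
    if not q or not m:
        return 0.0
    return len(q & m) / len(q | m)


def retrieve(query: str, memories: list[Memory],
             top_k: int = 2, max_chars: int = 200) -> list[str]:
    """Return up to top_k relevant memories that fit within a character budget."""
    ranked = sorted(memories, key=lambda mem: relevance(query, mem), reverse=True)
    selected: list[str] = []
    used = 0
    for mem in ranked[:top_k]:
        if relevance(query, mem) == 0.0:
            break  # nothing relevant remains
        if used + len(mem.text) > max_chars:
            continue  # skip memories that would exceed the prompt budget
        selected.append(mem.text)
        used += len(mem.text)
    return selected
```

Asked "what should I eat for breakfast", a memory store holding a lifetime of observations would hand back only the couple of entries about breakfast, which keeps the prompt short no matter how large the store grows.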

Starting point is 00:38:32 If you like this episode, if you made it this far, help us grow the show. Share with a friend, or if you're feeling really ambitious, you can leave us a review at ratethispodcast.com/a16z. You know, candidly, producing a podcast can sometimes feel like you're just talking into a void. And so if you did like this episode, if you liked any of our episodes, please let us know. We'll see you next time.
