The a16z Show - From Sims to Sapiens: Crafting Reality with Code
Episode Date: November 6, 2023

Is it possible to construct a virtual society that authentically replicates human behavior? AI Town, a virtual town experiment where AI residents live, interact, and engage, provides valuable insights into the future of AI's believability and its interaction with humanity.

In this panel discussion, Joon Park, the author of 'Generative Agents: Interactive Simulacra of Human Behavior,' and Martin Casado from a16z discuss the influence and potential of Generative Agents, exploring their practical applications in the real world.

Topics Covered
00:00 - Simulating human behaviors
04:49 - What are generative agents?
07:47 - Simulations, new technology, and LLMs
11:45 - The architecture behind simulating human behavior
16:37 - Generative agents interactions: observing, planning, and reflecting
20:22 - What is the value in advancing generative agents?
24:01 - Use cases for simulation behavior technology
29:31 - What are the ethical frameworks?
33:12 - Q&A from the audience

Resources:
Find AI Town: https://www.convex.dev/ai-town
Read the paper 'Generative Agents: Interactive Simulacra of Human Behavior': https://arxiv.org/pdf/2304.03442.pdf
Find Joon on Twitter: https://twitter.com/joon_s_pk
Find Martin on Twitter: https://twitter.com/martin_casado
Find a16z on Twitter: https://twitter.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Subscribe on your favorite podcast app: https://a16z.simplecast.com/
Follow our host: https://twitter.com/stephsmithio

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
Transcript
They actually wanted to build general computational agents, sort of the way generative agents are supposed to be.
But they didn't have the techniques to do it.
They basically didn't have the large language model.
And, like, the work that you've done is one of those, 100%.
There's like a spark of genius.
What does it mean to accurately reflect your behavior?
But the set of companies that come out of it are always part of this enthusiast era, right?
Like you couldn't have predicted Yahoo.
You couldn't have predicted Amazon.
Like you knew something was going to happen.
Can we run simulation so we can learn more about ourselves?
A few weeks ago, the a16z infrastructure team ran an event in the San Francisco office.
The topic: generative agents.
These are autonomous characters designed to simulate human behavior,
derived from a recent but game-changing paper called "Generative Agents:
Interactive Simulacra of Human Behavior."
Developers from all around the city came to hear the lead author,
Joon Park, speak alongside a16z general partner Martin Casado. And in this panel, they discuss
how this paper and the advancements in large language models have opened a new window, expanding
the dynamism of simulation: instead of binary logic, we're using probabilistic thinking
and the ability to incorporate new information. So what does that really mean? Well, instead of your
character in The Sims following very specific rote rules, with generative agents
a father may go outside because he notices his son, another may take their breakfast off the stove
because they notice it's burning, and another may even opt into a Valentine's Day party invite
and then elect not to show up. All very human behaviors. Now, the architecture described in the
paper was, of course, intentionally designed by Joon and team, and it's a combination of a seed identity
for every agent and functions that cause each one to do three discrete things: to observe, to
plan, and to reflect. And these architecture decisions ultimately generate unexpectedly spirited
conversations just like this. "Hey, Lucky, it's so great to see you. How have you been? I've been
dying to hear about your space adventure." "Hey, Kira, I've been fantastic. My space adventure was
out of this world. I can't wait to share all the details with you." Or even this: "I've been
trying to find my way. It's been a chaotic journey, to say the least."
"Embrace the chaos, dear Kurt, for within its turbulence lies hidden truth.
Seek the depths of the unknown and unravel the mysteries that burden your soul."
And here's the thing. They don't just interact with each other. Again, they wake up, they cook, some paint while others write, they hold opinions of one another, and most importantly, they remember and they have higher level reflections based on the past.
It's pretty amazing, don't you think?
So as these generative agents become a lot closer to nuanced human behavior, what can we learn about being human from these surprisingly realistic simulations?
And what is the calculus of that believability? Are there real-world applications on the horizon? And what is truly net new here?
Listen in as we discuss all that and more, including the origin of the very paper that Joon wrote.
I hope you enjoy. As a reminder, the content here is for informational purposes only,
should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security,
and is not directed at any investors or potential investors in any a16z fund.
Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast.
For more details, including a link to our investments, please see a16z.com/disclosures.
So how many people in this room have actually read the generative agents paper that Joon wrote?
It's a lot of people,
pretty much everyone. So Joon, even though so many people have read it,
why don't you give a quick overview of what it is,
but also maybe the backstory that people may not have heard.
So generative agents are general computational agents
that can simulate believable human behavior.
They fundamentally leverage something like a large language model,
under the assumption that a language model has encoded,
or has seen, so much about human behavior from its training data,
from Wikipedia, the social web, and so forth.
So if you are able to poke it at the right angle, you can actually extract a lot of those human
behaviors in a very context-specific manner.
The opportunity here is that in the past, we had to manually author a lot of these behaviors,
but now we can simply generate them with a language model.
So generative agents leverage that to create these computational systems.
Ultimately, one technical improvement that we're trying to make in addition to a large
language model is basically giving it some form of memory and retrieval system.
You've probably all used ChatGPT and so forth.
It is heavily context-limited.
And even if that limitation were to go away in the future,
processing a really long context window is inefficient,
and also ineffective when you're trying to prompt these models
for really narrowly defined behavioral aspects.
So the main philosophy here is that we give these agents a long-term memory
that's external to the language model.
Then we retrieve the contextually relevant information
from that long-term memory,
whether it's plans,
action sequences, or reflections,
to create these computational agents.
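The memory design described above can be illustrated with a minimal Python sketch. It assumes a plain in-memory list as the long-term store and uses naive word overlap as a hypothetical stand-in for a real relevance measure; the class and function names (`MemoryStream`, `retrieve`, `build_prompt`) are illustrative only, not taken from the paper's code.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Memory:
    created_at: float  # simulation time; hypothetical unit (hours)
    kind: str          # e.g. "observation", "plan", or "reflection"
    text: str


def _words(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for a real relevance measure."""
    return set(re.findall(r"[a-z']+", text.lower()))


@dataclass
class MemoryStream:
    """Long-term memory kept outside the language model (hypothetical design)."""
    records: list[Memory] = field(default_factory=list)

    def add(self, created_at: float, kind: str, text: str) -> None:
        self.records.append(Memory(created_at, kind, text))

    def retrieve(self, query: str, k: int) -> list[Memory]:
        """Return up to k records sharing at least one word with the query,
        most overlapping first, newest first on ties."""
        q = _words(query)
        scored = [(len(q & _words(m.text)), m.created_at, m) for m in self.records]
        scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
        return [m for overlap, _, m in scored[:k] if overlap > 0]


def build_prompt(agent_name: str, memories: list[Memory], situation: str) -> str:
    """Pack only the retrieved memories into a short prompt for the LLM."""
    lines = [f"You are {agent_name}. Relevant memories:"]
    lines += [f"- ({m.kind}) {m.text}" for m in memories]
    lines.append(f"Current situation: {situation}")
    lines.append("What do you do next?")
    return "\n".join(lines)
```

Only the retrieved records reach the prompt, so the context stays short no matter how long the agent's history grows, which is the efficiency point made about long context windows.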
Philosophically, to some extent,
I think this is akin to creating
the operating system
around a large language model.
The way we're prompting large language models today,
to me, feels a lot like
how we used to use computers back in the day,
when we had to wire up the back end
every time we ran a new program.
And what really made
complex behavior with these computational
tools possible was the introduction of larger architectures that surround the core fundamental
techniques. So that's what generative agents are about. And you mentioned the background
of why we got into all this. So I started my PhD midway through 2020. That was just
around when GPT-3 was about to come out. And that year, a group of us, authors at Stanford,
were working on a paper called "On the Opportunities and Risks of Foundation Models."
What we were seeing was this new form of machine learning model that seemed fundamentally different
from the things that we had experienced in the past, in that we didn't have to fine-tune or specifically
train models for very narrow purposes; we could train a general model, almost like a stem cell
in biology, and leverage that to create a lot of downstream behaviors.
After writing that paper, my team, especially my advisors and I, really wanted
to answer this: there seems to be a new opportunity,
but exactly what is it?
I think in the early days of GPT-3,
a lot of the tasks that we were doing
were things like classification and generation.
It was really cool to see
that these models can conduct these tasks,
but they were also things that we had already known
how to do for many decades.
And our general philosophy there was
that we wanted to do something that's fundamentally different.
So that's how we got into this.
Our answer to that basically was:
I think we might be able to create human-like agents
that can populate a virtual world.
Martin, maybe you can just elaborate.
You said it's perhaps one of the most exciting times in recent history,
and maybe you can just speak to exactly what you mean there
and how it relates to simulation and some of this new technology
that we're seeing with LLMs.
Sure.
So first, very quick credit where credit's due.
As far as AI Town goes, clearly Joon is like the grandfather of AI Town,
and we wouldn't be here without your work,
so I really appreciate you coming here.
AI Town itself actually came from a personal project from Yoko.
The true story is it was actually a personal project, and I was like, hey, maybe more people
would be interested in it, and I kind of coerced her into bringing it forward to everybody
else. And so, now, when it actually comes to the code, the vast majority of the work on the
code was actually done by Ian. It's kind of funny. Like, you see this funny little tile set up here,
and it kind of belies the fact that it's actually really hard to build a scalable, shared-state
distributed system, which is what you need in a multiplayer game. It's just a hard technical problem, right?
And anybody that's built these systems knows that.
And so it's funny because people go in and say, oh, here's this cute little tile engine with, like, these characters running around.
But, like, actually the back end is built to be something that can scale.
And that requires people that have focused on this.
And, like, so Ian has done a tremendous job.
And the Convex team continues to work on that.
Okay, so why is this so exciting?
So, okay, so because I'm old, I actually saw, like, the advent of, like, the web.
And this feels very similar to that in the following ways, which is when you have a very disruptive technology like this,
Whatever touches it becomes magic.
I was actually having a conversation just before this.
Does anybody here know what the first video on the internet was?
Yes, it was a coffee pot.
It was, like, this dude, I think it was in Cambridge.
It was a grad student.
And he was like, oh, listen, I want to know when my coffee is empty.
He put a camera.
And because it was very new, everyone was like, oh, my God, there's a coffee pot on the internet.
And so everybody wanted to look at the coffee pot, right?
And do people remember the big red button?
One of the first apps was this webpage with a big red button on it.
And you know what it did?
Nothing. Like, you pressed it and it did nothing, but people thought it was amazing because it was on
the internet. And everybody would go press the button and they'd leave great comments about this button.
And there's many examples of like, it was this crazy disruptive technology and the apps seemed
really stupid and there's a bunch of enthusiasts. And you know what the enterprise thought about
this? Like, the actual business folks. Like, I remember when Eric Schmidt fucking banned the browser.
This is Eric Schmidt, the CTO of Sun, and he was like, you can't have a browser because
people aren't going to work. Right. So the same thing
always happens. It's like the enthusiasts are like, this is really cool, and they use it for
fringe stuff, and then, like, the enterprise doesn't understand it. And, like, initially they ban
it or they don't use it. But the set of companies that come out of it are always part of this
enthusiast era, right? Like, you couldn't have predicted Yahoo. You couldn't have predicted Amazon.
Like, you knew something was going to happen. And so what happens at this time is there's a bunch
of stuff that, like, is silly. Like, the coffee pot was silly. The red button was silly.
But you never know where that spark is going to come from.
And it's always kind of like this non-obvious use case.
And it kind of seems like a toy, and then it takes off.
And so you're always looking for those non-obvious use cases.
And it almost never looks like the old one.
Those of you who are old enough,
do you remember, like, desktop as a service?
Like, I'm going to go to the cloud.
I'm going to have my Windows desktop.
Like, who wants that?
Nobody wants that, right?
Instead, clearly, we're going to rewrite the applications as SaaS, right?
So we're in this period now where everybody's experimenting.
And I'm personally interested, literally just from a personal-interest standpoint, but all of us are interested: what are the native use cases that will take advantage of this new medium?
And the work that you've done is one of those, 100%.
There's, like, a spark of genius, which is, like, when you work with these things, you know this is a new way to think about it.
It's a new use case.
It's going to create entirely new apps.
And that's what the future is built from.
And so that's why I think it's so interesting: broadly, because it's like the early internet, but also very specifically in this use case, because I think the work that you've done really
is a great example of something totally new.
I couldn't agree more. And I think one interesting aspect is that if you explore this project,
you start to question what it means to be human. Like, if we're trying to create these
agents that are, quote, believable, what is believable in terms of being a human? And as
part of the project, you had to code this technically, right? You made architecture decisions. You
made decisions in terms of your retrieval function.
Quick interruption, just to give you some color on what some of these decisions
were. The retrieval function, for example, is based on scores across recency, importance,
and relevance. So, for example, on a scale of 1 to 10, brushing your teeth might get an importance
score of 1, whereas a breakup might get a 10. Meanwhile, reflection is only triggered after a certain
number of important events, quantified by summing the importance scores until a certain threshold
is met. In this case, I believe it was 150. This clever architecture results in emergent
behavior, like agents sharing invites with one another, or even having that information circle
all the way back to the original planner. And I'm sharing these details to showcase how thoughtful
you really need to be if you're designing an architecture that reasonably approximates humans.
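The retrieval scoring just described can be sketched in a few lines of Python. This is a minimal sketch under assumptions: it follows our reading of the paper (exponential recency decay with a factor of 0.995 per simulated hour since a memory was last accessed, 1-to-10 importance ratings, cosine similarity between embeddings for relevance, each min-max normalized and summed with equal weights), but the `Memory` fields, the toy two-dimensional embeddings, and the function names are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    importance: int          # 1 (mundane, e.g. brushing teeth) to 10 (e.g. a breakup)
    last_accessed: float     # simulation time in hours
    embedding: list[float]   # hypothetical precomputed text embedding


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def min_max(values: list[float]) -> list[float]:
    """Rescale to [0, 1]; if all values are equal, treat them all as 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def retrieve(
    memories: list[Memory],
    query_embedding: list[float],
    now: float,
    k: int = 3,
    decay: float = 0.995,  # assumed per-hour decay factor
    weights: tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> list[Memory]:
    """Return the top-k memories by weighted recency + importance + relevance."""
    if not memories:
        return []
    recency = min_max([decay ** (now - m.last_accessed) for m in memories])
    importance = min_max([float(m.importance) for m in memories])
    relevance = min_max([cosine(m.embedding, query_embedding) for m in memories])
    w_rec, w_imp, w_rel = weights
    scored = [
        (w_rec * r + w_imp * i + w_rel * v, m)
        for r, i, v, m in zip(recency, importance, relevance, memories)
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [m for _, m in scored[:k]]
```

Because every term is min-max scaled before summing, the weights decide how far a vivid but stale memory (the breakup) can outrank a recent but mundane one (brushing teeth).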
Maybe you could just speak to what you've learned through those decisions technically about
what it means to be a believable human. Right. So this is an interesting one. So we had actually
made the generative agents, and there was about a month-long
period when we knew we had to evaluate these agents somehow, and we didn't know how. And basically,
the concept we stumbled upon is this idea of believability. It's basically sort of like a
Turing test, right? When you look at them, do they look believable? Do they behave in ways
that we can sort of see ourselves behaving? And that ended up becoming our evaluation method.
It is an interesting question, though, in terms of, like, what does it mean to be believably human?
We often look to prior literature in research to get inspiration for how to define something like this.
And what we found was there's no prior literature on this.
We had used the concept of believability to talk about this kind of agent,
but we were never in a position where we could meaningfully evaluate something like believability,
because we didn't have agents like this.
So to some extent, we were building up the definition from the ground up.
And I think what it came down to for us was:
do these agents plan, react, and act in a believable manner?
Do they create believable reflections, the way we would evaluate a Turing test?
And I think, from what we've learned over the past few months,
one of the more fun and interesting findings is that even that,
I don't think, is quite a perfect definition,
and a lot of the audience came back to us to basically say so.
One of the error cases that we noted was that some of these agents would go to a bar
at noon or something like that.
And we said that was not believable; like, who would do that?
And people would come back to us and say, I do that.
And if you can sort of expand from that story, I think there's a lot of cases where even my
parents look at me and go, like, I cannot believe what you've done.
Like, why would you do that?
And vice versa.
So I think even amongst the people who know each other well, having this sense of
believability is really difficult.
And I think that's sort of fundamentally underlies what it means to be human.
It's not exactly predictable.
And in social science, we call that complexity:
human behavior is sort of complex.
So to some extent, we can build intuition
for how people might behave,
but to really predict it is a very difficult task.
Now, I do think this actually does lead
to sort of future work in this space, though,
this idea of believability.
So in this paper, we use this incomplete definition
of what it means to be believable.
Not perfect, but at least on that evaluation,
we've done well.
I think if you were to build on that idea a little bit further,
then you could actually start to ask,
beyond believability, can you create agents that are accurately human?
And I think given how difficult it was to actually evaluate what it means to be believable,
I think this accuracy actually has a lot of interesting questions around it.
What does it mean to accurately reflect human behavior?
It could be that if we can match distribution of human behavior,
let's say in this context, they have this kind of probability of behaving,
this way, right? Let's say it's 10 p.m. What are the chances that I'm already asleep or will be awake?
What are the chances that I'll be working, or that I won't be working? I think ultimately getting to that
degree of accuracy in the simulation might be sort of the next step to these kind of simulation-based work.
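One way to make "matching the distribution of human behavior" concrete is to compare what simulated agents do at a given moment (say, 10 p.m.) with what real people report doing. The Python sketch below uses total variation distance for that comparison; the metric is our choice for illustration, not something the speakers specify, and the function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def empirical(samples: list[str]) -> dict[str, float]:
    """Turn observed behaviors (e.g. "asleep", "awake") into probabilities."""
    counts = Counter(samples)
    total = len(samples)
    return {behavior: n / total for behavior, n in counts.items()}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """0.0 means identical distributions; 1.0 means no overlap at all."""
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in outcomes)
```

For example, with made-up numbers: if four simulated agents at 10 p.m. come out as `["asleep", "awake", "awake", "asleep"]` and a hypothetical survey says `{"asleep": 0.3, "awake": 0.7}`, the distance comes out to roughly 0.2; a more accurate simulation would push it toward zero.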
If we can do that, I think the application spaces that open up will be
interesting, and I think they will also be different from what we can likely do now. Even so, I think there are a lot of
applications that we can build right now. But for future work, that's where we're
headed. So I want to talk about those future applications, but maybe we could just
speak super quickly to this: in the paper, you have observation, planning, and reflection. And that
mostly encapsulates the way that the agents are engaging with each other. When they take
an action, they go through those three steps. I assume that wasn't your first crack at the solution
for coming up with this believable, human-like agent.
And so how did you get there?
And did you learn anything about the importance
of any of those three steps, or all three of them together?
Really, the first way we actually went about doing this
was simply by prompting a language model.
Generative Agents is actually the second paper
in this line of work that we published.
The first work in this line was called Social Simulacra.
And the idea there was to populate a social computing system.
Imagine you're a social designer:
you need to know what might happen when there are tens of thousands of people in your system.
Can we simulate those people and their behavior?
So that project was called Social Simulacra.
We did it simply by prompting a language model.
That worked.
But what we found was if we want to populate the spaces over a longer period of time,
so we can do, for instance, longitudinal study or gameplay that's going to last forever,
then for those kind of instances, simply prompting these models wouldn't work.
And this insight actually first came when we realized that we needed to have multi-agent interaction,
because agents would actually need to remember: I saw some of the audience here before;
I should remember them.
I met Martin, Steph, Yoko, and so forth in the past few weeks or months.
When I talk to them, I need to remember those interactions.
So that's when we realized that we actually cannot simply prompt these models;
we actually need a higher-level architecture.
So when we went about doing that, I think really the main inspiration that we got
actually came from prior work.
So people like Allen Newell and Herbert Simon,
you might recognize these names.
Those are sort of, quote-unquote, the founders of AI
in the 60s and 70s.
And they are the people who used to build
what we call cognitive architectures.
And those architectures were very reminiscent
of sort of the generative agents architecture
in that it has some perception module,
some action module,
and there is some long-term and short-term memory.
And really the goal back then
was ambitious, right? They actually wanted to build general computational agents, sort of the way
generative agents are supposed to be, but they didn't have the techniques to do it. They basically
didn't have the large language model. And the way we saw it was, now is the time to merge
those two worlds, where we now have large language models that can do a lot of the micro-processing
of these cognitive modules. And we can actually now bring back these macro modules, or architectures,
like cognitive architectures. So we took inspiration from that. That also
had planning in place, and it had long-term and
short-term memory in place. So we were inspired by that.
One thing that I think was a little bit new, though, is this idea of
reflection. We humans, for instance, if you eat an omelet three times in a row,
or if you see somebody else eat an omelet three times in a row, you likely form
an opinion about that person. Maybe that person likes to eat omelets in the morning.
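The reflection step can be sketched as a small trigger: keep summing the importance scores of new observations, and once the running total reaches a threshold (150 is the figure cited in this episode for the paper's implementation), hand the accumulated observations to the model and ask for higher-level inferences, like "this person likes omelets in the morning." This is a minimal Python sketch; the class name `ReflectionTrigger`, the prompt wording, and resetting the total after each reflection are our assumptions, and the actual LLM call is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field

REFLECTION_THRESHOLD = 150  # figure cited in the episode for the paper's implementation


@dataclass
class ReflectionTrigger:
    threshold: int = REFLECTION_THRESHOLD
    pending: list[tuple[str, int]] = field(default_factory=list)

    def observe(self, text: str, importance: int) -> str | None:
        """Record an observation; return a reflection prompt once the summed
        importance of pending observations reaches the threshold."""
        self.pending.append((text, importance))
        if sum(score for _, score in self.pending) < self.threshold:
            return None
        prompt = self._reflection_prompt()
        self.pending.clear()  # start accumulating toward the next reflection
        return prompt

    def _reflection_prompt(self) -> str:
        bullets = "\n".join(f"- {text}" for text, _ in self.pending)
        return (
            "Given only the observations below, what high-level insights "
            "can you infer about the people involved?\n" + bullets
        )
```

With the threshold lowered for demonstration, three low-importance omelet observations are enough to produce a prompt whose answer might be "Kurt likes omelets in the morning"; in a full system, that answer would presumably be stored back into memory as a reflection.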
And that's a very human thing to do, and there's a good reason why we do that. We do that
because it's efficient; it allows us to make higher-level inferences about the world and form
opinions about those around us and about ourselves. And that's something that in the past,
we couldn't really imagine formulating with a computational system. But with large language
models, because everything is in natural language, we had that opportunity, so we added that one
last component called reflection. And that's sort of how we landed on the architecture that you see
in the paper right now. Let's move on to how this could
all be used, and we'll get to the specific applications. But Martin, I feel like you'll have a great
answer to this. Why even do this? I feel like it's very obvious for a lot of people to understand
why we would have human-to-human interaction. We're doing that right now. There's increasing
capacity to understand human-to-AI or human-to-computer interaction. Character.AI is a company where,
you know, there's still a lot of judgment there. And I think there's even more judgment when it
comes to AI-to-AI. Like, why should we use our resources to have these computers hang out and
talk and burn toast and go to the bar at 2 p.m.? So, yeah, Martin, what do you think? What's the case
for us advancing in this field? No judgment for me, by the way. You can use these for whatever you
want. So, I mean, I want to go back to what I said before, which is like, anytime you have a new
modality, it's just not obvious what's the right way to think about it. And for me, the big aha
in the last few months is just programming using models. If you've spent
a long time programming, and I've been programming for 30-plus years, right? I've never been a good
programmer, but I've programmed. And when you start programming with these models, you're like,
oh, I've got an API, and I'm just going to use the API, and then I'm going to treat it like,
it's like the endpoint to an API, and you say some stuff, and then, you know, you get some
response back, and you kind of treat it like this function that you call, right? It's just like
any programmer would do. But then when you're working with it more, you're like, oh,
these kind of are like these life forms. And, like, my
first aha was because I'm shit at JavaScript, I, like, missed some quote somewhere. And rather than
sending it the text string I wanted to send it, I sent it some code. And instead of, like, borking,
like you would normally have, and breaking, like, you know, in C++ you'd core dump or whatever,
it commented on my code. It was like, oh my goodness, right? And so, like, all of a sudden,
like, whoa, this is totally different. Like, I'm not dealing with this finite-state-machine,
formal-language thing at the other end of an API. Like, there's this thing, and,
like, it'll comment.
And the more that I program with these things,
the more I'm like,
it's kind of like wrapping an abacus
around a supercomputer, right?
It's smarter than the code.
It could probably write the code
better than I can write it anyway.
Like, why am I doing this weird
bloodletting ritual of writing shit JavaScript
over this kind of superhuman thing, right?
I mean, this is kind of what you end up with.
And so it's very clear we're going to interact
with these things in a different way.
And in fact, I was talking with a professor
in Michigan recently,
and we were talking about this subject. He's like, you know what? You know how I think about LLMs? He's like,
I think about them like grad students. He's like, they speak English. They're pretty smart.
They don't use a formal language. They solve, like, these really complex problems, et cetera.
And like, having worked with a lot of grad students, having been a grad student myself,
like you don't treat these things with code, right? And so the reason to do this is I actually
think AI Town is kind of what this is going to end up being. It's like you need to
give them the resources that they need to be pretty autonomous and to grow, and we're going to
treat them more like peers, and they're going to talk to each other, too, and it's more like
grad students. And so for me, this is just an example of we've got to change the way we think.
Listen, clearly, like, I'm up here, and I'm telling these great stories because they're kind
of funny. Like, I don't believe this stuff in the limit, but I think they're really interesting
ways to change how you think about it in all of this stuff, right? Like, I'm not trying to be
categorical here. So, like, there is a new way that we're going to interact with these models. It is
much more natural language. They are much
more powerful. And so I do think this is why we should all be doing this type of stuff,
because if you don't engage in these kinds of things that look like toys, this wave will pass you
by. That I'm 100% convinced of. Totally. And as both of you have spoken to, this is fundamentally new
technology. And so Joon, something you said to me when we first spoke is that when you have fundamentally
new technology, you must do something fundamentally new with it. And so maybe you can speak to that
in terms of what you're seeing that can be done today, but also where you look ahead and you think,
oh, wow, that's a really excellent use case that we couldn't do without this new technology.
I think there are certainly things that we can do because there are large language models.
And that fundamentally different thing for me was this idea of simulating human behavior.
And I think there's a lot that we can sort of gain from it in terms of future application spaces.
I think I mentioned briefly about this idea of, well, what if we can go beyond believability to create agents that are even accurate?
And I think this application space in general
is something that I'm also learning a lot about,
actually, in fact, from this audience.
My advisor and my team are big fans of games,
but we are not from that community.
And one thing that we're seeing
is that there's a lot of really interesting potential,
even in things that look like toys;
a lot of really interesting technical advances
look like toys at the beginning.
So I think there's a lot that we can gain from there.
I think going forward, the application spaces
that I'm interested in
also include things like, can we run simulations so we can learn more about ourselves?
For instance, some of the places that I'm visiting now are
banks, like the Bank of England and so forth. These places need to test
their policies before they roll out new economic policies. Or many of my colleagues in the
department who focus more on social science,
they need to test out their theories.
Now, if you can run simulations with realistic human behavior
and find out, at least to some extent,
the answers to these really complex social phenomena
and challenges, then I think that would actually be a new tool
that these communities,
especially the communities in economics and social science,
didn't have in the past.
That would allow us to do interesting stuff.
And I'm genuinely intrigued by that possibility.
To some extent, this does sound fairly academic,
but I do think it should actually be fairly broadly applicable
and interesting to audiences beyond academia,
because ultimately, to some extent, what I'm saying is
I think generative agents and tools like large language models
could be used to advance social science.
And social science, to a large extent,
has been the quest to understand who we are.
And there are a lot of really interesting applications
that can come out of that, which will empower different communities and societies.
And that, to me, feels new, something that we haven't had in the past.
Yeah, and so it sounds like today, we're mostly in the creative realm where we can watch
these agents and can have fun with them. And it feels more like a game. But the delineation,
it sounds like, is accuracy. What will it take to get that accuracy? What work still needs to be
done? In terms of getting there, so I think some of you may have actually noticed this already.
There are studies that basically try to replicate existing social science studies,
using a large language model as a participant in social science studies
to replicate known results in the field.
And what we're finding is that they sort of work.
And that's nice.
That's one surprise that we did have.
But there's a limitation to this approach: is the large language model
replicating human participants because it's replicating human behavior,
which is what we want?
Or is it doing that because it's seen that paper?
For instance, there's a very famous theory
called prospect theory.
Is it replicating the findings of prospect theory
by Kahneman because of its ability to replicate human behavior,
or did it just read Kahneman's book,
Thinking, Fast and Slow?
I think that's a fundamental issue that we have as a field,
and I think it's one of the reasons
why there's a lot of work that still needs to be done
to crack this.
Some of the ways I think you could go about this are creating new contexts,
or new sets of studies that haven't been done in the past, and trying to replicate
those results.
So one of the things that we've done is called Social Simulacra, the first paper
I mentioned, which predates generative agents.
The idea was to replicate existing human communities.
And what we actually did was recreate subreddits that were created after the release
of GPT-3,
so GPT-3 wouldn't know anything about these communities.
One example here: GPT-3's training data predates the pandemic becoming the main topic of discussion,
so it basically didn't know about the pandemic.
We asked GPT-3 to create a community that talks about COVID, vaccination,
and vaccination policy.
And you would think it shouldn't be able to do that, in theory, because it doesn't know
anything about COVID.
It doesn't know anything about these policies.
But it can simulate those communities because it can infer what COVID
is and what vaccination is from its prior knowledge.
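The core idea in the Social Simulacra example is to test the model only on material it cannot have memorized. A minimal sketch of that selection step, under stated assumptions: the `Community` record, the community names and the cutoff date are hypothetical placeholders (the date is not GPT-3's documented training cutoff), and the filter only guarantees the communities are new, not that every topic in them is.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Community:
    """Hypothetical record for an online community (e.g. a subreddit)."""
    name: str
    created: date


def held_out_communities(communities: list[Community], training_cutoff: date) -> list[Community]:
    """Keep only communities created after the model's training cutoff.

    The model cannot have seen these specific communities during training,
    so a plausible simulation of them cannot come from memorizing them.
    """
    return [c for c in communities if c.created > training_cutoff]


if __name__ == "__main__":
    # Placeholder data and cutoff, for illustration only.
    candidates = [
        Community("r/example_before_cutoff", date(2018, 5, 1)),
        Community("r/example_after_cutoff", date(2021, 3, 1)),
    ]
    cutoff = date(2020, 1, 1)  # hypothetical cutoff, not a documented GPT-3 date
    for c in held_out_communities(candidates, cutoff):
        print(c.name)
```

The strict `>` comparison excludes anything created on the cutoff day itself, which errs on the side of treating borderline data as possibly seen.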
So to some extent, these tools can be used as a predictive tool,
looking into the future of what might happen in our own communities.
And I think those are the ways we'll see this unfold, maybe in the next
two years.
At the end of the paper, there was, perhaps unsurprisingly, a discussion of ethics,
and I'd love to hear both of your takes on where this goes and what ethical framework,
if any, we should apply to something like this.
So I think there are societal decisions that we'll have to make,
and I think there are techniques that can be used to implement those decisions.
To some extent, I think it would be useful for users to be aware
that they are talking to agents.
That's one rule that we try to set for ourselves:
when we release the code, when we release our paper,
we make it very clear that these are computational agents.
Ultimately, the framework that I like to use, certainly, is that
these tools are there to augment what we can do and what we have.
So to the extent that these agents can do that, and I think there are many interesting ways they can,
that's where I see the opportunity, as opposed to where it becomes more of a force for replacement.
There are genuinely cases where this is a really interesting setup, where we can augment what humans
can do by helping them do things they couldn't do in the past. But when replacement does
come in, it's worth asking: is this worth the cost of the replacement? And if it is,
what are good ways of implementing that idea? Technique-wise, I think there are techniques
that are going to be introduced more on the model side, making sure the model doesn't
behave in ways that go against our social alignment or societal agreement. So I think
those are some other things that we do have to figure out. But without going too much in depth,
I think we can get this right. And my personal take is that it's worth getting it right. Because ultimately,
an industry or academic field will grow, and I think we can progress a lot. We can go forward
for five, ten years without actually getting this right. But in the end, it's going to come back to
us at some point. To some extent, I do think I'm seeing this a little bit with the social media environment,
where there were a lot of things that we could have gotten right on day one.
And I think we would have had a much easier time navigating today had we gotten those right.
That's the opportunity that we have, since we are pretty early at this stage.
So I think it's worth discussing.
But again, I'm fairly optimistic that we will get this right.
So I actually think that there's a very important discussion to have around ethics and morality here,
and it's a very important time.
I do.
And here's that discussion: over the last 20
years, we built this machinery of, like, regulation and bullshit that's afraid of everything.
And it's so mature.
And it got crafted during the time of social media.
And it's looking for something to kill.
And for whatever reason, it thinks that AI is the next bad thing, which makes
absolutely no sense to me.
And so I think it's all of our moral and ethical obligation to protect and free the
AIs to be the way they want to be. That's really what it is. So don't focus in, focus out.
Listen, I've worked in tech for quite a while. I've actually worked for the DOD and on weapons programs.
And I've never seen as much sensitivity to a potentially beneficial new technology as I'm
seeing now, and I think it could end it before it even begins. And so I know the heart of
the question is whether we should regulate AI and this and that. And I think
it's the actual opposite.
I think we should regulate the regulators and let it be what it wants to be.
And I actually have to leave.
All right.
Here is where we switch to a short Q&A with the audience.
Martin unfortunately had to leave.
But here are a few highlights with Joon.
How can participants in AI Town collaborate to perform complex tasks?
There are two strands of work that I'm seeing in the agent space.
I mean, you can cross-cut it different ways.
But one way I'm seeing this
is that one set of agents is trying to tackle what I call hard-edge problem spaces.
Those are the problem spaces where there's a concrete answer,
where there are yes-or-no, right answers.
One good example here is classification.
If you're trying to do text classification,
obviously there's a right or wrong answer, depending on who you ask.
Another instance here is literally just asking your agent to buy pizza.
Did you buy pizza?
Did it come to you or not?
There's a very clear way to answer this.
The other is problem spaces with soft edges,
where it's kind of like drawing a portrait.
I mean, to some extent, what AI Town, Smallville,
all these kinds of projects are trying to do
is to create a simulation that feels human.
But as I mentioned, this idea of believability is really hard to define.
So to me, it feels a lot more like we're trying to draw a portrait or caricature of ourselves,
and the promise is not to be perfect,
but to be useful enough, clean enough,
that it's beneficial to the stakeholders.
My bet is a bit of a hot take.
My bet is that in the early days of agent development,
we'll see a lot of progress made first
in the soft-edge problem spaces.
Because with hard-edge problem spaces,
I think the intuition is a little bit flipped.
They actually feel easier to us humans, right?
Creating the Matrix sounds hard, but ordering pizza sounds really easy. But for agents, and from
the user's cost-benefit analysis, I think that intuition goes the other way:
users will accept an imperfect simulation if it's for fun or if it's to gain insight in the case of
soft-edge problems. But I will not accept my agent ordering me a pineapple pizza when that's not how I eat
pizza. And similarly, in many of these contexts, there's going to be genuine disagreement about
what the right option is, too.
And oftentimes, agents making mistakes in these contexts
is fairly high stakes.
And even if it doesn't seem high stakes,
it's going to be painful enough for users to fix
that it's going to fail the cost-benefit analysis.
I think down the line we'll get this right.
But day one, like in the next few years,
it feels more natural to me
that we'll go into the soft-edge spaces first.
So, going back, that was a long-winded way of saying:
AutoGPT, BabyAGI,
if you look at their architecture, they all share a similar insight or philosophy.
And I think those are really interesting projects
that could pan out in the future.
They might need a little bit more work, especially with users, to see where the value might be for those projects.
How big of an impact do you think much larger context sizes will have on agent models?
Actually, the largest context that I've seen in research is one million tokens.
So 1 million tokens, that's going to be about 4 million characters.
That's well over a book.
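For a rough sense of scale behind the one-million-token figure: a commonly quoted rule of thumb for English text with GPT-style tokenizers is about 4 characters, or about three-quarters of a word, per token. The sketch below simply applies those assumed ratios; real ratios vary by tokenizer, language and content.

```python
# Assumed rule-of-thumb ratios for English text with GPT-style tokenizers.
# Actual values depend on the tokenizer, the language and the content.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def approx_size(tokens: int) -> tuple[int, int]:
    """Return an approximate (characters, words) size for a token count."""
    return round(tokens * CHARS_PER_TOKEN), round(tokens * WORDS_PER_TOKEN)


if __name__ == "__main__":
    chars, words = approx_size(1_000_000)
    print(f"~{chars:,} characters, ~{words:,} words")
```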
Here's my perspective on this.
I think increasing the context limit is interesting.
And it's going to have its own set of really unique applications if we can basically make the context limitation disappear.
There are really a lot of interesting things that you can do with that.
Now, for the agent space, I'm not entirely sold that the problem, or the bottleneck that we have today,
is actually the context limitation.
And I think we can look back to how humans behave,
and what makes us effective as general agents, to answer this.
For instance, for me to make decisions,
even something like what I'm going to eat for breakfast,
I don't need to bring up my entire 29 years or so
of life experience to make that one decision.
I just need to selectively choose certain sets of information
that seem the most relevant.
Like, what did I eat the day before?
What do I generally eat? Those kinds of things.
And I think the reason why we do that, in part,
is that it's much more efficient,
computationally too.
You can increase the context limit,
but it's expensive to run.
And especially if you're familiar
with prompt engineering and so forth,
a larger context window does confuse models, right?
Some of my colleagues are actually doing more rigorous studies
on this, where you can have a really long prompt,
but the model really focuses on the first few lines and the last few lines,
and for whatever comes in between, its attention drops significantly.
So we can increase the context limit, but it's not going to fix that problem,
the problem of the effectiveness and efficiency of prompts.
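The positional effect described here suggests one commonly used workaround, which is not something proposed in the episode: when packing ranked snippets into a long prompt, put the most relevant ones at the start and end and let the least relevant ones fall in the middle. A minimal sketch, assuming the input list is already sorted from most to least relevant; the snippet names are placeholders.

```python
from typing import TypeVar

T = TypeVar("T")


def reorder_for_long_context(ranked: list[T]) -> list[T]:
    """Reorder items sorted from most to least relevant so the most relevant
    sit at the start and end of the prompt and the least relevant end up in
    the middle, where attention tends to be weakest."""
    front: list[T] = []
    back: list[T] = []
    for i, item in enumerate(ranked):
        # Alternate: 1st, 3rd, 5th... go to the front; 2nd, 4th... to the back.
        (front if i % 2 == 0 else back).append(item)
    # Reverse the back half so the 2nd most relevant item ends up last.
    return front + back[::-1]


if __name__ == "__main__":
    snippets = ["s1", "s2", "s3", "s4", "s5"]  # s1 = most relevant (placeholder names)
    print(reorder_for_long_context(snippets))
```

This only works around the positional effect; it does nothing about the cost of very long prompts.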
And we humans have to make a lot of decisions at every single moment.
So having to reason about your entire lifetime every time you make one
doesn't seem like the right way to go about it.
So my bet, therefore, is on retrieval:
have some external memory, retrieve the information that seems the most relevant, and just use that.
And those retrieved memories should be very concise, something that you can easily fit into even the models that we have today.
That's my bet.
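The retrieval approach described here can be sketched roughly as follows. This is a minimal illustration loosely modeled on the recency, importance and relevance scoring described in the Generative Agents paper, not the project's actual code; the `Memory` fields, the 0.995-per-hour decay, the equal weighting and the toy three-number "embeddings" are all assumptions for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    embedding: list[float]     # toy vector (assumption); a real system would use an embedding model
    importance: float          # e.g. a 1-10 rating of how significant the memory is
    hours_since_access: float  # time since the memory was last retrieved


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def min_max(values: list[float]) -> list[float]:
    """Rescale values to [0, 1]; if they are all equal, treat them all as 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def retrieve(memories: list[Memory], query: list[float], k: int = 2, decay: float = 0.995) -> list[Memory]:
    """Return the k memories with the highest recency + importance + relevance score."""
    if not memories:
        return []
    recency = min_max([decay ** m.hours_since_access for m in memories])
    importance = min_max([m.importance for m in memories])
    relevance = min_max([cosine(m.embedding, query) for m in memories])
    totals = [r + i + s for r, i, s in zip(recency, importance, relevance)]
    order = sorted(range(len(memories)), key=lambda idx: totals[idx], reverse=True)
    return [memories[idx] for idx in order[:k]]


if __name__ == "__main__":
    memories = [
        Memory("Ate oatmeal yesterday", [1.0, 0.0, 0.1], importance=3, hours_since_access=20),
        Memory("Usually eats eggs on weekdays", [0.9, 0.1, 0.0], importance=5, hours_since_access=100),
        Memory("Finished a paper draft", [0.0, 1.0, 0.2], importance=2, hours_since_access=50),
    ]
    breakfast_query = [1.0, 0.0, 0.0]  # toy vector standing in for "what should I eat for breakfast?"
    for m in retrieve(memories, breakfast_query):
        print(m.text)
```

Only the top `k` memories go into the prompt, which keeps it short regardless of how much history is stored. Equal weights are the simplest choice; how to balance recency, importance and relevance is a tunable assumption.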
If you liked this episode, if you made it this far, help us grow the show.
Share with a friend, or if you're feeling really ambitious,
you can leave us a review at ratethispodcast.com/a16z. You know, candidly,
producing a podcast can sometimes feel like you're just talking into a void. And so if you did like
this episode, if you liked any of our episodes, please let us know. We'll see you next time.
