a16z Podcast - From Sims to Sapiens: Crafting Reality with Code
Episode Date: November 6, 2023

Is it possible to construct a virtual society that authentically replicates human behavior? AI Town, a virtual town experiment where AI residents live, interact, and engage, provides valuable insights into the future of AI's believability and its interaction with humanity.

In this panel discussion, Joon Park, the author of 'Generative Agents: Interactive Simulacra of Human Behavior,' and Martin Casado from a16z discuss the influence and potential of generative agents, exploring their practical applications in the real world.

Topics Covered:
00:00 - Simulating human behaviors
04:49 - What are generative agents?
07:47 - Simulations, new technology, and LLMs
11:45 - The architecture behind simulating human behavior
16:37 - Generative agents interactions: observing, planning, and reflecting
20:22 - What is the value in advancing generative agents?
24:01 - Use cases for simulation behavior technology
29:31 - What are the ethical frameworks?
33:12 - Q&A from the audience

Resources:
Find AI Town: https://www.convex.dev/ai-town
Read the paper 'Generative Agents: Interactive Simulacra of Human Behavior': https://arxiv.org/pdf/2304.03442.pdf
Find Joon on Twitter: https://twitter.com/joon_s_pk
Find Martin on Twitter: https://twitter.com/martin_casado
Find a16z on Twitter: https://twitter.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Subscribe on your favorite podcast app: https://a16z.simplecast.com/
Follow our host: https://twitter.com/stephsmithio

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
Transcript
They actually wanted to build general computational agents, sort of the way generative agents are supposed to be.
But they didn't have the techniques to do it.
They basically didn't have the large language model.
And like the work that you've done is one of those 100%.
There's like a spark of genius.
What does it mean to accurately reflect your behavior?
But the set of companies that come out of it are always part of this enthusiast era, right?
Like you couldn't have predicted Yahoo.
You couldn't have predicted Amazon.
Like, you knew something was going to happen.
Can we run simulation so we can learn more about ourselves?
A few weeks ago, the A16Z infrastructure team ran an event in the San Francisco office.
The topic, generative agents.
These are autonomous characters designed to simulate human behavior,
derived from a recent but game-changing paper called Generative Agents:
Interactive Simulacra of Human Behavior.
Developers from all around the city came to see and
hear the lead author, Joon Park, speak alongside a16z general partner, Martin Casado.
And in this panel, they discuss how this paper and the advancements in large language models
have opened a new window, expanding the dynamism of simulation, which, instead of binary logic,
uses probabilistic thinking and can incorporate new information.
So what does that really mean? Well, instead of your character in The Sims following very specific
rote rules, with generative agents, a father may go outside because he notices his son,
another may take their breakfast off the stove because they notice it's burning, and another
may even opt into a Valentine's Day party invite and then elect not to show up. All very
human behaviors. Now, the architecture described in the paper is of course intentionally designed
by Joon and team, and it's a combination of a seed identity for every agent, and then functions
that cause each one to do three discrete things, to observe, to plan, and to reflect.
And these architecture decisions ultimately generate unexpectedly spirited conversations just like this.
Hey, Lucky, it's so great to see you. How have you been? I've been dying to hear about your space adventure.
Hey, Kira, I've been fantastic. My space adventure was out of this world. I can't wait to share all
the details with you. Or even this. I've been trying to find my...
My way.
It's been a chaotic journey to say the least.
Embrace the chaos, dear Kurt.
For within its turbulence lies hidden truth.
Seek the depths of the unknown and unravel the mysteries that burden your soul.
And here's the thing.
They don't just interact with each other.
Again, they wake up, they cook, some paint while others write, they hold opinions of one another,
and most importantly, they remember and they have higher level reflections based on the past.
It's pretty amazing, don't you think?
So as these generative agents become a lot closer to nuanced human behavior,
what can we learn about being human from these surprisingly realistic simulations?
And what is the calculus of that believability?
Are there real-world applications on the horizon?
And what is truly net new here?
Listen in as we discuss all that and more,
including the origin of the very paper that Joon wrote.
I hope you enjoy.
As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund.
Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast.
For more details, including a link to our investments, please see a16z.com/disclosures.
So how many people in this room have actually read the generative agents paper that
Joon wrote? It's a lot of people, pretty much everyone. So Joon, even though so many people
have read it, why don't you just give a quick overview of what it is, but also maybe the backstory
that people haven't maybe heard of. So generative agents are these general computational agents
that can simulate believable human behavior. It fundamentally leverages something like a large
language model, under the assumption that a language model has encoded or has seen so much
about human behavior from its training data, from Wikipedia, the social web, and so forth.
So if you are able to poke it at the right angle, you can actually extract a lot of those
human behaviors in a very context-specific manner. The opportunity here is that in the past,
we had to manually author a lot of these behaviors, but now we can simply generate them
with a large language model. So generative agents leverage that to create these computational
agents. Ultimately, one sort of technical improvement that we're trying to make in addition
to the large language model is basically giving it some form of memory and retrieval system. So you may
have all used, obviously, ChatGPT and so forth. It is heavily context limited. And even if
that limitation were to go away in the future, processing a really long context window
is really inefficient and also ineffective when you're trying to prompt these models for
narrowly defined behavioral aspects. So the main philosophy here is we're going to give long-term
memory to these agents that's external to the language model and then retrieve the contextually
relevant information from that long-term memory, whether it's planning, action sequences,
or reflections, to create these computational agents. Philosophically, to some extent, I think this
is akin to creating the operating system around the large language model, in the way of sort of
how we're prompting the large language model.
To me, it feels a lot like how we used to use computers back in the day
when we had to wire up the back-end
every time we run a new program.
And what has really made complex behavior
with these computational tools possible
was the introduction of these larger architectures
that surround the core fundamental techniques.
So that's what generative agents are about.
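[Editor's note: to make that memory-and-retrieval idea concrete, here is a minimal sketch of what such an external memory stream could look like: natural-language records with timestamps and importance scores, kept outside the model. The class and field names are illustrative assumptions, not the paper's actual code.]

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Memory:
    """One natural-language observation, plan, or reflection stored outside the LLM."""
    text: str                  # e.g. "Isabella is planning a Valentine's Day party"
    created: datetime          # when the agent recorded it
    importance: float          # 1-10 score, typically asked of the LLM itself
    kind: str = "observation"  # "observation" | "plan" | "reflection"

@dataclass
class MemoryStream:
    """Append-only long-term memory; retrieval pulls back only a small,
    contextually relevant subset to fit the model's limited prompt window."""
    memories: list[Memory] = field(default_factory=list)

    def record(self, text: str, importance: float, kind: str = "observation") -> None:
        self.memories.append(Memory(text, datetime.now(), importance, kind))
```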
And you mentioned sort of the background
why we got into all this.
So I started my PhD sort of midway through 2020.
That was just around when GPT-3 was about to come out.
And that year, we, a bunch of authors at Stanford, were working on this paper
called 'On the Opportunities and Risks of Foundation Models.'
What we were seeing was these new forms of machine learning models that seemed fundamentally
different than the things that we had experienced in the past, in that we didn't have to
fine-tune or specifically train models for very narrow purposes, but we can train a general
model, almost like a stem cell in biology, and leverage that to create a lot of downstream behaviors.
After writing that paper, sort of my team, especially myself and my advisors, what we really wanted
to answer is there seems to be a new opportunity, but exactly what is it? I think in the early
days of GPT-3, a lot of the tasks that we were doing were things like classification and generation,
which was really cool to see that these models can conduct these tasks, but also something that we
already knew how to do for many decades. And our general philosophy there was to
be able to do something that's fundamentally different. So that's how we got into this.
Our answer to that basically was, I think we might be able to create human-like agents that can
populate this virtual world. Martin, maybe you can just elaborate. You said it's perhaps
one of the most exciting times in recent history. And maybe you can just speak to exactly
what you mean there and how it relates to simulation and some of this new technology that we're seeing
with LLMs. Sure. So first, very quick credit where credit is due. As far as AI Town, clearly Joon is
like the grandfather of AI Town. And like we wouldn't be here without your work. So really appreciate
you coming here. AI Town itself actually came from a personal project from Yoko. The true story is it was
actually a personal project. And I was like, hey, maybe more people would be interested in it.
And I kind of coerced her into bringing it forward to everybody else. And so now when it actually
comes to the code, the vast majority of the work on the code was actually done by Ian. It's kind of
funny. Like, you see this funny little tile set up here, and it kind of belies the fact that it's
actually really hard to build a scalable, shared state, distributed system that you need in a
multiplayer game. It's just a hard technical problem, right? And anybody that's kind of built
out systems knows that. And so it's funny, because people go in and they say, oh, here's this
cute little tile engine with, like, these characters running around, but, like, actually
the back end is built to be something that can scale. And that requires people that have focused on this.
And so Ian has done a tremendous job, and the Convex team continues to work on that.
Okay, so why is this so exciting?
So, okay, so because I'm old, I actually saw, like, the advent of, like, the web.
And this feels very similar to that in the following ways, which is when you have a very disruptive technology like this,
whatever touches it becomes magic.
I was actually having a conversation just before this.
Does anybody here know what, like, the first video on the Internet was?
Yes, it was a coffee pot.
But it was, like, this dude, I think it was in Cambridge, it was a grad student, and he was like, oh, listen, I want to know when my coffee is empty.
So he put up a camera, and because it was very new,
everyone was like, oh, my God, there's a coffee pot on the internet.
And so everybody wanted to look at the coffee pot, right?
And do people remember the big red button?
One of the first apps was this big webpage, which was a red button on it, and you know what it did?
Nothing.
Like, you press it, and it did nothing, but people thought it was amazing because it was on the internet, and everybody would go press the button, and they'd leave great comments about this button.
And there's many examples of, like, it was this crazy disruptive technology, and the apps seemed really stupid, and there's a bunch of enthusiasts.
And you know what the enterprise thought about this?
Like, the actual business folks?
Like, I remember when Eric Schmidt fucking banned the browser.
Like, he was like, this is Eric Schmidt, the CTO of Sun is like, you can't have a browser because people aren't going to work, right?
So the same thing always happens.
It's like, the enthusiasts are like, this is really cool, and they use it for fringe stuff.
And then, like, the enterprise doesn't understand it,
and, like, initially, like, they ban it or they don't use it.
But the set of companies that come out of it are always part of this enthusiast era, right?
Like you couldn't have predicted Yahoo.
You couldn't have predicted Amazon.
Like you knew something was going to happen.
And so what happens at this time is there's a bunch of stuff that like is silly.
Like the coffee pot was silly.
The red button was silly.
But you never know like that spark of life where it's going to come from.
And it's always kind of like this non-obvious use case.
and it kind of seems like a toy, and then it takes off, right?
And so you're always looking for those non-obvious use cases.
And it almost never looks like the old one.
Those of you who are old enough.
Do you remember, like, desktop as a service?
Like, I'm going to go to the cloud.
I'm going to have my Windows desktop.
Like, who wants that?
Nobody wants that, right?
Instead, clearly, we're going to rewrite the application as SaaS, right?
So we're in this period now where everybody's experimenting,
and then I'm personally, literally from just a personal interest standpoint,
but all of us are interested, like, what are the use cases that will
take advantage of this new medium that are native. And like the work that you've done is one of those
100%. There's like a spark of genius, which is like when you work with these things, you know this
is a new way to think about it. It's a new use case. It's going to create entirely new apps. And that's
what the future is built from. And so that's why I think it's so interesting broadly, because it's like
the early internet, but very specifically in this use case, because I think the work that you've done
really is a great example of something totally new. I couldn't agree more. And I think one interesting
aspect is that if you explore this project, you just start to question what it means to be human.
Like if we're trying to create these agents that are, quote, believable, what is believable
in terms of being a human? And as part of the project, you have this coded technically, right?
You made architecture decisions. You made decisions in terms of your retrieval function.
Quick interruption, just to give you some color on what some of these decisions were.
The retrieval function, for example, is based on scores across recency,
importance, and relevance. So, for example, on a scale of 1 to 10, brushing your teeth might get
an importance score of 1, versus a breakup might get a 10. Meanwhile, reflection is only triggered
after a certain number of important events, quantified by summing the importance scores until
a certain threshold is met. In this case, I believe it was 150. This clever architecture
results in emergent behavior, like agents sharing invites with one another, or even having
that information circle all the way back to the original planner.
And I'm sharing these details to showcase how thoughtful you really need to be
if you're designing architecture that reasonably approximates humans.
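[Editor's note: a rough sketch of how that scoring could look in code, combining recency, importance, and relevance, plus the summed-importance reflection trigger mentioned above. The equal weighting, the decay rate, and the function names are assumptions for illustration; the paper's implementation differs in its details.]

```python
from datetime import datetime

def retrieval_score(memory, relevance, now, decay=0.995):
    """Combine recency, importance, and relevance into one retrieval score.
    `relevance` (0-1) would normally come from embedding similarity between the
    memory text and the current query; it is passed in to keep the sketch simple."""
    hours_old = (now - memory.created).total_seconds() / 3600.0
    recency = decay ** hours_old              # newer memories score higher
    importance = memory.importance / 10.0     # normalize the 1-10 importance score
    return recency + importance + relevance   # equal weights, purely illustrative

def should_reflect(recent_memories, threshold=150):
    """Trigger a reflection pass once the summed importance of recent events
    crosses a threshold (the talk mentions roughly 150)."""
    return sum(m.importance for m in recent_memories) >= threshold
```

A query would then rank all stored memories by this score and feed only the top few into the prompt.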
Maybe you could just speak to what you've learned through those decisions technically
about what it means to be a believable human.
Right.
So this is an interesting one.
So we actually had made generative agents,
and there was about a month period when we knew we had to evaluate these agents somehow,
and we didn't know how.
And basically the concept we stumbled
upon is this idea of believability. It basically is sort of like a Turing test, right?
That when you look at them, do they look believable? Do they behave in ways that we can sort of
see ourselves behaving? And that ended up becoming our evaluation method. It is an interesting
question, though, in terms of, like, what does it mean to be believably human? And we often
look to prior literature in research to get inspiration for how to define this. And what we found
was there's no prior literature on this.
We used the concept of believability
to talk about this idea,
but we were never in a position
where we could meaningfully evaluate
something like believability
because we didn't have agents like this.
So to some extent, we were building up the definition from the ground up.
And I think what came out to be the case
is, for us, do these agents plan, react,
and act in a believable manner?
Do they create believable reflections,
the way we would evaluate a Turing test?
And I think what we've learned
over the past few months,
one of the more fun and interesting findings
is that even that, I don't think,
is quite a perfect definition,
in that a lot of the audience
came back to us to basically say,
well one of the error cases
that we noted was some of these agents would go
to a bar at noon or something like that
and we said that was not
believable like who would do that
and people would come back to us and say
I do that
and if you can sort of expand
from that story I think
there's a lot of cases where even my parents look at me and go, like, I cannot believe what
you've done.
Like, why would you do that?
And vice versa.
So I think even amongst the people who know each other well, having this sort of sense
of believability is really difficult.
And I think that sort of fundamentally underlies what it means to be human.
Like, it's not exactly predictable.
And in social science, we call that complexity, that human behavior is sort of complex.
So to some extent, we can build intuition for how people might behave.
but to really predict it is a very difficult task.
Now, I do think this actually does lead to sort of future work in this space, though,
this idea of believability.
So in this paper, we use this incomplete definition of what it means to be believable.
Not perfect, but at least on that evaluation, we've done well.
I think if you were to build on that idea a little bit further,
then you could actually start to ask,
beyond believability, can you create agents that are accurately human?
And I think given how difficult it was to actually evaluate what it means to be believable,
I think this accuracy actually has a lot of interesting questions around it.
What does it mean to accurately reflect human behavior?
It could be that if we can match the distribution of human behavior,
let's say in this context, they have this kind of probability of behaving this way, right?
Let's say it's 10 p.m. What are the chances that I'd already be asleep or be awake?
What are the chances that I'd be working, or that I might not
be working? I think ultimately
getting to that degree of accuracy in the
simulation might be sort of the next step
for these kinds of simulation-based work.
If we can do that, I think the application
space that that kind of accuracy
unlocks will be
interesting, and I think it would also
be different from what we can likely
do now. I think there's a lot of
applications that we can build right now,
but I think the future work, that's where we're
headed, in this direction.
So I want to talk about those future applications
but maybe you could just speak super quickly to this: in the paper, you have observation, planning, and
reflection. And that mostly encapsulates the way that the agents are engaging with each other.
When they take an action, they go through those three steps. I assume that wasn't your first crack
at the solution, at coming up with this believable human agent. And so how did you get there?
And did you learn anything about the importance of any of those three steps, or all three of them
entirely? Really, the first way we actually went about doing this was simply by prompting a language
model. So this line of work, generative agents, is actually the second in this line of work
that we published. The first work in this line was called Social Simulacra. And the idea there was to
populate a social computing system. Imagine you are a social designer. You need to know what
might happen when there's tens of thousands of people in your system. Can we simulate those
people and their behavior? So that project was called Social Simulacra. We did it
simply by prompting a language model. That worked. But what we found was if we want to populate
the spaces over a longer period of time, so we can do, for instance, longitudinal study or gameplay
that's going to last forever, then for those kind of instances, simply prompting these models
wouldn't work. And this insight actually first came when we realized that we needed to have
multi-agent interaction, because agents actually would need to remember that I saw some audience
here before. I should remember them. I met Martin, Steph, Yoko, and so forth in the past few weeks
or a few months. When I talked to them, I need to remember those interactions. So that's when we
realize that we actually cannot simply prompt these models, but we actually need the higher
level architecture. So when we went about doing that, I think really the main inspiration
that we got actually was from prior work. So people like Allen Newell and Herbert Simon,
you might recognize all these names. Those are sort of, quote-unquote, the founders
of AI in the 60s and 70s, and they are the people who used to build what we call
cognitive architectures. And those architectures were very reminiscent of sort of the generative
agents architecture in that it has some perception module, some action module, and there is some
long-term and short-term memory. And really, the goal back then was ambitious, right? They actually
wanted to build general computational agents, sort of the way generative agents are supposed to be, but
they didn't have the techniques to do it. They basically didn't have the large language
model. And the way we saw it was now it's the time to sort of merge those two worlds,
where we now have a large language model that can do a lot of sort of micro-processing of
these cognitive modules. And we can actually now bring back these macro modules or architectures,
like cognitive architectures. So we took inspiration from that. That particular architecture
had planning in place, and it had long-term memory in place. So we were inspired by
that. One thing that I think was a little bit new, though, I think, is this idea of
reflection: that we humans, for instance, if you eat an omelet three times in a row,
or if you see somebody else eat an omelet three times in a row, you likely create an
opinion about the person. Maybe that person likes to eat an omelet in the morning. And that's a
very human thing to do, and there's a good reason why we do that. We do that because it's
efficient. It allows us to have higher-level inferences about the world, and to form
opinions about those around us and about ourselves. And that's something that, in the past,
we couldn't really imagine formulating with a computational system. But with a large language model,
because everything is in natural language, we had that opportunity. So we added that one last
component called reflection. And that's sort of how we landed on the architecture that you see
in the paper right now. Let's move on to how this can all be used. And we'll get to the specific
applications. But Martin, I feel like you'll have a great answer to this. Why even do this?
I feel like it's very obvious for a lot of people to understand why we would have human-to-human
interaction. We're doing that right now. There's increasing capacity to understand human-to-AI
or human-to-computer interaction. Character AI is a company where, you know, there's still a lot of
judgment there. And I think there's even more judgment when it comes to AI-to-AI. Like, why should
we use our resources to have these computers hang out and talk and burn toast and go to the bar
at 2 p.m.? So yeah, Martin, what do you think? What's the case for us advancing in this field?
No judgment for me, by the way. You can use these for whatever you want. So, I mean, I want to
go back to what I said before, which is like, anytime you have a new modality, it's just not
obvious what's the right way to think about it. And for me, the big aha in the last few months is
just programming using models. If you've spent a lot of time programming, I mean, I've been programming
for 30 plus years, right? I've never been a good programmer, but I've programmed. And when you
start programming with these models, you're like, oh, I've got an API, and I'm just going to use
the API, and then I'm going to treat it like, it's like the endpoint to an API, and you say some
stuff, and then, you know, you get some response back, and you kind of treat it like this function
that you call, right? It's just like any programmer would do. But then when you're working with
it more, you're like, oh, these kind of are like these life forms. And like, my first aha was because
I'm shit at JavaScript, I, like, missed some quote somewhere, and rather than sending
it the text string I wanted to send it, I sent it some code.
And instead of, like, borking, like you would normally have, and breaking, like, you know,
in C++ you'd core dump or whatever, it commented on my code.
I was like, oh, my goodness, right?
And so, like, all of a sudden, like, whoa, this is totally different.
Like, I'm not dealing with this finite state machine, formal language, thing at the other end
of an API.
like there's this thing, and, like, it'll comment. And the more that I program with these things,
the more I'm like, it's kind of like wrapping an abacus around a supercomputer, right? It's like,
it's smarter than the code. It could probably write the code better than I can write it anyways. Like,
why am I doing this weird bloodletting ritual of writing shit JavaScript over this kind of
superhuman thing, right? I mean, this is kind of what you end up with. And so it's very clear we're
going to interact with these things in a different way. And in fact, I was talking with that
professor in Michigan recently, and we were talking about this subject. He's like, you know what?
You know how I think about LLMs? He's like, I think about them like grad students.
He's like, they speak English. They're pretty smart. I don't use a formal language. They solve
like these really complex problems, et cetera. And like, having worked with a lot of grad students,
having been a grad student myself, like you don't treat these things with code, right? And so
the reason to do this is I actually think AI Town is kind of what this is going to end up
being, which is, like, you need to give them the resources that they need to be pretty autonomous and to grow,
and we're going to treat them more like peers, and they're going to talk to each other too. And it's more
like grad students. And so for me, this is just an example of: we've got to change the way we think. And listen,
clearly, like, I'm up here and I'm telling these great stories because they're kind of funny. Like, I don't
believe this stuff in the limit, but I think they're really interesting ways to change how you think
about all of this stuff, right? Like, I'm not trying to be categorical here. So, like,
there is a new way that we're going to interact with these models. It is much
more natural language, and they are much more powerful. And so I do think this is why we should all
be doing this type of stuff, because if you don't engage in these kinds of things that look like
toys, this wave will pass you by. That, I'm 100% convinced. Totally. And as both of you have spoken
to, this is fundamentally new technology. And so Joon, something you said to me when we first spoke
is when you have fundamentally new technology, you must do something fundamentally new with it.
And so maybe you can speak to that in terms of what you're seeing that can be done today, but
also where you look ahead and you think, oh, wow, that's a really excellent use case that we
couldn't do without this new technology. I think there are certainly things that we can do because
there are large language models. And that fundamentally different thing for me was this idea of
simulating human behavior. And I think there's a lot that we can sort of gain from it in
terms of future application spaces. I think I mentioned briefly about this idea of, well,
we can go beyond believability to create agents that are even accurate. And I think
this application space in general is something that I'm also learning a lot from,
from, actually, in fact, this audience.
My advisor and my team are big fans of games, but we are not from the community.
And one thing that we are seeing is that there's a lot of really interesting potential.
Even if they look like toys, sort of, a lot of really interesting technical advances,
they look like toys at the beginning.
So I think there's a lot that we can gain from there.
I think going forward, sort of, the application spaces that I'm sort of interested in,
it's also in things like, can we run simulations so we can learn more about ourselves?
For instance, in fact, some of the places that I'm visiting now are more places like banks,
like the Bank of England and so forth, where these places, they need to test their policies
before they roll out new economic policies. Or many of my colleagues in the department
who focus more on social science, they need to test out their theories.
Now, if you can run simulations with realistic human behavior and find out, at least to some
extent, the answers to these really complex social phenomena and challenges, then I think
that actually would be a new tool that the community in the past, especially those communities
in economics and social science, they didn't have. That will allow us to do interesting stuff.
And I'm genuinely intrigued by that possibility. To some extent, this does sound fairly
academic, but I do think it should be actually fairly broadly applicable and interesting
to audiences beyond academia, because ultimately, to some extent, what I'm saying is I think
generative agents and tools like a large language model could be used to advance social
science. And social science, to a large extent, has been the quest to understand who we are.
And there's a lot of really interesting applications that can come out of that that will empower
different communities and societies.
And that, to me, feels new,
something that we've never had in the past.
Yeah, and so it sounds like today
we're mostly in the creative realm
where we can watch these agents and we can have fun with them
and it feels more like a game.
But the delineation, it sounds like, is accuracy.
What will it take to get that accuracy?
What work still needs to be done?
In terms of getting there,
so I think some of you may have actually noticed this already.
There are studies that basically try to
replicate existing social science studies. So basically using a large language model as a participant
in potential social science studies, right, to replicate known results in the field. And what we're
finding is that they sort of work. And that's nice. And that's one surprise that we did
have. There's been a limitation to this approach, in the sense of: is the large language model replicating
human participants because it's replicating human behavior, which is what we want? Or is it doing that
because it's seen that paper?
For instance, there's a very famous social science theory
called Prospect Theory.
Is it replicating the findings from prospect theory by Kahneman
because of its ability to replicate human behavior,
or did it just read Kahneman's book, Thinking, Fast and Slow?
And I think that's a fundamental issue that we have as a field.
And I think that's one of the reasons why there's a lot of work
that needs to be done to crack that.
Some of the ways I think you could actually go about doing this
is creating new context or creating new set of studies
that haven't been shown in the past
and trying to replicate those results.
So one of the things that we've done
is called Social Simulacra,
which is the first paper that I mentioned
that predates generative agents.
The idea was to replicate existing human communities.
And what we've done actually was
we recreated subreddits that were created
after the release of GPT-3.
So GPT-3 wouldn't know anything about these communities.
One example here was actually before sort of the pandemic became the main topic of discussion,
or when GPT-3 basically didn't know about the pandemic, we basically asked GPT-3 to create a community
that has to talk about COVID and vaccination and vaccination policy.
And you would wonder, it shouldn't be able to do that in theory because it doesn't know
anything about COVID.
It doesn't know anything about these policies.
But it can simulate those because it can infer what COVID is, what vaccination is, from its prior knowledge.
So to some extent, these tools can be used as a predictive tool
looking into sort of the future of what might happen in our own community.
And I think those are sort of the ways I think we'll see this field unfold
in the next two years.
At the end of the paper, there was perhaps unsurprisingly a question around ethics
and just I'd love to hear both of your takes on where this goes
and what ethical framework, if any, we should apply to something like this.
So I think there are societal decisions that we'll have to make,
and I think there are techniques that can be used to implement those decisions.
I think certainly, to some extent, I think it would be useful for the users to be aware
that they are talking to agents.
And I think that's sort of one rule that we try to set for ourselves,
that when we release the code, when we release our paper,
we make it very clear that these are computational agents.
I think ultimately the framework that I like to use in human-computer interaction
certainly is: these tools are ultimately there to augment what we can do and what we have.
So to the extent that these agents can do that, and I think there are many interesting ways
we can do that,
I think that's where I see the opportunity, versus where it becomes more of a force for replacement.
I think there are genuinely cases where this is a really interesting setup where we can
sort of augment what humans can do by helping them do things that they couldn't do in the past.
But when the replacement does come in, it's worth asking, is this worth the cost of doing the
replacement? And if it is, what are sort of good ways of implementing that idea?
Technique-wise, I think there are techniques that are going to be introduced more from the
model's perspective, making sure the model doesn't behave in certain ways that go against
our social alignment or societal agreements. So I think those are some of the things that we
do have to find out. But without going into too much detail,
I think we can get this right. And my personal take is it's worth getting it right
because ultimately an industry or academic field will grow. And I think we can sort of
progress a lot. We can go forward for five, ten years without actually getting this right.
But in the end, it's going to come back to us at some point. To some extent, I do think I'm seeing
this a little bit with the social media environment, where I think there were a lot of things that
we could have gotten right on day one. And I think we would have had a much easier
time navigating today had we gotten those right. And I think that's the opportunity
that we have, since we are pretty early in this stage. So I think it's worthy of discussion.
But again, I'm fairly optimistic that we will get this right.
So I actually think that, like, there's a very important discussion to have around kind
of ethics and morality around this, and it's a very important time. I do. And here's that
discussion, which is over the last 20 years, we've built this machinery of, like, regulation
and bullshit that's, like, afraid of everything, and it's so mature, and it got crafted during
the time of social media, and it's looking for something to kill. And for whatever reason,
like, it thinks that AI is the next bad thing, which makes absolutely no sense to me.
And so I think it's all of our moral and ethical obligation to protect and free the AIs in
the way that they want to be, and, like, that it really is. So don't focus in, focus
out. Listen, I've worked in tech
for quite a while. I've actually worked
for the DOD and weapons
programs, and I've never
seen so
much sensitivity to a new technology
that's potentially beneficial that I've seen
now that I think
could end it before it even begins.
And so I know the question
and the heart of the question is
we should regulate AI and this and that,
and I think it's the actual opposite. I think we should regulate the
regulators and let it be what it
wants to be. And I actually
have to leave, so.
All right, here is where we switch to a short Q&A with the audience.
Martin unfortunately had to leave, but here are a few highlights with Joon.
How can participants in AI Town collaborate to perform complex tasks?
There are two strands of work that I'm seeing in sort of agent space.
I mean, you can sort of cross-cut it different ways, but one way I'm seeing this is one set
of agents who are trying to tackle what I call hard-edge problem spaces.
Those are the problem spaces where there's a concrete answer.
There are yes-or-no right answers.
Or one good example here is classification.
If you're trying to do text classification, obviously there's a right or wrong answer, depending on who you ask.
Another instance here literally is just asking your agent to buy pizza, right?
Did you buy pizza?
Did it come to you or not?
Like, there's a very clear way to answer this.
Another is a problem space where the problem space has soft edges, where it's kind of like
drawing a portrait. I mean, to some extent, what AI Town, Smallville, and all these kinds of
projects are trying to do is to create a simulation that feels human. But as I mentioned, this
idea of believability is really hard to define. So to me, it feels a lot more like we're trying
to draw a portrait or caricature of ourselves. And the promise is not to be perfect, but the
promise is to be useful enough, clean enough, that it's beneficial to the stakeholders.
My bet is a bit of a hot take.
My bet is in the early days of agent development,
I think we'll see a lot of progress
that's going to be made first
in sort of the soft-edge problem spaces.
Because for hard-edge problem spaces,
I think the intuition is a little bit flipped.
It actually feels easier to us, for humans, right?
Creating the Matrix sounds hard,
but ordering pizza sounds really easy.
But for agents, and from the user's sort of cost-benefit analysis,
I think that intuition is the other way,
where users will accept an imperfect simulation
if it's for fun or if it's to gain insight,
in the case of soft-edge problems.
But I would not accept my agent ordering me a pineapple pizza,
like a Hawaiian pizza.
And similarly, in many of these contexts,
there's going to be genuine disagreement
about what is the right option too.
And oftentimes, agents making mistakes in this context
are fairly high stakes.
And even if it doesn't seem like high stakes,
it's going to be painful enough
for the users to fix that it's going to fail the cost-benefit analysis.
I think down the line, we get this, right?
But day one, like in the next few years, I think, to me, it feels more natural that we'll
go into the soft-edged spaces first.
So going back, that was a long-winded way of saying, I think Auto-GPT, like,
if you look at their architecture, they sort of all share a similar insight and philosophy,
and I think those are really interesting projects.
I think that could pan out in the future.
They might need a little bit more work,
especially with the users,
to see where the value might be for those projects.
How big of an impact do you feel
that much larger context sizes will have on the agent model?
Actually, the largest context that I've seen
in sort of research is one million tokens.
So one million tokens, that's going to be about, like, four million characters.
Like, that's well over a book.
Here's my perspective on this.
I think increasing the context limitation, I think, is interesting, and it's going to have
its own set of really unique applications if we can basically make the context limitation disappear,
right? So I think there's really a lot of interesting things that you can do with that.
Now, for the agent space, I'm not entirely sure that the problem or the bottleneck that we have today
is actually the context limitation. And I think we can sort of look back to how humans behave
and what makes us effective sort of as these general agents to answer this.
For instance, for me to make decisions,
even something like what I'm going to eat for breakfast,
I don't need to bring up my entire 29 years or so
of life experience to make that one decision.
I just need to selectively choose certain sets of information
that seems the most relevant.
Like what did I eat the day before?
What do I generally eat and those kind of things?
And I think that the reason why we do that,
in part is actually because it's much more efficient,
computationally too, so that we don't have to, you can increase the context limitation,
but it's expensive to run it.
And especially if you're sort of familiar with prompt engineering and so forth,
larger context window does confuse models, right?
So some of my colleagues are actually doing more rigorous studies on this,
where you can have a really long prompt.
But the model really focuses on the first few lines and the last few lines.
And whatever comes in between, its attention drops significantly, right?
So we can increase the context limitation, but it's not going to fix that problem,
the problem of the effectiveness of the prompt and the efficiency of it.
And we humans have to make a lot of decisions at every single moment.
So if you have to reason about your entire lifetime every time you do that,
it doesn't seem like the right way to go about that.
So I think the better sort of, my bet, therefore, is going to be based on retrieval.
Have some external memory, retrieve certain information that seems the most relevant,
and just use that.
And those retrieved memories
should be explicitly very concise
and something that you can easily fit into
even the models that we have today.
That's my bet.
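[Editor's note: a small sketch of that retrieval-over-context bet: rank memories, keep only the few most relevant, and build a compact prompt. The function names and the relevance_fn hook are assumptions carried over from the earlier sketches rather than anything from the talk itself.]

```python
def build_prompt(question, memories, relevance_fn, k=5):
    """Pick only the top-k most relevant memories and pack them into a short prompt,
    instead of stuffing the agent's entire history into a huge context window.
    `relevance_fn(memory)` stands in for the scoring described earlier
    (recency + importance + similarity to the current situation)."""
    ranked = sorted(memories, key=relevance_fn, reverse=True)
    bullet_list = "\n".join(f"- {m.text}" for m in ranked[:k])
    return (
        "Relevant memories:\n"
        f"{bullet_list}\n\n"
        f"Question: {question}"
    )

# Example usage with a deliberately crude relevance function (importance alone):
# prompt = build_prompt("What should the agent do next?", stream.memories,
#                       relevance_fn=lambda m: m.importance)
```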
If you like this episode,
if you made it this far,
help us grow the show.
Share with a friend,
or if you're feeling really ambitious,
you can leave us a review
at ratethispodcast.com/a16z.
You know, candidly,
producing a podcast can sometimes feel like
you're just talking into a void.
And so if you did like this episode, if you liked any of our episodes, please let us know.
I'll see you next time.