The AI Daily Brief: Artificial Intelligence News and Analysis - The Problems with AutoGPT and BabyAGI: How Useful Are They Really?
Episode Date: April 22, 2023
For the last 3 weeks, AutoGPT has massively captured the attention of the AI community. But how useful is it really? Some are starting to ask whether it really lives up to the hype. Watch the origi...nal video: https://www.youtube.com/@TheAIBreakdown
Transcript
The AI Breakdown you're about to hear was originally released as a YouTube video on Saturday, April 22nd.
In it, we take another look at AutoGPT and BabyAGI, which are now three weeks old or so,
and ask, as our initial impressions wear off, just how useful are they actually?
Today, we are back with another video on AutoGPT, but this time, it's three weeks on, and we're asking,
is it actually all that useful?
Welcome back to the AI Breakdown.
If you've spent any time in and around AI for the last three weeks, you have definitely heard about
AutoGPT and BabyAGI and these autonomous AI agents that are theoretically going to change everything.
And just for a little bit of background, in case you haven't spent much time here,
as opposed to something like ChatGPT, which is very mediated by humans and which has a
limited set of data that it's been trained on, AutoGPT can search the internet.
It has memory.
It can theoretically create other AI agents to accomplish tasks.
And it was explosively exciting to people when it came out.
You can see here just how fast it grew as one of the biggest projects on GitHub.
SigGravitas, who is the one who introduced AutoGPT, posted: Hit 100,000 stars on GitHub.
Am I supposed to make a speech?
I'm speechless.
The initial hype was huge, right?
We saw all these things like the task list that can do itself
or the website that builds itself.
But people are starting to ask now, how useful is it really?
This is a tweet from today:
AutoGPTs are cool, but they're not useful in their current forms.
So let's talk about what people are finding when they try to use these tools specifically
or other implementations of them.
Just as a personal example, in a YouTube video recently I tried God Mode,
which is inspired by AutoGPT, although a little bit different.
And effectively, I asked it to help me make a plan to grow a YouTube channel to 10,000 followers.
And what we found if you watched that video is that it did a really, really good job of helping think through the steps that it would take to go build that video channel to 10,000 subscribers.
But it didn't necessarily go beyond that.
It didn't start to actually really implement the tasks, except in the most nascent ways if they were like a writing
task or something like that. And what's more, it started to flip around and perform loops over and
over again, where it would go back and restart itself instead of trying to proceed on to the next
step in execution. So all in all, it was very impressive in the sense that it was clearly helping
think through how to take an idea and start to implement it, but it wasn't this sort of
mind-blowing, autonomous agent that could come in and just change everything.
And it seems that it wasn't just my experience.
So Avram Piltch here wrote a piece recently called AutoGPT and BabyAGI are AI's new hotness, but they suck right now.
And I've excerpted a few parts of it that I think are kind of instructive.
So he gave it a bunch of different tasks, and he was actually using an implementation specifically of AutoGPT.
And the one that he found that he was most successful with was a simple website builder, right?
The more discrete the task, the more likely it was to actually achieve something.
And I knew going into my question that the idea of just building a YouTube website to grow to 10,000 followers was going to be maybe a little bit too abstract for it.
Anyways, what Avram found is that the more discrete the task was, the better AutoGPT was able to handle it.
But it had some problems, right?
So Avram writes, after AutoGPT was done with the website building task, I did indeed have HTML files representing the three pages of the website.
But neither the design nor the copy on these pages was very good. And the copy both
describing the company and for the privacy policy was just plain made up. Now, he points out that
there would be no way for it to know this information, right? He says the AutoGPT bot had no way to
know what Geek-in-Chief Designs stands for because all I said was that it was a web design
company. There's no digital footprint for this company, so the bot just made up all these
details. To be fair to the bot, I didn't give it enough details to do a good job of writing the
website. If I had hired a human to create a corporate website for my company, that person would
no doubt have come back to me, asking for a lot more details. Instead, since AutoGPT can't ask
follow-up questions, apart from asking for permission to perform its next step, it just wrote the
most generic thing possible devoid of facts. I have never seen a chatbot that asks follow-up questions to determine
what the human wants, even though that would be very helpful. If I was using ChatGPT and I had
asked for it to write a homepage for Geek-in-Chief Designs, and I got this kind of vague made-up copy,
I'd write a new prompt that provided a lot more information. However, with an autonomous agent,
there's no chance to intervene until all of the very long list of tasks is completed. I think this is
a hugely important point: as people are looking at these tools, they're kind of comparing
them to what it would be like to just use ChatGPT, but in a kind of self-mediated way. And what Avram is
pointing out is that there is an inherent back and forth, that there's only so much that can be
automated to get a good result, and that in fact we might be seeing not just the limitations
of the technology but also a mismatch or misalignment of our expectations with what sorts of
tasks an autonomous agent should do. Do we really want to, in other words,
outsource entirely the creation of the website for our design business without having any input
into the details of how it's presented or the copy or anything like that? It feels like a task
where we do want a productivity accelerant of the form that many of the tools out there that
we're seeing now can be really helpful with: ChatGPT to write copy, some of these other
website builders to help maybe actually code the site itself. But there's a difference
between those incredible productivity gains and just outsourcing entirely.
Now, Avram also tried BabyAGI, and he basically pointed out something similar to what many
have reported with this idea of endless looping. So he says, even worse, BabyAGI couldn't
seem to follow through on its list of tasks and kept changing task number one instead of moving
on to task number two. For example, I asked it to identify and write five Windows 11 how-tos.
It provided a list of how-tos it could write, and then proceeded to do the first one on the list. Then, instead of doing the second task,
it would just change the entire list and start over at tutorial number one, which could be a topic that it had covered two steps ago.
It seemed to have no memory of what it had promised to do or had done just a few minutes before.
So that sort of looping, the restarting from the beginning, like I saw in my admittedly very basic YouTube growth task, was something that he was seeing as well.
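To make that looping problem a little more concrete, here is a minimal Python sketch, under stated assumptions, of a task loop that keeps a record of what it has already finished. This is not BabyAGI's actual code. The names `run_tasks`, `execute`, `replan`, and `forgetful_replan` are hypothetical, and the planner is a stub that re-proposes the full list every time, imitating the start-over-at-tutorial-number-one behavior Avram describes.

```python
from collections import deque


def run_tasks(initial_tasks, execute, replan, max_steps=20):
    """Work through a task queue, remembering finished tasks across re-plans."""
    pending = deque(initial_tasks)
    done = []  # persistent record of completed tasks, in completion order
    steps = 0
    while pending and steps < max_steps:
        task = pending.popleft()
        steps += 1
        if task in done:
            continue  # already covered earlier; do not redo it
        execute(task)
        done.append(task)
        # The planner may hand back a whole new list, possibly repeating old work.
        proposed = replan(done, list(pending))
        # Drop finished tasks and duplicates while preserving the proposed order.
        pending = deque(dict.fromkeys(t for t in proposed if t not in done))
    return done


def forgetful_replan(done, pending):
    # Hypothetical stub planner: ignores its inputs and proposes the
    # full list again on every call.
    return ["how-to 1", "how-to 2", "how-to 3"]


executed = []
finished = run_tasks(["how-to 1"], execute=executed.append, replan=forgetful_replan)
print(finished)
```

In this sketch the record of completed tasks lives outside the planner, so even when the planner rewrites the whole list, the loop still moves on to the next unfinished item instead of restarting. The `max_steps` cap is an extra guard against running forever.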
Now, how does he conclude? Is he down on this technology? And the short answer is no.
He says autonomous agents' biggest problem is that they don't ask you follow-up questions to get more details from you,
nor do they give you the opportunity to fine-tune them midstream.
That makes them apt to give you bad output while going down a long, winding path to get there.
And that conclusion section he actually calls, autonomous agents might be too autonomous to be useful.
So really, I think, good feedback, good context for us who are exploring these tools.
And that's really where I'm starting to see people get.
It is literally the three-week anniversary of this,
and a lot of people are pointing out that it's the three-week anniversary of this.
A couple days ago, Jim Fan, who's at NVIDIA, said AutoGPT just exceeded PyTorch itself in GitHub stars.
I see AutoGPT as a fun experiment, as the authors point out too.
But nothing more.
Prototypes are not meant to be production-ready.
Don't let media fool you.
Most of the cool demos are heavily cherry-picked.
Nate Chan retweeted something from Matt Shumer that I had referenced earlier:
AutoGPTs are cool, but they're not useful in their current
forms. And he said, this is true, but it's like saying babies are cool, but they're not useful in
their current forms. It can be said about both babies and AutoGPTs today. A miracle was born,
see its potential, help it grow and push it forward. And soon it'll have the potential to change
the world. There's a funny little thing that we're going through right now where we're re-remembering
in some ways that even in the context of these mind-bending AI tools, it's not like they exist
all of a sudden and they instantly work perfectly.
There is still a development cycle that's needed around them.
And meanwhile, it's not slowing the people who are excited about building on these technologies
down at all.
Yohei, the creator of BabyAGI, wrote a huge long update today.
Babyagi.org is live.
BabyAGI Classic is available, blah, blah, blah, blah.
There's all these different things showing a ton of development and developer excitement around
this.
I'm in the AutoGPT community as well,
and this thing is just going constantly.
You can see a little bit here just how many channels there are,
how active they are.
You have thousands and thousands and thousands of people building on this.
And then, of course, the folks who are trying to improve upon it.
So Hishi here writes about Chameleon,
a quote, better multimodal AutoGPT with real benchmarks,
solves many of the problems I've encountered with current agents
and moves us in the direction of pluggable,
modular metasystems for LLMs that can work on increasingly complex tasks.
And then he goes on to explain all
of this. So the point is, the era, the phase of autonomous AI agents that seems to have popped
open a few weeks ago is, in fact, open. However, it's just the very beginning of that era. And the tools
are not as sophisticated as perhaps they seemed initially, even if the creators of the tools
never promised that they were. Now, I will also say one last thing. One of the things that some
people thought as soon as they saw AutoGPT and these autonomous AI agents is that they represented
really something different than ChatGPT in terms of what the public's response is likely to be.
They present in many ways more risk, right? The idea that AI agents can just be searching the
web is something that many in the AI safety community are not necessarily sure is a really good
thing. And in fact, that was kind of a long held principle. So the fact that these tools aren't as
powerful as they seemed at first right away, maybe at least gives us a moment to catch our breath
and ask some of the important questions from an ethical or safety perspective as well.
So to sum up, I think that if you saw these initial use cases of AutoGPTs a week ago or
two weeks ago and were just blown away and excited, I don't think you have to not be blown
away or not excited anymore just because we're recognizing that there are limits to what they can do
and how fast they can do it. These are incredibly nascent technologies. They're still being
built. There's an incredibly dynamic and fluid community of people who are building upon them,
and they're going to be doing the types of things that it seemed like they could right away
before you know it. So enjoy the ride, enjoy having a chance to help shape them. Go join the
AutoGPT Discord community and see what's happening. But for now, they are in fact still just
nascent technologies and have a lot of room to run yet. All right, guys, that's it for today. Until
next time, peace.
