The AI Daily Brief: Artificial Intelligence News and Analysis - The Problems with AutoGPT and BabyAGI: How Useful Are They Really?

Episode Date: April 22, 2023

For the last 3 weeks, AutoGPT has massively captured the attention of the AI community. But how useful is it really? Some are starting to ask whether it really lives up to the hype. Watch the original video: https://www.youtube.com/@TheAIBreakdown

Transcript
Starting point is 00:00:00 The AI Breakdown you're about to hear was originally released as a YouTube video on Saturday, April 22nd. In it, we take another look at AutoGPT and BabyAGI, which are now three weeks old or so, and ask, as our initial impressions wear off, just how useful are they actually? Today, we are back with another video on AutoGPT, but this time, it's three weeks on, and we're asking, is it actually all that useful? Welcome back to the AI Breakdown. If you've spent any time in and around AI for the last three weeks, you have definitely heard about AutoGPT and BabyAGI and these autonomous AI agents that are theoretically going to change everything.
Starting point is 00:00:46 And just for a little bit of background, in case you haven't spent much time here, as opposed to something like ChatGPT, which is very mediated by humans and which has a limited set of data that it's been trained on, AutoGPT can search the internet. It has memory. It can theoretically create other AI agents to accomplish tasks. And it was explosively exciting to people when it came out. You can see here just how fast it grew as one of the biggest projects on GitHub. Sig Gravitas, who is the one who introduced AutoGPT, posted: Hit 100,000 stars on GitHub.
Starting point is 00:01:23 Am I supposed to make a speech? I'm speechless. The initial hype was huge, right? We saw all these things like the task list that can do itself, or the website that builds itself. But people are starting to ask now, how useful is it really? This is a tweet from today: AutoGPTs are cool, but they're not useful in their current forms.
Starting point is 00:01:46 So let's talk about what people are finding when they try to use these tools specifically or other implementations of them. Just as a personal example, in a recent YouTube video I tried God Mode, which is inspired by AutoGPT, although a little bit different. And effectively, I asked it to help me make a plan to grow a YouTube channel to 10,000 followers. And what we found, if you watched that video, is that it did a really, really good job of helping think through the steps that it would take to go build that channel to 10,000 subscribers. But it didn't necessarily go beyond that. It didn't start to actually implement the tasks, except in the most nascent ways, if they were like a writing
Starting point is 00:02:31 task or something like that. And what's more, it started to flip around and perform loops over and over again, where it would go back and restart itself instead of trying to proceed on to the next step in execution. So all in all, it was very impressive in the sense that it was clearly helping think through how to take an idea and start to implement it, but it wasn't this sort of mind-blowing, autonomous agent that could come in and just change everything. And it seems that it wasn't just my experience. So Avram Piltch here wrote a piece recently called AutoGPT and BabyAGI are AI's new hotness, but they suck right now. And I've excerpted a few parts of it that I think are kind of instructive.
Starting point is 00:03:19 So he gave it a bunch of different tasks, and he was actually using an implementation specifically of AutoGPT. And the one that he found that he was most successful with was a simple website builder, right? The more discrete the task, the more likely it was to actually achieve something. And I knew going into my question that the idea of just building a YouTube website to grow to 10,000 followers was going to be maybe a little bit too abstract for it. Anyways, what Avram found is that the more discrete it was, the better AutoGPT was able to handle it. But it had some problems, right? So Avram writes, after AutoGPT was done with the website building task, I did indeed have HTML files representing the three pages of the website. But neither the design or the copy on these pages was very good. And the copy both
Starting point is 00:04:05 describing the company and for the privacy policy was just plain made up. Now, he points out that there would be no way for it to know this information, right? He says the AutoGPT bot had no way to know what Geek-in-Chief Designs stands for because all I said was that it was a web design company. There's no digital footprint for this company, so the bot just made up all these details. To be fair to the bot, I didn't give it enough details to do a good job of writing the website. If I had hired a human to create a corporate website for my company, that person would no doubt have come back to me, asking for a lot more details. Instead, since AutoGPT can't ask follow-up questions, apart from asking for permission to perform its next step, it just wrote the
Starting point is 00:04:44 most generic thing possible devoid of facts. I have never seen a chatbot that asks follow-up questions to determine what the human wants, even though that would be very helpful. If I was using ChatGPT and I had asked for it to write a homepage for Geek-in-Chief Designs, and I got this kind of vague made-up copy, I'd write a new prompt that provided a lot more information. However, with an autonomous agent, there's no chance to intervene until all of the very long list of tasks is completed. I think this is a hugely important point: as people are looking at these tools, they're kind of comparing it to what it would be like to just use ChatGPT, but in a kind of self-mediated way. And what Avram is
Starting point is 00:05:28 pointing out is that there is an inherent back and forth, that there's only so much that can be automated to get a good result, and that in fact we might be seeing not just the limitations of the technology but also a mismatch or misalignment of our expectations with what sorts of tasks an autonomous agent should do. Do we really want to, in other words, outsource entirely the creation of the website for our design business without having any input into the details of how it's presented or the copy or anything like that? It feels like a task where we do want a productivity accelerant of the form that many of the tools out there that we're seeing now can be really helpful with: ChatGPT to write copy, some of these other
Starting point is 00:06:16 sort of website builders to help maybe actually code the site itself. But there's a difference between those incredible productivity gains and just outsourcing entirely. Now, Avram also tried BabyAGI, and he basically pointed out something similar to what many have reported with this idea of endless looping. So he says, even worse, BabyAGI couldn't seem to follow through on its list of tasks and kept changing task number one instead of moving on to task number two. For example, I asked it to identify and write five Windows 11 how-tos. It provided a list of how-tos it could write, and then proceeded to do the first one on the list. Then, instead of doing the second task, it would just change the entire list and start over at tutorial number one, which could be a topic that it had covered two steps ago.
Starting point is 00:07:03 It seemed to have no memory of what it had promised to do or had done just a few minutes before. So that sort of looping, the restarting from the beginning, like I saw in my admittedly very basic YouTube growth task, was something that he was seeing as well. Now, how does he conclude? Is he down on this technology? And the short answer is no. He says the autonomous agents' biggest problem is that they don't ask you follow-up questions to get more details from you, nor do they give you the opportunity to fine-tune them midstream. That makes them apt to give you bad output while going down a long, winding path to get there. And that conclusion section he actually calls Autonomous Agents Might Be Too Autonomous to Be Useful. So really, I think, good feedback, good context for us who are exploring these tools.
Starting point is 00:07:47 And that's really where I'm starting to see people get to. It is literally the three-week anniversary of this, and a lot of people are pointing out that it's the three-week anniversary of this. A couple days ago, Jim Fan, who's at NVIDIA, said AutoGPT just exceeded PyTorch itself in GitHub stars. I see AutoGPT as a fun experiment, as the authors point out too. But nothing more. Prototypes are not meant to be production-ready. Don't let media fool you.
Starting point is 00:08:09 Most of the cool demos are heavily cherry-picked. Nate Chan retweeted something from Matt Shumer that I had referenced earlier: AutoGPTs are cool, but they're not useful in their current forms. And he said, this is true, but it's like saying babies are cool, but they're not useful in their current forms. It can be said about both babies and AutoGPTs today. A miracle was born, see its potential, help it grow and push it forward. And soon it'll have the potential to change the world. There's a funny little thing that we're going through right now where we're re-remembering in some ways that even in the context of these mind-bending AI tools, it's not like they exist
Starting point is 00:08:49 all of a sudden and they instantly work perfectly. There is still a development cycle that's needed around them. And meanwhile, it's not slowing down at all the people who are excited about building on these technologies. Yohei, the creator of BabyAGI, wrote a huge long update today. Babyagi.org is live. BabyAGI Classic is available, blah, blah, blah, blah. There's all these different things showing a ton of development and developer excitement around
Starting point is 00:09:15 this. I'm in the AutoGPT community as well, and this thing is just going constantly. You can see a little bit here just how many channels there are, how active they are. You have thousands and thousands and thousands of people building on this. And then, of course, the folks who are trying to improve upon it. So Hishi here writes about Chameleon,
Starting point is 00:09:34 a, quote, better multimodal AutoGPT with real benchmarks, solves many of the problems I've encountered with current agents and moves us in the direction of pluggable, modular metasystems for LLMs that can work on increasingly complex tasks. And then he goes on to explain all of this. So the point is, the era, the phase of autonomous AI agents that seems to have popped open a few weeks ago is, in fact, open. However, it's just the very beginning of that era. And the tools are not as sophisticated as perhaps they seemed initially, even if the creators of the tools
Starting point is 00:10:09 never promised that they were. Now, I will also say one last thing. One of the things that some people thought as soon as they saw AutoGPT and these autonomous AI agents is that they represented really something different than ChatGPT in terms of what the public's response is likely to be. They present in many ways more risk, right? The idea that AI agents can just be searching the web is something that many in the AI safety community are not necessarily sure is a really good thing. And in fact, that was kind of a long-held principle. So the fact that these tools aren't as powerful as they seemed at first maybe at least gives us a moment to catch our breath and ask some of the important questions from an ethical or safety perspective as well.
Starting point is 00:10:54 So to sum up, I think that if you saw these initial use cases of AutoGPTs a week ago or two weeks ago and were just blown away and excited, I don't think you have to stop being blown away or excited just because we're recognizing that there are limits to what they can do and how fast they can do it. These are incredibly nascent technologies. They're still being built. There's an incredibly dynamic and fluid community of people who are building upon them, and before you know it they're going to be doing the types of things that it seemed like they could do right away. So enjoy the ride, enjoy having a chance to help shape them. Go join the AutoGPT Discord community and see what's happening. But for now, they are in fact still just
Starting point is 00:11:36 nascent technologies and have a lot of room to run yet. All right, guys, that's it for today. Until next time, peace.
