The AI Daily Brief: Artificial Intelligence News and Analysis - The Peril and Potential of AutoGPT

Starting point is 00:00:01 Today on the AI Breakdown, we're discussing all about AutoGPT and AI agents. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information about our YouTube, our Discord, and our newsletter. Hello, friends, welcome to the weekend. Today we are doing another one of our long reads, and I will be excerpting sections of a piece called On AutoGPT, which was written by Zvi Mowshowitz in his Don't Worry About the Vase newsletter in April. Now, Zvi, for those who don't know, is an incredibly interesting thinker. He's a phenomenal Magic: The Gathering player, which is how I got to know him, reading his theory about the game back in the day,

Starting point is 00:00:43 and now writes a lot primarily, in fact, about AI safety. This piece was written at the height of Auto-GPT mania, and although it's a couple months old, I think does a really great job of exploring some of the concerns or issues with that field in terms of how it might evolve, if not in terms of how it was at the time. Given how big and interesting this area was to so many people in the AI space, I think it's worth pulling this one back out of the archives. You'll notice I have some AI help for the reading of portions of this piece. The piece kicks off.

Starting point is 00:01:15 The primary talk of the AI world recently is about AI agents. The trigger for this was Auto-GPT, now number one on GitHub, which allows you to turn GPT-4 into a prototype version of a self-directed agent. We also have a paper out this week where a simple virtual world was created, populated by LLMs that were wrapped in code designed to make them simple agents, and then several days of activity were simulated, during which the AI inhabitants interacted, formed and executed plans, and it all seemed like the beginnings of a living and dynamic world. Game version hopefully coming soon. How should we think about this? How worried should we be? Now from there, Zvi

Starting point is 00:01:48 does a quick hit on the basics of what AutoGPT is, how it works, as well as looking at what it's actually accomplished at the time, which was not very much. And by the way, we haven't made all that much progress since this was written in April. But as Zvi points out in his section, "Just think of the potential," the fact that AutoGPTs hadn't done much at the time, quote, does not mean that all the people saying AutoGPTs are the future are wrong. So for the sake of this post, he goes to the future in which AutoGPTs have advanced significantly and thinks about them from that angle. And that's where we pick up the story again. Back to my AI. The good, the bad, and the agent overhang. I've gained confidence in my position that all of this happening now is a good

Starting point is 00:02:26 thing, both from the perspective of smaller risks like malware attacks and from the perspective of potential existential threats. Seems worth going over the logic. What we want to do is avoid what one might call an agent overhang. One might hope to execute our plan A of having our AIs not be agents. Alas, even if technically feasible, which is not at all clear, that only can work if we don't intentionally turn them into agents via wrapping code around them. We've checked with actual humans about the possibility of kindly not doing that. Didn't go great. So plan B then. If we are definitely going to turn our AIs into agents in the future, and there is no way to stop that, which is clearly the case, then better to first turn our current AIs into agents now.

Starting point is 00:03:05 That way, we won't suddenly be dealing with highly capable AI agents at some point in the future. We will instead gradually face more capable AI agents, such that we'll hopefully get fire alarms and other chances to error correct. Our current LLMs, like GPT-4, are not, in their base configurations, agents. They do not have goals. This is a severe limitation on what they are able to accomplish, and how well they can help us accomplish our own goals, whatever they might be, including using them to build more capable AIs or more capable systems that incorporate AIs. Thus, one can imagine a future version of GPT-N that is supremely superhuman at a wide variety of tasks, where we can ask it questions like, how do we make

Starting point is 00:03:41 humans much smarter, or how do we build an array of safe, efficient fusion power plants, or anything else we might want. And we don't have to worry about it attempting to navigate a path through causal space towards its end goal. It will simply give us its best answer to the information on the level on which the question was intended. Using this tool, we could perhaps indeed make ourselves smarter and more capable, then figure out how to build more general, more agentic AIs, figure out in what configuration we want to place the universe, and then get a maximally good future.

Starting point is 00:04:06 That does not mean that this is what would happen if we managed to not turn GPT-N into an agent first, or that getting to this result is easy. One must notice that in order to predict the next token as well as possible, the LLM will benefit from being able to simulate every situation, every person, and every causal element behind the creation of every bit of text in its training distribution, no matter what we then train the LLM to output to us, what mask we put on it afterwards. The LLM will absolutely know in some sense what it means to be an agent, and how to steer physical reality by charting a path through causal space,

Starting point is 00:04:38 seeking goals that arise out of the training run, and thus almost certainly are only maximally fulfilled in ways that involve the LLM taking control of the future, and likely killing everyone, before we even get a chance to use RLHF on it. During the RLHF training run? Later on? At what level does this happen? We don't know. I could believe a wide variety of answers here. What we do know is that if you intentionally turn the LLM into an agent, you are going to get, a lot earlier down the line, something that looks a lot more like an agent. We also know that humans who get their hands on these LLMs will do their best to turn them into agents as quickly and effectively as possible.

Starting point is 00:05:11 We don't only know that. We also know that no matter how stupid you think an instruction would be to give to a self-directed AI agent, no matter how much "no movie that starts this way could possibly ever end well," that's exactly one of the first things someone is going to try, except they're going to go intentionally make it even worse than that. Thus, for example, we already have ChaosGPT, told explicitly to cause mayhem, sow distrust, and destroy the entire human race. This should at least partially answer your question of why would an AI want to destroy humanity. It is because humans are going to tell it to do that. That is in addition to all the people who will give their AutoGPT an instruction that means well,

Starting point is 00:05:42 but actually translates to killing all the humans, or at least taking control over the future, since that is so obviously the easiest way to accomplish the thing, such as bring about world peace and end world hunger. Link goes to Sully hyping AutoGPT, saying you give it a goal like end world hunger, or stop climate change, or deliver my coffee every morning at 8 a.m. sharp no matter what, as reliably as possible. Or literally almost anything else. Seriously, if you find a genie, I highly recommend not wishing for anything. For now, AutoGPT is harmless. Let's ensure that the moment it's mostly harmless, we promptly edit the Hitchhiker's Guide entry. Let's therefore run experiments of various sorts so we know exactly how much damage could be done and in what ways at every step.

Starting point is 00:06:21 One good idea is to use games to put such systems to the test. Back to real NLW here. In the next couple sections, Zvi does some big interesting thought experiments around putting an AI NPC in a game and seeing if it takes over the world, an AI Sims-style world experiment, which was something that had just been released by Stanford and Google researchers. And then he also talks about a simpler test proposal that came from the CEO of Replit, who said that the ultimate test for an LLM agent is to make money. From there, we pick up his piece again in the section called No True Agent. What does it mean to be an agent?

Starting point is 00:06:54 Would an improved, actually viable version of AutoGPT be an agent in the true sense? Sarah Constantin says no, in an excellent post explaining at length why she is not a doomer. I'd love for more people who disagree with me about things to be writing posts like this one. It is the way. She agrees that a sufficiently powerful and agentic, goal-driven AGI would be an existential risk, that this risk, conditional on creating such an AGI, would be very difficult and likely impossible to stop, and that building such a thing is physically possible.

Starting point is 00:07:20 What she doesn't buy is that we will get to such a thing anytime soon, or that our near-term models are capable of it, not in the 2020s, likely not in the 2030s. I note that this does not seem like that much confidence in that much non-doomed time. The goalposts, they have moved. Auto-GPT style agents are, in her model, not the droids we are looking for, or the droids we need to worry about.

Starting point is 00:07:41 They are, at their best, only a deeply pale shadow. She thinks that to be an X-risk, in addition to a more robust version of the world models LLMs kind of sometimes have now, an AI will need a causal model and goal robustness across ontologies. She believes we are nowhere near creating either of these things. I wish I was more convinced by these arguments. Alas, to the extent that one needs the thing she is calling goal robustness, and it is distinct from what existing models have, I see wrapping procedures as being able to deliver

Starting point is 00:08:07 this on the level that humans have it. Not "I can do this in a day with no coding experience" easy, but definitely "the whole internet tinkering at this for years is going to figure this out" level of easy. I do not think that current Auto-GPT has this, and I think this is a key and perhaps fatal weakness. But what we do here that is load-bearing seems unlikely to me to be all that mysterious or impossible to duplicate. As for causality, even if this is importantly currently missing, I don't know how an entity can have a functioning world model that doesn't include causality, and thus as world modeling improves, I expect to get causality in its load-bearing sense here, and for it to happen without anyone having to do it on purpose in any way from here, to the extent we can confirm its thingness. Sarah has an intuition in her post that seems true and important, that humans kind of have two different modes. In our normal mode, we are mostly on a kind of autopilot, we are not really thinking.

Starting point is 00:08:55 More like we are going through motions, executing scripts, vibing. In our causal or actually thinking mode, we actually pay attention to the situation, model it, attempt to find new solutions or insights, and so on. A human in mode one can do a lot of useful or profitable things, including most of the hours spent on most things by most humans. Everyone is in this mode quite a lot. One goal of expertise is kind of to get to the point where you can execute in this mode more. It is highly useful. That human can't generate true surprises.

Starting point is 00:09:21 In an important sense, it isn't a dangerous agent. It is a dead player only capable of imitation. So under this way of thinking an auto-GPT combined with an LLM can plausibly generate streamlined execution of established lines of digital action that people can do in normal mode, which again includes quite a lot of what we do all day, so it's economically potentially super valuable if done well enough. From there, Zvi gets into a set of predictions,

Starting point is 00:09:42 which I think are really some of the most interesting parts of the piece. What to expect next? AutoGPT is brand new. What predictions can we make about this class of thing? This is where one gets into trouble and looks like an idiot. Predictions are hard, especially about the future, even in relatively normal situations. This is not a normal situation, so there's super high uncertainty.

Starting point is 00:10:02 Still, I will go make some predictions because doing so is the virtuous and helpful thing to be doing. I apologize for mostly not putting out actual units of time here. My brain is having a very hard time knowing when it should think in weeks versus months versus years. If I had to guess, the actual economically important impacts of such moves start roughly when they have access to GPT-4.5, or similar or higher with good bandwidth, or if that takes a long time, then in something like a year. All of this is rough, on the thinking out loud level.

Starting point is 00:10:30 I hope to change my mind a lot quickly on a lot of it, in the sense that I hope I update when I get new info, rather than in the sense that I am predicting bad things, which mostly I don't think I am here. The goal here is to be concrete. Share intuitions. See what it sounds like out loud, what parts are nonsense when people think about them for five minutes or five hours, iterate, and so on. 1. In the short term, Auto-GPT and its ilk will remain severely limited. The term overhyped will be appropriate. Improvements will not lead to either a string of major incidents or major accomplishments. 2. There will still be viable use cases even relatively soon. They will consist of relatively bounded tasks with clear sub-tasks that are things

Starting point is 00:11:06 that such systems are already known to be good at. What Auto-GPT-style things will enable will not be creative solutions. It will be more like when you would have otherwise needed to manage going through a list of tasks or options manually, and now you can automate that process, which is still pretty valuable. 3. Thus the best and most successful Auto-GPT-style agents people use to do tasks will, at least for a while, be less universal, less auto, and more bounded in both goals and methods. They will largely choose from a known pool of tricks that are known to be things they can handle, if not exclusively, then primarily. There will be a lot of tinkering,

Starting point is 00:11:37 restricting, manual error checking, explicit reflection steps, and so on. Many will know when to interrupt the auto and ask for human help. 4. There will be a phase where there is a big impact from Microsoft Copilot 365 and Google Bard's version of it, if that version is any good, during which it overshadows agent LLMs and other LLM wrapping attempts. Microsoft and Google will give us "known to be safe" tools, and most people will mostly wisely stick with that for a good while.

Starting point is 00:12:01 5. Agent-style logic will be incorporated into the back end of those products over time, but will be sandboxed and rendered safe the way the current product announcements work. It will use agent logic to produce a document or other outputs sometimes, or to propose an action, but there will always be a human in the loop for a good while. 6. Agents with the proper scaffolding, restrictions, guidance, and so on, will indeed prove in the longer run the proper way to get automation of medium complexity tasks, or especially multi-step branching tasks, and also be good to employ when dealing with things like customer relations or customer service. Risk management will be a major focus.

Starting point is 00:12:37 7. There will be services that help you create agents that have a better chance of doing what you want and less of a chance of screwing things up, which will mostly be done via you talking to an agent and a host of other similar things. 8. A common interface will be that you ask your chatbot, your GPT-4-N or Good Bing or Bard-Y or Claude-Z variant, to do something, and it will sometimes respond by spinning up an agent or asking if you want to do that.

Starting point is 00:12:59 9. We will increasingly get used to a growing class of actions that are now considered atomic, where we can make a request directly and it will go well. 10. This will be part of an increasing bifurcation between those places where such systems can be trusted and the regulations and risks of liability allow them, versus those areas where this isn't true. Finding ways to let things be messy will be a major source of disruption. 11. It will take a while to get seriously going, but once it does, there will be increasing economic pressure to deploy more and more and more agents and to give them more and more authority and assign them more and more things.

Starting point is 00:13:28 12. There will be pressure to increasingly take those agents off the leashes in various ways, have them prioritize accomplishing their goals, and care less about morality or damage that might be done to others. 13. A popular form of agent will be one that assigns you, the user, tasks to do as part of its process. Many people will increasingly let such agents plan their days. 14. Prompt injections will be a major problem for Auto-GPT-style agents. Anyone who does not take this problem seriously and gives their system free internet access, or lets it read their emails, will have a high probability of regretting it. 15. Some people will be deeply stupid, letting us witness the results. There will be incidents of ascending orders of magnitude of money being lit on fire.

Starting point is 00:14:12 We will not always hear about them, but we'll hear about some of them. When they involve crypto or NFTs, I will find them funny and I will laugh. 16. The incidents likely will include at least one system that was deployed at scale or distributed and used widely when it really, really shouldn't have been. 17. These systems will perform much better when we get the next generation of underlying LLMs, and with the time for refinement that comes along with that. GPT-5 versions of these systems will be much more robust and a lot scarier.

Starting point is 00:14:39 18. Whoever is doing the ARC evaluations will not have a trivial job when they are examining things worthy of the name GPT-5 or GPT-5 level. 19. The most likely outcome of such tests is that ARC notices things that everyone involved would have previously said make a model something you wouldn't release, then everyone involved says they are putting in safeguards of some kind, changes the goalposts, and releases anyway.

Starting point is 00:15:02 20. We will, by the end of 2023, have the first agent GPTs that have been meaningfully set loose on the internet, without any mechanism available for humans to control them or shut them down. Those paying attention will realize that we don't actually have a good way to shut the next generation of such a thing down if it goes rogue. People will be in denial about the implications and have absolutely zero dignity about the whole thing. 21.

Starting point is 00:15:26 The first popular online Sims/Westworlds, settings where many, if not most or all, non-human characters are agent LLMs, will start coming out quickly, with early systems available within a few months at most, and the first popular and actually fun one within the year, even if underlying LLM tech does not much advance. There will be lots of them in 2024,

Starting point is 00:15:42 both single player and multiplayer, running the whole range from classrooms to very adult themes. 22. Some of those worlds give us good data on agent LLMs and how much they go into power-seeking mode. It will become clear that it is possible, if conditions are made to allow it, for an LLM agent to take over a virtual world. Many will dismiss this, saying that the world and agent had to be designed for this, or that the tests are otherwise unfair. They won't always be wrong, but centrally, they will be

Starting point is 00:16:08 wrong. 23. Versions of these worlds in some form will become the best known uses of VR, and VR will grow greatly in popularity as a result. There will be big pushes to go actual Westworld. Results will depend on tech things I don't know about including robotics and their progress, but even relatively bad versions will do. 24. All of this will look silly and wrong, and I'll change my mind on a lot of it, likely by the end of May. Life comes at you fast these days.

Starting point is 00:16:34 And that is assuming nothing else unexpected and crazier happens first, which is presumably going to be wrong as well. Back to non-AI NLW. That's where Zvi concludes this piece. And just to wrap up, the reason that I thought this was interesting to share is that when it comes to these questions of how AI is going to develop and what concerns we should have with it, I'm really attracted to analyses and the people who have those analyses who are looking at things in their full complexity of possibilities, who are not sure and who say that they're

Starting point is 00:17:09 not sure, who can see the exciting and the worrying and actually have a conversation on that basis about how to weigh those different possible outcomes. I think we need more of that. And when it comes to AI agents, there are really specific reasons to home in there. They're one of the most hyped and exciting areas to so many developers, but they also have some really interesting implications. So hopefully after hearing this, you have a better sense of how at least some people are thinking about these issues. I will, of course, include a link to the full piece, which I encourage you to read, and of course to the newsletter in general, which I highly encourage you to subscribe to. For now, that is going to do it for today's AI Breakdown. Until next time, peace.
