The AI Daily Brief: Artificial Intelligence News and Analysis - How Close Are We To True AI Agents?

Starting point is 00:00:00 Today on the AI Breakdown, we're looking at the rise of Agent AI: what it is, what it means, and why it matters. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown Network for more information about our YouTube, our Discord, and our newsletter. Welcome back to the AI Breakdown. For this weekend episode, I wanted to check in and talk about agents a little bit more broadly. And there are three contexts for that.

Starting point is 00:00:34 The first is, of course, that we are now a couple weeks into the land of GPTs. GPTs are custom versions of ChatGPT that are specifically pre-prompted for unique or specific tasks. GPTs can include actions which tap into external APIs. And while GPTs are not, as OpenAI has been clear, agents, they are certainly a first step down the path towards an agent-powered future, in which the way that we interact with computers and the way that we get many tasks done is by having highly specialized microapplications or agents that are designed exclusively to do one type of thing and are able to actually go execute those things on our behalf. So part one of why we're having this discussion right now is that GPTs are all over and we're starting to grok what they mean.

Starting point is 00:01:18 A second context is, generally speaking, the rise of agent-focused startups, but specifically a couple that I've seen posting recently. Jay Hack, for example, introduced CodeGen this week. That's codegen.com or @codegen on X. And he calls it agent-driven software development for enterprise codebases. The company just raised $16 million, and the demo that they put out on Twitter is really interesting. Jay writes, our mission at CodeGen is L5 Software Engineering Automation. That is, the system you can ask to build Netflix and it writes the code, provisions AWS infrastructure, monitors, logs, and makes fixes, performing any task using all the tools of modern software. Copilot and chat-your-code have already changed the way we program. These keep the human in the

Starting point is 00:02:00 driver's seat and accelerate them with AI. But what's possible when we remove the human as a constraint. CodeGen brings this vision to life by putting AI agents in the driver's seat, delegating complex tasks to CodeGen, and watches it eliminates tech debt, rapidly builds out boilerplate, refines and refactors code, all without leaving your linear GitHub and Slack. Now, specifically, they announced a new feature called Ticket to Pull Request for Enterprise Codebases, which is they write, add the CodeGen label to a ticket, CodeGen spins up an army of agents to solve the issue, you receive a PR slash explanation commented on the ticket, chat, give feedback, etc. J continues, CodeGen is built to collaborate with you like a human, incorporating feedback and navigating ambiguity

Starting point is 00:02:37 through interaction. Have a question about its changes? DM CodeGen in Slack. Have feedback on a PR? Leave a comment and CodeGen will fix it. What happens when you rig up a multi-agent system with access to your code, your docs, your Slack, your tickets, Git, your CI, and more? The tedium of building software recedes, and you're left to focus on shipping product, the way it should be. Today, we're merging AI-generated PRs in prod for multimillion-line codebases. We're the first to do this that we know of at any scale, and we're sprinting towards bringing this to the broader market. Now, the thing that I wanted to hone in on, and why I thought this was a relevant example of the shift that we're experiencing, is that Jay perfectly articulates this really key dividing line that we are perhaps more on one side of right

Starting point is 00:03:15 now, but increasingly moving to a world on the other side. That comes in this line. Copilot and chat-your-code have already changed the way we program. These keep the human in the driver's seat and accelerate them with AI. But what's possible when we remove the human as a constraint? CodeGen brings this vision to life by putting AI agents in the driver's seat. So this is the fundamental shift, from AI is your co-pilot to you are AI's co-pilot. This, by the way, is why I think I saw some amount of critique around Microsoft's decision to name all of their AI tools Copilot. As Brian Roemmele writes, this is a regrettable misunderstanding of how AI will play out. Okay, so we've got this idea that the shift that we're living through, what that agent shift means, is the shift from

Starting point is 00:03:56 AI being our co-pilots to us being AI's co-pilots, a shift, in other words, in who's in the driver's seat. Now, the other startup context that I wanted to at least give a shout out to was MultiOn, who are building a general-purpose web AI agent. They've been working on not just giving MultiOn the ability to do the same things that a person would do, but to do it so much faster that it simply doesn't make sense to have the human do it anymore. CEO Div Garg tweeted earlier this week, excited to announce that our web AI agent MultiOn is now 10x faster than human finger speeds. Here's our very first slow-mo video shot in 0.2x. You can choose for yourself between human finger typing or our AI agent, which is better. So again, the point here is that this agent world seems to be coming and coming

Starting point is 00:04:36 fast, but again, what will it really all mean? Well, the third context for why I wanted to have this conversation today is that Dharmesh Shah, who's the co-founder of HubSpot, has started to write a ton of content about agents at agent.ai, which I assume is a broader agent project that he's working on, not just a blog for him. But hey, if it's just a blog, I'll take it. Dharmesh shared a piece earlier this week that I'm going to read some excerpts of, called What is Agent AI and why all the excitement? An unofficial definition and overview of the new hotness. Dharmesh writes, if you hang out in AI circles, you've probably run into the term Agent AI. Even if you don't hang out in AI circles, you've probably still run into Agent AI. And you may have wondered what exactly is Agent AI

Starting point is 00:05:14 and why are the people that are excited about it so excited about it. A term is born. Anytime there's a new term that starts getting used in tech, one of the challenges is that nobody really knows what it means. People just start using it and repurposing it to mean what they need it to mean. For Agent AI, there wasn't a committee somewhere that declared, behold, we have a new thing, we call it Agent AI, marvel at its magnificence. Nope, that's not how this works at all. Basically, somebody starts using the term because it's a useful label for whatever it was they were thinking of, and then other people start using the term, and over time, we as a community start to develop some sense for what it is. So here's my totally unofficial first pass at a definition. Agent AI is software

Starting point is 00:05:48 that uses artificial intelligence to pursue a specific goal. It accomplishes this by decomposing the goal into actionable tasks, monitoring its progress, and engaging with digital resources and other agents as necessary. Now, a little editor's note here from NLW, I think there's a lot that's really interesting in here. One, the idea of a specific goal, that an agent is trying to do something discrete and specific. Second, the idea that the agent breaks that goal down into smaller actionable tasks, and that it can use other resources to get those tasks done. Now, as Dharmesh points out, he went to a lot of pains to try to decide whether to use the word autonomous in here. He writes, one could make the case that part of what makes agents special and exciting is that they can run

Starting point is 00:06:27 autonomously with no human intervention. But I decided to leave that out because although it's certainly exciting, I don't think it's a requirement for an agent to run completely autonomously. Based on the goal and one's tolerance for risk, it may be completely fine to have some human intervention and nudging in there. However, then he asks, what makes an agent an agent and not just sparkling AI software from a certain region of Silicon Valley? The short answer, he writes: goals.

Starting point is 00:06:51 Section. The key difference with Agent AI. Dharmesh writes, today, when we as humans interact with AI, it's usually through a conversational chat interface as popularized by OpenAI's ChatGPT. You give ChatGPT a task by typing in a prompt, and it goes and does that task.

Starting point is 00:07:05 Examples? Write me a 500-word blog post about the impact of AI on CRM. List the top 10 cities in Italy, including what they're known for. Give that to me as JSON. The fundamental thing that makes agents different is that instead of specifying a discrete task, you instead specify a goal,

Starting point is 00:07:19 what you're looking to accomplish. The agent then determines what tasks need to be completed in order to accomplish that goal. And it's smart enough to know how to break down those tasks into sub-tasks and track its progress along the way. Here's an example of a high-order goal. Launch a new online newsletter about Agent AI. You can imagine this breaking down into something that looks like this. Branding: come up with a name for the newsletter, create logo for the newsletter, figure out

Starting point is 00:07:41 website domain. Tech stack: determine where to host the newsletter, research available products, gather ratings and reviews, summarize pricing, choose a platform to host. Writing: write initial posts, introductory post, flagship post. Launch: make subscriptions available, announce the newsletter on available social channels. You get the idea. Each of the tasks above could likely be further decomposed into sub-tasks until you get to a level of granularity such that the task itself can be executed by the software.
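
Editor's sketch: the breakdown just described can be pictured as a small task tree. This is a minimal Python illustration under stated assumptions, not anything taken from Dharmesh's post or from a real agent framework. The names Task, build_newsletter_goal, and run are hypothetical, and the exact nesting of sub-tasks is a guess at how the spoken list groups together.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    """A node in the goal tree. A task with no subtasks is a leaf to execute."""
    name: str
    subtasks: list[Task] = field(default_factory=list)
    done: bool = False

    def leaves(self) -> list[Task]:
        """Return the tasks that are granular enough to execute directly."""
        if not self.subtasks:
            return [self]
        return [leaf for sub in self.subtasks for leaf in sub.leaves()]

    def progress(self) -> float:
        """Fraction of leaf tasks completed, so the goal can be monitored."""
        leaves = self.leaves()
        return sum(leaf.done for leaf in leaves) / len(leaves)


def build_newsletter_goal() -> Task:
    """Hypothetical decomposition of the newsletter goal read out above."""
    return Task("Launch a new online newsletter about Agent AI", [
        Task("Branding", [
            Task("Come up with a name for the newsletter"),
            Task("Create logo for the newsletter"),
            Task("Figure out website domain"),
        ]),
        Task("Tech stack", [
            Task("Determine where to host the newsletter", [
                Task("Research available products"),
                Task("Gather ratings and reviews"),
                Task("Summarize pricing"),
                Task("Choose a platform to host"),
            ]),
        ]),
        Task("Writing", [
            Task("Write initial posts", [
                Task("Introductory post"),
                Task("Flagship post"),
            ]),
        ]),
        Task("Launch", [
            Task("Make subscriptions available"),
            Task("Announce the newsletter on available social channels"),
        ]),
    ])


def run(goal: Task, execute: Callable[[Task], bool]) -> float:
    """Attempt every leaf once with a caller-supplied executor; return progress.

    The executor stands in for whatever actually does the work (software or a
    human following written instructions) and reports success as a bool.
    """
    for leaf in goal.leaves():
        leaf.done = execute(leaf)
    return goal.progress()
```

In this sketch, the caller specifies only the top-level goal and an executor; the tree holds the decomposition, and progress() is the "track its progress along the way" part. A real agent would generate the tree itself rather than have it hard-coded.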

Starting point is 00:08:03 Or instructions and guidelines can be written that can then be done by a human. The big point here is that we are specifying a goal or objective and letting the agent figure out how to get that goal accomplished. Let's wrap up with why so many are excited about the potential and perils of agent AIs. One can imagine building agents that have a specific set of goals based on a given role. Examples could be designer, writer, researcher, SEO, analyst, project manager, etc. This maps to actual roles that might exist in an organization. Now imagine having a team that is a mix of people and agents that can accomplish a goal more efficiently and more effectively than humans alone. That is the promise of agent AI. Okay, so what we get from this first post is, one, the beginning

Starting point is 00:08:39 of a definition that's focused on this idea of goals rather than tasks. Your prompting of a general-purpose ChatGPT starts with tasks that you want to accomplish, whereas the prompting of an agent focuses on a goal that is going to inherently have lots of subtasks. Now, one other really interesting post from Dharmesh is about the fact that just when you thought that chat-based user experiences were the future, that we would just interact with computers through natural language, Dharmesh points out that agent UX is slightly different. He writes, I've been a massive fan of natural language interfaces in the form of chat UX for a long, long time.

Starting point is 00:09:08 Even before I started HubSpot in 2006, I was noodling on natural language interfaces with business software, but could never quite get it to work well enough. Turns out, I was about 17 years too early. Now, thanks to OpenAI and ChatGPT, we've finally gotten to experience the magic of using natural language to ask software for what we want, and that's been awesome. But I don't think that's the end of it when it comes to how we interact with AI and software in general.

Starting point is 00:09:29 I think there will be new developments on the UX front, so not everything you do in AI is via a chat-based user experience. What might drive some of that evolution is the emergence of Agent AIs, software that can pursue goals on your behalf. Section. The Limitations of Chat UX. Right now, the most commonly used interface for accessing the power of AI is ChatGPT. It's simple. You enter what's called a prompt in natural language asking ChatGPT for what you want, and it goes and tries to do that for you. Yes, we have multimodal

Starting point is 00:09:54 support now too, but the basics are still the same. With this model, AI doesn't really start doing anything for you until you ask it to by entering a prompt. That's fine for a lot of things, but not everything. Section. AI magic without the micromanaging. There will be times when we want AI to be working on our behalf in the background. And if something relevant, important, or interesting comes up, AI can then reach out to us and initiate a conversation. Example: imagine an agent that knew you were a strategic consultant in the e-commerce industry and interested in AI. The agent is monitoring the internet and sees that the videos from OpenAI's DevDay event were recently posted to YouTube. One of those videos was a panel where

Starting point is 00:10:27 one of the participants was a director of product at Shopify, a platform many of your clients use. The agent grabs the full transcript of the video, finds segments that were interesting, and then pings you not just with the snippet, but an analysis and explanation as to why you should spend five minutes reviewing that snippet and what actions you should take. The point here is that you are not expected to monitor social media every day, find interesting videos, go to ChatGPT, and then have it analyze the transcript and then see if anything interesting is happening. You can just go about your day doing the awesome things you do, and the agent works on your behalf in the background. But wait, there's more. Section. Fire and forget. The power of async UX. Right now, when you do

Starting point is 00:11:01 something in ChatGPT or whatever your AI tool of choice is, you're generally sitting down, having a back and forth with AI. It's mostly happening in real time and you're waiting for the result of each step. Do this, wait, tweak that. Okay, now do this, now this. That's fine, most of the time. There's something gratifying about getting the results right now-ish, but that doesn't always make sense. In real life, there are often times when you want to just assign something and have someone spend some time on it. Pull together a bunch of bits, iterate, then synthesize into something high quality that you can then review. You want to fire off the request and then forget about it until they have something really good for you to review. The same can apply to the

Starting point is 00:11:32 world of AI. Imagine being able to just give an agent a high-order goal, and then it goes off and does the work which may take a while, and then it pings you when it's got something for you. Example. The big annual conference for my industry happened recently and the content is now all on YouTube. Go analyze all the content and create a 15-minute summary video that includes any mentions of my company, my key partners, or my key clients. Based on social media, figure out what hashtags were used and which quotes got the most uptake. Give me any big highlights or announcements. Add text captions to the video of who is speaking and what my relationship is to them, if any. All right, so I'll pause here. He's got even more posts. But what I like about

Starting point is 00:12:04 this and why I wanted to share some of these new things coming out from, again, agent.ai is that it shows that this thing that people have been so excited about all year, really since BabyAGI and AutoGPT started showing up on the scene in April, and everyone started chattering about it on YouTube and on Twitter. This idea of agents that can actually accomplish goals has been so ever-present, just lurking below the surface and something that so much developer energy was going towards. What's interesting about what we're seeing now is, one, we're starting to see these actual tools brought to bear in specific contexts, such as CodeGen, and two, we're starting to see a specific discourse and lexicon form around it.

Starting point is 00:12:40 In other words, the idea of Agent AI as a subcategory of AI that has its own specific language and terminology and understanding and interfaces, and of course, it is through having that specific contextual knowledge that more people will be able to take advantage of these changes. Overall, it's a great reminder that we are barely scratching the surface on what we think of now as AI, and 2024 could get a little wild. For now, we will wrap the episode there. I hope you are having a great weekend wherever you are. Until next time, peace.
