The AI Daily Brief: Artificial Intelligence News and Analysis - How Close Are We To True AI Agents?
Episode Date: November 26, 2023
An exploration of one of the hottest topics in AI. ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/
Transcript
Today on the AI Breakdown, we're looking at the rise of Agent AI.
What it is, what it means, and why it matters.
The AI Breakdown is a daily podcast and video about the most important news and discussions
in AI.
Go to breakdown.network for more information about our YouTube, our Discord, and our newsletter.
Welcome back to the AI Breakdown.
For this weekend episode, I wanted to check in and talk about agents a little bit more broadly.
And there are three contexts for that.
The first is, of course, that we are now a couple of weeks into the land of
GPTs. GPTs are custom versions of ChatGPT that are specifically pre-prompted for unique or specific
tasks. GPTs can include actions which tap into external APIs. And while GPTs are not, as OpenAI has been
clear, agents, they are certainly a first step down the path towards an agent-powered future,
in which the way that we interact with computers and the way that we get many tasks done is by having
highly specialized microapplications or agents, designed exclusively to do one type of thing,
that are able to actually go execute those things on our behalf. So part one of why we're having this
discussion right now is that GPTs are all over and we're starting to grok what they mean.
A second context is, generally speaking, the rise of agent-focused startups, but specifically a couple
that I've seen posting recently. Jay Hack, for example, introduced CodeGen this week. That's codegen.com,
or @codegen on X. And he calls it agent-driven software development for
enterprise codebases. The company just raised $16 million, and the demo that they put out on Twitter
is really interesting. Jay writes, our mission at CodeGen is L5 software engineering automation. That is,
the system you can ask to build Netflix, and it writes the code, provisions AWS infrastructure,
monitors logs, and makes fixes, performing any task using all the tools of modern software.
Copilot and "chat with your code" have already changed the way we program. These keep the human in the
driver's seat and accelerate them with AI. But what's possible when we remove the human
as a constraint? CodeGen brings this vision to life by putting AI agents in the driver's seat.
Delegate complex tasks to CodeGen and watch as it eliminates tech debt, rapidly builds out boilerplate,
refines and refactors code, all without leaving your Linear, GitHub, and Slack. Now, specifically,
they announced a new feature called Ticket to Pull Request for enterprise codebases, which works like this,
they write: add the CodeGen label to a ticket, CodeGen spins up an army of agents to solve the issue,
you receive a PR slash explanation commented on the ticket, chat, give feedback, etc. Jay continues,
CodeGen is built to collaborate with you like a human, incorporating feedback and navigating ambiguity
through interaction. Have a question about its changes? DM CodeGen in Slack. Have feedback on a PR?
Leave a comment and CodeGen will fix it. What happens when you rig up a multi-agent system with access
to your code, your docs, your Slack, your tickets, Git, your CI, and more? The tedium of building
software recedes, and you're left to focus on shipping product, the way it should be. Today, we're
merging AI-generated PRs in prod for multimillion-line codebases. We're the first to do this that we know of at any
scale, and we're sprinting towards bringing this to the broader market. Now, the thing that I wanted to
hone in on, and why I thought this was a relevant example of the shift that we're experiencing, is that
Jay perfectly articulates this really key dividing line that we are perhaps more on one side of right
now, but increasingly moving to a world on the other side. That comes in this line. Copilot and
"chat with your code" have already changed the way we program. These keep the human in the driver's seat and
accelerate them with AI. But what's possible when we remove the human as a constraint? CodeGen brings this vision
to life by putting AI agents in the driver's seat. So this is the fundamental shift, from
AI is your co-pilot to you are AI's co-pilot. This, by the way, is why I think I saw some
amount of critique around Microsoft's decision to name all of their AI tools Copilot. As Brian Roemmele
writes, this is a regrettable misunderstanding of how AI will play out. Okay, so we've got this
idea that the shift that we're living through, what that agent shift means, is the shift from
AI being our co-pilots to us being AI's co-pilots, a shift, in other words, in who's in the driver's seat.
Now, the other startup context that I wanted to at least give a shout out to was MultiOn, who
are building a general-purpose web AI agent. They've been working on not just giving MultiOn the
ability to do the same things that a person would do, but to do it so much faster that it
simply doesn't make sense to have the human do it anymore. CEO Div Garg tweeted earlier this week,
excited to announce that our web AI agent MultiOn is now 10x faster than human finger speeds. Here's our very
first slow-mo video shot in 0.2x. You can choose for yourself between human finger typing or our AI
agent, which is better. So again, the point here is that this agent world seems to be coming and coming
fast, but again, what will it really all mean? Well, the third context for why I wanted to have this
conversation today is that Dharmesh Shah, who's the founder of HubSpot, has started to write a ton
of content about agents at agent.ai, which I assume is a broader agent project that he's working on,
not just a blog for him. But hey, if it's just a blog, I'll take it. Dharmesh shared a piece earlier this week,
which I'm going to read some excerpts of, called What is Agent AI and Why All the Excitement?
An unofficial definition and overview of the new hotness. Dharmesh writes, if you hang out in AI
circles, you've probably run into the term Agent AI. Even if you don't hang out in AI circles,
you've probably still run into Agent AI. And you may have wondered what exactly is Agent AI
and why are the people that are excited about it so excited about it. A term is born. Anytime there's
a new term that starts getting used in tech, one of the challenges is that nobody really knows what it
means. People just start using it and repurposing it to mean what they need it to mean. For Agent
AI, there wasn't a committee somewhere that declared, behold, we have a new thing, we call it
Agent AI, marvel at its magnificence. Nope, that's not how this works at all. Basically, somebody
starts using the term because it's a useful label for whatever it was they were thinking of,
and then other people start using the term, and over time, we as a community start to develop some sense
for what it is. So here's my totally unofficial first pass at a definition. Agent AI is software
that uses artificial intelligence to pursue a specific goal. It accomplishes this by
decomposing the goal into actionable tasks, monitoring its progress, and engaging with digital resources
and other agents as necessary. Now, a little editor's note here from NLW, I think there's a lot that's
really interesting in here. One, the idea of a specific goal, that an agent is trying to do something
discrete and specific. Second, the idea that the agent breaks that goal down into smaller actionable
tasks, and that it can use other resources to get those tasks done. Now, as Dharmesh points out,
he went to a lot of pains to try to decide whether to use the word autonomous in here. He writes,
One could make the case that part of what makes agents special and exciting is that they can run
autonomously with no human intervention.
But I decided to leave that out because although it's certainly exciting, I don't think
it's a requirement for an agent to run completely autonomously.
Based on the goal and one's tolerance for risk, it may be completely fine to have some human
intervention and nudging in there.
However, then he asks, what makes an agent an agent and not just sparkling AI software
from a certain region of Silicon Valley?
The short answer, he writes, goals.
Section. The key difference with Agent AI.
Dharmesh writes,
Today, when we as humans interact with AI,
it's usually through a conversational chat interface
as popularized by OpenAI's ChatGPT.
You give ChatGPT a task by typing in a prompt,
and it goes and does that task.
Examples? Write me a 500-word blog post
about the impact of AI on CRM.
List the top 10 cities in Italy,
including what they're known for.
Give that to me as JSON.
The fundamental thing that makes agents different
is that instead of specifying a discrete task,
you instead specify a goal,
what you're looking to accomplish.
The agent then determines what tasks need to be completed in order to accomplish that goal.
And it's smart enough to know how to break down those tasks into sub-tasks and track its progress
along the way.
Here's an example of a high-order goal.
Launch a new online newsletter about Agent AI.
You can imagine this breaking down into something that looks like this.
Branding: come up with a name for the newsletter, create a logo for the newsletter, figure out
the website domain.
Tech stack: determine where to host the newsletter, research available products, gather ratings and reviews,
summarize pricing, choose a platform to host.
Writing: write initial posts, introductory post, flagship post.
Launch: make subscriptions available, announce the newsletter on available social channels.
You get the idea. Each of the tasks above could likely be further decomposed into sub-tasks
until you get to a level of granularity such that the task itself can be executed by the software.
Or instructions and guidelines can be written that can then be done by a human.
The big point here is that we are specifying a goal or objective and letting the agent
figure out how to get that goal accomplished. Let's wrap up with why so many are excited about
the potential and perils of agent AIs. One can imagine building agents that have a specific set of goals
based on a given role. Examples could be designer, writer, researcher, SEO, analyst, project manager,
etc. This maps to actual roles that might exist in an organization. Now imagine, having a team that is a
mix of people and agents that can accomplish a goal more efficiently and more effectively than humans alone.
That is the promise of agent AI. Okay, so what we get from this first post is one, the beginning
of a definition that's focused on this idea of goals rather than tasks. Your prompting of a
general-purpose ChatGPT starts with tasks that you want it to accomplish, whereas the prompting of an agent
focuses on a goal that is going to inherently have lots of subtasks.
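To make the goal-versus-task distinction concrete, here is a minimal Python sketch of a goal broken down into a tree of tasks with progress tracking. It is an illustration only, not how agent.ai or any real agent is built: the `Task` class, the `run` helper, and the hand-written tree are all assumptions, and a real agent would generate the decomposition itself rather than have it typed in.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class Task:
    """One node in a goal tree; a task with no subtasks is directly executable."""
    name: str
    subtasks: List["Task"] = field(default_factory=list)
    done: bool = False

    def leaves(self) -> Iterator["Task"]:
        """Yield the executable leaf tasks in depth-first order."""
        if not self.subtasks:
            yield self
        else:
            for sub in self.subtasks:
                yield from sub.leaves()

    def progress(self) -> float:
        """Fraction of leaf tasks completed so far."""
        leaves = list(self.leaves())
        return sum(t.done for t in leaves) / len(leaves)


def run(goal: Task, execute: Callable[[Task], None]) -> None:
    """Execute every leaf task in order, marking each one done."""
    for task in goal.leaves():
        execute(task)
        task.done = True


# Hypothetical, hand-written decomposition of the newsletter goal from the post.
goal = Task("Launch a new online newsletter about Agent AI", [
    Task("Branding", [
        Task("Come up with a name"),
        Task("Create logo"),
        Task("Figure out website domain"),
    ]),
    Task("Tech stack", [
        Task("Research available products"),
        Task("Gather ratings and reviews"),
        Task("Summarize pricing"),
        Task("Choose a platform to host"),
    ]),
    Task("Writing", [Task("Introductory post"), Task("Flagship post")]),
    Task("Launch", [
        Task("Make subscriptions available"),
        Task("Announce on social channels"),
    ]),
])
```

Calling `run(goal, execute)` with any function that knows how to carry out a single leaf task walks the whole tree, and `goal.progress()` reports how far along the goal is at any point. The user only ever states the top-level goal; everything below it is the agent's job.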
Now, one other really interesting post from Dharmesh is about the fact that just when you
thought that chat-based user experiences were the future, that we would just interact with
computers through natural language, Dharmesh points out that agent UX is slightly different.
He writes, I've been a massive fan of natural language interfaces in the form of chat
UX for a long, long time.
Even before I started HubSpot in 2006, I was noodling on natural language interfaces with
business software, but could never quite get it to work well enough.
Turns out, I was about 17 years too early.
Now, thanks to OpenAI and ChatGPT,
we've finally gotten to experience the magic of using natural language
to ask software for what we want, and that's been awesome.
But I don't think that's the end of it when it comes to how we interact with AI and software
in general.
I think there will be new developments on the UX front,
so not everything you do in AI is via a chat-based user experience.
What might drive some of that evolution is the emergence of Agent AIs,
software that can pursue goals on your behalf.
Section. The Limitations of Chat UX.
Right now, the most commonly used interface for
accessing the power of AI is ChatGPT. It's simple. You enter what's called a prompt in natural language
asking ChatGPT for what you want, and it goes and tries to do that for you. Yes, we have multimodal
support now too, but the basics are still the same. With this model, AI doesn't really start
doing anything for you until you ask it to by entering a prompt. That's fine for a lot of things,
but not everything. Section, AI magic without the micromanaging. There will be times when we want
AI to be working on our behalf in the background. And if something relevant, important,
or interesting comes up, AI can then reach out to us and initiate a conversation.
Example, imagine an agent that knew you were a strategic consultant in the e-commerce
industry and interested in AI. The agent is monitoring the internet and sees that the videos from
OpenAI's DevDay event were recently posted to YouTube. One of those videos was a panel where
one of the participants was a director of product at Shopify, a platform many of your clients
use. The agent grabs the full transcript of the video, finds segments that were interesting, and then
pings you not just with the snippet, but an analysis and explanation as to why you should spend
five minutes reviewing that snippet and what actions you should take. The point here is that you are not
expected to monitor social media every day, find interesting videos, go to ChatGPT, and then have it
analyze the transcript and then see if anything interesting is happening. You can just go about your
day doing the awesome things you do, and the agent works on your behalf in the background.
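The background pattern described here, where the agent watches a feed and only reaches out when something matches your interests, can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the `Item` type, the function names, and the made-up feed data are hypothetical, and plain keyword matching stands in for the model-based analysis a real agent would do.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Item:
    """A newly published piece of content with its transcript."""
    title: str
    transcript: str


def relevant_snippets(item: Item, interests: List[str],
                      window: int = 60) -> List[Tuple[str, str]]:
    """Return (keyword, snippet) pairs for each interest found in the transcript."""
    text = item.transcript
    lower = text.lower()
    hits = []
    for keyword in interests:
        pos = lower.find(keyword.lower())
        if pos != -1:
            # Keep some surrounding context so the user sees why it matters.
            start = max(0, pos - window)
            end = min(len(text), pos + len(keyword) + window)
            hits.append((keyword, text[start:end].strip()))
    return hits


def check_feed(feed: List[Item], interests: List[str],
               notify: Callable[[str, List[Tuple[str, str]]], None]) -> None:
    """Scan new items and call notify only when something matches."""
    for item in feed:
        hits = relevant_snippets(item, interests)
        if hits:
            notify(item.title, hits)


# Made-up feed data for illustration; a real agent would pull this from the web.
feed = [
    Item("Panel: the future of checkout",
         "Our director of product at Shopify explained how AI changes checkout."),
    Item("Keynote recap", "A general recap with nothing specific to this reader."),
]
notifications = []
check_feed(feed, ["Shopify"], lambda title, hits: notifications.append((title, hits)))
```

The key design point is that the conversation is initiated by the software: `notify` fires only for the panel that mentions Shopify, and the user never has to prompt anything.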
But wait, there's more. Section. Fire and forget. The power of async UX. Right now, when you do
something in ChatGPT or whatever your AI tool of choice is, you're generally sitting down,
having a back and forth with AI. It's mostly happening in real time and you're waiting for the
result of each step. Do this, wait, tweak that. Okay, now do this, now this. That's fine,
most of the time. There's something gratifying about getting the results right now-ish, but that
doesn't always make sense. In real life, there are often times when you want to just assign something and
have someone spend some time on it. Pull together a bunch of bits, iterate, then synthesize
into something high quality that you can then review. You want to fire off the request and then
forget about it until they have something really good for you to review. The same can apply to the
world of AI. Imagine being able to just give an agent a high-order goal, and then it goes
off and does the work which may take a while, and then it pings you when it's got something for you.
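As a rough illustration of the fire-and-forget idea, here is a minimal sketch using Python's standard `asyncio` library. The `pursue_goal` function and the goal text are hypothetical, and the short sleep is a stand-in for work that might really take minutes or hours.

```python
import asyncio


async def pursue_goal(goal, on_done):
    """Stand-in for long-running agent work; pings the caller when finished."""
    await asyncio.sleep(0.1)  # placeholder for the real, much longer work
    on_done(f"Ready for review: {goal}")


async def main():
    inbox = []
    # Fire: schedule the work and get control back immediately.
    task = asyncio.create_task(
        pursue_goal("Summarize the conference videos", inbox.append)
    )
    # Forget: the caller carries on with other things in the meantime.
    other_work = "kept working while the agent ran"
    # Awaited here only so the script can exit cleanly; in a real app the
    # ping would arrive as a notification rather than an explicit wait.
    await task
    return other_work, inbox


result = asyncio.run(main())
```

The caller hands over the goal once and gets pinged through `on_done` when there is something to review, instead of waiting on each step of a back-and-forth.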
Example. The big annual conference for my industry happened recently and the content is now all
on YouTube. Go analyze all the content and create a 15-minute summary video that includes any
mentions of my company, my key partners, or my key clients. Based on social media,
figure out what hashtags were used and which quotes got the most uptake. Give me any big highlights
or announcements. Add text captions to the video of who is speaking and what my relationship is to
them, if any. All right, so I'll pause here. He's got even more posts. But what I like about
this and why I wanted to share some of these new things coming out from, again, agent.ai
is that it shows that this thing that people have been so excited about all year, really since
BabyAGI and AutoGPT started showing up on the scene in April, and everyone started
chattering about it on YouTube and on Twitter. This idea of agents that can actually
accomplish goals has been so ever-present, just lurking below the surface and something that
so much developer energy was going towards. What's interesting about what we're seeing now
is one, we're starting to see these actual tools brought to bear in specific contexts,
such as CodeGen, and two, we're starting to see a specific discourse and lexicon form around it.
In other words, the idea of Agent AI as a subcategory of AI that has its own specific language
and terminology and understanding and interfaces, and of course, it is through having that
specific contextual knowledge that more people will be able to take advantage of these changes.
Overall, it's a great reminder that we are barely scratching the surface on what we think of now
as AI, and 2024 could get a little wild. For now, we will wrap the episode there. I hope you
are having a great weekend wherever you are. Until next time, peace.
