The AI Daily Brief: Artificial Intelligence News and Analysis - A Free Course on Using Agents Created by ChatGPT Agent
Episode Date: July 20, 2025

On today's episode, NLW puts ChatGPT's new Agent (and Manus) to the test to see how well they do in creating a free online course on agent management. Watch the episode then try for yourself: https://docsend.com/view/s/awcwzv6kw9hsveka

Brought to you by:
KPMG - Go to https://kpmg.com/ai to learn more about how KPMG can help you drive value with our AI solutions.
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
AGNTCY - The AGNTCY is an open-source collective dedicated to building the Internet of Agents, enabling AI agents to communicate and collaborate seamlessly across frameworks. Join a community of engineers focused on high-quality multi-agent software and support the initiative at agntcy.org
Vanta - Simplify compliance - https://vanta.com/nlw
Plumb - The automation platform for AI experts and consultants https://useplumb.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.

The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Subscribe to the newsletter: https://aidailybrief.beehiiv.com/
Join our Discord: https://bit.ly/aibreakdown
Interested in sponsoring the show? nlw@breakdown.network
Transcript
Today we're reviewing a course on AI agent management created by ChatGPT Agent.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
Hello, friends, quick announcements before we dive in.
First of all, thank you to today's sponsors, KPMG, Blitzy, and Vanta.
To get an ad-free version of the show, go to patreon.com slash AI Daily Brief.
And if you're interested in sponsoring the show, shoot me a note at NLW at breakdown.network.
Now today we are doing a bit of a review of ChatGPT agent. The new tool was just announced
at the end of last week. Access started rolling out shortly thereafter. And for this weekend's
long-read-style big think episode, I decided I wanted to check it out and see how well it did
at a particular task. Now, of course, I and the rest of the people out there trying it are
still figuring out what use cases are actually going to be valuable, but I thought this one
might be something that did a good job combining its abilities. For those of you who haven't listened to
or watched my overview show of ChatGPT agent, basically it combines the deep research capability
of Deep Research with the computer use capabilities of Operator, along with the terminal and coding
abilities native to ChatGPT. What that means is it can go further in terms of producing valuable
assets than something like Deep Research does, and it can natively integrate a sort of research
capability into some of the agentic operations that were represented by Operator.
The natural first use cases, then, I think that a lot of us are experimenting with, are in some
ways sort of like deep research plus a production task, something that maybe we could have
used these two different systems or a handful of AI systems to do in the past, but would have
had to wire together manually, and now can maybe be put together right from the get.
So what I decided to do was to ask ChatGPT agent to create a new course on the management of
AI agents. I have a particular axe to grind right now with the upskilling platforms that are still
stuck on some of the AI skills that were relevant in 2023. It is very clear to me, and the launch
of ChatGPT agent only reinforces this, that we need to be thinking about systems for better
orchestrating and managing and producing agents rather than just better ways for people to interact
with assistant style tools. This gets into our recent conversations as well about context
engineering. And so screw it, I thought. Let's see how well ChatGPT agent does all on its own
at creating a course on agent management. So from a prompting perspective, I used the dictate
feature because I wanted to ramble a little bit and give it some of my background, basically
say what I just said to you. And I also wanted to talk through what I wanted out of this,
a course syllabus, a set of action items and to-dos. I made it clear that I didn't want this to be a
loss leader to a full course. I wanted it to be a fully stocked end-to-end
self-directed self-managed course that someone could take for themselves. The goal is,
and I will in fact do this, to share everything that was produced in the show notes, so that you can
go do this yourself if you so choose. Now, why I thought this would be a good task for ChatGPT
agent is that on the one hand, there is, of course, a deep research aspect of this. It's got to go
find a bunch of resources, but it also needs the o3 sort of reasoning capability around designing
that syllabus as well as designing the activities. And then it needed an output that was not just a
syllabus, but as you'll see, I also wanted a companion workbook that had all of the activities pulled out.
So I've already produced all of this, but I'm going to run the initial prompt again,
just as we talk over it, so you can get a feel for how ChatGPT agent works.
So we plug this in, and then immediately, remember, the way that agent works is it has a virtual
desktop, a virtual computer that is going to be its way of interfacing with the internet.
As we're waiting for it to get set up, it did this particular set of tasks pretty fast.
I think that the first search took about four minutes, and then a refinement took another couple of
minutes, and then when I had it turn the materials into decks, it took a little bit longer,
10 or 11 minutes, but still it was pretty fast, especially relative to some of the other
more comprehensive missions I've seen that have taken 25, 30, 35 minutes all at once.
Okay, and now we can start to see its chain of thought.
One of the cool things is that you can actually go back and see the full chain of thought,
or at least what OpenAI exposes to you, even after it's done.
So the way that this works is because it's got Operator built in,
it doesn't just have to use a text-based version of the internet,
but it can also interact with the graphical user interfaces via clicking around.
And one of the other cool things about Agent is that just like if you had tasked a colleague
to do this work, you can give it refinements in the middle.
You don't have to wait for it to finish to continue to talk and refine its mission.
For example, after two minutes initially, I said this should by default be a course for non-coders,
although even for non-coders, we should have them vibe coding and learning MCP.
If you want to include an advanced section for coders, that could work.
And you go back, and you can see that it was actually at the point where it was researching
MCP.
I also gave it Greg Isenberg and Riley Brown as inspiration sources, given how much of this
type of content they'd been producing.
And after four minutes, we got this course on the management of AI agents,
getting good at agent management. It had an overview that let people know what was going to be
included as well as what background they needed, and then a set of, I think, seven modules.
Module one was foundations of AI agents with topics like what are AI agents and why use them,
when should you build an agent, the components of agents, types of agents, agent frameworks overview.
For each of these, it had a set of key points, as well as a resource suggestion with links.
So it references OpenAI's practical guide to building agents, another
OpenAI guide on when you should build an agent, n8n's 15 practical AI agent examples,
Firecrawl's CrewAI tutorial, and so on and so forth.
For an activity, it said, use ChatGPT's custom instructions to create a simple agent
persona, e.g. a trip planner that keeps memory of the conversation.
Observe how adding context and clear tools, e.g. you can access an API to check flight
prices, improves performance. Module 2 was prompt engineering, the latest techniques.
And it should be noted that I did give a little bit of guidance
around some things that I knew that I wanted. I suggested that although the whole point of this
is to move beyond prompt engineering, it was potentially valuable to have at least part of it
provide the latest up-to-date context on that discipline. And as you can see, I think it did a
pretty good job of appropriately constraining this. Module 2 is much less comprehensive than
module 1 was, and feels like it was checking the prompt engineering box without getting too
lost in the sauce. Module 3 was context engineering, and this gets a little bit more
comprehensive, which is good, because this is a new aspect that I thought was particularly important
when it comes to agent management. It has a section for background, a section for techniques and
best practices, including a set of sources that you can follow, and then it has activities. And I actually
thought that some of these activities were a pretty good way to start wrapping your head around
context engineering. Now, maybe this might be too beginner for some of you who are listening now,
but it included things like building a context store. Use a note-taking tool, and for a selected project,
collect relevant documents, past conversations, and instructions.
Write a script or use n8n that retrieves the latest notes and summarizes them to feed into your agent.
Measure how adding this context improves responses.
It also suggested experimenting with context size.
Provide too little context versus too much and observe the difference in output quality and cost.
Now, this is not revelatory, but it is the type of basic practical activity that I was looking for
that would get someone who just wanted to spend an afternoon shoring up their skills
a little bit of hands-on guidance.
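For anyone who wants to try the context store activity with a script rather than n8n, here is a minimal sketch in Python. It assumes your notes live as plain .txt files in one folder; the function name build_context and its parameters are hypothetical, made up for illustration and not taken from the course. The summarizing step is left out, so this only gathers the newest notes and trims them to a size budget before you paste or pass the result to your agent.

```python
from pathlib import Path


def build_context(notes_dir: str, max_notes: int = 3, max_chars: int = 2000) -> str:
    """Return the newest notes, newest first, trimmed to a character budget.

    Hypothetical helper: the folder layout (.txt files) and the budget
    parameters are assumptions made for this sketch.
    """
    # Sort notes by modification time, newest first (file name breaks ties).
    notes = sorted(
        Path(notes_dir).glob("*.txt"),
        key=lambda p: (p.stat().st_mtime, p.name),
        reverse=True,
    )[:max_notes]

    parts = []
    remaining = max_chars
    for note in notes:
        text = note.read_text(encoding="utf-8").strip()
        entry = f"## {note.stem}\n{text}\n"
        # Cut the entry if it would exceed what is left of the budget.
        if len(entry) > remaining:
            entry = entry[:remaining]
        parts.append(entry)
        remaining -= len(entry)
        if remaining <= 0:
            break
    return "".join(parts)
```

Calling build_context with a small max_chars and then a large one gives you the "too little versus too much" comparison from the context size activity: feed each result to the same agent with the same question and compare the quality and cost of the answers.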
Module 4 was tools and frameworks for agent management, which was broken up between no-code and low-code tools and code-based frameworks,
and you see that the agent has taken me up on the idea of trying to divide things for the average user versus an advanced version for people with more technical knowledge.
Now, one thing that deep research isn't particularly good at, that agent appears to me at least to struggle with potentially as well, is being hyper up-to-date with the latest tools and trends.
It misses some obvious vibe coding platforms, for example, like Lovable and Replit.
And to some extent, this makes sense.
My goal for recency and relevance runs counter to the traditional way the internet has been
structured around Google Search, which is often based on duration and how long a source has
been cited.
Module 5 was about vibe coding and the MCP ecosystem.
And here you can tell that, again, while this is doing pretty well for one-shotting
an overall course syllabus, it doesn't know exactly necessarily how to prioritize super well.
For example, it has a whole section in here that's basically the same length as its understanding
vibe coding section around safe vibe coding, which really is pulled just from a single article.
And I don't think it's particularly relevant in the context of what this overall course is trying
to achieve. We've also kind of lost the idea of this as a syllabus, and it's now become more
of an overview dossier than a course. I will say that part of that might be a lack of precision
in my language. If I had been more clear around what I wanted out of each section of a syllabus,
I'm sure it would have done a better job versus leaving it fairly unconstrained.
It also kind of lumped in all the MCP stuff with the vibe coding stuff,
and the activities are getting a little bit more generic at this point.
Whereas, for example, those context engineering activities were pretty thoughtful
in how they helped people approach the idea of context engineering.
By now we're just at "try vibe coding" or "deploy an MCP server."
Module 6 is orchestration and multi-agent workflows.
And again, at this point, we've totally lost the syllabus idea,
although we've retained the activities.
Module 7 is safety guardrails and evaluation.
Module 8 was capstone projects and further inspiration.
Now, this I think was pretty cool.
Basically, it suggested doing a project that would apply concepts from each module,
and it gave three examples each for non-coders versus coders.
Lastly, it had some leaving sources of inspiration,
which frankly was a little loose, if a good concept.
So this was its first take.
And remember, it only took four minutes to do this.
It was okay but not something I would necessarily push.
Today's episode is brought to you by KPMG.
In today's fiercely competitive market,
unlocking AI's potential could help give you a competitive edge,
foster growth, and drive new value.
But here's the key.
You don't need an AI strategy.
You need to embed AI into your overall business strategy
to truly power it up.
KPMG can show you how to integrate AI and AI agents
into your business strategy
in a way that truly works and is built
on trusted AI principles and platforms.
Check out real stories from KPMG to hear how AI is driving success with its clients at
www.kpmg.com slash AI. Again, that's www.kpmg.com slash AI. This episode is brought to you
by Blitzy. If you're a technology leader, here's something that probably sounds familiar.
Your organization's competitive edge is buried in legacy code that desperately needs modernization,
but the resources required feel out of reach. That was the case for a global investment analysis
firm. They needed to migrate 70,000 lines of complex MATLAB financial algorithms to Python.
Algorithms that drive investment decisions for trillions in assets. Their estimate,
months of high-cost specialized engineering work. Instead, they partnered with Blitzy.
Blitzy's autonomous AI preserved mathematical precision and generated over 80% of the new codebase,
completing the migration with just five days of engineering time. They cut the timeline by 95%
and saved 880 engineering hours. If your organization is facing similar modernization
challenges, visit blitzy.com to schedule a consultation and discover how AI-powered development can
transform your technical capabilities. Today's episode is brought to you by Vanta. In today's
business landscape, businesses can't just claim security, they have to prove it. Achieving compliance
with a framework like SOC 2, ISO 27001, HIPAA, GDPR, and more is how businesses can demonstrate
strong security practices. The problem is that navigating security and compliance is time-consuming
and complicated. It can take months of work and use up valuable time and resources. Vanta makes it easier
and faster by automating compliance across 35 plus frameworks. It gets you audit ready in weeks instead of
months and saves you up to 85% of associated costs. In fact, a recent IDC White Paper found that Vanta
customers achieve $535,000 per year in benefits, and the platform pays for itself in just three months.
The proof is in the numbers. More than 10,000 global companies trust Vanta. For a limited time,
listeners get $1,000 off at vanta.com slash NLW. That's V-A-N-T-A.com for $1,000 off.
I decided that rather than generally trying to critique it section by section, which is probably what
I would have done if I was actually trying to produce this in full, I wanted to hone in on one
thing that felt like it was missing or underwhelming here. I said, this is okay, but I'm not sure
it does enough to give really practical hands-on experience with agent creation or agent management.
Can you create a broader activity bank for actual agent management?
For example, have them build more workflows with n8n and Lindy,
but also have them use Manus or ChatGPT agent for tasks that these new general purpose agents are good for.
It once again worked for four minutes and included an updated module, module 9,
with a hands-on agent management activity bank.
Now again, you could see how, if I really wanted to get this into course shape,
I might need to go back and continue tweaking it,
but for our purposes, I wanted to try out some extended capabilities.
And module nine, the hands-on agent management activity bank, was pretty useful.
There was a section for n8n workflow challenges that included a change request workflow,
a single-agent personal assistant workflow, a multi-agent with gatekeeper, and a multi-agent team
collaboration.
It identified them as beginner, intermediate, or advanced.
Then there was a set of Lindy agent-building exercises, some Manus general purpose tasks,
and some ChatGPT agent assignments.
Now, you'll note that it hewed pretty closely to exactly what I suggested.
I gave the example of n8n, Lindy, Manus, and ChatGPT agent, and that's where it focused.
Now, what I could have done after that is said those were just examples.
Go use your own creativity and research to figure out if there should be other platforms that we focus on,
and it probably would have done a better job.
I found in general that these models hew pretty closely to what you ask,
and so by giving it examples, it very naturally anchored itself to them rather than using them as examples,
which is sort of what I intended.
Next up, like I said, I wanted to explore some of the
advanced capabilities, and so I asked it to create a companion deck or presentation that would be a
workbook focused on the specific activities. I said there should be organization by sections of the
course and each slide should be an individual activity. I also asked it to provide more detailed
step-by-step instructions than were provided in the syllabus. So let's see what it produced. This is its
first version of an agent management workbook. At the top of each page, you have a name, and then you have a set of
steps. It also did have sources, which definitely makes it more useful, and it was shared in a
PowerPoint presentation which you could download. Now, this has about 20 pages of tasks, and while some of them
might not have quite enough instructions to really help, it did expand at least from where it was
in the syllabus. You'll note, though, that this really was just focused on the updated Section 9
Activity Bank. It didn't extract the other activities from all the sections, as I had asked. I then
asked it to redo another version to provide the goals of each activity. I also asked it to diversify
the imagery, given that all of the images were very, very samey and similar in the first version.
What it came back with was the same set of activities, but with a new section on each for their
goal to help anchor the user to see what they were trying to do. The way that it handled
the multivisual request was that for each platform it was talking about, it had the same image,
but then it switched when it switched platforms. So I think the big question here is,
how would I rate this? And how would it compare to other ways of getting this done? As an actual product
or output, what I got is frankly, generously, in the C range. By that I mean, if you went through and did
all of this reading and did all of these activities, you would absolutely be way better off than 99%
of people. However, it could be a lot better. It's a little haphazard. It's a little mishmashy.
It doesn't feel all that smart. However, here are some caveats to that.
First of all, I was doing this in the context of preparing it for this experiment for you guys.
If I had sat down and said, I am going to produce with ChatGPT agent, the best AI management
course, and I'm going to give myself even just two hours to do it, to say nothing of a full
afternoon or a full day, I absolutely could have wrung way more out of this system.
So it's getting a C for a fairly one-shot example.
Second, there's also the fact that we should be grading on the curve that the capability
to do this set of tasks all in one wasn't really available until extremely recently.
Particularly, although this workbook output isn't all that great yet, this represents an enormous
amount of time that was saved.
The ability to, in the same interface of just using language, to ask it to extract a whole
set of information from this, turn it into a different workbook document, and then output it
in a meaningful way, it's hard to overstate how big an efficiency gain that is.
Now, I am going to upload all of this and share with you guys, so you'll have a chance to
review it for yourself. I wouldn't be surprised if you find a lot of stuff that's useful in here,
even if on the whole, you're glad that you got this so-called course for free rather than paying
for it. But it certainly shows the possibilities of this type of general agentic platform
that can just take on more and more work and more complex tasks. Now, there is one more
test I had to run. To the extent that people have used a general agent tool before trying
ChatGPT agent, kind of the most likely candidate at this point is Manus. Manus, you'll remember, is a
Chinese company that went viral earlier in the year and got tons of people excited. And certainly
when ChatGPT agent launched, they welcomed the comparison. They dropped a thread on Twitter yesterday,
basically taking a bunch of use cases that people had highlighted with ChatGPT agent,
doing them with Manus, and effectively arguing that they did everything better than ChatGPT
agent did. Now, I ran the exact same prompt through Manus, and I have to say,
overall, if ChatGPT agent got a C, Manus definitely pushes up into the B minus territory.
First of all, it structured its approach more concretely and observably than ChatGPT agent did,
and as it was processing, it told me at each step where in its process it was. Now, it's actually
doing a secondary thing, which I'll come back to in just a minute right now, but it had each of
these six different steps. It took, as you can see, 13 minutes or so
overall. One thing that I immediately noticed, I was able to interrupt it in the same way that I was
with ChatGPT, and when I did, and I mentioned Greg and Riley, it actually did a better job of going out
and searching what they had. You can see this is the course outline that it was working off of the
whole time. One thing we noticed as it was working is that it thought that MCP was master control
prompt rather than model context protocol, so we righted that, but it is notable that it was going
to make that mistake. But ultimately, we got to this. And immediately, this is just a
better output than what we got from ChatGPT. For each of the modules, it was structured into two
different lessons. I found in general that the lessons were more right on, and it also understood
the prompt that because this was self-directed, the resources needed to be really front and center
so people could go read them for themselves. Interestingly, it just seemed smarter than agent in some
ways. For example, it picked up on and created a lesson about why agent management is the new key skill.
When it came to the activities, they were pretty comparable to ChatGPT agent's.
I think the modules overall were slightly better organized in Manus.
I think the hands-on agent building suggestions were a little bit better with Manus.
And one thing that's notable is, just like with ChatGPT agent,
the first round of activities were a little bit lacking.
So I gave it the same prompt to go expand them, and I gave it the same examples of
n8n, Lindy, and general purpose agent work with ChatGPT agent and Manus.
And it once again hewed pretty closely to those.
It did some updates on the activities within the main modules,
but then once again added an additional practical activity bank.
Overall, like I said, if ChatGPT was getting a C,
this is definitely closer to at least a C plus or even a B minus.
Now from there, I asked Manus to create that workbook and slides on the side.
And this is where it really just kicked the slats out of ChatGPT.
Coming into this, I was looking for a few things.
One, I wanted to see whether it would actually extract the activities from each module,
or just pull from the activity bank, as ChatGPT agent had done.
Two, I wanted to see how good at providing an expanded set of steps it was.
And three, I wanted to just see how good the thing felt overall.
Now, we did get one error along the way, but then we were able to finalize it and got this
presentation.
And if ChatGPT agent's presentation was closer to a D plus even than a C, this is
closer to a B plus.
It is much more robust.
It actually pulled all of the activities from the different modules,
and just in general did a really nice job with this.
Now, the thing that it was doing as I was just talking to you
is that it gave an additional selection
asking if we wanted to turn it into a website,
so I said, sure, and that's where we are now.
Now, as we wait for Manus to deploy this website,
I want to come back to the overall takeaway.
First of all, it's clear that Manus is a more polished product.
It's gotten more reps in when it comes to this type of user experience.
However, by virtue of the distribution they have,
ChatGPT agent will almost instantly be used by more people. And what this means in total
is that people are about to start having these first experiences with these powerful general agents.
Yes, there are tons of parts of these outputs that are ill-defined, that wouldn't pass muster
if you had, for example, an employee doing them, but we are over here one-shotting things.
And the fact that we're getting these types of results in a matter of minutes, without a ton of
refinement as I'm live creating this during a podcast recording is indicative of where things are
headed. I remain most excited to discover what use cases actually get powerful and normalized most
quickly, and I'm excited to hear, as you guys get access to it, what you are finding
uses for with ChatGPT agent and with Manus. For now, let's close out here so you can get off to
using this weekend to play. Appreciate you listening or watching as always, and until next time,
peace.
