The AI Daily Brief: Artificial Intelligence News and Analysis - A Free Course on Using Agents Created by ChatGPT Agent

Starting point is 00:00:00 Today we're reviewing a course on AI agent management created by ChatGPT Agent. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. Hello, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, KPMG, Blitzy, and Vanta. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief. And if you're interested in sponsoring the show, shoot me a note at nlw@breakdown.network. Now, today we are doing a bit of a review of ChatGPT Agent. The new tool was announced at the end of last week, and access started rolling out shortly thereafter. And for this weekend's

Starting point is 00:00:44 long-read-style big-think episode, I decided I wanted to check it out and see how well it did at a particular task. Now, of course, I and the rest of the people out there trying it are still figuring out which use cases are actually going to be valuable, but I thought this one might do a good job of combining its abilities. For those of you who haven't listened to or watched my overview show of ChatGPT Agent, it basically combines the research capability of Deep Research with the computer-use capabilities of Operator, along with the terminal and coding abilities native to ChatGPT. That means it can go further than something like Deep Research in terms of producing valuable assets, and it can natively integrate a sort of research

Starting point is 00:01:25 capability into some of the agentic operations that Operator represented. The natural first use cases that I think a lot of us are experimenting with, then, are in some ways like Deep Research plus a production task: something we could perhaps have done in the past with these two systems or a handful of AI systems, but would have had to wire together manually, and that can now be put together right from the get-go. So what I decided to do was ask ChatGPT Agent to create a new course on the management of AI agents. I have a particular axe to grind right now with the upskilling platforms that are still stuck on the AI skills that were relevant in 2023. It is very clear to me, and the launch

Starting point is 00:02:07 of ChatGPT Agent only reinforces this, that we need to be thinking about systems for better orchestrating, managing, and producing agents rather than just better ways for people to interact with assistant-style tools. This ties into our recent conversations about context engineering as well. And so, screw it, I thought: let's see how well ChatGPT Agent does all on its own at creating a course on agent management. From a prompting perspective, I used the dictate feature because I wanted to ramble a little and give it some of my background, basically saying what I just said to you. I also wanted to talk through what I wanted out of this: a course syllabus and a set of action items and to-dos. I made it clear that I didn't want this to be a

Starting point is 00:02:49 loss leader for a full course. I wanted a fully stocked, end-to-end, self-directed, self-managed course that someone could take on their own. The goal is to share everything that was produced in the show notes, and I will in fact do this, so that you can go do this yourself if you so choose. Now, why did I think this would be a good task for ChatGPT Agent? On the one hand, there is, of course, a deep research aspect to it: it has to go find a bunch of resources. But it also needs o3-style reasoning to design the syllabus and the activities. And then it needed an output that was not just a syllabus; as you'll see, I also wanted a companion workbook with all of the activities pulled out.

Starting point is 00:03:31 So I've already produced all of this, but I'm going to run the initial prompt again while we talk over it, so you can get a feel for how ChatGPT Agent works. So we plug this in, and remember, the way Agent works is that it has a virtual desktop, a virtual computer that serves as its way of interfacing with the internet. While we wait for it to get set up: it did this particular set of tasks pretty fast. I think the first search took about four minutes, a refinement took another couple of minutes, and when I had it turn the materials into decks, it took a little longer, 10 or 11 minutes. Still, that was pretty fast, especially relative to some of the other

Starting point is 00:04:06 more comprehensive missions I've seen that have taken 25, 30, or 35 minutes in a single run. Okay, now we can start to see its chain of thought. One of the cool things is that you can go back and see the full chain of thought, or at least what OpenAI exposes to you, even after it's done. Because it has Operator built in, it doesn't have to use a text-based version of the internet; it can also interact with graphical user interfaces by clicking around. One of the other cool things about Agent is that, just as if you had tasked a colleague

Starting point is 00:04:39 to do this work, you can give it refinements in the middle. You don't have to wait for it to finish to keep talking to it and refining its mission. For example, after two minutes, I said this should by default be a course for non-coders, although even non-coders should be vibe coding and learning MCP. If it wanted to include an advanced section for coders, that could work. And if you go back, you can see that it was actually at the point where it was researching MCP. I also gave it Greg Eisenberg and Riley Brown as sources of inspiration, given how much of this

Starting point is 00:05:07 type of content they've been producing. After four minutes, we got this course on the management of AI agents: "Getting Good at Agent Management." It had an overview that let people know what was going to be included as well as what background they needed, and then a set of, I think, seven modules. Module 1 was Foundations of AI Agents, with topics like what AI agents are and why to use them, when you should build an agent, the components of agents, types of agents, and an overview of agent frameworks. For each of these, it had a set of key points as well as suggested resources with links. So it references OpenAI's practical guide to building agents, another OpenAI resource,

Starting point is 00:05:43 OpenAI's guide on when you should build an agent, and it ends with 15 practical AI agent examples, Firecrawl's CrewAI tutorial, and so on and so forth. For an activity, it said: use ChatGPT's custom instructions to create a simple agent persona, e.g., a trip planner that keeps memory of the conversation. Observe how adding context and clear tools (e.g., "you can access an API to check flight prices") improves performance. Module 2 was Prompt Engineering: The Latest Techniques. And it should be noted that I did give a little bit of guidance around some things I knew I wanted. I suggested that although the whole point of this

Starting point is 00:06:17 is to move beyond prompt engineering, it was potentially valuable to have at least part of it provide the latest, up-to-date context on that discipline. And I think it did a pretty good job of appropriately constraining this. Module 2 is much less comprehensive than Module 1 was, and feels like it was checking prompt engineering off the list without getting too lost in the sauce. Module 3 was Context Engineering, and this gets a little more comprehensive, which is good, because this is a new area I thought was particularly important when it comes to agent management. It has a section for background, a section for techniques and best practices, including a set of sources you can follow, and then activities. And I actually

Starting point is 00:06:56 thought some of these activities were a pretty good way to start wrapping your head around context engineering. This might be too beginner-level for some of you listening, but it included things like building a context store: use a note-taking tool, and for a selected project, collect relevant documents, past conversations, and instructions. Write a script or use n8n to retrieve the latest notes and summarize them to feed into your agent. Measure how adding this context improves responses. It also suggested experimenting with context size: provide too little context versus too much and observe the difference in output quality and cost.
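The context-store activity described above can be sketched in a few lines of Python. This is a minimal illustration, not something the course or the episode provides: it assumes your notes are plain .md or .txt files in a single folder, treats "latest" as "most recently modified," and stands in for real summarization by keeping only the first few lines of each note. The names build_context and build_prompt are hypothetical, and the max_notes and max_chars parameters are the knobs for the "too little versus too much context" experiment.

```python
from pathlib import Path


def build_context(notes_dir: str, max_notes: int = 3, max_chars: int = 2000,
                  lines_per_note: int = 5) -> str:
    """Pack the most recently modified notes in notes_dir into one context string.

    Hypothetical helper for the context-store exercise. Only .md and .txt files
    are considered, and the returned string is never longer than max_chars.
    """
    notes = [p for p in Path(notes_dir).iterdir()
             if p.is_file() and p.suffix in {".md", ".txt"}]
    # Newest first, so the latest notes are kept when the budget runs out.
    notes.sort(key=lambda p: p.stat().st_mtime, reverse=True)

    sections: list[str] = []
    used = 0
    for path in notes[:max_notes]:
        text = path.read_text(encoding="utf-8").strip()
        # Crude stand-in for real summarization: keep only the first few lines.
        excerpt = "\n".join(text.splitlines()[:lines_per_note])
        section = f"## {path.stem}\n{excerpt}"
        # Count the blank-line separator that join() inserts between sections.
        cost = len(section) + (2 if sections else 0)
        if used + cost > max_chars:
            break
        sections.append(section)
        used += cost
    return "\n\n".join(sections)


def build_prompt(task: str, context: str) -> str:
    """Prepend the gathered context to the task the agent will receive."""
    return f"Context from my project notes:\n{context}\n\nTask: {task}"
```

For the context-size experiment, you might build one prompt with max_chars=200 and another with max_chars=5000, send both to your agent, and compare the answers; input cost grows with the length of the context you send. Swapping the line-truncation step for an actual summarization call (or an n8n node) is the natural next step.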

Starting point is 00:07:29 Now, this is not revelatory, but it is the type of basic, practical activity I was looking for: something that would give someone who just wanted to spend an afternoon shoring up their skills a bit of hands-on guidance. Module 4 was Tools and Frameworks for Agent Management, which was split between no-code and low-code tools and code-based frameworks, and you can see that the agent took me up on the idea of dividing things into a version for the average user and an advanced version for people with more technical knowledge. Now, one thing that Deep Research isn't particularly good at, and that Agent, to me at least, appears to potentially struggle with as well, is being hyper up to date with the latest tools and trends. It misses some obvious vibe coding platforms, for example, like Lovable and Replit. And to some extent, this makes sense.

Starting point is 00:08:12 My goal of recency and relevance runs counter to the traditional way the internet has been structured around Google Search, which often rewards longevity and how long a source has been cited. Module 5 was about vibe coding and the MCP ecosystem. And here you can tell that, while this does pretty well for one-shotting an overall course syllabus, it doesn't necessarily know how to prioritize super well. For example, it has a section on safe vibe coding that is basically the same length as its section on understanding vibe coding, and it's really pulled from just a single article.

Starting point is 00:08:46 And I don't think it's particularly relevant to what this overall course is trying to achieve. We've also kind of lost the idea of this as a syllabus; it has become more of an overview dossier than a course. I will say that part of that might be a lack of precision in my language. If I had been clearer about what I wanted out of each section of the syllabus, I'm sure it would have done a better job than it did when left fairly unconstrained. It also lumped all the MCP stuff in with the vibe coding stuff, and the activities are getting a little more generic at this point. Whereas the context engineering activities, for example, were pretty thoughtful

Starting point is 00:09:20 in how they helped people approach the idea of context engineering, by now we're just at "try vibe coding" or "deploy an MCP server." Module 6 is Orchestration and Multi-Agent Workflows. And again, at this point we've totally lost the syllabus idea, although we've retained the activities. Module 7 is Safety, Guardrails, and Evaluation. Module 8 was Capstone Projects and Further Inspiration. Now, this I think was pretty cool.

Starting point is 00:09:44 Basically, it suggested doing a project that would apply concepts from each module, and it gave three examples each for non-coders and coders. Lastly, it had some further sources of inspiration, which frankly were a little loose, if a good concept. So this was its first take. And remember, it only took four minutes to do this. It was okay, but not something I would necessarily push. Today's episode is brought to you by KPMG.

Starting point is 00:10:10 In today's fiercely competitive market, unlocking AI's potential could help give you a competitive edge, foster growth, and drive new value. But here's the key. You don't need an AI strategy. You need to embed AI into your overall business strategy to truly power it up. KPMG can show you how to integrate AI and AI agents

Starting point is 00:10:28 into your business strategy in a way that truly works and is built on trusted AI principles and platforms. Check out real stories from KPMG to hear how AI is driving success with its clients at www.kpmg.com slash AI. Again, that's www.kpmg.com slash AI. This episode is brought to you by Blitzy. If you're a technology leader, here's something that probably sounds familiar: your organization's competitive edge is buried in legacy code that desperately needs modernization, but the resources required feel out of reach. That was the case for a global investment analysis

Starting point is 00:11:02 firm. They needed to migrate 70,000 lines of complex MATLAB financial algorithms to Python, algorithms that drive investment decisions for trillions in assets. Their estimate: months of high-cost, specialized engineering work. Instead, they partnered with Blitzy. Blitzy's autonomous AI preserved mathematical precision and generated over 80% of the new codebase, completing the migration with just five days of engineering time. They cut the timeline by 95% and saved 880 engineering hours. If your organization is facing similar modernization challenges, visit blitzy.com to schedule a consultation and discover how AI-powered development can transform your technical capabilities. Today's episode is brought to you by Vanta. In today's

Starting point is 00:11:44 business landscape, businesses can't just claim security, they have to prove it. Achieving compliance with a framework like SOC 2, ISO 27001, HIPAA, GDPR, and more is how businesses can demonstrate strong security practices. The problem is that navigating security and compliance is time-consuming and complicated. It can take months of work and use up valuable time and resources. Vanta makes it easier and faster by automating compliance across 35-plus frameworks. It gets you audit-ready in weeks instead of months and saves you up to 85% of associated costs. In fact, a recent IDC white paper found that Vanta customers achieve $535,000 per year in benefits, and the platform pays for itself in just three months. The proof is in the numbers. More than 10,000 global companies trust Vanta. For a limited time,

Starting point is 00:12:28 listeners get $1,000 off at vanta.com slash nlw. That's vanta.com slash nlw for $1,000 off. I decided that rather than trying to critique it section by section, which is probably what I would have done if I were actually trying to produce this in full, I wanted to home in on one thing that felt missing or underwhelming. I said: this is okay, but I'm not sure it does enough to give really practical, hands-on experience with agent creation or agent management. Can you create a broader activity bank for actual agent management? For example, have them build more workflows with n8n and Lindy, but also have them use Manus or ChatGPT Agent for tasks that these new general-purpose agents are good for.

Starting point is 00:13:07 It once again worked for four minutes and added an updated module, Module 9, a hands-on agent management activity bank. Again, you can see how, if I really wanted to get this into course shape, I might need to go back and keep tweaking it, but for our purposes I wanted to try out some extended capabilities. And Module 9, the hands-on agent management activity bank, was pretty useful. There was a section of n8n workflow challenges that included a change request workflow, a single-agent personal assistant workflow, a multi-agent with gatekeeper, and a multi-agent team

Starting point is 00:13:38 collaboration. It labeled them as beginner, intermediate, or advanced. Then there was a set of Lindy agent-building exercises, some Manus general-purpose tasks, and some ChatGPT Agent assignments. Now, you'll note that it hewed pretty closely to exactly what I suggested. I gave the examples of n8n, Lindy, Manus, and ChatGPT Agent, and that's where it focused. What I could have done after that is say that those were just examples: go use your own creativity and research to figure out whether there are other platforms we should focus on,

Starting point is 00:14:08 and it probably would have done a better job. I've found in general that these models hew pretty closely to what you ask, so by giving it examples, it very naturally anchored itself to them rather than treating them as mere illustrations, which is sort of what I intended. Next up, like I said, I wanted to explore some of the advanced capabilities, so I asked it to create a companion deck or presentation that would be a workbook focused on the specific activities. I said it should be organized by sections of the course and each slide should be an individual activity. I also asked it to provide more detailed

Starting point is 00:14:38 step-by-step instructions than were provided in the syllabus. So let's see what it produced. This is its first version of an agent management workbook. At the top of each page there's a name, and then a set of steps. It also included sources, which definitely makes it more useful, and it was shared as a PowerPoint presentation you could download. It has about 20 pages of tasks, and while some of them might not have quite enough instructions to really help, it did expand on what was in the syllabus. You'll note, though, that this was really just focused on the updated Module 9 activity bank. It didn't extract the activities from all the other sections, as I had asked. I then asked it to do another version that provided the goal of each activity. I also asked it to diversify

Starting point is 00:15:24 the imagery, given that all of the images in the first version were very samey. What it came back with was the same set of activities, but with a new section on each for its goal, to help anchor the user in what they were trying to do. The way it handled the request for more varied visuals was to use the same image for each platform it was talking about, switching images only when it switched platforms. So I think the big question here is: how would I rate this? And how would it compare to other ways of getting this done? As an actual product or output, what I got is frankly in the C range, and that's generous. By that I mean, if you went through and did all of this reading and all of these activities, you would absolutely be way better off than 99%

Starting point is 00:16:08 of people. However, it could be a lot better. It's a little haphazard. It's a little mishmashy. It doesn't feel all that smart. That said, there are some caveats. First of all, I was doing this as an experiment prepared for you guys. If I had sat down and said, I am going to produce the best AI management course with ChatGPT Agent, and given myself even just two hours to do it, to say nothing of a full afternoon or a full day, I absolutely could have wrung way more out of this system. So it's getting a C for a fairly one-shot example. Second, we should be grading on a curve, given that the capability

Starting point is 00:16:49 to do this set of tasks all in one place wasn't really available until extremely recently. In particular, although this workbook output isn't all that great yet, it represents an enormous amount of time saved. The ability, in the same interface and using just language, to ask it to extract a whole set of information, turn it into a separate workbook document, and output it in a meaningful way: it's hard to overstate how big an efficiency gain that is. Now, I am going to upload all of this and share it with you guys, so you'll have a chance to review it for yourself. I wouldn't be surprised if you find a lot of useful stuff in here,

Starting point is 00:17:28 even if, on the whole, you're glad you got this so-called course for free rather than paying for it. But it certainly shows the possibilities of this type of general agentic platform that can take on more and more work and more complex tasks. Now, there was one more test I had to run. To the extent that people have used a general agent tool before trying ChatGPT Agent, the most likely candidate at this point is Manus. Manus, you'll remember, is a Chinese company that went viral earlier in the year and got tons of people excited. And certainly when ChatGPT Agent launched, they welcomed the comparison. They dropped a thread on Twitter yesterday, basically taking a bunch of use cases that people had highlighted with ChatGPT Agent,

Starting point is 00:18:07 doing them with Manus, and effectively arguing that they did everything better than ChatGPT Agent did. Now, I ran the exact same prompt through Manus, and I have to say, overall, if ChatGPT Agent got a C, Manus definitely pushes up into B-minus territory. First of all, it structured its approach more concretely and observably than ChatGPT Agent did, and as it was processing, it told me at each step where it was in its process. It's actually doing a secondary thing right now, which I'll come back to in just a minute, but it had each of these six different steps. It took, as you can see, 13 minutes overall. One thing I immediately noticed: I was able to interrupt it in the same way I was

Starting point is 00:18:46 with ChatGPT, and when I did and mentioned Greg and Riley, it actually did a better job of going out and searching for what they had produced. You can see this is the course outline it was working off the whole time. One thing we noticed as it was working is that it thought MCP stood for "master control prompt" rather than "Model Context Protocol," so we corrected that, but it is notable that it was going to make that mistake. Ultimately, we got to this. And immediately, this is just a better output than what we got from ChatGPT. Each module was structured into two lessons. I found in general that the lessons were more on target, and it also understood from the prompt that, because this was self-directed, the resources needed to be front and center

Starting point is 00:19:26 so people could go read them for themselves. Interestingly, it just seemed smarter than Agent in some ways. For example, it picked up on the theme and created a lesson about why agent management is the new key skill. When it came to the activities, they were pretty comparable to ChatGPT Agent's. I think the modules overall were slightly better organized in Manus, and I think the hands-on agent-building suggestions were a little better with Manus. And one thing that's notable is that, just like with ChatGPT Agent, the first round of activities was a little lacking. So I gave it the same prompt to expand them, and I gave it the same examples of

Starting point is 00:20:01 n8n, Lindy, and general-purpose agent work with ChatGPT Agent and Manus. And it once again hewed pretty closely to those. It made some updates to the activities within the main modules, but then once again added an additional practical activity bank. Overall, like I said, if ChatGPT was getting a C, this is definitely at least a C-plus or even a B-minus. Now, from there, I asked Manus to create that workbook and slides on the side. And this is where it really just kicked the slats out of ChatGPT.

Starting point is 00:20:32 Coming into this, I was looking for a few things. One, I wanted to see whether it would actually extract the activities from each module or just pull from the activity bank, as ChatGPT Agent had done. Two, I wanted to see how good it was at providing an expanded set of steps. And three, I wanted to see how good the thing felt overall. We did get one error along the way, but then we were able to finalize it and got this presentation. And if ChatGPT Agent's presentation was closer to a D-plus than a C, this is

Starting point is 00:21:00 closer to a B-plus. It is much more robust. It actually pulled all of the activities from the different modules, and in general did a really nice job with this. Now, the thing it was doing while I was talking to you is that it offered an additional option, asking if we wanted to turn it into a website, so I said sure, and that's where we are now.

Starting point is 00:21:20 Now, as we wait for Manus to deploy this website, I want to come back to the overall takeaway. First of all, it's clear that Manus is a more polished product. It's gotten more reps in with this type of user experience. However, by virtue of OpenAI's distribution, ChatGPT Agent will almost instantly be used by more people. What this means overall is that people are about to start having their first experiences with these powerful general agents. Yes, there are tons of parts of these outputs that are ill-defined and that wouldn't pass muster

Starting point is 00:21:52 if you had, for example, an employee doing them, but we are over here one-shotting things. And the fact that we're getting these kinds of results in a matter of minutes, without a ton of refinement, while I'm creating this live during a podcast recording, is indicative of where things are headed. I remain most excited to discover which use cases actually become powerful and normalized most quickly, and as you guys get access, I'm excited to hear what you are finding uses for with ChatGPT Agent and with Manus. For now, let's close out here so you can get off and use this weekend to play. Appreciate you listening or watching, as always, and until next time, peace.
