The AI Daily Brief: Artificial Intelligence News and Analysis - 5 Uses for the New ChatGPT Agent

Episode Date: July 18, 2025

ChatGPT Agent just launched, and it's already showing impressive results - beating humans on about half of knowledge work tasks according to OpenAI's internal benchmarks. In this video, we break down 5 practical use cases people are already getting value from: from analyzing customer feedback to handling multi-step planning tasks like wedding preparation research.

Brought to you by:
KPMG - Go to https://kpmg.com/ai to learn more about how KPMG can help you drive value with our AI solutions.
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
AGNTCY - The AGNTCY is an open-source collective dedicated to building the Internet of Agents, enabling AI agents to communicate and collaborate seamlessly across frameworks. Join a community of engineers focused on high-quality multi-agent software and support the initiative at agntcy.org
Vanta - Simplify compliance - https://vanta.com/nlw
Plumb - The automation platform for AI experts and consultants https://useplumb.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.

The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Subscribe to the newsletter: https://aidailybrief.beehiiv.com/
Join our Discord: https://bit.ly/aibreakdown
Interested in sponsoring the show? nlw@breakdown.network

Transcript
Starting point is 00:00:00 Today on the AI Daily Brief, five use cases for ChatGPT's just-announced ChatGPT agent. Before that in the headlines, a potential $100 billion deal for Anthropic. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. Hello again, friends, quick announcements today before we dive in. First of all, thank you to today's sponsors, KPMG, Blitzy and Superintelligent. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief. Also, if you're interested in sponsoring the show, shoot me a note at nlw@breakdown.network.
Starting point is 00:00:39 I will send you all the information. We're currently booking out for the fall and the early winter. And with ChatGPT releasing a new agent today, it is poised to be quite the fall and winter, so without any further ado, let's dive in. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. Well, friends, a billion dollars is no longer cool. You know what is cool? A hundred billion dollars.
Starting point is 00:01:02 Anthropic is reportedly in the early stages of planning another investment round that would see them valued at, yes, the much-vaunted 12 figures, $100 billion. Now, apparently, the sources say that Anthropic isn't formally out fundraising, but is rather fielding inbound interest from VCs. Now, the company last raised funds at a $61.5 billion valuation, but they are growing very quickly. Recent estimates have their run rate at $4 billion annualized, but that's up from just
Starting point is 00:01:32 $3 billion in June and $1 billion at the beginning of the year. Now, a big part of the revenue bump is driven by the release of Claude Code. The tool has already reached 3 million weekly downloads, which is more than the monthly downloads of the Claude iOS app. Anthropic expects revenue to continue to grow in the coming years. Their optimistic projection is reportedly $35 billion in revenue by 2027, which would close the gap with OpenAI substantially.
Starting point is 00:01:57 Their base case is $11 billion in revenue over the same time span, which would still be tripling sales in two and a half years. Margins are also apparently improving, with Anthropic reportedly telling investors that gross profit margin is now at 60% and headed towards 70%. Still, if you're looking for takeaways, it is very clear that there is continued appetite, even at the most expensive levels, for the very small number of very top AI labs. Now, one other interesting bit of good news for Anthropic. Turns out the grass isn't always greener in the AI talent war, as two key Anthropic leaders return. Two weeks ago, Cursor pulled off a coup and poached Boris Cherny and Cat Wu, the lead developer and the product manager responsible
Starting point is 00:02:35 for Claude Code. The Information now reports, however, that they are back at Anthropic, apparently ditching Cursor before they even had time to set up their desks. Speculation is understandably rampant around what caused such a quick reversal. AI entrepreneur Palash Shah writes, they really took a peek into the front door and dipped. Curious what the deciding factor was. Size of company? They realized they didn't want to be on the application layer? But what about a company could you discover so quickly that makes you leave? Some speculated it was a spy mission, more speculated that the pair didn't like the look of Cursor's unit economics. Alex Yang, an AI hiring consultant, wrote,
Starting point is 00:03:08 Have it on good authority that Anthropic beat Cursor's pay package offer. Dario just couldn't get alignment over paying a PM or software engineer even a fraction of what the researchers are getting. Now, if that's true, that's an interesting new wrinkle in the AI talent wars. Researchers have been the driving force behind AI growth, but could we be witnessing the start of a shift towards product people getting paid as well? It would certainly align with a broader transition to the app layer becoming the most important part of the AI business.
Starting point is 00:03:32 I'm certainly waiting for the communications layer to be valued that highly, right? Now, others took this way farther. If it's just about the right compensation, then this is weird, but really ultimately no big deal. However, some, like Mark Egan, wrote, Boris and Cat back at Anthropic after two weeks at Cursor is a huge indicator that Cursor is a sinking ship. There is literally nothing they can do to stop the bleeding. The only way to slow it is rug-pulling users. Not a doomer, but it's over for Cursor.
Starting point is 00:03:58 I don't know, man. Seems like a lot of speculation to me. Cursor by all accounts continues to do nothing but grow, but it is quite an interesting story from the outside. Speaking of the it's-so-over end of the spectrum, Scale AI is apparently laying off hundreds of employees as the company adjusts to life after the Meta acquihire. Now, much was made of the recent Windsurf saga and what it means for the employees left behind after a big acquihire deal. That story seems to have a reasonably good ending, with Devin creator Cognition Labs snatching up the remainder of the startup and providing
Starting point is 00:04:27 employees with a soft landing and a place to keep building the type of thing they were building. At least at first glance, on the other hand, Scale looks like a case study in what it looks like when a founder leaves a vacuum behind after the acquihire. It was already reported that Scale AI had lost Google and OpenAI as customers, which represented more than half of their business, and now apparently they've laid off 200 full-time employees, which represents around 14% of the workforce. The company will also stop working with 500 of its thousands of global contractors. A spokesperson said the move is aimed at, quote, streamlining our data business to help us move faster. They also said that Scale intends to staff up in other areas, including
Starting point is 00:05:01 enterprise and government sales. Interim CEO Jason Droege told staff that the layoffs were a result of bringing in too many people too quickly over the past year. His memo said that that led to too many layers, excessive bureaucracy, and unhelpful confusion about the team's mission. He did add that shifts in market demand had contributed to the decision to restructure as well. Ultimately, he said, we remain a well-resourced, well-funded company, and today's announcement will allow us to accelerate new investments and resources where necessary. This is nothing if not a fast-moving industry, I will tell you what.
Starting point is 00:05:32 And speaking of which, OpenAI just released a new ChatGPT agent, so I think it's time to go talk about it. That's going to do it for today's episode of the headlines. Next up, the main episode. Today's episode is brought to you by KPMG. In today's fiercely competitive market, unlocking AI's potential could help give you a competitive edge, foster growth and drive new value. But here's the key. You don't need an AI strategy. You need to embed
Starting point is 00:05:56 AI into your overall business strategy to truly power it up. KPMG can show you how to integrate AI and AI agents into your business strategy in a way that truly works and is built on trusted AI principles and platforms. Check out real stories from KPMG to hear how AI is driving success with its clients at www.kpmg.us/ai. Again, that's www.kpmg.us/ai. This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context. Blitzy is used alongside your favorite coding copilot as your batch software development
Starting point is 00:06:34 platform for the enterprise seeking dramatic development acceleration on large-scale codebases. While traditional copilots help with line-by-line completions, Blitzy works ahead of the IDE by first documenting your entire codebase, then deploying over 3,000 coordinated AI agents in parallel to batch build millions of lines of high-quality code. The scale difference is staggering. Copilots might give you a few hundred lines of code in seconds, but Blitzy can generate up to 3 million lines of thoroughly vetted code.
Starting point is 00:07:01 If your enterprise is looking to accelerate software development, contact us at blitzy.com to book a custom demo or press get started to begin using the product right away. Today's episode is brought to you by Superintelligent, specifically Agent Readiness Audits. Everyone is trying to figure out what agent use cases are going to be most impactful for their business, and the Agent Readiness Audit is the fastest and best way to do that. We use voice agents to interview your leadership and team and process all of that information to provide an agent readiness score, a set of insights around that score, and a set of highly actionable recommendations on both organizational gaps and high-value agent use cases that you should
Starting point is 00:07:40 pursue. Once you've figured out the right use cases, you can use our marketplace to find the right vendors and partners. And what it all adds up to is a faster, better agent strategy. Check it out at besuper.ai or email agents@besuper.ai to learn more. Welcome back to the AI Daily Brief. It is always a fun day when we have a very hot-off-the-presses new tool or model that has people chattering and thinking about new possibilities, and we get to go check it out and see how people are already getting value out of it. I'm talking, of course, today about ChatGPT agent. Now, it's been clear for a while that OpenAI was headed in an agentic direction.
Starting point is 00:08:18 We got Operator right at the beginning of the year, and just a couple weeks later, we got Deep Research. And while Operator hasn't necessarily become a frequent use for most people, Deep Research very much has. Now, in the time subsequent to Deep Research being launched, we got some of the first credible multipurpose agents that people have actually been using outside of just novelty. Maybe most notable was Manus, the Chinese agent that had a viral moment earlier this year and attracted a ton of attention.
Starting point is 00:08:45 Yesterday, OpenAI announced that they were going to be announcing something today, and it was pretty clear that it was this agent, and then three hours before the live stream, they posted ChatGPT emoji, deep research, handshake emoji, Operator. And that combination of things is very clearly at the center of this effort. So much so that Shawn, better known as swyx, memed a version of Steve Jobs' famous introduction of the iPhone, when he said today we're launching three revolutionary products, a new internet browsing device, a new phone, and a new iPod. Now the joke, of course, was that they were all one thing, and that was the introduction of the iPhone.
Starting point is 00:09:17 And swyx put this in the new ChatGPT context, writing, three things. A deep research model with enhanced search browser, a revolutionary computer use operator, and a sandbox terminal to execute math and code. A browser, a computer, a terminal. Are you getting it? These are not three separate agents. This is one agent, and we are calling it agent. Perfect tweet, 10 out of 10, no notes, good job, swyx. Now, honestly, this is just a very natural evolution of the capabilities of Operator on the one hand
Starting point is 00:09:44 and deep research on the other. Deep research is good, as they put it, at synthesizing information, and Operator can interact with websites. And it turns out that combining those with ChatGPT's general intelligence and conversational fluency adds up to a whole that is potentially greater than the sum of its parts. In their announcement post and in the announcement live stream, they talked about how each of those different tools, Operator on the one hand, deep research on the other, shone in different and kind of complementary situations. Operator could scroll, click, and type on the web, but it couldn't dive deep into analysis or write detailed reports. Deep research couldn't interact
Starting point is 00:10:17 with websites to refine results or access content requiring user authentication. In fact, they noticed that many queries that users were attempting were actually better for the other tool in some way. So the way the ChatGPT agent works is that it has a text-based browser that can read the text of websites, a visual browser that can actually interact with the web in the graphical user interface way that the web is designed, terminal access to code, direct API access to ChatGPT, and the agent can also leverage ChatGPT connectors, which give it access to data sources like Gmail or GitHub. Just like selecting deep research from the drop-down menu on ChatGPT when you prompt it, you select agent when you want to use this new agent tool. ChatGPT agent then spins up its own virtual
Starting point is 00:11:00 computer that becomes the home base for the whole initiative. Now, one of the really cool things about this is that ChatGPT has clearly learned that people really like having some sense of what's going on under the hood. So as Agent is working, it's giving you both a real-time look at the websites that it's browsing, as well as providing chain of thought and a narration of its activities. Part of what makes that valuable is that it's also interruptible. If you realize that there's something that you forgot to ask it as part of the initial prompt, you can, even as it's working, go in and say, hey, also do this other thing.
Starting point is 00:11:32 OpenAI made a big point of pointing out that that's a much more natural mode, given that that's the way that you'd interact with another human who was working on a set of tasks for you as well. In other words, you wouldn't just assume that you were going to give them one set of instructions at the beginning, and that would be that. You might anticipate that there would be clarifying questions, or when they ran up against a problem, they might come back to you. And that's more of the type of mode and interaction that you're going to get with ChatGPT agent. Now, this being a new product launch, there had to be some benchmarks, and although you know that I'm a little bit skeptical of over-reliance on benchmarks as a determination of whether a tool is useful or not, preferring instead to see how it does in the real world,
Starting point is 00:12:09 there is, I think, a core story here which is really valuable to help us understand the likely evolution of AI and agents. The TLDR is that ChatGPT agent did really well on a lot of benchmarks. The first one they share is, for example, Humanity's Last Exam, which is expert-level questions, kind of PhD-level questions across a bunch of different academic topics. o3 with no tools gets a 20.3% on Humanity's Last Exam. Deep research, which has access to both Python and browsing, gets a 26.6%. But ChatGPT agent, which has access to a browser, the virtual computer, and the terminal, got 41.6%. On FrontierMath, o3, which had access to Python,
Starting point is 00:12:50 got 10.3%. The optimized-for-math o4-mini got 19.3%. And ChatGPT agent, again with access to a browser, computer and terminal, got 27.4%. And this was repeated over and over and over, on data analysis, data modeling, on SpreadsheetBench. Speaking of Steve Jobs, there was this famous story or analogy he used to tell about an article in Scientific American in the early 1970s, which compared the efficiency of locomotion for various species. Basically, they were looking at how much energy it took different animals to get from point A to point B and then ranking them. Jobs would point out that ultimately the condor won as the most efficient and that humans were somewhere in the middle of the pack, about a third of the way down the list. However, when you instead tested
Starting point is 00:13:35 the efficiency of a person riding a bicycle, that human hybrid was twice as efficient as the condor. And the point that Jobs was making was that what made humans humans was their ability to create tools that amplify their inherent abilities. This is where the idea of computers as a bicycle for the mind came from. When I look at these benchmarks, where ChatGPT agent having access to a set of tools is outperforming these other models in their raw state, it's reflective of that same idea. I also think that this really matters for our understanding of the future of agents, given how important tool use is going to be in terms of shaping how performant agents are. But now let's talk about five different use cases that people have already started to explore for
Starting point is 00:14:20 ChatGPT agent. At core, to me, it seems like the use cases that are most interesting, at least on first glance, are basically deep research style prompts with an action attached, or deep research style prompts that require a bit wider and more diverse access to information than just stuff you might search out on the normal web. So use case number one, understanding product and customer feedback. One of the experiments that Dan Shipper over at Every ran was that he asked ChatGPT agent to identify and profile both their core customer as well as their biggest missing features for their Cora tool, which is all about email efficiency and automation.
Starting point is 00:14:58 Now, Dan writes that to complete the task, Agent scanned through more than 1,500 support emails and hundreds of support forum posts, and even looked people up on LinkedIn to put together a report on who really likes them, who doesn't like them so much, why they don't like them if they don't, and what it all says about their ideal customers. So again, you see here a task that has a combination of a more diverse information set and both outputs and inputs that involve going and browsing around non-public websites. At the end of this, I'll come back to another observation that Dan had, but next let's go to a second use case, which I'll call new business generation, but is really about an expanded ability to plan and theorize. Professor Ethan Mollick writes,
Starting point is 00:15:39 ChatGPT agent is, I think, a big step forward for getting AIs to do real work. Even at this stage, it does a good job autonomously doing research and assembling Excel files with formulas, PowerPoint, etc. It gives a sense of how agents are coming together. An example prompt he shared was come up with an innovative startup idea in the AI and education space, do market research on it, come up with the financials and build me a pitch deck. And so again, you can see here there is an element of what deep research might have done. You're basically taking o3's insights for coming up with the idea, deep research's ability to do market research, but then coming up with financials, building a pitch deck, that starts to get into both the coding and terminal access. Now, after asking a few clarifying
Starting point is 00:16:19 questions, agent went off and worked for 38 minutes. The idea it came up with was skill AI micro-learning and alongside the idea it also put together those financials, as well as a short deck. He also showed how after he had gotten all of that, he was able to, in much the same way he would with a colleague, ask it to modify one piece of its work, basically building out a tab in the financial model with detailed cost structures. It worked for an additional two minutes and added that in. Ethan reflected, it feels much more like working with an actual human intern capable of a wider range of analytical and computer tasks, and like an intern, you want to give it feedback and work back and forth. Not all the way there yet, but the paradigm is shifting from prompting to delegating.
Starting point is 00:17:01 Now, he also noted, for those of you who have used Manus, that while he thought that Manus might be capable of somewhat more complex tasks, ChatGPT agent is better at integrating research and doing a wider range of tasks. Next up, since it's come up a couple of times, let's talk about data visualization. One of the examples they gave during their presentation was basically asking Agent to go into the ChatGPT agent eval numbers from the Google Drive connector. Remember, it has access to connectors, which provide access to specific data sources, and make slides based on the performance numbers it found there.
Starting point is 00:17:35 You can see in the demo that it's using the terminal to code to create these data visualizations. And it even has a sense of what is or isn't high enough quality and goes back and refines itself. Data visualization all on its own would be a use case that would bring me to ChatGPT agent pretty frequently, as it's something that even some of the best raw models struggle with. Use case number four I'll call complex scenario planning. Rowan Cheung of The Rundown tested this by asking Agent to build him a complete early retirement plan. He said that to complete the task, Agent found local tax laws where he is based in Vancouver, analyzed average monthly spending rates, calculated savings needed to retire at 30,
Starting point is 00:18:12 researched optimal investment allocations, found a bunch of tax optimization strategies, including some he'd never heard of, built multiple FIRE, or financial independence, retire early, scenarios, and turned it all into a downloadable presentation. And again, the value here is integrating things that would have taken a bunch of independent steps otherwise. Yes, you could use deep research for some of this, but it might have had tougher access to certain parts of this information, and it certainly wouldn't have been able to put it all together in a presentation in this way. Now, one interesting thing that Rowan pointed out is that this is actually illustrating a new skill set. He calls it agent management, and not only do I agree with this as
Starting point is 00:18:50 an important new skill set, I basically think that it is the new skill set, and the faster that today's upskilling platforms can shift off things like pure-play prompt engineering as their focus and into things like agent management and personal agent orchestration, the better off we're all going to be. Rowan writes, agents are finally becoming capable of doing real work autonomously, so anyone who learns how to effectively orchestrate agents will have a huge advantage. Now, related to that complex scenario idea, the anchor example that they gave during their live stream was all about this multi-step planning around a wedding. The prompt was basically, I'm going to a wedding in this place, research what clothes I should wear, based on the dress code on
Starting point is 00:19:30 the website, figure out what I should actually buy to match that dress code, and also figure out my hotel options. Now, it put this all together in a dossier with a bunch of links that also included screenshots that helped provide a trace for where it got the information. And so, although this is a research dossier in some way, it's not something that would have been quite as easy for deep research alone to do, and deep research also wouldn't have the ability to actually then take next steps, like go out and actually buy those clothes, which Agent could have if it had been prompted to do so. Now, there are a bunch of other ideas floating around for use cases for this.
Starting point is 00:20:02 Financial modeling was actually something they tested in the benchmarks that ChatGPT agent performs super well on. Dan Shipper from Every had a bunch of other ideas like UX audits. And yet when it comes to whether this is going to be an everyday tool, it's not clear yet. Dan basically argued that for a lot of his tasks, o3 was still totally sufficient, and that he wasn't sure, after a couple of weeks with this thing, how often he would find himself using it. However, when he did need it, it was for a very specific reason
Starting point is 00:20:29 that just wasn't possible with the other tools. The point being that I think it's going to take some time for us to really start to uncover where the most frequent uses of these general agent tools actually are. I think at the beginning it'll be different for different people, and it's going to take not only experimentation, but a bunch of sitting around and seeing where you find yourself using these tools in ways that you might not have expected.
Starting point is 00:20:51 One last note, though, to point out just the profundity of the continued progress here, one of the lines that really stood out from the announcement blog was this one. On an internal benchmark designed to evaluate model performance on complex, economically valuable knowledge work tasks, ChatGPT agent's output is comparable to or better than that of humans in roughly half the cases across a range of task completion times. This is the first version of OpenAI's general agent. It's its third bite at the agent apple if you include Operator and deep research separately,
Starting point is 00:21:23 and within six months basically of the first being released. And already, at least by this internal benchmark, it's matching or outperforming humans in roughly half of knowledge work tasks. If you don't think things are changing fast, man, I do not know what to tell you. Then again, if you're listening to this show, you were probably not in that camp. In any case, that is going to do it for today's episode. I am excited to dig into ChatGPT agent. My guess is that like Dan at Every, I won't find a ton of use cases that I immediately jump into and switch my behaviors towards. But I have a couple in mind that I want to get trying. When I do and if I see how they perform, especially relative to deep research or o3
Starting point is 00:21:58 independently, I will of course let you know, but for now, that's going to do it for today's episode. Appreciate your listening or watching, and until next time, peace.
