The AI Daily Brief: Artificial Intelligence News and Analysis - What to Use Different AI Models For

Starting point is 00:00:00 Today on the AI Daily Brief, when to use different AI models and what to use them for. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. Thanks to today's sponsors, blitzy.com, Vertice Labs, and Superintelligent. And to get an ad-free version of the show, go to patreon.com slash AI Daily Brief. Welcome back to the AI Daily Brief. Today was a slightly slow news day. It doesn't happen very often in this space. And so I decided to use it for a show that I've been thinking about for a little while.

Starting point is 00:00:32 One of the big challenges for people with LLMs is which models to select. If you go into ChatGPT, for example, up in the top left-hand corner, there's a model selector where you can choose GPT-4o, GPT-4.5, o3, o4-mini, o4-mini-high, or even o1 pro mode. Now, these terms obviously mean nothing to most people. And as much as OpenAI gives a tiny bit of guidance here, for example, it says GPT-4o, great for most tasks, GPT-4.5, good for writing and exploring ideas, these aren't really sufficient to help people understand which use cases are suited to which models. It doesn't take much searching to find examples of people's confusion here.

Starting point is 00:01:11 For example, on X, Edwardo Borges writes, I follow AI improvements closely. I'm familiar with most models like Claude, Mistral, Llama, GPT, Gemini, Grok, etc. And I currently have no idea which models to use on OpenAI. It feels like they're pranking us. I tried asking ChatGPT for an answer and it got even more complicated. Now, I think most people default to something really basic, like Shil Monard here, who writes, I use 4o 90% of the time and o3 10% of the time. Do people use the other ones? And while these questions may seem small, the reality is,

Starting point is 00:01:40 use of these tools is becoming completely endemic in a professional setting. ChatGPT recently peaked at over 800 million weekly active users. In the recent KPMG Pulse survey, we saw the number of people who are using tools like ChatGPT daily jump from 22% to 58%. Point is, these tools are becoming a key part of our workstream, and yet we still don't know exactly which use cases belong with which models. And that's why I was very excited to see about a week ago, OpenAI published a post on their Help Center about exactly this. It was specifically aimed at Enterprise and was called when to use

Starting point is 00:02:12 each model. So what we're going to do today is go through how they frame it, look at some sample use cases for each of those models, both generally speaking as well as organized by specific categories of business users, from solopreneurs to SMEs to mid-market companies to enterprises, and then we're going to do a little bit of a summary and my sort of crib notes for how much you really have to care about all these different things. A couple quick caveats before we dig into this model-by-model assessment. First of all, I'm basing this off of OpenAI's recent post, and so I'm only focusing on the different OpenAI models.

Starting point is 00:02:47 This is not an argument that you shouldn't care about Claude or Gemini or Grok. And to the extent that people find it valuable, I'll happily do an episode about when I use those different models for different purposes. But for today I'm just going to focus on OpenAI's suite. The second thing to note is that I'm coming at this from an individual user perspective rather than from the perspective of developers who are building software for an enterprise. So the considerations are likely going to be different as developers think about how to wire together different systems to optimize for both cost and performance.

Starting point is 00:03:17 We also aren't going to get that much into what models to use for different types of agentic workflows. Again, just for the purposes of this show, we're really going to be focused on that individual employee kind of use case. So specifically how you might use these things as an individual. Now let's start with the daily workhorse, GPT-4o. And daily use is exactly what OpenAI describes this as being good for. They write that GPT-4o excels at everyday tasks: brainstorming, summarizing, emails, creative content. It's fully multimodal and supports almost all capabilities: GPTs, data analysis, search, image generation, canvas, advanced voice, and inputs such as documents, images, CSV files, audio, and video.

Starting point is 00:03:56 Basically, this is the default model. It's the one that you're going to use day in and day out for your most common tasks. The example prompts that OpenAI provides for GPT-4o include summarizing meeting notes into key action items, drafting a follow-up email after a project kickoff, proofreading a report, and brainstorming a launch plan in real time. Now, as we'll see, I actually do not agree with the last one, brainstorming a launch plan. At this point, I think that pretty much all brainstorming, anything having to do with strategy or planning, should be moved over to o3, but we'll get into that in a minute. On those other example prompts, summarizing meeting notes, drafting follow-up emails, proofreading reports, that is exactly what

Starting point is 00:04:36 GPT-4o's bread and butter is. Now, in addition to kind of generalist use cases, like the ones just mentioned, the other thing to note about 4o is its multimodal capabilities. So, for example, if you are out in the world and you want to take a photo of something as a potential input, GPT-4o is going to be your model. Likewise, bringing it back to an enterprise use case, let's say that you've maybe done some chicken scratch drawings of UI for a new website or an application you're designing, and you want to translate that into something more. Again, that's going to be for GPT-4o. So here the two operative words are generalist and multimodal. So thinking about this from the standpoint of different types of users, as an individual, you're going

Starting point is 00:05:18 to do things like feeding it transcripts of my podcast to get summaries that can be shared as part of show notes. And when it comes to employees at SMEs, mid-market companies, or enterprises, honestly, a lot of the use cases look very similar. It's going to be things like ingesting call recordings and slides and creating tailored follow-up decks for prospects, basic marketing use cases. It's going to be things like creating standard operating procedure documents for internal knowledge management. Ultimately, 4o is your workhorse model for a lot of day-in, day-out knowledge work. Then we get over to GPT-4.5. And I think what's complicated about the naming convention here is that in this

Starting point is 00:05:53 case, 4.5 doesn't just mean strictly better. It means better at certain things. And the specific things it's better for, in short, are creative writing tasks. OpenAI says that GPT-4.5 is ideal for creative tasks: emotional intelligence, clear communication, creativity, and a more collaborative, intuitive approach to brainstorming. The example prompts they give: create an engaging LinkedIn post about AI trends, write a product description for a new feature launch, develop a customer apology letter with an empathetic tone. So the way that I think about this is that effectively any time I need writing to be outward-facing and good, rather than just completely perfunctory, I'm turning to GPT-4.5. If I am ever doing things like coming up with a set of different

Starting point is 00:06:36 possible names for a blog post or an article, I'm using 4.5. If I'm actually having it try to write in a particular style, once again, 4.5. I think 4.5 should be the default for marketers who are using it for any sort of external-facing copy, be that email copy, but especially social media posts or any longer-form article. In fact, pretty much the only writing that I don't have GPT-4.5 do and let 4o handle is stuff that is just complete rote summarization where the quality of the words doesn't so much matter. All that matters is it capturing the key ideas. So, for example, taking a transcript of a podcast and doing a summary, 4o does a fine enough job with that. There's no reason not to use 4.5 necessarily. It's just kind of more creative horsepower than that particular

Starting point is 00:07:19 task needs. So, some ways that different types of companies might use this: for solopreneurs, or really anyone who's using it as an individual, if they're trying to do any sort of thought leadership writing, that's going to be a task for 4.5. For SMEs, 4.5 is going to be really good at tasks that require empathy. So, for example, generating empathetic HR templates. Think performance review feedback. New hiring notes.

Starting point is 00:07:42 Any of that sort of communication 4.5 is going to do well with. When you get into the mid-market and enterprise, especially for companies that have global CX teams, 4.5 could be really good at things like taking brand voice guidelines and drafting localization-ready customer service macros. Again, what 4.5 is good for is external-facing writing where the quality of the words actually matters. Today's episode is brought to you by Blitzy, the Enterprise Autonomous Software Development Platform with Infinite Code Context, which, if you don't know exactly what that means yet, do not worry, we're going to explain, and it's awesome. So Blitzy is used alongside your favorite coding co-pilot as your batch software development platform

Starting point is 00:08:20 for the enterprise, and it's meant for those who are seeking dramatic development acceleration on large-scale codebases. Traditional copilots help developers with line-by-line completions and snippets, but Blitzy works ahead of the IDE, first documenting your entire codebase, then deploying more than 3,000 coordinated AI agents working in parallel to batch build millions of lines of high-quality code for large-scale software projects. So then, whether it's codebase refactors, modernizations, or bulk development of your product roadmap, the whole idea of Blitzy is to provide enterprises dramatic velocity improvement. To put it in simpler terms, for every line of code

Starting point is 00:08:53 eventually provided to the human engineering team, Blitzy will have written it hundreds of times, validating the output with different agents to get the highest quality code to the enterprise in batch. Projects then that would normally require dozens of developers working for months can now be completed with a fraction of the team in weeks, empowering organizations to dramatically shorten development cycles and bring products to market faster than ever. If your enterprise is looking to accelerate software development, whether it's large-scale modernization, refactoring, or just increasing the rate of your SDLC, contact Blitzy at blitzy.com, that's B-L-I-T-Z-Y.com, to book a custom demo, or just press get started and start using the product right away.

Starting point is 00:09:30 Today's episode is brought to you by Vertice Labs, the AI Native Digital Consulting firm specializing in product development and AI agents for small to medium-sized businesses. Now, guys, this is a market that we have seen so much interest for, so much demand for, and many times great AI dev shops and builders out there just have so much business from the high end of the mid-market and big enterprises that this is a group of buyers that gets neglected. Now, for Vertice, AI Native means that they don't just build AI, they use it in every step of their process. They embed agents in their workflows so that they better know how to help you embed agents in your workflows. And indeed, what they specialize in is building

Starting point is 00:10:07 AI agents and agentic workflows that augment knowledge work, from customer support to internal ops, so that your team can focus on higher value work. Vertice wants to ensure that this is not just another co-pilot, but something that works end-to-end, translating business problems into working software in weeks, not quarters. They have found that their clients typically see a 60% reduction in time and cost, with significantly higher output than traditional technology partners. So if you are a founder, a CTO, a business leader, or you've just got a product idea to launch, check out verticelabs.io. That's V-E-R-T-I-C-E Labs.io. Today's episode is brought to you by Superintelligent.

Starting point is 00:10:48 Now, you have heard me talk about agent readiness audits probably numerous times at this point. This is our system that uses voice agents and a hybrid human AI analysis process to benchmark your agent readiness and map your agent opportunities and give you some really pointed, actionable next steps to move further down the path in your agentic journey. But we're coming up on the slow time of the year, and if you want to use this time to get out ahead of peers and competitors, we're excited to announce something we're calling Agent Summer. The idea here isn't that complicated. It's basically just an accelerated program to get you agentified and fast.

Starting point is 00:11:21 First of all, it's going to include an agent readiness audit, figuring out where your biggest agent opportunities are. Next, we're going to support both your internal change management process, helping you figure out AI policy, data readiness, things like that, as well as doing action planning around the agent opportunities that are most relevant for you. And finally, we're going to connect you to the right vendors to actually go and deliver this. Now, for this, we want to work with a very small handful of companies that really want to move. We're going to be bundling more than $50,000 of services for something that starts closer to $30,000.

Starting point is 00:11:49 And so if you want to use this summer to jump ahead on your company's agent journey, email agent@besuper.ai with summer in the subject line, claim one of these limited spots, and let's go have an agent summer. Now let's actually jump ahead to a couple of models that you're probably not using all that much, at least as an individual. First up, we've got o4-mini. And again, one of the challenges with OpenAI's naming conventions is that you hear o4 and you assume it must be better than o3, right?

Starting point is 00:12:13 Well, it's very clear that o4-mini and o4-mini-high were planned by OpenAI for a very specific set of tasks that really do slant technical. OpenAI says that o4-mini is good for fast technical tasks, quick STEM-related queries, programming, and visual reasoning. The example prompts they give are extracting key data points from a CSV file, providing a quick summary of a scientific article, or quickly fixing a Python traceback. o4-mini-high is the same domain except where you need more detail rather than speed. So they say this is for detailed technical tasks such as advanced coding, math, and scientific explorations. o4-mini-high is programmed to think longer for higher accuracy. So the example prompts they give are solving a complex math equation and explaining the

Starting point is 00:12:56 steps, drafting SQL queries for data extraction, or explaining a scientific concept in layman's terms. So how might this manifest inside of different types of companies? Well, a solopreneur who's managing all of their own processes, including their own tech, might use a model like o4-mini to help fix bugs as a sort of fix-my-site helper. For example, spotting issues with a WordPress CSS glitch or something like that. An SME might use o4-mini as a sort of IT help desk assistant. A mid-market company might have their data ops team use it to churn out ad hoc Python ETL scripts,

Starting point is 00:13:30 and an enterprise might use it to power a continuous code review bot to flag security issues across thousands of small daily pull requests. For companies, o4-mini is designed for a lot of usage. In OpenAI's enterprise plan, it has 300 requests per day, as opposed to, for example, o4-mini-high, which has 100 requests per day, or o3, which has 100 requests per week. I think it's fairly safe to say, though, that if your role isn't particularly technical, you're likely not going to be using o4-mini very much. So for all intents and purposes, you can kind of ignore it and its partner, o4-mini-high,

Starting point is 00:14:02 which is going to index even more specifically on technical complexity, being really something that, for example, data scientists are going to use. Likewise, let's just touch briefly on o1 pro mode. It is included in this list because it's available as part of their enterprise plan, and OpenAI says that it is for complex reasoning. So, for example, drafting a detailed risk analysis memo for an EU data privacy rollout, generating a multi-page research summary on emerging technologies, creating an algorithm for financial forecasting using theoretical models.

Starting point is 00:14:31 It's a very small number of requests that enterprises get per month for o1 pro. And for individuals, it's not even included in the main model selector. You have to go click more models, and it's framed as a legacy reasoning expert. Now, it's not impossible that there might be some use cases where o1 pro mode is useful for a particular type of output, such as a long-form, high-stakes document. So for an SME, things like drafting a long ISO 27001 compliance handbook, for the mid-market producing a deep patent landscape review for a potential acquisition, or for an enterprise, developing some super extensive impact assessment that has to cite specific rules and regulations from different jurisdictions. Maybe the most germane question is when to use o1 pro

Starting point is 00:15:16 as opposed to o3. They're both reasoning models. o3 is theoretically a more advanced reasoning model, so where would you want to use the legacy o1 pro mode? And the short answer is o1 pro mode is optimized for work that's really long, or where accuracy is really important. o1 is extremely slow. And the whole idea of it is that it sacrifices speed for a more exhaustive internal reasoning pass, meaning that it's tuned for accuracy and depth. So for things like regulatory filings, safety-critical engineering reviews, litigation briefs, risk assessments, anything where accuracy really matters, that's the time to consider o1 pro. Now, the other side is that o1 devotes extra compute to maintaining a coherent throughline over significant outputs. So while

Starting point is 00:16:00 o1 pro and o3 both have the same 200K token context window, o1 is designed to output a much bigger set of tokens in a single go. So if you're talking about something that needs tens of thousands of words of output, for example, o1 pro might be a consideration. These projects are going to be fewer and farther between, which is of course why OpenAI only gives even enterprises five queries per user per month, but that's sort of the idea. Now, for our purposes, the other model besides 4o and 4.5 that you're likely to use most often is o3, OpenAI's current state-of-the-art reasoning model, at least in full version.

Starting point is 00:16:37 OpenAI says that o3 is good for complex, multi-step tasks. Basically, this is a generalist reasoning model. So o3 is going to be good for things like strategic planning, detailed analyses, extensive coding, advanced math, etc. The example prompts they give are developing a risk analysis for market expansion, drafting a business strategy outline based on competitive data, running a multi-step analysis on a CSV file, forecasting the next quarter and plotting the trend, or reviewing pipeline metrics and searching for new top-of-funnel strategies. A solopreneur might use o3 then for something like building an investor-ready financial model. SMEs might use it to run a supply chain simulation weighing sourcing options, tariffs, and

Starting point is 00:17:17 currency risk for their next product line. Basically, if 4o is the generalist workhorse, o3 is the more advanced knowledge work workhorse. o3 is now actually the model that I spend the most time with and has completely revolutionized my interaction with ChatGPT. Before, I was a very frequent user. Obviously, there are lots of summarization things that even the base model like 4o just speeds up and makes better. And when it comes to writing, there are certain types of documents that I care little enough about that I'll go with the 4.5 version. But o3 is the first time that I've actually found ChatGPT to be capable of robust enough strategic thinking that I can use it as a real thought

Starting point is 00:17:55 partner. One more note on o3: if you are using the deep research tool, that is, by definition, o3. So the way that OpenAI has that set up is that deep research takes advantage of o3's reasoning and planning capabilities to be able to take your research assignment, strategize on how to do it, go out and find all the sources, and then ultimately consolidate them into whatever type of output you're looking for. I will also say that this is the one area where I have sometimes found o3's writing to be better than 4.5. Specifically, when I have compared the process of giving 4.5 the text from a handful of articles and asking it to write a short summarization essay from it, versus asking deep research to go create a research dossier about the same topic,

Starting point is 00:18:42 referencing a couple of the same articles, but also letting it go find whatever it's going to find, and then using that dossier to summarize and write a short article, I have in many cases found that the o3 output is better than the 4.5. Perhaps just because it has better information to draw from because of the deep research process. But that's one thing to consider if you are using it for that sort of output. In any case, given how valuable deep research is, I wanted to make sure it was clear that that is, in fact, o3. And so let's try to wrap up by honing in on the three models that I think you're going to use

Starting point is 00:19:13 most and what I think you're going to use them for. This is your sort of cheat sheet if you just want to do this fast. If you have a boring use case, something like meeting summarization, that's almost certainly going to be inside 4o. OpenAI calls it everyday tasks, but you know it when you see it. This is stuff that's low stakes but takes time and you just want off your plate so you can be focused on other parts of your work. Anything that falls into that category is likely going to be a fit for 4o. 4.5, on the other hand, is just like OpenAI says: it's for creative tasks, specifically, I think, when the output is writing. So if you are doing any sort of thought leadership

Starting point is 00:19:52 supported by AI, you're absolutely most definitely going to want to use GPT-4.5, not GPT-4o. It just does a much better job. And there are a lot of other types of business use cases where the quality of the words that are output also matters. Again, meeting summarization, who cares? You're just trying to make sure that the main ideas are there. But when you are, for example, writing HR documents, even if you're not trying to be creative in a traditional way, the quality of the output really matters. It's going to interact with human emotions, and that's what 4.5 is going to be good for. And then when it comes to anything that involves strategic thinking, brainstorming, planning, in general, your reasoning models are just going to, by nature, do better with that.

Starting point is 00:20:31 And I would go even a step farther and say that o3 is really the first model that I've seen that is extremely competent at these sorts of use cases. There's a growing conventional wisdom that people who view AI as a collaborator rather than just a tool are finding themselves using it more effectively. I think that that's true, and I think that o3 actually makes that viable. Now, the one other thing that I want to say about o3 is that in addition to just being a better thought partner, because it's actually thinking and reasoning in a different way than 4o and 4.5, it also has more coherent structured outputs. If you've used o3, you've probably noticed that it uses a lot more charts and other sorts of visual

Starting point is 00:21:12 hierarchies that simply communicate ideas more quickly. One interesting question that has come up recently is if I have a big report where I want the structured outputs of o3, but the creative and writing quality of 4.5, which do I use? For the specific use case that I was just talking with someone about on that front, the answer that ended up making sense for them was to use o3 for the overall report, because ultimately what mattered most was the structure of the information that was coming out of the report, not just the poetry of it. They then rewrote an intro section using 4.5 to capture a little bit of that creative juice as well. This is, of course, a bit of a moving target. It's going to continue to change as these models evolve. And of course, like I said,

Starting point is 00:21:51 I haven't even gotten into Claude and Grok and Gemini and all the other options. But by and large, this is how I'm using OpenAI's models and how I'm seeing them work for other companies as well. That, however, is going to do it for today's AI Daily Brief. Appreciate you listening or watching as always, and until next time, peace.
