The AI Daily Brief: Artificial Intelligence News and Analysis - What to Use Different AI Models For
Episode Date: May 14, 2025
If you’ve ever stared at OpenAI’s model selector and thought, “Which one am I supposed to use?”, this episode breaks it all down. We go through when to use GPT-4o, GPT-4.5, o3, o4-mini, o4-mini-high, and o1 pro, all based on real-world business scenarios.
Get Ad Free AI Daily Brief: https://patreon.com/AIDailyBrief
Brought to you by:
KPMG – Go to https://kpmg.com/ai to learn more about how KPMG can help you drive value with our AI solutions.
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months.
Vertice Labs - Check out http://verticelabs.io/ - the AI-native digital consulting firm specializing in product development and AI agents for small to medium-sized businesses.
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Subscribe to the newsletter: https://aidailybrief.beehiiv.com/
Join our Discord: https://bit.ly/aibreakdown
Interested in sponsoring the show? nlw@breakdown.network
Transcript
Today on the AI Daily Brief, when to use different AI models and what to use them for.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
Thanks to today's sponsors, Blitzy.com, Vertice Labs, and Superintelligent.
And to get an ad-free version of the show, go to patreon.com/AIDailyBrief.
Welcome back to the AI Daily Brief.
Today was a slightly slow news day.
It doesn't happen very often in this space.
And so I decided to use it for a show that I've been thinking about for a little while.
One of the big challenges for people with LLMs is which models to select.
If you go into ChatGPT, for example, up in the top left-hand corner, there's a model selector
where you can choose GPT-4o, GPT-4.5, o3, o4-mini, o4-mini-high, or even o1 pro mode.
Now, these terms obviously mean nothing to most people.
And as much as OpenAI gives a tiny bit of guidance here (for example, it says GPT-4o is great
for most tasks and GPT-4.5 is good for writing and exploring ideas),
these aren't really sufficient to help people understand what different use cases are suited to
different types of models. It doesn't take much searching to find examples of people's confusion here.
For example, on X, Edwardo Borges writes, "I follow AI improvements closely. I'm familiar with most
models like Claude, Mistral, Llama, GPT, Gemini, Grok, etc. And I currently have no idea which
models to use on OpenAI. It feels like they're pranking us. I tried asking ChatGPT for an answer
and it got even more complicated." Now, I think most people default to something
really basic, like Shil Monard here, who writes,
"I use 4o 90% of the time and o3 10% of the time.
Do people use the other ones?"
And while these questions may seem small, the reality is,
use of these tools is becoming completely endemic in a professional setting.
ChatGPT recently peaked at over 800 million weekly active users.
In the recent KPMG Pulse survey, we saw the number of people who are using tools like
ChatGPT daily jump from 22 to 58%.
Point is, these tools are becoming a key part of our workstream,
and yet we still don't know exactly which use cases belong with what models.
And that's why I was very excited to see about a week ago, OpenAI published a post on their
Help Center about exactly this. It was specifically aimed at enterprise and was called "When to use
each model." So what we're going to do today is go through how they frame it, look at some sample
use cases for each of those models, both generally speaking as well as organized by specific
categories of business users, from solopreneurs to SMEs to mid-market companies to enterprises,
and then we're going to do a little bit of a summary and my sort of crib notes for how much
you really have to care about all these different things.
A couple quick caveats before we dig into this model by model assessment.
First of all, I'm basing this off of OpenAI's recent post, and so I'm only focusing on the
different OpenAI models.
This is not an argument that you shouldn't care about Claude or Gemini or Grok.
And to the extent that people find it valuable, I'll happily do an episode about when I
use those different models for different purposes.
But for today, I'm just going to focus on OpenAI's suite.
The second thing to note is that I'm coming at this from an individual user perspective
rather than from the perspective of developers who are building software for an enterprise.
So the considerations are likely going to be different as developers think about how to wire
together different systems to optimize for both cost and performance.
We also aren't going to get that much into what models to use for different types of
agentic workflows.
Again, just for the purposes of this show, we're really going to be focused on that individual
employee kind of use case. So specifically how you might use these things as an individual.
Now let's start with the daily workhorse, GPT-4o. And daily use is exactly what OpenAI describes
this as being good for. They write that GPT-4o excels at everyday tasks: brainstorming, summarizing,
emails, creative content. It's fully multimodal and supports almost all capabilities (GPTs, data analysis,
search, image generation, canvas, advanced voice) and inputs (documents, images, CSV files, audio, and video).
Basically, this is the default model.
It's the one that you're going to use day in and day out for your most common tasks.
The example prompts that OpenAI provides for GPT-4o include summarizing meeting notes into key action items,
drafting a follow-up email after a project kickoff, proofreading a report, and brainstorming a launch plan in real time.
Now, as we'll see, I actually do not agree with the last one, brainstorming a launch plan.
At this point, I think that pretty much all brainstorming, anything having to do with strategy or planning,
should be moved over to o3, but we'll get into that in a minute. On those other example prompts,
summarizing meeting notes, drafting follow-up emails, proofreading reports, that is exactly what
GPT-4o's bread and butter is. Now, in addition to kind of generalist use cases, like the ones just
mentioned, the other thing to note about 4o is its multimodal capabilities. So, for example,
if you are out in the world and you want to take a photo of something as a potential input,
GPT-4o is going to be your model. Likewise, bringing it back to an enterprise use case,
let's say that you've maybe done some chicken scratch drawings of UI for a new website or an
application you're designing, and you want to translate that into something more. Again, that's
going to be for GPT-4o. So here the two operative words are generalist and multimodal.
So thinking about this from the standpoint of different types of users, as an individual, I do
things like feeding it transcripts of my podcast to get summaries that can be shared as part of
show notes. And when it comes to
employees at SMEs, mid-market companies, or enterprises, honestly, a lot of the use cases look very similar.
It's going to be things like ingesting call recordings and slides and creating tailored
follow-up decks for prospects, basic marketing use cases. It's going to be things like
creating standard operating procedure documents for internal knowledge management. Ultimately,
4o is your workhorse model for a lot of day-in, day-out knowledge work. Then we get over to
GPT-4.5. And I think what's complicated about the naming convention here is that in this
case, 4.5 doesn't just mean strictly better. It means better at certain things. And the specific
things it's better for, in short, are creative writing tasks. OpenAI says that GPT-4.5 is ideal for
creative tasks: emotional intelligence, clear communication, creativity, and a more collaborative, intuitive
approach to brainstorming. The example prompts they give: create an engaging LinkedIn post about
AI trends, write a product description for a new feature launch, develop a customer apology letter
with an empathetic tone. So the way that I think about this
is that effectively any time I need writing to be outward-facing and good, rather than just completely
perfunctory, I'm turning to GPT-4.5. If I am ever doing things like coming up with a set of different
possible names for a blog post or an article, I'm using 4.5. If I'm actually having it try to write
in a particular style, once again, 4.5. I think 4.5 should be the default for marketers who
are using it for any sort of external-facing copy, be that email copy, but especially social media
posts or any longer-form article. In fact, pretty much the only writing that I don't have
GPT-4.5 do and let 4o handle is stuff that is just complete rote summarization where the quality
of the words doesn't so much matter. All that matters is that it captures the key ideas. So, for example,
taking a transcript of a podcast and doing a summary, 4o does a fine enough job with that. There's no reason
not to use 4.5 necessarily. It's just kind of more creative horsepower than that particular
task needs. So here are some ways that different types of companies might use this.
For solopreneurs, or really anyone who's using it as an individual,
if they're trying to do any sort of thought leadership writing, that's going to be a task for
4.5.
For SMEs, 4.5 is going to be really good at tasks that require empathy.
So, for example, generating empathetic HR templates.
Think performance review feedback.
New hiring notes.
Any of that sort of communication 4.5 is going to do well with.
When you get into the mid-market and enterprise, especially for companies that have global
CX teams, 4.5 could be really good at things like taking brand voice guidelines
and drafting localization-ready customer service macros. Again, what 4.5 is good for is external-facing
writing where the quality of the words actually matters.
Today's episode is brought to you by
Blitzy, the Enterprise Autonomous Software Development Platform with Infinite Code Context, which,
if you don't know exactly what that means yet, do not worry, we're going to explain, and it's awesome.
So Blitzy is used alongside your favorite coding copilot as your batch software development platform
for the enterprise, and it's meant for those who are seeking dramatic
development acceleration on large-scale codebases.
Traditional copilots help developers with line-by-line completions and snippets, but Blitzy works
ahead of the IDE, first documenting your entire codebase, then deploying more than 3,000
coordinated AI agents working in parallel to batch build millions of lines of high-quality code
for large-scale software projects. So then, whether it's codebase refactors, modernizations,
or bulk development of your product roadmap, the whole idea of Blitzy is to provide
enterprises dramatic velocity improvement. To put it in simpler terms, for every line of code
eventually provided to the human engineering team, Blitzy will have written it hundreds of times,
validating the output with different agents to get the highest quality code to the enterprise in batch.
Projects then that would normally require dozens of developers working for months can now be
completed with a fraction of the team in weeks, empowering organizations to dramatically shorten
development cycles and bring products to market faster than ever. If your enterprise is looking to
accelerate software development, whether it's large-scale modernization, refactoring, or just increasing
the rate of your SDLC, contact Blitzy at blitzy.com, that's B-L-I-T-Z-Y.com, to book a custom demo,
or just press get started and start using the product right away.
Today's episode is brought to you by Vertice Labs, the AI Native Digital Consulting
firm specializing in product development and AI agents for small to medium-sized businesses.
Now, guys, this is a market that we have seen so much interest for, so much demand for,
and many times great AI dev shops and builders out there just have so
much business from the high end of the mid-market and big enterprises that this is a group of buyers
that gets neglected. Now, for Vertice, AI Native means that they don't just build AI, they use
it in every step of their process. They embed agents in their workflows so that they better
know how to help you embed agents in your workflows. And indeed, what they specialize in is building
AI agents and agentic workflows that augment knowledge work, from customer support to internal
ops, so that your team can focus on higher value work. Vertice wants to ensure that this is not just
another co-pilot, but something that works end-to-end, translating business problems into
working software in weeks, not quarters. They have found that their clients typically see a 60%
reduction in time and cost, with significantly higher output than traditional technology partners.
So if you are a founder, a CTO, a business leader, or you've just got a product idea to launch,
check out verticelabs.io. That's V-E-R-T-I-C-E-Labs.io.
Today's episode is brought to you by Superintelligent.
Now, you have heard me talk about agent readiness audits probably numerous times at this point.
This is our system that uses voice agents and a hybrid human-AI analysis process to benchmark
your agent readiness and map your agent opportunities and give you some really pointed,
actionable next steps to move further down the path in your agentic journey.
But we're coming up on the slow time of the year, and if you want to use this time to get
out ahead of peers and competitors, we're excited to announce something we're calling Agent Summer.
The idea here isn't that complicated.
It's basically just an accelerated program to get you agentified and fast.
First of all, it's going to include an agent readiness audit, figuring out where your biggest
agent opportunities are.
Next, we're going to support both your internal change management process, helping you figure
out AI policy, data readiness, things like that, as well as doing action planning around
the agent opportunities that are most relevant for you.
And finally, we're going to connect you to the right vendors to actually go and deliver this.
Now, for this, we want to work with a very small handful of companies that really want to move.
We're going to be bundling more than $50,000 of services for something that starts closer to $30,000.
And so if you want to use this summer to jump ahead on your company's agent journey,
email agent@besuper.ai with summer in the subject line, claim one of these limited spots,
and let's go have an agent summer.
Now let's actually jump ahead to a couple of models that you're probably not using all that much,
at least as an individual.
First up, we've got o4-mini.
And again, one of the challenges with OpenAI's naming conventions is that you hear
o4 and you assume it must be better than o3, right?
Well, it's very clear that o4-mini and o4-mini-high were planned by OpenAI for a very specific set of tasks that really do slant technical.
OpenAI says that o4-mini is good for fast technical tasks, quick STEM-related queries, programming, and visual reasoning.
The example prompts they give are extracting key data points from a CSV file, providing a quick summary of a scientific article, or quickly fixing a Python traceback.
o4-mini-high is the same domain except where you need more detail rather than speed.
So they say this is for detailed technical tasks such as advanced coding, math, and scientific
explorations.
o4-mini-high is programmed to think longer for higher accuracy.
So the example prompts they give are solving a complex math equation and explaining the
steps, drafting SQL queries for data extraction, or explaining a scientific concept in layman's
terms.
So how might this manifest inside of different types of companies?
Well, a solopreneur who's managing all of their own processes, including their own tech,
might use a model like o4-mini to help fix bugs as a sort of "fix my site" helper.
For example, spotting issues with a WordPress CSS glitch or something like that.
An SME might use o4-mini as a sort of IT help desk assistant.
A mid-market company might have their data ops team use it to churn out ad hoc Python ETL scripts,
and an enterprise might use it to power a continuous code review bot to flag security issues
across thousands of small daily pull requests.
For companies, o4-mini is designed for a lot of usage.
In OpenAI's enterprise plan, it has 300 requests per day, as opposed to, for example, o4-mini-high,
which has 100 requests per day, or o3, which has 100 requests per week.
I think it's fairly safe to say, though, that if your role isn't particularly technical,
you're likely not going to be using o4-mini very much.
So for all intents and purposes, you can kind of ignore it and its partner, o4-mini-high,
which is going to index even more specifically on technical complexity,
being really something that, for example, data scientists are going to use.
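To make those caps easier to compare, here is a minimal Python sketch that puts them on the same weekly footing. The numbers are the ones quoted in this episode for the enterprise plan and may have changed since; the names `caps` and `weekly_cap` are purely illustrative and are not part of any OpenAI API.

```python
# Request caps as quoted in the episode for OpenAI's enterprise plan.
# Assumption: these figures are illustrative and may be out of date.
DAYS_PER_WEEK = 7

caps = {
    "o4-mini": (300, "day"),
    "o4-mini-high": (100, "day"),
    "o3": (100, "week"),
}


def weekly_cap(limit: int, period: str) -> int:
    """Convert a per-day or per-week cap into requests per week."""
    if period == "day":
        return limit * DAYS_PER_WEEK
    if period == "week":
        return limit
    raise ValueError(f"unknown period: {period}")


weekly = {model: weekly_cap(limit, period) for model, (limit, period) in caps.items()}

for model, total in weekly.items():
    print(f"{model}: up to {total} requests per week")
```

On those quoted figures, o4-mini allows roughly 21 times as many weekly requests as o3, which fits the idea that it is meant for high-volume technical work.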
Likewise, let's just touch briefly on o1 pro mode.
It is included in this list because it's available as part of their enterprise plan,
and OpenAI says that it is for complex reasoning.
So, for example, drafting a detailed risk analysis memo for an EU data privacy rollout,
generating a multi-page research summary on emerging technologies,
creating an algorithm for financial forecasting using theoretical models.
It's a very small number of requests that enterprises get per month for o1
pro. And for individuals, it's not even included in the main model selector. You have to go click
more models, and it's framed as a legacy reasoning expert. Now, it's not impossible that there
might be some use cases where o1 pro mode is useful for a particular type of output, such as a
long-form, high-stakes document. So for an SME, things like drafting a long ISO 27001 compliance handbook,
for the mid-market, producing a deep patent landscape review for a potential acquisition, or for an
enterprise, developing some super extensive impact assessment that has to cite specific rules and
regulations from different jurisdictions. Maybe the most germane question is when to use o1 pro
as opposed to o3. They're both reasoning models. o3 is theoretically a more advanced reasoning model,
so where would you want to use the legacy o1 pro mode? And the short answer is o1 pro mode is
optimized for work that's really long, or where accuracy is really important. o1 is extremely
slow. And the whole idea of it is that it sacrifices speed for a more exhaustive internal reasoning
pass, meaning that it's tuned for accuracy and depth. So for things like regulatory filings,
safety-critical engineering reviews, litigation briefs, risk assessments, anything where accuracy
really matters, that's the time to consider o1 pro. Now, the other side is that o1
devotes extra compute to maintaining a coherent throughline over significant outputs. So while
o1 pro and o3 both have the same 200K token context window, o1 is designed to output a much bigger set
of tokens in a single go.
So if you're talking about something that needs tens of thousands of words of output, for
example, o1 pro might be a consideration.
These projects are going to be fewer and farther between, which is of course why OpenAI
only gives even enterprises five queries per user per month, but that's sort of the idea.
Now, for our purposes, the other model besides 4o and 4.5 that you're likely
to use most often is o3, OpenAI's current state-of-the-art, at least in full version, reasoning model.
OpenAI says that o3 is good for complex, multi-step tasks. Basically, this is a generalist reasoning
model. So o3 is going to be good for things like strategic planning, detailed analyses,
extensive coding, advanced math, etc. The example prompts they give are developing a risk
analysis for market expansion, drafting a business strategy outline based on competitive data,
running a multi-step analysis on a CSV file, forecasting the next quarter and plotting the
trend, or reviewing pipeline metrics and searching for new top-of-funnel strategies.
A solopreneur might use o3 then for something like building an investor-ready financial model.
SMEs might use it to run a supply chain simulation weighing sourcing options, tariffs, and
currency risk for their next product line.
Basically, if 4o is the generalist workhorse, o3 is the more advanced knowledge work
workhorse. o3 is now actually the model that I spend the most time with and has completely
revolutionized my interaction with ChatGPT. Before, I was a very frequent user. Obviously, there are
lots of summarization things that even the base model like 4o just speeds up and makes better.
And when it comes to writing, there are certain types of documents that I care little enough
about that I'll go with the 4.5 version. But o3 is the first time that I've actually found
ChatGPT to be capable of robust enough strategic thinking that I can use it as a real thought
partner. One more note on o3: if you are using the deep research tool, that is, by definition,
o3. So the way that OpenAI has that set up is that deep research takes advantage of o3's reasoning
and planning capabilities to be able to take your research assignment, strategize on how to do it,
go out and find all the sources, and then ultimately consolidate them into whatever type of output
you're looking for. I will also say that this is the one area where I have sometimes found
o3's writing to be better than 4.5. Specifically, when I have compared the process of giving
4.5 the text from a handful of articles and asking it to write a short summarization essay from it,
versus asking deep research to go create a research dossier about the same topic,
referencing a couple of the same articles, but also letting it go find whatever it's going to
find, and then using that dossier to summarize and write a short article,
I have in many cases found that the o3 output is better than the 4.5.
Perhaps just because it has better information to draw from because of the deep research process.
But that's one thing to consider if you are using it for that sort of output.
In any case, given how valuable deep research is, I wanted to make sure it was clear that that is,
in fact, o3.
And so let's try to wrap up by homing in on the three models that I think you're going to use
most and what I think you're going to use them for.
This is your sort of cheat sheet if you just want to do this fast.
If you have a boring use case, something like meeting summarization, that's almost certainly
going to be a fit for 4o. OpenAI calls it everyday tasks, but you know it when you see it.
This is stuff that's low stakes but takes time and you just want off your plate so you can be
focused on other parts of your work. Anything that falls into that category is likely going to be a
fit for 4o. 4.5, on the other hand, just like OpenAI says, is for creative tasks,
specifically, I think, when the output is writing. So if you are doing any sort of thought leadership
supported by AI, you're absolutely, most definitely going to want to use GPT-4.5, not GPT-4o. It just does a
much better job. And there are a lot of other types of business use cases where the quality of the
words that are output also matters. Again, meeting summarization, who cares? You're just
trying to make sure that the main ideas are there. But when you are, for example, writing HR documents,
even if you're not trying to be creative in a traditional way, the quality of the output really
matters. It's going to interact with human emotions, and that's what 4.5 is going to be good for.
And then when it comes to anything that involves strategic thinking, brainstorming, planning,
in general, your reasoning models are just going to, by nature, do better with that.
And I would go even a step farther and say that o3 is really the first model that I've seen
that is extremely competent at these sorts of use cases.
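If it helps to see the cheat sheet as a lookup, here is a rough Python sketch of the rules of thumb from this episode. It only encodes the host's suggestions as a plain dictionary; it does not call any OpenAI API, and the `Task` categories, `CHEAT_SHEET`, and `pick_model` are hypothetical names made up for illustration.

```python
from enum import Enum


class Task(Enum):
    """Hypothetical task categories paraphrasing the episode's cheat sheet."""
    ROUTINE = "routine"                        # meeting summaries, follow-up emails, proofreading
    OUTWARD_WRITING = "outward_writing"        # thought leadership, marketing copy, HR notes
    STRATEGY = "strategy"                      # brainstorming, planning, multi-step analysis
    QUICK_TECHNICAL = "quick_technical"        # fast fixes, quick STEM queries
    DETAILED_TECHNICAL = "detailed_technical"  # advanced coding, math, SQL
    LONG_HIGH_STAKES = "long_high_stakes"      # very long documents where accuracy is critical


# Rules of thumb as described in the episode, not an official OpenAI mapping.
CHEAT_SHEET = {
    Task.ROUTINE: "GPT-4o",
    Task.OUTWARD_WRITING: "GPT-4.5",
    Task.STRATEGY: "o3",
    Task.QUICK_TECHNICAL: "o4-mini",
    Task.DETAILED_TECHNICAL: "o4-mini-high",
    Task.LONG_HIGH_STAKES: "o1 pro mode",
}


def pick_model(task: Task) -> str:
    """Return the model the episode suggests for a given kind of task."""
    return CHEAT_SHEET[task]


print(pick_model(Task.ROUTINE))
print(pick_model(Task.OUTWARD_WRITING))
print(pick_model(Task.STRATEGY))
```

For most non-technical roles, only the first three entries matter, which matches the point that 4o, 4.5, and o3 are the models you will reach for most.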
There's a growing conventional wisdom that people who view AI as a collaborator rather than just a tool
are finding themselves using it more effectively. I think that that's true, and I think that o3
actually makes that viable. Now, the one other thing that I want to say about o3 is that in
addition to just being a better thought partner, because it's actually thinking and reasoning
in a different way than 4o and 4.5, it also has more coherent structured outputs. If you've
used o3, you've probably noticed that it uses a lot more charts and other sorts of visual
hierarchies that simply communicate ideas more quickly. One interesting question that has come up
recently is if I have a big report where I want the structured outputs of o3, but the
creative and writing quality of 4.5, which do I use? For the specific use case that I was just
talking with someone about on that front, the answer that ended up making sense for them was to
use o3 for the overall report, because ultimately what mattered most was the structure of the information
that was coming out of the report, not just the poetry of it. They rewrote an intro section using
4.5 to capture a little bit of that creative juice as well. This is, of course, a bit of a moving
target. It's going to continue to change as these models evolve. And of course, like I said,
I haven't even gotten into Claude and Grok and Gemini and all the other options. But by and large,
this is how I'm using OpenAI's models and how I'm seeing them work for other companies as well.
That, however, is going to do it for today's AI Daily Brief.
Appreciate you listening or watching as always, and until next time, peace.
