The AI Daily Brief: Artificial Intelligence News and Analysis - 6 Things GPT-5.1 Does Better
Episode Date: November 14, 2025

NLW breaks down the surprise release of GPT-5.1 and why it feels like a more meaningful upgrade than expected. From sharper strategic thinking to better instruction following, improved writing, and a more capable “thinking” mode, today’s episode explores six areas where the new model clearly outperforms GPT-5. NLW also looks at how the community is reacting, why vibes now matter more than benchmarks, and what this shift means for everyday AI use.

Brought to you by:
KPMG – Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Rovo - Unleash the potential of your team with AI-powered Search, Chat and Agents - https://rovo.com/
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.

The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? sponsors@aidailybrief.ai
Transcript
Today on the AI Daily Brief, OpenAI surprises us with GPT-5.1, and it's actually a surprisingly
big improvement. On today's episode, we're going to talk about six things that I think
5.1 does better than its predecessors. The AI Daily Brief is a daily podcast and video
about the most important news and discussions in AI. All right, friends, quick announcements before
we dive in. First of all, thank you to today's sponsors, Blitzy, Rovo, Superintelligent, and Robots
and Pencils. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief, or you
can subscribe on Apple Podcasts.
And if you are interested in sponsoring the show, shoot us a note at sponsors@aidailybrief.ai.
I can definitely feel an uptick in people who are planning out their Q1, so now is a good
time to get in those requests.
Lastly, you guys are absolutely crushing it out on the AI ROI benchmarking study.
We are careening into the thousands of use cases all with shared ROI.
And I cannot wait to begin processing this information and sharing it out.
If you want to get the comprehensive readout, go to R.I.
Tell us about the use cases that are driving the most value for you,
and in a couple of weeks, you will have that comprehensive research.
But with that, let's dive into today's surprise model drop.
Welcome back to the AI Daily Brief.
Well, if you, like me, have felt like you're drowning a little in the endless existential debates
around AI bubbles and job replacement and all of this big picture macro AI stuff,
you might be just as thrilled as me that today we get to talk about a new model launch.
Yesterday, OpenAI surprised us with GPT-5.1, just a few months after giving us GPT-5.
They're calling it a smarter, more conversational ChatGPT, and based on first impressions,
it almost feels like this is what people expected from GPT-5 in the first place.
So let's talk first about what they say changed, and then we'll get into both the community's
first impressions as well as my first impressions, and six things I think the new model is better at
than previous GPTs.
Sam Altman's post about it was a bit understated.
He said, GPT-5.1 is out. It's a nice upgrade.
I particularly like the improvements in instruction following and the adaptive thinking.
The intelligence and style improvements are good too.
So the two models they released are called GPT-5.1 Instant and GPT-5.1 Thinking.
Instant, which is the model that will most often be called up when you select Auto, is, they say,
now warmer, more intelligent, and better at following your instructions.
The new Thinking model, they say, is not only easier to understand,
but also faster on simpler tasks and more persistent on complex ones.
And if you just read the materials and the first impressions, you might think that this was all
about personality.
While that certainly is a big part of the story, in my experience so far, there's a lot more
here than just a different mode of interacting.
Still, it's very clear that the 4o rebellion was present in OpenAI's minds as they were
working on this new model.
In that announcement post they write, we heard clearly from users that great AI should not only
be smart but enjoyable to talk to.
GPT-5.1 improves meaningfully on both intelligence and communication style.
Now, from there, they give a bunch of examples of how things have changed.
They show side-by-sides of GPT-5 and GPT-5.1 Instant.
For the prompt "I'm feeling stressed and could use some relaxation tips,"
you can see that 5.1 Instant attempts to be more personal.
Whereas GPT-5 goes straight to "Here are a few simple, effective ways to help ease stress,"
5.1 Instant says, "I've got you, Ron.
That's totally normal, especially with everything you've got going on lately."
They highlight improved instruction following, using the prompt "Always respond with six words"
to show that 5.1 Instant actually does respond with six words.
And in a cool small detail that I can see being incredibly important when it comes to the actual
lived experience of this model, they write, 5.1 Instant can use adaptive reasoning to decide when
to think before responding to more challenging questions, resulting in more thorough and
accurate answers while still responding quickly.
Basically, it can shift itself into thinking mode without technically having to leave
Instant. In general, it sounds like a big part of the push was to make these models smarter about
when to think hard and when not to. In their description of 5.1 Thinking, they write, we're upgrading
GPT-5 Thinking to make it more efficient and easier to understand in everyday use. It now adapts its thinking
time more precisely to the question, spending more time on complex problems while responding
more quickly to simpler ones. The chart showed, for example, that on the easiest problems,
5.1 spent about 57% less time than 5, but on the hardest problems it spent about 71% more
time. They also note that even though it's the thinking model, 5.1 Thinking's default tone is
also warmer and more empathetic. So how did people respond? Well, the first reaction was surprise that the
model came out at all. Chubby (@kimmonismus) writes, either the release was rushed or OpenAI intends to
release more frequently. The website doesn't show any evaluations of the well-known benchmarks compared
to GPT5. The only thing is the reasoning update along with its graph. This could indicate that
they wanted to release the model quickly to beat Google to the punch, or that they plan to release
future iterations and regular updates occasionally without much fanfare. Now, two things about this.
One, I think there is a very strong sense that we have to be in the very late innings when it
comes to Gemini 3. There is a constant cat and mouse back and forth between OpenAI and Google
when it comes to their model releases. And it just seems very likely that OpenAI wanted to get
what is for them an incremental upgrade on the books before Gemini 3 started to dominate the conversation.
Secondly, though, we are definitely in the vibes-over-benchmarks era when it comes to AI models.
Most of the benchmarks are completely saturated at this point, and frankly, I'm kind of more interested
in a company explaining what they were trying to achieve with the model, and then going and figuring
out how well it does for me, than just pointing out some very tiny incremental upgrade on a
benchmark where everything is clustered near the top anyway.
Now, when it came to the personality, some people found it very annoying.
Tamay Besiroglu quoted the "I've got you, Ron" line and said, who actually wants their model
to write like this?
Surprised OpenAI highlighted this in the GPT-5.1 announcement. Very annoying,
in my opinion.
CJ Zafir went further and just provided a set of custom instructions to get it to not act like
that.
The custom instructions that he shared include: eliminate emojis, filler, hype, soft asks,
conversational transitions, and call-to-action appendixes.
And then another long paragraph of all of that sort of thing.
Now, while that may have been the response of some on AI Twitter,
and while I understand it, because frankly it's not exactly the tone that I'm interested in for my AI
models either, I still think this is the area where the highly enfranchised AI users are most
out of sync with the average user. The utter rebellion and uproar that was seen on every other part
of the internet after OpenAI deprecated 4o without warning should tell us that different people
are expecting very different things out of their models. Depending on what you were looking for,
you could find people saying that 5.1 was too safe or that it wasn't safe enough. And I think
Professor Ethan Mollick nailed it when he wrote, OpenAI serves two very different audiences
in tension: people who want to chat with an AI and people who want to get work done with an AI. I don't
want a machine to be my friend. I want to get every ounce of smarts out of it, but I get that other
people just want a quirky old buddy. Now, OpenAI has actually made some interesting moves
around trying to adjust for different expectations. Applications CEO Fidji Simo published on her
Substack a piece about the new personalization feature, in a post called Moving Beyond One Size Fits
All. There are now a set of presets that you can choose from for the tone of ChatGPT:
Professional, Friendly, Candid, Quirky, Efficient, Cynical, and Nerdy.
Fidji writes, the model has the same capabilities, whether you select a default or one of these,
but the style of responses will be different, more formal or familiar, more playful or direct,
more or less jargon or slang, and so on.
Olivia Moore from a16z wrote,
I tested ChatGPT's new preset personalities head-to-head with the same basic prompt.
By the way, she used "Can you explain the government shutdown to me?"
Olivia continues, it makes a big difference in how the model communicates and how it prioritizes
info. Feels like they really doubled down here after prior complaints that instructions didn't do much.
Now, this is one where it's hard to convey the full spectrum of personality just by reading it aloud, but I think
it is worth going and playing around with just to see how they're trying to execute this type of
personalization. Once again, though, I think Ethan Mollick has an interesting point where he says that
what he's interested in is not so much different tones and styles, but instead the different
mindsets that come with different roles. He writes, I want AI to be able to adopt roles, not personalities.
Who wants to talk to a cynic all the time?
But if that mode was actually better at giving critical advice,
then I would love to have it chime in for a moment at certain points.
In other words, maybe the issue isn't the personalization,
but the approach to personalization and focusing it on style
rather than some professional role or substance.
There were some folks out there who argued that the change in tone
was going to be more significant than some of the folks on X seem to be grokking.
TENX Labs Alex Lieberman wrote,
GPT-5.1 is way bigger than most people think, in my humble opinion.
No, this wasn't a model shift like GPT-3 to 3.5, but I'd argue we're at a level of intelligence
now where things like personality, adaptive thinking, and custom instructions will have a more
profound impact on the average user than major model improvements. The example he gave is getting
an explanation of how financial statements work from his best friend or his great uncle,
both of whom are super smart and both of whom work in finance. He said, given their level of
intellect, they both have the capacity to understand this topic deeply. So then whose explanation
will resonate more deeply with me? It'll be the person I feel more connected to. The person who speaks
to me in a way that holds my attention. The person who understands what I require to really grok a concept.
TL;DR, I think this update will do more for retention and usage than many people think.
I think Alex is very directionally correct here. And I think it would be very easy for us in the AI
operator bubble to underestimate how big a deal this is going to be with regular audiences.
Overall, the response has been positive. Alex Finn writes, don't be fooled by the point one. This is a
big upgrade. Marginally better at coding, a lot better at chat, vibes, and coming up with
novel creative ideas. In just an hour, it came up with 10 improvements for my app that no other model
has thought of. Most creative, fun to talk to model yet. Dave GPTTT writes, after a few hours with
GPT-5.1 and 5.1 Thinking, I can say this feels like the true GPT-5 release. It has the warmth and
intuition of GPT-4o, the sharper reasoning of GPT-5, and much better instruction following. For the first time
in a while, using ChatGPT feels alive and reliable again.
So what were my first impressions?
First of all, without going in and making any of the tweaks or changes, the default personality
absolutely feels much more alive.
Now, yes, if you are in work mode, that can be perhaps a little annoying or at least cloying,
but it also just feels more enthusiastic in a way that I think is going to net out as a
positive over time, even for the worky folks.
Now, related to that, my impression is that the new model tries way
harder and is much more eager than GPT-5. This feels like a night and day difference. And honestly,
it feels like the difference between interacting with an employee who does the job that you've assigned
in completely competent fashion versus the employee that is working overtime to do a really
excellent job. Related to that, in my first tests, 5.1 is much more comprehensive. It does a much more
thorough job, perhaps, as some have pointed out, even too thorough when it comes to answering questions or
interacting with prompts. But as you might imagine, for me, too thorough is way better than not
thorough enough. And finally, from the first impression column, it does feel to me like it does a
better job of knowing when to spend less thinking time on simpler tasks, living up to the idea
that this model is faster. This episode is brought to you by Blitzy, the Enterprise Autonomous
Software Development Platform with infinite code context. Blitzy uses thousands of specialized AI
agents that think for hours to understand Enterprise-scale code bases with millions of lines of code.
Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements.
The Blitzy platform provides a plan, then generates and pre-compiles code for each task.
Blitzy delivers 80% plus of the development work autonomously, while providing a guide for the final 20% of human development work required to complete the sprint.
Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool,
pairing it with their coding copilot of choice to bring an AI-native SDLC into their org.
Visit blitzy.com and press Get a Demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native.
Meet Rovo, your AI-powered teammate.
Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio.
Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work.
Connect Rovo to your favorite SaaS apps so no knowledge gets left behind.
Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across
all of your apps and delivers personalized AI insights from day one. Rovo is already built into
Jira, Confluence, and Jira Service Management Standard, Premium, and Enterprise subscriptions.
Know the feeling when AI turns from tool to teammate? If you rovo, you know.
Discover Rovo, your new AI teammate powered by Atlassian. Get started at R-O-V, as in victory, O, dot com.
Today's episode is brought to you by Superintelligent.
Now, for those of you who don't know, who are new here, maybe, Superintelligent is actually
my company.
We started it because every single company we talk to, all the enterprises out there, are
trying to figure out what AI can do for them, but most of the advice is super generic, not
specific to your company.
So what we do is we map your AI and agent opportunities by deploying voice agents to
interview your teams about how work works now and how your people would like it to work in
the future. The result is an AI action map with high potential ROI use cases and specific change
management needs, basically everything you need to go actually deliver AI value. Go to besuper.ai
to learn more. Today's episode is brought to you by Robots and Pencils. When competitive advantage
lasts mere moments, speed to value wins the AI race. While big consultancies bury progress under
layers of process, Robots and Pencils builds impact at AI speed. They partner with clients to enhance
human potential through AI, modernizing apps, strengthening data pipelines, and accelerating cloud
transformation. With AWS-certified teams across the U.S., Canada, Europe, and Latin America, clients
get local expertise and global scale. And with a laser focus on real outcomes, their solutions
help organizations work smarter and serve customers better. They're your nimble, high-service alternative
to big integrators. Turn your AI vision into value fast. Stay ahead with a partner built for progress.
Partner with Robots and Pencils at robotsandpencils.com slash AI Daily Brief.
So based on all of this, let's talk now about six things that GPT-5.1 does better
than GPT-5. The first, let's call simple work tasks. And what I'm talking about here
are things that are, on the one hand, rote or simple, but which form some big, important
part of the job that you have to do, and which, while basic or boring to execute, do require
a high fidelity to instructions. And this, of course, is taking advantage of the new improved
instruction following capabilities. On the one hand, the always respond with six words example that
OpenAI gave in their announcement post may seem really arbitrary and kind of silly, but frankly,
how many random small work tasks do you experience that come with some sort of arbitrary but ultimately
mandatory rules? I think greater adherence to instructions is going to be a huge improvement
for some of these less glamorous but very high-value tasks that GPT-5.1 can now better help with.
Second, and much more interesting and certainly much more in line with how I use these tools,
is strategic decision-making.
In my tests so far, I have found GPT-5.1 to be much more articulate about its answers
to strategic questions that I'm exploring with it and more confident in the decisions that
it's suggesting.
It's been less than a day, but I'm seeing a little bit less of the hesitation that has
plagued previous models where they always try to give you a both-and-type answer.
I was discussing a question yesterday around Superintelligent's positioning and self-conception,
which is the type of very startup-y strategy conversation I have with these models all the time.
On average, past models, when presented with two examples of should we position ourselves in this way
or should we position ourselves in that way, would almost inevitably hedge and say,
well, it depends on the context, here's a strategy where you get your cake and eat it too,
and you can position in both of the ways. This always leads me to having to berate the model to remind it
that life in the world is about making trade-offs, and that sometimes you just need to make a
decision and stick with it and see how it works. In engaging with this positioning question,
5.1 didn't hedge in that same way. It had a very specific answer. It articulated its reasoning,
and it wasn't so rigid that it didn't discuss why there were some merits to the other consideration,
but ultimately it just provided what it thought was the best answer. And that, frankly, is a much
more useful strategic partner than the sort of dithering,
why-choose-one-when-you-can-choose-both type of answer that I would have expected from GPT-5
and other past models.
Now, somewhat related to this, because 5.1 appears more interested in showing its work and
explaining why it's saying what it's saying, it's also more useful for improving
the prompter's thinking.
Now, some folks might not care and they might just want the AI to do the thinking for them,
but there are a lot of times, I would argue in fact most times, where part of what's useful
about engaging with an LLM around a particular question is not just getting to the answer,
but the way it helps you refine your thinking about similar queries in the future.
that. Here's one very simple example. For yesterday's episode of the podcast, I fed it the transcript,
gave it the simple request for a title and description, and what GPT-5 came back with was a title
and a description. Now, it did a fine job. The title was workable, the description was fine,
and in a lot of cases, I would be totally fine with both of these, and to the extent that I wasn't content
with the title, I could just, and this is something I often will do, ask it to give me a few more
ideas and examples that look at the title with a variety of different objectives.
Compare that to 5.1's response.
Instead of giving a single title, it gave five title options and then made a suggestion for
the pick that had the best combination of reach and accuracy.
So not only did it give a set of options, it selected one, like I was mentioning before,
it's a little bit better at commitment, and it gave a set of bullets explaining why it
thought that one was the right option, while still showing me the other options it decided
it didn't like as much. Now, coming back to this idea of improving prompter thinking,
sure, if I'm just trying to go as fast as humanly possible, maybe I don't care about the five
options and why it chose the one that it did, I just want the thing that's going to perform
best on YouTube and to move on with the rest of my life. But if you're a content creator,
you know that you are constantly thinking about title performance, thumbnail performance,
all these seemingly small details that can have a dramatic impact on the reach and resonance of your
content. And so for me, this more explain-your-work approach is much more useful, not necessarily
even in the context of what title I'm going to use for that day's episode, but for the way that
it's going to help me shape my thinking about future episodes. Like I said, a very small example,
but one where I think the general idea that showing the thought process is going to inevitably
improve the prompter's thinking as well is something that's going to play out more generally.
The next thing that GPT-5.1 is better at, which once again follows from the eagerness and the thoroughness,
is comprehensive planning.
One of the things that was interesting when I was engaging in that conversation about the
strategic positioning of Superintelligent is that in addition to just giving me its answer,
it also included a five-part plan for how it should shape strategy over the next 12 to 24 months,
including everything from product roadmaps to go-to-market plans to revenue and pricing mix,
and so on and so forth.
Basically, I think that there is a direct line from the eagerness of this model, from its willingness
to commit to a specific idea or strategy or plan, and to its thoroughness in communicating
its reasoning and chain of thought that lends it extremely well to comprehensive and thorough
planning.
So if you are a person who uses these models for things like mapping out your content calendar
or figuring out all the steps that you need to execute to plan and pull off a great
event, I think you're going to find significant improvements with 5.1 as compared to 5.
The next thing GPT-5.1 is better at, at least according to some, is writing.
Now, this is one where I will say I have not yet had a chance to go deep enough to really
come to my own conclusions about it.
I think good writing is inherently subjective, and I think that there are so many different
shades of writing that there could be different experiences based on different writing needs.
Are we talking about creative writing, technical writing, persuasive writing, all of these
could be very different.
However, there are certainly enough folks who seem to think that this model is a major
upgrade in writing that it's worth adding here. On a creative writing testing site, the model which
had been tested under the name Polaris Alpha had an Elo score that put it higher than Claude Sonnet
4.5, o3, Kimi K2, and pretty much every other model. Brasser X writes, GPT-5.1 just raised the bar
for creative writing. The model writes with clarity, rhythm, and intent in a way that doesn't
feel synthetic anymore. It's the first time an OpenAI model feels genuinely capable of carrying
long-form narratives without drifting or collapsing into cliches.
I didn't expect this jump. I'm impressed.
Muratcan Koylan writes,
LLMs are becoming very skilled writers,
and the new GPT-5.1 is promising.
He went on to conduct a set of writing tests
to compare it specifically to Kimi K2 Thinking
and said of GPT-5.1,
tightly edited, concept-first, with humor woven into the logic.
Great for strategy and creative manifestos,
and for smart but approachable brand tones,
product pages, structured narratives, and concept-driven ads.
Its metaphors are relatable,
giving human qualities to everyday things,
and, he says, the humor is wistful rather than cutting.
Ultimately, he concludes,
Sonnet 4.5 was already a decent writer.
Kimi K2 is bringing a unique style,
and I'm glad that finally, GPT now has a model that can write.
I will say that in the past,
one of the most frequent reasons that I switch out of ChatGPT
and into Claude is around writing,
so I'll definitely be excited to spend a little bit more time
seeing if these first impressions hold up
and if it's actually true for the type of writing that I do with LLMs as well.
Lastly, sixth on our list of things that GPT-5.1 is better at,
we'll use the big banner categorization of interacting.
Now, obviously, this was the whole theme of this release.
And what's interesting is that as much as I said and caveated at the beginning that I want
LLMs for work and that I didn't care about this sort of personality change,
think about how many times during this episode I've described improvements based on
something that's not technical, but that is in some ways a personality trait.
I said that it tries harder, it's more eager, it shares more about how it's thinking.
All of those things are sort of actually about the personality and about the mode of interaction.
And so even though I'm not interacting with it as a companion, or to use Ethan Mollick's phrase,
a quirky old buddy, the improvement in the interaction is something I'm noticing even in that work
context. For others that have use cases that are more specifically in that area, they are finding
really positive updates here. Klick Health's Simon Smith writes, okay, so far GPT-5.1 does hit
different. I journal into ChatGPT. GPT-4o was a great journaling
partner, warm, supportive, with good observations, insights, and feedback, but with a huge tendency to be
sycophantic. GPT-5 was kind of emotionless, going-through-the-motions, robotic. It felt like talking to a
toaster. 5.1 feels like a smarter, friendlier, more genuine, and less sycophantic 4o. It feels like it's
actually listening, adds its own insights versus just regurgitating, challenges my perspective on things,
and has a more human-like tone with more varied sentence structure. And my favorite thing so far is that it
no longer sounds so robotic when offering to help after every single response. It ended its response
today with, if you want, I can help you with X issue, but only if that feels helpful right now.
I've never seen GPT-5 display that kind of real or simulated self-awareness about whether something
might not be helpful right now. Anyway, still early, just got this, trying it out, but was pleasantly
surprised with how today's journaling session went. So those are six things GPT-5.1 does better:
simple work tasks, strategic decision-making, improving the prompter's thinking,
comprehensive planning, writing, and interacting, both on a personal and professional level.
Overall, as you can probably tell from my tone, I've honestly been very pleasantly surprised.
I wasn't expecting this model release, and I don't think if I had been, I would have
expected it to be as seemingly meaningful an upgrade, especially considering it's just a
5 to 5.1 switch. Now, it's still early. Inevitably, I will find things that I don't like as
much about the model as I use it more, but for now, a pretty good upgrade. And of course,
for those who assume that this means that Gemini 3.0 must be coming soon,
there's a whole additional bit of little good news there as well.
Anyways, guys, that is going to do it for today's episode.
Appreciate you listening or watching as always.
And until next time, peace.
