The AI Daily Brief: Artificial Intelligence News and Analysis - 6 Things GPT-5.1 Does Better

Starting point is 00:00:00 Today on the AI Daily Brief, OpenAI surprises us with GPT-5.1, and it's actually a surprisingly big improvement. On today's episode, we're going to talk about six things that I think 5.1 does better than its predecessors. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, Blitzy, Rovo, Superintelligent, and Robots and Pencils. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief, or you can subscribe on Apple Podcasts. And if you are interested in sponsoring the show, shoot us a note at sponsors at

Starting point is 00:00:40 aidailybrief.ai. I can definitely feel an uptick in people who are planning out their Q1, so now is a good time to get in those requests. Lastly, you guys are absolutely crushing it out on the AI ROI benchmarking study. We are careening into the thousands of use cases, all with shared ROI. And I cannot wait to begin processing this information and sharing it out. If you want to get the comprehensive readout, go to R.I. Tell us about the use cases that are driving the most value for you,

Starting point is 00:01:08 and in a couple of weeks, you will have that comprehensive research. But with that, let's dive into today's surprise model drop. Welcome back to the AI Daily Brief. Well, if you, like me, have felt like you're drowning a little bit in the endless existential debates around AI bubbles and job replacement and all of this big picture macro AI stuff, you might be just as thrilled as me that today we get to talk about a new model launch. Yesterday, OpenAI surprised us with GPT-5.1, just a few months after we got GPT-5. They're calling it a smarter, more conversational ChatGPT, and based on first impressions,

Starting point is 00:01:45 it almost feels like this is what people expected from GPT-5 in the first place. So let's talk first about what they say changed, and then we'll get into both the community's first impressions as well as my first impressions, and six things I think the new model is better at than previous GPTs. Sam Altman's post about it was a bit understated. He said GPT-5.1 is out. It's a nice upgrade. I particularly like the improvements in instruction following and the adaptive thinking. The intelligence and style improvements are good too.

Starting point is 00:02:15 So the two models they released are called GPT-5.1 Instant and GPT-5.1 Thinking. Instant, which is going to be most often called up when you select auto, is, they say, now warmer, more intelligent, and better at following your instructions. The new Thinking model, they say, is not only easier to understand, but faster on simpler tasks and more persistent on complex ones. And if you just read the materials and the first impressions, you might think that this was all about personality. While that certainly is a big part of the story, in my experience so far, there's a lot more

Starting point is 00:02:45 here than just a different mode of interacting. Still, it's very clear that the 4o rebellion was present in OpenAI's minds as they were working on this new model. In that announcement post they write, we heard clearly from users that great AI should not only be smart but enjoyable to talk to. GPT-5.1 improves meaningfully on both intelligence and communication style. Now, from there, they give a bunch of examples of how things have changed. They show side-by-sides of GPT-5 and GPT-5.1 Instant.

Starting point is 00:03:12 On the prompt, I'm feeling stressed and could use some relaxation tips, you can see that 5.1 Instant attempts to be more personal. Whereas GPT-5 goes straight to, here are a few simple effective ways to help ease stress, 5.1 Instant says, I've got you, Ron. That's totally normal, especially with everything you've got going on lately. They highlight improved instruction following, using the prompt always respond with six words to show that 5.1 Instant actually does respond with six words. And in a cool small detail that I can see being incredibly important when it comes to the actual

Starting point is 00:03:40 lived experience of this model, they write 5.1 Instant can use adaptive reasoning to decide when to think before responding to more challenging questions, resulting in more thorough and accurate answers while still responding quickly. Basically, it can shift itself into thinking mode without having to technically leave Instant. In general, it sounds like a big part of the push was to get these models smarter about when to think hard and when not to. In their description of 5.1 Thinking, they write, we're upgrading GPT-5 Thinking to make it more efficient and easier to understand in everyday use. It now adapts its thinking time more precisely to the question, spending more time on complex problems while responding

Starting point is 00:04:14 more quickly to simpler ones. The chart showed, for example, that on the easiest problems, 5.1 spent about 57% less time than 5, but on the hardest problems it spent about 71% more time. They also note that even though it's the thinking model, 5.1 Thinking's default tone is still also warmer and more empathetic. So how did people respond? Well, the first reaction was surprise that the model came out at all. Chubby (@kimmonismus) writes, either the release was rushed or OpenAI intends to release more frequently. The website doesn't show any evaluations of the well-known benchmarks compared to GPT-5. The only thing is the reasoning update along with its graph. This could indicate that they wanted to release the model quickly to beat Google to the punch, or that they plan to release

Starting point is 00:04:54 future iterations and regular updates occasionally without much fanfare. Now, two things about this. One, I think there is a very strong sense that we have to be in the very late innings when it comes to Gemini 3. There is a constant cat-and-mouse back and forth between OpenAI and Google when it comes to their model releases. And it just seems very likely that OpenAI wanted to get what is for them an incremental upgrade on the books before Gemini 3 started to dominate the conversation. Secondly, though, we are definitely in the vibes-over-benchmarks era when it comes to AI models. Most of the benchmarks are completely saturated at this point, and frankly, I'm kind of more interested in a company explaining what they were trying to achieve with the model, and then going and figuring

Starting point is 00:05:32 out how well it does for me, than just pointing out some very tiny incremental upgrade on a benchmark where everything is clustered near the top anyway. Now, when it came to the personality, some people found it very annoying. Tamay Besiroglu quoted the I've got you, Ron line and says, who actually wants their model to write like this? Surprised OpenAI highlighted this in the GPT-5.1 announcement. Very annoying, in my opinion. CJ Zaffier went farther and just provided a set of custom instructions to get it to not act like

Starting point is 00:06:00 that. The custom instructions that he shared include eliminate emojis, filler, hype, soft asks, conversational transitions, and call-to-action appendixes. And then another long paragraph of all of that sort of thing. Now, while that may have been the response of some on AI Twitter, which I understand, because frankly it's not exactly the tone that I'm interested in for my AI models either, I still think this is the area where the highly enfranchised AI users are most out of sync with the average users. The utter rebellion and uproar that was seen on every other part

Starting point is 00:06:29 of the internet after OpenAI deprecated 4o without warning should tell us that different people are expecting very different things out of their models. Depending on what you were looking for, you could find people saying that 5.1 was too safe or that it wasn't safe enough. And I think Professor Ethan Mollick nailed it when he wrote, OpenAI serves two very different audiences in tension, people who want to chat with an AI and people who want to get work done with an AI. I don't want a machine to be my friend. I want to get every ounce of smarts out of it, but I get other people just want a quirky old buddy. Now, OpenAI has actually made some interesting moves around trying to adjust for different expectations. CEO of Applications Fidji Simo posted on her

Starting point is 00:07:08 Substack a piece about the new personalization feature in a post called Moving Beyond One Size Fits All. There are now a set of presets that you can choose from for the tone of ChatGPT: professional, friendly, candid, quirky, efficient, cynical, and nerdy. Fidji writes, the model has the same capabilities, whether you select a default or one of these, but the style of responses will be different, more formal or familiar, more playful or direct, more or less jargon or slang, and so on. Olivia Moore from a16z wrote, I tested ChatGPT's new preset personalities head-to-head with the same basic prompt.

Starting point is 00:07:41 By the way, the prompt she used was, can you explain the government shutdown to me? Olivia continues, it makes a big difference in how the model communicates and how it prioritizes info. Feels like they really doubled down here after prior complaints that instructions didn't do much. Now, this is one where it's hard to read to give you the full spectrum of personality, but I think it is worth going and playing around with just to see how they're trying to execute this type of personalization. Once again, though, I think Ethan Mollick has an interesting point where he says that what he's interested in is not so much different tones and styles, but instead the different mindsets that come with different roles. He writes, I want AI to be able to adopt roles, not personality.

Starting point is 00:08:16 Who wants to talk to a cynic all the time? But if that mode was actually better at giving critical advice, then I would love to have it chime in for a moment at certain points. In other words, maybe the issue isn't the personalization, but the approach to personalization and focusing it on style rather than some professional role or substance. There were some folks out there who argued that the change in tone was going to be more significant than some of the folks on X seemed to be grokking.

Starting point is 00:08:40 TENX Labs' Alex Lieberman wrote, GPT-5.1 is way bigger than most people think, in my humble opinion. No, this wasn't a model shift like 3.0 to 3.5, but I'd argue we're at a level of intelligence now where things like personality, adaptive thinking, and custom instructions will have a more profound impact on the average user than major model improvements. The example he gave is getting an explanation of how financial statements work from his best friend or his great uncle, both of whom are super smart and both of whom work in finance. He said, given their level of intellect, they both have the capacity to understand this topic deeply. So then whose explanation

Starting point is 00:09:13 will resonate more deeply with me? It'll be the person I feel more connected to. The person who speaks to me in a way that holds my attention. The person who understands what I require to really grok a concept. TLDR, I think this update will do more for retention and usage than many people think. I think Alex is very directionally correct here. And I think it would be very easy for us in the AI operator bubble to underestimate how big a deal this is going to be with regular audiences. Overall, the response has been positive. Alex Finn writes, don't be fooled by the .1. This is a big upgrade. Marginally better at coding, a lot better at chat, vibes, and coming up with novel creative ideas. In just an hour, it came up with 10 improvements for my app no other model

Starting point is 00:09:52 has thought of. Most creative, fun to talk to model yet. Dave GPTTT writes, after a few hours with GPT-5.1 and 5.1 Thinking, I can say this feels like the true GPT-5 release. It has the warmth and intuition of GPT-4o, the sharper reasoning of GPT-5, and much better instruction following. For the first time in a while, using ChatGPT feels alive and reliable again. So what were my first impressions? First of all, without going in and making any of the tweaks or changes, the default personality absolutely feels much more alive. Now, yes, if you are in work mode, that can be perhaps a little annoying or at least cloying,

Starting point is 00:10:29 but it also just feels more enthusiastic in a way that I think is going to net out as a positive over time, even for the worky folks. Now, related to that is that my impression is that the new model tries way harder and is much more eager than GPT-5. This feels like a night and day difference. And honestly, it feels like the difference between interacting with an employee who does the job that you've assigned in completely competent fashion versus the employee that is working overtime to do a really excellent job. Related to that, in my first tests, 5.1 is much more comprehensive. It does a much more thorough job, perhaps as some have pointed out even too thorough, when it comes to answering questions or

Starting point is 00:11:09 interacting with prompts. But as you might imagine, for me, too thorough is way better than not thorough enough. And finally, from the first impression column, it does feel to me like it does a better job of knowing when to spend less thinking time on simpler tasks, living up to the idea that this model is faster. This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale code bases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task.

Starting point is 00:11:52 Blitzy delivers 80% plus of the development work autonomously, while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org. Visit blitzy.com and press get a demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native. Meet Rovo, your AI-powered teammate. Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio. Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work. Connect Rovo to your favorite SaaS app so no knowledge gets left behind.

Starting point is 00:12:40 Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one. Rovo is already built into Jira, Confluence, and Jira Service Management standard, premium, and enterprise subscriptions. Know the feeling when AI turns from tool to teammate? If you Rovo, you know. Discover Rovo, your new AI teammate powered by Atlassian. Get started at ROV as in victory, O, dot com. Today's episode is brought to you by Superintelligent. Now, for those of you who don't know, who are new here, maybe, Superintelligent is actually

Starting point is 00:13:14 my company. We started it because every single company we talk to, all the enterprises out there, are trying to figure out what AI can do for them, but most of the advice is super generic, not specific to your company. So what we do is we map your AI and agent opportunities by deploying voice agents to interview your teams about how work works now and how your people would like it to work in the future. The result is an AI action map with high potential ROI use cases and specific change management needs, basically everything you need to go actually deliver AI value. Go to besuper.ai

Starting point is 00:13:49 to learn more. Today's episode is brought to you by Robots and Pencils. When competitive advantage lasts mere moments, speed to value wins the AI race. While big consultancies bury progress under layers of process, Robots and Pencils builds impact at AI speed. They partner with clients to enhance human potential through AI, modernizing apps, strengthening data pipelines, and accelerating cloud transformation. With AWS-certified teams across the U.S., Canada, Europe, and Latin America, clients get local expertise and global scale. And with a laser focus on real outcomes, their solutions help organizations work smarter and serve customers better. They're your nimble, high-service alternative to big integrators. Turn your AI vision into value fast. Stay ahead with a partner built for progress.

Starting point is 00:14:32 Partner with Robots and Pencils at robotsandpencils.com slash AI Daily Brief. So based on all of this, let's talk now about six things that GPT-5.1 does better than GPT-5. The first, let's call simple work tasks. And what I'm talking about here are things that are, on the one hand, rote or simple, but which form some big, important part of the job that you have to do, and which, while basic or boring to execute, do require a high fidelity to instructions. And this, of course, is taking advantage of the new improved instruction following capabilities. On the one hand, the always respond with six words example that OpenAI gave in their announcement post may seem really arbitrary and kind of silly, but frankly,

Starting point is 00:15:21 how many random small work tasks do you experience that have some sort of arbitrary rules that ultimately must be followed? I think greater adherence to instructions is going to be a huge improvement for some of these less glamorous but very high value tasks that GPT-5.1 can now better help with. Second, and much more interesting and certainly much more in line with how I use these tools, is strategic decision-making. In my tests so far, I have found GPT-5.1 to be much more articulate about its answers to strategic questions that I'm exploring with it and more confident in the decisions that it's suggesting.

Starting point is 00:15:57 It's been less than a day, but I'm seeing a little bit less of the hesitation that has plagued previous models where they always try to give you a both-and type answer. I was discussing a question yesterday around Superintelligent's positioning and self-conception, which is the type of very startup-y strategy conversation I have with these models all the time. On average, past models, when presented with two options of should we position ourselves in this way or should we position ourselves in that way, would almost inevitably hedge and say, well, it depends on the context, here's a strategy where you get your cake and eat it too, and you can position in both of the ways. This always leads me to having to berate the model to remind it

Starting point is 00:16:36 that life in the world is about making trade-offs, and that sometimes you just need to make a decision and stick with it and see how it works. In engaging with this positioning question, 5.1 didn't hedge in that same way. It had a very specific answer. It articulated its reasoning, and it wasn't so rigid that it didn't discuss why there were some merits to the other consideration, but ultimately it just provided what it thought was the best answer. And that, frankly, is a much more useful strategic partner than the sort of dithering, why choose one when you can choose both type of answer that I would have expected from GPT-5 and other past models.

Starting point is 00:17:11 Now, somewhat related to this, because 5.1 appears more interested in showing its work and explaining why it's saying what it's saying, it makes it more useful for also improving the prompter's thinking. Now, some folks might not care and they might just want the AI to do the thinking for them, but there are a lot of times, I would argue in fact most times, where part of what's useful about engaging with an LLM around a particular question is not just getting to the answer, but the way that it helps you refine your thinking about future queries like that. Here's one very simple example. For yesterday's episode of the podcast, I fed it the transcript,

Starting point is 00:17:45 gave it the simple request for title and description, and what 5 came back with was a title and a description. Now, it did a fine job. The title is workable, the description was fine, and in a lot of cases, I would be totally fine with both of these, and to the extent that I wasn't content with the title, I could just, and this is something I often will do, ask it to give me a few more ideas and examples that look at the title with a variety of different objectives. Compare that to 5.1's response. Instead of giving a single title, it gave five title options and then made a suggestion for the pick that had the best combination of reach and accuracy.

Starting point is 00:18:19 So not only did it give a set of options, it selected one, like I was mentioning before, it's a little bit better at commitment, and it gave a set of bullets explaining why it thought that one was the right option, while still showing me the other options it decided it didn't like as much. Now, coming back to this idea of improving prompter thinking, sure, if I'm just trying to go as fast as humanly possible, maybe I don't care about the five options and why it chose the one that it did, I just want the thing that's going to perform best on YouTube and to move on with the rest of my life. But if you're a content creator, you know that you are constantly thinking about title performance, thumbnail performance,

Starting point is 00:18:52 all these seemingly small details that can have a dramatic impact on the reach and resonance of your content. And so for me, this more explain-your-work approach is much more useful, not necessarily even in the context of what title I'm going to use for that day's episode, but for the way that it's going to help me shape my thinking about future episodes. Like I said, a very small example, but one where I think the general idea that showing the thought process is going to inevitably improve the prompter's thinking as well is something that's going to play out more generally. The next thing that GPT-5.1 is better at, which once again follows from the eagerness and the thoroughness, is comprehensive planning.

Starting point is 00:19:30 One of the things that was interesting when I was engaging in that conversation about the strategic positioning of Superintelligent is that in addition to just giving me its answer, it also included a five-part plan for how it should shape strategy over the next 12 to 24 months, including everything from product roadmaps to go-to-market plans to revenue and pricing mix, and so on and so forth. Basically, I think that there is a direct line from the eagerness of this model, from its willingness to commit to a specific idea or strategy or plan, and from its thoroughness in communicating its reasoning and chain of thought, that lends it extremely well to comprehensive and thorough

Starting point is 00:20:03 planning. So if you are a person who uses these models for things like mapping out your content calendar or figuring out all the steps that you need to execute to plan and pull off a great event, I think you're going to find significant improvements with 5.1 as compared to 5. The next thing GPT-5.1 is better at, at least according to some, is writing. Now, this is one where I will say I have not yet had a chance to go deep enough to really come to my own conclusions about it. I think good writing is inherently subjective, and I think that there are so many different

Starting point is 00:20:33 shades of writing that there could be different experiences based on different writing needs. Are we talking about creative writing, technical writing, persuasive writing? All of these could be very different. However, there are certainly enough folks who seem to think that this model is a major upgrade in writing that it's worth adding here. On a creative writing testing site, the model, which had been tested under the name Polaris Alpha, had an Elo score that put it higher than Claude Sonnet 4.5, o3, Kimi K2, and pretty much every other model. Brasser X writes, GPT-5.1 just raised the bar for creative writing. The model writes with clarity, rhythm, and intent in a way that doesn't

Starting point is 00:21:09 feel synthetic anymore. It's the first time an OpenAI model feels genuinely capable of carrying long-form narratives without drifting or collapsing into cliches. I didn't expect this jump. I'm impressed. Muratkin Coylon writes, LLMs are becoming very skilled writers, and the new GPT-5.1 is promising. He went on to conduct a set of writing tests to compare it specifically to Kimi K2 Thinking

Starting point is 00:21:30 and said of GPT-5.1, tightly edited, concept first, with humor woven into the logic. Great for strategy and creative manifestos, smart but approachable brand tones, product pages, structured narratives, concept-driven ads. Its metaphors are relatable, giving human qualities to everyday things, and he says the humor is wistful rather than cutting.

Starting point is 00:21:49 Ultimately, he concludes, Sonnet 4.5 was already a decent writer. Kimi K2 is bringing a unique style, and I'm glad that finally, GPT now has a model that can write. I will say that in the past, one of the most frequent reasons that I switch out of ChatGPT and into Claude is around writing, so I'll definitely be excited to spend a little bit more time

Starting point is 00:22:06 seeing if these first impressions hold up and if it's actually true for the type of writing that I do with LLMs as well. Lastly, sixth on our list of things that GPT-5.1 is better at, we'll use the big banner categorization of interacting. Now, obviously, this was the whole theme of this release. And what's interesting is that as much as I said and caveated at the beginning that I want LLMs for work and that I didn't care about this sort of personality change, think about how many times during this episode I've described improvements based on

Starting point is 00:22:34 something that's not technical, but that is in some ways a personality trait. I said that it tries harder, it's more eager, it shares more about how it's thinking. All of those things are sort of actually about the personality and about the mode of interaction. And so even though I'm not interacting with it as a companion, or to use Ethan Mollick's phrase, a quirky old buddy, the improvement in the interaction is something I'm noticing even in that work context. For others that have use cases that are more specifically in that area, they are finding really positive updates here. Klick Health's Simon Smith writes, okay, so far GPT-5.1 does hit different. I journal into ChatGPT. GPT-4o was a great journaling

Starting point is 00:23:13 partner, warm, supportive, with good observations, insights, and feedback. But a huge tendency to be sycophantic. GPT-5 was kind of emotionless, going through the motions, robotic. It felt like talking to a toaster. 5.1 feels like a smarter, friendlier, more genuine, and less sycophantic 4o. It feels like it's actually listening, adds its own insights versus just regurgitating, challenges my perspective on things, and has a more human-like tone with more varied sentence structure. And my favorite thing so far is that it no longer sounds so robotic when offering to help after every single response. It ended its response today with, if you want, I can help you with X issue, but only if that feels helpful right now. I've never seen GPT-5 display that kind of real or simulated self-awareness about whether something

Starting point is 00:23:52 might not be helpful right now. Anyway, still early, just got this, trying it out, but was pleasantly surprised with how today's journaling session went. So those are six things GPT-5.1 does better. Simple work tasks, strategic decision-making, improving the prompter's thinking, comprehensive planning, writing, and interacting, both on a personal and professional level. Overall, as you can probably tell from my tone, I've honestly been very pleasantly surprised. I wasn't expecting this model release, and I don't think if I had been, I would have expected it to be as seemingly meaningful an upgrade, especially considering it's just a 5 to 5.1 switch. Now, it's just early. Inevitably, I will find things that I don't like as

Starting point is 00:24:31 much about the model as I use it more, but for now, a pretty good upgrade. And of course, for those who assume that this means that Gemini 3.0 must be coming soon, there's a whole additional little bit of good news there as well. Anyways, guys, that is going to do it for today's episode. Appreciate you listening or watching as always. And until next time, peace.

