The AI Daily Brief: Artificial Intelligence News and Analysis - 4 Reasons to Use GPT Image 1.5 Over Nano Banana Pro

Starting point is 00:00:00 Today on the AI Daily Brief, OpenAI has released a new image generation model. We are gathering all of the first responses as well as talking about four areas where I think you may prefer it, even over Nano Banana Pro. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, KPMG, Blitzy, Rovo, and Robots and Pencils. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief, or you can subscribe on Apple Podcasts. And of course, to learn about sponsoring the show, send us a note at sponsors at aidailybrief.ai. A little teaser here.

Starting point is 00:00:43 As you know, we've got the early readout results of the AI ROI benchmarking survey coming. And just in general, if you're interested in that sort of data and research, might I point you to aidbintel.com? We're going to have a lot more interesting things in these domains coming next year, and you can sign up to get notified as we share more of that information. Now, one more note before we dive in. As is usually the case with a new model release, this episode got long and consumed the entirety of the space that we have. At this point, we're getting due for an extended headline, so that will be coming soon,

Starting point is 00:01:12 but for now, enjoy this look at ChatGPT Images. Another day, another new model. Look, this competition between the big labs may be stressful for the people working there, but for us consumers, it means nothing but more choice. Today we are talking about OpenAI's latest image generation model and the new house they put it in, which they are calling ChatGPT Images. Now, overall, this is one that I kind of expected. You might remember in the December prediction episode, even before I think Sam Altman had

Starting point is 00:01:40 declared Code Red, or at least before we knew about it, my best guess for a response to Gemini 3 and Nano Banana Pro was an OpenAI image model. It had just been a really long time since we got an update on that. It was clearly an area where they were pretty far behind, and it seemed like, based on the fact that it had been so long since we got an update, and knowing the speed at which OpenAI delivers, they had to be pretty close, one would think, to being able to release a new model. Now, I didn't expect a full 5.2 release, and that's obviously the first output of Code Red, but yesterday on Tuesday, OpenAI dropped their new ChatGPT Images. As benefits, they point to stronger instruction following, precise editing, detail preservation, and a big speed boost as compared

Starting point is 00:02:20 to before. So let's talk a little bit more about what OpenAI points to as the benefits here. A lot of this is about feature parity with Nano Banana Pro. Remember, the real value of Nano Banana Pro was not just that it was an improvement in terms of raw generation capability. It was about the controls that the user had over it. Whereas in the past, to get exactly what you wanted out of a generation, you'd just have to kind of prompt it over and over and over again and pick the one that was closest, Nano Banana Pro allowed for more precise edits. That capability has now come to ChatGPT Images as well.

Starting point is 00:02:51 They write, the model adheres to your intent more reliably, down to the small details, changing only what you ask for while keeping elements like lighting, composition, and people's appearance consistent across inputs, outputs, and subsequent edits. Interestingly, they point to some pretty consumer-centric use cases for that, which is a theme we'll come back to throughout this episode. They continue, this unlocks results that match your intent, more believable clothing and hairstyle try-ons, alongside stylistic filters and conceptual transformations

Starting point is 00:03:16 that retain the essence of the original image. Another capability they point to is adding, subtracting, combining, blending, and transposing. For example, taking a set of inputs and turning it into a single composition. Another capability that they're really hammering is what they call creative transformations, basically taking one image and turning it into a different style preset, a movie poster, or turning someone into an 80s fitness instructor, taking someone's photo and turning it into an ornament, etc., etc. Once again, and as we'll come back to, I actually think the fact that they're highlighting this

Starting point is 00:03:45 says a lot about who they are intending this product for. Other benefits they point to include better instruction following, up to and including much more precise prompting, and they also point to much better text rendering. Now, this was obviously one of the biggest changes that we got with Nano Banana, is that in addition to just being able to have text, with Nano Banana and then Nano Banana Pro, you could get a ton of high-fidelity text,

Starting point is 00:04:08 opening up new possibilities for things like infographics. One final interesting thing from their announcement post is that while in most areas the model improved, they actually did find some regressions as well. For example, they write, the ability to generate some specific art styles has regressed from the previous version. The example they give is draw me like I'm in a dark fantasy anime, with the new version completely

Starting point is 00:04:28 100% not being that at all. There are other limitations as well. For example, when there's a picture with a lot of different faces in it, keeping all those faces consistent between generations can be difficult. Overall, they claim a big improvement, but still a lot more opportunity ahead. So what were people's first impressions? I think my sense is that people were kind of prepared to be somewhat underwhelmed. I'm not exactly sure what the reason for that is.

Starting point is 00:04:53 Maybe it's a concern that because this was part of that Code Red, this and basically any other model that they might release would be a rush job. But for a lot of people, even though they were prepared to be underwhelmed, they were, I would put it, kind of whelmed. Justine Moore from a16z writes, in early tests, this is a big step up in maintaining consistency of characters and objects from uploaded images. In other words, your face still looks like you. It may be a real competitor to Nano Banana Pro. Simon Smith from Klick Health wrote, I wasn't expecting OpenAI's new image generator to be comparable to Nano Banana Pro, so I ran it head-to-head on prompts I tried with NBP. Surprisingly, it did as well or better.

Starting point is 00:05:29 But it has a different personality, at least via ChatGPT. Less whimsical, more professional. So here are a couple of the examples he gave. Research when prominent people, especially the leaders of big AI labs and forecasters, think we'll get AGI. Then illustrate this on a timeline and put the faces of the people on the timeline on the years when they think we'll have AGI. Give this a fun kind of cartoony but not too silly feel. Now, a couple things. First, I think this is a good test to see how well integrated

Starting point is 00:05:55 with the rest of the model image generation is. In other words, this requires not just image generation, but also reasoning and research. And the second thing that this brings up is that inherently the challenge with all of this episode, and by the way, this is a good one to watch if you're just listening, is that to some extent quality is going to be subjective. Although in this case, I certainly see why he prefers the ChatGPT Images version

Starting point is 00:06:15 as opposed to Nano Banana. He tried creating a cell cutout diagram, which again is a little bit in the eye of the beholder, but certainly holds its own, alongside a skeleton anatomy chart, and a prompt that said, search up today's top headlines and then give them to me in the style of an old newspaper. Now, the two models in this case took the prompt in very different directions, and I actually prefer, aesthetically, Nano Banana Pro's, but overall, Simon says, I was prepared to be disappointed and I'm not. That's saying something because Nano Banana Pro is amazing. I need more time to play around with the new image generator, but my first impressions are positive. He then came back and said, slides, however, may be a weakness of GPT Image 1.5,

Starting point is 00:06:52 before very quickly returning and saying, okay, I take it back. GPT Image 1.5 can do gorgeous slides. You just need to prompt it. I gave it the same template in the above example, but used GPT-5.2 Thinking instead of Instant and a broader prompt. He did point out, however, that there are real limitations to the aspect ratios that you can get with GPT Image, which has always been an issue for ChatGPT Images. Still, all of this added up to Simon actually thinking that GPT Image 1.5 has beaten Nano Banana Pro on his personal scorecard. And it wasn't just Simon. LMArena tweets, Image Arena shakeup. OpenAI's GPT Image 1.5 is number one in text to image. ChatGPT Image latest is number one on image edit. GPT Image 1.5 holds a commanding 20

Starting point is 00:07:36 point lead on text-to-image while maintaining a narrow three-point edge over Nano Banana Pro on Image Edit. Now, they do say that these scores are preliminary and we'll see where they settle, but still I think this would surprise a lot of people. Artificial Analysis found something similar. They wrote, on both text-to-image and image editing, GPT Image 1.5 again surpassed Nano Banana Pro on their tests. They gave a couple of different text-to-image generation examples, a couple of editing examples like changing a car's color, and inserting a family of ducks crossing a railroad, ultimately again ranking it at number one. Now, there are a million examples out there

Starting point is 00:08:12 if you want to go see direct head-to-heads on ChatGPT versus Nano Banana Pro. And my strong suspicion is that if you don't have a particular horse in the race or a set of biases that you're bringing in to start, you're likely to find some where you prefer ChatGPT and some where you prefer Nano Banana Pro. For myself, outside of just exploring a bunch of things that I thought were interesting, I ran a couple of tests.

Starting point is 00:08:34 For instruction following with multiple constraints, I asked for one person standing and pointing at a screen, two people are seated. The screen shows abstract charts with no readable text. The room is modern and minimalist. The color palette is black, white, and light gray only, no windows, no plants, no logos. In that case, both Nano Banana Pro and ChatGPT Images were able to do it equally competently. On a test of photorealism, I asked for a photorealistic image of a hand holding a clear glass coffee mug filled halfway

Starting point is 00:08:58 with black coffee. The hand has to have all five fingers and have them all visible. The glass has to show realistic reflections and refraction. The coffee surface needs to be flat and level, with natural indoor lighting and a neutral background. Again, in both cases, the models were pretty equally competent. Getting into more stylistic and aesthetic territory, I asked for a 1950s retro-futurist style illustration, with flat, bold shapes, a limited color palette of teal, cream, and muted orange, clean lines, and an optimistic mid-century modern aesthetic.

Starting point is 00:09:26 Once again, they were both competent, and ultimately the preference here is going to be in the eye of the beholder. One of the challenges that this shows is that a single stylistic prompt can mean different things. These are both examples of 1950s retrofuturism, but one is a little more Jetsons and the other is a little more abstract. When we created a character and then put them in a different setting, both models had no problem keeping them consistent from one to the next. And of course, on YouTube thumbnails, a very common use case for me, frankly they were both pretty garbo, although I know for a fact that I could improve that with different prompting. As you can

Starting point is 00:09:59 probably tell across my tests then, what I found was pretty meaningful parity, not necessarily a clear or huge improvement over Nano Banana Pro, but clearly a huge improvement from where OpenAI's image generation model was before this. However, it's not hard to find people who feel the opposite if you go check out Twitter slash X. There were many people who were just kind of generically underwhelmed. AI News by Smol AI said shipping anything is hard so we rarely call out misses and OpenAI rarely misses, but this was clearly a miss. OpenAI Image 1.5 claims to beat Nano Banana Pro number one across all arenas, but completely fails vibe checks. The Ejiko did a test and found that character face accuracy was kind of lacking.

Starting point is 00:10:38 Brand designer Darius Krova gave a base input image as well as a product package and asked both models to make the girl in the input image hold the bottle and said, well, it's better than before, ChatGPT didn't get the scale and changed the product and the light. And if I ask it to make some edits, it reworks the whole image. We'll keep testing, but for now it's one to zero for Google. David Shapiro provided a bunch of images of himself and asked both models to create a YouTube thumbnail, which in this case, undeniably, Nano Banana smashed compared to ChatGPT. Some people were even quite flabbergasted with the Arena and Artificial Analysis results.

Starting point is 00:11:11 I Am Emily 2050 re-shared Artificial Analysis's post and said, what a joke. I'm not going into the conspiracy side, but this is really not looking good for Artificial Analysis. When someone said, how, that can't be right, Emily responds, OpenAI gamed the benchmarks or paid them to say so, which, holding aside the substance of that argument, I think reflects people's skepticism. The X comments on both the Artificial Analysis post and the LMArena post also show just tons of skepticism. All right, let's talk about the signal versus the noise in enterprise AI. The challenge right now isn't just about what's possible, it's about what's practical.

Starting point is 00:11:47 That's the entire focus of the You Can With AI podcast I host for KPMG. Season one cut through the hype to focus on deployment and responsible scaling. Season two goes a level deeper. We're bringing together panels of AI builders, clients, and KPMG leaders to debate the strategic questions that will define what's next for AI in the enterprise. Six episodes packed with frameworks you can actually use. Find You Can With AI wherever you get your podcasts. Subscribe now so you don't miss the new season. This episode is brought to you by Blitzy,

Starting point is 00:12:16 the Enterprise Autonomous Software Development Platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale code bases with millions of lines of code. Enterprise Engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task. Blitzy delivers 80% plus of the development work autonomously, while providing a guide for the final 20% of human development work required to complete the sprint.

Starting point is 00:12:45 Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org. Visit blitzy.com and press get a demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native. Meet Rovo, your AI-powered teammate. Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio.

Starting point is 00:13:14 Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work. Connect Rovo to your favorite SaaS apps so no knowledge gets left behind. Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one. Rovo is already built into Jira, Confluence, and Jira Service Management standard, premium, and enterprise subscriptions. Know the feeling when AI turns from tool to teammate?

Starting point is 00:13:44 If you Rovo, you know. Discover Rovo, your new AI teammate powered by Atlassian. Get started at R-O-V, as in victory, O dot com. AI changes fast. You need a partner built for the long game. Robots and Pencils works side by side with organizations to turn AI ambition into real human impact. As an AWS certified partner, they modernize infrastructure, design cloud native systems, and apply AI to create business value. And their partnerships don't

Starting point is 00:14:11 end at launch. As AI changes, Robots and Pencils stays by your side, so you keep pace. The difference is close partnership that builds value and compounds over time. Plus, with delivery centers across the U.S., Canada, Europe, and Latin America, clients get local expertise and global scale. For AI that delivers progress, not promises, visit Robots and Pencils. So what to make of all of this? I think Peter Gostev from LMArena is directionally correct when he writes, My anecdotal impression of GPT 1.5 versus Nano Banana Pro is that they are pretty neck-and-neck overall.

Starting point is 00:14:50 I find GPT a lot easier to prompt. With Nano Banana, you often had to iterate several times before getting a good result. While with GPT, you typically get what you ask for. But I think Nano Banana has slightly nicer taste, e.g. for infographics, slides, Google has the advantage. I found GPT style quite heavy. The important point, and the part I'm saying is directionally correct, is the pretty neck-and-neck overall. Jimmy Apples had an even simpler version of the same statement, big upgrade over the previous

Starting point is 00:15:15 model. It's not as smart as banana, but it's going to be subjective on what you like on style versus style. Personally, it really hits the image in my head I have for this prompt. Just use what you prefer. I'll be using both. And that is exactly what my overall conclusion is. Before this, Nano Banana was undeniably and very clearly better than

Starting point is 00:15:33 anything OpenAI had going on with image generation. Now, it is not so clearly better, at least not in all cases. What that means practically is that for really high-quality image generation, on Tuesday morning you had one option, and now in a lot of cases you're going to have two. Now, one interesting point that swyx made is that we may also be seeing the limits of how far we can go in image generation with current methods. He writes, I think today's Image 1.5 launch illustrates one of the reasons why people are betting so hard on explicit world models. For the next level in realism, we're going to have to teach the models to see the world as we live it, not through occasional snapshots. He pointed to a post on r/ChatGPT

Starting point is 00:16:11 that said, the new image gen is nuts. Someone responded, however, yes, but also the details are a little off. Why is one leg bare and the other covered by pants? What kind of car has a vanity table behind the front seat? Where is the passenger seat? Maybe it's covered by her, but a lot of the background and context still seems off. Still, the people at least look human and not like plastic anymore. So as we round out, let's ask, is there anything that ImageGen, I think, does distinctly better than Nano Banana right now? And while my answer is no, there's no one use case where I thought, just in every test that I tried, ImageGen crushed Nano Banana Pro or anything like that, there are four areas right now, with a fifth potential bonus area in the future, where I think

Starting point is 00:16:53 ImageGen may be a desirable alternative to what Nano Banana can do. First up, let's talk infographics. One of the incredible things about Nano Banana Pro when it was released is that all of a sudden this new capability of making infographics from text came online. I'm sure that you have seen a ton of these floating around the internet, and indeed, that ubiquitousness and commonality of style is exactly why I think in some cases you might want to use ChatGPT Images instead of Nano Banana to make your infographics, for the simple reason that they don't look like a Nano Banana infographic, which already has a particular flavor and style that people can spot from a mile away.

Starting point is 00:17:31 I dumped in a recent episode transcript to get an infographic based on it, and both models were able to do this, although they each had their own quirks. As it often does, Nano Banana's first iteration gave a bunch of citation references, even though those are completely useless and wasted space on a visual infographic like this, whereas ChatGPT Images just had a few little mistakes here and there. For example, in the three biggest barriers to agentic AI section, it only has two barriers. There were also some random spelling mistakes like bigger being spelled BIG, GER. Now, perhaps the better approach than using ChatGPT Images is just to try to prompt your

Starting point is 00:18:07 way out of the standard look of Nano Banana Pro, but my point here is that you at least now have a competent visual alternative. I might add to this use case things that need really high text fidelity. That was one of the things that OpenAI called out in their announcement post, and I did some tests around that as well. I asked for an over-the-shoulder shot of Abraham Lincoln sitting at his desk writing the Gettysburg Address, make the entire address readable, although in this case I found both models able to do it. So once again, we're back into stylistic preference area. A second area where I think genuinely,

Starting point is 00:18:39 ChatGPT Images right now might have an edge, is around hyper-precise instructions and complexity. I took this six-by-six grid idea and really ratcheted up the complexity. I said, make a six-columns-by-six-rows grid of Lovecraftian artifacts and entities where each cell contains exactly one distinct illustration centered within its square and not overlapping grid lines. Overall style is 1920s pulp illustration meets occult manuscript. Inked line work, muted sepia and sea green tones, subtle paper grain, no modern elements,

Starting point is 00:19:08 no text anywhere in the image. And then just to add another layer, I actually precisely gave it everything I wanted in all 36 squares. It did just a phenomenal job. There wasn't a single square that didn't have a strong, competent version of exactly what I asked for. Nano Banana Pro's version of this was an absolute mess. Instead of a 6x6 grid, I got an 8x5. It didn't follow the overall instructions as well, and tons of the individual squares were just out of nowhere. Now, of course, this is just one test, but I noticed a couple others also preferring

Starting point is 00:19:42 ChatGPT Images for some of these hyper-precise or complex instructions as well. Ethan Mollick writes, I tried something fun that worked better with ChatGPT image generator 1.5 than Nano Banana Pro. Point and click adventure game me, you are the parser, make images as the output and take in commands, make the world super interesting, keep track of inventory, state, et cetera. So you can see it basically creates a screenshot from a video game, and then Ethan prompts it to go to the next shot in the game. Look at the laser. Cover the laser with map and inventory. Run through the portal. ChatGPT did a really good job with this. Nano Banana Pro did not. In its first attempt,

Starting point is 00:20:19 the second image was completely different than the first scene, and then it just completely bowed out, and in the second attempt it sort of did it, but with a much, much harder time. Then, of course, there was Peter Gostev again, who tweets, I know people like Nano Banana, but I have some important needs that it just cannot meet. His prompt was create a square image of a hand with six fingers, a wall clock showing 8:22, a glass of red wine full to the top. Nano Banana Pro had a normal hand, a clock at 7:58, and a wine glass that was mostly but not entirely full, whereas the new image gen model had a completely full wine glass, 8:22 on the clock, and seven juicy weird fingers. A third area where I think you might prefer or at least want to

Starting point is 00:21:02 test ChatGPT Images as opposed to Nano Banana Pro is for aesthetically focused and higher taste prompts. Flower shops are a couple of examples where I think that the GPT Images version is just a big step up visually from the Nano Banana version. Here's another example with a logo. And Aziz AI found something similar. The prompt that he tried was create a clean look website in Apple style for Nike in a 4:5 aspect ratio. He said, The winner was GPT in aesthetics of UI and understanding the prompt. Now, I will say very clearly here that the point that I am trying to make is not, especially in this case,

Starting point is 00:21:39 that I think that ChatGPT Images will always be better. It's that because these models are both so high-end now, when you are trying to find something that matches your vibes and reaches the level of high taste that you're going for, you now have a couple of options. Images is, in some cases, going to be better and in some cases going to be worse. But again, that means you've gone from one option to two options, basically overnight. The fourth thing that I want to mention in terms of an area where ChatGPT Images excels as compared to Nano Banana is the actual interface for using it. And I think this reveals quite a bit about how they're imagining usage of this tool. Certainly myself, and I'd be willing to bet

Starting point is 00:22:16 many of you, are coming at this conversation from the standpoint of a business or power user. You want these fine-grained editing controls. You're imagining how you can use this for your solopreneur business. But I think OpenAI is imagining that a lot of the usage of this is in fact just going to be people messing around and having fun. Whereas with Gemini, there's absolutely no difference when you're creating an image other than you say create image, in the ChatGPT web app now, there's a whole different section with slightly changed visuals and a whole lot more options. In addition to your standard text prompt field, you also have a row of styles underneath that you can try on an image, sketch, holiday portrait, dramatic, plushy, baseball bobblehead, etc.

Starting point is 00:22:56 Then below that, they also have a panel of ideas to just discover something new, like creating a holiday card. What would I look like as a K-pop star? Me as the Girl with the Pearl Earring. And you get the sense from this that they want to solve the blank slate problem and get people messing around with this not for a business purpose but just for fun. I'm sure it's not lost on them that one of their big moments of user growth, if not their biggest moment of user growth ever, and certainly in

Starting point is 00:23:20 2025, was when we got the Ghiblification trend where everyone turned everything into a Studio Ghibli image. These sort of interface options are very clearly aimed at the average user, who isn't thinking about business outcomes and ROI, but is just there to have some fun. Given how much of ChatGPT's usage is regular everyday people, I can see why they're making that bet. So that is four areas where I think you might want to try ChatGPT Images, either instead of or at least in addition to Nano Banana Pro. The bonus, however, and fifth future area, is of course when you want to make Mickey or Moana or a Disney character.

Starting point is 00:23:57 Now right now, ChatGPT Images is much more locked down in my tests at least than is Nano Banana. I gave the prompt Sam Altman water skiing behind a boat driven by Andy Jassy. This obviously relating to the news that OpenAI might be doing a deal with Amazon. From Gemini, I got this cool Ralph Steadman-looking image. From ChatGPT, I got this. The image generation request did not follow our content policy. Of course, we just learned that OpenAI and Disney had done a deal, a deal that will explicitly

Starting point is 00:24:24 bring Disney's characters into Sora. If that extends into image generation, it could be a big deal as well. Simon Smith again writes, If OpenAI and Disney surprise everyone by allowing character generation with the launch of Images V2, pretty sure it will spark a ton of ChatGPT use over the holidays. Parents alone will burn up GPUs inserting characters into holiday messages for their kids. Now, one thing Simon references there is V2. Remember that this is version 1.5, and people are expecting a lot more in the relatively near future from an even better image generation model.

Starting point is 00:24:55 OpenAI staffers are indeed suggesting that this is just the start and that we are in for more image generation updates in the future, which, as I said right at the beginning, is nothing but good news for us consumers. So, friends, that is my first look at ImageGen 1.5. Hope this was useful. Certainly if you haven't yet, get in there and start creating. It continues to come early as we get more and more AI toys. For now, that is going to do it for today's AI Daily Brief. Appreciate you listening or watching, as always. And until next time, peace.
