The AI Daily Brief: Artificial Intelligence News and Analysis - 25 Things Nano Banana Pro Does That AI Couldn't Before

Starting point is 00:00:00 Today on the AI Daily Brief, 25 things you can do with Nano Banana Pro that you couldn't do with AI image generation before. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick notes before we dive in. First of all, thank you to today's sponsors, KPMG, Rovo, Robots and Pencils, and Blitzy. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief, or you can subscribe on Apple Podcasts. To learn about sponsoring the show, send us a note at sponsors at AIdailybrief. Lastly, we are in the last couple days of the AI ROI benchmarking study. Check it out at roisurvey.ai, and thanks to all of you who have contributed so far.

Starting point is 00:00:44 Final quick note before we dive in, we are just wallowing in all of the fun ways to use this new model today. So we will be skipping headlines. Then, of course, over the weekend, we have a big think episode, and we will be back with our normal format on Monday. With that out of the way, let's dive in. Welcome back to the AI Daily Brief. The last two weeks have been an absolute embarrassment of riches. We got GPT-5.1, followed by Gemini 3, followed by GPT-5.1 Pro, and Codex Max, followed, it turns out, by Nano Banana Pro. And in some ways, in terms of immediate impact and change in your capabilities with

Starting point is 00:01:21 AI, I think that this image model might be the big one. Now, quick warning, if you are not already, you got to switch over to the visual version for this episode. You can find it on Spotify. There's also a version of it on the YouTube. Today we are talking about 25 things you can do right now with Nano Banana Pro, which is of course Google's latest image generation model. And importantly, this isn't just cool things that you can do. This is stuff that you pretty much couldn't do as of like three days ago with the previous image state of the art. Now by way of background, you might remember that a couple months ago everyone started going wild for the Nano Banana image model. Nano Banana was, of course, its codename, but it became so beloved that it sort of

Starting point is 00:02:01 stuck. Its technical name was, I think, Gemini 2.5 Flash image gen or something like that. In any case, what made Nano Banana so interesting to people was not that it had raw output that definitively beat other image models, it's that it was so unbelievably steerable. It provided for fine-grained editing in ways that really opened up new use cases. So much so that it got me thinking that we need a different type of heuristic or metric or eval when we look at a new model that's not so much about its performance on benchmarks, but is instead about what capabilities it unlocks. I proposed the idea for an unlock score, which would basically be exactly that,

Starting point is 00:02:38 a determination of what new possibilities a model opened up. And in any version of that, Nano Banana Pro would just score off the charts. Now, in terms of the capabilities, there are really two big things that feed into everything else in my estimation. The first is text representation. The difference between any other model putting words on an image and what Nano Banana can do is the single biggest jump that I've ever seen between models when it comes to image generation. Period, full stop.

Starting point is 00:03:07 Now, the second factor that combines with that to open up all sorts of new possibilities is the fact that Gemini is able to reason on top of image generation. So when you are inside Gemini, it is not two disconnected experiences. You don't just have to prompt an image; you can actually talk to the model and figure out exactly what you're trying to do. That reasoning on top of image generation once again opens up totally new possibilities. Finally, a third factor, which should be mentioned as well, is the incredible fidelity that it has to whatever edits you want to make,

Starting point is 00:03:38 all of which adds up to this being a model that is going to turn you from an average image generator to an absolute professional. In a lot of ways, I think that the core story here, and the meta category that a huge number of these use cases fall out of, is the idea of visual compression. One of the big themes that you see across a ton of the early experiments that people are doing is taking a bunch of information and making it visual. In other words, the difference in how Nano Banana Pro can use text

Starting point is 00:04:08 is not just a change in scale, but a change in kind. The new unlock is not just being better able to use text, but being able to do it so well that you can start to tell visual stories. So a first example of what you can do with NB Pro is compressing lots of data into visuals. Take, for example, financial results. Deedy Das of Menlo Ventures took the entire Nvidia Q3 earnings PDF and generated a single-page infographic that had the key highlights like revenue, operating income, net income, gross margin, and was also able to highlight other key parts of the report, including segment performance and drivers and capital strategy and risks.

Starting point is 00:04:42 Justine Moore from a16z did something similar with Alphabet's Q1 earnings release. That one was even higher on chart density, with accurate charts showing revenue growth, operating income growth, as well as revenue composition. Accurately scaled charts are actually a great example of where the more sophisticated intelligence changes the nature of what you can do. To demonstrate this capability, Simon Smith made a chart of bananas, where he showed the difference in magnitude between 25%, 50%, 75% and 100%, and it actually gets it right. Simon said, I've previously tried to generate charts with image generators

Starting point is 00:05:17 and they failed to get the correct lengths for bars and columns. Kaushik Shivakumar found something similar. The Google DeepMinder wrote, An emergent capability of Nano Banana Pro that took me by surprise, the ability to generate beautiful and accurate charts that are to scale. In his case, he tested GDP per capita, and it not only is accurate, it really is aesthetic at the same time. Another approach to this idea of compressing lots of data into visuals

Starting point is 00:05:41 that I kind of think is going to become a trend, is the whiteboard trend. Pietro Schirano writes, Nano Banana Pro is wild. Here's my favorite use case so far. Take papers or really long articles and turn them into a detailed whiteboard photo. It's basically the greatest compression algorithm in human history. He then shows a compression of a 92-page PDF, The Llama 3 Herd of Models,

Starting point is 00:06:03 converted to a professor's whiteboard. And while obviously you can only fit so many details into a whiteboard, it's an incredibly impressive summarization. So again, just to reinforce the points about what's different here, we have two things coming together, better ability to handle and represent text, and the ability to reason on top of image generation to create truly native visual outputs. This leads to a fourth, very broad category of use cases, which we'll call educational.

Starting point is 00:06:30 Effectively, image generation can be an educational tool alongside LLMs now in a way that just was impossible up until literally a couple of days ago. We have infinite examples here, but to take a few, we have a visualization showing what parts of robotics are solved versus where there are key bottlenecks and hurdles. Clark Wimberley created a visualization explaining how a touchscreen works from tap to action. From the literal dead simple prompt, make an infographic explaining how a touchscreen works, Nano Banana Pro was able to put together a four-part visual that looks great and explains how the process goes from physical touch to sensing the interaction, to signal processing, to executing the command. Swyx went meta and asked

Starting point is 00:07:09 Nano Banana Pro to explain Nano Banana Pro. He got it in two very different ways. One is a sort of good-looking but ultimately academic infographic, while the other is a literal classic comic strip explaining what Nano Banana can do. I'm of course seeing a ton of parental use cases. Google's Jaclyn Konzelmann created this gorgeous tour of our solar system that looks like the type of poster that you'd put on a three- or four-year-old's wall. Speaking of three- or four-year-olds, I have a four-year-old who is learning to read and who is very, very much into construction equipment and construction tools. And so, of course, I had to put together an alphabet chart that was based on that theme.

Starting point is 00:07:46 And if you've ever tried to do something like this, you will know that while this feels like it should have been table stakes, it absolutely was not. It was nearly impossible to get something like this before, and pretty much genuinely impossible to get it with no errors and without specifying all the different elements. I didn't have to tell it to put an asphalt paver for A or a bulldozer for B. I just told it that I wanted an alphabet chart with these themes, and it figured out all the rest. As you can see, we're starting with these very broad use cases that are actually lots of use cases bundled together, but the next one we'll look at is a sort of subset of infographics, which is flowcharts.

Starting point is 00:08:20 Ethan Mollick prompted it, I need a flowchart for how to toast bread, make it as wacky and over the top and complicated as possible, and it did that in grand fashion. Now, Ethan was being silly, but this ability to actually show the representation of different visual elements as a flowchart is obviously incredibly valuable, and not just in a silly way. What if AI wasn't just a buzzword, but a business imperative? On You Can with AI, we take you inside the boardrooms and strategy sessions of the world's most forward-thinking enterprises. Hosted by me, Nathaniel Whittemore, and powered by KPMG,

Starting point is 00:08:58 this seven-part series delivers real-world insights from leaders who are scaling AI with purpose, from aligning culture and leadership to building trust, data readiness, and deploying AI agents. Whether you're a C-suite executive, strategist, or innovator, this podcast is your front row seat to the future of enterprise AI. So go check it out at www.kpmg.org.us slash AI podcasts or search You Can with AI on Spotify, Apple Podcasts, or wherever you get your podcasts. Meet Rovo, your AI-powered teammate. Rovo unleashes the potential of your team with AI-powered search, chat, and agents,

Starting point is 00:09:34 or build your own agent with Studio. Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work. Connect Rovo to your favorite SaaS app so no knowledge gets left behind. Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one. Rovo is already built into Jira, Confluence, and Jira Service Management Standard, Premium, and Enterprise subscriptions.

Starting point is 00:10:04 Know the feeling when AI turns from tool to teammate. If you Rovo, you know. Discover Rovo, your new AI teammate powered by Atlassian. Get started at R-O-V, as in victory, O dot com. AI changes fast. You need a partner built for the long game. Robots and Pencils works side by side with organizations to turn AI ambition into real human impact. As an AWS certified partner, they modernize infrastructure, design cloud-native systems, and apply AI to create business value. And their partnerships don't end at launch. As AI changes, Robots and Pencils stays by your side, so you keep pace. The difference is close partnership that builds value and compounds over time. Plus, with delivery centers across the U.S., Canada, Europe, and Latin America, clients get local

Starting point is 00:10:48 expertise and global scale. For AI that delivers progress, not promises, visit robotsandpencils.com slash AI Daily Brief. This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale codebases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task.

Starting point is 00:11:20 Blitzy delivers 80% plus of the development work autonomously, while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org. Visit blitzy.com and press get a demo to learn how Blitzy transforms your SDLC

Starting point is 00:11:43 from AI-assisted to AI-native. Next up, number six is visual tutorials. Callum McClark put together a chart of the correct bowing procedure for Taekwondo, dividing it into the four steps, as well as providing insight on when to bow. Once again, Callum didn't provide it a ton of information. When someone asked, he said,

Starting point is 00:12:07 the prompt was fairly simple. Generate me an infographic explaining how to bow correctly in ITF Taekwondo. Now, I didn't see any versions of this, but can you imagine how valuable this would be for instructions on assembling something? Another sort of separate category of instruction that a lot of people are experimenting with is visual recipes. Chubion X built a chart showing how to make the perfect cardamom tea. Vittorio created a step-by-step guide for cooking the perfect pasta. Anatomical and technical drawings were a huge theme. The JSON prompts account shows a bunch of Pokémon anatomy drawings, including Squirtle, Bulbasaur, Charmander, and Pikachu himself. Another use case, which I think we'll see a ton of, is taking one type of media and turning it into another type of media.

Starting point is 00:12:51 Shopify CEO Tobi Lütke took a video of a speech that he gave a number of years ago to his team and turned it into a rich, complex visualization, which is something you better believe I'm going to try with the transcripts to this show. Another subgenre of technical drawings that we're seeing people experiment with is blueprints. The AI for Success account wrote, it did not just create the image. It first read the blueprint properly and then created the final output with every small detail. Another great representation of the power of true multimodal understanding. Sort of related to that is another use case that I think is going to be highly commercialized,

Starting point is 00:13:23 which is virtual staging. Justine Moore from a16z gave it a set of three pieces of furniture and said, stage a living room with this couch, table, and two chairs. It executed the output very well, and Justine wrote, The first iteration of the model was good at this, but I find the new model is much better at retaining textures and asymmetry and unique features of objects. Alcine went farther, writing, Nano Banana Pro is making millions of interior designers obsolete.

Starting point is 00:13:47 I uploaded my floor plan and it designed the whole house for me and even generated real images for each room based on the dimension. Now, in my opinion, I think that this is once again an example of where professional interior designers are just going to be able to do more, faster, potentially for less and for a greater number of clients, but the capability increase is absolutely huge. Another capability that Google talked a lot about when they launched this was the ability to combine multiple people into a single photo. FOFR found that Nano Banana Pro could take up to 14 reference images, and that while it worked best with around five people, sometimes you could push it farther. If you've ever tried to combine people into an image, you'll know that AI models frequently have a hard time and end up kind of blending people's features together rather than putting them next to one another.

Starting point is 00:14:29 FOFR actually showed a couple examples of the different styles that you could combine all these reference images to create something with. A big theme here is really precise instruction following. Halim Al-Rasihi gave the model two characters, a style reference, and a sketch of an action pose that they wanted, and got the exact image that they were looking for. Now, this precise instruction following and precise editing is, once again, I think, going to be one of the most commercially viable and important unlocks of the whole model. The original Nano Banana was good at this. This was actually the core of where the unlock score idea came from, this ability to spot

Starting point is 00:15:02 edit photos, but Pro takes it to another level. Clark Wimberley took a photo of a man in a warehouse and prompted the model, this man just got a supplier price change request and looks concerned. The model makes the change in ways that look incredibly natural and not over-exaggerated. Clark also turned a White Claw into a glass of soda with a striped straw. Got to give a shout out to Prins as well, who took a handful of Magic cards, split between red and black, and made them all into black cards. Now, if you don't know Magic, I want to underscore a couple things about this that make it even more impressive than it seems. The first is that it was able to tell that based on what the user was

Starting point is 00:15:36 asking, it needed to change the mountain on the left to a swamp, which is the basic land associated with black in the game. That involves a whole different level of understanding and comprehension that wasn't there in the prompt. The second part is that the borders of different colors have different visual cues. And so the model knew that it couldn't just change the color of this pattern. It actually had to change the pattern to what black cards look like. Now, while this was just a demonstration example, that level of precise editing opens up such a crazy world of new opportunities. Speaking of fidelity to instructions and precise editing, the ad agencies are going to be absolutely salivating. One of the most common types of examples that people were sharing were product and brand shots.

Starting point is 00:16:18 Someone created high-fidelity advertising visuals for earbuds. Hedra Labs took its logo and put it on a billboard. Jacob Palsall took a set of reference product images and turned it into a magazine-style ad. Now, staying in the brand marketing advertising theme, a lot of people also experimented with logos. Now, this is one where I will say, for the sake of having some amount of skepticism, I still got to think that the logo outputs of Nano Banana Pro are, to put it uncharitably, tasteless and ugly, but I also haven't gone in and tried to get something really great out of it, and to give it credit and acknowledgement, most of the logos that it was trained on, I also think, are absolutely horrifyingly ugly. Still, bringing it back to the very impressive,

Starting point is 00:16:56 Pro isn't just able to generate a logo or a brand asset. It's able to generate bulk brand assets. Crystal Maria writes, one-shotted a brand and put it on merch with a low-effort prompt on Nano Banana Pro. She created a new chicken pizza company and designed a pizza box, a t-shirt, and a hat,

Starting point is 00:17:14 all with an integrated logo system that was consistent. Andrew Lane did the same for a matcha energy and collagen brand. Now, one thing that people noticed as they were doing this is that Google appears to have wound back the guardrails just a little bit. It's more comfortable producing images of people and owned IP. For example, folks were able to get the Star Wars and Disney logos really accurately. Now, whether this will persist is something I have fairly big questions about,

Starting point is 00:17:39 but the more that within reason they can just let people do those sorts of logo identities at least, I think the more use cases it opens up. Just a few more use cases before we wrap up. Tons of people were experimenting with movie stills. Viral AI advertiser PJ Ace wrote, Nano Banana Pro is the most cinematic model on the planet. I asked Gemini to generate photorealistic leaked images from the new Legend of Zelda movie, and this will change Hollywood.

Starting point is 00:18:02 Archit Rathie did the same thing with a Wallace and Gromit still, but was able to take it from multiple angles, calling it a leapfrog moment for AI filmmaking. Speaking of filmmaking, Nano Banana Pro's text capabilities also open up improved possibilities for using AI video as well. Nick Matarise writes, Step 1, upload an image or generate an image using Nano Banana. Step 2, use Nano Banana Pro to annotate the image.

Starting point is 00:18:26 His prompt was, add sketch annotations on top of this image explaining the camera movement. I want it to crane up and look down as an aerial shot. Step 3, use Veo 3.1's frames-to-video to bring it to life. Basically, the annotations on the image allow the video model to know what to do. There is a ton of media remixing, like people putting digital news articles on old newsprint, people taking contemporary logos and turning them fluffy, people taking photos of their kids and turning them into movie posters. In an impressive display of physics, Christopher Friant found that he could apply an image, in this case of Sydney Sweeney, to a dodecahedron, and FOFR again was able to take a meme

Starting point is 00:19:01 and turn it into Legos. Indeed, I think the meme potential here is pretty unlimited. I took the bass face kid meme, basically an image that people share when they really like a song, particularly in dance music circles, and I asked Nano Banana to turn it into a four-part visual scale where the face goes from normal to the most insane bass face. You can see here that I think that it absolutely nailed it, creating settings for normal, mild bass, intense, and insane bass face. What's clear from all of this is not just that the unlock score of Nano Banana is off the charts, but that it pretty fundamentally redefines how we have to think about image generation capabilities.

Starting point is 00:19:38 For those of you who have followed Ethan Mollick for a while, you'll know that he's used a similar test prompt for years to see how new image generation models fail. It's basically otters on a plane using Wi-Fi. He writes, tongue in cheek, I think my otters on a plane using Wi-Fi may be a saturated benchmark now that Nano Banana Pro can do this. The image is a set of white lab coat and glasses-clad otters, describing on a whiteboard why models previously had a hard time with this, with a gallery wall on the right, showing all of the previous generations. We are, in other words, in very new territory. Now, we'll explore a lot more about just what the implications of this are. For now, if you have access to Gemini,

Starting point is 00:20:17 I would highly recommend just going and spending a bunch of time playing with this. Try exploring not just interesting image generation, but something where you need to convey a lot of information density with visuals. I think you'll be impressed and I think it'll change in a good way what you think is possible with AI image generation.

Starting point is 00:20:34 For now, that's going to do it for today's AI Daily Brief. Appreciate you listening or watching as always, and until next time, peace.
