Limitless Podcast - ChatGPT Images 2.0: The Visual AI that Actually Thinks
Episode Date: April 22, 2026

OpenAI's GPT Images 2.0 is out, and it's pretty wild what it can do. Remember Ghibli-gate? Remember when it couldn't write words? Things are changing so fast these days.

------

🌌 LIMITLESS HQ ⬇️
NEWSLETTER: https://limitlessft.substack.com/
FOLLOW ON X: https://x.com/LimitlessFT
SPOTIFY: https://open.spotify.com/show/5oV29YUL8AzzwXkxEXlRMQ
APPLE: https://podcasts.apple.com/us/podcast/limitless-podcast/id1813210890
RSS FEED: https://limitlessft.substack.com/

------

TIMESTAMPS
0:00 ChatGPT Images 2.0 Unveiled
2:24 Realism
4:42 Design Applications
6:59 Professional Use
8:55 The Risks of Misinformation
16:27 Benchmarks
18:37 Digital and Physical Worlds
22:55 The Future of Visual AI
25:03 Closing Thoughts

------

RESOURCES
Josh: https://x.com/JoshKale
Ejaaz: https://x.com/cryptopunk7213

------

Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures
Transcript
Just yesterday, OpenAI released ChatGPT Images 2.0.
And the model blew my mind.
I was up until 2 o'clock last night playing around with it because of how powerful it is.
As I was watching Sam announce this model, he was talking about how image gen wasn't really
that important to him.
He felt like they already had a good image generation model.
When he was presented with the outputs of this one, he had his holy shit moment.
It's actually really phenomenal.
And through trying it ourselves, we have confirmed that it's actually true.
I mean, we've frequently used Nano Banana as the go-to default image generator,
but now this is getting close to being entirely indistinguishable from reality.
And we have a series of examples that we're going to show you that are probably useful
for your actual applicable life, things like interior design or generating comics or generating
sales graphics.
I don't think there's anyone who wouldn't find a beneficial use case for an image
model that is as good as this one.
So let's get into the actual announcement.
Let's walk through the examples.
It's pretty amazing stuff.
Around midday yesterday, OpenAI tweeted this very mysterious post and it goes,
this is not a screenshot, which is weird because it looks like a screenshot of someone's Mac desktop,
except this is completely AI generated.
And this was the precursor to their official announcement: ChatGPT Images 2.0.
It's their new image model, and it absolutely blows every other image model out of the water.
And I don't say that lightly.
It is number one across every single image benchmark.
It's beaten Nano Banana 2, and the Chinese image gen models just don't measure up.
So what are some of the new things here?
Well, the fidelity and quality of these images
are incredibly high.
You're seeing a demo video here where we have a chameleon
in various different positions.
Rendering text is typically such a hard thing for image models to nail, especially within the AI world. They would jumble up the letters or misspell things. Now we have that completely and utterly resolved. And so you can see some of these examples come to life here. For example,
look at the fidelity of this image of rice. Typically, this would just look like a garbled white
mass. And now you can individually see each grain, which is pretty nuts. And then you have examples
which are a little scarier, where this looks like a real photo of a handwritten note in someone's personal style, but it is very much completely AI generated. So you can imagine this could be useful for various nefarious purposes, some more malicious than others. But there's a ton of
different examples. And we want to get straight into it, starting with ones that we've generated
ourselves. There's one around furniture, right? Yeah, but I actually want to start with the rice one,
because you mentioned with the rice that it's precise enough to show the grains of rice,
but it's also precise enough to write a single word on a grain of rice. And that fidelity is new.
So what I did is I actually went to ChatGPT myself and tried to emulate this. I asked it to create a piece of rice with the words "GPT Image 2" written on it. And this was the output that I got. Actually, this was the first output that I got. And I spent maybe five minutes trying to find the grain of rice. I don't think it worked. So I asked it to draw a box around the grain of rice, and it drew a box and then actually etched the words in the middle. So there are some edge cases that don't quite work. I mean, that grain of rice was not in the original image.
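If you want to try this kind of prompt yourself from code rather than the chat UI, a minimal sketch using the OpenAI Python SDK might look like this. The model identifier "gpt-image-2" is an assumption based on the announcement (check the official docs for the real name), and the prompt is just a paraphrase of the demo:

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# "gpt-image-2" is a guessed identifier; substitute the documented model name.
result = client.images.generate(
    model="gpt-image-2",
    prompt="A macro photo of a single grain of rice with the words "
           "'GPT Image 2' etched onto it, resting on a pile of rice.",
    size="1024x1024",
)

# gpt-image models return images as base64; decode and save to disk.
with open("rice.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```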
But for the ones that do work, it's pretty incredible. And you mentioned furniture. I am currently living in an apartment that could use a little extra furnishing. This, unfortunately, is not what my apartment looks like. This is a much nicer variant, something I aspire to. So what I have prepared here is a reference image for ChatGPT along with a prompt of what I would like it to do. And that involves doing things like adding lamps and adding different furniture, basically swapping out the existing furniture in this living room and moving it into a totally new vibe and style that I think I would appreciate more.
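For reference, the same SDK also exposes an edit endpoint that takes a reference image plus instructions, which is roughly what's happening here in the chat UI. A sketch under the same assumptions, with placeholder file names and a hypothetical prompt:

```python
import base64
from openai import OpenAI

client = OpenAI()

# Feed the current living room photo plus redesign instructions.
result = client.images.edit(
    model="gpt-image-2",  # assumed identifier; see the official docs
    image=open("living_room.png", "rb"),
    prompt="Keep the room layout and architecture exactly as is, but swap "
           "the furniture for a mid-century style: add two floor lamps, "
           "a walnut coffee table, and a low-profile sofa.",
)

with open("living_room_redesigned.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```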
So while that's thinking, I guess we can kind of get into some more of the interesting parts of this model.
Well, I have an example that I actually have ready to go here.
I was kind of obsessed.
I don't tell anyone this.
I was obsessed with manga as a kid.
And so I was like, you know what would be cool?
If we could turn our show, you and I, Josh, into a manga comic.
So I created this detailed prompt, and I gave it this beautiful photo of us.
Oh, look at those handsome guys.
Look at these very, very handsome guys.
And I basically asked ChatGPT to generate the prompt for me.
So I gave it a rough idea of how I wanted to create the scene, as it were.
And it created a very detailed prompt with stylistic references, details, stuff that I wouldn't know because I'm not a storyboard artist.
I'm not a manga creator.
But funnily enough, I have an AI that can do it for me.
So I don't know if anyone is paying close attention to this storyline here.
But if you're not, that's great because I want to show you the end output.
So as you can see, very long prompt, and this is the finished result.
So what you are looking at here is Josh and I, let me explain this.
Josh and I have been filming a podcast.
As you can see, we've got our setup over here.
But then we look out the window and there is a shadow.
And we notice that it is Sam Altman, Godzilla-sized, coming down upon us, terrorizing New York City.
I'd say the time estimate is roughly five years in the future.
even three. I don't know how quickly AGI gets here. We grab our weapons. It is Claude. This is not a sponsored video, by the way. I just came up with this randomly. And it shoots out prompts that wrap around Sam Altman and eventually stop GPT-5 from taking over the world. Now you know what's
going on in my head. But if you just notice this, like look at the fidelity of this. This like took
five seconds to create the prompt and then another two minutes to create the actual image. Look at
the fidelity of this. Like the writing is all accurate.
This would cost like thousands and thousands of dollars and weeks, maybe months of time to actually create from scratch.
And this did it in a bunch of seconds for a couple of cents.
Like it's pretty impressive.
Oh, it's so good.
So if manga isn't your thing, we have the furniture example.
It's ready to go.
So here I have the original that we're seeing on screen right now.
This was the original living room.
I fed it the prompt.
And here is the new one.
It totally maintained the integrity of the room whilst swapping out just a few key pieces of furniture to change the vibe.
And I think it's a testament to a practical use case that a lot of people might have: they want to design things, they want to make things look good while maintaining the personalized fidelity of whatever space it is. If you have a piece of clothing (I know this works for try-ons), it's really good at maintaining continuity throughout these images. So I thought that was a pretty interesting thing. If you have an apartment, or a closet full of clothes, you can just lay those clothes out, take a picture of yourself, take a picture of the clothes, ask it to dress you, ask it to redo your living room, whatever it may be. It's super powerful and works fairly quickly. I mean, this output took maybe a minute to generate.
And for those wondering, this is actually available to all users of ChatGPT, I believe. Very limited usage for free users, but if you have the Plus plan for $20 a month, you can just go off and start creating images, and they will look just as good as this one.
Yeah, I mean, if you're a professional who has been toying around with using AI, but it's just never been good enough, it's always had some form of error, whether minor or big, now we have a tool that actually works for you. So if you're a designer, if you're a floor planner, there's a bunch of other examples
I'll show here. This becomes a practical tool. Like GPT Images 1 was very much a novelty and a toy. It was
fun to see Studio Ghibli versions of ourselves, but now you can use this to create serious things. Now, not all use cases are good. If you're like me, I use social media to disseminate a lot of the breaking news that happens in the world of technology, AI, or whatever it might be, but we have now reached a point where we can't necessarily believe everything we see. And Images 2 from ChatGPT doesn't make that any easier.
What you're seeing on the screen right now is not an official take or update on the Bloomberg terminal.
That is also not my desktop monitor.
This is completely AI generated.
And you can probably tell parts of this kind of give it away. It's a little too zoomed in, unless, of course, you've changed the default settings in your Bloomberg terminal.
But some of these things are really good.
Like, this is exactly where this is on the Bloomberg terminal.
The percentage mark isn't that large on the actual thing.
But it's got all the sections pretty much nailed.
So you know the model looked up official Bloomberg terminal layouts and recreated them. But it added a completely fake bit of news. You could change that bit of news to represent real news, but it would still be fake. So there's a lot of avenues here for misinformation or disinformation. Not entirely accurate, but somewhat accurate.
You can imagine the kind of social media frenzies that this would create if people were
to believe and buy into these things.
Like imagine if you read an announcement that wasn't actually real, bought a stock, and then everyone realized it was fake and the stock crashed. You could end up losing money. You could fake data. There's a lot of avenues this could go down.
Yeah.
There's two points on this.
One is that like we're at the point now where even if you pixel peep, it is almost indistinguishable
from real life.
You can't really tell what is AI generated and what is not.
And as that kind of gap converges, I imagine it will create a lot of chaos where there's just
no way to tell what's real when these images are so good.
The second thing I'll mention is that this model in particular, anytime it's asked to generate a visual asset of a piece of software, is for some reason exceptionally good at understanding the nuances of every frame of that software.
Last night, I had it do DaVinci Resolve, which is what I edit a lot of videos in.
I had it emulate Photoshop, and it got every single slider down to the correct pixel, which leads me to believe there was some training customization around this software in particular.
And you have to ask the second-order question: why is it so good at all of this software?
And I guess the answer for me is,
well, it's probably because they want their agents to understand how to navigate it
and then eventually emulate it and then eventually replace it.
And this nuanced understanding of how everything works is training for
the image generation model, but also training for just, I mean, the future of what these agents are
going to look like. So there might be some hidden stuff going on behind this image generation model
as well. So back to the demos. In addition to these capabilities, we have another one teed up right here, which is to create a premium infographic poster. Another strong suit of this model is text: how well it's able to render text that looks lifelike and accurate, and how it can create a storyboard, if you will, or a poster. It can create multiple outputs. What I've asked it to do here is create an editorial infographic, and this is the first time I'm actually seeing the output. And it seems pretty cool. So this is for Limitless, as you are familiar with.
And it kind of walks through our week in review. So the things that Limitless mentions, this is the poster that serves as the weekly roundup, the weekly review. It is pretty good. Is it accurate? Yeah, I'm curious. You check the accuracy, I'll check the QR code, see if that works. Because word on the street is that QR codes work pretty well.
Wow. I might need to replace the entire Roundup newsletter, Josh, with something like this. Just a quick glance, a quick take. You can imagine how this can carry over to other applications, right? If we want to juice the newsletter up a little bit, we could just create a graphic with one prompt by feeding it the context of everything we spoke about to get this detailed infographic. This also applies to educators and people who are teaching things. It's really easy to make graphics on particular lessons or mathematical equations or diagrams or anything you want visually represented. It's exceptionally good at that. So I thought this demo was kind of fun. It creates the Limitless week in review as a poster that's printable. The QR code does not work, but I've asked it to make the QR code scannable. So while it finishes that up and we test it, maybe we can go on to another example.
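As an aside, you don't need a phone to test whether a QR code in a generated poster actually scans. A quick sketch using the Pillow and pyzbar libraries, with a placeholder filename:

```python
from PIL import Image              # pip install pillow
from pyzbar.pyzbar import decode   # pip install pyzbar (needs the zbar C library)

# Load the generated poster and look for any decodable QR codes.
results = decode(Image.open("limitless_roundup_poster.png"))

if not results:
    print("No scannable QR code found; the model only faked the pattern.")
for r in results:
    # Each result carries the symbol type and the decoded payload.
    print(f"{r.type}: {r.data.decode('utf-8')}")
```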
Yeah, I was just going to say before we move on, the educational point is a very pertinent one, mainly because if you're like me, you can read as much text as you want, but sometimes a visual that summarizes everything really helps. You can now plug an entire book's worth of text into a single prompt. A lot of these frontier models now have a million-token context window, which is several novels' worth of text. So imagine you're trying to learn about something and you want the key points: you can not only ask the AI to summarize things and give you a bullet-pointed list, you can get it to transform them into an illustrative poster that you can look at at a glance before you go to bed and learn something brand new. So I can imagine this being used in science as well. Back when I was doing my biology degree, I remember we used to have these research poster conferences. The posters were, I don't know, A1 size, absolutely massive, with so much condensed information, and they took me weeks to make. The fact that I now have a tool where you can probably just plug in a bunch of papers and have it extract the right information and put it out in a very visual way just blows my mind. We are condensing a lot of frontier research and education tools with this one simple update. It's very, very cool.
But to move on to one more example that we generated: one thing that's cool about Images 2 is that you can play around with one image and turn it into several different aspect ratios. So what we have here is an individual, I don't know who this individual is, but it has generated him, looking out onto the greatest city in the world, in my opinion, New York City, in a nice little sunrise or sunset, I can't tell which. But as you notice, it gives us different aspect ratios of the guy. Over here, we see him on the left. Over here, we see him from a distance. Over here, we see a panoramic view where we can see him looking out onto, what is this? The Brooklyn Bridge. Some of the details do get a bit blurred at the wider aspect ratios, but it's just very impressive. And you could start creating storyboard sequences from this, or pitching visuals for whatever idea or concept you want to make. You could use this in the product realm if you're trying to figure out whether a model looks good advertising your product, say a coat, in a particular setting. Or it could be advertising something completely different. It's very cool.
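If you wanted to reproduce this aspect-ratio exploration from code, one plausible approach is looping the same prompt over the size options the images API exposes. Same caveats as before: the model name is an assumption, and the sizes below are the ones the current image API documents, which the new model may or may not extend:

```python
import base64
from openai import OpenAI

client = OpenAI()

PROMPT = ("A man seen from behind at golden hour, looking out over the "
          "New York City skyline toward the Brooklyn Bridge.")

# Square, landscape, and portrait sizes supported by the current images API.
for size in ("1024x1024", "1536x1024", "1024x1536"):
    result = client.images.generate(model="gpt-image-2", prompt=PROMPT, size=size)
    with open(f"skyline_{size}.png", "wb") as f:
        f.write(base64.b64decode(result.data[0].b64_json))
```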
So how does this model perform so well? That's the question, and one of the novel breakthroughs this image gen model has that others don't is its detailed reasoning capability. This is an image generation model that will think before acting and will reason through the steps required to get the best image output. Generally, image generation is pure inference: you give it context, you give it input, and it just spits something out. This one actually reasons through why it's doing what it's doing. And that's part of the reason why, even though you're not necessarily giving it the best prompt, it's giving you a really powerful output.
And I have another fun example here of more comic books that you can make. This was a single prompt, and it generated an entire comic book with a really accurate character carried throughout. Another fun feature is character continuity: you can generate a character and it will persist throughout all the images. And then one last example that we have here is for
anyone who's involved in social media or just creating any sort of marketing material. I asked it to create
an ad package for a matcha shop in Williamsburg called Sage Bird. And Sage Bird now has a full kit of various aspect ratios to be posted on any platform, and it all looks photo-accurate. If you'll notice, there's even a street sign that says Bedford Avenue, which is a street in Williamsburg,
which is very funny. So I think the fidelity, the quality, the capabilities of this model
are really endless. And again, the constraint is your imagination with how far you can push this
thing because it's just, it's so powerful. I had so much fun using this. I must have generated at
least 100 images just in the last 24 hours. It's so fun. I recommend everyone go and try it and figure out what use cases are best for you.
So a question that came to mind immediately is: okay, it's good, but how does it compare to its competitors, primarily Nano Banana 2 from Google, which previously held the number one spot here? Now, if you look at this image over here, it's not just number one, it's number one by a country mile. I think it has something like a 150-point lead on Image Arena. If you don't know what that is, it's the go-to benchmark for testing these image models. GPT Images 2 isn't just number one overall. It is number one across every single category measured within the benchmark. By a long shot. By a long shot. So it has a very distinctive lead.
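For a sense of what a lead like that means: arena-style leaderboards use Elo-style ratings, where the rating gap maps directly to an expected head-to-head win rate. A minimal sketch of the standard Elo formula (assuming Image Arena follows it, which we haven't verified):

```python
def elo_win_probability(rating_gap: float) -> float:
    """Expected win rate for the higher-rated model under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400))

# A 150-point gap implies winning roughly 70% of head-to-head comparisons.
print(f"{elo_win_probability(150):.2f}")  # ~0.70
```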
And if you're looking at this and you're saying, okay, well, whatever, people can orient benchmarks around this, so we don't know if it's real, I have a direct comparison for you. The same prompt fed into GPT Image 2 versus Nano Banana Pro. And you can see that there are quite a lot of differences. You can see GPT Images 2 over here on the left. The lighting is much
brighter. The fidelity is arguably a lot better. And as you can see, like, you know, there's more
expression on her face. She's smiling. And there's a lot more things in the background. Like, if you
look at the plants in the back, it's way more hyper-realistic and harder to create for an image
model. Now, if you look on the right, Nano Banana 2 is very good, but there's less complicated stuff going on behind the subject. The lighting is a little bit off. And on both sides, you can kind of tell that they are slightly AI generated. I would actually argue that Images 2, now that I'm looking at it for longer, has a glisten that just seems too glisteny, while Nano Banana 2 gets away with it because the lighting is a little flatter. But the point is, these models are getting way, way better.
And the examples keep coming, but it's not just visual flair. It's not only social media influencers who should pay attention here. You can start using this for very practical purposes. Now, there was this awesome example over here where a guy took an image of a book, and he said, could you generate me a barcode for this book? And it generated the barcode. And when you scan the barcode, it's basically an embedded link: it takes you to a page where you can then purchase the book.
Now, this is very impressive if you're trying to sell a particular product, especially a physical one. You no longer need to go through the complicated process of generating barcodes and getting them printed. You could feasibly create your own designed book cover, print it out, wrap it around your actual product, and it actually works. It works with your internal system. So I just thought this was pretty cool. Yeah, it's amazing. The clarity, and again, I think
this is a testament to the reasoning where it can actually reason its way through and generate an
accurate barcode in a world where it previously couldn't. So now not only can it make infographics,
but it could link these dynamic elements to real world artifacts, to a custom domain, to your book.
They're actually usable without needing to take it into Photoshop and take it that final mile.
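One sanity check worth knowing about before you print anything: EAN-13 barcodes (the format used for ISBNs) carry their own check digit, so you can validate a generated code's digits yourself. A minimal validator; the ISBN below is just a known-valid example, not the book from the demo:

```python
def ean13_is_valid(code: str) -> bool:
    """Validate an EAN-13 barcode (e.g. an ISBN-13) via its check digit."""
    if len(code) != 13 or not code.isdigit():
        return False
    # Digits in odd positions weigh 1, even positions weigh 3 (1-indexed).
    s = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(code[:12]))
    return (10 - s % 10) % 10 == int(code[12])

print(ean13_is_valid("9780306406157"))  # True: a known-valid ISBN-13
```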
And that's like a really cool unlock. We have another example here as well: the front page of the New York Times, which of course is entirely fabricated, or at least partially. So this isn't a real article. This isn't a real image of a paper,
but the information on it is real. If you actually dig in here and read it, all the information about OpenAI unveiling GPT Image 2 is accurate. It pulled it from the blog post, which you didn't have to provide; it independently found it, reasoned through it, pulled out the most important points, and then wrote it in the stylistic manner of a New York Times writer. So you can start imagining what this could do for press and media. If you are a reporter, you might be thinking, huh, so you're telling me I could just feed this the bullet points I want it to make, and it could write the article in my voice, in my DNA, in the style I like to write in? That's amazing.
You could also ask it to generate the image for you. So there's this meta approach where you're talking about the product, but then you use the product to generate an example image that you then put in. This is, of course, also generated by Images 2.
So there's a lot of applications here.
Again, I mentioned earlier, disinformation is a very real thing. So you can imagine people sharing fake news articles about things that aren't real, which might sway markets or misinform people. But cool, nonetheless.
Yeah.
And then there are more examples for anyone involved in architecture or doing floor plans. I mean, this one was cool, where you fed it an image of a house and it generated a floor plan.
But the next example, I think, was even cooler, because this was a digital rendering of a large building that had all of the specs listed next to it. And using that spec sheet and that 3D rendering, it created a fully rendered floor plan that you can actually send to an architect to make blueprints and build the building. I'm not sure if this is up to code. I'm not an architect. But I imagine you can probably iterate your way through this with a proper architect to get it compliant and up to spec if it's not already.
So there's this unbelievable unlock that happens for pretty much any profession that's generating any sort of image.
All you need to do is put a stamp on the bottom.
It looks like it already stamped it with some fake stamp of approval. But I'm sure if you do this type of work, you can put your own spin on it and throw your own stamp on there.
If any of you are architects listening to this, I encourage you to try this out, because I'm actually curious whether this is accurate, and if not, how accurate it is. Because architects in training spend, like, seven years at school, which is just insane. They have to understand the physics behind the buildings they're designing. And I'm wondering, is this physically accurate? Do the estimates make sense? Or is this completely made up and we still have a long way to go?
It looks legit to me, but then I'm not an architect.
So if you're listening to this, let us know.
There's another cool thing here where, again, as I mentioned earlier, if you are a visual learner, sometimes there's just too much information. You can create these posters bracketed by a particular subject, and it kind of splits it up. So here we have all the things going on in AI: AI models and agents, robotics, semiconductors. And you just have images which explain the start-to-end process of creating these different things and what they actually do, with a few words underneath, which I thought was cool.
And then there was this final example over here from Matt Schumer, which I can relate to because I formerly worked at a Big Four consultancy, and we had to create slide decks, and it would take so long because you had to move things in a specific way or reformat the text. Matt Schumer one-shotted an entire slide deck by just providing it a bunch of information, and it created it in the style of Spotify, by the looks of it.
So very cool, loads of different applications,
and I can't wait for more people
to actually use this for professional purposes.
Yeah, the model's awesome, and I guess the ask is to share whatever you're using it for. Because again, those prompts, those examples, are the only limiting factors on what this can really do. Because it has the reasoning, because it's so capable, because it has the pixel-perfect fidelity, it's really just a matter of massaging it with prompts to get the output you want. It's not really a limitation of the model anymore. And to Sam's point early in the episode, it seemed like image gen was great before. Now this is just unbelievable. I can't imagine going back to Nano Banana Pro knowing that this exists. And it's
just a testament, again, to how fast we're going, and to what the downstream implications may be in the future. When you can generate infinite images for cheap that are pixel perfect and indistinguishable from reality, what type of downstream effects does that have on every visual artifact we interact with on a day-to-day basis? I mean, there's no way you can be sure. And this has a lot of implications that I'm not sure we're fully aware of now, but they will surely become known as we navigate through this. It creates a weird dynamic that seems a little uncomfortable. Now I have to navigate the internet with a strong filter just to parse what's real and what's not. I'm curious whether this tool can be used to generate
visuals that humans hadn't necessarily thought of before. As the AI becomes smarter and is trained on our prompts, and largely on our flaws, you can ask an AI to generate a detailed prompt and then feed that prompt back to it, because we don't know how to prompt it well ourselves. It can do the same with images, reasoning along the lines of: I get that you probably missed this point, so maybe if I create this visual in this particular way, one you hadn't thought of, it breaks new ground for you. So I wouldn't put it past this model, the model we have today, to generate a visual artifact that will soon be kind of groundbreaking for humans to use. Maybe it's not a poster, maybe it's not a slide deck, maybe it's something completely new that we haven't seen before. Pretty exciting stuff. Yeah. So that's ChatGPT
Images 2.0, the newest and hottest image gen model in the world. I encourage anyone to try to displace it, because it would be amazing if something beats this, but it's worth trying.
It's worth sharing what prompts give you the specific outputs that you find helpful or interesting. The use cases are the currency. Please share yours in the comments section down
below. If you enjoyed this video, don't forget to share it with a friend who may also want to
generate some images. Perhaps they're involved in social media. Perhaps they just want to redesign their hypothetical
apartment. Whatever it may be, it's fun. It's worth testing. It's worth trying to just like
feel it and understand the intelligence. But yeah, I think that's pretty much it for today's
episode. Do you have any final thoughts here? Nope. If there's one request that I have,
I want to see the images that you generate, so try to surprise us. Try a use case that we haven't covered in this particular video, because I'm curious about the creative uses people will find for this.
Our social media profiles will be linked below. DM us there. And yeah, I look forward to seeing what you make.
Awesome.
Cool.
All right.
We'll see you guys in the next episode.
