Limitless Podcast - ChatGPT Images 2.0: The Visual AI that Actually Thinks
Episode Date: April 22, 2026

OpenAI's GPT Images 2.0 is out, and it's pretty wild what it can do. Remember Ghibli-gate? Remember when it couldn't write words? Things are changing so fast these days.

------

🌌 LIMITLESS HQ ⬇️
NEWSLETTER: https://limitlessft.substack.com/
FOLLOW ON X: https://x.com/LimitlessFT
SPOTIFY: https://open.spotify.com/show/5oV29YUL8AzzwXkxEXlRMQ
APPLE: https://podcasts.apple.com/us/podcast/limitless-podcast/id1813210890
RSS FEED: https://limitlessft.substack.com/

------

TIMESTAMPS
0:00 ChatGPT Images 2.0 Unveiled
2:24 Realism
4:42 Design Applications
6:59 Professional Use
8:55 The Risks of Misinformation
16:27 Benchmarks
18:37 Digital and Physical Worlds
22:55 The Future of Visual AI
25:03 Closing Thoughts

------

RESOURCES
Josh: https://x.com/JoshKale
Ejaaz: https://x.com/cryptopunk7213

------

Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures
Transcript
Just yesterday, OpenAI released ChatGPT Images 2.0.
And the model blew my mind.
I was up until 2 o'clock last night playing around with it because of how powerful it is.
As I was watching Sam announce this model, he was talking about how image gen wasn't really
that important to him.
He felt like they already had a good image generation model.
When he was presented with the outputs of this one, he had his holy shit moment.
It's actually really phenomenal.
And through trying it ourselves, we have confirmed that it's actually true.
I mean, we've frequently used Nano Banana as the go-to default image generator,
but now this is getting close to being entirely indistinguishable from reality.
And we have a series of examples that we're going to show you that are probably useful
for your actual applicable life, things like interior design or generating comics or generating
sales graphics.
I don't think there's anyone who wouldn't find a beneficial use case for an image
model that is as good as this one.
So let's get into the actual announcement.
Let's walk through the examples.
It's pretty amazing stuff.
Around midday yesterday, OpenAI tweeted this very mysterious post and it goes,
this is not a screenshot, which is weird because it looks like a screenshot of someone's Mac desktop,
except this is completely AI generated.
And this was the precursor to their official announcement: ChatGPT Images 2.0.
It's their new image model, and it absolutely blows every other image model out of the water.
And I don't say that lightly.
It is number one across every single image benchmark.
It's beaten Nano Banana 2, and the Chinese image gen models just don't measure up.
So what are some of the new things here?
Well, the fidelity and quality of these images
are incredibly high.
You're seeing a demo video here where we have a chameleon
in various different positions.
Rendering text is typically such a hard thing for image models to nail, especially within the AI world. They would jumble up the letters or misspell things. Now we have that completely and utterly resolved. And so you can see some of these examples come to life here. For example,
look at the fidelity of this image of rice. Typically, this would just look like a garbled white
mass. And now you can individually see each grain, which is pretty nuts. And then you have examples
which are a little scarier, where this looks like a real photo of a handwritten note in someone's personal style, but it is very much completely AI generated. So you can imagine this could be useful for various nefarious purposes, some more malicious than others. But there's a ton of
different examples. And we want to get straight into it, starting with ones that we've generated
ourselves. There's one around furniture, right? Yeah, but I actually want to start with the rice one,
because you mentioned with the rice that it's precise enough to show the grains of rice,
but it's also precise enough to write a single word on a grain of rice. And that fidelity is new.
So what I did is I actually went to ChatGPT myself and tried to emulate this. I asked it to create a piece of rice with the words "GPT Image 2" written on it. And this was the output that I got. Actually, this was the first output that I got. And I spent maybe five minutes trying to find the grain of rice. I don't think it worked. So I asked it to draw a box around the grain of rice, and it drew a box and then actually etched the words in the middle. So there are some edge cases that don't quite work. I mean, that grain of rice was not in the original image.
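If you want to try this kind of prompt yourself from code rather than the chat UI, a minimal sketch using the OpenAI Python SDK might look like this. The model identifier "gpt-image-2" is an assumption based on the announcement (check the official docs for the real name), and the prompt is just a paraphrase of the demo:

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# "gpt-image-2" is a guessed identifier; substitute the documented model name.
result = client.images.generate(
    model="gpt-image-2",
    prompt="A macro photo of a single grain of rice with the words "
           "'GPT Image 2' etched onto it, resting on a pile of rice.",
    size="1024x1024",
)

# gpt-image models return images as base64; decode and save to disk.
with open("rice.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```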
But for the ones that do work, it's pretty incredible. And you mentioned furniture. I am currently living in an apartment that could use a little extra furnishing. This, unfortunately, is not what my apartment looks like. This is a much nicer variant, something I aspire to. So what I have prepared here is a reference image for ChatGPT along with a prompt of what I would like it to do. And that involves doing things like adding lamps and adding different furniture, basically swapping out the existing furniture in this living room and moving it into a totally new vibe and style that I think I would appreciate more.
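For reference, the same SDK also exposes an edit endpoint that takes a reference image plus instructions, which is roughly what's happening here in the chat UI. A sketch under the same assumptions, with placeholder file names and a hypothetical prompt:

```python
import base64
from openai import OpenAI

client = OpenAI()

# Feed the current living room photo plus redesign instructions.
result = client.images.edit(
    model="gpt-image-2",  # assumed identifier; see the official docs
    image=open("living_room.png", "rb"),
    prompt="Keep the room layout and architecture exactly as is, but swap "
           "the furniture for a mid-century style: add two floor lamps, "
           "a walnut coffee table, and a low-profile sofa.",
)

with open("living_room_redesigned.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```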
So while that's thinking, I guess we can kind of get into some more of the interesting parts of this model.
Well, I have an example that I actually have ready to go here.
I was kind of obsessed.
I don't tell anyone this.
I was obsessed with manga as a kid.
And so I was like, you know what would be cool?
If we could turn our show, you and I, Josh, into a manga comic.
So I created this detailed prompt, and I gave it this beautiful photo of us.
Oh, look at those handsome guys.
Look at these very, very handsome guys.
And I basically asked ChatGPT to generate the prompt for me.
So I gave it a rough idea of how I wanted to create the scene, as it were.
And it created a very detailed prompt with stylistic references, details, stuff that I wouldn't know because I'm not a storyboard artist.
I'm not a manga creator.
But funnily enough, I have an AI that can do it for me.
So I don't know if anyone is paying close attention to this storyline here.
But if you're not, that's great because I want to show you the end output.
So as you can see, very long prompt, and this is the finished result.
So what you are looking at here is Josh and I, let me explain this.
Josh and I have been filming a podcast.
As you can see, we've got our setup over here.
But then we look out the window and there is a shadow.
And we notice that it is Sam Altman, Godzilla-sized, coming down upon us, terrorizing New York City.
I'd say the time estimate is roughly five years in the future.
even three. I don't know how quickly AGI gets here. We grab our weapons. It is Claude. This is not a sponsored video, by the way. I just came up with this randomly. And it shoots out prompts that wrap around Sam Altman and eventually stop GPT-5 from taking over the world. Now you know what's
going on in my head. But if you just notice this, like look at the fidelity of this. This like took
five seconds to create the prompt and then another two minutes to create the actual image. Look at
the fidelity of this. Like the writing is all accurate.
This would cost like thousands and thousands of dollars and weeks, maybe months of time to actually create from scratch.
And this did it in a bunch of seconds for a couple of cents.
Like it's pretty impressive.
Oh, it's so good.
So if manga isn't your thing, we have the furniture example.
It's ready to go.
So here I have the original that we're seeing on screen right now.
This was the original living room.
I fed it the prompt.
And here is the new one.
It totally maintained the integrity of the room whilst swapping out just a few key pieces of furniture to change the vibe.
And I think it's a testament to a practical use case that a lot of people might have: they want to design things, they want to make things look good while maintaining the personalized fidelity of whatever space it is. If you have a piece of clothing (I know this works for try-ons), it's really good at maintaining continuity throughout these images. So I thought that was a pretty interesting thing. If you have an apartment, or a closet full of clothes, you can just lay those clothes out, take a picture of yourself, take a picture of the clothes, ask it to dress you, ask it to redo your living room, whatever it may be. It's super powerful and works fairly quickly. I mean, this output took maybe a minute to generate.
And for those wondering, this is actually available to all users of ChatGPT, I believe. Very limited usage for free users, but if you have the Plus plan for $20 a month, you can just go off and start creating images, and they will look just as good as this one.
Yeah, I mean, if you're a professional who has been toying around with using AI, but it's just never been good enough, it's always had some form of error, whether minor or big, now we have a tool that actually works for you. So if you're a designer, if you're a floor planner, there's a bunch of other examples
I'll show here. This becomes a practical tool. Like GPT Images 1 was very much a novelty and a toy. It was
fun to see Studio Ghibli versions of ourselves, but now you can use this to create serious things. Now, not all use cases are good. If you're like me, I use social media to disseminate a lot of the breaking news that happens in the world of technology, AI, or whatever it might be, but we have now reached a point where we can't necessarily believe everything we see. And Images 2 from ChatGPT doesn't make that any easier.
What you're seeing on the screen right now is not an official take or update on the Bloomberg terminal.
That is also not my desktop monitor.
This is completely AI generated.
And you can probably tell parts of this kind of give it away. It's a little too zoomed in, unless, of course, you've changed the default settings in your Bloomberg terminal.
But some of these things are really good.
Like, this is exactly where this is on the Bloomberg terminal.
The percentage mark isn't that large on the actual thing.
But it's got all the sections pretty much nailed.
So you know the model looked up official Bloomberg terminal layouts and recreated them. But it added a completely fake bit of news. You could change that bit of news to represent real news, but it would still be fake. So there's a lot of avenues here for misinformation or disinformation. Not entirely accurate, but somewhat accurate.
You can imagine the kind of social media frenzies that this would create if people were
to believe and buy into these things.
Like imagine if you read an announcement that wasn't actually real, bought a stock, and then everyone realized it was fake and the stock crashed. You could end up losing money. You could fake data. There's a lot of avenues this could go down.
Yeah.
There's two points on this.
One is that like we're at the point now where even if you pixel peep, it is almost indistinguishable
from real life.
You can't really tell what is AI generated and what is not.
And as that kind of gap converges, I imagine it will create a lot of chaos where there's just
no way to tell what's real when these images are so good.
The second thing I'll mention is that this model in particular, anytime it's asked to generate a visual asset of a piece of software, is for some reason exceptionally good at understanding the nuances of every frame of that software.
Last night, I had it do DaVinci Resolve, which is what I edit a lot of videos in.
I had it emulate Photoshop, and it got every single slider down to the correct pixel, which leads me to believe there was some training customization around this software in particular.
And you have to ask the second-order question: why is it so good at all of this software?
And I guess the answer for me is,
well, it's probably because they want their agents to understand how to navigate it
and then eventually emulate it and then eventually replace it.
And this nuanced understanding of how everything works is training for
the image generation model, but also training for just, I mean, the future of what these agents are
going to look like. So there might be some hidden stuff going on behind this image generation model
as well. So back to the demos. In addition to these capabilities, we have another one teed up right here, which is to create a premium infographic poster. Another strong suit of this model is text: how well it's able to render text that looks lifelike and accurate, and how it can create a storyboard, if you will, or a poster. It can create multiple outputs. What I've asked it to do here is create an editorial infographic, and this is the first time I'm actually seeing the output. And it seems pretty cool. So this is for Limitless, as you are familiar with.
And it kind of walks through our week in review. So the things that Limitless mentions, this is the poster that serves as the weekly roundup, the weekly review. It is pretty good. Is it accurate? Yeah, I'm curious. You check the accuracy, I'll check the QR code, see if that works. Because word on the street is that QR codes work pretty well.
Wow. I might need to replace the entire Roundup newsletter, Josh, with something like this. Just a quick glance, a quick take. You can imagine how this can carry over to other applications, right? If we want to juice the newsletter up a little bit, we could just create a graphic with one prompt by feeding it the context of everything we spoke about to get this detailed infographic. This also applies to educators and people who are teaching things. It's really easy to make graphics on particular lessons or mathematical equations or diagrams or anything you want visually represented. It's exceptionally good at that. So I thought this demo was kind of fun. It creates the Limitless week in review as a poster that's printable. The QR code does not work, but I've asked it to make the QR code scannable. So while it finishes that up and we test it, maybe we can go on to another example.
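As an aside, you don't need a phone to test whether a QR code in a generated poster actually scans. A quick sketch using the Pillow and pyzbar libraries, with a placeholder filename:

```python
from PIL import Image              # pip install pillow
from pyzbar.pyzbar import decode   # pip install pyzbar (needs the zbar C library)

# Load the generated poster and look for any decodable QR codes.
results = decode(Image.open("limitless_roundup_poster.png"))

if not results:
    print("No scannable QR code found; the model only faked the pattern.")
for r in results:
    # Each result carries the symbol type and the decoded payload.
    print(f"{r.type}: {r.data.decode('utf-8')}")
```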
Yeah, I was just going to say before we move on, the educational point is a very pertinent one, mainly because if you're like me, you can read as much text as you want, but sometimes a visual that summarizes everything really helps. You can now plug an entire book's worth of text into a single prompt. A lot of these frontier models now have a million-token context window, which is several novels' worth of text. So imagine you're trying to learn about something and you want the key points: you can not only ask the AI to summarize things and give you a bullet-pointed list, you can get it to transform them into an illustrative poster that you can look at at a glance before you go to bed and learn something brand new. So I can imagine this being used in science as well. Back when I was doing my biology degree, I remember we used to have these research poster conferences. The posters were, I don't know, A1 size, absolutely massive, with so much condensed information, and they took me weeks to make. The fact that I now have a tool where you can probably just plug in a bunch of papers and have it extract the right information and put it out in a very visual way just blows my mind. We are condensing a lot of frontier research and education tools with this one simple update. It's very, very cool.
But to move on to one more example that we generated: one thing that's cool about Images 2 is that you can play around with one image and turn it into several different aspect ratios. So what we have here is an individual, I don't know who this individual is, but it has generated him, looking out onto the greatest city in the world, in my opinion, New York City, in a nice little sunrise or sunset, I can't tell which. But as you notice, it gives us different aspect ratios of the guy. Over here, we see him on the left. Over here, we see him from a distance. Over here, we see a panoramic view where we can see him looking out onto, what is this? The Brooklyn Bridge. Some of the details do get a bit blurred at the wider aspect ratios, but it's just very impressive. And you could start creating storyboard sequences from this, or pitching visuals for whatever idea or concept you want to make. You could use this in the product realm if you're trying to figure out whether a model looks good advertising your product, say a coat, in a particular setting. Or it could be advertising something completely different. It's very cool.
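If you wanted to reproduce this aspect-ratio exploration from code, one plausible approach is looping the same prompt over the size options the images API exposes. Same caveats as before: the model name is an assumption, and the sizes below are the ones the current image API documents, which the new model may or may not extend:

```python
import base64
from openai import OpenAI

client = OpenAI()

PROMPT = ("A man seen from behind at golden hour, looking out over the "
          "New York City skyline toward the Brooklyn Bridge.")

# Square, landscape, and portrait sizes supported by the current images API.
for size in ("1024x1024", "1536x1024", "1024x1536"):
    result = client.images.generate(model="gpt-image-2", prompt=PROMPT, size=size)
    with open(f"skyline_{size}.png", "wb") as f:
        f.write(base64.b64decode(result.data[0].b64_json))
```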
So how does this model perform so well? That's the question, and one of the novel breakthroughs this image gen model has that others don't is its detailed reasoning capability. This is an image generation model that will think before acting and will reason through the steps required to get the best image output. Generally, image generation is pure inference: you give it context, you give it input, and it just spits something out. This one actually reasons through why it's doing what it's doing. And that's part of the reason why, even though you're not necessarily giving it the best prompt, it's giving you a really powerful output.
And I have another fun example here of more comic books that you can make. This was a single prompt, and it generated an entire comic book with a really accurate character carried throughout. Another fun feature is character continuity: you can generate a character and it will persist throughout all the images. And then one last example that we have here is for
anyone who's involved in social media or just creating any sort of marketing material. I asked it to create
an ad package for a matcha shop in Williamsburg called Sage Bird. And Sage Bird now has a full kit of various aspect ratios to be posted on any platform, and it all looks photo-accurate. If you'll notice, there's even a street sign that says Bedford Avenue, which is a street in Williamsburg,
which is very funny. So I think the fidelity, the quality, the capabilities of this model
are really endless. And again, the constraint is your imagination with how far you can push this
thing because it's just, it's so powerful. I had so much fun using this. I must have generated at
least 100 images just in the last 24 hours. It's so fun. I recommend everyone go and try it and figure out what use cases are best for you.
So a question that came to mind immediately is: okay, it's good, but how does it compare to its competitors, primarily Nano Banana 2 from Google, which previously held the number one spot here? Now, if you look at this image over here, it's not just number one, it's number one by a country mile. I think it has something like a 150-point lead on Image Arena. If you don't know what that is, it's the go-to benchmark for testing these image models. GPT Images 2 isn't just number one overall. It is number one across every single category measured within the benchmark. By a long shot. By a long shot. So it has a very distinctive lead.
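For a sense of what a lead like that means: arena-style leaderboards use Elo-style ratings, where the rating gap maps directly to an expected head-to-head win rate. A minimal sketch of the standard Elo formula (assuming Image Arena follows it, which we haven't verified):

```python
def elo_win_probability(rating_gap: float) -> float:
    """Expected win rate for the higher-rated model under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400))

# A 150-point gap implies winning roughly 70% of head-to-head comparisons.
print(f"{elo_win_probability(150):.2f}")  # ~0.70
```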
And if you're looking at this and you're saying, okay, well, whatever, people can orient benchmarks around this, so we don't know if it's real, I have a direct comparison for you. The same prompt fed into GPT Image 2 versus Nano Banana Pro. And you can see that there are quite a lot of differences. You can see GPT Images 2 over here on the left. The lighting is much
brighter. The fidelity is arguably a lot better. And as you can see, like, you know, there's more
expression on her face. She's smiling. And there's a lot more things in the background. Like, if you
look at the plants in the back, it's way more hyper-realistic and harder to create for an image
model. Now, if you look on the right, Nano Banana 2 is very good, but there's less complicated stuff going on behind the subject. The lighting is a little bit off. And on both sides, you can kind of tell that they are slightly AI generated. I would actually argue that Images 2, now that I'm looking at it for longer, has a glisten that just seems too glisteny, while Nano Banana 2 gets away with it because the lighting is a little flatter. But the point is, these models are getting way, way better.
And the examples keep coming, but it's not just visual flair. It's not only social media influencers who should pay attention here. You can start using this for very practical purposes. Now, there was this awesome example over here where a guy took an image of a book, and he said, could you generate me a barcode for this book? And it generated the barcode. And when you scan the barcode, it's basically an embedded link: it takes you to a page where you can then purchase the book.
Now, this is very impressive if you're trying to sell a particular product, especially a physical one. You no longer need to go through the complicated process of generating barcodes and getting them printed. You could feasibly create your own designed book cover, print it out, wrap it around your actual product, and it actually works. It works with your internal system. So I just thought this was pretty cool. Yeah, it's amazing. The clarity, and again, I think
this is a testament to the reasoning where it can actually reason its way through and generate an
accurate barcode in a world where it previously couldn't. So now not only can it make infographics,
but it could link these dynamic elements to real world artifacts, to a custom domain, to your book.
They're actually usable without needing to take it into Photoshop and take it that final mile.
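One sanity check worth knowing about before you print anything: EAN-13 barcodes (the format used for ISBNs) carry their own check digit, so you can validate a generated code's digits yourself. A minimal validator; the ISBN below is just a known-valid example, not the book from the demo:

```python
def ean13_is_valid(code: str) -> bool:
    """Validate an EAN-13 barcode (e.g. an ISBN-13) via its check digit."""
    if len(code) != 13 or not code.isdigit():
        return False
    # Digits in odd positions weigh 1, even positions weigh 3 (1-indexed).
    s = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(code[:12]))
    return (10 - s % 10) % 10 == int(code[12])

print(ean13_is_valid("9780306406157"))  # True: a known-valid ISBN-13
```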
And that's like a really cool unlock. We have another example here as well: the front page of the New York Times, which of course is entirely fabricated, or at least partially. So this isn't a real article. This isn't a real image of a paper,
but the information on it is real. If you actually dig in here and read it, all the information about OpenAI unveiling GPT Image 2 is accurate. It pulled it from the blog post, which you didn't have to provide; it independently found it, reasoned through it, pulled out the most important points, and then wrote it in the stylistic manner of a New York Times writer. So you can start imagining what this could do for press and media. If you are a reporter, you might be thinking, huh, so you're telling me I could just feed this the bullet points I want it to make, and it could write the article in my voice, in my DNA, in the style I like to write in? That's amazing.
You could also ask it to generate the image for you. So there's this meta approach where you're talking about the product, but then you use the product to generate an example image that you then put in. This is, of course, also generated by Images 2.
So there's a lot of applications here.
Again, I mentioned earlier, disinformation is a very real thing. So you can imagine people sharing fake news articles about things that aren't real, which might sway markets or misinform people. But cool, nonetheless.
Yeah.
And then there are more examples for anyone involved in architecture or doing floor plans. I mean, this one was cool, where you fed it an image of a house and it generated a floor plan.
But the next example, I think, was even cooler, because this was a digital rendering of a large building that had all of the specs listed next to it. And using that spec sheet and that 3D rendering, it created a fully rendered floor plan that you can actually send to an architect to make blueprints and build the building. I'm not sure if this is up to code. I'm not an architect. But I imagine you can probably iterate your way through this with a proper architect to get it compliant and up to spec if it's not already.
So there's this unbelievable unlock that happens for pretty much any profession that's generating any sort of image.
All you need to do is put a stamp on the bottom.
It looks like it already stamped it with some fake stamp of approval. But I'm sure if you do this type of work, you can put your own spin on it and throw your own stamp on there.
If any of you are architects listening to this, I encourage you to try this out, because I'm actually curious whether this is accurate, and if not, how accurate it is. Because architects in training spend, like, seven years at school, which is just insane. They have to understand the physics behind the buildings they're designing. And I'm wondering, is this physically accurate? Do the estimates make sense? Or is this completely made up and we still have a long way to go?
It looks legit to me, but then I'm not an architect.
So if you're listening to this, let us know.
There's another cool thing here where, again, as I mentioned earlier, if you are a visual learner, sometimes there's just too much information. You can create these posters bracketed by a particular subject, and it kind of splits it up. So here we have all the things going on in AI: AI models and agents, robotics, semiconductors. And you just have images which explain the start-to-end process of creating these different things and what they actually do, with a few words underneath, which I thought was cool.
And then there was this final example over here from Matt Schumer, which I can relate to because I formerly worked at a Big Four consultancy, and we had to create slide decks, and it would take so long because you had to move things in a specific way or reformat the text. Matt Schumer one-shotted an entire slide deck by just providing it a bunch of information, and it created it in the style of Spotify, by the looks of it.
So very cool, loads of different applications,
and I can't wait for more people
to actually use this for professional purposes.
Yeah, the model's awesome, and I guess the ask is to share whatever you're using it for. Because again, those prompts, those examples, are the only limiting factors on what this can really do. Because it has the reasoning, because it's so capable, because it has the pixel-perfect fidelity, it's really just a matter of massaging it with prompts to get the output you want. It's not really a limitation of the model anymore. And to Sam's point early in the episode, it seemed like image gen was great before. Now this is just unbelievable. I can't imagine going back to Nano Banana Pro knowing that this exists. And it's
just a testament, again, to how fast we're going, and to what the downstream implications may be in the future. When you can generate infinite images for cheap that are pixel perfect and indistinguishable from reality, what type of downstream effects does that have on every visual artifact we interact with on a day-to-day basis? I mean, there's no way you can be sure. And this has a lot of implications that I'm not sure we're fully aware of now, but they will surely become known as we navigate through this. It creates a weird dynamic that seems a little uncomfortable. Now I have to navigate the internet with a strong filter just to parse what's real and what's not. I'm curious whether this tool can be used to generate
visuals that humans hadn't necessarily thought of before. As the AI becomes smarter and is trained on our prompts, and largely on our flaws, you can ask an AI to generate a detailed prompt and then feed that prompt back to it, because we don't know how to prompt it well ourselves. It can do the same with images, reasoning along the lines of: I get that you probably missed this point, so maybe if I create this visual in this particular way, one you hadn't thought of, it breaks new ground for you. So I wouldn't put it past this model, the model we have today, to generate a visual artifact that will soon be kind of groundbreaking for humans to use. Maybe it's not a poster, maybe it's not a slide deck, maybe it's something completely new that we haven't seen before. Pretty exciting stuff. Yeah. So that's ChatGPT
Images 2.0, the newest and hottest image gen model in the world. I encourage anyone to try to displace it, because it would be amazing if something beats this, but it's worth trying.
It's worth sharing what prompts give you the specific outputs that you find helpful or interesting. The use cases are the currency. Please share yours in the comments section down
below. If you enjoyed this video, don't forget to share it with a friend who may also want to
generate some images. Perhaps they're involved in social media. Perhaps they just want to redesign their hypothetical
apartment. Whatever it may be, it's fun. It's worth testing. It's worth trying to just like
feel it and understand the intelligence. But yeah, I think that's pretty much it for today's
episode. Do you have any final thoughts here? Nope. If there's one request that I have,
I want to see the images that you generate, so try to surprise us. Try a use case that we haven't covered in this particular video, because I'm curious about the creative uses people will find for this.
Our social media profiles will be linked below. DM us there. And yeah, I look forward to seeing what you make.
Awesome.
Cool.
All right.
We'll see you guys in the next episode.
