The AI Daily Brief: Artificial Intelligence News and Analysis - 10 Things Transformed by ChatGPT's New Image Generation Model

Starting point is 00:00:00 Today on the AI Daily Brief, 10 things that are transformed by ChatGPT's new image model. Hello, friends. Welcome back to another long reads episode of the AI Daily Brief. Although once again today we're doing things a little bit differently. This week's big topic of conversation has, of course, been ChatGPT's new image generation model. The tangibility of image generation made it sweep aside even other really important news like Google's Gemini 2.5 release. What's more, this was one of those model moments where the new performance was not just incremental, but actually opened up entirely new categories of use cases that, to the extent that they had been explored with

Starting point is 00:00:40 previous models relied on either complex wrapper software or complicated workarounds and workflows, but are now just built into the model at a core level. And so what we're going to do today is read a long tweet from Balaji Srinivasan about 10 things that this new model release changes. I'm then going to pick out a few of them that I think are most important or most interesting to discuss and build the conversation from there. So let's do this. Let's read through Balaji's tweet first, and then I'll dig in for myself. Balaji writes, a few thoughts on the new ChatGPT image release. One, this changes filters. Instagram filters require custom code.

Starting point is 00:01:18 Now all you need are a few keywords like Studio Ghibli or Dr. Seuss or South Park. Two, this changes online ads. Much of the workflow of ad unit generation can now be automated. Three, this changes memes. The baseline quality of memes should rise because a critical threshold of reducing prompting effort to get good results has been reached. Four, this may change books. I'd like to see someone take a public domain book from Project Gutenberg, feed it page by page into Claude, and have it turn it into comic book panels with the new ChatGPT.

Starting point is 00:01:46 Old books may become more accessible this way. Five, this changes slides. We're now close to the point where you can generate a few reasonable AI images for any slide deck. With the right integration, there should be less bullet point-only presentations. Six, this changes websites. You can now generate placeholder images in a site-specific style for any image tag as a kind of visual lorem ipsum. Seven, this may change movies. We could see shot-for-shot remakes of old movies in new visual styles, with dubbing just for the artistry of it, though these might be more interesting as clips than as full movies. Eight, this may change social networking.

Starting point is 00:02:19 Once this tech is open source and/or cheap enough to widely integrate, every upload image button will have a generate image button alongside it. Nine, this should change image search. A generate option will likewise pop up alongside available images. Ten, visual styles have suddenly become extremely easy to copy, even easier than front-end code. Distinction will have to come in other ways. All right, so that's the frame set.

Starting point is 00:02:41 I'm not going to talk about all of these. I'm going to bop around a little bit to the ones that I find most interesting to explore a little bit more deeply. First of all, let's talk about Balaji's first, the idea that this changes filters. Now, obviously, we have seen this happen over the past few days where a huge number of people have Ghiblified themselves or their families. Sam Altman himself has a Studio Ghibli-style image now as his avatar for X. But I think it's not just changing filters. I think it's the fact that filters can now apply to entirely new domains. Basically, instead of just applying a filter to a single

Starting point is 00:03:15 image or photo, you can now effortlessly apply an aesthetic to an entire experience. Take, for example, VC and builder Yohei of UntappedVC, who Ghiblified their entire website. For those of you who are listening, not watching, this is another time that it's really worth checking out the visual, even if it's just you going to Untapped.vc. In addition to the background of the website feeling like a Miyazaki movie, all of the Portco logos are once again a cover image that looks like a Studio Ghibli film. Now, on the one hand, you could dismiss this as just a very in-touch VC and AI community member riding the AI trend, but I think what it shows is the idea of being able to port entire aesthetics onto big categories of content on the scale of an entire website. So Balaji is right that

Starting point is 00:04:00 it does change filters, but it's not just Instagram filters. Filters could now be applied to a much wider range of assets and domains. Next up, let's talk about memes. Balaji says the baseline quality of memes should rise because a critical threshold of reducing prompting effort to get good results has been reached. What we don't have yet, now just whatever it is, four or five days after this model was released, is the first example of a specific meme. We have a meme template in that we have Ghiblified everything, but we don't have a native ChatGPT image generation meme that has arisen specifically because of the new capabilities. Instead, where everyone's been for the last couple of days is just copying old memes in the new style. Dan Romero did the

Starting point is 00:04:42 classic bar scene from Good Will Hunting, obviously in Studio Ghibli style. With the text, of course that's your contention. You're a first-day ChatGPT image prompter. You just got finished converting popular internet memes to anime, Studio Ghibli probably. You're going to be convinced of that till next week when you get to SpongeBob, and then you're going to be talking about how the visual styles of late 1990s Nickelodeon translate perfectly to the format.

Starting point is 00:05:01 That's going to last until next month. Then you're going to be in here regurgitating diffusion models are actually better ex-post talking about, you know, the superior techniques available in the upcoming Midjourney V7. Now, there is a very specific audience for that meme, of which I am probably the epicenter, but the point is that every old meme that has ever been on the internet at this point is being Ghiblified in this way.

Starting point is 00:05:22 Pixelocifer got even more meta when they said, okay, this is a meme created by GPT when I asked her to make a meme about humans using AI to make memes. It shows a four-panel cartoon titled The Evolution of Meme Creation. In 10,000 BC, a caveman draws a mammoth and says, me draw a funny mammoth, tribe laugh. In 2005, a programmer writes me using cool, cool fonts for memes. In 2025, a reclining person says, hey, I make a meme about humans using AI to make memes. And in 2030, a humanoid robot says, wait, am I making fun of myself?

Starting point is 00:05:50 Am I the meme now? Next up, let's talk about number four. This may change books. A couple of things here that are interesting to me. First of all, you are seeing a lot of comic book or graphic novel style creation already. Midasquant, for example, gave ChatGPT four images and asked it to turn them into a comic book and actually got something back. I saw other people using the character consistency dimension of this to make storybooks for their kids. Basically, in other words, one of the capabilities of this new model is that because it is natively integrated with the text model, you can use text to have fine-grained control and change very specific parts of the image. So you can start with one base image and then ask to put that same character in a new pose or a new

Starting point is 00:06:36 context, and it's going to do that in a much better way than the previous versions of the model, which had to go outside to the separate DALL-E model and then bring it back in, could actually do. So right there on your own, already this is going to be way better for any sort of visual storytelling like that. I think Balaji is right, though, that there may be some other types of capabilities that aren't just generating totally new books from scratch, but actually change the way that we interact with existing material as well. Interestingly enough, Ryan Hoover from Product Hunt posted separately, Request for Startup, Audible 2.0. Books are too verbose. Voice readers are often sterile. Note taking is clumsy. But thankfully, we have LLMs today that can

Starting point is 00:07:15 rewrite to be more concise and adapted to my preferred style of communication. Allow me to select a preferred reader, Morgan Freeman, please. Bookmark key concepts via dictation, e.g. Save the point about X. Now, Ryan says he doesn't think that this would be a good business and obviously the licensing is tricky, but he still wants it. I do think that the choice that's going to be offered in the future around how to consume content is really powerful and what this new model opens up is the visual aspect of that. Next up, let's talk about coding for a minute. In number 10, Balaji writes, in general visual styles have suddenly become extremely easy to copy, even easier than front-end code. Distinction will have to come in other ways. What I think is interesting about this

Starting point is 00:07:55 is the way that this tool is going to hybridize and blend with the rise of vibe coding tools. For example, Riley Brown fed in a bunch of code to ChatGPT and asked it to render it as an image, which it did flawlessly. I've seen other people go the other way, asking it to design a particular UI and then turn it into code, which it once again did really well. And in general, this is one more thing that is transforming what it's going to mean to build production software. On the one hand, we have text-to-code capabilities coming up. And on the other hand, we have text-to-UI design capabilities coming online via this sort of image generation. And where those two meet will be a very powerful place.

Starting point is 00:08:34 Now, as an aside, the CEO of Replit has officially come out saying that he no longer thinks you should learn to code, which is probably a longer conversation. But as these categories of tools converge, you can kind of see why he might feel that way. Number seven, this may change movies. We could see shot-for-shot remakes of old movies in new visual styles with dubbing just for the artistry of it, though these might be more interesting as clips than as full movies.

Starting point is 00:08:58 Well, count on the internet to get this one sorted right away. AI filmmaker PJ Ace posted within hours of this model going live, what if Studio Ghibli directed Lord of the Rings? I spent $250 in Kling credits and nine hours re-editing the Fellowship trailer to bring that vision to life. And sure enough, we have the full Fellowship of the Ring trailer as a Studio Ghibli film rendered incredibly impressively. Now, one could be tempted to write this off right now as just simple novelty or toy. But novelties and toys are so often the way that we experiment with what will eventually become transformational. I would expect that the first wave of this transformation will be things exactly like

Starting point is 00:09:37 this, scoring viral hits by applying one aesthetic filter to a popular media asset in a different aesthetic, but I'm also quite sure that that's not where this will stay. And this sort of weird blend and hybridization will just become something that has a bigger, more fundamental impact on creation. Finally, let's talk about number two. This changes online ads. This has maybe been the most obvious transformation and the one that feels like it has the most disruption to an existing business. Lorenzo Green writes, the AI image generation war for ads is over. Ad teams are about to get smaller, way smaller. By way of example, he took a book, Dopamine Nation,

Starting point is 00:10:16 and asked ChatGPT to create an image of Mark Zuckerberg reading the book, which it did flawlessly. He took Liquid Death and an Apple ad and said basically create an ad for Liquid Death in this style. He points out that if you have an asset like a shoe, but no model, that is no longer a problem, creating an image of a happy nurse wearing a particular shoe, and so on and so forth. In fact, after Studio Ghibli memes, this is probably the most prominent type of generation that you've seen on your timeline. What's significant too is that while people are mostly showing their one-shot generations, again, the native capability of the model to custom modify very particular pieces of a generation means that you're not just stuck hoping your one-shot generation gets it right. You can go back and actually have fine-grained editing.

Starting point is 00:11:02 So where does this leave the ad industry? I do not think that it just ends it overnight. The world is awash in visual ads of all types, and some are better than others. Taste, creativity, concept. These are not things that are limitless even when you introduce AI. Think about Super Bowl ads. Super Bowl ads are literally the most important ad asset of any given year. Everyone who's making a Super Bowl ad has spent at least, and I'm not joking, at least $10 million on that ad, between the ad time and the ad creation process,

Starting point is 00:11:33 and usually it's closer to 15 or 20 million, and still most of them are absolute garbage. Still, what does absolutely change is that there's no way that the cost structure for visual or print ads doesn't come down. There's no way that the creative process around these assets doesn't change. We're back once again to the Dr. Strange Theory of AI work,

Starting point is 00:11:53 where I think part of what will be different is that creatives will test out a huge variety of ideas. Instead of sitting there in pitch meetings with a very small number of mock-ups, creatives will test hundreds of concepts. They'll design swarms of agents to test concepts based on dozens or hundreds of different styles.

Starting point is 00:12:11 They'll probably have other agents which test all of those ads against panels of theoretical people. And then ultimately, they'll take all of the advice and the ideas from AI and use their human taste to make a judgment call.

Starting point is 00:12:23 Still, it is undeniable that this is a massive, massive structural change moment for the ad industry, and trying to view it as anything less than that is sure to be trouble for businesses in that space who take that opinion. Now again, we are just a couple days out after this release. We're barely scratching the surface of what it can do. And already we've got these 10 areas or more where things really have changed. I for one can't wait to see what comes next. But for now, let's close this so we can

Starting point is 00:12:52 go mess around and Ghiblify all of our family photos before that trend dies entirely. Appreciate you listening or watching as always. And until next time, peace.
