Everyday AI Podcast – An AI and ChatGPT Podcast - EP 493: ChatGPT’s groundbreaking image update, Google’s chart-topping Gemini 2.5 drop, Microsoft’s new reasoning agents and more AI news that matters

Starting point is 00:00:00 This is the Everyday AI Show, the Everyday Podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. Meet Firefly AI Assistant, now live in Adobe Firefly, the all-in-one creative AI studio. Just describe what you want to create and the assistant handles the rest, orchestrating multi-step workflows across Photoshop, Premiere Express, and more in one conversational interface. You direct the outcome. The assistant accelerates execution. This is one of the biggest weeks in AI development like ever.

Starting point is 00:00:53 And I don't say that lightly. I've been doing this for like two and a half years. But let's just preview what happened this week in AI news. Well, we have the world's most powerful large language model that was released. And I'll tell you why I think it's a bigger deal than you might think. We got the most capable and flexible image model we've ever seen that I think will absolutely disrupt creative industries. One of the largest companies in the world released some breakthrough multi-agentic flows. And maybe even biggest yet, there's talk of a certain AI lab raising $40 billion, and a separate company made a $30 billion acquisition.

Starting point is 00:01:39 This is wild. This all happened in one week. Don't worry if you're scratching your head like what the heck is going on. I'm going to get you caught up and bring you the AI news that matters. What's going on, y'all? My name is Jordan Wilson and welcome to Everyday AI. We're your daily live stream podcast and free daily newsletter helping everyday people, not just learn AI, but how we can leverage it to grow our companies and our careers.

Starting point is 00:02:09 So if that sounds like what you're doing, you're in the right place. So it starts here with this live stream slash podcast, but it continues on our website. So if you haven't already, please make sure to go to youreverydayai.com, sign up for our free daily newsletter. Also on our website, good to know this, there's like now almost 500 episodes. So you can go find whatever you're trying to learn in the world of AI, whether it's marketing, it's agents, it's ethics.

Starting point is 00:02:35 We've already interviewed hundreds of the world's leading experts, and you can access it all online for free on our website. It is a free generative AI university, so make sure you go check that out. All right, so welcome to our weekly installment of the AI News That Matters. We do this almost every single Monday. We cut through all the fluff, the BS, the press releases, and just bring you the AI News That Matters.

Starting point is 00:03:00 So it's live and unscripted, and hey, live stream audience, do me a favor. Do I sound okay? I had a couple, you know, it was a couple minutes late getting this show started. Had some mic issues. So, yeah, hopefully y'all can hear me okay. Let me know in the comments. Love to see everyone tuning in. Max from Chicago.

Starting point is 00:03:21 Marie, Colby, Pedro, Brian, everyone else. Tons of people joining on the YouTube machine. Sandra's on the elliptical. Hopefully I don't keep you too long. All right, but enough chit-chat. Let's talk about what's happening in the world of AI news. So first, hardly anyone is talking about this,

Starting point is 00:03:42 and I don't know why. So Microsoft has released a bunch of new, very capable agents. And I think that there's two in particular that, you know, readers and listeners are really going to like. So Microsoft is solidifying its leadership in enterprise AI by unveiling major announcements for its Copilot Studio platform, including deep reasoning capabilities and agent flows. So Microsoft announced two key additions to Copilot Studio, deep reasoning capabilities

Starting point is 00:04:19 for tackling complex problems with your own data and agent flows that integrate AI flexibility with rule-based automations. So we did have Ray Smith, the VP of AI on, sorry, the VP of AI agents at Microsoft, on the show on Friday. So make sure if you're interested in this, go listen to episode 492. We gave you a complete breakdown, like the first people in the world

Starting point is 00:04:45 to get it straight from Microsoft to you guys. So make sure you go listen to that episode. I think it was a great look from Ray just into the future of AI agents and everything that Microsoft is working on with these new agents. But today is actually the day that agent flows is being released.

Starting point is 00:05:05 Yeah, today's March 31st. So, I mean, this is super new. So make sure to check, if you are a heavy Microsoft 365 Copilot organization, make sure you check in on agent flows, which should be released today. So Microsoft also announced that more than 400,000 AI agents were created in Copilot Studio just last quarter, showcasing some rapid adoption amongst enterprise users. So there's a lot of new agents. I mentioned two of them,

Starting point is 00:05:39 but another one, the new analyst agent, I think is a standout feature. So it's functioning as a personal data scientist, capable of processing Excel files, CSVs, and embedded tables to generate insights via Python code and visualizations. The deep reasoning,

Starting point is 00:05:56 which I think is grabbing a lot of headlines, so the deep reasoning agent has some new and improved capabilities that allow it to perform methodical analysis, enabling use cases like generating RFP responses or conducting due diligence for mergers and acquisitions. I mean, whatever you might be using it for. And then agent flows, which should be released today. So that combines, and this is huge, deterministic business logic with AI reasoning, addressing

Starting point is 00:06:25 customer needs for industries all over the place, from fraud prevention to operational optimization. So here's why that's important, and the whole deterministic piece, right? So that means there's a piece of this new agent flow that's not generative, right? So I talk about, and I've talked in the past a lot about NotebookLM and how it's grounded in your own data. And if you ask it something that's not in the data, it's just going to be like, yo, I don't know. So this is a feature of the new agent flows. It's deterministic, and it lives just inside of, essentially, your Microsoft

Starting point is 00:07:02 Graph integration. So any of your live dynamic data within Microsoft 365 Copilot, that's what this new agent is drawing from. So, you know, a lessened likelihood of hallucinations in the way that that's set up. So other industry players, including Google, OpenAI, Salesforce, and Amazon, are intensifying competition with their own agentic platforms. But Microsoft's approach prioritizes accessibility, offering tools for both technical and non-technical users to create custom agents through natural language interfaces and low-code environments. That's the huge thing here, y'all.
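
To make the "deterministic plus grounded" idea above concrete, here is a minimal sketch. It is not Microsoft's agent flows API; the threshold, field names, and functions are all hypothetical, and a plain dictionary lookup stands in for the AI step. It only illustrates the two properties described: a rule that always gives the same answer for the same input, and a lookup that says "I don't know" instead of guessing when the data is missing.

```python
from dataclasses import dataclass

# Hypothetical threshold and field names, chosen only for illustration.
REVIEW_THRESHOLD = 10_000


@dataclass
class Transaction:
    account: str
    amount: float


def deterministic_rule(tx: Transaction) -> str:
    """Rule-based step: the same input always yields the same decision."""
    return "needs_review" if tx.amount >= REVIEW_THRESHOLD else "approved"


def grounded_answer(key: str, records: dict[str, str]) -> str:
    """Grounded step: answer only from the supplied records, never guess."""
    return records.get(key, "I don't know")


def run_flow(tx: Transaction, records: dict[str, str]) -> dict[str, str]:
    """Combine the fixed rule with the grounded lookup (a stand-in for the AI step)."""
    return {
        "decision": deterministic_rule(tx),
        "note": grounded_answer(tx.account, records),
    }
```

Calling `run_flow(Transaction("acct-1", 25_000), {"acct-1": "Known vendor"})` flags the transaction for review and attaches the note from the records; an account with no record gets "I don't know" rather than an invented answer.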

Starting point is 00:07:45 Everything I just said, natural language, right? You don't have to be a developer. You don't have to know Python. You don't have to know JavaScript or any other programming language. The language that this accepts is human language, right? So you can build these very impressive multi-agentic flows that use deep reasoning. So this uses OpenAI's o3-mini model, you know, to run this, which is wild, right? So a lot of these agents that we talk about, you know, through whether it's Google, whether

Starting point is 00:08:24 it's OpenAI, you know, there's the new Manus, right? They're great. Don't get me wrong, but one of the problems is a lot of times when you set up these agents, they're not necessarily working with your dynamic data. So you might be uploading some data, but that data changes, right? That document, that report, that, you know,

Starting point is 00:08:45 that quarterly draft that you're, you know, updating a huge document. That document might change monthly, weekly. It might change every single day. So that's one of the downs or one of the cons when you're working with some of these other agentic flows that aren't Microsoft and that don't have access to your up to the second data and information. So I think huge announcements from Microsoft that I don't think a lot of people were paying attention to, which is why I brought Ray on the show on Friday. So make sure you go listen to episode 492 for that. All right. What's that $40 billion number? I teased. Well,

Starting point is 00:09:21 that's OpenAI, reportedly close to securing a massive, record-breaking 40 billion with a B. I know I kind of get the sniffles. The allergies got me today, but that's not a sniffle. That is a $40 billion funding round, signaling some significant growth and investment. So this is according to Bloomberg. So reportedly, SoftBank is leading this $40 billion round with an initial investment of $7.5 billion, followed by an additional $2.5 billion from a syndicate of investors. So a second tranche later this year is expected to see SoftBank contribute another $22 billion, along with $7 plus billion from other investors. So according to Reuters, OpenAI, though, must first finalize its transition to a for-profit entity by the end of 2025 to secure the full $40 billion funding round led by SoftBank. So that's huge.
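
As a quick sanity check on how those reported tranches add up, here is a small worked calculation. The figures are the ones quoted above (reported, not confirmed), in billions of dollars; the variable names are just for illustration.

```python
# All figures in billions of USD, as quoted in the episode (reported, not confirmed).
total_round = 40.0

# First tranche: SoftBank plus a syndicate of investors.
first_tranche = {"SoftBank": 7.5, "syndicate": 2.5}
first_total = sum(first_tranche.values())

# Whatever is left of the $40B has to arrive in the second tranche.
second_total = total_round - first_total

# SoftBank's quoted second-tranche share; the rest falls to other investors.
softbank_second = 22.0
others_second = second_total - softbank_second

print(f"First tranche:  ${first_total}B")
print(f"Second tranche: ${second_total}B")
print(f"Other investors in second tranche: ${others_second}B")
```

The remainder for other investors comes out around $8 billion, which lines up with the "$7 plus billion" quoted in the reports.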

Starting point is 00:10:31 And failure to meet that deadline could result in SoftBank reducing its investment to only $20 billion, significantly impacting OpenAI's growth plans. So this funding round follows OpenAI's previous $6.6 billion raise in October of last year, which was led by Thrive Capital and valued the company at $157 billion. So with this new round, OpenAI's valuation is projected to soar to $300 billion, showcasing its rapid ascent in the AI industry. So yeah, this one's obviously going to get very interesting, with specifically Elon Musk and xAI seemingly doing everything they can to try to slow down OpenAI's plan to convert from a nonprofit, which it was originally set up as 10 years ago. And they've been trying to transition to a for-profit now for the better part of a year.

Starting point is 00:11:34 But Elon Musk and some others are trying to delay their plans in doing so. So now the stakes are extremely high. We're talking $20 billion potentially that they could not receive if they do not transition to a for-profit by this kind of end-of-year deadline. So, wow, wow. I mean, talk about high stakes, right? We all think that the work that we're doing on a day-to-day basis is high stakes. And don't get me wrong, it is, right? If you're saving lives, you know, helping people. But man, I'm glad that I'm not in the, you know, CFO seat over there at OpenAI. It's like, yeah, talk about $20 billion in funding in the balance

Starting point is 00:12:19 if you cannot complete this transition to a for-profit in time. And that's according to reports. So yikes, I would not want to be in that role. Adobe just introduced an entirely new way to create, bringing the power and precision of its creative suite into one conversational experience. Meet Firefly AI Assistant, now live in the Adobe Firefly app, the all-in-one creative AI studio. Powered by Adobe's creative agent, Firefly AI Assistant lets you start with your vision, just describe what you want, and shape the outcome as it takes form with the assistant. The assistant orchestrates multi-step workflows, drawing on 60-plus pro-grade tools across

Starting point is 00:13:06 Adobe Creative Cloud apps, including Photoshop, Illustrator, Premiere, Lightroom Express, and more to help bring your ideas to life. You can also get started with creative skills, a growing library of pre-built workflows for common creative tasks, like batch editing photos, creating mood boards, portrait retouching, and creating social variations. Every step the assistant takes is visible, so you can refine, redirect or take over at any time. You stay in the driver's seat as the creative director.

Starting point is 00:13:36 Adobe Firefly AI Assistant, now in public beta. See it today at firefly.adobe.com. Next, this little company that is a little behind in AI, named Apple, right? They face so many, I mean, they've faced class action lawsuits in the past couple of weeks because they've been promoting this Apple Intelligence thing that doesn't really exist. Well, now there's some new AI news from Apple, and they're reportedly developing an AI-powered doctor and revamped health app. So, according to Bloomberg reports, Apple is advancing its health care technology with an AI-powered doctor, a redesigned health app, and a personalized health

Starting point is 00:14:25 coach under the code name Project Mulberry. I don't know why they're always like berry code names, but I don't know. Anyways, Apple is creating an AI-powered tool that will analyze health data from devices like the Apple Watch to provide tailored health care recommendations, such as dietary advice for users showing signs of high blood pressure. So the redesigned health app, which right now is unofficially named Health Plus, will feature food tracking, a first for Apple, placing it in competition with platforms like MyFitnessPal and Noom. So the app may also act as a personal trainer by using the iPhone's camera to assess workout techniques and suggest improvements, potentially integrating with Apple's Fitness Plus service. So according to reports, Apple is collaborating with its in-house physicians and plans to expand

Starting point is 00:15:23 its team with specialists to produce educational content, possibly featuring a celebrity doctor to enhance engagement. That's just what we need, more celebrity doctors. So Apple CEO Tim Cook has emphasized Apple's commitment to health and wellness

Starting point is 00:15:41 for the better part of half a decade, calling it the company's, quote unquote, greatest contribution to mankind. We also had a story in our AI News That Matters segment last week that said Apple was reportedly

Starting point is 00:15:56 trying to jam some AI-powered cameras into its Apple Watch, into Apple AirPods, essentially trying to put cameras everywhere, right, not just on your phone. And now this kind of might make a little bit more sense when we see that Apple is really just trying to prep an AI doctor and really just trying to make an even bigger investment into the AI health space. Live stream audience, what do you feel about this, right? Do you want AI-powered cameras on your Apple Watch

Starting point is 00:16:31 or on your AirPods or, you know, do you want all of this AI technology on every single other thing? For me, I'm torn, right? I have an Apple Watch. I have AirPods. I don't want cameras in them because I think that's super weird and probably intrusive. But I also, like, I love this idea of an AI doctor, right? So, you know, it's one of these things that we constantly have to grapple with, you know, not just as consumers, but as business leaders, right? Like how much of our data do we want to offer up for the sake of, you know, for the sake of increased

Starting point is 00:17:05 productivity, for the sake of potentially increased revenue, for the sake of increased health, right? It's interesting. It's something I'm even always personally grappling with, right? Marie says, seems like Apple is scrambling for a win. Astute observation there, Marie. Yeah, Apple has been, you know, getting crushed in, essentially, the large language model race, bringing artificial intelligence features to its iPhone, which has been an abysmal rollout. I do think there has to be some sort of movie made one day about how Apple potentially lost trillions of dollars in market cap by screwing up Apple Intelligence so badly. So, yeah, it should be interesting. Yeah, most people are just saying no. You know, Richard from

Starting point is 00:17:54 YouTube says, bring back the oxygen level. Max from LinkedIn says AI doctor is cool. More cameras, I don't know. Yeah, same thing, right? Yeah, like we all want the capabilities, but yeah, do we all want, you know, 10 cameras, right? If you have a smart ring, do you want a camera on that? Do you want an AI powered camera in your tennis shoes?

Starting point is 00:18:18 I don't know. All right. Here's the multi-billion dollar story that really no one is talking about. Yeah, there was so much AI news this week. Some of these stories just got no real attention, but Elon Musk's xAI has acquired Twitter, or X, for $33 billion to strengthen its company xAI. Yeah, confusing, right? So Elon Musk's AI company has acquired Twitter, now known as X, which is also his company. So this was a little bit more of kind of some paperwork and more of, you know, kind of some official acquisition, you know,

Starting point is 00:19:03 legalese, but, you know, essentially now Elon Musk's AI company is the official new owner of the social media platform X, formerly known as Twitter. So Elon Musk's artificial intelligence company xAI has purchased X, formerly Twitter, in a $45 billion all-stock deal, which includes $12 billion in debt. So the social media platform itself is valued at $33 billion in this transaction, according to Reuters, which, you know, I believe Elon Musk originally acquired Twitter for $44 billion. But we saw reports previously that the valuation had dropped to about $10 billion.
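
The deal numbers above are easy to mix up, so here is the arithmetic spelled out. The figures are the reported ones quoted in the episode, in billions of dollars; the variable names are only for illustration.

```python
# All figures in billions of USD, as reported in the episode.
deal_value = 45.0         # headline all-stock deal value
debt_included = 12.0      # debt counted inside that headline number
original_purchase = 44.0  # what Musk reportedly paid for Twitter originally

# Value assigned to the platform itself once the debt is backed out.
equity_value = deal_value - debt_included

# Change versus the original purchase price, as a fraction.
change_vs_purchase = (equity_value - original_purchase) / original_purchase

print(f"Platform valued at ${equity_value}B")
print(f"Change vs. original purchase: {change_vs_purchase:.0%}")
```

So the $33 billion figure is the $45 billion headline minus the $12 billion of debt, and it sits about 25% below the original $44 billion purchase price.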

Starting point is 00:19:51 So this was actually, at least for me, a personally shocking piece of reporting here. We had seen the valuation drop from the original price of $44 billion. We saw reports that it was valued earlier this year at only $10 billion. Yet in this acquisition, which is kind of an acquisition, kind of not, but definitely still an acquisition, I know it's weird, we saw it valued at $33 billion. So the deal, though, positions Musk to merge resources between xAI and X, consolidating data, computing infrastructure, distribution channels, and talent to enhance AI development.

Starting point is 00:20:36 So Musk emphasized that integrating X with xAI will improve training data for its chatbot Grok, potentially accelerating advancements in AI models and capabilities. So this acquisition follows xAI's recent funding success, where it raised $10 billion at a valuation of $75 billion, solidifying its position as a key competitor to OpenAI and other global AI firms. So the merger could allow X to serve as a distribution platform for xAI's products while leveraging real-time user data from X to improve AI training capabilities. My gosh, too many, too many X's. I still just like calling it Twitter. To bolster AI capabilities, xAI has been expanding its infrastructure.

Starting point is 00:21:25 Its Memphis-based supercomputer cluster called Colossus is reportedly the largest in the world and is designed to train next-generation AI models like Grok 3. I don't know. You guys want a hot take? I know this is a news show, but it's going to take more than tens of billions of dollars and, you know, all these acquisitions for anyone to actually use xAI or Grok products.

Starting point is 00:21:54 So I don't necessarily understand this, right? The problem with this, this kind of merger and this, you know, this beautiful, you know, partnership between xAI slash Grok and Twitter is, well, Twitter has been shown, or, you know, X, whatever you want to call it, in many recent studies as the number one worst platform for disinformation, for bot activity, etc. So one of the biggest concerns for many companies when it comes to using large language models, it's truthfulness. It is, you know, having trust in the models that they're using.

Starting point is 00:22:37 So, you know, I've been saying this on the record for, I don't know, since Grok 1 was a thing, that no one's going to use it. It doesn't matter how powerful it is, right? We saw Grok 3 released earlier last month. You know, from a benchmarks perspective, it did fairly well. Yet I literally do not know a single enterprise company that is using it as its main large language model driver, nor do I think any enterprise company should, right? It's not yet available via the API, so you can only use it, you know, at grok.com or

Starting point is 00:23:16 you know, within the Twitter platform. So I'm not really sure what the long-term plan is here for profitability from Elon Musk and xAI. Yeah, that's me. I just don't think it is a smart idea for enterprise companies to be using a model that now is even more tightly ingrained with the social media platform that has one of the highest instances of misinformation, disinformation, and bot activity, and that's been shown across multiple studies. So, you know, do with that as you may. Marie says trust plus transparency equals trust equals more customers.

Starting point is 00:24:02 Yeah, good equation there. But yeah, I mean, without more trust and transparency in your training data, people aren't going to want to use it. So, yeah, I would still, from an enterprise perspective, I wouldn't touch Grok or xAI with a 100-foot pole. I don't care about the $33 billion valuation. All right. More large language model news.

Starting point is 00:24:26 So OpenAI has updated their previous flagship model, GPT-4o, and it has jumped up to the number two spot in the LM Arena leaderboards. So, yeah, another small story that's actually pretty big. So OpenAI announced an updated version of its GPT-4o model, highlighting major improvements in coding, instruction following, and creative capabilities. So most impressive, though, is that this updated version of GPT-4o shot up on the LM Arena leaderboards from fifth place to second place, trailing only the just-released and extremely impressive Gemini 2.5 Pro from Google.

Starting point is 00:25:19 So also, you might be wondering, yes, that means that the now updated GPT-4o model has surpassed OpenAI's newest model, which is GPT-4.5, at least when it comes to head-to-head human preferences. So that is what the LM Arena measures. I think it's an extremely important kind of leaderboard or measurement to talk about. You know, all these large language models, especially recently, I think they're overfit, right? So what that means is I think that the engineers building these models, you know, especially in 2023 and 2024, really tweaked them to perform really well on certain industry

Starting point is 00:25:58 benchmarks, but they weren't necessarily recognized by humans as being better. So I think it's important to look at both traditional benchmarks and these Elo scores from LM Arena, which is essentially, you know, you put in one prompt, you see two outputs, you don't know who the outputs are from, and you choose which one is better. So it is the blind taste test, Pepsi versus Coke, but this new version, the updated version of GPT-4o, has shot up and it's actually really, really good. So the update has been described as making the AI more intuitive and flexible, with some users referring to its responses as unhinged

Starting point is 00:26:41 due to its ability to generate less restricted content. So, yeah, OpenAI CEO Sam Altman in the announcement said the new version of GPT-4o is particularly good at coding, instruction following, and freedom. So yeah, there's kind of this low-key unhinged mode that a lot of people are talking about. I kind of tested it out a little bit as well, but the guardrails on the new GPT-4o are down a little bit. I'm not talking necessarily about the new image model, which actually, you know, became a little more restrictive over the weekend, which we're going to be talking about soon. But with its actual base GPT-4o model, it's actually a little less restrictive.
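
For anyone curious how blind head-to-head votes like the LM Arena ones described above can turn into a ranking, here is a minimal sketch of the classic Elo update. It is an illustration only, not LM Arena's actual scoring pipeline; the model names, the starting rating of 1000, and the K-factor of 32 are all assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return new (rating_a, rating_b) after one blind head-to-head vote."""
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating pool stays constant.
    return rating_a + delta, rating_b - delta


# Hypothetical models and votes: (model A, model B, did the voter pick A?)
ratings = {"model_x": 1000.0, "model_y": 1000.0}
votes = [
    ("model_x", "model_y", True),
    ("model_x", "model_y", True),
    ("model_x", "model_y", False),
]

for a, b, a_won in votes:
    ratings[a], ratings[b] = update(ratings[a], ratings[b], a_won)

# Higher rating means voters preferred that model more often.
leaderboard = sorted(ratings.items(), key=lambda item: item[1], reverse=True)
print(leaderboard)
```

The key property is that an upset win against a higher-rated model moves the ratings more than an expected win, which is why a model can climb several places quickly if people keep preferring its answers.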

Starting point is 00:27:28 Also, OpenAI, bless up. They talked about some of the updates: better at following detailed instructions, improved capabilities to tackle complex technical and coding problems, improved intuition and creativity, and, bless up, finally fewer emojis by default. So maybe we can stop seeing all the social media posts and emails that have like 42 emojis. Thank you. I know I use emojis sometimes, sparingly, but I'm tired of, like, at one point on my screen, seeing like a dozen emojis. It's just like, y'all.

Starting point is 00:28:05 So thank you, OpenAI, for getting rid of that, since this is how everyone just writes now anyways. All right. So the next piece of AI news, also about OpenAI, but on the legal side. So a federal judge has ruled that the New York Times' lawsuit back from December 2023 against OpenAI can advance. So the lawsuit accuses OpenAI of siphoning the Times' articles, and actually millions of them, allegedly, without permission or payment to train its GPT models in violation of copyright laws. So that is according to the accusations, according to the New York Times. So attorneys for the Times claim the newspaper's content is one of the largest sources of copyrighted text used to build ChatGPT, alleging the AI sometimes regurgitates articles verbatim. So the judge rejected OpenAI's request to dismiss the case, but in at least

Starting point is 00:29:10 a small victory for OpenAI, narrowed the scope, allowing the primary copyright infringement claims to go forward while promising a detailed opinion soon. So OpenAI argues that its data collection practices are protected by, quote-unquote, fair use, citing research and innovation, but the Times claims its reporting was neither transformed nor lawfully reused. A key legal issue is this term of market substitution, with publishers fearing chatbots summarizing news could divert readers away from their websites, which obviously impacts their ad revenue. OpenAI claims the Times manipulated prompts to force verbatim outputs, which it says are atypical for regular users of ChatGPT. Evidence gathering in pretrial hearings will now proceed, with depositions expected to remain confidential while public disputes

Starting point is 00:30:10 over evidence are settled. This one could be huge, y'all. I've been talking about this a lot since December of 2023. Another small detail of this whole case is the New York Times literally asked for the GPT technology to be destroyed in its lawsuit, right? That's not an exaggeration. That is something they actually asked for because they said, okay, well, the New York Times and all of our paid articles were one of the big pieces that was

Starting point is 00:30:46 in this data set. So I don't think that will happen, but it's going to be interesting to see what actually happens here. It could be a monumental kind of ruling that could impact millions of businesses worldwide, right? Because so many people now, hundreds of millions of business professionals like us, are using AI models in their day-to-day work, right? And it's not just ChatGPT, because there's probably thousands of other AI apps that use the GPT technology. So in the, I think, very rare case, you know, I don't want to put percentages on it,

Starting point is 00:31:30 but it has to be less than a 1% chance that they say, okay, the GPT technology has to be destroyed because I don't even know if that's feasible. It's already out in the wild, right? It's already been used to distill other models. So you can't exactly, you know, just take it away. But, I mean, this would be a huge impact to everyone, especially in the U.S. So it's definitely one to keep an eye on. All right.

Starting point is 00:31:59 Another thing to keep an eye on is this new study from Anthropic. So researchers at Anthropic have made a significant breakthrough in understanding how large language models work, potentially paving the way for safer, more reliable AI systems. So in a newly released study, Anthropic created a new tool akin to an fMRI scan for AI, enabling researchers to trace how large language models process information and make decisions. So yeah, this is pretty cool.

Starting point is 00:32:32 It's really interesting. If you want to better understand how large language models work, I highly recommend you go read this study from Anthropic. We shared about it in our newsletter last week. So this new tool that they developed, that they detailed in the study, is called a cross-layer transcoder, or CLT, and it identifies circuits of neurons linked to specific reasoning tasks, which offers new insights into the internal logic of AI models. So, yeah, essentially generative AI and large language models, I mean, people largely call them and think of

Starting point is 00:33:08 them as a black box, right? People don't necessarily understand how they work. So this new paper from Anthropic, very, very telling. So the study also revealed that multilingual models like Anthropic's own Claude share conceptual reasoning across languages. So instead of reasoning separately for each language, the model uses shared neural circuits to process universal concepts and translate the output into the desired language.

Starting point is 00:33:42 That's wild to think about, right? It's not that they have created their own language, but this large language model, according to this new research from Anthropic, they're essentially saying when it's thinking in a multi-language capacity, it's not like, oh, you know, let's say it's using English, Spanish, and French, right, for whatever reason. Maybe you're working on translations, right? It's not each time translating it back and forth, right? But instead, it's using these kind of neural circuits to process universal concepts, right?

Starting point is 00:34:27 So it is doing the work almost outside of normal language capabilities, which is also pretty weird and wild to think about. So a little bit more about this CLT approach. So it allows researchers to trace reasoning processes across layers of the neural network. So this could improve auditing of AI systems, which is huge for safety concerns, and help develop better guardrails to prevent hallucinations, jailbreaks, or just erroneous outputs. Right now, though, this technique has limitations,

Starting point is 00:35:04 including its inability to capture dynamic attention shifts in large language models. So attention mechanisms play a crucial role in how models prioritize input while generating responses, which this CLT does not fully address. So scaling the method for longer prompts remains a challenge as well. Analyzing circuits for prompts of even tens of words, not even when we're talking about hundreds or thousands or millions of words, but analyzing circuits for prompts of just tens of words requires several hours of expert work, raising questions about the practicality for more complex outputs of using this kind of CLT methodology.

Starting point is 00:35:50 But this breakthrough could encourage businesses to adopt AI more confidently by making the inner workings of large language models more transparent. Companies may feel safer integrating AI into their operations. Yeah, Sandra says, blowing my mind. Yeah, same.

Starting point is 00:36:09 Like, I've read this multiple times, and each time I'm just kind of silent and I'm thinking like, huh, this is weird, right?

Starting point is 00:36:19 It's like, the more you use large language models, right, and I remember using the very early, you know, versions,

Starting point is 00:36:27 pre-ChatGPT versions of, you know, GPT-3 technology and early BERT, right, pre, you know, Gemini, and seeing how much they've improved and seeing these reasoning models now. And then reading this study, it was really eye-opening. I'll just say that, right?

Starting point is 00:36:51 I don't want to take away all the goodies of going to read it yourself. So, yeah, we'll make sure to link it again in today's newsletter. So if you haven't already, make sure you go sign up for that at youreverydayai.com. All right, undoubtedly, the most talked about thing, at least on the internet this week in AI news, was the new OpenAI GPT-4o image generation. Yes, the new name. So DALL-E is dead. There is, well, technically, DALL-E is still around in some of the older models if you really want to go use it. I've never really used DALL-E.

Starting point is 00:37:26 It's not good at anything. But OpenAI has officially launched the native image generation capabilities of its multimodal GPT-4o model for ChatGPT users, marking a major milestone in the AI technology. So the name of this is just 4o image generation. All right. I'm sure some unofficial name will catch fire, and people will be calling it that.

Starting point is 00:37:52 But right now, like I said, you know, this isn't a new version of DALL-E. This isn't Sora photo. Right now, it's just called 4o image generation, and it is bonkers. So their new multimodal GPT-4o model is now capable of handling text, code, and images. It is currently available for paid users. Originally, it was supposed to be released to free users as well,

Starting point is 00:38:23 but over the weekend, the company announced that access to free users would be delayed, and they also instituted rate limits on paid accounts for image generation. As they said, the new feature was quote unquote melting their GPUs due to biblical demand. So the new feature went mega viral as the entire internet was scrambling to create Studio Ghibli-style visuals, right? Which I don't necessarily understand, right? But it's this kind of anime-esque style and everyone's, you know, taking their family photos and, you know, uploading them and, you know, getting these Studio Ghibli outputs.

Starting point is 00:39:07 I didn't do it. I don't care about that kind of stuff. But, I mean, literally, every single, you know, AI media outlet, every single social media, even LinkedIn was being just overrun by everything Studio Ghibli from OpenAI's new 4o image generation. So unlike the older DALL-E 3 model, GPT-4o's image generation is integrated directly into the same system. So yeah, GPT-4o, the O is for Omni. So it is a true multimodal large language model,

Starting point is 00:39:41 whereas before how it worked when it was just GPT-4 or GPT-4 Turbo, even when we're talking about text to speech or voice, there was technically multiple models under the hood, right? So now with GPT-4o, now that we have the new image generation model, it's all under this Omni model, making it more accurate,

Starting point is 00:40:02 in interpreting prompts and producing detailed, life-like images. So, yeah, I'm curious, live stream audience. Did any of you use this over the weekend? I'd love to know your thoughts. I'm personally blown away, but a couple more details. So users can refine images in real time. That's huge through conversational edits, achieving higher precision and flexibility compared to previous models.

Starting point is 00:40:30 So key features of this new 4o image generation include accurate text rendering within images. That's big because so many, aside from models like Ideogram, which does great with text, right, so earlier versions of things like Midjourney, you know, obviously DALL-E, Google's earlier Imagen, AI photo apps, they all really struggled with text.

Starting point is 00:41:00 And that's what a lot of people, you know, sometimes want to do, whether they want to create a photo with text on it, or if they want to create infographics, if they want to create, you know, things with branding and logos and having words, you know, mixed in these images. I mean, it was pretty abysmal, you know, prior to the end of 2024. But this new model, the new 4o image generation is extremely, extremely good at working with text. And that really opens up the capabilities. Because now it can handle complex prompts. It can support different artistic styles, right?

Starting point is 00:41:37 But now there's some great practical applications for it. So things obviously like marketing with social media graphics, invitations, recipes, education, creating scientific diagrams, infographics, game development with consistent character design, right? Things with consistent branding, you know, with logos and advertisements. So it's really impressive. The other thing is it really has improved contextual understanding. So you can, as an example, upload 10 different photos and say, hey, mix these together, right? You can upload an image of a

Starting point is 00:42:16 backdrop. You can upload, you know, images of three people. You can upload, you know, six, you know, six products and say, hey, combine all these. And it does it. And it looks, again, this is an early version, but it is extremely impressive. And y'all, I am, I am someone, I am not easily impressed, right? Yes, I cover AI every day. I've done 500 episodes. I've been lucky enough to, you know, partner with big brands like Microsoft, you know, Adobe and others, right?

Starting point is 00:42:46 So I get to use, you know, a lot of these AI tools even before they're publicly released. And I'm not easily impressed, if I'm being honest. Very impressed with this new GPT-4o image generation. Right now there's limitations still, right? It's obviously not perfect. It's been widely reported. There's, you know, cropping issues, aspect ratio issues,

Starting point is 00:43:10 challenges with non-Latin-based fonts and scripts, and still difficulty retaining details in small text. OpenAI CEO Sam Altman did describe the launch as a quote-unquote new high-water mark for creative freedom with the company actively refining the model based on user feedback. So this release positions OpenAI to compete with the new and also extremely impressive multimodal capabilities of Google's Gemini 2.0 Flash model, which introduced similar but not as robust

Starting point is 00:43:47 multimodal capabilities earlier this month. Yeah, Pedro, yeah, I like this. Pedro just says it's brutally good. Douglas, what's up, Douglas? Douglas said, I uploaded my headshot and had it made a South Park version of it, results were spot on. So yeah, I think there's a lot of fun, you know, cutesy, you know, things that you can do with this model, right?

Starting point is 00:44:12 But as someone that has worked in, you know, martech and comms for 20 years, right? I was lucky enough to spend a good chunk of my career working with, you know, not just the marketing and comms departments from Nike and Jordan Brand, but working with dozens of the largest creative agencies in the world, right? So I really have seen a lot of behind the scenes of how big brands, you know, essentially create their marketing, create their advertisements. And, y'all, I cannot underestimate what this does, right? Now, anyone with a $20 a month ChatGPT Plus account,

Starting point is 00:44:57 and anyone that knows how to work a computer can literally produce advertisements and marketing campaigns that are on par, and I kid you not, that are on par with the biggest multi-billion dollar advertising and marketing companies in the world, right? It's been very impressive. Like single person studios just over the weekend since this has been released,

Starting point is 00:45:23 have been releasing some behind the scenes of how they're creating these campaigns, and they are mind-bogglingly good, right? Like especially I think for product advertisements, things that are, you know, obviously very visual, but this new model's ability to just accurately, you know, take multiple images that you upload, work with text, but also work with the context window. That's the thing that most people aren't taking advantage of, I think right now, right? When I was just playing around with this, I uploaded, you know, an entire transcript of one

Starting point is 00:45:56 of my interviews from last week and I said, hey, make me an infographic that explains some of these more complex topics, right? And it did it, right? Whereas previously, if you're working with something like, you know, Midjourney or Stable Diffusion or some of these other diffusion-based AI models, that's not how it works. You kind of had to talk in, you know, prompt language and, you know, describe things to a T. No, you can just dump a bunch of context, just a bunch of text, and say, hey, make me something, right? And if it's not good enough, you can talk to it in natural language. That's the promise here and the power of this new update, extremely, extremely impressive. Sandra is saying, can you do a show showing us how to do that? I don't know.

Starting point is 00:46:39 Do you guys want that? Let me know. Yes or no, live stream audience. If you want more of a behind the scenes, you know, how to do all this, like I said, my background, I've done this. I know how this works. I'm personally blown away. I don't know. If you guys want something like that, let me know here in the, you know, in the live stream, just, you know, say yes, do a visual. You know, if you're listening on the podcast, I always keep my email, my LinkedIn, so you can just reach out to me as well. All right. Our last story of the week, I saved, I think, the best for last,

Starting point is 00:47:12 even though people weren't talking about this release from Google Gemini just because of, you know, the power and kind of the viral nature of OpenAI's GPT-4o image generation. But literally, the world's most powerful large language model was released this week and hardly no one's talking about it. It doesn't make sense. But Google has introduced Gemini 2.5, its most advanced AI model yet. So Gemini 2.5 features a massive 1 million token context window, enabling it to process extensive data sets, including text, audio, images, video, and even code repositories. So an upgrade to 2 million tokens is expected soon,

Starting point is 00:48:02 further expanding its capabilities. So if you're not super technical, you might be saying like, okay, what does this mean? In theory, let's say you have a PDF of a book, Gemini 2.5, you can copy and paste,

Starting point is 00:48:13 you can drop that thing in there, and it's going to be able to go through and answer any questions you have, right? Even great models like ChatGPT, you know, Claude has a decent, you know,

Starting point is 00:48:26 or has had a decent 100,000 context window. But a lot of times, the more you work with a large language model, right, it might start off really great. And it's like, hey, this is fantastic. This large language model is remembering everything. And the more you use it, then it starts to get a little dumb. That's because a lot of the times some of the information that you share, or if you're trying to refine a prompt, it gets lost, right?
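
A rough way to picture why early information gets lost is a fixed token budget that only the most recent messages fit into. The sketch below is illustrative only: `count_tokens` and `visible_history` are hypothetical helper names, the whitespace word count is a crude stand-in for a real tokenizer, and actual chat products may summarize or manage history differently.

```python
from collections import deque


def count_tokens(text: str) -> int:
    # Assumption: approximate tokens by whitespace-separated words.
    # Real tokenizers split text differently.
    return len(text.split())


def visible_history(messages: list[str], budget: int) -> list[str]:
    """Return the most recent messages that fit inside the token budget."""
    kept: deque[str] = deque()
    used = 0
    # Walk backwards from the newest message and stop once the budget is full.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.appendleft(message)
        used += cost
    return list(kept)


chat = [
    "My brand colors are navy and gold",
    "Write a tagline for our pizza shop",
    "Make it shorter and punchier please",
    "Now turn that into a social post",
]

# With a small budget, the earliest instructions fall out of view.
print(visible_history(chat, budget=14))
# With a large budget, the whole conversation stays visible.
print(visible_history(chat, budget=1000))
```

With the small budget only the last two messages survive, so the brand colors from the first message are no longer visible to the model. A larger window postpones that cutoff.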

Starting point is 00:48:49 Eventually, that initial information that you share is outside of the context window, which is why sometimes when you're using, you know, an AI chatbot, it starts out great, and then it starts to just stink. It's because of the context window. So this is extremely impressive, a one million token context window for Gemini 2.5 Pro. Also, it instantly took the number one spot on the LM Arena leaderboard,

Starting point is 00:49:18 and it was not even close, right? I believe it was almost a 40-point lead, so to speak, in Elo scores. When normally, when a new large language model gets dropped and, you know, oh, it's the best model in the LM arena, it might be by like one or two points, maybe three, right? The new Gemini 2.5 model, Gemini 2.5 Pro, came in at almost 40 points higher than its nearest competitor, which is now the updated version of GPT-4o from OpenAI.
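
To put a 40-point gap in perspective, the standard Elo formula converts a rating difference into an expected head-to-head win rate. The sketch below uses that textbook formula with made-up ratings; `expected_win_rate` is a hypothetical helper name, and the leaderboard's own scoring method may differ in its details, so treat the numbers as a rough guide only.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings chosen only to illustrate the size of the gap.
baseline = 1000.0
for lead in (3, 40):
    rate = expected_win_rate(baseline + lead, baseline)
    print(f"{lead}-point lead -> expected win rate {rate:.1%}")
```

Under this formula, a 3-point lead is close to a coin flip, while a 40-point lead means the higher-rated model would be preferred in roughly 56 out of 100 head-to-head comparisons.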

Starting point is 00:49:52 Also, maybe even more important news related to this over the weekend, Google, again, silently. Google stopped, which I am impressed with, right? You know, they had a bad rollout originally of Bard. You know, I won't get into that. I've covered that a lot. But, you know, over the last six months, I love what Google's doing. They're not, you know, investing heavily into the marketing.

Starting point is 00:50:17 They're not making this a big show. They're just shipping. They're just shipping huge releases, shipping impressive updates. And another impressive update is over the weekend, Google also made this available to free users. So, like, you don't even have to go into AI Studio. So Google, you know, they have their AI Studio, which is more for developers. And then their kind of front-end Google Gemini chatbot. So, yeah, you can use now because previously for the first like year or so of Google Gemini, they didn't put their most powerful models in Google Gemini. You had to go inside Google's AI Studio,

Starting point is 00:50:52 which does not protect your data, right, unfortunately, but Google Gemini does on the front end if you are a paid user, but now you can access Gemini 2.5 Pro even if you are a free user. Also, what's very important, the model now has a thinking mode. So it is more of a hybrid model, Gemini 2.5 Pro, because it allows the model to reason through its thought process before delivering responses, potentially, again, inching closer and closer to the ever-moving goalposts,

Starting point is 00:51:25 of artificial general intelligence or AGI. So obviously, aside from Elo scores or human preferences, which Google Gemini 2.5 cleaned up on, it also did fantastically, not surprisingly, on all of the normal benchmarks, including the newest kind of trending benchmark, which is Humanity's Last Exam, which is a challenging data set designed to test the limits of human-like reasoning and knowledge.

Starting point is 00:51:52 So the previous high score from a large language model was OpenAI's o3-mini that had a 14%, and DeepSeek R1 had an 8.6%, yet the new Gemini 2.5 Pro, 18% on that. So more than double DeepSeek R1 and comfortably ahead of OpenAI's o3. So Gemini 2.5 Pro, like I said, is now available for both everyday users on the front end of Gemini chatbot, as well as developers and enterprises inside Google AI Studio. And a rollout to Vertex AI is planned in the coming weeks. So according to Google, the model's advancements in reasoning, personalization and coding could significantly impact industries

Starting point is 00:52:45 ranging from software development to research, offering businesses and developers tools to innovate faster and more effectively. One of the things that I was personally most impressed by with Gemini 2.5 Pro is its ability to one shot anything when it comes to coding, software development, extremely impressive, right? Not that you ever should do anything one shot, right? You should always go back and refine something. But, you know, I made a, what did I make? Just to see its ability, you know, I made a side runner, you know, 2D game where it's, you know, Chicago deep dish pizza is running through the city or something like that in one shot.

Starting point is 00:53:22 And it got it right. It was extremely impressive. But I really think we should be paying attention to Gemini 2.5. All right. That's it. Y'all. Let me quickly recap the biggest AI news stories that matter for the week. So first, Microsoft released some pretty groundbreaking new agentic capabilities in Copilot Studio.

Starting point is 00:53:46 So no code. Being able to just talk to Copilot Studio, use its new reasoning model and deterministic capabilities. Next, OpenAI is reportedly nearing a $40 billion funding round led by SoftBank, though it could be less than that if OpenAI is not successfully able to convert from a nonprofit to a for-profit by the end of the year. According to reports, Apple is reportedly developing an AI-powered doctor and revamped health app. Elon Musk's xAI acquired the social media platform X,

Starting point is 00:54:26 formerly known as Twitter, at a $33 billion valuation, which is technically $45 billion in stock because it included $12 billion in debt. OpenAI updated its GPT-4o model, which shot it up, actually past GPT-4.5 into second place on the LM Arena board. A judge, a federal judge, has allowed the New York Times' copyright lawsuit against OpenAI to proceed. A new breakthrough study from Anthropic was released that helps everyone better understand the black box of large language models going over its new tool and technology called a

Starting point is 00:55:09 cross-layer transcoder or CLT. OpenAI released their new very viral GPT-4o image generation. And then last but not least, we have the world's most powerful AI model that was also released this week. My gosh, that was a lot to cover in a very short amount of time. Y'all, the AI world is straight up en fuego. So much going on. So if you thought AI was hitting a wall, if you thought capabilities were near the ceiling, not even close. Another exciting week in AI.

Starting point is 00:55:46 And I hope this was helpful. If it was, please let me know if you're listening on the podcast. I'd appreciate if you hit that little subscribe button, go find it, right? If you're listening on Spotify or Apple, would appreciate you leaving us a review as well. If you're listening on social media, let me know what you want to hear more of, but also click that repost button if this was helpful. You know, a lot of people tell me, everyday AI is their cheat code. I'm like, yo, don't keep it to yourself.

Starting point is 00:56:09 Share this with someone, right? I'd appreciate it if you did that. Also, I'd appreciate if you tune in tomorrow and every day for more everyday AI. Thanks, y'all.

Starting point is 00:56:40 for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating.

Starting point is 00:57:10 It helps keep us going. For a little more AI magic, visit YourEverydayAI.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers, and we'll see you next time.
