Everyday AI Podcast – An AI and ChatGPT Podcast - EP 472: OpenAI’s new GPT-4.5: What’s new and who can benefit the most

Starting point is 00:00:00 This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. The AI model releases don't stop.

Starting point is 00:00:50 In the past 10 days, we've gotten some groundbreaking large language model updates from Grok out of xAI, from Anthropic with their new Sonnet 3.7, and now OpenAI. So after a lot of waiting, what seems like years, we have OpenAI and ChatGPT's next big step forward with GPT-4.5. But let me tell you something. This one's weird. I'm not saying that in a bad way; it's different. And probably for the first time in a long time, I've spent hours now at least playing with GPT-4.5, and I'm like, this isn't for me.

Starting point is 00:01:42 Again, not in a bad way. I just think that OpenAI's new model, GPT-4.5, is something much different than what it's released before. It's not breaking any records. It's not climbing to the top of every single benchmark. It's a vibes model. It's meant to feel more relatable and more relational with the end users, all of us. All right.

Starting point is 00:02:12 So I'm excited today to talk about OpenAI's new GPT 4.5. What's new and who can benefit the most? All right. If you're excited to learn about it, you're in the right place. Welcome. My name's Jordan Wilson and I'm the host of Everyday AI. This thing is for you. It's your daily live stream podcast and free daily newsletter,

Starting point is 00:02:34 helping us all not just learn what's happening in the world of AI, but how we can all actually leverage it, what it means, and how we can use the information to be the smartest person in AI at your company. If that's already you or if that's what you're looking to do, welcome. We just became best friends. Also, your actual best friend is our website, youreverydayai.com. Go sign up for our free daily newsletter there. Also, people don't know this.

Starting point is 00:02:59 You can go listen to every single podcast episode ever on our website. Go watch every single video. There's close to 500 now, all sorted by category. So no matter what you want to learn, we have it for you. We've probably had a world-leading expert already come and share their secrets. So make sure you go check that out. All right. So I am excited to talk today about OpenAI's new model. But before we get started, we're going to start off as we do most days by going over the AI news. So Nvidia's revenue has soared 78% year over year to $39.3 billion in the fiscal fourth quarter,

Starting point is 00:03:37 ending January 26, fueled by strong demand for GPUs. So the company's data center products accounted for $35 billion of the total revenue, with half of that coming from cloud service providers like AWS, Google Cloud, Microsoft Azure, and Oracle Cloud. So Nvidia's Blackwell GPU, which was launched in December, generated $11 billion in revenue in the first quarter, marking the fastest product ramp in the company's history. Yeah, Nvidia's earnings coming out. So pretty interesting here.

Starting point is 00:04:13 And CEO Jensen Huang announced the upcoming launch of Blackwell Ultra in the second half of 2025, promising a smoother transition compared to the Hopper to Blackwell shift, which faced production changes due to design changes. Sorry, production challenges due to design changes. So Blackwell Ultra will feature advancements in networking, memory, and processors, while Nvidia's next-generation Vera Rubin architecture, combining CPU and GPU technology, is set to debut next year in 2026. So Huang emphasized that Nvidia's manufacturing partner, TSMC, exceeded expectations in expanding production capacity, helping meet surging demand despite initial hurdles. So Nvidia's revenue from

Starting point is 00:04:57 China has dropped by half since U.S. restrictions on chip exports began in 2022, but the company now offers a less advanced processor, the H20, specifically for the Chinese market. All right. Next piece of AI news: Meta is looking to compete with ChatGPT in a completely different way. So according to reports, Meta is planning to release a standalone Meta AI app in the second quarter of 2025, according to sources familiar with the subject, marking a pretty big step in CEO Mark Zuckerberg's push to dominate the AI space. So the app will reportedly expand Meta AI beyond its current integration with Facebook, Instagram,

Starting point is 00:05:40 WhatsApp, and Messenger, allowing users to interact more deeply with the Gen AI assistant. So right now you can obviously just go to meta.ai and use their AI that way. But it looks like Meta is looking to compete more directly with OpenAI as a standalone AI app. So in April 2024, Meta replaced the search feature in its apps with Meta AI, positioning the chatbot as a central feature for billions of users. So the new standalone app will allow for greater personalization, conversational history organization, and integration with Meta's hardware, such as Ray-Ban smart glasses, according to reports that Zuckerberg publicly agreed with on Threads. So Meta is also exploring a paid subscription for Meta AI. Interesting, right? A fairly open source model, but you will have to pay to use it.

Starting point is 00:06:39 Similar to OpenAI's ChatGPT Plus and Microsoft Copilot, which could generate revenue through premium features and paid recommendations. So Meta AI currently has 700 million active users. So yeah, pretty wild there. It should be interesting. And, you know, OpenAI CEO Sam Altman kind of responded jokingly on Twitter and said, you know, hey, maybe we'll just release a social media app.

Starting point is 00:07:04 All right. Let's get into it. Let's talk about what's new inside OpenAI's GPT-4.5. And this is part one of two, right? I understand y'all. Sometimes these shows go way too long. And the other day, I'm like, oh, yeah, there are some updates. We're going to do a short show.

Starting point is 00:07:22 And that show ended up being an hour. Whoops. I got to stop doing that, right? No one wants to listen to me blab when I'm tired and overcaffeinated for an hour plus. We're actually going to be breaking this one down. I'm not going to be doing any live demos today. Those usually take a lot of time to put together. So probably in the future when there's at least big new models like this, we're going to break

Starting point is 00:07:44 it up into two portions, just like I always say, hey, we're going to learn and leverage. So today we're going to learn about the model, what's new, who I think it's going to benefit. And then we're going to have a second show probably next week on, you know, the best ways to leverage it, probably do some live demos, some examples, all that good stuff. All right. So what the heck is GPT-4.5? Well, it is the last non-chain-of-thought model from OpenAI. So Sam Altman has said that future systems are going to be hybrid.

Starting point is 00:08:19 So what that means is reportedly GPT-5 will be more of a system. And you aren't going to necessarily be choosing between these quote unquote old school transformer models like GPT-4o or GPT-4.5 and reasoning models like o3 and o1. Right. So in the future, he said it's going to be more of a hybrid approach and it's going to be a system that you talk to. And maybe the system is just going to choose which model is best for your query, or maybe it's going to use, hopefully, one of my predictions, one of my 2025 AI predictions, which you should go listen to those shows, which is moving away even from a mixture of experts and going to a mixture of models.

Starting point is 00:09:04 I hope we see that, right? I hope in the future, if you have a very advanced query, part of it might use, in theory, under the hood, a GPT-type model, and then part of it might use an o-series model. But it is the last non-chain-of-thought model from OpenAI. And it's really expensive on the API side. So right now, just FYI, this is only available to Pro users. It's only available to people on that $200 a month plan, although OpenAI did say that it

Starting point is 00:09:37 will be rolling out in the coming weeks to all paid users. So, you know, I feel most people listening to the show are probably ChatGPT Plus users on the $20 a month plan. So you don't have this yet, but I'm guessing sometime early to mid-March, most paid users should have access to GPT-4.5. But it's super expensive on the API, right? So, you know, developers or, you know,

Starting point is 00:10:03 maybe if you are a technical person and something at your company runs on the back end on GPT, you know, 4o maybe or 4o mini, you're probably not going to be using this, if I'm being honest, and more on that in a bit. But ultimately, I think humans are going to like this, right?

Starting point is 00:10:24 I think humans are going to like this. And hey, live stream audience, thank you for tuning in. I forgot to shout you guys out. But if you do have a question, let me know. So thanks to Big Bogey and Harvey and Samuel

Starting point is 00:10:36 joining on YouTube and Woosie Rogers joining us on LinkedIn. Dr. Harvey Castor is doing double time, joining us on LinkedIn and YouTube. Love to see it. Stephen, happy Friday to you and Brian and Joe, Michelle, Dr. Scott, everyone. Can't go through everyone. But thank you all for joining live. If you do have questions, try to get them in now. I'll try to answer them as we go, as they pop up, right? Yeah, this is an unprompted, unscripted live stream, the realest thing in artificial intelligence.

Starting point is 00:11:07 So, you know, get your questions in. I'll try to either tackle them as we go or at the very end. So here's some more details on what you need to know on the new GPT-4.5 model. So like I said, it is $200 a month right now to use on the Pro plan. So that's if you're using it on the front end chatbot, right? Logging into chatgpt.com, it's not going to be there unless you're on that $200 a month Pro plan. But it should be rolling out in the coming weeks. It's the first major model upgrade in over two years, though. So that's important.

Starting point is 00:11:37 So we've seen iterations and upgrades over the GPT-4 model, right? But it's been more than two years since this base model was actually refreshed, right? Let me tell you what I mean about that. So GPT-4 came out back in, gosh, 2023, right? But it was kind of refreshed. So then we went to GPT-4 Turbo. Then we went to GPT-4o, or Omni, right? So it was this Omni model bringing more modalities,

Starting point is 00:12:12 but the base, the engine, was still kind of the same. And OpenAI, you know, they did some fancy engineering and, you know, tweaked it a little bit. But for the most part, it was an old engine that was still running this thing, but it was still the most powerful single-use model in the world, right? And when I'm talking about single-use, I mean non-reasoners. So this is pretty big. It's the first major model upgrade

Starting point is 00:12:42 in over two years. But here's the thing. It's built for empathy. It's built for relationships. It's built for intuitive conversations. People are saying this is a vibe model, right? It's not shooting off the charts on every single benchmark, but OpenAI hopes that when you talk to ChatGPT, you're like, oh, this is very human-like, right? They're like, oh, you're going to feel some AGI vibes, some artificial general intelligence, right? But here's the thing, and I started the show by saying this. This is not for us, right? If you're a power user, if you're following AI every day, like me, right?

Starting point is 00:13:26 Depending on the day, I know it changes. You know, I'm spending, who knows, anywhere from three to eight hours a day using large language models. For the most part, ChatGPT. I'm in ChatGPT all day. This isn't for me. This model is not for me. It's really not. I think it's for everyone else. This is for casual users. This is for my mom, right? This is for my mom.

Starting point is 00:13:57 This is for companies that maybe did not get on board with AI previously. So it's not just like that because of a vibe. It's a vibe model, right? Oh, it feels good. It feels natural. It feels human. It feels like it understands my emotions, right? Because it's not the best. It's not the fastest. It's not the cheapest. So then it's like, what the heck is this thing then? If it's not the best, it's not the fastest, it's not the cheapest, it's not for power users, like, what the heck, OpenAI? I really focus this on two things. You know, and again, this model literally just came out. I wasn't part of the early testing group, so I look tired if you're on the live stream and the coffee is probably a little stronger. But I think probably if I had to boil this down to two words, it would be reliable

Starting point is 00:14:59 and relatable. And again, for me, as a power user, I've never had problems with those, right? I don't run into a lot of hallucinations because I know prompt engineering very well. I know how to refine a large model and kind of train it up on a much smaller skill set to increase the accuracy and decrease the hallucinations. But now out of the box, hallucinations are lower. Now out of the box, it's not going to sound like talking to a robot, right? Maybe that's why for the first time, I'm like, ah, this doesn't really seem like it's for me. It's still probably going to be a model that I use very often, even though it's not the best, not the fastest, not the cheapest, right? But I will assume that, you know, GPT-4o won't be around forever, right?

Starting point is 00:15:56 So I do need to also understand that I need to start using this model. I need to get used to it. I need to adjust how I talk to it. I need to adjust my expectations, right? This is also why I'm updating our free Prime Prompt Polish course. Yeah, I know it's been a few months. Don't worry. We're shooting for a March date. Keep an eye on the newsletter for that. Right. This isn't for me, but I'm still going to use it. Right. It's funny because I think for probably a good 15 years, you know, my friends and coworkers have called me a computer. They're like, oh, Jordan's a computer. Yeah, he's not human. Beep boop, beep. That's why for me, right? I don't know.

Starting point is 00:16:41 Like, for me, I don't need a relatable chatbot. I don't. I don't need to talk to something and be like, oh, this feels human, right? Maybe it's because, you know, my EQ is not off the charts, right? But if you are someone that really cares about feeling heard, about feeling understood, if you want to feel a relatable relationship with an AI chatbot, not in a weird way, right? But when I think about things like business, like a business coach, a strategist for your company, for your department, right?

Starting point is 00:17:23 A true creative thought partner, GPT-4.5 is going to be much better at those things.

Starting point is 00:18:15 Angie says Jordan's an AI agent. Sometimes I wish I was. I think agents don't need sleep and agents don't get tired. Those are two things that I'm struggling with right now. Nancy, former Everyday AI guest, what's up, Nancy, says the single reason I couldn't live without the pro

Starting point is 00:19:12 subscription of Claude is because I couldn't stand to talk to ChatGPT all day, LOL. Yeah, that's a great point, right? Because I've never liked Claude. I know people do because, you know, they're like, oh, it spits out more human-sounding content and it feels more like I'm talking to a human. Well, I'm not saying that GPT-4.5 is OpenAI's answer to Claude. It's not, but you will get those vibes. Right. You'll get those vibes that the output, the written text, is going to look and seem much more human-like. It's going to seem like a much less robotic process, both in the output and the interaction between you and the chatbot. So, yeah, a lot of people are saying, you know, and Nancy is definitely not alone in this,

Starting point is 00:19:59 right, that a lot of people prefer Claude who use AI just for content writing and who don't want to necessarily go the extra mile in prompt engineering. And they just want to be able to get more human-sounding output out of the gate, out of the box, and they want to have it feel more like the assistant understands you, like the AI chatbot understands you, right, which is something I think Claude's been great at. Again, for me, like, I'm a human, but I don't know, I feel like I think in bits and bytes. I think in ones and zeros.

Starting point is 00:20:33 So I don't necessarily need to feel understood by an AI or anything like that, right? But that's a great point there. And this is interesting, y'all. So OpenAI says GPT-4.5 is not a frontier model, which is wild to think, right? And that means that it's a model that does not represent a groundbreaking or revolutionary advancement over its predecessors, right? OpenAI straight up said this. They're like, yeah, this isn't benchmarking off the charts. They literally said this is not a frontier model, right?

Starting point is 00:21:06 Frontier models are those large language models that are supposed to be revolutionary, right? This is not it. This is more of a foundational model. But I think what we're actually seeing here is building for the future. I think this is all about the training data. This is about how we're interacting with this model. And OpenAI is obviously collecting all of that. They're not collecting the data that you upload, right?

Starting point is 00:21:32 FYI. People always get that wrong. People are like, oh, anything I upload into ChatGPT, it's like, you know, it's like I'm printing it on the internet. No, that's not what that is. Turn off your data sharing and then you're not sharing anything, right? But you always have an opportunity. ChatGPT will ask you, sometimes it'll give you two responses, which one's better, right? That's being sent to OpenAI, right?

Starting point is 00:21:54 If you say this is wrong, that's being sent to OpenAI. So what I think is actually happening here is there's a big, large, expensive model. They didn't say how many parameters this model is, but apparently it's enormous because the API costs are insanely high, right? I don't know who's going to be using the GPT-4.5 API. I'm going to show you the prices. It doesn't compute, right? Using that much compute doesn't compute. So this model has to be enormous. But I do think that this is going to be the last enormous model from OpenAI. Because I think what this is setting the stage for is to get that data on how users like and interact with the model, and also from those that don't turn off data sharing, right?

Starting point is 00:22:41 And I think this is going to lead to better and smaller distilled models. Like, as an example, when we talk about GPT-5 and, you know, the o4, o3 models of the future, I think they're just going to be distilled based off of this super big model. So it's expensive. And also, this thing maxed out OpenAI's compute. CEO Sam Altman literally said, yo, we're out of GPUs, right? At least that's what he said. You know, he said, hey, we can't bring this out to all ChatGPT Plus users right now because it'll burn us. He literally said, we're out of compute. We're out

Starting point is 00:23:20 of GPUs, right, which means this thing is enormous. Like I said, the benchmarking improvements are modest, not meaningful, right? A lot of times when you get a, you know, new frontier model, it completely shifts the conversation on benchmarks. And you're like, well, this one went through the roof. This is not that. And it's designed for more natural human interaction. And like I said, I think this is laying the foundation for future smaller models. And it combines more of the EQ with the IQ, right, that emotional intelligence with traditional intelligence. That's what this is.

Starting point is 00:23:54 This is a much more human touch. All right. Here's what OpenAI said specifically about 4.5. They said, we're releasing a research preview. Yeah. This is a research preview, y'all. Keep that in mind. Our largest and best model for chat.

Starting point is 00:24:15 They didn't say our best model. They said our best model for chat. Best model for humans to chat with, right? GPT-4.5 is a step forward in scaling up pre-training and post-training. By scaling unsupervised learning, GPT-4.5 improves its ability to recognize patterns, draw connections, and generate creative insights without reasoning. Early testing shows that interacting with GPT-4.5 feels more natural. Its broader knowledge base, improved ability to follow user intent, and greater EQ, emotional intelligence, make it useful for tasks like improving writing, programming, and solving practical problems.

Starting point is 00:24:57 We also expect it to hallucinate less. Remember those two words I said? I think it's going to be more reliable and more relatable. So who can benefit the most? I told you kind of what's new, gave you some of the bullet points. Who can actually benefit the most, right? Like, when are you going to use this? So I think for more human-like conversations, it's going to be ideal in the long run for

Starting point is 00:25:23 companies to use this for customer support or for you to use this for customer support, right? Maybe you're in customer service and you just copy and paste a bunch of information in here and you're trying to work through tough customer support problems, I think it is great for understanding nuances in human language, right? It's probably going to be pretty soon better than humans at understanding nuances in human communication, which is weird to think about, right? So I think it's ideal for customer support, therapy, education.

Starting point is 00:25:56 It has enhanced creativity that I think will help writers, marketers, and designers generate creative ideas, and some advanced coding and technical abilities that will benefit developers and data analysts. Not everything, right? This is not going to be something that you're going to, you know, plug in and use to code. I don't think that's what we're going to see here. Although, interestingly enough, even though the benchmarks did not shoot up, this is pretty interesting. So Cognition Labs, right? So they have Devin, which is an AI programmer, very popular. And they were kind of looking side by side, looking at different models for agentic coding evaluations, right? And even though GPT-4.5 is not necessarily supposed to be a coding tool,

Starting point is 00:26:44 it did very, very well on their evaluation. So as an example, its predecessor, GPT-4o, got a 49% on this agentic coding evaluation. And GPT-4.5 got a 65%, you know, only trailing Sonnet 3.7, which got a 67%. So, again, programmers aren't going to use this. Developers aren't going to use this because right now the API costs are high. It's slow, right? Even using it, you know, obviously, anytime a model first comes out, it's always going to be

Starting point is 00:27:18 slower. But I expect this to be slower in the long run. So it's not the fastest. It's not the best. It's not the cheapest. But it's very capable, even when it comes to agentic coding. So that's according to Cognition. Let's talk about the emotional intelligence.

Starting point is 00:27:32 So more natural, human-like interactions than GPT-4o. It's going to be better at reading and responding to emotional cues. And according to OpenAI, it was preferred by users in about 56 to 63% of different use case tests against GPT-4o. So, right, showing kind of two responses side by side. So the majority of the time, users prefer this to GPT-4o. So it will presumably be, and in my limited testing so far this is true, great at storytelling and generating ideas and just generating written content, right?

Starting point is 00:28:10 That's something that you use AI for, which I know a lot of people do. I think there's so many use cases people should be using AI for, but they're not. And they're just like, you know, I need help writing this blog post or I need help writing a paper, right? And ultimately, that's what they're using it for, which is why I think a lot of people flocked to Claude early on, and I'm like, nah, you can do this in ChatGPT, you just got to know how to use it, right? So I think it's going to become a much better writer. It's going to write more clearly and concisely. And also I think it's going to be much stronger in design and creative tasks. This is good news, y'all. This is good news. Second straight day that the sun is shining in my face

Starting point is 00:28:54 and I had to close the curtain here. Oh, bless up. There's nothing worse than waking up for a live stream in the months of winter. And it's just dark outside, right? Oh, sunshine, right? Maybe I need to go outside and touch grass. And then I'll appreciate this more human side of GPT-4.5. All right, let's talk about some of the technical features. So, according to OpenAI,

Starting point is 00:29:18 this was trained with 10 times more computing power than the previous GPT models. Also, a 128,000-token context window for deeper conversations. I'm going to be testing that one ASAP because for years, OpenAI has said that, and maybe they just didn't differentiate and maybe this is just the API and they didn't say, hey, it's 128,000 in the API versus 32,000 when you're using ChatGPT. So I'm going to be testing this. Don't worry.

Starting point is 00:29:49 And I'm going to talk about that in our part two of the show because for so long, when you're using the chat version, right, the front end chatgpt.com, it hasn't had a 128,000-token context window, right? So that means that ChatGPT will start to forget things much sooner. So it actually had a 32,000-token context window, which is about 26,000 to 27,000 words of back-and-forth interaction with ChatGPT, and then it would start to forget things. So I'm excited to test that out to see if that is just on the API side or if that's going to be also in the chat window.
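As a rough check on the numbers in that segment, here is a minimal sketch of the token-to-word arithmetic. The 0.75 words-per-token ratio is an assumed rule of thumb for English text, not a figure from the episode; the episode's estimate of 26,000 to 27,000 words for 32,000 tokens implies a slightly higher ratio of about 0.83.

```python
# Back-of-the-envelope conversion from context window size (tokens) to words.
# WORDS_PER_TOKEN is an assumed rule of thumb for English text, not an
# official figure; real ratios vary with the tokenizer and the content.
WORDS_PER_TOKEN = 0.75


def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many words fit in a context window of `tokens` tokens."""
    return round(tokens * words_per_token)


# Window sizes as described in the episode.
windows = {
    "ChatGPT front end (as described)": 32_000,
    "128K window (as described)": 128_000,
}

for label, tokens in windows.items():
    print(f"{label}: {tokens:,} tokens is roughly {approx_words(tokens):,} words")
```

Under that assumption, the larger window holds about four times as much conversation before earlier messages start to fall out of context.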

Starting point is 00:30:22 If that is in the front end chat, that's going to be big. Also improved coding, especially in comprehensive, complex tasks. So again, according to OpenAI, this was pre-trained simultaneously in multiple data centers, which I believe will be the first for a model like this, right? That says something when, you know, OpenAI has access to some of the biggest data centers and the most compute in the world. They're like, yeah, we can't train this in one place. This is too big. I don't know how big this model is, right? It's got to be multitudes larger than the original GPT model, right?

Starting point is 00:31:03 If it's costing this much in compute, if it's costing this much via the API, if it's causing OpenAI to straight up run out of GPUs, it's got to be huge. I don't know, right? Reportedly earlier versions of GPT-4 were about 1.8 trillion parameters. I don't know. This thing's got to be double that, maybe more. I don't know, but that's not sustainable in the long run, right? Which is why I think this is actually a foundation for OpenAI to better distill and

Starting point is 00:31:36 to make better, smaller hybrid models as they switch to that kind of setup. Also, it can handle tasks involving visual understanding. So I did some tests on this. Vision capabilities are pretty good so far. We're going to do more on that in our part two show. We're probably going to show some comparisons between GPT-4o and GPT-4.5. But out of the box it does very well with visual understanding and being able to see and synthesize information in photos. So yes, this is multimodal.

Starting point is 00:32:09 FYI, right now it has access to all of the tools. I should have maybe started with that, right? Because these reasoning models, a lot of them, the o1 series, don't have access to all the other tools, right, Canvas and, you know, DALL-E and advanced data analysis and ChatGPT search, right, all these other tools that really make a large language model agentic. So right now, you do have the rest of the tools, although I would like and hope that eventually we will see Tasks get GPT-4.5 as well as GPTs. My gosh, OpenAI.

Starting point is 00:32:49 I know there's, you know, a lot, not a lot, but plenty of you all listening, because you reach out and let me know. Can we update GPTs, please? These poor things are just like that poor, forgotten-about child in the corner, right? This is Macaulay Culkin in Home Alone, you know, we're leaving for the airport without GPTs. GPTs, y'all, enterprise companies, they hire us and they want us to build them GPTs, right? And I'm like, y'all, I don't know. We might have to build you projects instead because poor GPTs are in the corner and they haven't been updated in forever, right?

Starting point is 00:33:23 So hopefully we see the GPT-4.5 model eventually be rolled out to other things like Tasks and like GPTs. Here's the thing, reliability. Let's talk about accuracy and knowledge because I started the show by saying a lot of companies didn't get on board with AI because they're like, yo, it lies, it hallucinates. Is GPT-4.5 free of hallucinations? Absolutely not. If you know how to use it, you're probably going to

Starting point is 00:33:49 see a great reduction in hallucinations. But according to OpenAI, it knows more and hallucinates less. So the hallucination rate has gone down significantly, with higher accuracy on factual questions. But what's important to know, the knowledge cutoff has actually been rolled back. So GPT-4o has a knowledge cutoff of June 2024, which is reasonable to work with. This one is October 2023. Right. So I'm sure they'll be updating the knowledge cutoff in the future. But just know if you're using GPT-4.5 right now in the chat or when you're using it, when it rolls out to ChatGPT Plus users, you should probably, in many use cases, use

Starting point is 00:34:32 our refined Q method that we teach in our free Prime Prompt Polish prompting course. You need to bring in more accurate and more up-to-date information for whatever it is you're working on to get started with, or make sure you go retrieve that by using ChatGPT search. Right. Here's the thing. I'm going to have to do a dedicated episode just on training data and what this means. Right. So people think, oh, that means it knows every single thing and it's 100% accurate and up to

Starting point is 00:35:00 date by October 2023. No, it doesn't, right? A lot of these data sets that companies use to train their models, you know, by saying, oh, it cut off in October 2023. Well, what happens if that data set is updated once a year? What happens if that data set has some extremely outdated information? You hope that through reinforcement learning with human feedback, you know, a lot of that older information gets kicked off when they're going through and they're, you know, training the model,

Starting point is 00:35:26 but not necessarily. So keep that in mind. The knowledge cutoff is rolled back. You need to do a better job. If you are using GPT-4.5, you need to do a better job at making sure it has more accurate and more up-to-date and relevant, like fresh, information if you are relying on it for accurate, up-to-date outputs. The entire world changes around us every single day.

Starting point is 00:35:47 So to work with a knowledge cutoff from 2023, you've got to be careful, right? It is computationally demanding. So like we said, it's very limited right now, OpenAI's ability to scale this out to users, because of GPs, sorry, because of GPUs. Also, it does have weaker performance in complex reasoning compared to specialized models. All right. Let's talk a little bit about accuracy and knowledge. All right.
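As a rough illustration of the advice above about working around a 2023 knowledge cutoff, the sketch below prepends dated, newer facts to a question before it is sent to a model. The names `KNOWLEDGE_CUTOFF` and `build_prompt` are hypothetical, the end-of-October date is an assumption (the episode only says October 2023), and nothing here calls a real API.

```python
from datetime import date

# Assumption: the episode only says "October 2023", so the last day of that
# month stands in for the exact cutoff.
KNOWLEDGE_CUTOFF = date(2023, 10, 31)


def build_prompt(question: str, fresh_facts: list[tuple[date, str]]) -> str:
    """Prepend facts dated after the cutoff to the question (hypothetical helper)."""
    # Keep only facts the model could not have seen during training.
    newer = [(d, text) for d, text in fresh_facts if d > KNOWLEDGE_CUTOFF]
    newer.sort(key=lambda item: item[0])

    lines = ["Use the following up-to-date context; prefer it over anything you remember."]
    for d, text in newer:
        lines.append(f"- [{d.isoformat()}] {text}")
    lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


if __name__ == "__main__":
    facts = [
        (date(2023, 5, 1), "A fact the model likely already knows."),
        (date(2025, 2, 27), "A fact from after the cutoff."),
    ]
    print(build_prompt("What changed recently?", facts))
```

Only the facts dated after the assumed cutoff end up in the prompt, which keeps the added context short and focused on what the model could not already know.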

Starting point is 00:36:24 So this is SimpleQA, which is actually OpenAI's own benchmark, right? I would really like other people to start using this or something like it. But this is essentially like, is this getting things correct? Right. So SimpleQA accuracy, where higher is better. This is just, is it factual? Is it getting questions correct? Can it recall information in the right way?

Starting point is 00:36:54 So on this, some of GPT's previous models, or some of OpenAI's previous models: GPT-4o scored a 38% on this, where GPT-4.5, not double, but pretty close, got a 62%, where even the reasoning models got a 47% and a 15%. So if you're wondering, what's the point of this model, boil it down to two words. It's relatable and it's reliable. It has a much higher accuracy. And let's be honest, we just sometimes look past large language models and we just assume that they're always accurate and we can always rely on them.

Starting point is 00:37:42 That's bad. I don't know why people are trying to take the human out of the loop, and we expect large language models to always be 100% factual and accurate, right? They're trained on the internet. Is the internet 100% factual and 100% accurate? Absolutely not, right? I read an article on, I think it was ChatGPT, from a huge publication last week. And it was completely wrong.

Starting point is 00:38:08 All their facts were wrong, right? I'm not going to name and shame them. Maybe I should, but it's a publication we've all heard of. Everyone out here reads it. I was thinking about roasting them on Twitter and fact-checking it. And I'm like, this is all wrong. This is all not correct. But guess what?

Starting point is 00:38:24 All this information that people put out on the internet, sometimes people intentionally put out misinformation, disinformation. Sometimes people don't know what they're talking about. But all that goes out on the internet, and models gobble this up. And you hope that humans can pick out, you know, information that's in the training data that's not right versus what's right, right, through reinforcement learning. But much more accurate.

Starting point is 00:38:46 almost twice as accurate. And what's pretty interesting for me, at least, is o3-mini there with a 15% on this SimpleQA accuracy and GPT-4.5 with a 62%. I love o3-mini. It's probably my most used model, right? Again, I do a good job at making sure I feed it the accurate and relevant information that it needs and I'm not necessarily always relying on it to go and seek and find the absolute truth on its own, but out of the box, GPT-4.5, according to OpenAI's own

Starting point is 00:39:21 internal benchmarks, extremely reliable. And let's talk about hallucination rate. Same thing. Much lower. In this case, lower is better. So in their test, it's only, it's a 37% hallucination rate. Now, I want you to keep in mind, that doesn't mean it hallucinates 37% of the time. In these tests and in these benchmarks, they're hard, they're tricky. They are made to get the model to kind of screw up, right? So a very, very, very low hallucination rate actually for 4.5 with a 37%, where o3-mini, as an example, is at 80% and GPT-4o at 61%. So again, these are intentionally very difficult questions that are meant to make models hallucinate. So it's more reliable. It lies less. All right. Other benchmarks, again,

Starting point is 00:40:13 nothing here is jumping off the page. On many of the major benchmarks, this is not OpenAI's best model, right? In some cases, it's actually about the same or on par with GPT-4o, or it's just behind o3-mini, which again, that is my workhorse model.
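To put the SimpleQA accuracy and hallucination figures quoted above side by side, here is a small sketch that computes how GPT-4.5 compares with GPT-4o and o3-mini. The percentages are the rounded numbers as spoken in the episode, not official values, and the function names `accuracy_ratio` and `relative_drop` are made up for this illustration.

```python
# Percentages as quoted in the episode (rounded by the host).
SIMPLEQA_ACCURACY = {"GPT-4o": 38, "GPT-4.5": 62, "o3-mini": 15}   # higher is better
HALLUCINATION_RATE = {"GPT-4o": 61, "GPT-4.5": 37, "o3-mini": 80}  # lower is better


def accuracy_ratio(model: str, baseline: str) -> float:
    """How many times more accurate `model` is than `baseline` on SimpleQA."""
    return SIMPLEQA_ACCURACY[model] / SIMPLEQA_ACCURACY[baseline]


def relative_drop(model: str, baseline: str) -> float:
    """Fractional reduction in hallucination rate going from `baseline` to `model`."""
    base = HALLUCINATION_RATE[baseline]
    return (base - HALLUCINATION_RATE[model]) / base


if __name__ == "__main__":
    for baseline in ("GPT-4o", "o3-mini"):
        print(
            f"GPT-4.5 vs {baseline}: "
            f"{accuracy_ratio('GPT-4.5', baseline):.2f}x accuracy, "
            f"{relative_drop('GPT-4.5', baseline):.0%} lower hallucination rate"
        )
```

With the quoted figures, GPT-4.5 comes out at roughly 1.6 times GPT-4o's SimpleQA accuracy, which lines up with the "not double, but pretty close" description.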

Starting point is 00:40:34 I'll probably do, let me know, live stream audience, let me know yes or no. Should I just do a show where I tell you what models I'm using and for what? It might have to wait a couple of weeks to see how and where I'm using GPT-4o. I had some people ask about it recently.

Starting point is 00:40:52 I didn't think it was that interesting, but if it's interesting, let me know. Maybe, I don't know, maybe it will be more interesting now that we have like nine models to choose from. But one thing that I thought was pretty impressive about these benchmarks. So it did score better in the MMMU, which is the multimodal equivalent of the MMLU. And it scored fairly well on the MMMLU, which is the multilingual equivalent of MMLU. So MMLU has historically been one of the, you know, it's one of the benchmarks that we talk about most. I say it's like the ACTs for AI models, right?

Starting point is 00:41:33 So it did perform well, or better than GPT-4o, on those benchmarks, pretty significantly. But SWE-Lancer, I love this. So this is an actual test that OpenAI developed and, you know, they use other models, right? And essentially, like when Claude came out, Claude was better. And OpenAI said that. They're like, yo, Claude does way better at SWE-Lancer, right? This is essentially a test where it goes out and performs the type of tasks you would see on, like, Upwork, right?

Starting point is 00:42:03 But this one outperformed the other models by far. It completed 32.6% of tasks, whereas OpenAI's o3-mini completed 10.8%. And GPT-4o completed 23%. So that's interesting. Also, o3-mini, I'm guessing at the time, did not have access to all the same tools. o3-mini does have access to the internet, which is huge because the other o models do not have access to the internet. All right, here we go.

Starting point is 00:42:37 Here we go. The costs. I don't know. You know what? I actually can't wait to talk to companies that are using this on the API. Because I'm not sure who is going to use it. It costs $75 per million input tokens and $150 per million output tokens. so expensive.

Starting point is 00:43:06 So I guess it's going to be those people, those companies that really value a reliable and relatable model, right? So this is just if you're using it on the backend on the API, right? So if you're logging into chatgpt.com, you don't got to worry about this, right? But I would assume that when this does roll out to Plus users, it's got to be limited. I don't see them rolling out this extremely expensive model that they're probably going to be losing money on. When you look at the API cost, I don't see ChatGPT Plus users getting unlimited access to this. I would assume that there would have to be some rate limits.

Starting point is 00:43:51 So let's go ahead and look at some of the cost comparisons. Ready? So I said $75 per million input. GPT-4o, 2.50. $2.50. So we went from $2.50 to $75. Yikes. Yikes.

Starting point is 00:44:16 30 times more expensive. Is that right? I just did that math in my head. Hopefully that's right. And then the output, 15 times more expensive. The output for 1 million tokens for GPT-4o, $10. And then on GPT-4.5, 150, right? And like everyone was losing their marbles when Claude 3.7 Sonnet, less than a week ago, came out.

Starting point is 00:44:41 And their API pricing didn't change, right? And everyone's like, oh, Claude Sonnet is so expensive, right? And aside from if you're using it for coding, there's no need to ever use Claude Sonnet 3.7 via the API. Now you're looking at GPT-4.5. You're like, all right, well, Claude 3.7 Sonnet doesn't sound like that bad, right? $3 per million tokens input and $15 per million output. So yes, I mean, we were looking at Claude 3.7 Sonnet versus GPT-4o.

Starting point is 00:45:13 And Claude is like, oh, it's like, okay, well, that's 50% more expensive for output. Oh my gosh. And then GPT-4.5 comes out and says, hold my GPU, right? You won't believe this price. I don't believe it. But we'll see. We'll see who uses it. Clearly someone's going to use it.
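The back-of-the-envelope pricing math above can be checked with a short sketch. The per-million-token prices are the ones quoted in the episode; the `PRICES` table, the `request_cost` and `price_multiple` helpers, and the example token counts are assumptions made purely for illustration.

```python
# USD per 1 million tokens as quoted in the episode: (input price, output price).
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4.5": (75.00, 150.00),
    "Claude 3.7 Sonnet": (3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the quoted per-million-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def price_multiple(model: str, baseline: str) -> tuple[float, float]:
    """(input, output) price of `model` expressed as a multiple of `baseline`."""
    model_in, model_out = PRICES[model]
    base_in, base_out = PRICES[baseline]
    return model_in / base_in, model_out / base_out


if __name__ == "__main__":
    print("GPT-4.5 vs GPT-4o:", price_multiple("GPT-4.5", "GPT-4o"))
    print("Claude 3.7 Sonnet vs GPT-4o:", price_multiple("Claude 3.7 Sonnet", "GPT-4o"))
    # Hypothetical request size: 10,000 input tokens and 2,000 output tokens.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 10_000, 2_000):.3f} per request")
```

The multiples come out to 30 times on input and 15 times on output for GPT-4.5 against GPT-4o, so the mental math in the episode holds, and Claude 3.7 Sonnet's output price is 1.5 times GPT-4o's, the "50% more expensive" mentioned above.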

Starting point is 00:45:33 So let's talk about the strategic impact. Like I said, I think this is a base model for future AI development. I think it moves OpenAI towards integrating soft skills with technical skills, right? That's like when we talk about, oh, this is a vibes model, this is an EQ model, right? Where I think previously, which is why I never had a problem with it. I don't need vibes. I don't need EQ. But I think a lot of people do.

Starting point is 00:45:57 I think looking at it even as soft skills. This is a soft skills model, which I think is why it's actually a big step forward. But it's a big step forward in areas that we're not used to looking at. Normally we look at big steps forward in AI models in benchmarks, in features, but we're not looking at it in terms of it being more relatable and reliable, like a human. So I think that's a big thing: this is going to be probably the most human model out there. Is it going to be the best at certain tasks? No.

Starting point is 00:46:31 Is it going to be the most reliable and relatable model? Probably. And I also think that this is indicating possible limits to continued scaling with the current GPT versus o-series architecture. So yeah, next we're going to see this hybrid setup. All right. So that's a wrap, y'all. Look at that. We didn't go one full hour.

Starting point is 00:46:58 Bless up. So we're going to do a part two next week. Let me know what use cases do you want to see? Let me know in the comments today in the newsletter. So make sure you go sign up to it. You can just reply to the newsletter. Let me know what do you want to see? Do you want to see writing use cases?

Starting point is 00:47:16 Do you want to see a creative strategist? What do you want to push the boundaries on? So we'll do this show likely either next week. or the week after. I got to look at what we have scheduled. We have some great guests coming up, y'all. I'm very excited. I know sometimes, you know, the show,

Starting point is 00:47:33 I do a lot of the shows. Sometimes we go through periods where it's a lot of guests. Sometimes it's a little bit in between. We have some fantastic guests coming up. But let me know what type of hands-on you want to do. We're going to do it live. We're going to do it hands-on. Let me know what you want to see.

Starting point is 00:47:50 Also go to our website, your EverydayAI.com. Sign up for our free daily newsletter. We're going to be recapping this. But like I said, high level here, GPT-4.5, it's out. It's only out for Pro users right now on that $200 a month plan. Or if you're paying for it via the API, which is crazy expensive. This is not a groundbreaking model by traditional metrics.

Starting point is 00:48:16 But I do think it could be a groundbreaking model by just the vibes, by how we feel, by how we interact and think about the AI. All right. A couple of questions. Let me see. Douglas was saying, how would a mixture of models compare to the idea of a reasoning orchestrator and then transformer agents specialized? Oh, Douglas, you're really trying to push this episode to more than an hour. All right. I think I covered most of that in the AI predictions show. So say, I'll say go listen to that, Douglas. Maybe I'll leave you a more thoughtful comment on the live stream later and explain that.

Starting point is 00:48:54 sorry if I got that wrong, is asking, Jordan, would you use 4.5 instead of 4o moving forward? Here's the thing. It's better. 4.5 is better than 4o, right? It's just not better by the same step that normally a new model would be, right? There's very few metrics or instances where 4.5 is going to be worse, right? Which is interesting because a lot of the chatter so far around Sonnet 3.7 is a lot of people are saying it's worse for certain situations than 3.5. It's still too soon to answer that. But from everything that I've used it for, unless I need speed out of 4o, which is usually not something I'm looking for, right? I'm patient enough. But I don't think that I'll be using 4o much, except, you know, in GPTs, right, except with tasks.

Starting point is 00:49:49 But for the most part, if I'm looking at a non-reasoning model, I'm probably going to be using 4.5. Samuel asking, does 4.5 support live voice, canvas, et cetera? So voice is still powered by 4o, but you can be in 4.5 mode and use voice. I did test that last night. So it's not a new voice model, but 4.5 still integrates and works with voice mode. It does also work with Canvas.

Starting point is 00:50:19 I did test that last night. I also tested the combination of the two. So you can be in GPT-4.5. You can use voice mode and it will update in Canvas. So pretty cool. Sam Sara from YouTube is saying, why is Google so bad? They're so bad that no one wants to compare their models against Gemini.

Starting point is 00:50:35 I think Google's great, if I'm being honest, right? I think their front-end Gemini chat really was neglected until about five to six months ago. I think the new Gemini models are fantastic. I think their integration into Google Workspace leaves a lot to be desired. I think their AI Studio is extremely powerful. But no, I think the Google models, I mean, they're top of the charts for many benchmarks, including the LM Arena.

Starting point is 00:51:06 Yashel with another great question here. Is there a benchmark to measure EQ for AI products? As far as I know, no, because I was researching the same thing. So yeah, that should be interesting. How can you benchmark these soft skills? I don't know if there's going to be one that's developed. I would assume after this model, there will be one that's developed. Right now, there isn't one. Douglas asking, are you going to look for a PPP update that has transformer model and reasoning model for the different methodologies? Great question, Douglas. So the PPP, and it's still going to be free,

Starting point is 00:51:40 the updated PPP is still going to be based on the GPT infrastructure, and the PPP Pro, also free, will go over prompting for reasoners as well as some other advanced features. All right. We got through most of the questions, y'all. Thank you for tuning in. I hope this was helpful. Let me know in the comments.

Starting point is 00:52:01 Please share this with your friends. If this was helpful, you know, our team spends a lot of time putting this together. We want you to be the smartest person in AI at your company. So if this was helpful, please let others know as well. Share this and go to your EverydayAI.com. Thanks for tuning in, y'all. We'll see you tomorrow and every day for more Everyday AI. Thanks, y'all.

Starting point is 00:52:42 And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going.

Starting point is 00:53:15 For a little more AI magic, visit Your EverydayAI.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
