Everyday AI Podcast – An AI and ChatGPT Podcast - EP 474: GPT-4.5 hands on - What it can do and how you can use it

Starting point is 00:00:00 This is the Everyday AI Show, the Everyday Podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. GPT 4.5 is officially the world's best large language model.

Starting point is 00:00:53 But how do we use it? Right. This is something that a lot of people have been talking about for the last few days since OpenAI released its first big updated base model in more than two years, right? Because two things that we're going to talk about today, and we're going to do a live demo as well, are how this new model is more relatable and more reliable. But that really just begs the question, okay, what does that mean for actually how we use it?

Starting point is 00:01:29 What is it actually better at? And when we talk about a model excelling in emotional intelligence, I mean, you can't really benchmark that. So how do you actually know when you might want to really take advantage of this new GPT 4.5 model? All right. We're going to be answering hopefully those questions and a lot more today on Everyday AI. What's going on, y'all? My name is Jordan Wilson and I'm the host.

Starting point is 00:01:57 And Everyday AI, it's for you. It is your daily live stream podcast and free daily newsletter, helping us all not just understand AI, but how we can use it to actually grow our companies and grow our career. Yeah, when all these new models come out seemingly every single week, you might be wondering, is this for my department? Is this for my company? Well, hopefully, at least after today's show,

Starting point is 00:02:20 you'll have a little bit better of an idea, at least when it comes to GPT-4.5, OpenAI's newest model. All right. So if you're new here, thank you for tuning in. This thing's live. It's unscripted. It's unedited. So, you know, I try to bring you something real in artificial intelligence,

Starting point is 00:02:37 which I think is rare nowadays, right? Everyone's creating all these super polished rundowns of, you know, models and, you know, using AI, even avatars even, right? Like, this is real. So, you know, if you are listening on the podcast, thank you for tuning in. Please make sure to subscribe to the show, leave us a rating. That would be great. And join the live stream, right?

Starting point is 00:02:57 Yeah, we have real, other real humans, you know, hanging out here in the live stream. So thanks for everyone joining us, Max, Holman down in Chicago. Big Bogey Face on the YouTube machine. Douglas, Woozy, Sandra. Let's see, who else? Christopher, Brian, Marie. Thank you all for joining. All right.

Starting point is 00:03:16 So I am going to need your help. All right. I don't know if these comments went on YouTube. Maybe they did. Maybe they didn't. All right. But I listed 13 different questions. All right.

Starting point is 00:03:31 I want you all to go through. I have them on my screen here. I'll show them in a bit. they're super small. Just write down the number of which one you want. All right. So just say, hey, I want to see number five. You know, I want to see number seven.

Starting point is 00:03:45 All right. So live stream audience, I need a little help from you guys. If you scroll through the comments, hopefully you should see it. I do have a slide up here later. It's super small. But let me know which one you actually want to hear. All right. Let's get into it, y'all.

Starting point is 00:03:59 So if you want the daily AI news, sorry, go check the newsletter. Didn't have time to put it all together this morning, if I'm being honest, because I was putting in a lot of work on this show. I don't sleep a lot if you haven't noticed, you know, live stream audience by looking at me. I don't sleep a lot. Yeah, I have AI go do a lot of my homework, but it's a lot. So if you do want the AI news, we're going to have that in the newsletter. Don't worry. And this is also as an FYI.

Starting point is 00:04:26 This is part two. So I specifically, you know, I noticed and I heard from a lot of you all, right? Like, you're like, hey, these shows are too freaking long, right? I don't need an hour, 10-minute podcast on a new AI model. So we actually broke a bigger show down into two separate parts. So if you want to hear the first part, where we went over a lot more of the technical detail, we went over some benchmarks. We went over a lot of those other things for OpenAI's new GPT-4.5 model.

Starting point is 00:04:54 So if you want that, go listen to Episode 472. So, you know, you can just scroll like two episodes back, two episodes back. That one is called OpenAI's new GPT-4.5: what's new and who can benefit the most. But today we're going to be looking at really a comparison. We're going to be going in and actually using this model, hopefully with some feedback and suggestions, some suggestions from our live stream audience. But let me just go over some of the basics. Some of these we already covered in the previous show, some things we didn't.

Starting point is 00:05:27 So here's kind of some talking points from OpenAI. So they are reiterating that GPT-4.5 is a research preview. It is their largest and best model for chat, all right, for chat. It is a step forward in scaling up pre-training and post-training. And by scaling unsupervised learning, GPT-4.5 improves its ability to recognize patterns, draw connections, and generate creative insights without reasoning. All right. Early testers, early testing shows that interacting with GPT-4.5

Starting point is 00:06:00 feels more natural. Its broader knowledge base, improved ability to follow user intent, and greater EQ, emotional intelligence, make it useful for tasks like improving writing, programming, and solving practical problems, and they also expect it to hallucinate less. All right. So this is kind of some of the bullet points that I said on our first show. You know, this is the last non-chain-of-thought model developed by OpenAI. So OpenAI CEO Sam Altman did say, hey, future models

Starting point is 00:06:30 that are under this GPT-5 kind of architecture, it's going to be a hybrid model. All right, so keep this in mind. But this is essentially a new base model. All right. And when we talk about reasoning models like o1 and o3, right, so we might not actually, you know, see an o4 as an example, right? Just like we might not, you know, see, you know, certain minis, o4-mini, right?

Starting point is 00:06:56 It just might all be under GPT-5. We don't know yet. They might say it's GPT-5 and it uses o4 reasoning, right? But in the future, you're just going to be working with one model. And that's why this is extremely important. I think a lot of people were kind of like upset, right? And they're like, oh, this GPT-4.5 didn't break every single benchmark, right? This GPT-4.5 is extremely expensive in the API.

Starting point is 00:07:19 Yeah, I don't know any company that is going to be able to afford to use this in the API, right, for like $75 input and $150 output per million tokens, which is just, you know, 30 times more expensive than their previous GPT-4 models. So, but I did say after the first show, I said, humans are really going to like it, right? Because this is, I think, I've always, you know, anytime you go chat, quote unquote, chat with any AI model, I don't know, to me, it's never felt human. It feels like you're chatting with a machine. GPT-4.5 is the first time I felt different, right? To me, it doesn't feel like I'm chatting with a human.
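To put the quoted API prices in perspective, here is a minimal arithmetic sketch. The GPT-4.5 figures ($75 input and $150 output per million tokens) are the ones quoted in the episode; the GPT-4o baseline prices and the token counts per request are assumptions added for illustration and were not stated on the show.

```python
# Prices in US dollars per one million tokens.
GPT_45_PRICES = {"input": 75.00, "output": 150.00}  # figures quoted in the episode
GPT_4O_PRICES = {"input": 2.50, "output": 10.00}    # ASSUMPTION: baseline not stated in the episode


def request_cost(prices, input_tokens, output_tokens):
    """Return the dollar cost of one request at the given per-million-token prices."""
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


def price_ratios(new, old):
    """Return how many times more expensive `new` is than `old`, per token type."""
    return {kind: new[kind] / old[kind] for kind in new}


if __name__ == "__main__":
    # Hypothetical workload: 2,000 prompt tokens and 500 completion tokens per request.
    for name, prices in (("GPT-4.5", GPT_45_PRICES), ("GPT-4o", GPT_4O_PRICES)):
        print(f"{name}: ${request_cost(prices, 2_000, 500):.4f} per request")
    print(price_ratios(GPT_45_PRICES, GPT_4O_PRICES))
```

Under the assumed baseline, the "30 times" figure matches the input-token price; the output-token price works out to 15 times. Swapping in a different baseline changes both ratios.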

Starting point is 00:08:07 I don't know what it is about me and how I tick as a person, right? I know, right? Yeah, it's like, yeah, I'm talking to a computer still, but it felt like a real human. It felt like a real human computer person, right? I know a lot of people like reading people's first experiences, they're getting enthralled in GPT-4.5's ability to carry on a conversation and to show kind of these EQ tones that make humans human, right? People are always like, hey, what, you know, what separates humans from AI, humans from large language models? And widely, those things are usually

Starting point is 00:08:45 considered things that are EQ, right? Emotional intelligence, being able to understand nuanced conversation, right? And right now, at least from a text standpoint, GPT 4.5 is doing amazingly well. And I did predict that humans are going to like it. And sure enough, humans loved it. Because even though GPT 4.5, and again, this is a preview, even though it did not break every single benchmark ever, right, which is what I think a lot of people were expecting or were hoping from this model, what it did do in the LM Arena. So I talk about this. The easiest way to think about this is, you know, those like blind, like Pepsi versus Coke tests, right?

Starting point is 00:09:29 From, I don't know when that was, the late 90s, early 2000s, right? Someone goes, there's no label. They drink both and they say, oh, this one's better. Right. That's kind of what Elo rankings are, or the arena score from LM Arena. All right. So what this means is you put in a prompt, you get two different outputs. They're blind.

Starting point is 00:09:52 You know, so you don't know which one's which and you choose which one is better. So this is, at least in terms of how humans actually use a model, right? Yes, we have dozens of benchmarks that test different things from coding to writing to math to science, right? So you have all of these kind of, you know, systemized and organized and categorized benchmarks, but it's always like, well, what about humans, right? Do humans care? Well, do humans notice? Well, with GPT-4.5, the answer is yes, right? Because it quickly shot up to the number one spot on the LM Arena leaderboard. So think of this, you know, every single model out there is in this, right?
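The blind taste test described above gets turned into a leaderboard by a rating system. What follows is a minimal sketch of a classic Elo update, assuming a starting rating of 1000 and a K-factor of 32 (illustrative values, not from the episode). LM Arena's actual scoring method may differ from this textbook version, and the model names and votes here are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, a_won, k=32.0):
    """Return the new (rating_a, rating_b) after one blind head-to-head vote."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


def run_votes(votes, start=1000.0, k=32.0):
    """Apply a sequence of (winner, loser) votes and return the final ratings."""
    ratings = {}
    for winner, loser in votes:
        rating_w = ratings.setdefault(winner, start)
        rating_l = ratings.setdefault(loser, start)
        ratings[winner], ratings[loser] = update(rating_w, rating_l, True, k)
    return ratings


if __name__ == "__main__":
    # Hypothetical votes: each tuple is (preferred answer, other answer).
    votes = [("model_a", "model_b"), ("model_a", "model_b"), ("model_b", "model_a")]
    for name, rating in sorted(run_votes(votes).items(), key=lambda item: -item[1]):
        print(f"{name}: {rating:.1f}")
```

The key property is that an upset (a lower-rated model winning a vote) moves the ratings more than an expected result, so a new model that keeps winning blind votes climbs quickly once it has enough votes.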

Starting point is 00:10:36 When you go into this blind taste test and GPT-4.5, immediately once they got enough votes to rank on the chart, they were number one. So the best model in the world. I do know a couple hours later, you know, Grok's newest version came on. So I think it's technically in a tie now. But still, even without smashing every single benchmark, this new model, GPT-4.5, just elevated itself to, I think, probably the most preferred general use case model in the world, which is extremely important because, like I said, in the future,

Starting point is 00:11:12 these reasoning models are going to be built on top of this. All right. So let's talk a little bit before we jump in live. And live stream audience, I see a couple of you voted, if you could go through. Let me know which one, which one you want to see. I agree with Douglas. Douglas said Jordan needs coffee. That was me sipping on the coffee.

Starting point is 00:11:34 But let's talk about a little bit about the model itself and how it performs inside of ChatGPT, inside of the ChatGPT interface. Also, FYI, let me get this off my chest, right? Because there's people online, you know, and they're like, oh, I tried this model. It's in, you know, and I'm like, oh, okay, how did you try it? Oh, via a third party. You can't do that. All right. So I do have to talk about access because at least as of this hour, GPT-4.5 is only available

Starting point is 00:12:05 for Pro users inside of ChatGPT. All right, that does cost $200 a month. But presumably either by this week or early next week, that will be going to all paid ChatGPT subscribers. So even if you are on the $20 a month ChatGPT Plus, you should be getting access to GPT-4.5. So you might not have access to it now, but I highly encourage you. Yes, there's third-party platforms.

Starting point is 00:12:36 You know, if you're on paid versions of other, you know, Perplexity or Poe or something like that, you can probably go use 4.5 in a limited capacity if you're on a paid plan for one of those services. But that's not the best way to understand a model, right? You should be using it in its natural environment. So, yes, there's also the API that's available. That's extremely expensive. All right. So if you are trying to see what's best for your team, and a lot of times I don't understand why every single big Fortune 500 in America doesn't have at

Starting point is 00:13:06 least a Teams or an Enterprise ChatGPT account. It's mind-boggling to me because, yes, your company can have an internal version, right, that maybe you use for customer support or sales or something like that, but you should all, you know, and I'm not just saying ChatGPT, but every single employee. If you are a CEO of a small to medium-sized company, if you are an IT leader, if you're a CMO, whatever, you need to be pushing for your entire team. Whatever your AI operating system of choice is, you need to also have a full team or enterprise license, whether that's ChatGPT, Gemini, obviously Copilot, right, if you're a Windows organization, Microsoft organization, Claude, whatever it is. Because when you are

Starting point is 00:13:51 using these models inside the chat interface, they come with a lot of tools, right? I did a show probably about a year ago. I should update it, you know, kind of like, hey, what needs to happen, you know, for us to get to something like artificial general intelligence? And one of the things is, number one, a model needs to have access to the internet. And number two, it needs to also have tool use, right? So this tool use, this occurs inside of the ChatGPT interface. And yes, third parties sometimes have versions of some of these tools, but I mean, the tools are best in the native interface.

Starting point is 00:14:27 But right now, at least, not all of the tools and features work with GPT 4.5. So let's go over what does work and what is available versus what isn't. Because OpenAI didn't say this. I went through and tested it all for you so you know. So, again, whether you have a Pro account now or you're going to be getting GPT 4.5 in the coming days or weeks, here's what's available out of the box. Okay, so projects. You can use GPT-4.5 in projects.

Starting point is 00:14:58 You can use DALL-E, you know, the AI image generator, which I don't know why anyone would, right? It's not that good, although it will be getting updated soon. Sora does have photo capabilities for early beta testers, just FYI. And also, if you don't know projects, that's essentially where you can organize chats into one folder, which is great, but you can also upload documents that that folder's chats can access, as well as special instructions. So it's similar to GPTs, a little different.

Starting point is 00:15:26 So GPT-4.5 does have access to projects. It does have access to DALL-E. It does have access to ChatGPT search, which is extremely important because actually the knowledge cutoff for GPT 4.5 was rolled backwards. So its memory is a little worse, or at least the recency in the training data. So GPT-4o is June 2024.

Starting point is 00:15:50 GPT 4.5 is October 2023. So keep that in mind. And that's why it's important that GPT 4.5 has access to ChatGPT search. You can upload files to GPT-4.5, which is a must. So glad that's there. Also, Canvas mode, one of the most underrated, I think, tools or functionality of any, you know, AI, you know, large language model company out there. Canvas is available.

Starting point is 00:16:16 So right now, unavailable, and this is as of the time of, you know, when I checked 19 minutes ago, okay, right now tasks do not work with GPT-4.5, and GPTs, so those custom small versions of ChatGPT that you can create, don't work right now with 4.5. Both of those things, both tasks and GPTs, those obviously still work with GPT-4o. All right. So let me just boil this down to two things. I wish OpenAI would just put these two words somewhere very large on their GPT-4.5 page because a lot of people are asking. Two things that I think are really the biggest difference with 4.5. And this is in my experience so far.

Starting point is 00:17:11 It is more relatable. So more human-esque, right, that EQ. And more reliable. So we went over the reliability a little bit in our first show, going over benchmarks, accuracy, lower hallucinations, etc. It just knows more. It knows way more.

Starting point is 00:17:28 Actually, there was a, you know, there's a website that does, sorry, IQ scores for large language models. And GPT-4.5 actually got the highest score for a non-reasoning model, which is pretty impressive because that was the first time a non-reasoning model performed at the same IQ level as the average human, right, which is pretty big, right? When you have a reasoning model, it does way better because it uses more compute. But the fact that a non-reasoning model in GPT-4.5 scored this high on an offline IQ test, right? So this is an IQ test that is not in training data.

Starting point is 00:18:09 It's pretty impressive. So it is definitely more reliable, but it is also much more relatable because of the emotional intelligence. So this is from OpenAI: more natural, human-like interactions than GPT-4o. It's better at reading and responding to emotional cues, and it is preferred by users as well against GPT-4o. All right. So we're going to jump in.

Starting point is 00:18:33 We're going to jump in soon. All right. I know this is small on the screen. I don't know if these comments posted to YouTube as well, but I'm going to go ahead and try to send them again. So live stream audience, I know a couple of you guys have already voted. I don't know if these comments are coming through. hopefully they are, but I have 13 essentially examples.

Starting point is 00:18:52 All right. And I want to do these live. I want to maybe do two or three. We'll see how long it takes. And I want to show you the difference, the difference between a query in 4.5 and a query in 4o. Full disclosure, haven't run any of these yet, right? I run a ton of tests, but I like to do this live.

Starting point is 00:19:13 This is unedited, unscripted, right? So live stream audience, just put the number. Try not to put anything else. Just put the number. And I'm going to scroll through the comments here on the right side of my screen, you know, bringing in comments from LinkedIn, Twitter, YouTube, et cetera. So which one do you actually want to see? So I haven't done these.

Starting point is 00:19:32 And I'm going to read the prompt out. These are very short prompts, right? They're supposed to be short. I'm not going to go through the whole like Prime Prompt Polish process, which if you want the best output, you should be doing the basics of prompt engineering still. But I want to show you just hopefully some short prompts, the inputs and then the outputs. And we're going to run this in GPT-4.5 and GPT-4o and talk a little bit about the differences.

Starting point is 00:19:59 And hopefully we'll see the difference. Who knows? Maybe we won't. That's the downside of trying to do unscripted, unedited demos and examples inside generative AI. So maybe if you're brand new to ChatGPT, large language models, generative AI, and you don't know a ton, that's fine. We try to keep it simple, but let me say this. Generative AI and large language models, they're generative. They're not deterministic.

Starting point is 00:20:25 So what that means, as an example, if you go search for something on the internet, search engines are, for the most part, deterministic, right? Yes, there's some personalization and some localization, but for the most part, those search results are going to be roughly the same every single time you put them in. A large language model is completely... It is a roll of the dice. It is generative, right?

Starting point is 00:20:46 It is, you know, there's some next token prediction. So, you know, you could in theory put the same prompt 10 times. You could get nine very different answers. You could get two very different answers. You could get five things that are pretty much the same, but just worded differently. So that's another thing to keep in mind. Generative AI is generative, right? Which is why sometimes these live demos are super fun.
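The "roll of the dice" point can be made concrete with a toy next-token sampler. This is a minimal sketch under stated assumptions: the candidate words and their scores are made up for illustration, and real models work over far larger vocabularies with additional sampling controls. It only shows why the same prompt can yield different outputs, and why lowering the temperature makes outputs more repeatable.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng):
    """Draw one token at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical candidates for the next word, with made-up scores.
    tokens = ["nostalgic", "iconic", "retro", "beloved"]
    logits = [2.0, 1.5, 1.2, 0.3]
    rng = random.Random()  # unseeded, so every run can differ
    for temperature in (1.0, 0.01):
        draws = [sample_token(tokens, logits, temperature, rng) for _ in range(10)]
        print(f"temperature={temperature}: {draws}")
```

At temperature 1.0 the ten draws typically mix several words; at a very low temperature the highest-scoring word wins essentially every time, which is as close to deterministic as this sketch gets.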

Starting point is 00:21:09 All right. I see some votes here. All right. I'm looking through here. I'm seeing which are some of the most, some of the most voted ones. So I have 13 different examples on the screen. And I really focused on a couple things.

Starting point is 00:21:24 So these prompts are supposed to, you know, rely on creativity and intuition, you know, so storytelling, being able to think and write clearly, strong in design and creative tasks. But they're also really around these four categories where I think EQ shines in a large language model. Right.

Starting point is 00:21:42 So think, if you are using ChatGPT as a personal or life coach. So some of these prompts are more in line with that. If you're using it as a therapist, or a work therapist even, right, to work through tough problems, tough issues, how do I send an email, right? Those things. Content writer, business strategist, and creative partner.

Starting point is 00:22:04 So that's where I think some of the categories where the everyday person is going to really see the benefits of 4.5. So let me repeat that. If you're using this as a personal life business coach, therapist, content writer, business strategist, and creative partner. Yeah, there's other things that it's actually going to perform really well in. You know, I know a lot of people are saying, oh, 4.5 isn't great at coding. It's actually really good at coding across the board, like across the LM Arena test. It like swept everyone in almost every single category.

Starting point is 00:22:35 So it is measurably better in almost every single category that you would use a large language model for. I think hopefully we'll see the biggest improvements in some of these areas. All right. So let's go ahead. Let's see if we can do this. Live stream audience, as always, please let me know when you can see my screen. We're going to do this live. Podcast audience, I'm going to try not to make this one too long.

Starting point is 00:23:01 I'm going to try to be somewhat concise. All right. So as a reminder, if you have a normal ChatGPT Plus plan and you log on today, you're not going to see this 4.5, all right? But when you do, I'm guessing within a couple of days to a week or two, this should be rolling out to most paid users. So ChatGPT Plus, ChatGPT Team, ChatGPT Edu, as well as Enterprise. I think Enterprise might be a little after for those companies that hire us to train their large teams.

Starting point is 00:23:33 You might not be getting Enterprise in your PPP biz training, at least not in March. All right. So we need to select GPT-4.5 in the drop down. All right. So live stream audience, let me know if you can see this. I'm looking to see which one, some of our most popular ones. Okay.

Starting point is 00:25:12 I see quite a few people. And again, both on LinkedIn and YouTube, it looks like, and a couple on Twitter. Okay, so I'm seeing a lot of threes. I'm seeing some sixes and 13s. Let me go up earlier. Five, 12. Okay. 9, 11, 7, 12. Okay. So I see, I see quite a few 12s. All right. So let's go ahead and grab number 12, shall we? All right. Live stream audience. Can we see? Can we see the screen? All right, let's do number 12. Let's see what number 12 was. Okay, people, people, people were going for humor.

Starting point is 00:25:53 I wrote this one and chuckled. All right. So here's what number 12 was. People wanted to see this. All right. It's saying you work at Microsoft, write a convincing memo on why Microsoft Zune should be reinstated. Y'all remember Microsoft Zune? All right.

Starting point is 00:26:10 So actually what we're going to do is we're going to first do this with 4o. I'm going to read the response. Then we're going to do the same with 4.5. All right. And we'll see how long it is. All right. Now, I'll try to do my best to even describe what's going on. So unfortunately, ChatGPT is being extremely slow right now.

Starting point is 00:26:32 So this is going to be fun. All right. So GPT-4o is giving us a memo. It is emoji heavy, which I'm not a fan of. And also, I intentionally left these prompts kind of open-ended, right? Because I'm doing this for everyday people, right? Everyday people write in a super short prompt and they hope to get a lot out of it. And I think, I think and I hope maybe that's where GPT-4.5 shines.

Starting point is 00:26:55 Okay. So here's what GPT-4o says. Here's a compelling internal memo advocating for the return of Microsoft Zune. I really wanted a Microsoft Zune when it first came out. All right. So like I said, there's a ton of emojis. It gives a date, to Microsoft Leadership Team, from your name, your position, subject, the case for reinstating Zune. So it's pretty long so far.

Starting point is 00:27:22 It's going a little bit slow. So we might have to cut it off here if we want to do two or three quick examples without turning this into an hour show. So it starts with an executive summary. It says the Microsoft Zune, despite its untimely discontinuation, was ahead of its time in fostering a seamless ecosystem of music, community, and content consumption. Given the resurgence of retro tech nostalgia, the rise of dedicated music devices and the renewed focus on hardware-specific integration, Microsoft is uniquely

Starting point is 00:27:54 positioned to reintroduce Zune as a modern AI-powered cloud-connected audio experience. This memo outlines how a Zune revival aligns with current market trends and Microsoft's broader strategy in AI, hardware, and consumer engagement. So so far, executive summary from 4o seems pretty good. A little wordy. I don't know that we need to, you know, say things like this memo outlines, you know, depends on, you know, how you would communicate to your coworkers internally. I wouldn't put something like that in an internal memo saying this memo outlines.

Starting point is 00:28:30 But other than that, pretty good. So let's go through. So here's number one, it says the market opportunity. Number two, it says the vision for Zune 2.0. Again, we're seeing some emojis, which for the most part, I don't think we'd see on an internal memo, but that's fine. So it's going over key features of the Zune reboot, including AI-powered music curation, lossless and spatial audio support, Microsoft 365 and Xbox integration,

Starting point is 00:28:59 cloud sync and offline playback, and Zune Social 2.0. I know I know we have some listeners from Microsoft. So let me know if this is going to hit the shelves anytime soon. Three, it goes over competitive advantage. Four, financial viability and go-to-market strategy. So pretty good. It has a launch plan, a marketing approach. Conclusion, a strategic appeal. All right.

Starting point is 00:29:26 So I'll just read the intro and the conclusion for both. So conclusion, the Zune brand still holds strong emotional value. And with the right vision, it can be reintroduced as a powerful differentiator in Microsoft's consumer hardware and AI portfolio. By integrating AI, superior audio technology, and cloud capabilities, the next generation of Zune could redefine how people explore music and content. Let's bring Zune back, not as a competitor to smartphones, but as the intelligent music companion for the AI-driven era. All right. So not bad. You know, I wouldn't ever, you know, use this necessarily. But, you know, my background, it's in content writing, right? I was a journalist, and, you know, more or less, I've been getting paid to write for two decades.

Starting point is 00:30:15 So I wouldn't use this. I'm sure some people would look at this and be like, oh, good enough. Not necessarily. All right, but let's go ahead and switch model. So yes, if you didn't know, at the bottom of any prompt, there's these little icons, right? So once you do get access to 4.5 and you're trying to see which model should be my daily driver, I encourage you to run tests like this. You can also do this in the back end of ChatGPT's kind of playground.

Starting point is 00:30:41 It will cost you on the API side. So I would probably just wait if you don't want to pay those super expensive prices. All right. So we're going to do the same thing with 4.5 now. So I just clicked 4.5. Interestingly enough, okay, this is great. 4.5 was, I'm guessing, smart enough to know that I would probably want to use Canvas mode. So it's, uh, love this, love this, right?

Starting point is 00:31:14 Uh, super smart, FYI. Uh, so it started answering it in Canvas mode, even though I didn't tell it to. Uh, 4o did not. Because it probably knows, right, as a memo, it probably knows I'm going to want to go back and forth, kind of like how I said, eh, I wouldn't use this. Uh, it, it probably knows that, hey, you're going to want to update this. So let me go ahead and throw this in Canvas, even though you didn't ask me to, right? And then it has the option here to click to answer in the chat instead. All right.

Starting point is 00:31:44 So let's go ahead and take a look. We're going to read both the intro and the summary, much more concise. So personally, I like that. Maybe it's because I use AI too much. And I think, you know, let me just go on a small and short soapbox here. Large language models are trained to be helpful assistants, right? It's in their system prompt. And usually what happens if you don't go through this, you know, what we teach like the refined Q process, right? And that's when you make a large model

Starting point is 00:32:21 smarter, smaller, and more specific for all of you that have taken our PPP course, right? For the most part, a model doesn't necessarily know exactly everything that it would want to know in order to provide an answer. So this is something I've seen from 4.5 and I personally appreciate, but this might be a preference thing. GPT-4.5 is much more concise, right? Where 4o and previous models, they just get super verbose, right? Like, and this is something that large language models do all the time.

Starting point is 00:32:53 If you don't give it a lot of detail, if you don't go through a back-and-forth process of essentially, you know, you having a conversation with, you know, 4-0 or 4-5 about the Zoom, and here's my role, here's my reasoning, here's my rationale. A lot of times it just is going to, spit out a bunch of overly kind of generic content.

Starting point is 00:33:12 And it's going to try to tackle thing from too many different areas, but it's not going to be super strong. So that is something that I've noticed in my, I won't say heavy use of GPT 4 or 5, but I mean, I've used it fairly often. I know it just came out like Thursday, right?

Starting point is 00:33:27 And today's Tuesday. So, you know, a big chunk of that was the weekend. And I was trying to sleep for the first time in like a week. But I've used it very heavily since it came out. And this is something I've noticed. Answers are much more concise, sometimes more confident, which can be a good or a bad thing.

Starting point is 00:33:43 You really just have to have an eye on the output, but it's much shorter, right? Another thing, I see no emojis, which I personally like. All right. So let's go ahead. So here's the internal memo. So it says to senior leadership team from your name, strategic product development, date, subject. All right. So let's at least read the overview, the conclusion, and then talk about what else is in here.

Starting point is 00:34:06 And hey, live stream audience, let me know which one also you think is better. All right. So overview, given current market trends and the growing nostalgia for early digital media devices, I propose that Microsoft strategically reinstates the Zune product line. Originally discontinued in 2012, the Zune has since garnered a devoted fan base and iconic status, becoming emblematic of a simpler digital age. All right. So then it goes over market

Starting point is 00:34:39 opportunity. It outlines strategic benefits. A lot of similar things that we saw from 4-0, right? So brand strengthening, ecosystem expansion, talking about integrating it with Microsoft 365, Xbox Game Pass, Azure Cloud Services, that's cool, differentiation, the implementation approach, going through initial market research and validation, modern product development, marketing and launch strategy as well. And then the conclusion. So, reinstating the Microsoft Zune is a strategic decision with significant potential upsides. It positions Microsoft at the forefront of a profitable retro tech revival, reinvigorates consumer interest, and strengthens brand affinity.

Starting point is 00:35:23 I look forward to discussing this further and obtaining your insights. So I don't know if this, in this case, if there's one, you know, definitive winner and one definitive loser. For me, my personal taste would be the second one, right? For an internal memo, I mean, 4-0's version, super long, right? So if I X out of canvas here, if you didn't know this at the bottom, you'll see now one of two and two of two because I use that little model switching icon. So I can just toggle left and right and compare the different outputs.

Starting point is 00:35:58 So for me, I much prefer the second version. All right. Let's see, it looks like some other people wanted number one. So let's do number one. I think this one might be a good example as well. All right. So we're going to launch a new chat. We're going to start in 4-0.

Starting point is 00:36:17 We're going to do 4-0 first and then we're going to do 4-5 second. All right. So this one, a couple of you voted for, write an email to a colleague who missed a deadline due to a family emergency. All right. So we're doing GPT 4-0 first. So subject line, hope you're doing okay. Hi, colleague's name. I just wanted to check in and see how you're doing.

Starting point is 00:36:42 I heard about the family emergency and I hope everything is okay on your end. If there's anything I can do to help or support you, please don't hesitate to reach out. I understand that the deadline project name was affected and I completely get it. Family comes first. Let me know if you need any assistance in catching up or if we should discuss a plan to move things forward. Take care and looking forward to hearing from you when you are ready. Best, your name. Pretty good. I don't necessarily see, again, I'm not, let me be honest, I'm not the most empathetic person, right? I'm not the, you know, I'll say my IQ is probably higher than my EQ. So I don't

Starting point is 00:37:22 see anything wrong with this email. Maybe it's, I don't know, maybe it's, it's, it's, you know, setting off a bunch of red flags in your mind. All right, let's go ahead. Let's switch over to that GPT 4-5. And let's see if we can spot differences in how 4-5 would handle this. Presumably, it might be, I would hope it would be better, right, since, you know, 4-5 is supposed to be much more relatable to humans. All right. So same thing here. 4-5 launched this in Canvas, which I like, because then you can use the Canvas features,

Starting point is 00:37:55 right? As an example, you can highlight something. You can suggest edits. You can adjust the length. You can change the reading level. You can add polish. You can add emojis, right, which I would never do. But launching it in Canvas,

Starting point is 00:38:07 again, small thing, super smart, but 4-5 assumes, right? There we saw it twice. The model says, oh, okay, well, you're probably going to want to add some of your own insights. There's some placeholders in here, right? If you want to get work done, we're going to launch this thing in Canvas. All right. So subject line, checking in and offering support. Hi, colleague's name.

Starting point is 00:38:28 I wanted to reach out and let you know I've been thinking of you. Okay. I see one difference right away, right? Again, I don't know if this is good or bad, right? But 4-5 is saying like, been thinking of you, all right? I'm trying to jot some notes because I actually can't as easily compare the two in canvas mode. I heard about your family. And I'm truly sorry for, I'm truly sorry you're going through a difficult time.

Starting point is 00:38:58 Okay. Please don't worry about the missed deadline. Okay. Interesting. Don't worry about the missed deadline. Your family and personal well-being are the priority right now. If there's anything I can do to support you during this period or assist with any pending tasks at work, please feel free to let me know.

Starting point is 00:39:20 We're all here to help you manage through this. Take all the time you need and please keep me updated when it's convenient for you. Okay. So actually, now that I'm reading this, I'm seeing some nuances, right? Some small things, right? And again, this is not my area of expertise. I'm not going to be hired as anyone's HR head anytime soon. But I do see a couple of things in 4-5's response, and I'm kind of bolding them on my other screen right now. But let me know, live stream audience, which one was better? And do you see anything that you liked in 4-0 versus 4-5 or 4-5

Starting point is 00:40:05 versus, you know, 4-0? Let me know if one was preferable. All right. So here's a couple of things I picked up on. So let's go back to 4-0. So at least for me, 4-0 starts with saying, right? Again, I didn't think there was anything necessarily wrong with 4-0's response until I read 4-5's response.

Starting point is 00:40:36 And I'm like, wait, okay, some things are better here. So 4-0's response, the first thing it says is, wanted to check in, right? Yes, it says wanted to check in and see how you're doing. But I think even when you read that email, right, if you're in that situation of a family emergency and someone says, wanted to check in, it sounds kind of business, right? It sounds, I guess, a little cold. Whereas 4-5 says in the first sentence, I wanted to reach out

Starting point is 00:41:11 and let you know I've been thinking of you. Okay. There we, I mean, I mean, just that right there, I think you can hopefully see and realize the, the bump in EQ, right? And I think maybe that's where there's also, I don't know, in my mind, I'm also, you know, trying to describe in real time the vibe. The vibe of the 4-0 letter I'm getting now that I'm reading it is sympathetic, you know, with a little bit of like, hey, let's get this project going forward.

Starting point is 00:41:45 Where 4-5, I think, is maybe a little more empathetic and talking about working together to move something forward. That's what I'm getting. 4-5 says, please don't worry about the missed deadline, right? Where 4-0 says, you know, I understand that the deadline was affected, where 4-5 says, please don't worry about the missed deadline. Okay. 4-0, you know, to kind of move the project forward, says, let me know if you need any assistance in catching up or if we should discuss a plan to move things forward. Okay. So again, when I'm reading that by itself, I'm not necessarily like, oh, this is bad. All right. And then

Starting point is 00:42:38 4-5 says, if there's anything I can do to support you during this period or assist with any pending tasks at work, please feel free to let me know. All right. Where even just saying please feel free, versus 4-0 just says let me know, essentially, about these tasks. And then 4-5, again, it looks like showing a little more empathy versus sympathy and maybe prioritizing the family situation, where at least now, as I'm kind of comparing the two, you know, it looks like 4-0 is just like wrapping up some sympathy and like, yo, let's get this project going, right? Which I don't know.

Starting point is 00:43:22 What do you guys think? Denny says, 4-5 sounds like the person really does care, and 4-0 sounds like I need to write this email to show I care. That's a great observation from Denny. Max says either one would work. 4-0 is what I usually would expect from the regular office people. 4-5 is superior EQ and empathetic more than usual office humans. Yeah, that's what I'm saying, right? Like when I first saw 4-0, I'm like, nothing wrong with this, right?

Starting point is 00:43:54 But then when I saw 4-5, all of a sudden, I'm like, oh, okay, yeah, I can see how on the, you know, on the human side, there's maybe some things that could have been improved in this 4-0. Michael said, I would prefer to receive 4-5. I feel like I would write something closer to 4-0. My gosh, Michael, we are the same, right? We are the same. I'm reading these and I'm like, oh, yeah, I like that one. But I totally, I totally would have personally written something like 4-0. Yeah, Sandra's saying 4-5 is more empathetic. Jonathan, what's up, Jonathan, Jonathan says,

Starting point is 00:44:29 4-0 minus the hope you're well seems more HR-friendly, while 4-5 just seems friendly. Yeah, I agree with that. Yeah, I do think maybe, you know,

Starting point is 00:44:41 4-0 is maybe more in line with, you know, maybe HR guidelines. And 4-5 is probably something that is going to resonate with the human receiving it, I think, much more. Cecilia said 4-5 version

Starting point is 00:44:54 drops the need to move the deadline forward and puts it on a pause. 4-0 makes it clear that we will need to move things forward on the specific deadline missed. Yeah. So I guess it ultimately depends on what's the most pressing thing, right? Is the company going to shut down if this deadline isn't met? Right. So great observations. I'd love doing this live, y'all, because you know, you guys spotted a lot of things. Robert from Twitter saying 4-5 has more empathy in it. All right. I think we can do one more. So scrolling through here to see one more we can do that people voted for. Okay, let me bring this in. So a lot of people wanted

Starting point is 00:45:39 number six. So let's try that. All right. So we're going now to a new chat. We're going to start with 4-0. Here's number six. All right. So it says, draft a daily motivational message that encourages personal growth with clear fact-based tips. Not a lot of guidance. We'll see what happens. All right. So first, we're running with 4-0.

Starting point is 00:46:06 Lots of emojis. So if you're personally motivated by emojis, you can already tell, 4-0 is your model, right? That's something I've noticed as well. Not getting a lot of emojis out of 4-5. All right. So we can read this.

Starting point is 00:46:22 So here's 4-0's daily motivational message that encourages personal growth with clear fact-based tips. All right. Live stream audience, let me know after I read both of these, which one gets you pumped up, which one of these is getting you ready to face the day. All right. So daily motivation, small steps, big growth, rocket emoji. All right. Every day is a chance to become 1% better.

Starting point is 00:46:43 Personal growth isn't about massive leaps. It's about consistent small actions that add up over time. Fact-based tip. All right. We're even labeling it here. studies show that setting up specific achievable goals increases your chances of success by up to 42%. Harvard Business Review. I'm obviously going to Google that to see if that's real.

Starting point is 00:47:04 All right. And it is. All right, good. So at least 4-0 did not hallucinate that part. All right. So studies show that setting specific achievable goals increases your chances of success by up to 42%. Instead of vague intentions like get healthier. Try walk 10,000 steps daily or read 10 pages before bed.

Starting point is 00:47:29 It's actually a great tip. Actionable step. Pick one small habit to improve today. Whether it's drinking more water, writing down your thoughts, or practicing deep breathing for focus, small wins lead to big transformations. Keep moving forward. Your future self will thank you. Emojis. Hashtags, right?

Starting point is 00:47:49 Sorry, I hate reading this. This is everything that's on social media right now, right? I want to vomit as a content writer. I hate this. I hate anything one-shotted from any large language model because most of it always reads like this, right? You can get writing, ChatGPT writing or AI writing to sound extremely human, but you're, I don't care, right? Oh, people are like, oh, Claude 3.5 or Claude 3-7 Sonnet sounds so human. GPT 4-0, GPT 4-5 sounds so human. No, for the most part. No, it sounds like this. This is what I read on social media on LinkedIn, on Twitter, and I want to vomit in my mouth.

Starting point is 00:48:28 There's nothing wrong with the actual tips. I just, the content writing is getting me woozy, like Woozy Rogers. All right, let's go 4-5. So, again, draft a daily motivational message that encourages personal growth with clear fact-based tips. All right. Interestingly enough, I already see one emoji in 4-5. I think 4-0 had like, let's see how many 4-0 had. One, technically, two, three, four, five.

Starting point is 00:48:59 All right, so we had four emojis and two hashtags in 4-0, and then in 4-5, we had two emojis, no hashtags, so a little better. All right, ready. Here's 4-5. Today's motivation, embrace your growth. Growth doesn't happen overnight. It comes from small, consistent actions taken each day. Here's your simple, fact-based guide to grow personally and professionally

Starting point is 00:49:25 today. Number one, learn something new, even just 10 minutes. Research shows daily learning builds neural connections, improving memory and cognitive skills. Two, practice gratitude. Writing down three things you're grateful for daily has been scientifically proven to enhance overall happiness and reduce stress. Number three, step slightly outside your comfort zone. Small challenges create confidence through psychological resilience, enabling you to face bigger challenges more effectively. Take a small step forward today. You have everything it takes to grow, learn, and succeed.

Starting point is 00:50:04 All right. Which one, livestream audience, which one has you pumped up and ready to run through a wall like the Kool-Aid man? Oh, yeah. Do any of these? GPT 4-0, GPT 4-5. Let's talk quickly about some of the differences. Surprisingly, on this example, 4-0 was a little more concise. It was a little too heavy on the emojis, a little too heavy on the hashtags.

Starting point is 00:50:31 Not a huge fan of this. One other thing is even if we're just looking at the quality of the content writing, I think 4-0 was a little poor. There is no cadence or switch up in the sentence structure. Yeah, I'm getting a little, you know, putting on my old writing hat, right? You always want cadence in your written content. What that means, I try to throw cadence in my podcast, right? I don't just always speak monotone. I don't always go in, you know, sentences that are, you know, 15 to 20 words. I try to pause. Sometimes I talk slowly. Sometimes I talk really fast and I have these long sentences that go together and there's no period. There's no punctuation and I talk all excitedly. That's cadence, right? So, 4-0 has no cadence. It actually falls

Starting point is 00:51:21 into this compound sentence, right? So, yeah, we're talking about content writing now, but that's something that I think is significantly improved in 4-5. 4-0's, you know, I know you're maybe not, you know, if you're listening on the podcast, maybe this doesn't worry or matter as much. But 4-0 is kind of the equivalent of watching paint dry when it comes to content structure. Yeah, I was a journalist. I wrote a lot. For the most part, most of these sentences are, it looks, between 12 to 20

Starting point is 00:51:56 words and the majority of them are compound sentences with an em dash. All right. So, yeah, all those people are like, oh, you know, an em dash is definitely, you know, a sign of AI writing. Not really, right? I was using em dashes back when I was a journalist at the Freeport Journal Standard in 2002 or 2003 or whatever, right? Love em dashes, love compound sentences, but huge over-reliance on them here from GPT 4-0. So let's see. One, two, three. So out of like the six sentences, three of them are compound sentences with em dashes. Not good. We only have, let's see, we have zero sentences that I would consider short, which is five words or less. All right. So if we look at GPT 4-5, we only have one compound

Starting point is 00:52:53 sentence with an em dash. So that's better. Okay. We do at least have one. We do at least have one short sentence. All right. So a little better, a little better in terms of content structure, you know, some cadence, some variance,

Starting point is 00:53:09 but still nothing great if we're just looking at content, right? I know this is more about the motivational message, but I did want to take a second to look at even just how the content is produced because I think that is another small detail that 4-5 actually has better.
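
The cadence check described here (sentence lengths, short sentences of five words or less, sentences with an em dash) could be roughly sketched in code. This is only an illustrative sketch, not something shown on the stream: the function names, the naive sentence splitting, and the sample text are all assumptions made up for the example.

```python
import re

EM_DASH = "\u2014"  # the em dash character, written as an escape


def split_sentences(text):
    # Naive split (an assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def cadence_stats(text, short_max_words=5):
    # "Short" follows the five-words-or-less rule of thumb from the episode.
    sentences = split_sentences(text)
    lengths = [len(s.split()) for s in sentences]
    return {
        "sentences": len(sentences),
        "word_counts": lengths,
        "short": sum(1 for n in lengths if n <= short_max_words),
        "with_em_dash": sum(1 for s in sentences if EM_DASH in s),
    }


# Hypothetical sample text, invented for this sketch (not model output).
sample = (
    "Growth takes time. "
    "It comes from small steps\u2014the kind you repeat every day. "
    "Pick one habit and keep going until it sticks."
)
stats = cadence_stats(sample)
print(stats)
```

A count like this only flags patterns such as uniform sentence length or heavy em dash use; whether the writing actually reads well is still a judgment call.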

Starting point is 00:53:25 So yeah, less, you know, like, oh, people are always like, oh, content, right? You can't technically tell, although there's a lot of telltale signs, right? Heavy emojis, double emojis and headlines, you know, random hashtags, you know, like I said,

Starting point is 00:53:40 an overreliance or a heavy percentage of sentences that are compound sentences separated by an em dash. So overall, the content writing, I think, is much, much better on 4-5. All right. So what do you all think as we wrap up here? But like I said, these are the areas. And I think you saw it in probably that middle example, probably the best, the email example, how really we're having a little bit of EQ, some emotional intelligence and being relatable as a human, right? A lot of you said the same thing. I said the same thing as well. I'm like, I want to receive that second email that we talked about, the one that was from 4-5. It just felt more human.

Starting point is 00:54:28 It was probably more human than something I would have written, right? Which is pretty impressive, right? It is pretty impressive. And I think that's one of the reasons why this new model, GPT 4-5, when it comes to humans' preferences, right? Yeah, you know, 4-5 didn't crush every single LLM benchmark. It improved on almost all of the benchmarks from 4-0 to 4-5. But, you know, people were like, oh, OpenAI has hit a wall. OpenAI is going to go bankrupt.

Starting point is 00:55:01 OpenAI is garbage. It didn't, you know, break every single benchmark out there, right? I don't think most companies, we saw the same thing with Claude 3-7 Anthropic. Saw it, Sonnet 3-7 from, from Claude. It didn't break every single benchmark out there. It really excelled and just widened their lead in anything software development, anything on the dev side, right? But I think now we're going to see companies probably more focused on something like

Starting point is 00:55:31 Elo scores, right? On the chatbot arena, right? And they're like, yeah, we hope our actual benchmark, you know, our STEM, our math, our reasoning, right? All these kind of like, quote unquote, more scientific, research-based, category-based benchmarks improve. But I think ultimately we're past that. I think we're past that.

Starting point is 00:55:52 And this is indicative. The fact that GPT-4-5 did not crush every single benchmark on paper that people said, oh, these are important. But at the same time, instantly shot up to the number one model in the world preferred by humans. That says something, right? There is a human side to large language models that I think for the most part, you know, that we ignored before 2023, right? Everything was about overfitting models to hit certain benchmarks.

Starting point is 00:56:22 And I think over time we saw, okay, that's great for benchmarks, but it's not benchmarks using these models. It's humans. It's humans trying to solve real problems. It's humans trying to sell things to other humans, trying to improve customer relationships, trying to increase accuracy and reliability, which are all things I think GPT 4-5 does a great job of. So before you listen to that random influencer online that is just spitting out these benchmarks and is like, oh, OpenAI has hit a wall.

Starting point is 00:56:53 I'd say the exact opposite. I'd say the exact opposite. I'd say, if we're being honest, right, a lot of the things that we do on a day-to-day basis are creating communication for other humans. And as someone that's, y'all, I've won national writing awards. I've done okay. I was a Pulitzer fellow. Some of those emails were better than I would have written

Starting point is 00:57:20 if I had to write some of those emails, because it's thinking about the human. It is trying to be more relatable. It is really flexing its EQ skills, which I think is ushering in a new era, not just of how large language models are built, but how they ultimately should and could be used to strengthen relationships and connections between humans, while also still, you know, hopefully excelling in all those benchmarks. But in the end, that's what it's all about. All right.

Starting point is 00:57:54 I hope this one was helpful, y'all. If it was, please go to our website. Go to youreverydayai.com. Sign up for that free daily newsletter. Also, share this, right? I know a lot of people tell me, oh, Jordan, I'm not going to tell anyone about this, right? Everyone at my company thinks I'm a genius, right? I've gotten so many, so many messages.

Starting point is 00:58:18 I love these. Reach out if, you know, if you have a story like this. I always love hearing it. It makes the long nights and early mornings really worth it. I love hearing from people that are like, hey, I just got a job. My first job in AI, thanks to, thanks to, you know, your podcast, thanks to these guests you bring on, right? And people tell me, like, I'm not telling anyone about this. This is my cheat code. This is my secret. Share it, please. People are always like, how can I help? How are you making all this information free? It's because of those of you that actually do share this. So if you're listening on the podcast, thank you. I appreciate it.

Starting point is 00:58:50 Please subscribe. Please leave us a rating. That would be great on the podcast. And also go to youreverydayai.com. Sign up for the free daily newsletter. Read the daily newsletter as well. Each and every day, we break down exclusive insights that you didn't hear from the podcast. We're going to take this a step further as well as keeping you up to date with everything

Starting point is 00:59:08 else you need to know in AI. So thank you for tuning in. Hope to see you back tomorrow and every day for more Everyday AI. Thanks, y'all.

Starting point is 00:59:36 And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit youreverydayai.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
