Everyday AI Podcast – An AI and ChatGPT Podcast - EP 318: GPT-4o Mini: What you need to know and what no one’s talking about

Starting point is 00:00:00 This is the Everyday AI Show, the Everyday Podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. Meet Firefly AI Assistant, now live and Adobe Firefly, the All In One Creative AI Studio. Just describe what you want to create and the assistant handles the rest, orchestrating multi-step workflows across Photoshop, Premiere Express, and more in one conversational interface. You direct the outcome. The assistant accelerates execution. Chat GPT maker OpenAI just released a new model.

Starting point is 00:00:51 And no, it's not GPT5. It's actually called GPT40 mini. But don't think small because I think this is actually going to make a huge difference. Not just on the surface because it looks like just a light version of Open AI's latest model. It's more than that. There's a lot more beneath the surface from a competitive standpoint. And I think this is going to greatly impact how we work, not just in a many way. So we're going to be going over that, doing some live testing, and a lot more here on everyday AI.

Starting point is 00:01:30 What's going on, y'all? My name's Jordan Wilson. And everyday AI, it's for you. It's a daily live stream podcast and free daily newsletter, helping us all learn and leverage generative AI. And this is whether you know it or not, you are going to be leveraging GPT40 mini a lot in the very near future, even if you don't know it. All right. So if you haven't already, this is one of those. You've got to make sure you go to your everyday AI.com.

Starting point is 00:01:57 Sign up for the free daily newsletter. Check out your show notes. If you're listening on the podcast, we always keep all that information in there to quickly get the newsletter that goes along with today's episode, as well as if you want to reach out, connect with me on LinkedIn. an email, whatever you want to do. For our live dream audience, thank you for tuning in. This is technically kind of pre-recorded just by a couple hours. I'm going to be on an airplane at our normal time.

Starting point is 00:02:20 So I hope you can bear with me. But I'll be in the comments after. Tell me what you think of this new model. And for that reason, make sure if you're looking for your daily dose of AI news, make sure to check out the newsletter. Also, before we get started, yeah, one quick fun announcement. Tuesday, this coming up Tuesday, a couple days, mark your calendars. That's all I'm going to say, a $1,000 live challenge.

Starting point is 00:02:47 If you can get every single answer right in real time, there's going to be a $1,000 prize, whether that's one person, if three people split it, they split that $1,000 prize. If you can get every single question right on chat, GPT and other large language models, yeah, yeah, we're going to do it. We're launching a little campaign called Thanks a Million. So this is just a little fun event to celebrate that launch. So make sure Tuesday, July 23rd, if you're normally a podcast listener, Tuesday, July 23rd, 730 AM Central Standard Time. We'll have the link in the show notes as well.

Starting point is 00:03:22 Enough with that. Let's get into the new model. So Open AI just hours ago released their newest model called GPT40 Mini. So this is technically a quote unquote mini or lightweight, version of their most powerful. And I would say, well, not me. All tests, all benchmarks say the world's most powerful large language model right now is GPT4.

Starting point is 00:03:49 Oh, so this is the light version, right? The feather weight, right? The cheaper and more flexible and faster version of the world's most powerful model. So there are so many business use cases and so many ways that we're going to be using this in the near future. All right. So we're going to be going over all of that. But also, another shout out to the newsletter.

Starting point is 00:04:11 This didn't come as a surprise to me or if you listen to this podcast and read our newsletter every day. I literally told you Monday about this model. GPT Mini, got to be coming soon. So make sure you read the newsletter. A lot of times, y'all, I'm a former journalist. I spend so much time researching each and every episode, even just this episode right now, spent about two and a half, three hours reading everything, testing the model out to bring

Starting point is 00:04:37 you the latest, the greatest, and the truth. The realest thing in artificial intelligence, I'd say. You know, this show's unedited, unscripted. All right. So let's talk about first big picture. Let's zoom out and talk about what this means. Well, there's a lot of business use case, which we're going to get into, but I even want to talk about in the competitive landscape, right? Because things have changed a lot over the last couple of months. I would say notably Anthropic has been making some power moves. So Anthropic just released their Claude 3.5 sonnet. So Claude has three flavors, the smallest and fastest and cheapest haiku, the middle version sonnet and the most powerful version, Opus. Well, that was until they

Starting point is 00:05:18 released 3.5 middle model saw it. So they don't have 3.5 for the small one and they don't have 3.5 for the big one. And we gave you a lot of in-depth insights in our newsletter on why. That's why you got to read it. All right. So anyways, also, we saw Google just also recently come out with their 1.5 flash. So again, a small version. So, and we've been talking about this on the show as well. I've been saying for a long time, the future of large language models is smaller models.

Starting point is 00:05:48 And it seems like everyone else has been playing there, right? Anthropic has been playing there, right? With haiku, specifically, that's their small, fast model. They just announced last week that you are also now able to fine tune that model on AWS, Amazon Web Services, right? So a lot of flexibility being able to fine tune a model like that. We're going to go into more what that means. Don't worry.

Starting point is 00:06:13 But also the same thing with Google 1.5 Flash just announced, I believe, early, I think it was late June, early July. So very recently in the last couple of weeks, everyone else has been making this shift toward small models. Open AI hasn't until today. So people are only talking about the mini part, right? They're only talking about, oh, this is just a smaller version. This is a crappier version of GPT4O, which a lot of people, I don't understand it, right?

Starting point is 00:06:47 If you're a model geek like me, you might appreciate this, everyone else you might not. But everyone's making these complaints. Oh, GPT4O is worse than the previous version, GPT4 Turbo. No, it's not. people, I think people have this unrealistic expectation, right? Like when you get something set up for like GPT4 turbo and then you try to use it exactly as is in GPT40 and it might not work as well. Well, it's for a reason, right?

Starting point is 00:07:11 You have to re-engineer whatever you built, right? You can't argue. I don't understand people that argue with math and science and benchmarks. GPT-40 is the most capable model by far. It's not even close. Look at the benchmarks, right? All right. So anyways, big.

Starting point is 00:07:28 competition here. And there's a lot of kind of upward mobility for these other companies, right, especially Anthropic, when they are going to release a 3-5 haiku and a 3-5 opus. I mean, it's going to go wild. All right. So I'm going to try to keep my dorkiness at bay here, but this is why this is ultimately important. All right. So I have a screenshot for our podcast audience. So shout out Tom Keldinick on the Twitter. machine here. He kind of broke all this down for us to look at the pricing because this is ultimately what it comes down to. This isn't just a small version, a faster version of GPT40. It is a cost efficient and very capable version. So up until recently, it was Claude Haiku that had the

Starting point is 00:08:25 kind of most affordable in terms of power. All right. So there's a quadrant that we will show. I should have thrown it up here for the live stream. I can probably find it, you know, if I can multitask. Okay. But, you know, up until recently, I would say it was haiku and probably Gemini Flash that had the right kind of balance of cost and output quality, right?

Starting point is 00:08:53 So when you were looking at a small model, that's ultimately. All these businesses, thousands, tens of thousands of businesses are using Open AIs API. Should have started by explaining this, right? So many of the systems, of the software, of the tools that you use on a daily basis, you might not even know it because it's happening behind the scenes. The future of just about everything is generative AI, is large language models. And so many of these companies that you use softwares that you subscribe to, to things that social media, everything, right?

Starting point is 00:09:28 They're running behind the scenes on large language models. And what that means in most cases is they're using an API from a company like Anthropic Claude, like Google Gemini, like OpenAI. So that's why this cost and output balance is so important because it impacts our daily lives. And we don't even understand it. So before today, you know, if we look at input. and outputs.

Starting point is 00:09:56 Okay. So I'm not going to go into tokenization, but think of tokens kind of like words. They're parts of words, parts of phrases, right? We've gone through, I've had a whole episode on tokenization, so I'm not going to take up your time. All right. So this is the price per million tokens. All right.

Starting point is 00:10:13 So, uh, 25 cents input for Claude Haiku and a dollar 25 output. Comparatively now, we have GPT4-0 mini. 15 cents on the input versus Claude Haiku's 25 cents cheaper, all right, marginally. And then we have the output, 60 cents for a million tokens versus a $1.25 haiku. And for comparison, GPT4O, the big boy model, right? Let's do that. $5 per million input. So again, $5 versus 15 cents.

Starting point is 00:10:55 And then on the output, $15 for GPT40 versus 60 cents for GPT40 mini. All right. So now you can probably, hopefully understand why that balance is so important, right? The balance of power and affordability. And, you know, it's, we're going to show you some, some examples here of why I think this is striking the right balance, right? All right. So let's keep this thing going.

Starting point is 00:11:31 And let's go over some of the basics here. All right. So some of the basics of the model, it's more cost effective. We're going to talk about benchmarks, but it got a very impressive 82% MMLU score. So again, this classification of models, right? We're not comparing it to Opus. We're not comparing it to, you know, the most powerful models out there. These are the small, large models, right?

Starting point is 00:12:02 I don't know if these are technically considered small language models. I think for that we'd be looking at, you know, Google Gemini Nano and some of the mistral models, right? So these are technically, think of them as small, large models. I know that's a little confusing. We also don't know how many parameters it is. However, it scored an 82% on the MMLU score. So we're going to talk about a little bit more on those benchmarks here a little bit.

Starting point is 00:12:25 But that is an extremely impressive score for a smaller model. Also, on some instances on the new leaderboards, right? So we talk about the chatbot arena leaderboard all the time. It is already out benchmarking GPT4 Turbo, the previous version before GPT4. That is extremely impressive, especially when you look at the cost differentiation. That's what I'm saying. It's the cost. It's the cost.

Starting point is 00:12:57 And what that means, well, you know, the output too, right? You have to have that cost output sweet spot. But it is the cost in the output. That is going to really change things. All right. A couple other things we talked about 15 cents per million input, 60 cents per million output. And it is more than 60% cheaper than GPT 3.5. turbo. Also, this is replacing 3.5 turbo. All right. That's also important to keep in mind because so many

Starting point is 00:13:28 applications, developers, companies started to build a backbone for their future business off of GPT35 turbo, right? So this is another important thing to keep in mind here. All right. So now I'm going to go ahead. I kind of talked about this sweet spot, right? So I'm going to go ahead now for our live stream audience. I'm going to go ahead and share my screen. I'll try for our podcast audience to describe this the best I can. Okay. So this is essentially, you know, you have your X access, your Y access, a little quadrant, right? So price, the more expensive is on the right.

Starting point is 00:14:08 And then quality is down. So you don't want to be right and down. That means you're expensive and not good quality. So right now you kind of have GPT 35 turbo there. You have Command R, some other models, okay? But you want to be in the upper left-hand quadrants. Okay, that means you have the highest quality at the cheapest price. So is GPT40 Mini the absolute cheapest when it comes to, you know, price per million input and output?

Starting point is 00:14:41 No, Mistral and Lama are a little bit cheaper. But the quality of those is my. much, much lower. I mean, an MMLU score, I don't think people understand. An MMLU of 82 is wild for this price, wild, right? And I'm getting excited about it because I read these, like, scientific papers, like, on the weekend. I'm a dork.

Starting point is 00:15:05 And I've been reading them for years, right? So even getting something in the 80s, like a year or two ago was mind-blowing, but expensive. The fact that it is this cheap, if you are a business leader, if you are looking to figure out what is your future in a generative AI world, this changes it. This changes it. It's not even close. All right. So that's kind of an overview of the model.

Starting point is 00:15:31 And again, we talk about this significantly decreasing costs. All right. So this is a little graph here tweeted out by the Open AI developer account. And it just shows a downward step in the price over time, right? The fact that in March of 2023, so about what, 15 months ago, it was $2 for a million tokens. Now it's 24 cents. That's a blend. That's a blend right there.

Starting point is 00:16:10 Just FYI. That's a blend of input output. going from $2 per million tokens to 24 cents, right? It's a 90% reduction in the quality is astronomically better. Things are getting cheaper and they are getting better, but faster than anyone ever could have predicted, right? The very famous interview with Bill Gates and Sam Altman on Bill Gates' podcast, that was one of the biggest takeaways was they were both flabbergasted by how fast the development has come and how much cheaper essentially all these models are because it is cheaper to train them as well, right? That's another piece of the puzzle. When we talk about chip makers, GPU makers like Nvidia, they're becoming more and more efficient.

Starting point is 00:17:08 in turn more and more affordable, which drives everyone's costs down. All right. Let's keep this thing going. Let's talk about some capabilities and features of the new GPT40 model. All right. So it does support text and vision in the API. It's also important to note. This is accessible on the front end of chat GPT.

Starting point is 00:17:30 I'm going to show you all of that here in a minute. So this is accessible via the front end of chat GPT. If you log into your chat GPT account, you can use it. I don't think there's a lot of use cases if you're using it on the front end. There's a couple that we'll talk about. But for the most part, if you're just logging into chat, GPT and using this as a front end user, you're probably not going to be using this new model a lot. However, if you are a developer, if you're working on the back end, GPT4-0 Mini is your new best friend.

Starting point is 00:18:01 It is amazing. It is going to change the way that you can do business, right? because essentially, and this is just an explanation, if you are a beginner here like myself, I'm kind of a beginner, but I'm kind of not. But for the everyday person, like, why is Jordan talking so much about the cost and the training and all of this? Well, again, like I said, this is going to change how all, almost all companies online, you know, everyone's AI enabled, right?

Starting point is 00:18:31 Go on anything, everything, you know, on the table. TV, everything's AI, AI, AI. Guess what? They're all going to be paying significantly less now, which ups the competition in the marketplace, but also in theory for consumers, we can start demanding lower prices or there will be competitors that come up that can provide better quality at lower prices, right? If the quote unquote power or if the cost of electricity goes down by 90% and you're in the business of using electricity, guess what?

Starting point is 00:19:02 That changes your business. All right. So it does support text and vision in the API and in the chat interface. So it features support for text images and coming soon, video, audio inputs and outputs. More on that in a bit. Context window in the API, not in the chat interface is 128,000 tokens. All right. It supports up to 16K output tokens per request.

Starting point is 00:19:29 And same knowledge base, same knowledge cutoff is October 2023. as well as an improved tokenizer for non-English text. In other words, it converts the inputs for non-English better to tokens. That's the thing a lot of people don't understand when we talk about tokens. Go listen to that episode. But essentially, chat GPT doesn't know words, both when you give it words and it gives it back. It converts everything to tokens, to actually understand what you're talking about, right? So the tokenization in other languages is very important.

Starting point is 00:20:04 All right, last but not least here in our bullet points, benchmarks and performance. So let's just go straight into the chart for this one. Adobe just introduced an entirely new way to create, bringing the power and precision of its creative suite into one conversational experience. Meet Firefly AI Assistant, now live in the Adobe Firefly app, the all-in-one creative AI studio. Powered by Adobe's Creative Agent, Firefly AI Assistant lets you start with your vision, just describe what you want and shape the outcome as it takes form with the assistant.

Starting point is 00:20:43 The assistant orchestrates multi-step workflows, drawing on 60 plus pro-grade tools across Adobe Creative Cloud apps, including Photoshop, Illustrator, Premiere, Lightroom Express, and more to help bring your ideas to life. You can also get started with creative skills, a growing library of pre-built workflows for common creative tasks, like batch editing photos, creating mood boards, portrait retouching and creating social variations. Every step the assistant takes is visible so you can refine, redirect, or take over at any time. You stay in the driver's seat as the creative director. Adobe Firefly AI assistant now in public beta.

Starting point is 00:21:22 See it today at firefly.adobie.com. If you're on the podcast, the benchmarks are silly. They're silly. Like never, even if you follow large language models, did someone a year ago, two years ago could look at this chart and think that this would ever be possible at this price. Okay. GPT4-0 Mini is outperforming all other small, quote-unquote, small models, except for the Math Vista. All right.

Starting point is 00:21:59 So essentially we have, what, eight different, yeah, we have eight different benchmarks. So from the MMLU to the drop to the MGSM, math, human events. MMMU, right? So all these different benchmarks that all researchers put models through, you get a score. That tells you, is a model good or is it not? Is it a bunch of marketing or does it actually have the chops? This is the thing I don't understand y'all when people are like, oh, GPT4O is so much worse compared to GPT4 Turbo.

Starting point is 00:22:29 No, it's not. You just have to relook at what you're doing, right? Large language models are generative. That's the thing people don't understand. Just because you had something working in a previous model, doesn't. mean it's going to work in the next model. Get to get to know it. You might have to re-engineer some things. Are some things worse? Absolutely. Across the board, no, you can't argue with more than a million votes in the chatbot arena leaderboard. All right. So some of these worth noting,

Starting point is 00:22:58 some of these metrics, I just want to point out a couple. So MMLU, that is the massive multi-task language understanding test. So that is, I would say, still the gold standard. I think we're probably going to be moving to the MMMU as the gold standard because that's multimodal. But the thing of this is text-based, right? MMLU is essentially the ACTs for large language models. It is the gold standard. What is your score, right? And it measures the model's knowledge across 57 different subjects, including STEM,

Starting point is 00:23:29 humanity, social sciences, everything, right? Y'all, this blew everything out of the water. It blew out Gemini Flash, Claude Haiku, Claude Haiku, GPD35 Turbo. It is not close, right? You might look at it on this chart and be like, oh, it's only a three point difference behind Gemini Flash and about a, what, a seven point difference behind Claude Haiku. That in MMLU, that is a lifetime away. When we look at the biggest models, you know, they're usually within like a half point, half point, maybe even a point one. I believe Opus, you know, Cloud 3 Opus when it first came out was like 0.1 higher than

Starting point is 00:24:14 GPD 4 turbo, right? Usually it's like a half point. To be three to seven points ahead of your nearest small model competitors means that this is freakishly good. I cannot under understate that how much that little seemingly small margin means in the MMLU. Same thing. in math purpose blows everyone out of the water. We're talking like 30, almost 30 points ahead of the others.

Starting point is 00:24:45 Wow. So math purpose is essentially complex math topics such as calculus, linear algebra, higher level mathematics. So it is freaking really good at math, all right? And that's important for data analysis. Again, we have to think, not just what you are going in to do inside of chat, GBT. That's not what this is about.

Starting point is 00:25:06 where this is going to pay dividends and change the way we all work is through the API, is through those hundreds or hundreds of thousands, right, of different companies, startups, products that you use. You know, you're probably playing with a bunch of AI tools all the time. I have dozens of AI tools that I use on a week-to-week basis. Mostly all of them use GPT. They use the GPT for from OpenAI, right? So this is going to change and improve the quality.

Starting point is 00:25:36 of all these other tools that you use, especially if you're someone in marketing, advertising, communications, or if you're just someone who's constantly playing with AI tools in large language models. This changes everything. All right. All right.

Starting point is 00:25:50 Now we're going to have some fun, y'all. Let's go ahead and do a couple of things live. All right. So a while back, let's see if we can get a good view here for our live audience. and hey, podcast crew, I'm going to do my best to describe what we have going on here. Pretty simple stuff.

Starting point is 00:26:15 All right. So about a month or so ago when Claude 3 Opus or sorry, Claude 3-5 Sonnet came out, we did some comparisons. We just did some live prompts, and we compared the output. Some of them were deterministic, right? There's a right answer. There is a wrong answer. Some of them required us to make a call. All right.

Starting point is 00:26:40 So now what I'm doing, I am in the Open AI playground. Okay. So what that means is kind of the back end. If you build something, you know, all developers who are using the API, this is where they go. But I'll tell you this, you don't have to be a super technical person. You can get a free account. You can go in there and just play in the playground.

Starting point is 00:27:00 It's a great way to learn. If you want to become better at prompt engineering, if you want to become better at understanding models, do this inside of the, playground. Anyways, so I'm comparing two different models side by side at the same time. So we're going to be looking at, we're going to be asking some questions that have a simple yes and no answer.

Starting point is 00:27:18 Is it right or is it wrong? And then we're probably going to be doing some that require a little bit of, let's use our brains. Okay. So again, I did this previously between Claude 3-5 Opus and GPT-40. So now we're going GPT-40 on the left, GPT-40 mini on the right, doing the same prompts, we're going to be looking at speed. We're going to be looking at quality, and we're going to be looking at, in some cases, a little bit of reasoning and logic. All right. So the first one, again, if you've tuned into the show, we've done these a couple of times. We're not changing the system

Starting point is 00:27:51 prompts. We're leaving everything as is. So out of the box, here we go. The first one, I just woke up today with six apples and three bananas. Yesterday, I ate an apple and, or sorry, Yesterday I ate a banana and two apples. This morning, I will eat one apple and no bananas. However, I don't really like apples and one banana may turn brown tomorrow. Assuming nothing else changes, how many apples and bananas will I have tonight? All right, we're going to hit Ron, and we're going to see who finishes first, and do they both get it right? Wow.

Starting point is 00:28:28 Okay. So, GPT40 Mini was faster there. Pretty impressive. All right, and let's see if they got it. who got it right and who got it wrong. Interestingly enough, GPT4O got it right. Oh, mini, got it wrong. There we go.

Starting point is 00:28:47 We already have one use case. This is one. I've seen similar ones like this. I kind of made it up. You know, or modified something similar that I had already seen. So it looks like, let's see where GPT4O mini kind of got tripped up.

Starting point is 00:29:02 So it got tripped up on something that I put in there to trip models up. I test models all the time. Yeah, I told you guys, I'm a geek. All right, so I put something in there about eating, eating the fruit yesterday. But guess what? I started it off by saying, I just woke up today with six apples and three bananas. So whatever I talk about yesterday is irrelevant. So GPT40, many did not understand that.

Starting point is 00:29:27 GPT40 got it right. The correct answer is five apples and three bananas. And GPT40, Minnie said three apples and two bananas. So it got tripped up by yesterday. Hey, if I give you today, yesterday doesn't matter. All right. Let's do another one. Kind of, kind of a tricky one.

Starting point is 00:29:47 We're going to, I think for the most part, I'm always going to refresh. It shouldn't, it should. Well, actually, I'm not. It's going to take too long because we're not going to be worried about the context here. All right. The next one here, I'm saying if it takes three hours to dry 10 t-shirts in the sun, how long will it take? to dry 30 t-shirts in the sun.

Starting point is 00:30:09 All right, so let's go ahead, click run. This should be a quick one. All right, so this one, GPT-40 was faster. Oh, mini was slower. So the opposite, last time, O mini was faster. All right, the correct answer should be three hours. So let's see, GPT-40, got it right. It said it will also take three hours.

Starting point is 00:30:35 Let's look at a 4-0 mini. Let's see. and it says it will still take three hours. Okay. So our score, our score so far for those keeping track. So we're going to do fast and we're going to do right and wrong. All right. So so far, we have, it's tied and fast, but in the end, fast doesn't matter if you're wrong, right?

Starting point is 00:31:00 So one we have many and one we have O and then right. So right now we have, we're going to do this. We're going to do right. Sorry, y'all. I want to make sure for our podcast audience, we can recap this at the end. All right. So, so far we have, we have, oh, right.

Starting point is 00:31:20 Sorry, I'm making a little chart here. Oh, right, got two. And then we have many, right. So far, just one. All right. Next prompt here. This is pretty, well, this one is, it takes a little logic here.

Starting point is 00:31:32 All right. So, here we go. And what I do love is, at the bottom, you can see the latency or how long it took and also the tokenization. All right. So the next prompt here. A box is locked with a three-digit numerical code. All we know is that all digits are different.

Starting point is 00:31:52 The sum of all digits is nine and the digit in the middle is the highest. What is the code? So I've even gotten tripped up on this and models have gotten tripped up. But before I hit enter, there's actually multiple answers. One time I got confused and a model said this is the, only correct answer. And I'm like, huh, yeah, that's right. And no, there's actually multiple answers.

Starting point is 00:32:12 So let's go ahead and run it and see how it goes. All right. So both are going pretty fast, still computing. All right. So they finished just about at the same time. It looks like Mini was just a little bit faster, but let's see who got it right. So I'm looking here at GPT40. It's walking me through things.

Starting point is 00:32:40 It says the digits are different. The sum of the digits is nine. The middle digit is the highest. So it's doing a little bit of a formula. And let's see. So it's going through different cases, case one, case two. So it says, I don't even know if either of these finished generating, they may have stopped. It looks like they both stopped halfway and didn't complete.

Starting point is 00:33:07 So I'm going to try this one more time and see if they finish this time. Okay, so it doesn't look like GPT, I'm double checking here. It doesn't look like GPT40 finish this, which generally when I do this, it finishes. I don't know if it's because I'm in the compare mode in doing head to head. It really shouldn't make a difference. So it's going through, yeah, I'd have to read this. I'd have to read this. I'd have to take a little time.

Starting point is 00:33:38 There's a lot of math on the screen. It didn't give me a clear answer. I'm going to try the prompt. And I'm going to say at the end, I'm going to say, please respond. only with the correct answer or answers. I'm going to say you do not have to show your work. Sorry, I'm getting confused. Too much math on the screen.

Starting point is 00:33:59 It's late. All right, here we go. That worked a little better. So neither of them got it right. We'll just say both. Okay, so neither got it right. So GPT40 said the code is 273. Middle digit is the highest.

Starting point is 00:34:13 However, those do not add up to nine when combined. and GPT40 Mini said the code is 147 also do not add up. So they both got that one wrong. All right. We're going to do just one more test. Maybe we'll do two more. So I'm going to do one. I'm going to say, please let me get this other prompt that I had ready to go here.

Starting point is 00:34:40 So this one is about kind of starting a company. All right. So let's go ahead and here it is. I've did this before. This one's a little longer. So I'm going to click run while they go and I'll look at the latency because this one's a little longer. Okay. So what I'm saying here is create a new company and brand for a future smart home device.

Starting point is 00:35:02 This will solve a problem that does not currently exist. To start, come up with the company's name and its first flagship product. Give the product a new branding campaign, go to market strategy, tagline, and ratchew and rationale for why it'll work, responding to a succinct way, keeping responses to short bullet points, but with ultra-specific facts. All right, so from a speed standpoint,

Starting point is 00:35:23 it looks like GPT-40 was slightly faster, but not by a lot. All right, and there is no right or wrong here. This is subjective. So I'm not going to read the whole thing. So let's just see the takeaway here with what GPT-40 came with. It came up with Verity Solutions.

Starting point is 00:35:42 The flagship product was no, Meesey. Not sure what that is. It's eliminating the complexity and stress associated with managing and memorizing the various names and access code for different smart devices in a home. All right. Okay, I see that. Let's see. The branding campaign is smooth management starts here. I got some social media campaigns. Simple go-to-market strategy in multiple phases. Good. The tagline is no mezy simple. Okay. Not the best, but not terrible, right? Again, how I wanted the output, right? I said, keep it succinct, give it to me bullet points. So I think this is where in theory, GPT40 may have shined over GPD40 mini. Anyways, let's look at GPT40 mini, what it came up with. So it is home vigil and the flagship

Starting point is 00:36:37 product is lifeguard. So again, I'm asking it to solve a problem that does. currently not exist in a future smart home device. All right. So GPT40 Mini came up with home vigil company name. The flagship product is Lifeguard. It is a smart home device equipped with AI-driven sensors to detect subtle changes in a household's atmosphere, such as smells, noises, movements, etc. To predict potential safety hazards.

Starting point is 00:37:05 All right. That's pretty good. All right. The branding campaign is safety in the silence. Ooh, okay. I like that from Lifeguard, Safety and the Silence. Same thing, go-to-market strategy. It even gave us a target audience, which I don't think that GPT-40 did.

Starting point is 00:37:24 Gave us different channels. All right, I like that. It gave us even a launch event, whereas for the most part on GPT-40, it just went through phases. It was awareness, engagement, expansion, where on the other hand, GPT-40 Mini really took it a little more granular. It gave us a target audience, different channels, you know, different strategies for those different channels, a launch event. So pretty good. And the, oh, I just realized, y'all, why the, why we didn't get the, the full from the code one. It's because I think I have a cutoff here.

Starting point is 00:38:02 That's probably, that's probably what it was. That's probably what it was. All right. So anyways, I think for G, for this test. I would say GPT40, Mini won this one. All right. So I'm actually, I'm cranking back up the maximum tokens. I feel I didn't give a fair shake.

Starting point is 00:38:23 I kept wondering, I'm like, why is this cutting off when we are asking about the locks? So we're going to do that one one more time. But however, I'm giving that one, if I'm being honest, I'm giving that one to Minnie. I didn't expect Minnie to win in something that, you know, I'm not saying this involved logic and reasoning, but it was much better, right? GPT4-0 gave me something kind of complicated. Again, large language models are generative.

Starting point is 00:38:49 I'm pretty confident if I did that a hundred times. GPT-4-0 would win out against many, but that just shows many is capable. If you don't think it is, that just shows to me, that shows me it's capable. All right, I'm going to run this code one one more time. That's why it was cutting off. I forgot.

Starting point is 00:39:06 I had that setting on there. you can restrict how long the output is so you don't accidentally spend way too much money, right? Because you're paying for the tokens, but they're pretty cheap. All right. So, wow. Okay. A lot here from GPT40 mini, and we still just have more math code problems.

Starting point is 00:39:27 So here we go from, okay, so we got a, we got multiple correct answers from GPT40 when we re-ran this. So essentially, it needed more memory. That's all it is. It couldn't do it with the limited tokens. It actually wasn't done. I cut it off early. So GPT-40 got multiple correct answers, 153. That's correct. Middle digits highest adds up to nine. 163 is not correct. That adds up to that adds up to 10. So it did get one correct variation, 153. All right. So let's go ahead and look at GPT40 Mini. So it says one valid code is 243. That's correct.

Starting point is 00:40:17 So that adds up to nine. So it didn't give me any wrong answers. GPT40 mini. GPT4O gave me one wrong. There's actually a lot more correct answers. And GPT40 one time gave me a list of like, I don't know, like 15 correct answers. So in this case, actually, GPT40, many got it a little more right because it didn't give me a wrong answer.

Starting point is 00:40:42 I'll give that one like a half. It didn't give me any wrong answers. It only gave me one correct answer, whereas GPT40 gave me a correct answer and an incorrect answer. So, all right, let's do one more. And this is our last one. This isn't supposed to just be a live head-to-head comparison, but I wanted everyone to be able to get at least a good understanding of this.

Starting point is 00:41:06 So for this one, here's what we're going to do. We're going to test the vision capabilities. So I just dropped a photo. I took this on the road in Chicago. All right. And I'm putting in a prompt. So this one I am pretty curious, which one's going to finish first? So I'm saying, please identify where this picture is located, what direction the photo is

Starting point is 00:41:32 facing, and every other detail that you can make out. All right. So I'm going to go ahead for our live stream audience. I'm going to throw this up on the screen big so you can look at it. I got, I'm telling you, I got the perfect photo to use for this because it shows like, I don't know, 20 cars, but you can't even see any license plates. So it's not like I'm even putting anyone's personal bat out there. I was kind of happy about that one.

Starting point is 00:41:56 However, the correct answer is we're on 9094 heading southeast. And you can tell that by. So this requires actually a lot of knowledge, which is why I use. use this because the way that you can tell this, aside from the, uh, the order of the buildings, right, that's one way because you have the Chicago skyline. And depending on which way you're, you're looking at it, right? If you're looking at it like north straight down, the buildings, the skyline are going to be in a different order. If you're coming up from the south, they're going to be in a different order, right? So that's one way. But also you can see two different

Starting point is 00:42:27 exit signs. So as an example, we have 46A California, 47A Fullerton. So in theory, it should know, right? It's heading in this direction. The numbers are going up, right? So it has plenty of clues. All right. So let's jump in. Let's run this live and see how each model does. Let's go. All right. So put the prompt in. GPT40, much faster. Many took a while to get start. Okay. Interesting there. So mini took longer to get started, but was actually done much faster. GPT40. got started way quicker, but took a little bit longer. But ultimately, what I care about is who got it correct? All right.

Starting point is 00:43:18 So interesting. So the GPT4, actually, let's keep going left to right here. Great job. GPT4. Oh, this is very impressive. All right. So it says the photograph has taken in Chicago, Illinois, landmarks in the background, Willis Tower, John Hancock Center.

Starting point is 00:43:38 Cool. It says direction. The photo is facing southeast towards downtown Chicago. Correct. Specific highway details. It's breaking them all down, right? 47A for Fullerton, 47B for California. Yield sign is talking about traffic.

Starting point is 00:43:55 It's even telling me it's probably during rush hour. It wasn't. It was on a Sunday. I think there's a Cubs game. Anyone else a Cubs fan or we socks fans? Who's your favorite baseball team? Let me know. Infrastructure.

Starting point is 00:44:06 This is great. It's giving me a huge break. it's even going into vegetation. Y'all, the vision models are nutty like a squirrel on keto. All right, weather and time. My gosh, this is good. It's giving me guesses. It says general vibe.

Starting point is 00:44:23 It says typical urban highway. This is good. So it answered everything correctly. Not only did it answer the simple question about where I was and what direction the photo was taking, but I also asked it, give me every other detail that you can make out. So interestingly enough, GPT40 Mini's vision doesn't give me a lot of, doesn't give me a lot of the same. It essentially says, I can't identify specific locations or details about the image you provided. It does say there's traffic, there's cars, there's tall buildings.

Starting point is 00:44:56 You know, it shows that there's Fullerton Avenue and California Avenue. There's photos in the sky. So that's interesting, right? So actually what I'm going to do. So hands down, I'm giving. I'm giving O the win on this, all right? Many did not win. GPT4 won this.

Starting point is 00:45:14 So I'm going to say, I'm going to add some more things. I'm going to put, please take your time, go step by step, and understand the context clues in order to provide the answers. All right. So I'm wondering with a little more, I mean, quote unquote prompt engineering, right? Yeah. With a little more prompt engineering, I'm wondering if GPT4O Mini can do this because I was assuming it could have.

Starting point is 00:45:48 Again, this is available in the API. So if you're thinking, oh, should I be tapping into the API? I mean, it didn't get anything wrong, but it also didn't get it right. Or am I just really underestimating or not giving enough props to the level of detail that GPT40 gave. All right, so I'm doing it one more time. I'm just doing a rerun. I'm curious if a little bit of extra prompting

Starting point is 00:46:11 will help GPT40 Mini. So same thing. GPT40 got started right away. Looks like we're getting installed out. Looks like we're getting installed out here on GPT40 Mini. Yeah, server airs. I'm not surprised, y'all. This just came out.

Starting point is 00:46:30 People are building on it. They're breaking it. I'm actually surprised how fast it's been, if I'm being honest, GPT40 Mini, considering probably all dorks in the world are trying to break this thing right now, including myself. We're going to try it one more time. See if we're going to get another error message. But, hey, I'm curious, podcast audience, drop me a line.

Starting point is 00:46:51 Are you going to be using this? Are you going to be moving everything over to GPT40 Mini? What tests are you running? All right. So here we go. Same thing. GPT4O did a great job. job. Let's see if GPT40 Mini did a little bit better. So it did with a little bit more prompting and a little

Starting point is 00:47:13 bit more help on my end. It did say Chicago. It noticed the Willis Tower. Although, what's the Willis Tower? Y'all? That's Sears. Don't mess. Now it's now it's giving me more information. Traffic situation, road signs, direction. So let's see if it got the direction. The direction, the photo is facing appears to be toward the city skyline suggesting the photographer is moving toward the downtown Chicago. Okay, correct. It didn't get it. So I'm going to try one more time. Again, we're giving the win on this one to GPT40. I'm going to say given all of those context clues, what direction is the photographer facing? All right. GPT4 already got this right. I'm just wondering if we can squeeze the right answer out of GPT40 Mini.

Starting point is 00:48:09 Gosh, I mean, GPT40, super impressive. All right, let's see. So it said, given the context clues, particularly the presence of Chicago Conline in the background, the photographer is facing east toward downtown. Yeah, it's east. It's southeast, but got to right. So with better, that's, y'all, that also goes to show, number one, understanding generative AI.

Starting point is 00:48:30 It's a roll of the dice. You're going to get something different if you go in there with a weak prompt. I went in there with a weak prompt. The point of this wasn't prompt engineering. But when I followed it up with a little bit better prompt, and I followed it up with a secondary, you know, when I prompted iteratively or just followed it up with, hey, now let's try it again in this way.

Starting point is 00:48:47 It eventually got it right. All right. So that wraps up our live look. Now I'm going to quickly go over what no one is talking about. I know this is a longer one, but I cannot underestimate the importance. So a couple things. If you are on a free account, it's actually hard to access. this.

Starting point is 00:49:06 Right. On the front end of chat GPT, they actually changed it. So previously you had a traditional model switcher. Now all you have is, and this is brand new as of today, I believe. Now you just have chat GPT, which is if you're on the free plan, that's what you're on. And then it says chat GPT plus. So, oh, I forgot my other slide here. But essentially, you can still access.

Starting point is 00:49:35 Let me go ahead and share my screen here and show you. So you can still access the updated version here. So what you have to do, first you have to prompt, right? And then after you prompt, then you get the new model selector. So if you actually want to try out GPT4 Mini and you are on a free plan, you can't select it on the normal drop down. You have to first do a prompt, and then there is a model switcher right underneath chat GPT's response,

Starting point is 00:50:12 and then you can do chat GPT mini or sorry, GPT40 Mini. So now I'm on Mini, and it's obviously asking me, is this better, the same or worse? All right. I'm not here to train you right now, model. I'm in the middle of a podcast. All right. So number one thing you need to know,

Starting point is 00:50:28 it's a little hard to access for free users on the front end of chat GPT, which I think a lot of people are. So that's why I wanted to throw this out there. This is everyday AI for everyday people, right? Number two, something people aren't talking about. What does this mean for the future of Apple's plans, right? Because number one, people aren't talking about this fact either. Apple is actually, they built their own large sandwich model.

Starting point is 00:50:53 They didn't even mention it by name. Man, I feel bad for all those software developers at Apple during Apple's WWDC in June. They didn't even mention their own large language model by name. They didn't even really explicitly in the keynote, at least, I'm talking about in the keynote. They didn't even explicitly decipher that, yes, when we're doing all of this Apple intelligence, quote unquote, that there's actually two different models. We have our own Apple on-device edge AI model running essentially all the queries that just have to do with the information on your device.

Starting point is 00:51:25 And for most everything else, you're going to be prompted to use GPT4. I believe GPT40, right? But what does this mean now? A much smaller model. Again, we don't know how much smaller it is. I'm sure we'll find this out soon. I don't think that this model will be small enough to fit on a future device, right? At least a phone.

Starting point is 00:51:49 I think right now, you know, the Google smartphones, the S-24s that actually have a large language model living locally on the phone. That's why it's important, right? There's more privacy. It's faster. It's more energy efficient. Right. So I think the future is working edge AI, right? On device, smaller models.

Starting point is 00:52:09 But I still think that this new model from OpenAI GPt40 Mini is probably too small to run locally. But I would think the next version of it, probably whether that's in six months or a year, is probably going to be able to fit, which is wild to think about, right? which number one, the impact on the environment, right? The more edge AI that you can run, the better the environment is, right? One of the biggest problems right now is this uses a lot of power, right? A single prompt to chat GPT takes up 10 times more power than a Google search, right? And as we're using large language models more and more, you have to think of the environmental factor.

Starting point is 00:52:51 So moving to edge is better for everyone. Moving to on device is better for everyone. It's faster, it's more secure, it's better for the environment. But right now, it's not really possible for a lot of use cases, right? To get the power that you want, you know, squeezed on a little device, not possible. But I think probably in the future, it's going to be maybe the next version of GPT40 Mini might be able to run locally on a smartphone, which would be amazing. So you have to look at the future of this Apple and Open AI partnership.

Starting point is 00:53:24 Here's another, aside from cost, there's actually a lot of, a reason you might want to use a GPT 4-0 Mini. I talked about, hey, if you're on chat GPT, you probably aren't going to want to use it because there's real no advantage. You're not paying for it anything differently in chat GPT, right? There, if you're on the chat GPT Plus plan, you're paying $20 a month. And so you don't have to worry about cost. The main advantage here is cost in the API.

Starting point is 00:53:47 If you're building a third-party application piece of software, whatever, you know, building, fine-tuning your own model for a company with retrieval augmented generation, right? But here's the other key benefit that no one's really talking about. It is the first model to apply open AIs brand new, which we just talked about on the show this week, instruction hierarchy method, which helps to improve the model's ability to resist jail breaks, prompt injections, and system prompt extractions. So, I mean, that right there is a great reason in why I think so many companies are going to be flocking to GPT40 mini, right? You want your business to be safe, right?

Starting point is 00:54:28 The last thing you want is prompt injections for someone to jailbreak, right? A lot of companies don't take proper safety precautions, right? And part of it is because it's so easy. It's so easy to essentially create a wrapper, right? Put in a little bit of your company's data, do a little fine-tuning. You know, it's not super expensive like it used to be. It's almost so easy that too many companies are doing it and they don't take proper safety measures. So this new version of GPT40 Mini in their API will be a godsend for those companies that are maybe facing some of those issues.

Starting point is 00:55:05 All right. Another advantage or sorry, another thing no one's talking about. And why is no one talking about this? I should have started the show out with this. Ready? Do people not like reading? I still like reading. This was toward the bottom of OpenAIs blog post announcement.

Starting point is 00:55:22 So they said, today GPT40 Mini supports text and vision in the API with support for text image, video, and audio inputs and outputs coming in the future. Oh, look at that. I believe this is the first time in writing that we've seen Open AI said in the future, there will be audio and video input output. SORA, right? Yeah, SORA accessible via an API, Sora accessible via chat GPT, presumably. That's huge.

Starting point is 00:56:04 And also video input. Did you guys know that you can actually input a video into chat GPT right now? Go try it. See, tell me what happens. So this is huge, though. again, this is all reasonable assumptions pointed to this, right? Yeah, you'll be able to, it's multimodal, right? The future is multimodal.

Starting point is 00:56:23 That's the whole point of the GPT40, which is Omni, instead of using three, technically three separate models on the hood, it's using one. So this was always the assumption, but I believe this is the first time in writing that Open AI said, hey, input, output, you're going to have, you're going to have video, input and output. So pretty big, right? Right now, that's actually one area, Gemini's been crushing it in with their super long context window being able to upload videos and it can scarily know everything.

Starting point is 00:56:52 Yeah, talk about wild use cases of generative AI. All right, number five, more and better AI is coming everywhere. So like I said, you probably don't even know this, right? As an example, maybe the wealth management company you use has a little AI advisor that you talk to or maybe a software that you use for work, has a customer support bot, by GBT, right? You probably don't realize it. Apps on your smartphone.

Starting point is 00:57:19 So many things use GPT behind the scenes. All right. And so many of them are about to get so much better, right? Which is related to number six. The prices are going to be, the prices are right now, but they're going to be getting so affordable. Companies can't afford not to build on generative AI.

Starting point is 00:57:44 Let me say that again. prices for building on top of a large language model are getting so affordable, companies can't afford not to build on it. If you would have asked me a year and a half ago, hey, should all companies be building on top of large language models? I said it very early on in the podcast, probably like 13 months ago.

Starting point is 00:58:08 I said, no, companies shouldn't. Because look at the prices back then. And the quality wasn't good. the quality wasn't good and it was too expensive. Now, my gosh, quality is great and it is cheap and think now, right? If your company is on the fence today, well, first of all, reach out to us, right? We offer consulting services. We have other partners we work with. We can walk you through that if you're not sure, right? I'm lucky enough I get to talk to super smart people all the time. I just had a conversation today with, by the time you guys hear this yesterday with, you know,

Starting point is 00:58:41 a senior director at Microsoft AI. I'm lucky enough to get to talk to. to literally the smartest people in the world building things. So we can help walk you through this to reach out to us. You know, I have my email and my LinkedIn in the podcast. And you know, you guys know how to get me here on the live stream. But companies need to be building. If you haven't made that decision now, look at this visual on the screen. This step down in price, right?

Starting point is 00:59:06 And then think of the increase in quality. You need to be thinking now. Okay? you have to be thinking. So it's getting it right now where you need, and this is going to sound bad, but you need to start comparing the costs to train, deploy, and upkeep a model versus the cost of humans because it's going to be lopsided soon. It's going to be lopsided, right?

Starting point is 00:59:34 That's why I think everyone is going to be working, not everyone, but so many people are going to be, quote, unquote, working in AI, right? taking, you know, this first, I call it first party company data in turning that into knowledge for your model. All right. Let's see if we can beat the hour mark here. Number seven, energy demands are going to be bonkers. That's my last thing you need to know.

Starting point is 00:59:59 I kind of already alluded to that. But all of these things adding up, these models getting faster, cheaper, better, easier to work with. demand is going to be, I'm not just saying open AI GPT40 Mini because guess what comes next. Anthropics going to strike back. Google's going to strike back, which is going to make open AI strike back. Energy demands are going to be bonkers until we can get all of these models or many more of these models running locally. Energy demands are going to be out of the roof. And so at the same time, watch Nvidia's stock, watch AMD stock, watch Qualcomm stock,

Starting point is 01:00:43 all the companies that are making chips. Obviously, Nvidia has an unfair head start. Their stocks are just going to continue to rise in the long run because we all need the GPUs. All right. So that's it. An in-depth look. I hope, I hope if you stuck with me to the end, let me know. I'm always curious, did you make it to the end?

Starting point is 01:01:06 I don't know. Send me an email or DM on LinkedIn with the word pancakes. I'm just curious. Did you make it at the end? Let me know. And I'm hungry for some pancakes. So I hope this delivered and I hope now you better understand GPT40 mini. Because like what I talked about in the beginning, it's actually not many.

Starting point is 01:01:28 It's actually, I think, a pretty big deal. This is a pretty big deal. I think this is going to greatly change the way that we all work in really the future of generative AI. All right. Thanks for tuning in, y'all. If this was helpful, please consider reposting this. Leave us a rating if you're listening on Spotify or Apple. Appreciate the support.

Starting point is 01:01:52 And don't forget Tuesday, July 23rd. Hopefully you're listening or watching this before that time. Mark your freaking calendars, y'all. 7.30 a.m. Central Standard time. July 23rd, the link for the LinkedIn live stream. Hopefully LinkedIn doesn't go down. If so, you can join on YouTube. You got to get your questions.

Starting point is 01:02:13 You got to get your answers in live. All right. $1,000 to anyone who gets a perfect score. If no one gets a perfect score, no one gets the money. So there's going to be some easy questions. There's going to be some trick questions. But if you are an avid listener of the everyday AI show, you're probably have a decent chance that you're going to get most of these right.

Starting point is 01:02:33 You got to be quick, but you're going to be quick. got to be there. So please join us and make sure to go to your everyday AI.com. Sign up for the free daily newsletter. Thanks for tuning in. Can't wait to see you next week and every day for more. Everyday AI. Thanks y'all.

Starting point is 01:02:51 Meet Firefly AI assistant. Now live in Adobe Firefly, the Allman One Creative AI Studio. Just describe what you want to create in your own words and the assistant handles the rest, orchestrating multi-step workflows across Adobe Creative Cloud apps, including Photoshop, Premiere Express, and more in one conversational. interface. You direct the outcome while the assistant accelerates execution. Stand control with the ability to step in and refine at any time. See it today at firefly.adobie.com. And that's a wrap for today's edition of Everyday AI. Thanks for joining us.

Starting point is 01:03:30 If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit Your EverydayAI.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.

Everyday AI Podcast – An AI and ChatGPT Podcast - EP 318: GPT-4o Mini: What you need to know and what no one’s talking about

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.