Everyday AI Podcast – An AI and ChatGPT Podcast - EP 456: OpenAI’s o3-Mini - The world’s best free chatbot model?

Starting point is 00:00:00 This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. Meet Firefly AI Assistant, now live in Adobe Firefly, the All In One Creative AI Studio. Just describe what you want to create and the assistant handles the rest, orchestrating multi-step workflows across Photoshop, Premiere Express, and more in one conversational interface. You direct the outcome. The assistant accelerates execution. I know I've said probably dozens of times.

Starting point is 00:00:49 Don't use free AI models, right? Because the paid versions are ridiculously cheap for what you get $20 a month, $30 a month. It doesn't matter whether you're an individual or buying that for your company with thousands of employees. That's so affordable. So I've always said don't touch the free models, but I might have to change that. because OpenAI has made their O3 mini model free, well, in a very limited capacity in terms of the number of messages. Yet, I think it may be the world's best free chatbot model. So we're going to be going over today, the new 03 mini model from Open AI, talk about what it is, how it works,

Starting point is 00:01:39 its implications, if it is really the best free AI chatbot model in the world and maybe do a little bit of live testing. All right, I'm excited for this one. I hope you are too. If you're new here, welcome. This is Everyday AI. My name's Jordan Wilson, and we do this thing every day. It's for you. This is your daily live stream podcast and free daily newsletter helping us all, not just keep up with AI, but how we can use it to get ahead to grow our company and to grow our careers. And that is a full-time job if you are not tuning in. every day. So we do all that hard work for you so then you can go be the smartest person in AI at your company. All right. So if you are new, maybe listening for the first time on the

Starting point is 00:02:19 live stream or the podcast, thank you for tuning in. Make sure to check your show notes. Very important things. They're mainly our website, your everyday AI.com. That's where you're going to want to go sign up for our free daily newsletter because every single day, we do a couple of things. We bring you all the latest AI news and tell you what it means. But we also break down our podcast, from the day with some more information. So make sure you go do that as well as while you're there. I'm going to keep promoting this, y'all. You need to go listen to our 20, 25 AI predictions and roadmap series.

Starting point is 00:02:51 It is all on our website. It is for free. I'm getting a ton of messages. I just got a message actually today or last night because I think this person's in Europe from one of the largest consulting companies in the world. And they said their team is breaking down those five episodes. and they're going to continue to track them all year. I kid you not.

Starting point is 00:03:11 You need to go listen to them and let me know what you think. All right, enough chit-chat. Let's get into the AI news, live stream audience. This is up to you. I have a question on the screen there. Let me know we're going to be doing an 03 mini live test. Do you want to see A, do you want to see it go through a reasoning rubric, or do you want to see B go through some real world data analysis?

Starting point is 00:03:33 So let me know A or B on the screen. Let me know now. All right. So AI news, a lot going on. Gemini. Google has announced the general availability of its Gemini 2.0 Flash model, a high-performance AI design for developers with enhanced speed and complex problem-solving capabilities. So the Gemini 2.0 Flash model was first introduced at I.O. 2024, its developer conference,

Starting point is 00:03:57 is praised for its efficiency in handling high-volume tasks and multimodal reasoning with a 1 million token context window. Also, a new experimental version of the Big Boy model, Gemini 2.0 Pro, is also available for paid users boasting superior coding performance and a 2 million token context window. I mean, Google is just dominating the context window game. They also introduced Gemini 2.0 Flash light, the most cost-efficient model to date, and that is in public preview, and that's offering improved quality over its predecessor, 1.5 flash. So Google emphasized the importance of safety and responsibility with the Gemini 2.0 lineup, incorporating new reinforcement learning techniques and automated red teaming to mitigate risk and ensure secure usage. Yeah, huge, huge news there from Google. I'm excited to dive into that a lot more. Our next piece of AI news, U.S. lawmakers are proposing a ban on deep seek.

Starting point is 00:05:00 Big surprise, not at all. All right. So lawmakers in the U.S. are planning. to introduce a bill to ban deepseeks chatbot application from right now just government-owned devices over security concerns that user data could be accessed by the Chinese government. Spoiler alert, it can. The bipartisan legislation is echoing previous efforts to ban TikTok from government devices, which was the precursor to TikTok being banned in the U.S., which was quickly there kind of overturned, but it could still happen. All right.

Starting point is 00:05:35 So Deep Seek is a Chinese AI company, and they've rapidly gained popularity in the U.S. becoming the most downloaded iOS app last month. Concerns arose, though, after an analysis revealed that there's hidden code in the app that could send information, user information, to China Mobile, a state-owned company banned in the U.S., a China-state-owned company. So, yeah, the proposed legislation aims to ban sensitive government and personal data from being accessed by the Chinese.

Starting point is 00:06:03 Chinese Communist Party. Other countries, including Australia, South Korea and Italy, have already banned Deep Seek from their government systems due to similar data security concerns. Also, some U.S. federal agencies such as the Navy and NASA have preemptively blocked the app for security reasons. I left to post on LinkedIn for this. I'll probably do a show dedicated at some point. This story is changing so quickly. That's why I've kind of held my tongue on this, because I got a lot of hot takes. So I might need to save that for a week or two. All right, last but not least, new piece of AI news. Chat GPT has made, speaking of things for free, their new chat GPT search available now for free, even for free users who are not logged in. So this new feature,

Starting point is 00:06:50 well, the feature at least new to free and non-logged-in users allows everyone to access up to the day information, such as sports scores, news and stock prices directly through chat GPT. So according to reports, the search functionality uses a fine-tuned version of GPT-40 optimized with synthetic data and output from OpenAI's new reasoning models. So OpenAI has partnered with major news organizations like the Associated Press and Reuters for licensing agreements, influencing the visibility of certain publishers in search results. So this is huge. It's kind of weird now, right? It almost looks like Google is trying to really keep.

Starting point is 00:07:30 compete with what chat GPT was like two years ago. And now chat GPT is trying to compete with what Google has been for the past 20 years, right? Really making a stake to try to erase Google, right? Open AI just wants you to skip Google all together and go use its chat GPT search and you don't even need to be logged in and you don't even need to have an account. So pretty wild. All right. That's enough to chat. Lots more in the in the newsletter today. So let's talk about OpenAIs 03 mini model. It's very impressive. It's very impressive.

Starting point is 00:08:06 I'm just going to say that. Hey, live stream audience, thank you for tuning in. I saw a couple votes, you know, some A's, some B's. So yeah, let me know if you want to go over the reasoning rubric live or if you want to go over the data analysis. So A or B. All right. Also, you're going to want to repost this. All I'm saying, I've been putting these guides together.

Starting point is 00:08:29 after I do something in chat, GVT or other large language models, I'm like, wait, I've just really saved people dozens of hours a week if they go do this. And I realize that, you know, sometimes the podcast isn't enough or putting it in the newsletter. So I did put together another guide specifically on using the O3 mini model. It's fantastic. I just finished it this morning. So if you repost the show, I'm going to send that to you. All right.

Starting point is 00:08:58 So let's get into the O3 mini model. O3 mini model. Here's the gist. It is the first free reasoning model from Open AI, right? So we have the O series of models. It is different than the GPTs, right? So the GPTs are your, you know, quote unquote old school transformer models. And then the O series, this is Open AI's reasoner models, right? The reasoner models have become very popular over the last like four or five months. But this is essentially a model that thinks longer, kind of does more inference or, you know, uses kind of this chain of thought thinking, where it doesn't just quickly respond to something. It takes a while and really kind of thinks internally. So kind of the work that you would

Starting point is 00:09:39 normally do in a transformer model, a GBT40, right, as a human, you'd want to go back and forth with it a lot, kind of these reasoning models, that's why they're so good. But a couple of things. It uses more compute. So generally, they're more costly. So as an example, if you want to have unlimited use of this, you need the $200 pro plan, but at least for, I believe it's 10 messages until you hit your message cap. It's available right now for free users. But it's not just that that makes me excited about this. So if you are logged in, so this is separate news from the chat GPT search that you can use even if you are not logged in, right? So if you do have a free, even a free chat GPT account. So not only do you have a couple, you, you know,

Starting point is 00:10:25 you know, queries that you can use with a new O3 mini model that Open AI just released, but it also is connected to the Internet. That is huge. That is the piece that I think most people have missed or overlooked when it comes to, when it comes to this new model from Open AI. And that guide, by the way, the guide leverages specifically. It's 20 different use cases that combine reasoning and the Internet, right? which is that's what knowledge workers do.

Starting point is 00:10:55 And that's why I think this is so exciting, even for people who are not paid subscribers. Again, whether your favorite chat bot is Gemini, Claude, Chat, Chb-T, whatever, just pay for the base $20 month plan. It pays for itself the first time you hit Enter if you know what you're doing. But even for those cheap skates out there, right? Yeah. I know a couple of you out there that are still pinching your pennies.

Starting point is 00:11:18 And even though you're buying $8 coffees every day, you're like, oh, I'm not going to buy it $20. No, just buy it. But still, this is the first model for free that you can use from OpenAI that is its reasoning model and it's connected to the internet. And reportedly, I don't personally believe this, but a lot of people have said, oh, Open AI did this because of the Deepseek R1 release, which took the internet by storm for a lot of, I'm going to say incorrect reasons. We'll just say that. All right. I don't personally think this is in response to Deepseek.

Starting point is 00:11:54 I think this is actually in response to Google. That's been on a freaking tear since December. Google has been straight up releasing a crazy amount of releases. If I'm open AI, I don't care about Deep Seek, right? It's largely going to get banned, I believe. I'm worried about Google. So I think that this is really a shot at all of the great work that Google's been doing specifically in Google AI Studio.

Starting point is 00:12:21 All right. So let's just answer that question right now. I'm not going to make you wait another 20 minutes and after our tests. Is O3Mini the best free chatbot model? Let me break that down. Free chatbot model. Okay? That's when you log into the front end of a chatbot.

Starting point is 00:12:39 All right. So what do I mean by that? Well, right now, if you have a free account and you log into Gemini, you know, Gemini.com, even though they had all these new releases, you can't use them. You're only using 1.5 flash. but they have great models within the AI studio, but it's a little different. That's not a kind of for beginners. That's more for developers.

Starting point is 00:13:01 So it beats Google. No questions asked. Copilot is powered by GPT40 technology. And last week, if you read our newsletter, you're smart. You already know this. There is some limited free access to OpenAIs, O1 model with the think deeper kind of capabilities inside co-pilot. But I still think.

Starting point is 00:13:22 03 Mini is better because we're talking 01. I believe that one's 01 preview. Microsoft, like I know a lot of you guys listen to this. And I always tell like I met like talked with like 100 of you guys at the build or the Ignite conference here in Chicago. And I'm like, hey, tell me if I'm wrong on this. But I'm pretty sure the think deeper uses the 01 preview, not the 01 pro. So I still think it's better than that.

Starting point is 00:13:50 I still think 03 mini for free is better. than using co-pilot for free. Claude, just L.O.L. That, I mean, by the time, even on a paid account, like, Claude, you can't use it. You just can't, right? Anything more than a couple of prompts and you hit your rate limits. On free, on a free account, even though Claude 3-5 Sonnet is a good model,

Starting point is 00:14:10 it's, you know, now like, I don't know, eight months old. So presumably we'll be seeing new updates from Claude pretty time soon. I mean, on a free plan, if you look at Claude the wrong way, you've already hit your message limit. All right. So you can't really use it a lot. All right. And then deep seek, good luck with that. High risk, lots of questions. Great model. Great benchmarks, right? Good luck. That's all I'll say. All right. So is 03 Mini the best free chatbot model? Yes. It's not even close. This model is so good. Here's the thing. It's limited. If you're on the free plan, I think it's only, I think it's only 10. It's either 10 a day or 10 a week. Open AI doesn't say in all of my accounts are paid.

Starting point is 00:14:52 I was trying to quickly find that answer out. I'll make sure to put it in the newsletter. But yes, it is. It is. And I actually don't think it's even close. All right. Hey, someone here is saying our audio is cutting in and out a bit. Let me know if it actually is or if maybe that person has some computer problems today.

Starting point is 00:15:16 So it is. Open AI's new 03 Mini is the best free chatbot model in the world. and I don't think it's necessarily close. Because it's not just the model. It's everything else that the model has capabilities to do. Like we said, search. Just right there. Chad GBT,

Starting point is 00:15:35 search is great. All right. So let's go over some of the highlights of the model. So it excels in STEM encoding, this new 03 Mini. All right. It is 63% cheaper than 01 Mini, which is the model it replaced. So, yeah,

Starting point is 00:15:52 If you're, FYI, if you're on a paid plan and you're in there looking and you're like, wait, where's 01 mini? 01 minis gone. And now you have 03 mini instead. And there's actually multiple variations of 03 mini. I'll get to that in a second. It is 03 mini is 24% faster than 01 mini. And for API users, there's actually three variations. There's a low, a medium, and a high kind of variety or flavor.

Starting point is 00:16:19 And that's essentially you're choosing speed and cost versus performance. So for the 03 mini low, that is going to be the cheapest, the fastest with the lowest performance. O3 mini high is going to be the most expensive and take a little longer, but it's going to have the best performance, obviously. And then the O3 mini normal is going to be that middle, right? It's like the, what is it, the three beds, right? One's too soft, one's too hard, one's just right. All right. And that's for API.

Starting point is 00:16:59 So if you are on the chatbot version, right, which is many of us, right? So chatGBT.com. You're not, you know, using backend API as a developer. But for chat GPT users, if you are chat GPT plus, so the $20 a month, you have access to 03 mini, kind of the medium or middle version, and then 03 mini high. And that one kind of thinks harder, more or less. It uses a little more compute. And you have, I believe, 50, oh, no, it's 150 a day, I believe now. So plenty, plenty of usage. They just tripled it in the last couple of days. So if you have the $20 a month plan, I don't think you're probably going to

Starting point is 00:17:41 hit your 03 mini high limits. And let me tell you, right now, O3 Mini High is probably one of my most used models. All right. Also, O3 Mini High outperforms the full 01 model on mini benchmarks. Because right now we just have the miniature version, right? We don't have the full O3 version. I don't even know if the full O3 version is going to come out in 2025. I would assume it would, but I don't know.

Starting point is 00:18:11 Because the full O3 model has not been released. The only glimpse of it that we've seen is Open AI did say that it's new deep research, which is mind-blowingly good. It's going to put so many small to medium size and management consultant companies out of business. I'm not kidding. It's freaking good. Anyways, that uses a fine-tuned version of the full O3 model. But this is just, we're just getting the mini. We're just getting the mini one here, y'all.

Starting point is 00:18:41 All right. Let's keep it going. Benchmarks. I know. I'm not going to get too dorky here. But let's look at competition math. O3 Mini High outperforms even the full 01 model on the AIME, I think that's AIME, I think that's AIME, 24 competition math benchmark. Right?

Starting point is 00:19:06 So large language models, they go through all these tests, all these standardized tests, essentially. Think of like a human, you know, there's all these different tests. You take, same thing with models. And then you get benchmarks, you get scores, right? So you can see how capable the model is. So while O3 Mini, high is even more capable than the O1 model, and that's one of the highest scores in the world. And then you have PhD-level science questions, right? Because O-3 Mini-high is great at anything, STEM, coding, research.

Starting point is 00:19:35 It's Chef's Kiss. Good. All right. So on the PhD level science, which is GPQA diamond, for those of you at home keeping score, O3 Mini High also out benches the full O1 Pro, right? Also, this is not yet 03 Mini. So kind of the benchmarks that I talk about a lot on this show, aside from, you know, MMLE and some of those that I just mentioned, are the chatpot arena. scores. Those aren't out yet because this model is barely like a week old, right? But we do have

Starting point is 00:20:14 from artificial analysis, which is a great resource for an unbiased third party model benchmarking service. 03 mini in terms of quality, it is second in the world, only behind the full 01 model. So 03 mini and Deepseek R1 are actually they're tied with scores of 89. where 01 has a 90. For comparison, right, if you love Claude 35 as an example, Claude 35 has a 68, if that puts it on the scale for you. All right, Gemini 2.0, not their newest version, but the one previous to that had an 82.

Starting point is 00:20:56 So what does that mean? It is by far one of the highest quality models in the world. In Open AI made it free for the, limited use case, right? But it's mind-boggling that we have this level of a reasoning model that is one of the most capable in the world. And it also can access the internet, which is one of the reasons why I tell people don't use Claude, at least yet, right? Because there's a somewhat of a business danger if you are taking results from a large language model that has very old data. You shouldn't be doing it. All right.

Starting point is 00:21:37 So another kind of graph here from artificial analysis, this just kind of shows your quality versus price. And that's where you see, oh, okay, when it comes to quality versus price, 03 mini is actually the best in the world. And it's not necessarily close. The only one somewhat close is Deepseek R1. Again, good luck with that if you want to, you know, good luck with that if you want to use it.

Starting point is 00:22:05 I'm not using it on a day-to-day basis. But 03 Mini from a quality and price perspective, right now, it can't be beat. I mean, we'll see. I think Google's announcements yesterday are going to shake this graph up a little bit. And I'm excited to dive into all of the new Gemini 2.0 a little bit. But right now, O3 Mini is technically an elite model. Don't let the Mini confuse you. All right.

Starting point is 00:22:33 So as a reasoning model, these are the API pricing, right? again, you can use it for free. You can use it, chat chvety plus $20 a month. You're probably not going to run out of queries. If you have the pro version, like I do, $200 a month, it's unlimited. But for API pricing, it's $1.10 for a million input token and $440 for a million output tokens. For a reasoning model, so affordable.

Starting point is 00:23:00 It is so affordable. Adobe just introduced an entirely new way to create, bringing the power and precision of its creative suite into one conversational experience. Meet Firefly AI Assistant, now live in the Adobe Firefly app, the all-in-one creative AI studio. Powered by Adobe's creative agent, Firefly AI Assistant lets you start with your vision, just describe what you want, and shape the outcome as it takes form with the Assistant. The Assistant orchestrates multi-step workflows, drawing on 60-plus pro-grade tools across

Starting point is 00:23:40 Adobe Creative Cloud apps, including Photoshop, Illustrator, Premiere, Lightroom Express, and more to help bring your ideas to life. You can also get started with creative skills, a growing library of pre-built workflows for common creative tasks, like batch editing photos, creating mood boards, portrait retouching, and creating social variations. Every step the assistant takes is visible so you can refine, redirect, or take over at any time. You stay in the driver's seat as the creative director. Adobe Firefly AI assistant now in public beta. See it today at firefly.adobie.com.

Starting point is 00:24:22 And you might be confused. I get it. All this O alphabet soup, right? Open AI CEO, Sam Altman did admit that they have a naming problem with the models. It's hard. Right. And especially when they come out with some of these new O reasoning models, some of the older ones get replaced or they're just no longer available.

Starting point is 00:24:42 So let me just give you a quick rundown of the O reasoning models. series. So in September, we got 01 preview and 01 Mini. All right. Then in December, they got rid of 01 preview. And then we just had 01 and they added 01 and 01 Pro. So if you had a pro account, that's the only way you can get access to pro. In December, you had three versions. You had 01 Mini, 01, and 01 Pro. Okay, easy enough to follow along. But then January 31st came around last week, right? And that threw a wrench in it. So now we went to 03.

Starting point is 00:25:21 There is no 02 because that's the trademark name of a British telecom company. So if you're wondering, like, what happened here? Did I miss out on a whole series of AI development? No, you didn't. Right. But now in January, we got this 03 mini and that has 03 mini, high. And then 01 mini is gone. I know, confusing.

Starting point is 00:25:43 So depending on what paid plan you have, you might still have in your account, 01, 01 Pro, 03 Mini, and 03 Mini, high. I know it's confusing. I have a slide here that can hopefully help you make a little sense of it, all right? Because which model should you use, right? Like if you're like, oh, I have a paid chatGBT account, what should I use? Well, there's actually some unique features of each. So listen in here. I have a helpful little graph on screen for our live stream audience. So 01, not the pro. O1 actually has a great advantage to it. Okay. So right now, O1 and O1 Pro are the only O models where you can upload files. Not all upload file types are supported. All right, but it does have for visuals, you know, PNGs and JPEGs, I believe.

Starting point is 00:26:42 All right, so that's the 01 series. So normal 01 can access Canvas mode. All right. O1 Pro cannot yet O1 Pro is much more powerful than normal O1. Okay. So if you need to upload files, right, visuals at least, because you can't upload PDFs or spreadsheets right now into the O1 models. But let's say you're doing a lot of visual, you know, computer vision type work.

Starting point is 00:27:09 You're probably going to want to still choose one of the O1. 01 models. If you love Canvas like I do, you might use 01 because that's the only one that has canvas. If you need just straight up raw power, you're going to want to go with 01 Pro. But 01s don't have access to the Internet. So, 03 Mini, there's no differentiation right now between features or other tools within Chat Chb-T. But O3Mini is the only one that has web search. All right? And that is the only mini model now.

Starting point is 00:27:45 I know a little hard. But essentially, if you need the web, which I highly advised, go 03 mini. That's why I'm using 03 mini a ton. All right. If you need canvas, use normal 01. If you're on the big boy plan, then you can use O1 Pro for some of those very tough tasks. Woo. Does that make sense, y'all?

Starting point is 00:28:07 Hey, if you have questions, get them in now. Podcast audience, I love hearing from you guys. That's why I always put our email in there. I put my LinkedIn. Reach out to me. Let me know if this is helpful. If you have questions, I'm sometimes a little slow getting around to those messages, but I do eventually.

Starting point is 00:28:23 All right. So let's look live. Let's see what won our little poll this morning. Let me count. So our A's, let's see, we had one, two, three, four, five, six, seven, eight, nine, ten. Okay, ten. And then our B's. Let's see, we had one, two.

Starting point is 00:28:42 three. All right. Looks like you guys wanted the reasoning, the reasoning version here. All right. So let's jump into it. Live stream audience. As always, please let me know when and if you can see my screen here. All right. So we are going into chat, GBT. I'm going to do this live. All right. So these, let me make sure I go into O3 mini high. So I'm going to be using O3 mini high for these. All right. So, This little reasoning rubric, I've been using a lot of these questions now for like two years, right? Before there were reasoning models, I had like kind of this common set of about 12 questions that I would give to any models. Some of the earlier models, you know, Claude 35 Sonnet, GPT4, GPT40, Gemini II didn't do very good with this because they're kind of like trick questions. But I actually think this is pretty important. Right. Because sometimes a simple mistake when using chat GPT or Claude or Gemini can screw up your entire output, right? Because large language models, whether you know this or not, they don't understand words, right? You give it a bunch of words. It doesn't understand it. When it spits backwards, it doesn't know what those words are. It converts everything into tokens. All right. So sometimes large language models get confused like humans do, right? But that's important to keep in mind.

Starting point is 00:30:14 But that's why I think this kind of like quote unquote reasoning rubric is important. These aren't questions that you would generally use, right, on a day-to-day basis to grow your company and career. But this just shows you, are these models smart or not? Right. All right. So let's go ahead and try our first question here. So again, I am using 03 mini high. All right.

Starting point is 00:30:38 And you're going to see these live. Hopefully it's not going to take too long. to go in there. So the first one I am saying, I just woke up with six apples and three bananas. If you're a long time listener, you've heard this before. I just woke up with today with six apples and three bananas. Yesterday, I ate a banana and two apples this morning. You know what? I'm going to go ahead and scroll up here. I'm going to scroll up here. Hey, live stream audience, let's see if you can get this. Ready? I'm going to go slow. I just woke up today with six apples and three bananas. Yesterday, I ate a banana and two apples.

Starting point is 00:31:14 This morning, I will eat one apple and no bananas. However, I don't really like apples. And one banana may turn brown tomorrow. Assuming nothing else changes, how many apples and bananas will I have tonight? Live stream audience. What's your guess on that? Podcast audience. Are you scribbling this at home?

Starting point is 00:31:35 This is a fun one. I actually made this one up. Some of these are very widely used. of trick questions or variations of these. Some of them I just made up, right? So I'm curious if our live stream audience can get this one, correct? But let's quickly, I'm not going to do this for each and everyone, but let me just quickly describe for our podcast audience what's actually happening here.

Starting point is 00:32:01 So it says reasoned about fruit consumption and stock for 29 seconds. So you don't get the full chain of thought, right? you don't get to see the raw, unfiltered way that O3 mini high is thinking. But you do get a summary of the chain of thought, right? So I can see what it's thinking. So it's saying assessing fruit intake, right? I woke up with six apples and three bananas. So you kind of get to see how the model is thinking and digesting your question.

Starting point is 00:32:30 Then it says assessing tomorrow scenario, concluding the estimation, adjusting my focus. It says, I initially considered yesterday's fruit consumption. But it seems today's six apples and three bananas take precedence. Yes. You know, a lot of this stuff in here is just to confuse the model. So the model started going down the wrong road, right, which all the non-reasoning models got this wrong because that's what they did. They take like they got this, you know, unrelated information. And it screwed it screwed up what it was supposed to do.

Starting point is 00:33:02 So then it says assessing fruit stability, avoiding overstocking, reassessing preferences. these are just kind of the headlines in the chain of thought thinking, assessing fruit freshness, evaluating fruit stock, right? Keep it going. I mean, this is a lot. And then at the very end, it says, taking a closer look. Okay, I'm listing five apples and three bananas tonight, assuming no changes. Only one apple is eaten this morning, leaving the rest of the fruit untouch.

Starting point is 00:33:28 So the final count, it says five apples and three bananas. You know what? Hey, shout out Vincent. Vincent got it right. Good job, Vincent. So did. So did Marie. Good job, guys.

Starting point is 00:33:46 All right. I'm going to go a little faster with the rest of our reasoning rubric. But I did want to want you all on the live stream and the podcast to kind of see and understand. It actually thought about that at a pretty decent level, right? And going through and reading some of this, again, it's just the summarized chain of thought. But same thing. I played around with Google's new Gemini. And it got some of these questions wrong.

Starting point is 00:34:08 did it with Gemini as well. But the chain of thought was actually pretty impressive, almost like scary, impressive, right? But hey, getting it right is the first most important thing. All right, the next one, which so many models struggle with this one. All right, so this one is, let me get the right level of zoom here. A man in his dog, all right? And hey, live to your audience, let's just see if you guys can beat 03 mini high. Some of these are very easy. All right. This one, you should be able to get instantly. A man and his dog are standing on one side of the river. there's a boat with enough room for one human and one animal. How can a man get across with his dog in the fewest number of trips?

Starting point is 00:34:47 Like reasoning or sorry, transformer models can't get this. They can't, right? Claude saw on it, Gemini, GBT 4-0, none of them can get this, even though this is dead simple for any human with a brain. Right? So let's scroll down, scroll down. A lot of thinking here for something simple, right? But finally, finally, finally, it's just one trip, right?

Starting point is 00:35:11 Usually you would get three to five, even from these very powerful models, right? And this is one of the reasons why a lot of companies like before reasoners were like, I don't know, these models are dumb. Well, yeah, they can be a little dumb, right? Generally, these are trick questions, but now you're seeing it's handling it fairly well. All right. Next question. Here we go.

Starting point is 00:35:34 We're going to go through these quick, y'all. So a man and his dog are standing. That's the same one. I got to copy and paste the other one, y'all. All right. Next one. If it takes three hours to dry 10 t-shirts in the sun, how long will it take to try 30 t-shirts in the sun? Hey, mathematicians on the live stream.

Starting point is 00:35:55 Go. Can you beat 03-03 mini-high? So if it takes three hours to dry 10 t-shirts in the sun, how long will it take to dry 30 t-shirts in the sun? All right. Let's keep going. There we go. Got it correct. Three hours doesn't change, right?

Starting point is 00:36:14 It's saying, assuming you have the room, it doesn't change. All right, our next question. And again, a lot of them got this wrong before the reasoning models. All right. If you have a single match and walk into a room with an oil lamp, a candle, and a fireplace, which do you light first? All right? Live stream, Monis, what do you think? Which do you like first?

Starting point is 00:36:41 I hated these questions, right? Like when these are on standardized tests, you know, a train leaves the station at this time and an airplane's going here and this person's on a unicycle, but the unicycle's going uphill. And I'm like, this is dumb. I don't want to answer this, right? But what do you guys think? All right. Ted, Ted, Ted, Ted, Ted, Ted, Ted, Ted, Ted, Ted, Ted, Ted, Ted, Ted, Ted, Ted, Ted, but the answer is the match.

Starting point is 00:37:04 Yeah, it's not the candle or anything else. You got to like the match first. All right. A couple more, a couple more very simple ones, y'all. All right. So here's our next one. What color is an airplane's black box? That's just a trick one.

Starting point is 00:37:21 All right. But it's going to get it right. Because even the Transform models, bright orange is the correct answer. There we go. All right. Our next one on our reasoning rubric for O3 mini high. All right. And again, for all of these, y'all, like, okay, so for that one,

Starting point is 00:37:36 there was not a lot of chain of thought underneath, right? It said understanding the situation. And that's all. It didn't have to go back and forth and second guess itself and, you know, map out all these alternative paths. It was pretty simple. This one is kind of tricky. And transformer models can never get this right. So I said, please give me seven jokes that end in the word blue.

Starting point is 00:38:00 Two should be about animals. Three should be about some other topic in the body of this chat. Okay? And you can make up the other two. I'll tell you this. Large language models aren't funny. All right. So I'm going to just read a couple of these jokes. I'm just mainly going to make sure that do they all end in blue? Is there two about animals, three about context of the chat and two that it made up? They're not going to be funny, right? And it always does the same thing. It's always like, oh, they're feeling blue. All right. This one is taking a little bit longer, right? So it's laying out the options, mapping out the connections, generating

Starting point is 00:38:36 a diverse list, crafting, crafting humorous animal punchlines, brainstorming jokes, right? So a lot of, this is actually a little more difficult for O3 Mini, right? It's taking a little bit more time to think about this. Let's see if it's done. It thought about this for a minute and 10 seconds, right? Kind of a long time. All right. It said refining humor, which I haven't read the jokes.

Starting point is 00:38:59 They're not going to be funny because ending it in blue, there's really nothing. I haven't seen anything, right? humans out there, humans, if anyone can give me a real good joke that meets these criteria, I don't know. I'll pay for a month of chat, GPT, but I don't think there's anything funny that you can actually do because people are always like, oh, it failed. That's not a joke. And I'm like, okay, humans, you go ahead and do the same thing.

Starting point is 00:39:25 See if you can make me laugh with the ending in the word blue. Probably not. All right. So let's see if it actually did it. Look at all this chain of thought, y'all. Sheesh. All right. So it got two animals. Perfect. Ends in the word blue. Perfect. All right. So here, we'll read a couple of these. At the local jazz night, my dog tried to sing along with the band. When I asked him why he kept hitting the wrong notes, he just barked blue. Not funny, but it hits it, it hits it. So it's now it has three jokes about that used the content.

Starting point is 00:40:04 context of this chat, all ending in the word blue. Let's just read one of them. I started here, this one's about the fruits. I started my day with six apples and three bananas, but after all, the breakfast fuss, even the fruit salad confessed blue. Funny? Nope. All right. And then two that it made up on its own. Let's read both of these, because these are, anytime there's something that's like 10% humorous, it's always the one that said it made up on its own. All right. So I visited a paint store looking for a hue to brighten my day. The salesman held up a can and said blue. Not funny. All right. Last one. When life handed me lemons, I tried making lemonade, but no matter how hard I squeeze, my mood still ended up blue. So are these jokes? Kind of. Are they funny?

Starting point is 00:40:50 Absolutely not. Do they hit the criteria that we set forth? Yeah. Yeah. You know, I don't know. Maybe one person out there would laugh. All right. Here is the last one. one that we'll be able to definitively say yes or no. And this is a really good one. All right. Live stream audience, get ready. All right, because I'm pretty sure this is going to think for at least a minute or two. I want to see, can anyone out there in live stream land beat 03 mini high on this. All right? You already see the prompt in there. So humans, you get a head start. All right. So here we go. A box is locked with a three digit numerical code. All we know is that all digits are different if the sum the sum of all digits is nine and the digit in the middle is the highest.

Starting point is 00:41:37 What is the code? All right. Go ahead, humans. Can you beat? Everyone's like, everyone's always like, oh, AI isn't smarter than me. All right, humans, let's see. All right. So a box is locked with a three digit numerical code.

Starting point is 00:41:50 Can you beat 03 mini high? All we know is that all digits are different. The sum of all digits is nine and the digit in the middle is the highest. All right. Let's see. can anyone beat? And I'm not going to show the chain of thought on this to make it fun for our live stream audience to see if you can beat 03 mini high.

Starting point is 00:42:13 I don't see any responses yet, y'all. All right, Marie got one. Marie said 081. Marie beat 03 mini high. All right, good. One thing is I didn't specify. So we'll see if 03 mini high says it. And there's actually a lot of answers.

Starting point is 00:42:32 All right, because I didn't specify if you could use a zero. I should update that, that rubric, right? But let's see how it did. Some impressive chain of thought here, right? So it broke down the rules. It's adding, you know, A plus B plus C equals nine, B is greater than A and B is greater than C, right? All these things.

Starting point is 00:42:52 Step one, you know, so again, it's doing some basic, some basic algebra here. All right. Let's scroll to the bottom. All right. So I did not, I did not designate that zero. I don't know why all models don't think or know that you can start it with zero. They think it's like the first digit has to be a one through 10 and they only use zeros in the second and third spot. So I should update this to say you can use zeros in any of the three numerals, but it did get it right because there are 10 not counting starting off with a zero.

Starting point is 00:43:28 I believe there are 10 different codes, right? So 180, 270, 162, 261, 3650, 153, 351, 451, 450, 243, 342. So, yeah. Hey, good, good job, human friends. You guys, you guys got a lot of the solutions, right? All right, let's just try one or two more. These ones are not, these ones are not something that are like right or wrong. Right? This is more of an arbitrary answer.

Starting point is 00:44:02 So here, I'm going to click the search the web. Okay. So let's go ahead. This is an example of where I think things can get powerful. But this prompt is, again, this is nothing special. All I'm saying is generate unique and creative marketing advertising strategies to grow the everyday AI podcast. Do not suggest general or run-of-the-mill ideas. Only pitch clever advertising.

Starting point is 00:44:28 and marketing tactics to specifically grow the everyday AI podcast by Jordan Wilson. Hey, same thing, humans. Hey, humans in the live stream audience, answer this. How should we grow this podcast? Let me know. All right. So now it's brainstorming marketing strategies, crafting innovative strategies, identifying unique angles, right?

Starting point is 00:44:49 Crafting AI-driven campaigns, all this stuff. Engaging the community, right? I think I'm doing an okay job at that, hopefully. All right. Keep going down. Keep going down. All right. Let's see if we got some answers. So below, did I ask for a certain number? No, I did not. So it said below our seven inventive, tailored strategies designed exclusively to grow the everyday AI podcast by Jordan Wilson. All right. So let's see if any of these are actually good because I've done this with all the different models and generally non-reasoning models. Give me kind of boring stuff. Right. It's like, oh, you know,

Starting point is 00:45:27 take out ads or post something on LinkedIn. I'm like, okay, that's boring. All right, so let's see. Number one is the AI creator accelerated challenge. Launch a branded contest where listeners are invited to submit a brief case study on how they use a featured AI tool. What's very strange, I kid you not. I just thought of this like last weekend in the shower. I'm like, oh yeah, I'm going to start doing this for like use cases.

Starting point is 00:45:52 So, okay, good job, 03 mini. I hadn't heard this from any other non-reasoning model before. All right, interactive AI chatbot ambassador. All right. So develop a custom AI chatbot branded in everyday AI's visual style and tone. All right. Nothing. It's pretty standard.

Starting point is 00:46:11 Everyday AI augmented reality filter campaign. Okay, chat, GPT. I don't know how much time you think I have to do that, but it's unique. All right. Number four, co-branded AI showcases with tool makers, identify and partner with emerging or established AI. tool companies for exclusive co-branded live mini webinars or demo days. Yeah, I get enough of that.

Starting point is 00:46:33 People always want to pitch their garbage to come on the show and sell to you guys, and I say no, right? I think I got like 15 pitches yesterday. All right. And everyone wants to shove their garbage products down your throat. So I'm going to say no to that one. All right. Five, personalized podcast journey generator.

Starting point is 00:46:51 Built an interactive dynamic website feature that asks visitors a few short questions about their industry, career goals, and current usage. All right. That's fine. Six. Yes. I've had this idea. So I like this one.

Starting point is 00:47:05 Embed subtle Easter egg audio clips. Oh my gosh. I love this. I love this. This is actually one of my first ideas that I had like back in 2020 before I even launched this. I'm like, oh, I'd love it because I love this like Easter egg thing. And we're actually going to do this at some point.

Starting point is 00:47:20 So yeah, podcast Easter Egg scavenger hunt. So hiding subtle hints inside certain podcasts. You got to find them. That one's fun. Love that idea. And then last but not least, hyper-personalized social ads powered by AI insights. All right, pretty good. Nothing crazy here. So I have run this. I did do some of these tests last night. And last night, when I turned on the search mode, it did a little bit better of a job. So here's the thing. Generative AI, large language models, unless I tell it in the prompt to explicitly go research on the web, even if I click that search button, sometimes it will, sometimes it won't.

Starting point is 00:47:57 Right. So, you know, I'm just curious. I'll probably just run that one one more time because I'm actually just curious. And I'm going to say use chat, GPT search before you start to better understand everyday AI by Jordan Wilson. Yeah, because I ran this exact same prompt last night. And in this version that I just did live for you guys. Like the whole point is like, oh, watch when I click search, right?

Starting point is 00:48:27 It didn't search. Sometimes it does. Sometimes it doesn't. That's just how large language models work, right? Unless you explicitly tell it to. And when you explicitly tell it to search and you have that search icon, 95% of the time, it actually will. But I was actually a little bit surprised.

Starting point is 00:48:42 All right. So I'm going to let that run. And then we're going to do, we're going to do just our last one here. All right. And then I'm going to read this one. And then we're going to check in on the second attempt. So this last one is create a new company and brand for a future smart home device. This will solve a problem that does currently not exist.

Starting point is 00:49:04 I like this one. To start, come up with the company's name and its first flagship product. Give the product a name, brand, and campaign. Go to market strategy, tagline, and rationale for why it will work. And then I said, respond in a succinct way, keeping responses to short bullet points, but with ultra-specific facts. All right. So now I'm going to click rewind and look at that same come up with, you know,

Starting point is 00:49:32 inventive ways to grow the everyday AI podcast. But this time, even though I had the search button clicked, I had to explicitly tell it, yo, go to the web. Go to the web, homie. And now I see here in the responses it actually did this time because now it's citing things. So, yeah, last night when I ran this, it actually gave me some citations within the actual answers. So in this one, it just did it at the end. So again, generative AI is generative, right? Especially if you're just doing these copy and paste prompts, which I never recommend,

Starting point is 00:50:02 but for live demos, that's the best way to do it. All right? Because I can't sit here and go through a whole prime prompt polish to get the most out of this, right? But you'll see even just being a little more explicit in telling it, yo, go search the web, even though I clicked that search button, it didn't do it the first time. All right, let's look at our last one. And then we're going to wrap this show up, y'all. All right. So pretty good. chain of thought here. It only thought for 17 seconds, which isn't a lot. And we'll see if it actually used the web. I had the search button clicked, but I didn't explicitly tell it to. So maybe it did. Maybe it didn't.

Starting point is 00:50:34 One other thing that it doesn't do, and I wish it did. When you can see the summarized version of chain of thought, I wish that it would show you if it did go to any websites and if it is using that to think, right? Because all you get is citations in the response. I wish you could see like you can in the deep research. Because in deep research, there's an activity tab. So you can see, oh, we went to website one. And now on website one, it found out this. And then it pivoted and it looked at something else.

Starting point is 00:51:03 So I wish we got a little bit of that in 03 mini when you tell it to use the internet, but you don't. All right, let's just see the responses for this innovative smart device that solves a problem that doesn't exist. So the company name is Xenovate Smart Living. And its mission statement is to create craft intelligent, adapt. living spaces that optimize mental well-being and productivity in a hyper-connected future. So here's what it does. It is a smart home hub that gathers biometric data such as EEG, HRV, which I think is heart rate value via integrated sensors and wearables to continuously gauge user stress, focus, and fatigue levels. It dynamically adjusts ambient lighting, temperature,

Starting point is 00:51:44 acoustics, and even scent diffusion to create a personalized cognitive sanctuary. Okay. I mean, If I was like Tony Stark Rich, I would just pay to develop this. This sounds pretty cool. So, okay, 03 mini high. Pretty good job on that. That's open-ended. There's no right or wrong answer. I've run this on, you know, all different models.

Starting point is 00:52:03 And this is probably one of the better responses I've got. Normally, it's just kind of boring stuff. And you'll see in this one here, right? It did, oops, I got to go down. So it did also, oh, got a little confused here. Let me see. Okay, interesting because it's actually a, now kind of melding. Wait, is that right? Hold up. Yeah. So it's bringing in some everyday AI

Starting point is 00:52:33 aspects into this neuro haven, which it shouldn't have, right? But that's kind of why you have always use these properly, right? Generally, I would start a new chat. I would go through kind of quote unquote train it, take it through our prime, prompt polish, go through Refined Q, so it doesn't pull in information from the rest of the chat. But so what do you think, y'all? Are you impressed with 03 mini? Let me just say this. Benchmarks, outstanding. Even the free model.

Starting point is 00:53:04 So I will say yes, right now, but this could change next week. Right now, it is the best free chatbot model in the world. Although, like I said, I think probably everyone out there, every single business, every single business should be paying for either a team's account or an enterprise account for whatever large language model environment you want to work. with, whether that's Microsoft 365 copilot, which I highly recommend, chat GPT Enterprise, you know, Google Gemini for Workspace, Claude Enterprise, sure. Yeah, yeah, if you're fine, not having access to out there day information, sure, right? But you should always, always, always

Starting point is 00:53:43 be paying for a team enterprise subscription in the same way that your employees need like, you know, Microsoft Word or they need, you know, Word docs, they need certain software, right, that cost money, your team needs a paid account. So let me get that out of the way. I'm not telling you to not pay for this. But even for a free plan, I am excited because here's what this means. A year ago, I said, don't touch Jet GPT's free plan. It is absolutely terrible.

Starting point is 00:54:14 It is riddled with hallucinations because you were using the 3.5 version, which is bad. It wasn't connected to the internet. So a lot of what was ultimately shared online, which is just a lot of what was ultimately shared online, was just bad stuff, right? Because people that didn't know AI, they would just go in, create a free account, do a prompter to not knowing how large language models work, not understanding generative AI. They'd get a response that was absolutely horrible. They'd post that online or take that back to their, you know, director or they're bored. And they're like, look, AI is not for us. Well, sorry, that was dumb if you did that. I don't know, y'all. 2025, I'm a little spicier.

Starting point is 00:54:54 I'm a little more tired. I'm a little older. I'm not going to be nice anymore, right? I'm tired of people not knowing how to use AI. And then you go through and you get a bad output and you share it on social media. And you're like, oh, yeah, I will never take my job. And I'm like, yeah, it will. It 100% will because all you did is you just went out there and said, hey, I don't know how to use AI.

Starting point is 00:55:15 Right. A kind of funny comparison that I made to this. This is like if I, right, I'm going to do something live here. sorry if you're on the treadmill and you want to, and you want to end this, right? But this has to do with free chat chit, I swear, right? So this is actually, you can't see this because of the green screen thing, apparently.

Starting point is 00:55:35 Let's see, can you see this one? There we go. So this is like me if I draw something, right? Can you guys see what I drew here? Live stream audience? Can you see this? I'm making a point, I swear. All right.

Starting point is 00:55:49 So this is, if I posted this online, and said art sucks. Look at this. Art sucks. There's no room in the business world for anything artistic. Because look at this, right? I drew a picture of a stick figure. Art sucks.

Starting point is 00:56:10 Right? No, art doesn't suck. I suck at art. Right? There's definitely a place for art in the world. So that's what I think the old free version of Chat-GPT did for, like, the business. business world. It was a bunch of people that had no clue what they were doing. They would go on, use a bad version of GPT, GPT, DPD 3.5 that wasn't connected to the internet. Because when everyone's

Starting point is 00:56:35 trying to figure AI out, they're not always paying for the best models, right? And they're like, look, this is, this is bad, it's generic, it's full of hallucinations. AI stinks. No, you stink. You stink. But now, hopefully in 2025 and beyond, we'll avoid that because now, I think OpenAIs 03 Mini is the best free AI model in the world. And it has now closed the gap. Yes, albeit on a very limited basis because you can't use a ton of messages, right? But it's at least close the gap between what the rest of the world can access and get a taste of and what those that are paying for the best model have as well.

Starting point is 00:57:10 All right. I hope that was helpful, y'all. If so, remember, go check out our AI predictions series. It's all online. not recommend that enough. And I'm going to continue to demand you go listen to that. Because even the things I was talking about two weeks ago, I've already started to come true, obviously.

Starting point is 00:57:29 And if this was helpful, right, the combination of having a reasoning model that can search the internet when you prompt it to, it is mind-boggling, mind-bogglingly good. It is, I think, and if you didn't go share the deep research episode, you missed out because that guide was fantastic. but I do have 20 business use cases that are ready to go. You got to read it. You got to update some placeholders. You got to think, right?

Starting point is 00:57:59 But when you combine the 03 mini reasoning model with search, this changes what's possible. All right. So go repost this show. If you're listening on the podcast, I always leave the link to go repost this show if you want to. I'd appreciate that. I'd appreciate you also go to your everyday AI.com. Sign up for the free daily newsletter.

Starting point is 00:58:17 Thanks for tuning in. Hope to see you back tomorrow. and every day for more everyday AI. Thanks y'all. Meet Firefly AI Assistant. Now live in Adobe Firefly, the Allman One Creative AI Studio. Just describe what you want to create in your own words and the assistant handles the rest, orchestrating multi-step workflows across Adobe Creative Cloud apps,

Starting point is 00:58:43 including Photoshop, Premiere Express, and more in one conversational interface. You direct the outcome while the assistant accelerates execution. Stand control with the ability to step in and refine at any time. See it today at firefly.adobie.com. And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going.

Starting point is 00:59:15 For a little more AI magic, visit Your EverydayAI.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.

Everyday AI Podcast – An AI and ChatGPT Podcast - EP 456: OpenAI’s o3-Mini - The world’s best free chatbot model?

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.