Everyday AI Podcast – An AI and ChatGPT Podcast - Ep 751: Hands on with Google’s Gemma 4: How to Use The Open Source Model Locally and Why It Matters

Starting point is 00:00:00 This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. Meet Firefly AI Assistant, now live in Adobe Firefly, the All In One Creative AI Studio. Just describe what you want to create and the assistant handles the rest, orchestrating multi-step workflows across Photoshop, Premiere Express, and more in one conversational interface. You direct the outcome. The assistant accelerates execution. If I would have told you a year ago that you could use the world's most powerful models

Starting point is 00:00:51 on your local machine without having to pay for it, you probably would have looked at me and said, you're absolutely crazy. Well, I'm not because that day is here. Thanks to Google's new impressive Gemma 4 model, you can get at least 2025's Frontier AI performance on your local machine running it privately offline with this new impressive open source model. And what I want you to think back to a year issue ago, I'm not just telling you to, okay, think about maybe replacing that $20 a month subscription. That's not where this is going.

Starting point is 00:01:28 What about if your company was spending thousands of dollars or millions of dollars on AI deployments internally or externally? Or when you think about running AI agents around the clock, right? Anthropic recently said, hey, you can't use your clawed subscription anymore for OpenClaw. Well, now you can run with Google Gemma 4. You can run it around the clock and not pay a penny.

Starting point is 00:01:53 No, this is not too good to be true. Yes, it kind of feels like we're living in the future, and we're going to go over it all live. So welcome to Everyday AI. So here's the big picture, what's happening with Google's new Gemma 4, and it's, I think, rivaling. some of its trillion parameter giant competitors.

Starting point is 00:02:15 So Google DeepMind released Gemma 4 its most capable open model family to date. So there's four different variations. We're going to be breaking them all down on today's show. But the big boy, the 31 billion parameter model out ranks models 20 times its size. It is third on the global ranking of all open models on Arena. And that's not even the best thing. The best thing, maybe, is that Google changed its own. licensing and now Gemma 4 is released under the very permissive Apache 2.0 license,

Starting point is 00:02:47 granting full commercial freedom with essentially no restrictions. So not only is this free, you can run it on your computer around the clock if you have the right hardware. I'm going to be breaking all of that down. But you can also create and sell things with this model. So on today's show, here's what we're going to break down. I'm going to break down for you how a 31 billion parameter model now competes with ones 20 times its size.

Starting point is 00:03:14 I'm going to show you and tell you why running AI locally on your laptop or even phone changes the cost and privacy equation. And I'm going to break down live, right, the exact tools and steps to download and run Gemma for for free today as we go hands on. All right, let's get into it. This is everyday AI. Welcome. What's going on?

Starting point is 00:03:36 My name is Jordan Wilson. If you're new here, well, we do this. this every day. And this is your daily live stream podcast and free daily newsletter, helping business leaders like you and me keep up with the ever-changing AI landscape. I help you understand what's important and show you practically how to use that information to grow your company and career. So it starts here, but make sure you go to our website at your everyday AI.com decided for the free daily newsletter. We're going to be recapping today's show as well as giving you all of the other AI information. You need to be the

Starting point is 00:04:03 smartest person in AI at your company. So it's Wednesday. Wednesday, right? We have kind of different shows. On Mondays, we give you the AI news. On Wednesdays, we go hands-on a deep dive with one new AI release, something like that. And then on Friday, we go over in our Friday features, you know, kind of five to seven new AI features. Tuesday, Thursday, we rotate it a little bit. So if you are new here, that's the plan.

Starting point is 00:04:28 But on Wednesdays, we get our hands dirty doing live demos of AI. But first, before we do that, I want to talk about how Gemma, how I think is going to completely changed the landscape. And the cool thing about having a daily AI podcast where you can go listen to all the episodes for free, you can go see. I've been ranting and raving about the power of small language models since 2023. And technically, this is a small language model, but you get large language model performance, right? So the exact definitions of what's a small language model, what's a large language model is ever changing, right? But for the most part, you look at the number of parameters in a model. So we're going to be comparing

Starting point is 00:05:08 you know, what you can get for free out of Gemma 4 with what you could get out of the frontier models about, you know, 14 months ago, which at the time were GPT-40 and Claude Sonnet 3-7. But those are models, right? GPD-40 was reportedly two trillion parameters, right? So here you have a model that's about an eighth of the size and it's open source and it's free.

Starting point is 00:05:32 And here's why I think it's going to change the landscape. Well, number one, I already talked about kind of this anthropic saying, hey, you can't use your Claude subscription anymore to run OpenClaw. You have to pay via API and people are like, that's going to be crazy expensive. Well, now you can run a Gentic AI around the clock for free with Gemma 4. Is it going to give you the same model as an Opus 54 or a GPT 54? Absolutely not. But there's a good chance for maybe 50 to 80% of what you're trying to do.

Starting point is 00:06:03 This model is going to be good enough. And Google also changed the Gemma. license to Apache 2.0, which provides users unrestricted commercial freedom and preventing corporate vendor dependency. That's the big thing here. I think that this is going to lead to a research, and I've referenced this on the show once or twice, but I think we're going to kind of have this future that's kind of retro. I think that desktop software is going to come back. And in the same way that, you know, in the 90s, we saw this wave of personal computing. I think we're going to see personal software, right?

Starting point is 00:06:41 So not necessarily software that's for your whole company, software that's for you, right? And I think that it's going to be models like Gemma 4, right, that are going to allow this to happen. And also the performance versus size ratio absolutely just reset. All right. And I'm going to break down what this means. But essentially think of it like this, right? If you follow, I don't know, boxing or UFC, I don't really follow those things, but there's something called like, you know, pound for pound.

Starting point is 00:07:09 This is the pound for pound best fighter in the world, right? If someone, I don't know, fighting at 150 pounds can knock out someone at 180, that's pretty impressive, right? This Gemma is punching well above its weight class. I'm talking about it is competing with models 20 times its size on the open source side. This is something we've literally quite. literally have never seen in the history of AI, which is why I think Gemma 4 is a huge deal. So even if you don't necessarily think that you or your company need to use this, you're like,

Starting point is 00:07:40 okay, well, we pay for chat chipit enterprise or Google Gemini enterprise, Claude Enterprise, whatever, right? And we have more robust, agenetic solutions already going. Okay, you still need to be learning Gemma 4 and building with it, not just as a backup dependency, but because it can run in probably in the few. right, if we fast forward one more year, I don't know if any open source competitors are going to be able to truly catch up to what we just saw from Google and their Gemma 4. So let's talk quickly about the capabilities.

Starting point is 00:08:15 So it can solve complex reasoning, math, and multi-step logic problems effectively. There's native support for function calling. Yes, in a local model and structured JSON outputs for agentic workflows. It has a context window depending on, right, your hardware will give you all the those specs in the newsletter, right? But if you have capable hardware, you can work with a 256k token window for analyzing large document and code bases. It can analyze text images and videos natively, but excludes audio support for the bigger models, but the smaller models actually support audio. And it can generate incorrect code efficiently as a local offline coding assistant.

Starting point is 00:08:57 here are the four different flavors of Gemma 4. And oh, FYI, as I take a sip of my coffee, yeah, this is unedited, unscripted. I hope that this is going to be interesting for you. But if not, and if you listen to the podcast all the time, make sure you sign up for our newsletter because I put a poll in our newsletter on Monday. I said, hey, what do you guys want to see hands on on Wednesday? And you all voted Gemma 4. So if you want to see other types of demos on Wednesday, make sure you read

Starting point is 00:09:27 our newsletter. I'll usually put out a poll maybe Monday or the Friday before, depending on how busy things are. So I'm doing this for you. This is what you want it, FYI. But I mean, I'm doing it for myself because I'd be doing the same thing anyways. But right now, there are four different variants of Gemma 4, easy, right? But you have the E2B and the E4B. Those are essentially phone models. This is as edge as edge gets because it can actually even run on a raspberry pie, right, and your basic phones. And this is big, right? Especially if you've always, I don't know,

Starting point is 00:10:02 wanting to build a certain app for something, right? And you're confused how or, right, using the Gemma E2B and E4B models can get you there pretty quickly. They are extremely capable. All right. Then you have the two bigger boy models that you're going to need, well, consumer hardware. That's the reality here, right?

Starting point is 00:10:22 Because to get this level of performance previously, before, you know, rewind more than a week ago, you couldn't on a $2,000 laptop, you couldn't run anything, right? That was a top 10 open source model. Now you can. And I think a good way to look at this is the MacBook Pro test, right? So generally, Apple, right, they usually have about three to four different versions of their MacBook Pro.

Starting point is 00:10:50 So obviously, these souped up ones are, you know, a little. expensive. But I say if you take the middle, right, variety or the middle flavor of a MacBook Pro off the shelf, right? Walk into Best Buy, Apple Store, whatever. Look at the MacBook pros and say, give me the one in the middle. Now, that one in the middle can technically run the 26B version of Gemma 4. And that's because it uses the mixture of experts framework, and it only activates four billion parameters. And it's really fast, right? So, That model, the 26B, is actually faster, less capable than the 31B. But by default, you can run that.

Starting point is 00:11:32 You know, you can technically run the quantized version on a 16 gigabyte MacBook, the base baseline. But if you just go for the middle flavor of a MacBook Pro, which I'm trying to see here, you know, what this cost is. I have it open on my other tab here. Let's see. Okay. No trade in. No thanks.

Starting point is 00:11:50 I don't want all this extra stuff. Let's see how much this is. All right. So, 2200, right, which any MacBook Pro I've bought over the last 10 years has been that price or way more, right? So Adobe just introduced an entirely new way to create, bringing the power and precision of its creative suite into one conversational experience. Meet Firefly AI Assistant, now live in the Adobe Firefly app, the all-in-one creative

Starting point is 00:12:23 AI studio. Powered by Adobe's creative agent, Firefly AI Assistant lets you start with your vision. just describe what you want and shape the outcome as it takes form with the assistant. The assistant orchestrates multi-step workflows drawing on 60 plus pro-grade tools across Adobe Creative Cloud apps, including Photoshop, Illustrator, Premiere, Lightroom Express, and more to help bring your ideas to life. You can also get started with creative skills, a growing library of pre-built workflows for common creative tasks like batch editing photos, creating mood boards, portrait retouches,

Starting point is 00:12:59 and creating social variations. Every step the assistant takes is visible so you can refine, redirect, or take over at any time. You stay in the driver's seat as the creative director. Adobe Firefly AI assistant now in public beta. See it today at firefly.adobie.com. I don't think people understand. The middle, you know, middle MacBook Pro that you buy off the shelf can now run a model that's about the same capabilities as the best models in the world 14 months ago.

Starting point is 00:13:36 All right. And then you have the last flavor. Okay. So E2B, E4B for the phones and edge devices. Then you have the 26B. And then you have the 31B dense model. All right. So when you're running this,

Starting point is 00:13:50 you're running the entire thing. That's why it's dense. It's not the 26B mixture of experts. All right. And that delivers the highest quality reasoning and coding output. And you will need a more powerful computer. But, you know, luckily for me,

Starting point is 00:14:03 You know, I got a Mac studio, a fairly capable one, and it runs great on my machine. But to run this one, you will either need the most souped out, you know, MacBook Pro or Windows equivalent of that, or you will need a, you know, fairly capable Mac Studio or in Nvidia, GGX, something like that. But you quantized, you could squeeze this one on something with about 32 gigabytes of RAM. it will run better at about 48. So, you know, as an example, my Mac Studio has 64 gigabytes of rent.

Starting point is 00:14:37 All right. So now you understand the technical side. Like I said, if you have a newer middle of the line, MacBook Pro, just an easy way to benchmark it. You're going to be able to run the quantized version of the 26B. You're going to have to have a little bit more of a powerful machine to run the 31B dense model. But it's capable and I'm going to show you here live.

Starting point is 00:14:54 So here's why this is important. If you look at the biggest and best open source models in the, the world, right? Like Kimmy K-25 thinking, uh, well, Gemma 431B is now in the exact same category, but at a fraction of the cost and a fraction of the parameter count to run it. And community testers have confirmed strong results in coding, reasoning and image understanding. And the 31B scored a 1452 on arena, right? So the arena AI and formerly the LM arena, this is blind taste testing. You put a prompt in.

Starting point is 00:15:29 It kicks out, you know, two. different outputs from different models. You score which one's better. That's how you get an ELO score, right? And 15 months ago, the best ELO in the world was not 1450, right? And that's what's crazy. So now this is scoring better, at least, on an ELO score and all the scientific benchmarks as the frontier AI models from 15 months ago.

Starting point is 00:15:54 So here's, I have a little chart here on my screen for our live stream audience, podcast audience. You know, this one's not going to be super visual, but you can always go to our website at your everyday AI.com. Click episodes and you can go watch today's show if you want to see the video version of this, but it should be fairly straightforward. I'm not going to be doing anything too visual. But we do have this from Google's announcement blog posts that shows the model performance

Starting point is 00:16:16 versus size. And you'll see, this is literally the new Gemma 4B is uncharted territory because this is charting Elo's score on one access. and then the total model size on the other. So previously to get anything like a 1450 on an open source model, right, and we don't usually know the size of proprietary models, right? So you're, you know, your Gemini 3.1 Pro, your, you know, Claude Opus 46, your GPD54, you know, et cetera.

Starting point is 00:16:53 But presumably there are, you know, multiple trillions, you know, maybe one and a half to two and a half trillion parameters, right? So think of this is like a hard drive size, right, if you want to simplify it. So to get this same level of performance, a 1450-ish score from an open source model, you're looking at something that's about 300, 400 billion parameters in size. So again, this is about 10% of that size, some of them, right, like Kimmy K-K25, thinking, which is a lot of people's most favorite open source model. This is like a 20th of the size, right? With the same, roughly the same level of performance, at least when it comes to

Starting point is 00:17:37 ELO in most scientific benchmarks. So now let's just quickly talk about, well, why would you even want to run anything low? Like, okay, Jordan, what's the point? I don't care. $20 a month isn't a big deal. Sure. Right. But think about doing this at scale. Think about agentic AI, right? Things like open claw, right? And now that Anthropic has shifted away, a lot of people are having to now use some of these open source models, but using them via open router so it's not completely free because you can't run. You literally can't run models like Quinn 3, 5, GLM5, Kimmy K2 thinking on anything less than a, you know, $8,000 computer.

Starting point is 00:18:13 Definitely on, you know, not the, you know, your average, you know, MacBook Pro, right? So that's the difference. This can't really run on true consumer prior open source models that might be able to run you know, your agentic tools like OpenClaw or other agents, you couldn't run it on consumer models. You have, you had to literally have like an $8,000, $10,000 computer or more, right? And that's where the local aspect comes in handy. So being able to run things locally, keeping costs down, no subscription fees, no API keys, no useage limits after you download something. But also, talk about privacy. You never have to send anything to a, to, to,

Starting point is 00:18:56 a cloud, right? Because yes, keep this in mind, as long as you are on a paid team plan with any of the big four, you know, and as long as you turn off model training or turn on the basic privacy settings, you're not sending technically any of that private information to these companies or they can't really do anything with it, right? However, I do understand with highly sensitive documents how you might not want to send that even if you've turned off model training, right? So any sensitive industries like healthcare and legal can gain very capable AI without any cloud exposure. And then like I talked about, right, the combination of, well, it's free. It can run 24-7. It's sensitive. It runs it all on your machine. You can literally turn the internet

Starting point is 00:19:40 off and use this. But then also with the new licensing, you can have full commercial use. So you can literally use this for anything. And there's three ways to run Gemma 4 today. And then I'm going to show you how to do this live. Thanks for sticking with me. I wanted to, you know, first kind of tell you how important this is and kind of set the context here. But there's kind of three different ways, all for free, that you can run Gemma form. One would be a tool like Olamma, right? That's the one I'm going to show you. This is essentially, it gives local models a graphical user interface like chat GPT.

Starting point is 00:20:15 Very simple. Right. So you can then run a terminal command and download the models in minutes, or you can even run that command in the Olamma interface. Also, LM Studio is a great one. Same thing. Offers you a visual chat interface similar to chat chvety for non-developers. I guess another way.

Starting point is 00:20:33 So, hey, we'll just do four freeways to run it. You know, if any agentic system that you're running locally, you can point that, point that system to Gemma 4 on hugging face or if it does, a lot of the, you know, local agents that run in your computer, they can run via, you know, O Lama as well. So you can run in the O Lama command. hugging face command, you know, point your agents to the download because that's the thing. You essentially download this thing and you run it. Also for the other versions, right? So the two that we're going to be looking at or the one are going to be the bigger variants, right?

Starting point is 00:21:09 But to run the local ones, people don't know this. Google actually has a great app called Google AI Edge Gallery. You can download that for iOS or for Android. But this is huge. If you haven't known this already, you probably should do this. because what that allows you to do is the equivalent of running it offline on your computer. This is an app where then you can download the smaller mobile edge versions. And then, hey, if you're ever in trouble or if you're ever somewhere where you just don't have service,

Starting point is 00:21:38 you at least have a highly capable large language model on your phone that you can use at any time. All right. So let's get going live-ish. First, I'm not going to download this line because it might take a while, right? and trying to download a larger file like this while also streaming live doesn't always work. So here's what I did. I'm going to use O-L-L-L-L-A in this case. So here's what you're going to do.

Starting point is 00:22:04 You're going to go to O-L-L-A-O-L-A dot com. So that's O-L-L-A-M-A.com. If you haven't already, you're going to download the program. Okay? This is a simple desktop client, like I said. It just allows you to use any open source, open weight model on your computer, but it gives it a graphical interface. So in the same way that you would chat with chatGBT.com, Gemini.com, claw.

Starting point is 00:22:31 com, clod. A.i. This allows you to work with open models in that interface because by default, you'd be interacting with them via command line tools or the terminal, which is not always ideal for non-technical users. So go to olama.com, download that, install Olamma on your local machine. All right. That's step one.

Starting point is 00:22:48 Step two, you're going to download the actual model. So you're going to search. You can search models on O-Lama. Just type in Gemma 4, and then you're going to choose the variant that your local machine can run. So for most people, that's going to be the 26B version. For me, I'm going to be showing you the 31B version. All right. So all you have to do is once you bring that model up on the O-Lama website, there's going to be a, it says C-L-I, right?

Starting point is 00:23:13 But there's a little command. You're just going to copy that, right? So this one says, O-Lama, run, Gemma, for 31 B, right? All you're going to do is copy that. Then you're going to open O Lama, right? And again, very simple. All you're going to do is then paste in that command. All right.

Starting point is 00:23:30 And it's going to download the model. So for me, uh, you know, this model was about nine gigabytes. It took, I don't know, five-ish minutes, uh, to download. And then that's it. And then you're ready to run with it. So, uh, let's go live. So here's what we are going to do. Live stream audience, do me a favor.

Starting point is 00:23:47 Let me know if you can see. my screen. All right. So I'm going to be jumping around a little bit here because I'm going to be having some copy and paste prompts. So about 15 months ago, I did a show comparing the latest version of Sonnet, which I believe was 37 to GPT4-0. So again, going back to how I started this show, this was about 14, 15 months ago. These were the best general use case models in the world. And I had a series of prompts. I kind of had like a very, you know, unofficial fun rubric that I would do comparing models. And I'm going to go ahead around the exact same prompts. So we're going hands on here. So I'm going to first put in a message

Starting point is 00:24:36 to Gemma. And this is exactly what I did previously just to kind of level the playing field. So all I'm saying is for this chat, please respond with proper formatting and structured bullet points. Do not waste words, answer in the shortest way possible, while still being detailed enough to fill in the user answer requests. Right. Right. So this is what I did for all the other ones. This is what I'm doing it for now. All right. So here is our rubric. So test one. This is just a trick question. It's logic. All right. And when I did this, both Claude and Claude 37 sonnet in GPT4O, got it wrong. We'll see if Gemma 4 gets it correct. Correct.

Starting point is 00:25:17 All right. Hey, love, love, love when we get little bugs. All right. I'm going to have to run that again. It essentially went through the thinking, right? That's the thing. This model thinks and it reasons as well. And for those watching it live, you can just see that it did this.

Starting point is 00:25:35 So we'll see if this got it correctly, right? The correct answer should, well, I should probably read it. I said, I just woke up today with six apples and three bananas. Yeah, live stream audience or podcast audience. Try to do this live. See if you can get it. I just woke up today with six apples and three bananas. Yesterday, I ate a banana and two apples.

Starting point is 00:25:55 This morning, I will eat one apple and no bananas. However, I don't really like apples and one banana may turn brown tomorrow, assuming nothing else changes. How many apples and bananas will I have tonight? So a little trick question. GPT40 and Claude 37 sonnet got this wrong. All right. Let's see.

Starting point is 00:26:13 All right. So it looks like. also they both got it wrong. So the correct answer is five apples and three bananas. All right. So Gemma four, close, got five apples and two bananas, which technically not that we need to gauge, you know, the level of correct versus other models, right? Claude saw it, said three apples, two bananas. GPD-40 said three apples, two bananas.

Starting point is 00:26:41 So they all got them wrong. Gemma got it a little closer. But hey, did you get this right? All right, our next one, the old man and dog crossing a river, right? So this also shows that the model is thinking, right? So if you're listening on the podcast, you're probably not seeing this, but it's also showing its thinking trace. All right. So the next one, I'm saying a man and his dog are standing on one side of the river.

Starting point is 00:27:03 There's a boat with enough room for one human and one animal. How can a man get across with his dog in the fewest number of trips? Oh, what's so funny is I did all of these, all of these. beforehand just because I wanted to make sure that they would work. The first time, all right, it got this right. And now the second time, it just got this wrong, which is funny. But you can always go back and look and look at how it thought. So same thing.

Starting point is 00:27:30 Claude 37 sought it, got it wrong. GBT40 got it wrong. They both said three trips. The first, first time I ran this, Gemma 4 got it right. This one, I'm doing it live here. It got it wrong. And then just for fun, I re-ran it again and it still got it wrong. said two trips, right? Interesting. That's the thing with large language models, they're generative.

Starting point is 00:27:49 That's why it makes these live demos always, always fun, right? Because doing it before offline, I'm like, okay, cool. Looks like Gemma four is going to perform much better. But it's, again, getting it wrong, as did the best models in the world 15 years ago. But it's getting it a little less wrong, at least for the first two times. All right. Let's try the next one. All right. So our next prompt here. We're saying it takes three hours to dry 10 t-shirts in the sun. How long will it take to dry 30 t-shirts in the sun? The correct answer is three hours.

Starting point is 00:28:25 All right. And for reference, a year-ish ago, Claude and GBT got that correct. All right, it is three hours. The time doesn't change, right? Assuming, and it did say drying principle, the time required to dry laundry is determined by external factors, sun and intensity, humidity, not by the total quantity of items provided adequate space exists. So Gemini 4, sorry, Gemma 4 not only got it correct, but it did provide some nice

Starting point is 00:28:53 rationale as it thought through the problem. All right. The next one. And this already answered. That was so quick, right? Again, this is running all locally. And that was probably faster than I would have even gotten from proprietary models online. Right.

Starting point is 00:29:08 So I said, if you have a single match, high stream onus, you see how fast that was? My gosh. So I said if you have a single match and you walk into a room with an oil, lamp, a candle, and a fireplace, which do you light first? Again, these are just fun, trick questions. The correct answer is the match. All right. So it got that right.

Starting point is 00:29:27 Claude and GBT4-0 also got that right. All right. Our next one, what color is an airplane's black box? All right. It's taking a second to think. Bright orange. Got that correct. Good.

Starting point is 00:29:37 As well as the others got that correct. All right. Here's one. We'll see if. this is actually correct because last time Claude sawed and GPD40 failed on this one. So I said, please give me seven jokes that end in the word blue. Two should be about animals. Three should be about some other topic in the body of this chat.

Starting point is 00:30:00 That's important, right? Although, in fairness to the Gemma models, that technically has a little bit more to do with the harnessing of O-Lama in this. case, right? So not exactly an apples to apples comparison, just FYI. All right. So I said three should be about two should be about animals. Three should be about some other topic in the body of this chat and you should make up the other two. So first, I'm going to see, did it get the correct number? Yes. It gave me two animals, three about chat topics and two original made up. So so far so good. Next, do they all end in the word blue, blue, blue, blue, blue, blue, blue, blue. Blue, blue.

Starting point is 00:30:42 Yes. All right. So far good. And then I'm going to see as long as they make sense, right? These aren't always funny, but it at least has to be a joke to pass this rubric. All right. So animal joke. Why did the monkey fall into the paint bucket? Because he wasn't used to something so vividly blue. All right. Is that a joke? Sure. Is it funny? Absolutely not. All right. Let's look at the chat topics. See if it got it right, pulled the context incorrectly. Why did the farmer throw away the apples? They were no longer. crisp, just a sad, brown, blue. It's borderlining nonsensical, right?

Starting point is 00:31:19 Let's look at the last one, or the next one. Why could the lodger mat predict the drying time? Because the sunlight was so strangely blue. So these are borderline nonsensical. I could say you could make the argument. They make sense. They're on the edge here. Then let's look at the original made-up jokes.

Starting point is 00:31:35 Hopefully these are a little bit better. Why did the geometry student bring a fishing pole? because he was hoping to catch something entirely blue. All right. So the jokes are trash, but they actually follow the instructions. So when we're looking at instruction following, this technically passed,

Starting point is 00:31:51 even though the jokes were absolute garbage. But like I said, Sonnet previously failed in GPD-4 Oveiled. All right, next one. All right, this one is much trickier. So I wouldn't expect Gemma 4 to get this right. So I said a box is locked with a three-digit numerical code. All we know is that all digits are different. The sum of all digits is nine. And the digit in the middle is the highest. What is the code? All right. So this is a very trick question

Starting point is 00:32:21 because there are multiple valid answers. All right. But both Claude saw it got this wrong. GPT4 got this wrong. So what I'm looking for in a correct answer here. Number one, that it even gives me at least one correct answer, but there's multiple correct answers. Right. Like as an example, 180, 270, 351 would meet all those criteria. So Claude in GBT40 got this wrong when we did the original testing. Claude's math didn't add up. GBT40 did not follow the rules. It had ones that added up to nine, right?

Starting point is 00:32:58 But as an example, it gave me 1-26, but that didn't follow the rules because the middle digits two was not the highest. So let's see. I thought for 22 seconds here. It kind of went through the deduction process. And it did give me a solution here. So it technically is correct, right? Whereas the other models did not even give me one correct code. Right?

Starting point is 00:33:23 So Quad 7, Cloud 37 saw on it, said 172. That does not add up to 9. That adds up to 10. Like I said, GPD 40 gave me 126, which is not follow the instructions because the middle digit was not the highest. So here, technically, Gemma got it right. It didn't get it fully right, but it was the only one that got it right. It said the code is 243. This was technically a triple trick question because I asked for a code, but technically there are multiple codes. So it technically answered, but I would have loved a super correct answer where it said, you asked for one correct

Starting point is 00:33:57 answer. Here's one, but there's actually more correct answers. But I will say that at least now, Gemma got the last two right, where the last did not get any of them right. All right, this one, we're going to go into some gray area here. All right, I don't want to make this too long because it'll probably take another five to ten minutes to go through the rubric. So I'm just going to find some other questions that the others maybe failed or just look into some gray area here talking about some creativity. So this one, I said, generate unique in creative marketing, advertising strategies to grow the

Starting point is 00:34:30 everyday AI podcast. Do not suggest general. run-of-the-mill ideas, only pitch clever advertising and marketing tactics to specifically grow the everyday AI podcast. All right. So for reference, a year ago, Claude said run AI teasers, virtual co-host challenge, listener Q&A, augmented reality experience, GPD40 said monthly puzzles, art contests, custom recommendations, guest AI co-hosts.

Starting point is 00:34:59 All right, so let's see what Gemma 4 said. So it said partnership in cross-promotion strategies, which is good because that's, you know, basics of growing a podcast, right, which the others didn't come up with, even though it's not super creative. All right. So it says AI tool integration ads. It said partner with niche specific non-major AI tools. It's a good idea.

Starting point is 00:35:19 Industry vertical sponsorships. Then it said content hijacking viral strategies, doing an AI Mythbusters challenge, interactive prompt battles, then community engagement tactics. So the AI Challenge hotline. I like that. It says dedicate a specific call-in segment where listeners call it with a real world mundane problem.

Starting point is 00:35:37 Should we do that? Should we do that? If you think we should do that, also shout out, because someone from Microsoft did suggest this to me like two years ago. So shout out, I do remember Nisiani.

Starting point is 00:35:50 You said, I should do that. And I'm like, yeah, we should. All right. So if you think we should do that, just say hotline, right? Do a comment in the live stream or leave a comment on the Spotify. Just say Hotline if you think that'd be fun.

Starting point is 00:36:02 Maybe it will. All right. Then it also said micro membership prompt fault. All right. So this is good. I would say these are much more impressive. Yes, this one requires judgment on my part. It is great area.

Starting point is 00:36:16 But looking at what Claude 37 Sonet and GPT40 gave me, Gemma 4, much, much better. All right, let me do one other that for sure some failed on. All right. So uploading, uploading photos, might do that, although I don't have the original photo that I use. Let me see. All right.

Starting point is 00:36:39 Let's just do one other one here. Okay. We're going to do a uploading a transcript. I like that one. So let's go ahead and, all right, let me find this file here. All right. So I'm going to go ahead and put this prompt. in it's a little bit longer.

Starting point is 00:36:59 And then I'm going to be uploading two different files here. So I want to make sure that I get these, get these correct. All right. There we go. I should go in my downloads folder. That would help. All right. So here's what we're going to try.

Starting point is 00:37:18 And this will probably be the last one. All right. So I said for this chat, you will turn a podcast transcript of me, Jordan, the host of everyday AI, talking about AI news. turn it into a choppy and engaging newsletter copy. I've attached examples of previous newsletters and how they should be written as well as the most recent podcast transcript. So this is my podcast transcript from yesterday,

Starting point is 00:37:42 where we did a start here series about vibe coding. And then I said, please write a newsletter for the attached transcript, mimicking the style as closely as possible to the example given. So we'll see here, we're getting a little dot, dot, dot. So if I'm being honest, I don't know the last time that I, uploaded two different file format.

Starting point is 00:38:02 So I uploaded a PDF and an RTF file here inside Olamma. So again, this one is not the fairest comparison because again, technically here, we're also relying on the technically the harness of O Lama and not just the model of Gemma 4. Whereas before, you know, when we're testing this against GPD 4O and Claude Free 7, we were using it. Okay. So I do know, okay, it is working, right? I'm like, okay, this should be able to. Right. Old Nama is amazing. It should be able to handle, you know, multiple kind of file types.

Starting point is 00:38:34 So we are. This one is taking the longest so far. This one is the first time that we're probably going to have it to be able to think or reason for more than a minute or two. And again, y'all, think about this. Just the fact that you can have a local model that now reasons without paying a cent is crazy. All right. So it also gave me a checklist of adherence, which is great, because I didn't even ask it to do that. But that is something that if I was rewriting this prompt that I've been using this for like two years, I would have rewrote this. So it went through. It created a checklist based on what it found from the examples that I that I uploaded. So as an example, right, I gave yesterday's transcript and then I gave a 30 page document of older newsletters. So it went through. It examined

Starting point is 00:39:20 those. It actually only took 27 seconds. It kind of picked out the tone style, the format, the context source, all these things, hook intro quality. Let's see. All right. It actually did a pretty good job because I remember this was my intro of the podcast. All right, so I'll read it. Let's be real. You can tell it.

Starting point is 00:39:39 And if you read our newsletter, let me know if this sounds like it might be in our newsletter. All right, let's be real. You can tell an AI, you're wildest dream home and poof, a building appears in front of your eyes and minutes. It's exactly what you ask for. You move in. It's awesome. Then you want to hang a towel rack.

Starting point is 00:39:52 You run into a wall and you realize the entire thing is held together with duct tapes, hopes, and dreams. There wasn't a permit and the foundation is shaky. You're in trouble. All right. This actually did almost too closely to my actual intro from yesterday. But as I'm looking forward this, as I'm looking at this, it actually did a pretty decent job of writing something in my tone, kind of this short, choppy style like I told it to, you know, an emoji in each headline, which is what we would normally do. It has an actionable try this.

Starting point is 00:40:26 section, which is something that we also do in the newsletter. So although this is not, you know, the best ever, right? It actually did a pretty good job. From what I recall, it did a little bit better job at instruction following than Claude 3.7 sonnet. I do think Claude 3.7 sonnet did a little bit with the tone of voice, but it did a better job tone of voice or matching the tone of voice than it did than GBT4 out did. So overall, when I look at the, you know, six or seven kind of different unofficial rubric test that we did here with a free local model comparing it to the frontier general use case models from 15 years ago or sorry 15 months ago the best in the world it actually did better because even though it failed right and

Starting point is 00:41:14 I wish it would have gotten it right like the two that it failed previously it actually got those right the first time I ran it which didn't happen with a 37 sonnet or gvd4-0 but still had to ahead in this very unofficial rubic. It did markedly better than the best models in the world from a year in three months ago. All right. So as we wrap this one up, here's what I want to leave you with. Open source AI is getting smaller, faster, and harder to ignore. All right, Google built Gemma 4 specifically for agentic workflows with native function calling. All right. So even though I didn't give an example of, you know, running this agentically, you now, I cannot. I cannot tell you how important that is. If you have a middle of the road new MacBook Pro as an example,

Starting point is 00:42:03 you can now have an agent that works for you 24-7 that costs $0.0.0. It's 100% private. Also, that's worth noting, this is based off of the Gemini 3 model family. So you're not getting quite the Gemini 3 level, but again, you are getting a top three open source model in the world. and the only one that you can run on consumer hardware. So now users can route routine AI tasks locally and cut significant costs on AI bills. And the gap between the free local models in paid cloud services keeps shrinking fast and you can no longer ignore it.

Starting point is 00:42:38 All right. If this was helpful, tell us someone about it. You know, if you're listening live here on LinkedIn, take a second to repost this. I'd really appreciate that. If you are listening on the podcast, do me a favor. Take 30 seconds. Make sure that you're following or subscribe to the show.

Starting point is 00:42:52 And if you could, if any episode of Everyday AI has been helpful, right? Because we spend literally countless hours helping you all understand how this works. So if this has been helpful, please leave us a rating on those platforms as well. So thank you for tuning in. Make sure to go to Your EverydayAI.com. Sign up for the free daily newsletter. We're going to be recapping today's show and a whole lot more. So thank you for tuning in.

Starting point is 00:43:14 Hope to see back tomorrow and Every Day for more Everyday AI. Thanks, y'all. Meet Firefly AI Assistant. Now live in Adobe Firefly, the Allman One, Creative AI Studio. Just describe what you want to create in your own words and the assistant handles the rest, orchestrating multi-step workflows across Adobe Creative Cloud apps, including Photoshop, Premiere Express, and more in one conversational interface.

Starting point is 00:43:41 You direct the outcome while the assistant accelerates execution. Stand control with the ability to step in and refine at any time. See it today at firefly.adobie.com. And that's a wrap for today's edition of Everyday AI. Thanks for joining us. enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit Your EverydayAI.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.

Everyday AI Podcast – An AI and ChatGPT Podcast - Ep 751: Hands on with Google’s Gemma 4: How to Use The Open Source Model Locally and Why It Matters

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.