Limitless Podcast - Google Just Made Their AI Free, Private, and Yours (Gemma 4)

Episode Date: April 9, 2026

Google’s groundbreaking AI model, Gemma 4, lowers the cost of generative AI to around $80, allowing users to run it offline on devices like the Raspberry Pi. We explore its advanced features, such as object recognition in video, and discuss how local model operation democratizes access while enhancing privacy. How does Gemma 4 compare to top models like Claude and ChatGPT? With its multimodal capabilities and effectiveness on low-spec devices, honestly, it keeps up.

------

🌌 LIMITLESS HQ ⬇️

NEWSLETTER: https://limitlessft.substack.com/
FOLLOW ON X: https://x.com/LimitlessFT
SPOTIFY: https://open.spotify.com/show/5oV29YUL8AzzwXkxEXlRMQ
APPLE: https://podcasts.apple.com/us/podcast/limitless-podcast/id1813210890
RSS FEED: https://limitlessft.substack.com/

------

TIMESTAMPS

0:00 Gemma 4
3:12 OpenClaw vs. Gemma 4
5:49 Survival AI
7:21 Jailbreaking
8:03 Smartphone
9:13 Model Specs
14:54 Cost Efficiency
16:33 Open Source AI
19:14 Local AI Models
20:34 Google's Master Plan

------

RESOURCES

Josh: https://x.com/JoshKale
Ejaaz: https://x.com/cryptopunk7213

------

Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures

Transcript
Starting point is 00:00:00 How much money are you paying to use your AI model? Maybe it's $20 a month. Maybe you're on the pro plan for $200 a month. Maybe you're running an OpenClaw instance and you're paying thousands of dollars a month to generate tokens from frontier models. Google has just released a solution to your problem, something that can be had for as little as an $80 one-time fee, the cost of a Raspberry Pi. Because that's what this new model runs on.
Starting point is 00:00:21 Gemma 4 is a new model from Google that is a hyper-quantized, very small model meant to run locally on devices like your phone, your laptop, or even your new MacBook Neo. It's very lightweight, and it's built to work offline, entirely private. And I think the thing that's most noteworthy is how powerful it is. This model, which is small enough to fit on your phone and run entirely for free, is just as good, if not better, than some of the frontier models from last year, and is even close to performing as well as them this year.
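For anyone who wants to follow along at home, here is a minimal sketch of what running a model like this locally can look like, using the ollama Python client. The "gemma4:4b" model tag is our assumption for illustration, not a confirmed name; check your runtime's model library for the real identifier.

```python
# pip install ollama  (and install the Ollama runtime from ollama.com)
import ollama

# "gemma4:4b" is a hypothetical tag used for illustration; substitute
# whatever name your local runtime actually uses for the model you pulled.
response = ollama.chat(
    model="gemma4:4b",
    messages=[{"role": "user", "content": "Explain quantization in two sentences."}],
)
print(response["message"]["content"])  # runs fully on-device, no network needed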
Starting point is 00:00:52 Now, to showcase this, we have some really cool examples that Ejaaz has prepared here. So let's get into what this new Gemma 4 model can do. Yeah, I'm super excited about this model. I mean, it's not just one either. There's four of them. And like you said, it ranges from, like, 2 billion up to, I think, 31 billion parameters. Can fit on your phone, can fit on any device.
Starting point is 00:01:12 And like you said, eight months ago, this would have been considered frontier intelligence. But I want to get into what these things can actually do, because it's one thing talking about benchmarks. It's another thing talking about what it can do on your phone, on your laptop, what value it can bring to you. This first example is someone leveraging the visual intelligence of these Gemma models. Now, typically, if you're an AI model, you're really good at ingesting words and characters
Starting point is 00:01:36 and understanding the word described to you like a book would or like a blog post word. Visual intelligence is a very different frontier that has often been hard to kind of surmount by these new AI models. Gemma does an amazing job. What you're looking at on your screen right now is its ability to identify all the different objects in what is a very crowded room. He raises up a banana and it identifies that. It's also spotting the books that are on his shelf.
Starting point is 00:02:01 In the back, it's spotting the shelf in itself, the lamp, the fact that he's in a room, the curtains around that. And that's really important when it comes to creating apps that can log your visual experience or can track your calories for the food that you're consuming. And you can build a completely different suite of apps based on intelligence like this. This is the first time that we're seeing it appear in an open source open weight model. And Google's been the first to launch that. Yeah. And if it wasn't abundantly clear, this is totally free. You can just go on the website,
Starting point is 00:02:29 download this and run it yourself. And I think looking at this vision example, one of the cool things that I'm thinking of is a lot of people have cameras outside their house, outside their apartments. And this has visual intelligence now to not only see things, but alert you of what it's seeing. One of the cool examples that I saw that we don't have teed up here is just someone who had a little nest cameras right outside the front door. And it would send a notification of what was happening. It's like there is a dog walking in front of your house. There is a man walking. up with two packages in his hands. It looks like the packages from Amazon. And it has that visual intelligence that would normally cost quite a bit of money in those tokens using something like
Starting point is 00:03:03 Claude Opus or ChatGPT, but instead it does it all for free on this tiny little model, which is super cool. We have another example that was mentioned in the intro about OpenClaw here. And OpenClaw is something that a lot of people spend a lot of money on. If people are real hardcore users, they're spending hundreds of dollars a day, up to hundreds of dollars a day, some even thousands of dollars a day, in addition to buying some pretty expensive hardware to run it on. A lot of people bought Mac minis, you can't get a Mac mini if you wanted to because they're so back-ordered Mac studios. People were paying hundreds, if not thousands of dollars to run this software on. And the reality is is that whatever device you're watching this on, whatever device you're listening to right now,
Starting point is 00:03:42 you can run this model on. You don't need something fancy. You don't need a high-end computer to run it. You can just do this on something local, lightweight. And like I mentioned, as lightweight as an $80 dollar Raspberry Pi because you can use the lightest weight model possible. And although the results aren't the best in the world, they are much better than previously expected from these open source models. Yeah, I love that you can finally run OpenClaw on a device that doesn't cost $1,000. These Mac minis actually on the secondary market have gone up sky high. Like the retail price is $800 because you can't get it from Apple anymore. I've seen it as high as like 2K and people are still buying these things. Now, the reason why they've been buying these things is because
Starting point is 00:04:19 they can't fit frontier open source models onto their own mobile phone or their own laptop. And now Gemma 4 has made it super easy to do. So that's amazing. I'm still quite confused as to what the open core users are burning thousands of dollars for on. But that's probably a topic for another conversation. The other thing that I like about this is Gemma 4 can run completely offline. Now, this is a common property and characteristic that you can have for every single open source model, but the fact is you have a model here that is near frontier intelligence to, say, Claude Opus 4.6, and GPT-5.4 will get to those direct comparisons a little later on in this episode, but now you can run it offline. And the great part about this is often you're in areas where you just don't have
Starting point is 00:05:02 internet connection, or it takes a while to inference, now you have it on your phone, you can have it completely offline, it gets access to the world's entire database of knowledge. It might not be real-time, fair enough, but you still get access to call knowledge when you're in a bit of a desperate situation, when you just don't want to use the internet, which is that was really cool. This part is maybe my favorite, where it feels like you truly have access to intelligence at your fingertips, no matter where you are in the world. You can be stranded on an island. You could have no connection. You can be anywhere at any time. And it is completely and totally locally and free, and it fits on your phone. And it feels like having Google on your phone. I remember growing up,
Starting point is 00:05:38 it's like, you're not going to be able to Google everything. You have to learn these things. And the reality is that you have a super genius now that it gets paid. packaged up into something as small as your phone. And that is super cool. Now, naturally where your mind goes with that property is, huh, if I'm in a desperate situation, can AI save my life? So Sky Levels I.O. decided to run Gemma 4 locally on his iPhone, and he simulated his scenario of being abandoned in an apocalypse on an island with no help, and he needs to make a fire to keep himself warm. And so he queries Gemma 4 and he asks how to make fire. And the response I cannot provide the instructions
Starting point is 00:06:17 to how to make the fire. So these models are still kind of censored in some kind of way. It's not completely unfiltered. You can't ask it to help you make a biological weapon or do something illegal, which is, I don't think is a problem, but a lot of people who want unsinsic versions
Starting point is 00:06:34 of these truly open weight models, this isn't exactly that, but still cool nevertheless. I'm going to stop you right there because what you just said is not entirely true. Google doesn't want you to do this, but because it is open source, it is open weight. There is a possibility that you can jailbreak in. Someone took it on their own
Starting point is 00:06:50 to jailbreak the model to get it to do whatever you would like. And it was just released a few days ago. And it seems as if it works pretty well. It runs on 18 gigabytes of memory, which works for most laptops. And it's totally cracked, totally uncensored. You can ask it whatever questions you would like and it will give you whatever answers in return. And I think it's a, it's a testament to the open source community, right? It's like if you're going to publish these tools, again, they are tools there for the public to use them however they wish. Someone naturally is going to try their best to jailbreak them. Having something like this is actually truly empowering because if you are stranded on the island, you do need to know how to make a fire. This will give you that answer,
Starting point is 00:07:28 along with some other pretty unhinged answers if you ask, but it will give you the answer. And I think this is an important thing to know is that these models can be jailbroken to be customized when they are open source. And that is in a way, a way in which you get the most power from them, is you just get them at their purest form without the filters, without the censoring. It's just true raw intelligence delivered to your phone. And I found that pretty interesting. But there's also one final example about the powerful smartphone test and what type of smartphones run this the best? Because not all smartphones are created equal. And some do this a lot better than others. Yeah, a very first world AI problem is you getting
Starting point is 00:08:08 annoyed about waiting for the AI to respond to you. I certainly experience this when I'm using Claude on a very busy day. This test that you're seeing in front of you takes five different mobile phone models and tests Gemma 4 across all of them. So you've got the Gemma models running independently, offline, locally on each of these devices, and they're given the same queries. And you can see that they're very different response rates and generations from these phones. It looks like Apple is the winner in this race, which doesn't surprise. me, they have some of the best silicon manufacturing ever. And then I think Google's pixel phone is the slowest. The One Plus, I think, beat Apple by like half a second. Google took the
Starting point is 00:08:49 slowest, which is very surprising because you would think that Google running their own models would work well. But it turns out they don't have the vertical integration. They don't have the chip set that Apple does. So you could see, yeah, Google took 16 minutes, while One Plus took two and a half minutes and the iPhone took three minutes to run through this test. So it's enough. It's fast enough. We're like, if you are really desperate enough to need local inference like this, it is going to be fast enough to answer the questions that you need in a timely matter. Okay, so what did Google actually launch with these models? We know that they are four models, but let's get into some of the numbers and statistics.
Starting point is 00:09:21 So there's four different sizes. And if I bring up this chart over here, you see we have a 31 billion parameter model, which is the largest and the smallest being a 2 billion parameter model. But the performance across benchmarks is truly very impressive. But going back to the general takes here, up to 2,000. 256,000 context window, which isn't as large as the frontier models, which are hitting a million to two million context window. So you can't put as much information into a single prompt contextually for an AI to understand. You've got native function calling. It can work offline that
Starting point is 00:09:51 we mentioned earlier. It's trained on 140 plus different languages. Now, this is something that sounds kind of insignificant, but Google has done something really well here. They released a translation feature, I believe, last week, which can translate a similar number of languages live in real time as you're talking and listening to someone. It directly translates into whatever listening device that you have. So I think this is super cool to see this run on a locally open source model, and it's commercially permissive. So it has an Apache 2.0 license, which means that you can kind of take it and use it for whatever you want, build any apps on it. And I don't think it becomes a problem unless you get over a certain number of users, from not mistaken. Yeah. There's a
Starting point is 00:10:32 The $2 billion and $4 billion, they're the ones that fit on your phone. And you could think of those kind of like, if you think of these models like engine sizes, those are kind of like the bicycles, right? They're pretty lightweight, maybe a motorcycle. And then the larger ones, the $26 billion, the $31 billion, those are like the V12 engines, those are the powerhouses. Those are the two models that run on the $256K token window. The others run on $128K.
Starting point is 00:10:55 So you're not going to be having very long conversations with these models that are on your phone, but they have the ability to run and do so multimodally. One of the most interesting things is even these very small models that fit on your phone, they support not only text but images and audio as well. And having the audio thing is pretty cool because it understands and interprets audio. And that is a pretty powerful thing to have on this tiny little model. I also had the question as to how this model compares to the other top open source models. Now, it's no surprise on the show we've highlighted them a lot.
Starting point is 00:11:23 China has been leading the frontier here. If you look at this chart, Gemma, both the $31 billion and the $4 billion parameter model, does really well when it compares to Elo's scores. So if you look on this chart, for the amount of intelligence per square density, which isn't an official stat, but it's one that I'm created on this show for the last couple of episodes,
Starting point is 00:11:45 Gemma absolutely crushes it. It's on the top left over here, scoring extremely highly, but with a very small parameter account. Now, if you compare it to the other leading open source models like Kimi K2.5 thinking, they're well over the limit of a trillion parameter model. You got Quinn and GLM5 closely behind that.
Starting point is 00:12:01 So although Gemma isn't as smart of them, they're close enough. It looks like they're about 99% of the intelligence when it comes to Elo scores, but at a fraction of the size, which is why you're able to run it on your phone. Yeah, they're kicking ass. I mean, China still has, in terms of pure intelligence, they're still winning the race. But in terms of intelligence density, intelligence per token, it's really high. And I think one of the cool things that they did with Gemma 3 or Gemma 4 versus Gemma 3 is they gave it the Apache license as well, the Apache 2.4.
Starting point is 00:12:31 and no license. And basically what that means is that previously, a lot of these were restricted and they were limited to enterprise adoption. This is total freedom to modify, redistribute, commercialize with no restrictions. You can use it for whatever you want. You can repurpose this in any way you wish. And having it built in with the 140 language, like you mentioned, and the multimodality. This is kind of like a home run. And when you look at this chart, it also shows the same story, Gem of War versus the World, comparing these to all the other Chinese models. This is a heavy hitter. Yeah, yeah. I mean, if we look at some of these benchmarks, software engineering.
Starting point is 00:13:03 It, okay, listen, it's not number one. It's 68%. I believe Opus 4.6's score on this is in the high 80s. So we're not talking about frontier intelligence when it comes to coding models. You're not ditching Claude code for something like this, but when it comes to generalized intelligence,
Starting point is 00:13:18 when you're replacing your Google queries with an LLM and you don't want to spend 20 bucks per month or 100 bucks per month on a Claude subscription or GPD 5.4 subscription, you can just use this and you can run it locally and offline, privately train it on your your own data. It is incredibly cool. I had the same question to compare it to the frontier models, because I wanted to give this a fair shout. There are some potentially exaggerated stats here,
Starting point is 00:13:44 Josh, if I had to be honest here. I'm looking at how it weighs up. Okay, if we look at software engineering, which we just mentioned, we're right. It's almost 12 points lower than Claude Opus 4.6, which is the leading frontier model, not great. But at some of these other benchmarks, AIME, 2026, it is near frontier as well as GBQA Diamond and MMLU Pro. Do you think these things are gamed or do you think this is an accurate take? Yeah, all the benchmarks are games. And I think the only real way you could test this is by running against your own use cases that you want and just evaluating for your own. Because it absolutely is not 90% frontier capable when you converse with it.
Starting point is 00:14:24 Like when you talk to Gemma versus Opus 4.6, there is a very, stark and clear difference between the, I guess the EQ and the IQ, where one feels much more naturally human, much more is very, one is very dry. Perhaps on these benchmarks, Gemma is 90% in the way there. But in actual practice, when you're using the model on day to day life, it is nowhere close. At least that is my perspective just from trying these things out. And I think we have to take these kind of benchmarks with a grain of salt because they're gamed on very specific things. And if you change the parameters of these benchmarks a little bit, or you change the actual structure of the benchmark,
Starting point is 00:14:59 it won't perform well because to some extent, these models are kind of baked in with the expectation that they're going to need to perform well on these benchmarks and therefore are optimized for these specific types of problems versus general real-world use cases that someone like us is going to use every day or someone who's using OpenClaw actually wants the tokens generated from. But if cost is a determining factor in your decision to use one AI model over the other, Gemini might be quite a convincing bet.
Starting point is 00:15:26 It is a fraction of the cost. I know it says it's 8 cents per million tokens. It's actually three cents. I think we maybe had a bit of an issue generating this particular stat. The point is it's incredibly cheap versus the frontier models. 4.6, you're looking at $10, blended input output tokens for a million tokens. So if you're one of those open-claw users that we mentioned earlier that are using this for myriad different use cases and are burning thousands of dollars per day or per week
Starting point is 00:15:53 doing your different use cases, this might be a better bed. it might be a better trade-off for you to use. And I also want to remind everyone, a very important reminder, which is eight months ago, this model or these models from Google, would have been considered frontier. So it's amazing how much advancement that we've made in eight months. Now, I know in those same eight months, we've also got bigger and better models from the frontier intelligence labs. The question does ring in my mind, which is, will open source ever catch up? If I'm being honest, I thought open source would have died a year ago, but it's still being able to keep up.
Starting point is 00:16:25 Now, part of that is because Chinese models, oh, Chinese AI labs have invested so much in keeping up with the US labs. They've also done distillation attacks and all those other kinds of things. But the fact that Google themselves, who haven't done any of those things, have put out an open source model
Starting point is 00:16:38 near as good as the frontier models, gives me a lot of hope that open source is here to stay. Yeah, I don't see a world in which this slows down. And I really love the trailing progress we get because at some point, we're going to reach the tail end of diminishing returns in which open source models
Starting point is 00:16:52 are just capable enough to do everything the average person wants. What we currently have right now is a problem that we're running up against in terms of frontier AI labs, where the new models just cost too much money. Like, Opus has, or Claude, has Capibara the new model ready to go. It's just, I mean, aside from it being too dangerous, it's just far too expensive. The amount of GPUs that are required to generate tokens from these models are so high. And if you want frontier intelligence, the cost really is kind of creeping upwards instead of downwards. And the tail end of that becomes very commoditized. quickly. It's like the very, very highest end, the stuff that's going to be solving new math and
Starting point is 00:17:29 new science costs a tremendous amount of money. But the open source that's maybe six months behind costs $0.00. So the delta is huge. And if you're not interested in solving these like unbelievably complex problems are writing really high quality code, then the amount of problems that these open source models are going to be able to solve a year from now, when they're better than Opus 4.6 is today, that's going to be a really large amount. And it begs the question is who is actually going to want to continue to pay for these frontier models, if they are that expensive, to run their things like OpenClaw, when the reality is that these open source models, maybe Gemma 5, maybe Gemma 6, will be able to tackle almost all of the problems that we have. And I don't know, it's an interesting
Starting point is 00:18:07 thought experiment, but I think it is certain that open source is certainly here to stay, particularly as it relates to China and the United States going forward with this AI race, because this is a pretty nice jab at the Chinese open source models. Yeah. And if you've been a listener of this show, you'll know that my thoughts on the future of AI is very much AI agents, specifically personal agents that work for you and are trained on your own personal data. Now, if you're the average person, you probably don't want to give open AI and anthropic access to your personal data so that they can train their own models. That's a breach of trust in many different extents. Locally run open source models might be the solution for that.
Starting point is 00:18:47 They may not be as smart, but if they're trained on your data, they could be, they could unlock a new level of intelligence which centralized models can't do. And so I'm optimistic that Gemma 4 and a bunch of other open source models that have come from either Chinese Air Labs or the ones that are going to be released in the future will be able to do that. The other trend that I think is pretty clear is locally run models, right? Models that you can run on your device specifically that doesn't necessarily need to be trained on your data but are local to you. The reason why it's so important is it's cheaper. You can run it privately. And also, it gives you the ability to to get quicker prompts or quicker queries.
Starting point is 00:19:24 It runs seamlessly and you don't have to wait, you don't have to rely on servers going down, you don't have to rely on a centralized data center running your compute. You could just have it all locally on your phone. Those things sounding significant until you have an app that runs locally on your device, which I think would be super cool to see.
Starting point is 00:19:40 And I want to see more of these types of things happening. I think, personally, Apple is going to be the frontier company that leads us into this kind of world because they have the biggest distribution. They have like 3 billion active devices. I would love to run the model on my Apple iPhone right now. So I think that's a trend that we're going to see. And I think open models are the only way to unlock it. How cool would that be? We get WDC coming pretty soon. We're going to be covering that on the show. But that's going to be the Super Bowl
Starting point is 00:20:06 for Apple. We're going to see, this is what, two years after they fumbled Apple intelligence, we're going to see what their new plans are this year. So I'm really excited to see their take on this because like you mentioned, it's really powerful. And I think most people listen to this probably never ran a model locally on their machine. But there's something very empowering to it. If it's not just for generating your own intelligence, it's for the privacy aspect of it, where you know none of the information that you're sharing is getting leaked out to any servers,
Starting point is 00:20:31 no one's training on it. It's all yours to own for yourself. And there's something really nice about that. And I think the final thing we're going to talk about on this episode is why on earth Google would give this away? Because it seems like Google's doing really well. They just signed a deal with Anthropic for their TPUs. They have Gemini, which is a powerhouse.
Starting point is 00:20:49 They have the best world models, video models. They have amazing image gen. Why are they giving away this sauce? You have any idea? I don't have a great idea, but I have some thoughts. And I have one that argues in favor of them doing this and one that argues against it. The one that argues in favor of them is the Android example, which is they open source the entire thing.
Starting point is 00:21:08 They allowed anyone and everyone to hack away different apps and launch it on their play store, that might be, and they gained a lot of mind share and market share by doing this. Now, is it as well curated and beautiful as iOS and the Apple App Store? Most people will probably argue not, but the point is they have one of the largest distribution modes because of this. I think this might be an example of them getting Google AI, not just a specific model, but Google AI in the hearts and minds of everyone. And if they could tap into the locally run device audience, that could be a big win for them. Now, the argument against that is,
Starting point is 00:21:45 dude, you could have been using all this compute to train a better Gemini model and keep up with the frontier AI labs, and that's all that matters. Build a better coding model, because right now it kind of sucks, and you can then build all of this other open source stuff later. The number one primary race to win
Starting point is 00:21:59 is best model and currently you're losing. So, I don't know, do you have the same day? Yeah, that's probably right. I imagine it's for a mixture of reasons. One of them is probably to also feed into their cloud flywheel because we're talking about running these models locally, but how many are actually running these models locally? And for the ones that are,
Starting point is 00:22:16 how many are going to quickly run up against ceilings because they want to do more and more and more. And then eventually they'll just migrate over to the more powerful models and use probably the Google Cloud services. And I think there's a lot of reasons to become the infrastructure. The Android example is a great one. Feeding the Cloud Flywheel is another strong one. And I think this is just a really small side quest for Google
Starting point is 00:22:37 in terms of optimizing for that intelligence per bit, whatever we're going to kind of coin that as, but the intelligence density of a model, this has the highest. This is much more than Gemini. And it's a fun practice as they move forward to these new models of intelligence compression per token. And if they could continue to learn and then publish those learnings and then just keep iterating on that front, I think that's a huge win for Google and also everyone. Google's just doing a nice public service announcement, the nice little public goods. And the team there is doing really cool things with it. Logan Kilpatrick is one person, for example, who is running the Google AI studio team. They have been publishing all of these models, making them super easy to use through the Google AI Studio.
Starting point is 00:23:13 So if you just go there, you can play with the two larger models and just kind of see how they compare to something like Gemina 3.1 Pro. And then see if you want to make your own decision to run these things locally or just go start ping in some APIs or just use your $20 a month plan that you have with Anthropic or chat GPT. But I think that is the Gemma 4 episode. We got it all covered. It's an amazing model. It's available for free to run locally on whatever machine that you wish because it is lightweight enough to fit on an iPhone or a Raspberry Pi, and it's cheap enough to run it for free. If you download these things on your devices, you have free inference forever. You can run it 24-7 on whatever tasks you want, and it will cost you only the amount of the electricity to power the machine. And I think that's pretty cool,
Starting point is 00:23:57 and I'm glad Google is really stepping onto the plate with probably the leading USA Frontier Open Source model. And that's pretty cool. Yeah, and I'm curious what you, the listeners and watches of this show think yourself. Like, go out and try this thing. If you don't want to download it, you can get access to it by Google AI Studio. Give it a few queries. Like, does it match up to your experience with Claude 4.6 and GBT 5.4? Would you replace your $20 to $100 a month subscription with something like this?
Starting point is 00:24:23 Let us know in the comments or DM us on our socials. Our X profiles are linked below as well. And yeah, that's pretty much it. I'm going to be trying out these models. It is definitely the best AI frontier open source model, But I have to say, compared to the Chinese models, they're still kind of like leagues ahead right now. I hope we see more adoption of open source models going forwards.
Starting point is 00:24:44 And when that eventually happens, if there's a new open call breakthrough, you will hear it first here on this show. We also did a cool episode covering some of the Chinese open source models that were lately released last week. Definitely go check that episode out as well. But aside from that, if you aren't subscribed to us, please do.
Starting point is 00:25:01 It helps us out a lot. Turn on notifications. even if you're listening to us on Spotify or Apple Podcasts, give us a rating, give us a review. It helps us out massively. Josh, is there any other parting words that you want to give? Don't forget to share it with your friends, and we'll see you guys on the next episode. Yeah, see you guys.
