The AI Daily Brief: Artificial Intelligence News and Analysis - GPT-4o Mini and the Rise of Smaller, Low Cost AI Models

Starting point is 00:00:00 Today on the AI Daily Brief, we're talking about OpenAI's new GPT-4o mini model and why it says a lot about where AI development and competition are happening. Before that in the headlines, Google is bringing AI to the Olympics. The AI Daily Brief is a daily podcast and video about the most important news and stories in AI. To join the conversation, follow the Discord link in our show notes. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. We have the Olympics coming up pretty soon. They are kicking off on July 26th in Paris, and Google will apparently be the official AI sponsor for Team USA. So what does that actually mean? Well, it means a few things. First of all, the broadcast will pull from the immersive views

Starting point is 00:00:48 that were added to Google Maps over the last few years, adding additional information and giving 3D views for the venues where the competition is happening. Announcers and commentators will use Google Search AI Overviews in broadcast segments, for example, answering trivia questions about the Olympics. In one part of it, comedian Leslie Jones will ask Gemini to help her learn a new sport, and so on and so forth. This is a straight-up advertising partnership. This is a chance for Google to show off what it's got, which could go either very well for the industry or quite poorly. For now, we will just have to wait and see. Over in chip land, The Information is reporting that OpenAI is holding talks with Broadcom about developing a new chip.

Starting point is 00:01:28 Summarizes Yahoo Finance, OpenAI is exploring the idea of making AI chips on its own to overcome the shortage of expensive GPUs that it relies on to develop models. The report says that OpenAI is hiring former Google employees who worked on Google's Tensor Processing Unit, and that they've decided to develop an AI server chip. An OpenAI spokesperson was circumspect, saying OpenAI is having ongoing conversations with industry and government stakeholders about increasing access to the infrastructure needed to ensure AI's benefits are widely accessible, which basically reads to me like them saying, yeah, of course we're having conversations about building our own chips.

Starting point is 00:02:01 Everyone is having conversations about building their own chips. I think for now it's not clear exactly what will come out of those conversations or where any actual partnerships would come down. Next up, there has been a lot of discussion this week of Samsung's AI features on their new Galaxy Z Fold 6. The Verge, for example, writes, The Galaxy Z Fold 6 sketch-to-image tool is ridiculous fun, and slightly worrying. They write,

Starting point is 00:02:23 Samsung would very much like us and its shareholders to know that its new phones are the AI-est phones that ever AI-ed. And the Fold 6 that I'm testing comes with a new tool called sketch-to-image. Draw a rough sketch on a photo or an empty note page, and it will use generative AI to turn that into an image. I shrugged it off as just another AI thing when Samsung announced it on stage, but it's really good, so good that it worries me a little. Using the sketch-to-image tool in a note is pretty harmless. You draw something, highlight it, and choose from a variety of styles like 3D cartoon and illustration, and turn your doodle into something more detailed. The author says that they worked with their two-year-old to draw things like

Starting point is 00:02:57 goofy-looking dump trucks and school buses. But then they say using sketch-to-image on a photo is where things got weird. The problem, they said, is something they called the bee problem. I took a photo off a dock just south of downtown Seattle with some flowers in the foreground. Because they're close to the camera and my focus was in the distance, they're slightly blurred. I drew the world's worst sketch of a bee on one of those flowers, figuring AI would insert an in-focus image of a bee, giving it away easily as a fake. Wrong. The issue, the author writes: If I didn't know the bee's origin story, there's no way I'd think twice about it if I scrolled past that on Instagram. I'd assume the photographer snapped the picture at just the right time and hung around waiting

Starting point is 00:03:30 for a bee to fly into the frame. To be fair to Samsung, there is a little watermark that says AI-generated content down at the bottom, but it would be very, very easy to miss. The author did note that the results aren't always that good. For example, a giant pirate ship in the harbor is probably not tricking anyone. For this author, there are questions but no good answers, and ultimately just a recognition that the world we're moving into is very new and unpredictable in how it will all play out. Analysts also pointed out that Apple's AI strategy and Samsung's AI strategy could not be more different. Yahoo Finance writes, Samsung is angling to quickly build a large user base for its generative AI services, thus incentivizing developers to build apps for its

Starting point is 00:04:08 Galaxy AI platform. To do that, they're making those AI features available on phones from the last three generations. Overall, they write, Samsung says it expects to put Galaxy AI on some 200 million devices by the end of the year. Apple, meanwhile, will basically only give the very most recent generations of the iPhone access to Apple Intelligence. That's because, of course, Apple is trying to use this to boost iPhone sales. They want to use it as a way to get people to upgrade rather than holding on to their quote-unquote good enough iPhones. Whether that works will likely have a big impact on how Apple's stock performs at the end of this year. Lastly today, Google has announced a new AI industry forum called the Coalition for Secure AI, or CoSAI. They write the new industry

Starting point is 00:04:48 forum will invest in AI security and leverage Google's Secure AI Framework. Their first three work streams are software supply chain security for AI systems, preparing defenders for a changing cybersecurity landscape, and AI security governance. Founding members of the organization include Amazon, Anthropic, Chainguard, Cisco, Cohere, GenLab, IBM, Intel, Microsoft, NVIDIA, OpenAI, PayPal, and Wiz. And all of this will be housed under OASIS Open, the international standards and open source consortium. On the one hand, it feels to me like there are so many of these announcements that it's never exactly clear which ones are actually significant or not, but the fact that all of the big players are here, with the notable exception of Meta, suggests that this might be a fairly

Starting point is 00:05:24 significant effort and one that we will keep an eye on. For now, though, that is going to do it for today's AI Daily Brief Headlines edition. Next up, the main episode. Today's episode is brought to you by Superintelligent, the platform for fun, fast AI learning. Super has a ton of new things going on. We recently announced our partnership with Spotify, through which users of that app can now access Superintelligent content directly from their mobile apps. We've also just launched the AI learning feed. In addition to seeing the tutorials that we're dropping, there are polls, news items with related lessons, and a chance for people to show off the projects and use cases that are making AI come alive for them. We've also just kicked off the Super Summer Challenge, where each week we'll

Starting point is 00:06:03 share a new challenge that you can use to discover new AI tools and use cases. Go to besuper.ai and use code superfun for 50% off your first two months. That's besuper.ai. Today's episode is brought to you by Venice. The leading AI companies store your entire conversation history and attach it to your identity forever. Every question you ask, every answer you receive, every image you generate, every thought you share with the machine, it's all being spied on. If you trust all the companies, hackers, and NSA board members that will ever have access to your AI conversations, then rejoice, for you are well served. For the rest of us, Venice is an alternative. Venice is a powerful AI app for text, image, and code generation that respects you as a sovereign

Starting point is 00:06:43 individual and believes privacy and free speech are not only human rights, but are necessary for civilizational advancement. Private, permissionless, and uncensored. You can try it for free without an account at venice.ai. Welcome back to the AI Daily Brief. Today we are talking about GPT-4o mini, and it is both more powerful and cheaper than GPT-3.5 Turbo. Now, in addition to discussing GPT-4o mini specifically, we're also going to be talking about the changing nature of competition among models, price decreases among models, and what it all means for the state of AI. But first of all, let's talk about this announcement. The way that Sam Altman, CEO of OpenAI, teed it off was by saying, towards intelligence too cheap to meter. And cost, as we will see, really is the theme. In their announcement

Starting point is 00:07:30 post, they write, OpenAI is committed to making intelligence as broadly accessible as possible. We expect GPT-4o mini will significantly expand the range of applications built with AI by making intelligence much more affordable. GPT-4o mini scores 82% on MMLU and currently outperforms GPT-4 on chat preferences in the LMSYS leaderboard. It's priced at 15 cents per million input tokens and 60 cents per million output tokens, an order of magnitude more affordable than previous frontier models,

Starting point is 00:07:57 and more than 60% cheaper than GPT-3.5 Turbo. OpenAI also gives a sense of what types of new use cases this low cost opens up, including applications that chain or parallelize multiple model calls, applications that involve a full code base or conversation history, and applications that interact with customers through what they call fast real-time text responses, like customer support chatbots. Currently, the model supports text and vision, with full multimodal support for text, image,

Starting point is 00:08:21 video, and audio coming. It's got a 128k token context window, as well as supporting 16,000 output tokens per request, and basically what they try to point out is that it's the best-performing small model. It scores higher on the MMLU than Gemini Flash or Claude Haiku. Same on the MGSM, which measures math reasoning. On that, it scores much higher. They also point out that it scores much higher on coding performance, which seems to be one of the use cases that they're most focused on. Before releasing this broadly, they say they partnered

Starting point is 00:08:48 with companies like Ramp and Superhuman, who they say found GPT-4o mini to perform significantly better than GPT-3.5 Turbo for tasks including extracting structured data from receipt files or generating high-quality email responses when provided with thread history. Some of the response was lukewarm. Professor Ethan Mollick writes, first impressions with GPT-4o mini is that it is impressive for a small model, but no replacement for a frontier model. When given complex education prompts, it can't follow instructions as well, and misses nuance GPT-4o nails.

Starting point is 00:09:17 AI for Success writes, POV: when you were waiting for GPT-5, but GPT-4o mini is coming. And it shows a guy punching a laptop. They continue, OpenAI is on a mission to piss everyone off. EverArt's Pietro Schirano writes

Starting point is 00:09:30 never seen such a lukewarm reaction to an OpenAI model release. Sean Ralston, who does API dev support at OpenAI, writes, Pietro, this release is about price and performance. GPT-3.5 was no longer competitive against Gemini Flash, Claude Haiku, etc. Many business customers using LLM APIs for routine functions like company chatbots and standard work need and want low token costs. GPT-4o mini sets a new standard of affordability with close to frontier model performance. Of course, we all want

Starting point is 00:09:56 smarter models forthcoming, but those tend to require more compute and therefore are more expensive. Pietro responds, I agree on all these fronts, just sharing the overall sentiment. From my standpoint, however, the response has focused on exactly what seems to be the important thing, which is cost. Ben's Bites summed this up by saying OpenAI's 4o mini brings big brains on a budget. It's a small model that beats the early version of GPT-4, and it costs 30x less than its big bro GPT-4o. They sum up, you can now say bye to GPT-3.5 Turbo, as 4o mini will take its place in ChatGPT's free tier. If you were using the GPT-3.5 Turbo API in your applications, you should switch to 4o mini. Reasons to care: people are getting increasingly smarter AIs for free directly via

Starting point is 00:10:39 ChatGPT, and app developers can build powerful AI tools without breaking the bank. And really, I do think that cost is the story here. The end of OpenAI's blog post reads, The cost per token of GPT-4o mini has dropped by 99% since text-davinci-003, a less capable model introduced in 2022. Vittorio says 99% reduction in cost in two years is crazy. And that cost reduction has big implications. Back to that blog post again, they write, We envision a future where models become seamlessly integrated in every app and on every website. GPT-4o mini is paving the way for developers to build and scale powerful AI applications more efficiently and affordably. The future of AI is becoming more accessible, reliable, and

Starting point is 00:11:18 embedded in our daily digital experiences. Vittorio even jokes, in 2026, OpenAI will give you money to use its models. Professor Ethan Mollick came back again and said, The big story with GPT-4o mini is its cost, and how quickly the price of intelligence of a sort has dropped. Now this is the point where I want to return to that now infamous Goldman Sachs report, Gen AI: Too Much Spend, Too Little Benefit. This is from June 25th, and I've already done a full takedown of this, but one of my biggest critiques is the argument from Goldman Sachs head of global equity research Jim Covello that he thought it was unlikely that the cost of AI was going to come down. The interviewer asks, even if AI technology is expensive

Starting point is 00:11:56 today, isn't it often the case that technology costs decline dramatically as the technology evolves? Part of Jim's response was to say, the tech world is too complacent in its assumption that AI costs will decline substantially over time. Now, the point that he was making in this section is that he didn't think there was going to be enough competition for state-of-the-art GPUs to actually bring those costs down. What he seems to have missed is that we're officially in a time where the AI that we have right now is performant enough

Starting point is 00:12:23 that there are many applications that don't require the absolute state-of-the-art but instead just require near-state-of-the-art, and those costs are coming down and coming down dramatically. What's more, it's not even clear that costs of state-of-the-art aren't coming down dramatically as well. Claude 3.5 Sonnet, which is Anthropic's most intelligent model, and which many people have switched to entirely, including me, basically, from GPT-4o, does also represent a cost reduction. Claude 3.5 Sonnet, for example, costs one-fifth of what Claude 3 Opus does.
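To put the GPT-4o mini prices quoted earlier in the episode (15 cents per million input tokens, 60 cents per million output tokens) in concrete terms, here is a minimal Python sketch of the arithmetic. The function name `request_cost` and the workload figures (10,000 requests, 2,000 input and 500 output tokens each) are hypothetical, chosen only for illustration; they are not from the episode or from OpenAI.

```python
# Prices quoted in the episode for GPT-4o mini, in US dollars per million tokens.
INPUT_PRICE_PER_MILLION = 0.15
OUTPUT_PRICE_PER_MILLION = 0.60


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request at the quoted prices."""
    return (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    ) / 1_000_000


# Hypothetical workload (an assumption, not from the source):
# 10,000 support-chat requests, each with 2,000 input and 500 output tokens.
num_requests = 10_000
total_cost = num_requests * request_cost(input_tokens=2_000, output_tokens=500)

print(f"Total for {num_requests:,} requests: ${total_cost:.2f}")
```

Under those assumed numbers the whole batch costs a few dollars, which is the kind of math behind the chained and parallelized use cases OpenAI lists.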

Starting point is 00:12:54 Andrej Karpathy wrote, LLM model size competition is intensifying, backwards. My bet is we'll see models that think very well and reliably that are very, very small. There is most likely a setting even of GPT-2 parameters for which most people will consider GPT-2 smart. The reason current models are so large is because we're still being very wasteful during training. We're asking them to memorize the internet and remarkably they do and can, e.g., recite SHA hashes of common numbers or recall really esoteric facts. Actually, LLMs are really good at memorization, qualitatively a lot better than humans,

Starting point is 00:13:23 sometimes needing just a single update to remember a lot of detail for a long time. But imagine if you were going to be tested closed book on reciting arbitrary passages of the internet given the first few words. This is the standard pre-training objective for models today. The reason doing better is hard is because demonstrations of thinking are quote-unquote entangled with knowledge in the training data. Therefore, the models have to get larger before they can get smaller because we need their automated help to refactor and mold the training data into ideal synthetic formats. It's a staircase of improvement, of one model helping to generate the training data for the next until we're left with a perfect training set. When you train GPT-2 on it, it will be a really

Starting point is 00:13:58 strong and smart model by today's standards. Just maybe the MMLU will be a bit lower because it won't remember all of its chemistry perfectly. Maybe it needs to look something up once in a while to make sure. This certainly reflects what I'm seeing. And what's more, I think it's responding to a business demand. There have been numerous posts recently about how enterprises are getting more sophisticated in their needs with AI. They realize that, yes, there are some cases that they need the absolute state of the art for, and they need it immediately when they need it, and are willing to pay for the fastest, best answers, but there are tons of functions that don't need to be immediate, and that don't need the absolute state of the art. The Information recently published a piece called Why Smaller

Starting point is 00:14:35 Could Be Better. They write, while some developers are racing towards superintelligent AI technology, others are focused on building cheaper, more practical models. Over the past six months, several big tech companies, including Google and Microsoft, have released small language models in a bid to stake their ground in a burgeoning area of AI research. These models are lightweight enough to run on phones instead of on the cloud. They generally have fewer than 3 billion parameters, a tiny fraction of the more than 1 trillion parameters believed to support OpenAI's GPT-4.

Starting point is 00:14:59 And this is a whole additional case that we haven't talked about, the fact that everyone is racing for on-device AI. The reality here is that the lower the cost, the more different types of applications can be built with AI in them, and that really seems to have been the motivation. OpenAI president and co-founder Greg Brockman wrote, We built GPT-4o mini due to popular demand from developers. We love developers and aim to provide them the best tools to convert machine intelligence into positive applications across every domain.

Starting point is 00:15:23 It's hard to overstate how big a difference it makes to reduce cost by this much. Latent Space's swyx, for example, writes, If you're interested in LLMs for summarization, my evaluation of GPT-4o mini is out. TL;DR, mini is the same or mildly worse in some cases, but because it's 3.5% the cost of 4o, I can run 10,000 versions of the mini and use mini 4o to judge and still have money left over to donate. Point being that we are just at the beginning of seeing how many more applications AI will come to as costs come down. And I think that's incredibly exciting. For now though, that will do it for today's

Starting point is 00:15:55 AI Daily Brief. Thanks for listening or watching as always. And until next time, peace.
