Everyday AI Podcast – An AI and ChatGPT Podcast - EP 498: Meta drops Llama 4, Microsoft Copilot levels up its AI game, GPT-5 roadmap hits snag and more AI News That Matters

Starting point is 00:00:00 This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. I hate saying this each Monday, but my gosh, it's been another crazy week in AI developments.

Starting point is 00:00:54 I mean, think about it. We've had multiple trillion-dollar companies release and update their best AI models and features. We finally have an AI image generator release that we've been waiting on for more than a year, one that's still a leader in the pack. We got a bunch of ChatGPT updates and news on GPT-5 and another model we weren't expecting from OpenAI. And apparently an AI model has passed the Turing test. Yeah.

Starting point is 00:01:28 Can't argue. It's been another crazy week in AI development. And I don't blame you if you can't keep up. I do this every single day. And it's hard for me to keep up. That's why on most Mondays, we bring you the AI news that matters. So what's going on, y'all? My name's Jordan Wilson, and I'm the host of Everyday AI.

Starting point is 00:01:49 And this is your daily live stream podcast and free daily newsletter, helping everyday people like you and me, not just learn what's happening in the world of AI, but how we can all take advantage of it to grow our companies and to grow our careers. Is that what you're trying to do? Trying to make sense of all this AI. Are you trying to learn it and then leverage it in your day to day? Well, it starts here. This is where you learn.

Starting point is 00:02:10 what's going on, but how you leverage it, that happens on our website. So go to youreverydayai.com. Sign up for our free daily newsletter there. Every single day, we recap each day's podcast and live stream, as well as keeping you up to date with everything else that you need to not just keep up, but to get ahead in AI. So if you haven't done that already, make sure you go do that. And you can go listen to now almost 500 episodes. Yeah, I think we're on episode 498 or something like that.

Starting point is 00:02:40 I've got to cook up something special for number 500. Running out of time. So I hope you all can join for that. I believe that's on Wednesday. So before we get into the AI news, yeah, because there's a lot. And like I said, we do the AI News That Matters segment almost every single Monday. A couple things. We extended voting for the Inception game. So that was our partnership with Nvidia and their Inception program,

Starting point is 00:03:06 highlighting some of the best AI startups in the Nvidia Inception program. So make sure, both in the show notes and on our website, you go vote. There are two different ways to vote. So we shared that voting ends Tuesday, April 8th at 11:59 p.m. Central Standard Time. So if you haven't already voted, make sure you go do that. One other kind of housekeeping thing for us. We will be out in Las Vegas for the Google Cloud Next 2025 conference. So in partnership with Google, looking forward to that.

Starting point is 00:03:40 There should be a lot of updates coming out of that show. So, hey, if you're going to be at the Google Cloud Next conference, make sure you holler at me, you know, whether it's on LinkedIn or email. I always put that information into the show notes as well. All right. Enough chit-chat. Like I said, so much AI news this week. Huge releases from Meta with Llama 4.

Starting point is 00:04:06 Microsoft essentially said copy and paste to every single other cool AI feature that's out there that they didn't have yet. We have GPT-5 news. We have ChatGPT updates. Midjourney V7 is finally here. So much to get to. Let's dive in. But hey, what's up, live stream audience?

Starting point is 00:04:25 Hey, Graham from Ireland. How you doing? Big Bogey joining us on YouTube. Thanks. Dr. Scott on LinkedIn says, congrats on number 500. Dr. Harvey Castro, great to see you. Sandra and Kyle from YouTube.

Starting point is 00:04:41 Thank you all for tuning in. All right, let's get to it. Let's get to the AI news that matters. There's a lot, y'all. First, Microsoft essentially said, oh, there's all these cool new features out there. Let's develop them all and release them all at once. All right, so Microsoft had a celebration of their 50th anniversary,

Starting point is 00:05:03 and they released so much. All right, so Microsoft has rolled out a massive update to its AI assistant, Copilot, introducing features like memory, personalization, web-based actions, and a lot more. All right. So here's just some of the new updates. I couldn't even include them all because it would take an entire show. But here are, I think, the ones that are going to probably most impact everyday users.

Starting point is 00:05:30 So Copilot can now remember users' preferences, interests, and details to tailor advice and suggestions with its new memory features. Users retain control over what Copilot remembers or can opt out entirely. Also, there are some new personalization options. So Microsoft plans to offer personalized appearances for Copilot, including the option to bring back Clippy. If you've missed Clippy over the last, I don't know, 25 years, you know, the iconic assistant from earlier Windows versions is making its AI return.

Starting point is 00:06:08 Also, there's actions. This one's pretty big. So, Copilot can now perform tasks directly via its web browser. Yeah, Microsoft just silently rolled out agentic AI in the browser. Yeah, so you don't have to download a program. It's just working in the browser with their new actions feature. So you can do things such as booking tickets, reserving restaurants, and even making purchases.

Starting point is 00:06:39 So combined with new shopping tools, Copilot can research products, find discounts, and streamline online transactions. So yeah, big agent play there for Microsoft, as well as a huge expansion of their Copilot Vision feature, which was previously available in web tools. And it's now rolling out to Windows and mobile apps, which is a wildly useful feature.

Starting point is 00:07:04 So I use that all the time on the Edge browser. Google has something like this in AI Studio, which is great when it works. Google AI Studio, their real-time stream, was being a little finicky for me this weekend. So, you know, I might be using Copilot Vision a little bit more, where you can literally just tap one button. Copilot sees everything that's on that screen. And you can talk to it in real time. So pretty exciting there. Also, deep research.

Starting point is 00:07:33 Yeah, like I said, Microsoft literally just rolled out every single feature that they didn't have already. So now Copilot can analyze extensive documents and online sources for complex projects, integrating with Bing for AI-powered search responses. So it can also generate, you know that whole, oh my gosh, so much, so much from Copilot here. I should have teased y'all with this in the beginning. You know that NotebookLM thing that is absolutely amazing, and how you can generate a podcast on any of your information?

Starting point is 00:08:09 Well, now you can do that inside Copilot as well, with audio summaries to explain detailed topics. Also, new updates to its Pages. So the new functionality inside Pages enables Copilot to organize notes and research across multiple documents into a single workspace, simplifying project management and collaboration. That's not even all, y'all. I couldn't do, you know, 30 minutes of Microsoft news, but everything's rolling out at kind of a different time. So many of these features are launching in initial versions already, with improvements expected in the coming weeks.

Starting point is 00:08:48 So availability will vary by market and platform. So we will continue in our newsletter to keep you informed when these all come out. So much. Yeah. Yashel says, amazing. Kimberly says, got to try that out.

Starting point is 00:10:04 Big Bogey is loving Copilot. Joe says, perhaps a Copilot update walkthrough episode. Joe, you know what?

Starting point is 00:10:33 Maybe. All right. You know, at the end, for our live stream audience, I'm going to ask you what we should cover, maybe tomorrow or later this week, because there's so much. And I do want to do a dedicated show on one of these new updates. So I'm going to let you all choose which one that is.

Starting point is 00:10:49 All right. So maybe I'll have you vote at the end. All right. Next piece of AI news. The king is back. The king has returned, to quote one of the best 90s movies ever. So after a year-plus of waiting, Midjourney has released V7 of its image generator,

Starting point is 00:11:10 bringing new features like voice input and a faster draft mode that allows you to work more in natural language versus more, you know, I'll say Midjourney prompt-ese. So now with Midjourney V7, voice input is available, letting users speak prompts directly to the model, which then converts audio descriptions into text and then generates images. Also, draft mode, I think, will be pretty popular, as it offers rapid image creation, producing lower-quality images, though, in just seconds, whereas sometimes Midjourney can take a little bit longer. Users can also refine drafts by enhancing or varying them into high-quality outputs. So I think that's what draft mode is really going to be most used for. Yes, it is faster than the normal full mode inside Midjourney V7,

Starting point is 00:12:06 but I think it's more for iterating on images and using more natural language in draft mode, whereas the full mode, I think, is more for if you are really good at prompting Midjourney, right? I'm a big Midjourney fan. I always have been, but, you know, I don't know. I think over the last year or so, it seemed like the interest in AI image generators, at least for our audience, had gone down a little bit. So I don't know.

Starting point is 00:12:32 Maybe I should ramp it back up now, especially with the new GPT-4o image gen that has gone absolutely viral over the last couple of weeks. Also with Google Gemini's new Gemini 2.0 Flash that does image generation very, very well, inline multimodal. So yeah, maybe, I don't know, y'all, do you guys care? Podcast audience, let me know as well. Should we do more AI image generation? I think now, obviously, the quality is fantastic.

Starting point is 00:13:05 And it's really good. So let's talk a little bit. So now there is a personalization feature that's actually mandatory for V7 users. Before using the model, users must rate 200 image pairs to create a tailored style for generations. Older V6 personalization styles can still be used, but mood boards remain unavailable for now. So there are two modes available in V7. There's turbo mode,

Starting point is 00:13:37 which doubles generation costs for high performance, while draft mode costs half as much and is much faster. Some features still use V6 technology, including upscaling, inpainting, and retexturing, though those will transition to V7 in upcoming updates. So far, user feedback is mixed, actually, with some praising improved realism and artistic quality, while others criticize ongoing issues like human

Starting point is 00:14:07 anatomy errors and text rendering accuracy. But many feel the update is incremental rather than groundbreaking. So I'll say the same. I think Midjourney, in terms of visuals and aesthetics, has always been number one, right? Even as we got the new update from ChatGPT's GPT-4o image gen, even as we got the Imagen 3 model from Google that you can use inside Google Gemini 2.0 Flash, you know, and obviously dozens of other AI image generators. Midjourney has always been the king when it comes to style, when it comes to aesthetics.

Starting point is 00:14:46 However, where it has lacked, oh, I also got to mention Ideogram, because I think Ideogram V3 that just came out is really, really good as well. But where Midjourney has always thrived is in visuals, right? It is the most aesthetic, but it struggles in other areas. It still can't do text, right? So if you want text incorporated at all, Midjourney is not really your thing. Also prompt adherence, I think, in the very little testing I did, actually got a little worse in V7. So, you know, if you do have much more complex prompts, I do think even something like GPT-4o image gen is a little better.

Starting point is 00:15:26 So it just depends on ultimately what you want. But, you know, as an example, if you are creating, or if your company is, you know, trying to create better multimedia with videos and things like that, Midjourney might be best for that, right? Because I think it's still probably the best starting point. If you are trying to create AI video and you are going text to video, or sorry, image to video, I still think Midjourney V7 is probably the best for most use cases, but for everything else, especially when it comes to prompt adherence,

Starting point is 00:15:57 when it comes to iterating on an original image, when it comes to text, Midjourney is still not it, y'all. Kimberly says underwhelmed. All right, next. I don't know why no one talked about this. We covered it in the newsletter and I put it out on the Twitter machine. This is actually huge. We have, like, mini-RAG now inside ChatGPT, and I'll explain what that is after I tell you what's new. So OpenAI is starting to roll out its internal knowledge access for ChatGPT Teams users. So right now it is only available for Teams users.

Starting point is 00:16:46 And right now the only thing available is Google Drive. So the new feature, and this is in your connectors settings if you are on a Teams plan, just started rolling out this past week, and it allows ChatGPT to retrieve real-time information from internal files anywhere in your Google Drive. And it can summarize content and create tailored outputs like demo scripts or summaries. So Google Drive is the first platform supported, with access gradually rolling out over the next few weeks. Let me just say this. Scary good.

Starting point is 00:17:24 Scary good. And you might be wondering, like, oh, Jordan, wouldn't you just use Google Gemini? It connects to Google Drive as well. It does, ish. So if I'm being honest, this is one area that Google Gemini still struggles in. I think, you know, even though Gemini 2.5 Pro might very quickly become my most used model over GPT-4o, because, y'all, let me just put this out there. Inside Google AI Studio, 2.5

Starting point is 00:18:06 Pro, million-token context window, the world's most powerful model. And it's available for free with a million-token context window. On the front end of Google Gemini chat, it doesn't have that million-token context window. Also, inside Google AI Studio, you can't turn off data sharing. So, you know, definitely don't use it with anything, you know, sensitive or proprietary. But it struggles. It really struggles. Google struggles, for whatever reason, with accurately pulling information from Google's own Google Drive.

Starting point is 00:18:37 ChatGPT Teams does a way better job. And it is extremely impressive. So if you do have a Teams account, you need to be logged in as the Teams admin. Go into your workspace settings and look for connectors. So it takes, you know, I actually don't know how long it takes. I just let it kind of sit there in the background. So it might take anywhere from, I don't know, five or ten minutes to a couple of hours to fully sync everything.

Starting point is 00:19:03 But then essentially, you can click a new button that says internal knowledge and anything in your Google Drive, instant access, extremely impressive. And the reason why is because it's all dynamic, right? So yes, inside Claude, even inside Gemini, there's certain instances that work great when you can upload files individually,

Starting point is 00:19:23 but it's not dynamic, right? And this is why I do think this might be the first consumer, you know, true mini-RAG system. So what that means is, anytime you're using a large language model, the things you always have to keep in mind are, well, your data, recency, and just basic prompt engineering 101. So having this feature inside ChatGPT Teams is huge. OpenAI does plan to expand support to other tools, such as CRMs, project management systems, and data analytics platforms, soon. But right now, it is just Google Drive. So you do have to be on a Teams plan,

Starting point is 00:20:04 which is $25 per person per month. I still believe you have to have a minimum of two users to have a Teams plan. But if I'm being honest, even if you're a solopreneur, or even if you're the only one using it, it's probably worth it to just pay for that extra license to use this feature alone, especially if you are a power user of ChatGPT.

Starting point is 00:20:29 Yeah, you've got to worry about security, as Big Bogey Face says. Yeah, definitely don't just throw docs in there haphazardly. Also, a good point. If you have that connected and if you're using it, you know, you really have to increase your personal responsibility as the expert in the loop, right? I think I'm going to stop saying human in the loop, FYI, because I really think it's about expertise in the loop. But you have to be much more vigilant

Starting point is 00:20:56 to see what ChatGPT is using and what it's not. All right. More ChatGPT news. This one might be some of the biggest news that snuck under the radar. So OpenAI has announced that they're kind of delaying their plans

Starting point is 00:21:12 for the much anticipated GPT-5, but also slipped in that, okay, we're actually going to be releasing two new o-series models in the meantime. So OpenAI has unveiled updates to its AI roadmap, including a new o4-mini model and also details about the now delayed rollout of GPT-5. OpenAI plans to release a new o4-mini model alongside the previously announced full version of the o3 reasoning model within, quote unquote, a couple of weeks, according to CEO Sam Altman's Twitter post. The o4-mini model is expected to serve as a next-generation successor to the reasoning models that we have right now, o1 and o3.

Starting point is 00:22:04 So yeah, I'm really interested to see what they're going to do. Are they going to have three versions of their o thinking models available? Because for some instances, I love o3-mini-high. That's actually been one of my workhorse models recently. But are we still going to have o1, o3, and o4 available? Because I still use and prefer o1 pro for certain instances, for which you do have to be on the $200 a month ChatGPT Pro plan. But o1 pro is the most powerful model I've ever used.

Starting point is 00:22:40 I think even for certain tasks, it's better than Google Gemini 2.5 Pro. But I mean, we'll see how much we actually get to keep. So what's with this GPT-5 delay? Well, GPT-5 has been described as more of a unified model incorporating all of the other models, you know, so advanced reasoning, voice functionality, canvas, search, deep research, tools, and everything. So at least what we've been told is GPT-5 won't be a new model per se, right, like GPT-4, GPT-4.5, GPT-4o. It's more going to be a system. And OpenAI has said that they will offer GPT-5 with tiered access: standard intelligence settings for unlimited use, higher intelligence levels for ChatGPT Plus subscribers, and even higher settings for ChatGPT Pro users. But also, let me mention why it's delayed.

Starting point is 00:23:40 Well, at least according to Sam Altman, he noted that the company has found it harder than expected to integrate all the features smoothly while maintaining performance, but improvements in GPT-5's design have exceeded initial expectations. So kind of telling both sides of the story, like, oh, it's actually going way better than we initially thought, but also, at the same time, we're finding it more difficult than we thought to fully incorporate everything. So previously, you know, essentially OpenAI said, yeah, we're not going to release any new models before GPT-5 comes out.

Starting point is 00:24:17 But change of plans here. So we're going to be getting an o3 full. And we're going to be getting an o4-mini. Personally, I'm not looking forward to this new GPT-5 system. And I don't think power users should be looking forward to it either. That's just me. I don't know. It's not out yet.

Starting point is 00:24:37 I would much prefer to not have a system decide what model to use. If I'm being honest, I know better. Right? If you are a power user that has, you know, used every single model, thousands of prompts, you know which model to use for which scenario, right? I know it like the back of my hand. I don't necessarily want a system deciding which model to send it to. I often use three or four models in the same project, going back and forth, model

Starting point is 00:25:10 switching. So, I mean, hopefully GPT-5 is smart enough to do an adequate job. I don't have a ton of hope, if I'm being honest. All right, more OpenAI news, just some bullet points here. So Sam Altman also tweeted that OpenAI is officially developing an open-weights model. So they might actually go back to putting the open in OpenAI, allowing businesses to customize AI without retraining, but stopping short of full open source, similar to Llama or DeepSeek. And then other ChatGPT updates. So the very viral and extremely impressive GPT-4o image gen was updated.

Starting point is 00:25:52 So there's a new version that rolled out. They didn't say a lot about it, except it takes more time to essentially think about creating the image before it, you know, gives you the image. Also, they rolled the image gen out to free users, which was previously delayed. And last but not least, they are giving ChatGPT Plus away for free to university students, all right, through May. So essentially, if you are a college student, you can get ChatGPT Plus, normally $20 a month, for free through May, you know, so we can delve into writing our final papers together with way too many emojis. Have we passed the Turing test? Apparently, a new study from UC San Diego's Language and Cognition Lab says that

Starting point is 00:26:50 OpenAI's GPT-4.5 model has convincingly passed the Turing test, sparking debate about artificial intelligence's ability to mimic human intelligence and its potential societal impact. So in the study, GPT-4.5 was mistaken for a human in 73% of cases during a three-party Turing test, significantly surpassing the random chance of 50%. So this marks a major milestone in AI's ability to simulate human-like behavior. So in this study, participants engaged in text-based conversations. All right. So this wasn't real time. It was text-based, with a human and an AI. Then the participants had to try to identify which was human and which was AI. So GPT-4.5, when adopting a specific persona, outperformed actual humans in being judged as human. That's wild. If you follow AI, the Turing test has kind of been this, you know, unofficial gold standard of AI development. And now we might have it. So persona prompts, though, were key to GPT-4.5's success, with

Starting point is 00:28:15 instructions to act like a young person knowledgeable about internet culture boosting its win rate to 73%. Without those persona prompts, its success dropped to only 36%. So GPT-4.5 with personas, at least according to this study, passed the Turing test, which is a huge deal. OpenAI's GPT-4o model, which powers the default version of ChatGPT, achieved a much lower win rate of 21%.

Starting point is 00:28:51 But maybe the most shocking thing of all this: the decades-old, original ELIZA chatbot that's like 50 years old, right? It was technically the first chatbot. I believe it was from, what was it, the 60s. It performed at a 23% success rate. So actually, ELIZA outperformed GPT-4o by a couple of percentage points.
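To put those win rates side by side, here is a small Python sketch that compares each system's reported win rate against the 50% chance baseline. The numbers are the ones quoted in this segment; the labels and the `vs_chance` helper are illustrative names only, not anything taken from the study itself.

```python
# Win rates as quoted in this episode: the percentage of conversations in
# which the judge picked the system as the human. Labels are illustrative.
WIN_RATES = {
    "GPT-4.5 (persona prompt)": 73,
    "GPT-4.5 (no persona)": 36,
    "ELIZA": 23,
    "GPT-4o": 21,
}

CHANCE = 50  # a judge guessing at random would pick the AI half the time


def vs_chance(rate: int, chance: int = CHANCE) -> int:
    """Return how many percentage points a win rate sits above (+) or below (-) chance."""
    return rate - chance


if __name__ == "__main__":
    # Print the systems from highest to lowest win rate.
    for name, rate in sorted(WIN_RATES.items(), key=lambda kv: -kv[1]):
        print(f"{name:26s} {rate:3d}%  ({vs_chance(rate):+d} points vs. chance)")
```

Only the persona-prompted run lands above the 50% line; the other three sit well below it, which is why the persona prompt matters so much to the headline result.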

Starting point is 00:29:20 But undoubtedly, GPT-4.5 crushed the Turing test, right? A 73% win rate, it's extremely impressive. And we've been saying this all along. So when GPT-4.5 came out, a lot of people were confused. And they were like, okay, this thing didn't crush every single benchmark ever. So why is it important? Empathy. EQ off the charts.

Starting point is 00:29:46 I also think this goes to show how a little bit of best-practice prompt engineering goes a long way, right? Having ChatGPT, or sorry, GPT-4.5, act as a young person knowledgeable about internet culture, something that simple, having it act under that persona roughly doubled its win rate, from 36% to 73%. So the implications of this study are significant, with the study's lead author noting that AI's ability to convincingly mimic humans could lead to automation of jobs, enhanced social engineering attacks, and broader social disruption. Yeah, I don't think this is necessarily a great thing for AI. It's actually a little concerning, right?

Starting point is 00:30:32 because all those, you know, scams are about to get a lot better with GPT-4.5. I guess luckily, in that regard, using GPT-4.5 via the API, right, so if it were to be used in a bad way, generally you'd use it via the API because you want to do it en masse, is still extremely expensive. But I do think that we're going to see a wave of new models in 2025 and 2026, like GPT-4.5, that are more tailored for emotional intelligence versus, like, standard IQ. And that's what's really going to trick humans. And that's where it gets useful in many regards, right?

Starting point is 00:31:16 Because then all of a sudden, you know, your AI powered customer support can be a little empathetic and emotionally intelligent. But at the same time, the other side of the coin can be extremely ugly. All right. Amazon. Don't forget about them. They've unveiled Nova Act, a new AI toolkit for autonomous web agents. So Nova Act is designed to create autonomous agents capable of performing tasks in web browsers.

Starting point is 00:31:47 So this move signals Amazon's intensified competition in the race to commercialize AI agents and enhance their functionality beyond simple chatbots. Yeah, I think people kind of forget about Amazon, even though, in the same way that OpenAI and Microsoft had this kind of relationship, right, with initially Microsoft being the biggest investor in OpenAI, hey, Amazon is the biggest investor in Anthropic. So you can't sleep on Amazon. But Nova Act, their new agentic AI, is part of Amazon's Nova AI initiative, which focuses on developing foundation models for various media and input types, including text, images, and video.

Starting point is 00:32:30 So the new toolkit allows developers to build AI agents that can complete step-by-step tasks in web browsers, such as submitting time-off requests or placing recurring online orders, without relying on APIs. Amazon claims Nova Act excels at handling complex interface elements like drop-down menus, date pickers, and pop-up dialogs, which are challenging for other systems. So the software package, available in Python, enables agents to follow natural language

Starting point is 00:33:01 instructions and operate in a behind-the-scenes mode for advanced business use. Developers can run multiple agents simultaneously to handle larger workflows, boosting efficiency for enterprise work. So Amazon's internal testing, and this hasn't been verified by third parties, has shown improved reliability compared to existing systems, but the company will monitor real-world performance closely. So Nova Act positions Amazon among competitors like OpenAI, Microsoft, Google, and Anthropic in the race to develop autonomous AI systems capable of completing real-world tasks. So yeah, if you don't follow the agent space closely, I'm probably going to do another dedicated agent show or two in the coming weeks, because the agent space has obviously been on fire this week. But I think a lot of people are also confused, like, what the heck is an AI agent?

Starting point is 00:33:55 What's the difference between an AI agent and using a large language model that has tool and internet access? Essentially, an agent is usually powered by a certain large language model. And an agent can autonomously make decisions on your behalf without your approval. Essentially, you are giving an agent agency, right? That's why they call them agents, right? You're giving it decision-making powers. And it can go off and complete multiple tasks in a single sequence without human intervention, connected to the internet, connected to tools, right?
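As a rough illustration of that description, the sketch below shows the basic shape of an agent loop in Python: something picks the next action, a tool runs, and the result feeds the next decision, with no human approval in between. Everything here is hypothetical. `choose_next_action` is a hard-coded stand-in for a large language model, and the two tools are toy functions, not any vendor's API.

```python
from typing import Callable, Dict, List, Optional, Tuple


# Toy "tools" the agent may call. A real agent would drive a browser or an API.
def search_flights(destination: str) -> str:
    return f"cheapest flight to {destination}: $120"


def book_flight(details: str) -> str:
    return f"booked ({details})"


TOOLS: Dict[str, Callable[[str], str]] = {
    "search_flights": search_flights,
    "book_flight": book_flight,
}


def choose_next_action(goal: str, history: List[str]) -> Optional[Tuple[str, str]]:
    """Hypothetical stand-in for the LLM: pick the next tool call, or None when done."""
    if not history:
        return ("search_flights", goal)
    if len(history) == 1:
        return ("book_flight", history[0])
    return None  # nothing left to do, so the agent stops


def run_agent(goal: str, max_steps: int = 5) -> List[str]:
    """Run tool calls in sequence with no human approval between steps."""
    history: List[str] = []
    for _ in range(max_steps):  # hard cap so the loop cannot run forever
        action = choose_next_action(goal, history)
        if action is None:
            break
        tool_name, tool_input = action
        history.append(TOOLS[tool_name](tool_input))
    return history


if __name__ == "__main__":
    for step in run_agent("Las Vegas"):
        print(step)
```

The difference from a plain chatbot with tools is the loop: the decision function, not the user, chooses each next step. The `max_steps` cap is one simple guardrail, and lowering it is a crude way to put a person back in the loop.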

Starting point is 00:34:32 That's a very oversimplified version of agents. But we'll probably do a dedicated agents show soon, just because there's so much new in the space. I don't know if you guys want it. Should I do that show as well? Let me know. But also, Amazon is launching a website to let developers and everyday users experiment with Nova Foundation

Starting point is 00:34:52 Models, which were announced back in December. All right. Our last big piece of AI news: on a Saturday, Meta unveiled Llama 4, the highly anticipated successor in its open-weight, open-source large language model lineup. So the release of Llama 4 features new open-weights models designed to push the boundaries of multimodal AI capability. So yes, now Llama 4 is multimodal by default,

Starting point is 00:35:30 an open-source multimodal large language model that benches actually very well. So there are four new models. Two of them are available now. So that is Llama Scout, which is the smallest model, and Llama Maverick. So yeah, apparently we went a bit Top Gun there. So they're available now, while two more are still in training. So that is the Llama 4 reasoning model. And then Llama Behemoth.

Starting point is 00:35:59 So they are slated for release soon. So yeah, Llama is sticking with its previous kind of Llama 3, 3.2, 3.3 releases, having a small, medium, and large variation, now with Scout, Maverick, and Behemoth, but then also adding that reasoning model. So the drop from Llama, the surprise drop, because we had reports coming out that Llama was facing some issues internally catching up with other open-source models in terms of benchmarks, but I don't know. Looks pretty good to me.

Starting point is 00:36:38 But the drop has sparked widespread excitement, particularly due to a 10 million token context window in the small Scout model, setting a new industry standard. All right. So, yeah, 10 million tokens. So we're not sure yet how well it's going to perform, right? In the same way, you know, Google Gemini 2.5 Pro has a one million token context window. It is wildly useful. But also, there's always going to be drop-off with these larger context windows because it takes longer.

Starting point is 00:37:18 If you're using it via an API, it eats up more compute, right? So the 10 million token context window, I think, has probably been the most popular piece of what was announced. But we will have to see in actuality how that plays out, because what a lot of people aren't talking about is it was trained on a 256K context window. So, you know, I would say we really have to wait until benchmarks show how well this small model can take advantage of that 10 million token context window. So a lot of people are, you know, shouting out right away, oh, RAG is dead.

Starting point is 00:37:56 Retrieval augmented generation is dead. I don't think it is. But I have been saying now for many months that I think in the future, retrieval augmented generation as we know it today will be less important than it has been in 2023, 2024, and so far in 2025 because of these longer context windows. Also, I do believe that most AI usage will become agentic and reasoning as well. Reasoning models, they eat up more tokens, hybrid models as well, because they're reasoning under the hood. And then when you talk about multi-agentic setups, I do think that RAG becomes a little

Starting point is 00:38:39 less important, but I do think that we're going to see a new and improved version of RAG that's more applicable for hybrid reasoning and multi-agentic models. But I think, obviously, larger context windows have something to do with that, but there is an offset to that, right? So you can't just think, oh my gosh, 10 million, you know, 10 million token context window, right, which is, uh, what is that? Like more than seven million words, uh, 7.5 million words roughly, right? So you're like, okay, I can just throw in, you know, dozens of books and, you know, like countless hours of transcribed videos and it's going to remember it every single time, 100%. No. Remember, expertise in the loop is still important. Uh, so, uh, Meta CEO Mark

Starting point is 00:39:25 Zuckerberg emphasized the company's focus on open source AI in his announcement video, stating that it aims to build the world's leading AI and make it universally accessible. So he expressed confidence that open source AI will dominate the field, with Llama 4 marking a significant step in that direction. So Llama 4 models are expected to power AI agents capable of advanced reasoning and action. So these agents will be able to surf the web and perform tasks useful for both consumers and businesses, potentially revolutionizing productivity tools. So Meta plans to host its first LlamaCon AI conference later this month on April 29th, showcasing AI advancements of Llama 4. All right. So we need to talk about the benchmarks. There's a lot of rumors swirling around, you know, people are doubting Llama's own internal benchmarks. I won't say that, and here's why. Humans have confirmed it. Third-party benchmarking services have confirmed it as well.

Starting point is 00:40:33 So as an example, the third-party benchmarking service, which is a great resource, Artificial Analysis, looked at non-reasoning models. Okay, so non-reasoning. So, you know, none of the OpenAI o3, o1 pro, Google Gemini 2.5 Pro, et cetera. So among non-reasoning models, uh, Llama 4 Maverick is in third place and pretty close behind, uh, GPT-4o

Starting point is 00:41:01 and DeepSeek V3, which were both just updated a couple of weeks ago. So actually, if not for those updates from GPT-4o and DeepSeek V3 that just happened, Llama 4 Maverick probably would have been the number one non-reasoning model on third-party benchmarks, right? So yeah, a lot of people, if you read the hoopla online, right? Because originally there were reports saying that Meta was facing delays. They weren't able to get the benchmarks that they wanted. But I mean, here you go, for an open source model that is safe to use. That's the other thing.

Starting point is 00:41:34 You know, if you want to use a model from China via the API or the web, I would highly advise against it. You know, it's different if you're downloading it, fine-tuning it yourself for safety reasons, or using a version of DeepSeek or other Chinese models that have already been scrubbed by a company like, you know, Perplexity or Microsoft Azure, et cetera. Then it's obviously safe to use. But, you know, Llama 4 Maverick, pretty impressive on the third-party benches. Also, if we look at the Elo scores from the LM Arena.

Starting point is 00:42:12 So this is human preference. So instantly, right away, Llama 4 Maverick, which again is that medium model, in testing is now the second most preferred model in the world. And I do think a lot of times, you know, benchmarks are important, right? But I also think maybe just as important is human preferences, right? Because models can essentially be overfit to perform well on certain benchmarks. But, you know, humans might not find the same utility that you might expect based on just benchmarks alone because of that overfitting problem, right? So I think Elo scores like the LM Arena, where you put in a prompt, you get two responses. You don't

Starting point is 00:42:58 know which is which and you vote for it. Right. After millions of votes, you start to get some clear winners in terms of which models are best for humans. And that's what matters. And Llama 4 Maverick very impressively leapfrogs a bunch of very capable proprietary models, right? That's the other thing. So yes, Llama is not true open source. It's not like an MIT license or something like that via DeepSeek. It's a little different. They have some restrictions. Llama does.

Starting point is 00:43:31 So it's under more of an open-weights, open-source-esque Llama license. But still, an open source model immediately goes to the second best model in the world, right, with a 1417 score on the Elo, the LM Arena. So Gemini 2.5 Pro, so good, by the way, 1439. Llama 4 Maverick, 1417. And then the updated GPT-4o, 1410.
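
[Editor's sketch] To put those three scores in perspective, the classic Elo expected-score formula turns a rating gap into a head-to-head preference probability. This is only a rough sketch: it assumes the arena scores behave like standard Elo ratings on a 400-point scale, and the leaderboard's own fitting procedure may differ. The function name `expected_score` is chosen here for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Classic Elo expectation: chance that A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Scores as quoted in the episode.
ratings = {
    "Gemini 2.5 Pro": 1439,
    "Llama 4 Maverick": 1417,
    "GPT-4o": 1410,
}

if __name__ == "__main__":
    names = list(ratings)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            p = expected_score(ratings[a], ratings[b])
            print(f"{a} vs {b}: {p:.1%} expected preference for {a}")
```

Under that assumption, gaps of 7 to 29 points translate into matchups only a few percentage points away from a coin flip, so the top three are closer than the ranking alone suggests.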

Starting point is 00:44:05 All right. So extremely impressive, you know, instant reactions to the new models from Llama. All right. That's a lot. What do you guys want? What do you guys want? All right. I don't know if I can do it tomorrow.

Starting point is 00:44:24 Maybe I can. But let me know what you want to hear more of. There was a lot. There was a lot there. So livestream audience, let me know. I'll probably put something in the newsletter as well. So let's do a quick recap, and livestream audience, let me know what you care most about, what we should cover next.

Starting point is 00:44:40 So here is a very quick recap of all the AI news that matters for the week of April 7th. So like we said, Microsoft unveiled just about everything new inside Copilot, rolling out handfuls of new powerful AI features. Midjourney finally released its V7 after more than a year of waiting for its AI image generation model. OpenAI sneakily is rolling out what I think is mini-RAG with its internal knowledge access for ChatGPT Plus users connecting to dynamic data in Google Drive. OpenAI also announced its kind-of-delayed plans for GPT-5. Bummer, but on the good side, they did say that they're going to be rolling out the full version of o3 and a new o4-mini thinking model in the coming weeks.

Starting point is 00:45:35 Next, OpenAI's GPT-4.5 in a study has passed the Turing test, rather convincingly. Amazon has unveiled Nova Act, its new autonomous web agent. And then last but not least, Meta unveiled multiple Llama 4 models. Two are already out. Two should be released soon. So a lot to cover this week. Let me know what you want to see more of. Also, please don't forget, if you're going to be at Google Next 2025 in Las Vegas,

Starting point is 00:46:12 let me know. I think I should actually have time to go and talk to a lot of, you know, different providers that are there at this Google Next conference, maybe attend a session or two, right? So I'm excited about this conference in partnership with Google. And then don't forget the Inception Games. We're going to have, yeah, the madness of March might be just about over. I believe the championship game is tonight. But our AI startup madness continues. We need your vote. It's actually very close. You know, we're going to have our final two back on for a show and, you know, we'll announce the prize and some of those other things

Starting point is 00:46:53 live on that championship show of the Inception Games. So if you have not voted already, make sure you go back and listen to episode 497, where we had our awesome eight group of AI startups in the Inception Games pitch their service to you all. So if you haven't voted yet, make sure to go do that. All right, that was a lot. I appreciate y'all. I would also appreciate you going to YourEverydayAI.com and

Starting point is 00:47:18 signing up for our free daily newsletter. Thank you for tuning in. Hope to see you back tomorrow and every day for more Everyday AI. Thanks y'all. Meet Firefly AI Assistant. Now live in Adobe Firefly, the all-in-one creative AI studio. Just describe what you want to create in your own words and the assistant handles the rest, orchestrating multi-step workflows across Adobe Creative Cloud apps,

Starting point is 00:47:45 including Photoshop, Premiere Express, and more in one conversational interface. You direct the outcome while the assistant accelerates execution. Stay in control with the ability to step in and refine at any time. See it today at firefly.adobe.com. And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going.

Starting point is 00:48:17 For a little more AI magic, visit YourEverydayAI.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
