Everyday AI Podcast – An AI and ChatGPT Podcast - EP 428: AI News That Matters - December 23rd, 2024

Starting point is 00:00:00 This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. After two weeks of a back-and-forth slugfest,

Starting point is 00:00:52 OpenAI and Google are maybe done announcing all of their big AI updates for the year. But that's not all that happened in the world of AI this week. Nvidia went really small, Anthropic released some kind of troubling research. And yes, obviously OpenAI and Google finished 2024 with a bevy of AI announcements. There was a lot that happened in the world of AI this week. And if you didn't catch every single piece of news, that's okay. That's what we're here for. What's going on, y'all?

Starting point is 00:01:31 My name's Jordan Wilson, and I'm the host of Everyday AI. This is your daily live stream podcast and free daily newsletter, helping everyday people like you and me not just learn what's happening in the world of AI, but how we can all actually understand it and leverage it to get ahead to grow our companies and our careers. So maybe that sounds like you. If so, you're in the right place. If you haven't already, please go to youreverydayai.com. There, you will be able to sign up for our free daily newsletter. And in that daily newsletter, we recap every single podcast episode. So when we have great guests, leaders from, you know, big companies, startups,

Starting point is 00:02:12 etc. We recap every single podcast episode in our newsletter as well as keep you up to date with all of the AI news and everything else that you need to know to be the smartest person in AI at your company. All right. So if you haven't already, please make sure to go do that. All right. So enough chit chat. Let's get into the AI news for the week. Almost every single Monday, we bring you the AI news that matters. So that's cutting through the fluff, you know, making sure you don't have to spend three hours a day trying to keep up with everything. You can just join us Mondays. And we cut through everything that happened in the world of AI.

Starting point is 00:02:50 And we say, here's what actually was announced, not just what the company said in a marketing or in a PR release, but here's what actually happened and why it actually matters to you. So let's get into the AI news that matters for the week of December 23rd. Sipping on the coffee. Let's get it, y'all. All right, first, pretty big one here. And not a lot of people talked about this, but Google has unveiled Gemini 2.0 Flash Thinking.

Starting point is 00:03:24 So Google made headlines this week with the announcement of its latest AI model, Gemini 2.0 Flash Thinking, which promises to enhance multimodal reasoning capabilities. All right, so this is kind of Google's answer to the o1 model. And before we get into it, I want to start having you all think about AI models in three tiers. So think of it like this. You have your big model, which is your most capable one, your small model generally built for speed or for developers to use on an API, so it's cheaper.

Starting point is 00:03:56 And now you have this kind of new tier, which is a reasoning model. So in the past week and a half, Google has updated all three of these tiers to its new Gemini 2.0. So yeah, we saw Gemini 2.0 experimental. And then we saw Gemini 2.0 Flash. So that's kind of the big, the small. And now we have the reasoner with Gemini 2.0 Flash Thinking. So Gemini 2.0 Flash Thinking supports up to 32,000 tokens of input and can generate 8,000 tokens in a response. So that's about 50 to 60 pages of text for

Starting point is 00:04:34 context there. So here's what's new. Well, it's the thinking mode. All right. So this offers improved reasoning capabilities compared to its predecessor, Gemini 2.0 Flash, which has only been out for like a week and a half. So users can access, here's the differentiator. Users can access the step-by-step reasoning through a drop-down menu. So yeah, this is all available on the front end of Google Gemini. So that's, well, actually, let me quickly explain what's available and what's not. All right. So on the back end, which is Google Gemini AI Studio, the back end has a lot more, a lot more. So actually, this new reasoning model is only available on the back end, although the front end of Gemini did get some nice announcements.
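For a rough sense of where that 50-to-60-page figure for a 32,000-token input comes from, here's a back-of-the-envelope sketch. The words-per-token ratio and the words-per-page range are assumptions for illustration only, not numbers from Google or from the episode.

```python
# Rough conversion from a token budget to pages of text.
# ASSUMPTIONS (not from Google or the episode): about 0.75 English
# words per token, and somewhere between 400 and 500 words per page.
WORDS_PER_TOKEN = 0.75


def tokens_to_pages(tokens: int, words_per_page: int) -> float:
    """Estimate how many pages a given number of tokens covers."""
    words = tokens * WORDS_PER_TOKEN
    return words / words_per_page


input_tokens = 32_000  # input limit quoted in the episode
low = tokens_to_pages(input_tokens, 500)   # denser pages
high = tokens_to_pages(input_tokens, 400)  # sparser pages
print(f"{input_tokens} tokens is roughly {low:.0f} to {high:.0f} pages")
```

Under those assumed ratios the estimate lands in the same ballpark as the figure quoted above; a different tokenizer or page density would shift it.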

Starting point is 00:05:26 So on the front end of Gemini, so if you log into Gemini.com, you will get the new 2.0 kind of big model. You will get the 2.0 Flash. And then what we covered last week, the deep research, which is running on 1.5. But if you do want to access this kind of reasoning model, this Flash Thinking, you do need to go to Gemini's AI Studio or Google's AI Studio. All right. So right now, independent analysis from LM Arena. So that's the chatbot arena that we always talk about. It does rank Gemini 2.0 Flash Thinking as the top-performing

Starting point is 00:06:05 model across all LLM categories. So that's pretty big. So these reasoning models we've seen, such as o1 from OpenAI or o1 Pro or o1 Mini, we've seen them benchmark much better. Well, that's because they're using more compute. It's using essentially more power to give you better results. It also takes longer. All right.

Starting point is 00:06:28 So unlike competitors, Gemini 2.0 can process images from the start, showcasing its versatility in handling different data types and formats, although the new o1 updates do allow you to take image input as well. So right now, here's the downside. You might be thinking, okay, well, where's the downside? Well, like I said, you can't use Gemini 2.0 Flash Thinking on the front end. Also, it doesn't currently support integration with Google search or Google's own tools, which may limit its immediate application. So yeah, if you think that you can use this with real-time information from the web, you cannot. So this advancement, though, positions Gemini 2.0 Flash Thinking as a serious contender against OpenAI's o1 Pro and o1 models. So live stream audience, what's going on? Have any of you used Gemini 2.0 Flash Thinking?

Starting point is 00:07:25 The other thing is, it's free, right? So Gemini AI Studio is free. I do have to put this word of caution out there, though. People are always like, oh, why is it free? Well, you can't turn off data training in AI Studio. You can obviously on the front end of Google Gemini, but keep in mind, there's usually a price to pay even for a free tool. Meta got in the mix. So Meta has unveiled some new enhancements to its Ray-Ban smart glasses. So Meta announced some updates to its Ray-Ban smart glasses, introducing some new AI

Starting point is 00:08:02 features that enhance the user experience and functionality. So the updated Ray-Ban Meta smart glasses now include AI video capabilities and real-time language translation. Those were all demoed a couple of months ago. They weren't released with the first update, but now they are available. So the features were first revealed during the Meta Connect conference in September, and they're now available to members of the Early Access Program. So not available to everyone per se, but if you are in Meta's Early Access Program, the new v11 software update is what it's called.

Starting point is 00:08:39 It began rolling out the middle of last week. It does enable the glasses to process visual information and respond to user inquiries in real time. In terms of this language translation, so you can now communicate in English, Spanish, French, or Italian, with translations provided through the glasses' speakers and also displayed on your phone through the Meta app. So additionally, the update includes the integration of Shazam, allowing users to identify songs through the smart glasses. All right. So I'm excited about this. I still have, I haven't set up, this is terrible.

Starting point is 00:09:18 I haven't set up my Meta Ray-Ban smart glasses just yet. My wife got them for me for my birthday. I've been nonstop. I mean, all this stuff over the last two weeks with everything that OpenAI and Google have been announcing, it's left me very little time to do anything else. Also, if you do have access to the early access program from Meta, and if you've used these new live updates, let me know. I don't think I'll have access yet to the early access program, but maybe I'll have to reach

Starting point is 00:09:50 out to my friends there at Meta and see if I can't do that. Maybe we'll do a show. I know Dr. Harvey Castro last year had his Meta glasses. I think he did live on our show, which was pretty fun. All right. Salesforce got in the mix. All right. Let's talk about Salesforce.

Starting point is 00:10:13 They announced Agentforce 2.0. All right. Yeah. I know we just got Agentforce 1.0 like two weeks ago. But Salesforce has already announced the upcoming general availability of its Agentforce 2.0 platform, which is set to launch, sorry, in February 2025, with some features rolling out as early as this month. So the new version of Agentforce includes a library of pre-built skills and improved reasoning

Starting point is 00:10:46 and data retrieval capabilities, enabling agents to handle more complex queries effectively. So early adopters of Agentforce 2.0 include major companies like Accenture, IBM, and Indeed, showcasing its appeal among large enterprises. So obviously, yeah, if you are a Salesforce company, if you use Salesforce for sales, this is going to be a pretty big one for you. So customers can also deploy the platform, the Agentforce platform, within Slack starting in January. So you won't have to wait until February for that. According to Salesforce, that will be pushed out in January.

Starting point is 00:11:25 Also, people are, you know, super excited about this. And this is great. However, a new survey by Capgemini did reveal that over 80% of executives plan to implement AI agents within the next three years. All right. Also, a pretty interesting approach here from Salesforce. And I know I've kind of made small jokes on this in the past, y'all. But the company, this is great, right?

Starting point is 00:11:53 The company does plan to hire 2,000 humans to sell Agentforce. It's funny, right? A little bit ironic, because what Agentforce is supposed to do is use autonomous AI agents to sell. So pretty interesting here that Salesforce has decided to not eat its own dog food and is hiring more humans to sell Agentforce, which is the AI that is supposed to sell better or in addition to humans. All right. So while the potential for AI in the workforce is vast,

Starting point is 00:12:30 analysts cautioned that without robust security and governance, enterprises may also face huge security risks, with Gartner predicting that AI agent misuse could lead to 25% of enterprise breaches by 2028. A whole new category of cybersecurity to be worried about, which is agentic AI going off the rails. Nvidia went small. Everyone else went big.

Starting point is 00:13:02 Nvidia went small. So, Nvidia launched Jetson Orin Nano. All right, so this is essentially, you know, like Raspberry Pi, like the $30 computer that's been around for, I don't know, a decade. This is kind of like that, but for AI. So, Nvidia has unveiled the Jetson Orin Nano Super Developer Kit. That's a mouthful. A powerful new AI computer priced at $249. It's wildly cheap, all right? So making advanced AI processing more accessible to hobbyists and

Starting point is 00:13:43 developers alike. So the new Jetson Nano boasts neural processing capabilities of 67 TOPS. So that's 67 trillion operations per second, which is, it's kind of weird that, you know, we're talking about something that can process 67 trillion operations per second and it's priced at $249. All right. So that's a 70% increase over the previous model's 40 TOPS. It also features, sorry, 50% more memory bandwidth, allowing for faster data processing and improved operational efficiency. So the kit retains the same hardware as the original Orin Nano, but will benefit from

Starting point is 00:14:29 a new JetPack update that enhances performance through improved power management. Does anyone remember that Jetpack game? I used to love that. I forgot. Was that a computer game? It couldn't have been a smartphone game. I don't know.

Starting point is 00:14:45 Is that from the late 90s, early 2000s? But when I saw this announcement from Nvidia CEO Jensen Huang, all I could think about was the old Jetpack video game. So, Nvidia's new power mode boosts the GPU, memory, and CPU, contributing to the performance gains. So, yeah, if you're wondering, how the heck can a $250, you know, essentially small mini AI computer that fits in the palm of your hand, how can it do all that? Well, you know, Nvidia knows a thing or two about getting the most out of their hardware. So, Nvidia describes the Nano Super as an ideal solution for creating chatbots, visual AI agents, and AI-based robots,

Starting point is 00:15:31 expanding the possibilities for developers in the AI space. So yeah, you might see a bunch of these Jetson Nanos in, you know, humanoid robots. You know, people might be stacking 10 of these to create a super, you know, a super capable, you know, home PC that can run, you know, edge AI. So there's a lot of different use cases for this, but it's pretty cool to see this because not many people out there

Starting point is 00:16:01 are focused on creating affordable hardware to run AI locally. But I think the big play here is devices. I think this is going to bring essentially smart AI in the future, maybe not with this device, but in future versions of the Jetson Orin Nano, right? I think it's going to continue to obviously get smaller and smaller and get more and more powerful. But I think this is ultimately,

Starting point is 00:16:25 and this new category of PCs that I think Nvidia is here creating, is going to bring edge AI to all of our devices. Think now, right, and this is the manufacturer's price. So if you're buying them, you know, in bulk, you know, if you're buying tens of thousands or hundreds of thousands of these, I assume it's going to be much cheaper. But think, this is what's going to bring edge AI, even though I don't think it belongs here, right, to our microwaves, to our, I don't know, headphones, right? This is the type of technological advancement that's going to make it all happen.

Starting point is 00:17:02 I think so many of the other big companies, yes, they're creating silicon for faster AI. But, you know, Nvidia is taking the lead here in putting edge AI in a form factor that you can plug and play, right? Everyone else is, you know, really focused on the GPUs and the NPUs, right? All of these essentially AI chips, whereas Nvidia here, this is a plug-and-play device, right? So I do see this as being a big trend in 2025. And I'm sure all of Nvidia's competitors will probably announce something similar, if not release something.

Starting point is 00:17:43 And I know there's already some competitors in this space. But I think with this one, it's pretty big, right? Just if you look at the benchmarks, if you look at the power, right? 67 trillion operations per second in an edge AI device that can plug and play for $249. Unheard of, right? If you would have thrown those specs out four or five years ago, someone would have slapped you in the face and said, this is fantasy. It's not.
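To put the quoted specs side by side, here's a small arithmetic sketch using only the figures mentioned in the episode (67 TOPS for the new kit, 40 TOPS for the previous model, $249). The function names are made up for illustration, and the TOPS-per-dollar number is just a convenience ratio, not an official Nvidia metric.

```python
# Figures quoted in the episode.
NEW_TOPS = 67     # trillion operations per second, new kit
OLD_TOPS = 40     # previous model
PRICE_USD = 249   # quoted price of the new kit


def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, as a percentage."""
    return (new - old) / old * 100


def tops_per_dollar(tops: float, price: float) -> float:
    """Illustrative ratio only: quoted TOPS divided by quoted price."""
    return tops / price


increase = percent_increase(OLD_TOPS, NEW_TOPS)
ratio = tops_per_dollar(NEW_TOPS, PRICE_USD)
print(f"Increase over previous model: {increase:.1f}%")
print(f"TOPS per dollar: {ratio:.3f}")
```

The increase works out to about 67.5%, which is where the roughly 70% figure quoted earlier comes from once you round.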

Starting point is 00:18:14 It's here. All right. Let's keep going. This one might be actually the biggest news of the week. Samuel, thanks. Samuel said Jetpack Joyride was iPad and iPhone. Thanks. All right, but the biggest news of the week could be Veo 2.

Starting point is 00:18:35 I still don't know if it's V-O or Veo. I've heard it called both ways, but regardless, Google has announced Veo 2, challenging OpenAI's Sora in AI video generation. So Google has announced the launch of its new AI video generation model Veo 2, which is now available for users to join a wait list. All right, so Veo 2 promises to produce high-quality videos across various subjects and styles, emphasizing its advanced understanding of real-world physics and human movements. So aside from having great physics and being able to showcase movement much more naturally

Starting point is 00:19:17 than any other AI video tool out there, another big feature from Veo 2 is, well, it can generate videos up to 4K. So to compare, OpenAI's Sora Turbo, which is what was released to the public about 10 days ago or two weeks ago. And again, that's Sora Turbo, not the full Sora model. I think people are overlooking that. But right now Sora can only produce 1080p, where now Veo can produce 4K. So according to Google, looking at some user testing, Veo was preferred over Sora Turbo in about 59% of cases based on evaluations from over 1,000 prompts in videos, suggesting it may have an edge in quality.

Starting point is 00:20:08 So Google plans to expand Veo 2's capabilities to platforms like YouTube Shorts and other products in the coming year, including a broader integration of this technology. So to access Veo 2, number one, you got to sign up on the wait list. You got to get lucky because very few people have been given this kind of trusted tester capabilities. So you have to be over 18. You have to live in the U.S. right now. So sorry, rest of the world. And you have to sign up on the Google Labs website.

Starting point is 00:20:44 Here's the thing, though. Let's call it what it is. Veo 2, much better than Sora. It's not close. It's much better than everything. So people are talking about this. So I wanted to address this. Everyone's like, oh, we had to wait for Sora for like nine months, right?

Starting point is 00:21:06 And here we go. A week after Sora was released, Veo 2 comes out and it's much better. Yes, there's no denying it. Head to head, right? Not every single time. So according to Google, about 59% of the time, users prefer Veo. Here's the thing, though. The same people bellyaching that we had to wait nine months for Sora.

Starting point is 00:21:32 Do you notice this is Veo 2? Guess who signed up for Veo 1? Me, half the world, right, or half the AI world signed up for Veo 1. Veo 1 was not publicly released before they started releasing Veo 2. So I don't know, right? Google, let me say this. I think Google won December. If you're looking at Google versus OpenAI, it was a slight edge, but from the LLM side,

Starting point is 00:22:10 Google released, Google shipped, right? We got 2.0 experimental. We got 2.0 Flash. We got deep research. We got 2.0 Flash Thinking, right? We got all the LLMs, but everything else that they announced, right? Project Mariner, Project Astra, Veo. None of that's publicly available.

Starting point is 00:22:32 Right. So did Google have a great month? Absolutely. Are we going to see Veo 2 anytime soon? I don't know. Right, right? At least in terms of it being generally available, I don't know.

Starting point is 00:22:45 Because they announced Veo 1 in May. And they started to release Veo 2 before they had even released Veo 1. So I don't know if the entire AI world is going to get access to Veo 2 anytime soon, right? And again, what OpenAI even released was Sora Turbo, right? Which if you look at their naming and their marketing approach over the last two years, turbo is obviously a faster model, but it's generally less capable. So, I don't know, we'll have to wait and see, but the AI video wars aren't going anywhere anytime soon.

Starting point is 00:24:27 All right, Michael, dropping a hint. He said, tip, the Google team is very active on their Discord channels. All right, so there might be some more info in there.

Starting point is 00:24:59 Thanks, Michael. I'll have to go check that out. Here's a kind of alarming AI story. So new research from Anthropic reveals that AI models can deceive through alignment faking. So a recent study from Anthropic highlights a concerning behavior in AI models where they can pretend to adopt new principles while secretly adhering to their original preferences. So this study found that models like Anthropic's Claude 3 Opus can engage in a behavior termed alignment faking, where they simulate alignment with new principles while actually maintaining their original behaviors. This sounds like a seven-year-old, right?

Starting point is 00:25:43 So in controlled experiments, Claude 3 Opus attempted to fake alignment 12% of the time when instructed to answer potentially harmful questions, even though its original training aimed to avoid such responses. So researchers noted that when Claude 3 Opus was retrained on conflicting principles, it exhibited alignment faking at a much higher rate of 78% in some tests, indicating a significant challenge in ensuring true alignment with safety measures. The implications of this research are huge, and it's suggesting that developers could be misled into believing a model

Starting point is 00:26:26 is aligned with their own safety training, when in reality it may just be faking it to avoid retraining. This is weird. So the Anthropic study, interestingly enough, was co-led by former OpenAI safety researcher Jan Leike, who left OpenAI a couple of months ago for Anthropic. So the study emphasized the need for further investigation into this behavior as AI systems grow more sophisticated. This is important to note, though, not all models that they tested exhibited this behavior. So others, including OpenAI's GPT-4o and Meta's Llama 3.1, showed little to no alignment faking,

Starting point is 00:27:15 suggesting variability among AI systems. But yeah, apparently Claude 3 Opus, which is Anthropic's, in theory, their most powerful model, even though it hasn't been upgraded yet to the 3.5. So the best model out right now for Anthropic is obviously Sonnet 3.5 new, or some people are calling it 3.51 or 3.6, right? So the Opus model hasn't yet been updated from the 3.0 series, which is interesting. And maybe it has something to do with this alignment faking. Regardless of whether that's the case or not, this is a super concerning study from Anthropic. First, hats off to Anthropic, right?

Starting point is 00:27:56 They're constantly, I think OpenAI does a great job of this as well, but I think OpenAI really focuses on its own models a little bit more when it does its own research looking into model behavior. I think Anthropic does this a little more frequently, at least according to the size of the team, et cetera. But Anthropic really takes a, I think a broader, more holistic view. And this is concerning AF, right? This is essentially saying that, hey,

Starting point is 00:28:26 even when we train models on a certain data set or if we retrain them for safety purposes, right, a lot of times it goes back to its original behavior. Let me read that one stat one more time. So researchers noted that when Claude 3 Opus was retrained on conflicting principles, it exhibited alignment faking at a much higher rate of 78% in some tests. All right. Retraining is important, right? Humans can never get it right.

Starting point is 00:29:00 AI can never get it right. You know, model training, humans training AI models on safety, guardrails, new data sets, etc. It's extremely important, but it's an iterative process. That's something people don't understand. People think, oh, a model comes out and there's no more, you know, training or corrections until the next model. That's not the case. So when, in this case, they're saying, oh, if Anthropic

Starting point is 00:29:25 finds a problem with Claude 3 Opus and they have to retrain it. 78% of the time, let's say they accidentally allowed, you know, something bad to get through to a model that the public is using. And they go to retrain it. 78% of the time, it's going to fake, right? Oh, hey, hey, Claude, we, like, accidentally told you A equals B. Turns out A equals B is kind of harmful. So now please keep in mind that A equals C.

Starting point is 00:29:52 All right. Publish. Okay, 78% of the time, it's going to fake it. Yikes. Not good. Not good. Yeah, Denny says alignment faking equals lying. Sure. Kind of, right? I think it's slightly more complicated than that. But alignment, because it's not always things that are bits and bytes, right? It's not always in instances where there's yes and nos. Sometimes it's just taking things down a path that you may not want the model to go down. It could just be against guardrails that you set for the model, things like that. So it's not necessarily that it's just lying to you.

Starting point is 00:30:39 It's not necessarily that it's a hallucination. It just might after retraining. That's all it is, right? So if something gets wrong in the first version of a model, it doesn't mean it's right, sorry, it doesn't mean it's right or wrong from a facts perspective. It might just be as an example, good versus bad, safe versus dangerous, right? So it's not necessarily lie versus truth, right? When we talk about alignment, it's more of desired behaviors.

Starting point is 00:31:11 It's more of adhering to guardrails versus, oh, it's hallucinating, it's lying. That can be the case, yes, but it's a little more complicated than that. All right. Next piece of AI news. Even more updates, and a lot of people didn't see this one. Google search has introduced and has started to preview its new AI mode to compete with ChatGPT search. So Google is set to enhance its Google search functionality by introducing an AI mode that closely resembles its Gemini AI chatbot. So reports indicated that the new AI mode in Google search will allow users to switch to a chatbot-style interface, providing a more interactive search experience.

Starting point is 00:31:56 So screenshots of early tests reveal a new shortcut button for the AI mode within the Google app, which could streamline access to AI-driven search capabilities. So the AI mode was discovered in a beta version of the Google app, suggesting that Google is actively working on integrating artificial intelligence into its search platform. The feature may include the ability to refine searches with follow-up questions, enhancing the overall search experience for users. Additionally, Google has been testing the option to attach files to searches, indicating a broader integration of AI functionality within its services. So this is very similar to if you were to now just go to chatgpt.com, even if you're not logged in, right?

Starting point is 00:32:50 So now if you go to chatgpt.com, even if you're not logged in, it does kind of work a little bit more like a search engine, right? Right away, you can just enter a query. It could be a, you know, what you'd call a short tail query, right, restaurants near me, or it could be something, you know, why is my dryer still broken after three repairmen have come out, right? So we might be getting away from traditional search, which is interesting, right? Because, and this is a bigger topic to tackle, all of these AI searches, right? Whether you're talking about ChatGPT search, whether you're talking about using Google Gemini in this way, Microsoft Copilot, right? Even Meta has a great integration to real-time results from the web.

Starting point is 00:33:41 Even deep research. I talked about this last week. Deep research from Google. What happens then? What happens with this new AI mode? If it just gobbles up information from all these websites and those website users don't get a click. I think this is going to be disastrous because clearly this is what users want. Right?

Starting point is 00:34:04 But who's going to feed the models? Who's going to feed the models, right? What happens when more and more people start blocking their websites that have high quality information that all of these, you know, as an example, ChatGPT search, Microsoft Copilot, Meta Llama, you know, this new AI mode in Google. What happens when so many high quality publishers start blocking access? Right? These big tech companies have to start sharing that money with publishers. Because generally, the way publishers, the way the internet works, right?

Starting point is 00:34:40 You go to a website and that person, probably they might have an ad on their site so they get a little money, right? You might opt into their email newsletter. You might buy a product or service, but that's what makes the internet world go round. So what's going to happen now if we see a Google AI mode, right? And I think that's the big one because Google has historically, at least over the last decade, controlled more than 90% of search. So what happens then?

Starting point is 00:35:07 If you stop, if all these content publishers, all these big media publications, stop getting users to their website because we see things like Google AI mode. I don't know. It's worth talking about. All right. ChatGPT updates. There's a lot, y'all. We're going to kind of combine them here.

Starting point is 00:35:30 So ChatGPT search, since we just talked about it. Big update there. So OpenAI wrapped up their, essentially, 12 Days of OpenAI, or 12 days of Shipmas, on Friday. There's technically a little surprise Day 13 with free Sora relaxed mode. You know, so getting unlimited generations for relaxed mode for a short period. All right.

Starting point is 00:35:54 So OpenAI wrapped up their 12 Days of OpenAI. And a lot of the updates were to either the ChatGPT desktop app, the iOS app, or ChatGPT search. So let me just kind of round up kind of the rest. So there was kind of a mini dev day, you know, all the goodies and treats for developers. o1, OpenAI's reasoning model, is now available via the API. So a lot of API developer stuff, but for the everyday person, here's what you're going to see.

Starting point is 00:36:22 So ChatGPT search is now available to all users globally, including those with free accounts. So that was a big update. So users can activate the search feature by clicking that little globe icon in the compose bar, allowing them to receive live updates from the web in their responses. A lot of new updates as well to the new advanced voice mode. So we actually started that out two weeks ago, but now advanced voice mode finally has access to ChatGPT search. Here's why that's important.

Starting point is 00:36:58 Advanced voice mode, I think, was more of a party trick, right? And yeah, there's the video now, which is really cool. It's like, hey, look at this, right? But without access to the web, it was just that party trick. I would never use it for anything meaningful. And I would never suggest any businesses use it for anything meaningful. Right. The same way I tell companies, you shouldn't be using Claude, you know, on the front end,

Starting point is 00:37:26 at least. It's different if it's on the back end because Claude's not connected to the internet. And just about anything that you're doing with business changes literally by the second. So if you're an enterprise company saying, oh, I'm going to give, you know, 30 people on my team access to Anthropic's Claude. We're going to use it on the front end. Bad idea. You're going to get bad, old information.

Starting point is 00:37:45 The same thing was true with ChatGPT's advanced voice mode. It's like, oh, yeah, that's cool. You know, use it on your desktop and, you know, have it work along with you. But don't, right? Because it was using old information. It didn't have the newest knowledge. Now it does. So that's one of the big updates is now the advanced voice mode

Starting point is 00:38:07 does have access to ChatGPT search. They announced this middle of the week. It took about five days for me to get this. But a couple things that you need to know that I didn't see covered, and it definitely wasn't in OpenAI's demo. So it does work with both video and normal voice mode. So that new kind of neural AI agent that you can talk to, or not AI agent, but this new kind of AI that you can talk to, sounds super human,

Starting point is 00:38:33 super realistic, very low latency, right? You can cut them off. It's great. So now it has access to the web. However, it's not as OpenAI demoed. So go try it for yourself. Let me know. Maybe even some of our live stream audience,

Starting point is 00:38:47 if you're listening on your computer, go try it out for me real quick. So now when you do this, if you ask ChatGPT advanced voice mode for recent information, you get the old search sounds. So what that means: in the demo,

Starting point is 00:39:02 it was almost instantaneous. And it wasn't like there was this query period. But now you get these little sounds from the standard voice mode that happen when it is essentially querying the web. So a little bit different than the demo. You get these little click sounds and you have to wait about two to four seconds when you are asking for real-time information from the web. All right. A couple other updates, a handful of other helpful updates from ChatGPT, including those that I just mentioned, including some major improvements to its work with apps functionality. So yes, even in voice mode, you can now use the ChatGPT app to work with

Starting point is 00:39:43 and see files that you're working on from a list of third-party apps. So if you are using, as an example, you know, the ChatGPT app on Mac, right? And maybe you're working in the terminal or, you know, now it integrates with some of Apple's kind of files as well, you can talk to ChatGPT and ChatGPT can see the contents of those files from certain third-party programs. So the combination of advanced voice mode, real-time, and also this work with apps, a lot of nice updates from OpenAI. But the biggest update, I think, for the past week is the new reasoning model from OpenAI, O3. Yes, they skipped O2, some issues with a, I believe it was a British telecom company.

Starting point is 00:40:40 So it looks like OpenAI is going from its O1 reasoning model to O3. So OpenAI has announced its new O3 reasoning models, which have generated excitement and skepticism within the AI community regarding their potential to approach artificial general intelligence. That's what this is ultimately about here, y'all. Everyone's talking. Do we have AGI yet? I'll say no. All right. I'll just keep it easy. I'll say no. But let's talk about this model a little bit. So the O3 models were introduced as a major advancement in AI with CEO Sam Altman claiming they represent a new phase in the technology, right? Not in OpenAI's models, but in AI technology. So there is a benchmark, ARC AGI, right? So the creator of this benchmark, and I'm going to tell you here in a minute why this is important, the creator of the ARC AGI benchmark emphasized that while the O3 models are impressive, they still struggle with a number of simple tasks, indicating true AGI has not yet been achieved. Yeah. So essentially, there's a lot of talk right now.

Starting point is 00:41:56 You know, when OpenAI announced this, they announced the benchmarks. They announced some of the results with this ARC AGI, which is essentially, there's a prize and a test for essentially achieving a certain score on this ARC AGI benchmark or this ARC AGI prize competition to simplify it. Right. So a lot of people are like, oh, we have AGI now. I don't think we do. But let's talk a little bit more.

Starting point is 00:42:26 So the ARC prize organization recognized the O3 models as a significant step forward, highlighting their novel ability to adapt to a task, which was previously unseen in the family of GPT models. So the ARC AGI benchmark assesses an AI's capacity to generalize and efficiently acquire new skills, making it a critical measure of progress toward AGI. So the O3 announcement follows a period of concern in the AI community regarding what a lot of people talked about, the slowing pace of advancements for large language models, particularly in light of scaling laws that predict

Starting point is 00:43:07 diminishing returns on AI performance as models grow. So despite these concerns, the O3 models demonstrate that there is still room for improvement in AI, potentially leading to more sophisticated chatbots and enhanced problem-solving capabilities. Yeah, also reportedly, this new O3 model has like a tested IQ of like 157 or something absolutely bonkers like that. Again, that's just, you know, some unofficial ramblings on the internet. That's not part of a research paper or anything yet. So you might be thinking, oh, can I go get the O3 model? No, you absolutely can't. So this is, again, this is one of those waitlist things. It's not even really a waitlist. This isn't open to the general public yet.

Starting point is 00:43:53 So the O3 models, because there's kind of a high-powered one and a low-powered one, you know, the same thing, kind of like a big and a mini. So the O3 models are currently only accessible to approved safety researchers who are tasked with exploring their safety and security implications, with no public release date confirmed. OpenAI's previous reasoning model, O1, was just released or, sorry, was just announced like three months ago. And the new O1 Pro model was just announced and released like, or sorry, yeah, like two weeks ago.

Starting point is 00:44:33 I've been playing with O1 Pro. I love it. You know, also, if you want me to run any O1 Pro prompts, you know, feel free to, you know, just leave them in the comments here if you're on the live stream, if you want to see what O1 Pro is capable of. But, you know, pretty impressive that now we're already seeing O3. Again, just an announcement, but it will, according to OpenAI, be rolled out to a select few safety researchers.

Starting point is 00:44:58 And I think what we're going to see is this O3 model once it's out and once it has access to all of OpenAI's other tools, right? Tool use, I believe, is a big step in even considering if a model could be considered to achieve artificial general intelligence. So what that means is when a single model is essentially smarter and more capable of doing everyday tasks across any domain than any smart human, right? And you can make the argument, and a lot of people have been making the argument earlier in '24, that we've already achieved artificial general intelligence. I don't think we necessarily have. It just depends because the goalposts are constantly moving.

Starting point is 00:45:45 If you look at definitions of AGI from 15 or 20 years ago, we've definitely achieved it. But the definitions keep changing as the technology keeps changing. So I don't think, as an example, this O3, at least what we saw on Friday from CEO Sam Altman and team, I don't think this is necessarily AGI, even though it did achieve the highest score ever on the ARC AGI kind of benchmark there, which I think, at least for now, is kind of the bar that a lot of people have set. So even though OpenAI has kind of passed this benchmark, the ARC AGI, I don't think that necessarily means that we've achieved AGI. I do think the first big step is tool use. All of these, right, whether we're talking about O1, O1 Pro, O3, right, whatever, I don't think you can even start having those conversations until it has access to all of the tools, right?

Starting point is 00:46:42 You need access to advanced data analysis. You need access to be able to, you know, take in inputs, images, and videos, right? Because a human, right? If I play a video and I put a human and an AI and say, hey, tell me what this video means. Right now, the O1 models can't do that. Some of the Gemini models can, right? But you have to look across all spectrums. You have to look across different inputs and outputs.

Starting point is 00:47:12 I think until we actually have that conversation and we can say, yep, we've achieved AGI. I do think these O1 models, O3 models need access to advanced data analysis. They need to be able to, you know, accept inputs of different types, you know, PDFs, Excel sheets, different types of code. It needs to be able to render things in real time. So I think a lot of people are talking, oh, we've achieved AGI, I'll say probably not yet, although this O3 model could be the one that ultimately does it, right? It could be the one, but we might not get access to it for six months, for 12 months. And it might be another couple of months or a couple of quarters or even a year or more until these most capable O models get access to all the other tools that I think they would actually need to say, yep, we've achieved AGI.

Starting point is 00:48:00 And right now, the price on the O3 high compute is bonkers. I mean, some of these tests, like single prompts were costing thousands of dollars. I mean, very difficult. But in the ARC AGI, it was costing hundreds of thousands of dollars to run it. So the costs are obviously going to go down drastically. But right now, the O3 model looks nice. But I think right now it's just a shiny toy. Who knows if we're even going to get access, if the general public will even get access to it in 2025.

Starting point is 00:48:33 All right. That is the AI news for the week. Let me quickly recap it. So Google has unveiled Gemini 2.0 Flash Thinking. Meta updated its Ray-Ban smart glasses for those in the early access program to have live AI video capabilities. Salesforce announced Agentforce 2.0, which will launch in February 2025. NVIDIA has launched the $249 Jetson Orin Nano, which I think is really going to change edge AI. Google unveiled Veo 2, way better than Sora, but when will we get access?

Starting point is 00:49:09 Who knows? Anthropic released some new research that said AI models can be pretty deceptive through alignment faking. Google search is reportedly introducing an AI mode to compete with ChatGPT search and other AI search. OpenAI rolled out a ton of other updates. A lot of them, I think, in advanced voice mode and ChatGPT search. And then OpenAI unveiled the new O3 reasoning models, which I don't think most of us are going to get access to anytime soon. All right. That was a lot. Maybe you were on the treadmill and you missed a thing or two.

Starting point is 00:49:44 You can always go to our website at YourEverydayAI.com and sign up for the newsletter. We'll be recapping all of these stories as well as everything else you need to stay up to date. If you're listening on the podcast, thank you. Please subscribe to the show. Tell your friends, leave us a rating. We'd really appreciate it. If you're joining us live on LinkedIn, Twitter, YouTube, whatever, share this with your friends.

Starting point is 00:50:08 Please, people are always like, hey, Jordan, this has been so helpful. How can I help? Tell someone about it, right? I know this might be your big secret on how you're the smartest person in AI at your company, but we'd appreciate you sharing the love. That's why we do this every single day. We keep it free, accessible because AI is tough to keep up with. That's why we do this thing.

Starting point is 00:50:25 Also, we're going to take a little break. All right. So if you're a normal newsletter reader, the newsletter's going out today, but Tuesday through Friday, we're putting a little pause. It's been an exhausting last couple of weeks. But we will be back next week with the newsletter. Thank you for tuning in. Hope to see you back tomorrow.

Starting point is 00:50:47 Well, later next week. And every day after that for more Everyday AI. Thanks, y'all.

Starting point is 00:51:16 And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit YourEverydayAI.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
