Everyday AI Podcast – An AI and ChatGPT Podcast - EP 379: AI News That Matters - October 14th, 2024

Episode Date: October 14, 2024

Is OpenAI gonna lose money for 5 more years? Will Tesla be able to use AI to solve transportation problems? Why is Microsoft going all in on AI in healthcare? We discuss this week's AI News That Matters.

Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Ask Jordan questions on AI
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn

Topics Covered in This Episode:
1. Google's Role in Advertising and AI
2. Tesla's Struggles with AI-powered Vehicles
3. Microsoft's Advances in Healthcare AI
4. OpenAI's Financial Outlook and its Role in the AI Industry
5. Impact of AI Models on Logic and Reasoning

Timestamps:
00:00 AI news continues: Tesla innovates, Microsoft focuses healthcare.
06:02 Nobel Prize in Physics awarded for AI.
09:52 Yahoo search is nearly irrelevant; Google's ad market shifting.
12:00 Google Gemini struggles with up-to-date information integration.
17:40 Interested in self-driving AI cars' business impact.
20:32 Improve ChatGPT use with PPP course access.
23:23 Doctors should use transcription for efficiency.
26:04 OpenAI reports AI-generated election misinformation rise.
28:50 AI spreading misinformation before upcoming US election.
34:58 OpenAI achieved stage 2 with reasoning models.
38:09 Apple invests heavily but relies on third-party models.
42:03 Google, Tesla, Microsoft, OpenAI, Apple AI updates.

Keywords:
Google's US search ad market share, Google Gemini, Tesla, CyberCab, RoboVan, Waymo, Workplace productivity with self-driving cars, Microsoft AI tools in healthcare, Administrative burdens in healthcare, ChatGPT course, Apple AI research, GSM Symbolic, Meta's Llama, Microsoft's Phi, Google's Gemma, OpenAI's GPT-4, Tesla stock decline, Tesla missing timelines, AI logical reasoning debate, AI vs human cognition, OpenAI projected loss, Google dominance in online search, DOJ actions against Google, Nobel Prize in Physics, AI misuse, Medical imaging models, Misinformation in US elections, Reasoning models, Fake content targeting US elections, CyberCab skepticism.

Transcript
Starting point is 00:00:00 This is the Everyday AI Show, the Everyday Podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. You may think we had a quiet week in AI because we didn't get a new large language model update.
Starting point is 00:00:54 But that is not the case because reportedly, OpenAI may not be profitable until 2029. We got a lot of new AI-powered innovation from Tesla. And Microsoft is going all in on AI in health. So yeah, even though we didn't, I think for the first time in seemingly months, get huge updates from one of the big large language model makers, AI news did not stop. And neither do we. What's going on, y'all? My name is Jordan Wilson, and I'm the host of Everyday AI. Welcome. Every single Monday we bring you the AI news that matters, so you don't have to spend hours every single day trying to keep up with what's happening in the world of AI and how it might impact
Starting point is 00:01:41 your company or career. You can just do that with us every single Monday on the AI news that matters. So if you're new here, welcome to Everyday AI. This is a daily live stream podcast and free daily newsletter helping us all do just that. Keep up. Not just keep up, actually, but how we can use all of this information to get ahead. So if you are brand new here, thank you for joining us. Make sure, if you haven't already, go to YourEverydayAI.com. That website should be your best friend, y'all, because our website is literally the largest free, unbiased source of everything generative AI on the web. You can go back; we have nearly 400 episodes now. Whatever you care about, marketing, sales, nonprofits, whatever it is, we have categories on our website. You can go learn from
Starting point is 00:02:30 the leading experts in the world. So if you haven't already, please make sure to go check that out. All right, enough chit chat, y'all. Good morning to our live stream audience joining us. So everyone, Jay and Marie and Kurt, Michael, Kathleen, Brian, everyone else, Sabbatical. Thank you for joining us. Let's just get straight into it, y'all. A lot going on in the world of AI. Let's get caught up. So OpenAI is reportedly facing $44 billion in losses before profitability in 2029.
Starting point is 00:03:06 So that is according to some recent reports from The Information, which say OpenAI is projected to incur $44 billion in losses before reaching profitability in 2029. So according to reports, OpenAI is attributing that to substantial expenditures on training AI models, employee salaries, and data acquisition. So the report estimates that OpenAI could generate a rapid $100 billion in revenue, but its current spending is approximately $7 billion on model training and $1.5 billion on staffing annually. Also, Microsoft, a key investor with a $13 billion stake in OpenAI, is expected to take a 20% cut of the company's revenues, which may
Starting point is 00:04:01 obviously impact OpenAI's profitability. So according to the report, OpenAI is currently generating about $2 billion from ChatGPT and $1 billion from access fees for large language models, totaling in annual revenue somewhere between $3.5 and $4.5 billion. But market analysts suggest that OpenAI will need additional funding to sustain its operations following that recent $6.6 billion fundraising round we talked about two weeks ago that pegged OpenAI at a valuation of $157 billion. So there's been a lot of concerns related to the long-term viability of AI startups as investor interest may be kind of peaking, potentially leading to takeovers or
Starting point is 00:04:55 increased scrutiny of financial practices. It seems like, if I'm being honest, I don't think a lot of people, aside from those that work at, you know, private equity or venture capital firms, have really worried about the long-term sustainability of companies like OpenAI and Anthropic because, let's be honest, when any of those companies want to raise money, they can very easily raise billions of dollars on command. So, you know, people are like, oh, you know, this means OpenAI is going to shut down. They're facing financial strain.
Starting point is 00:05:29 No, that's not what it means. So let's just call that exactly what it is. This is typical. This is typical going from a startup to profitability. So, you know, don't freak out over these recent reports that say OpenAI may not be profitable until, you know, 2029. That doesn't mean you shouldn't be using their technology. You absolutely should be using their technology or Anthropic's Claude or Google Gemini or whatever is best for your business,
Starting point is 00:05:55 regardless of if these large language models are making money themselves. Because right now, and we've talked about this all along, the biggest race for these companies is for you. It is for us, right? That is what they ultimately care about. They care about attracting customers, retaining customers, and, you know, profit. They'll figure that part out. But right now, it's a great time to be a consumer, a business owner, you know, to take
Starting point is 00:06:22 advantage of all that these large language models have to offer. All right. Let's keep it going, y'all. Here's a big one. So the Nobel Prize in Physics has been awarded to two AI researchers for their groundbreaking work in artificial intelligence. So the recent awarding of the Nobel Prize in Physics has gone to two prominent researchers, marking a significant milestone in the field of artificial intelligence. So John J. Hopfield, who's 91, and Geoffrey Hinton, who's 76, received the Nobel Prize for their pioneering research that laid the groundwork for modern artificial intelligence.
Starting point is 00:07:07 So the majority of their work goes back to the 80s and before, where the two have been instrumental in developing machine learning, a process where computers learn from vast amounts of data to perform tasks, such as anything AI can do, diagnosing disease, personalizing entertainment recommendations, doing your homework, whatever AI does. So Hopfield is a professor at Princeton University, and invented the Hopfield Network in 1982, which was a neural network capable of mimicking certain brain functions and recalling information from partial data.
Starting point is 00:07:44 Hinton, who is often called the Godfather of AI and is based in Toronto, utilized Hopfield's inventions to create networks that classify and recognize patterns in large data sets, which can be applied in various fields, including image recognition. So most of you all have probably heard of Geoffrey Hinton. It was very famous when he departed from Google last May, in May 2023, driven by his desire to openly discuss the potential risks associated with AI technology, emphasizing concerns over its misuse by malicious actors. And that has definitely been going on.
Starting point is 00:08:24 And we have another news story today on that exact same topic. I don't know about y'all. Today, today's one of those Mondays. I need my coffee a little stronger. A little sleepy today. I don't know about you guys. Yeah, and FYI, we are an unedited, unscripted podcast. We're just bringing it to you real.
Starting point is 00:08:45 All right. Our next piece of AI news: could Google be in big trouble with the Department of Justice? Maybe. So the U.S. Department of Justice has some recent proposals to address Google's dominance in online search, and that could significantly impact the tech giant's profitability and also its advancements in artificial intelligence. So the U.S. Department of Justice, the DOJ, is considering remedies that may force Google to divest key parts of its business, such as the Chrome browser and the Android operating system, which are perceived by the feds to support its monopoly in online search. So other potential actions include restricting Google from collecting sensitive user
Starting point is 00:09:37 data, requiring transparency in search results, and allowing websites to opt out of their content being used for AI training. That is the big one, y'all. So following the Department of Justice announcement, Alphabet stock, Alphabet the parent company of Google, of course, dropped 1.5%, indicating at least a little investor concern over the implications of these remedies and on Google's long-term AI plans. So analysts are also warning about this, that these changes could obviously weaken Google's revenue and also provide more opportunities for competitors like DuckDuckGo and Microsoft Bing.
Starting point is 00:10:20 And I guess, Yahoo! Does anyone use Yahoo search anymore? So especially as Google's U.S. search ad market share is projected to dip below 50% by 2025. So yeah, that's not their search share, which is still in the 90s, but their search ad market share is expected to dip below 50%, which is kind of wild. So yeah, I think there's a lot of reasons why this is actually pretty big news. I would say the biggest is even though Google has been, I think, personally unable to capitalize on the amount of data
Starting point is 00:11:00 that it has. And let's be honest, just the head start, right? I mean, the transformer infrastructure was developed or created by Google researchers, yet Google has been fairly far behind kind of in the large language model race, at least for front-end products. I think Google and their Google Gemini has a great back end for developers. But if you're using the front end of, you know, Google Gemini, I think it's very far behind even, you know, the much smaller startups, Anthropic and OpenAI, even though Google has better, more direct, more up-to-date access to all training data via its Googlebot, right? So if your company has a website, obviously it wants Google and has wanted Google to scrape
Starting point is 00:11:53 it and to access that information for many years. And in now doing so, also by default, you know, when you're saying, hey, Google, come to my website, you're also now by default saying, hey, Google, go ahead. You can also scrape all that content and use it for your Gemini models and all your other AI. So I think that part is particularly interesting. I'm not even so much focused on the DOJ trying to have or force Google to divest its Google Chrome browser and its Android operating system as much as I am on that AI training data piece, right? Because that is what I think has been Google's biggest advantage that
Starting point is 00:12:34 I think they haven't really taken advantage of because, for whatever reason, it seems like Google Gemini has this almost huge issue bringing up-to-date, relevant information to its Gemini product via that kind of Google integration, which I know is kind of wild, right? Google knows everything up to the second, but Gemini, its front-end chatbot, kind of struggles bringing in recent and relevant information. Or I don't know. I could be alone in that, live stream audience. What do you think?
Starting point is 00:13:08 Is anyone out there using and loving Gemini on the front end? I haven't met too many people. Love the new NotebookLM. Don't get me wrong. But not a huge fan of what they have going on on the front end for Gemini chat. All right. Our next piece of AI news, speaking of things that I'm not personally the biggest fan of.
Starting point is 00:13:28 Tesla's Cybercab unveiling has failed to impress investors as their shares dropped immediately 9% and are now down more than 11% since the announcement. So Tesla's recent reveal of its AI-powered Cybercab robotaxi concept has left investors very underwhelmed, resulting, like I said, in an almost immediate 9% drop in shareholder value and 11% since the announcement. So if you missed this, this was late last week. Tesla had an event branded as We Robot. Yeah, you don't want to get Will Smith mad.
Starting point is 00:14:11 You know, don't get too close to Will Smith's territory there. So the new Cybercab robotaxi concept. It's a concept. It showcased a two-seater self-driving vehicle that lacks traditional controls such as a steering wheel and pedals. And instead, essentially, the AI self-driving, you know, takes care of that for you. So CEO Elon Musk announced plans to introduce the Cybercab by 2027 with an anticipated price tag under $30,000, but provided no specifics on manufacturing locations or really
Starting point is 00:14:55 specifics on anything. So before everyone out there collectively loses their marbles on that price tag and this great innovation, it's also worth noting Tesla's recent history with new vehicle releases. So as an example, Elon famously, or maybe infamously, announced the Cybertruck in 2019 with a $39,000 price tag and a 2021 delivery date. We all know that didn't happen, right? So the Tesla truck actually shipped out at the very end of 2023. And now, today, the price tag is $95,000. So right after it came out, it actually, oh, it's actually about six figures, not $39,000. So everyone looking at this, oh, Cybercab 2027, you know, AI cars with no steering wheels and pedals are going to be driving us around in three
Starting point is 00:15:50 years. I would not hold my breath. I would honestly say it's either going to cost about three to five times that and/or it's not going to be coming this decade. So yeah, I wouldn't exactly start saving your pennies for a $30,000 Cybercab by 2027, FYI. So a little bit more details about it. The details were impressive. If it ever comes to fruition is a completely other story. So Musk stated that Tesla aims to launch, which I mean, we've been hearing about this one for
Starting point is 00:16:23 years, its unsupervised full self-driving (FSD) capabilities in Texas and California next year for existing Model 3 and Model Y vehicles, although the current FSD, that's full self-driving, system still requires, yeah, human oversight. So kind of many different announcements. So following the event, like I talked about, Tesla's shares marked a 12% decline year to date and a 17% decline over the past year. So Tesla's not doing very good in terms of investor sentiment. Also, Elon Musk announced another new kind of AI driving vehicle with no steering wheel, no pedals, called the RoboVon.
Starting point is 00:17:12 Yeah, that's Robovan, but I guess it's cooler if you pronounce it RoboVon. So the Robovan is designed for transporting larger groups of goods or people, but that also seemingly failed to generate any real excitement, at least when it comes to analysts. And Tesla has been getting crushed recently in the past couple of days since this announcement. Also, the event underscores the challenges Tesla is facing in bringing self-driving cars to market, especially as competitors like Waymo already have this technology
Starting point is 00:17:48 and they've already launched successful robotaxi services in certain states. You know what? I was actually in Austin, Texas last week at a keynote speech I was doing. And I was super bummed that I couldn't catch a Waymo kind of taxi from the airport to my hotel. But yeah, maybe next time. Anyone out there, let me know. Have you guys taken these Waymos or any, you know, self-driving AI cars? I'm pretty interested in this, and let me know if you want to hear more about this on Everyday AI.
Starting point is 00:18:23 We don't really go into self-driving cars and, you know, AI powered cars too much because I think it kind of starts to blur the line, right? And is that really, you know, impactful for your business and your career? Maybe not, aside from the fact that, okay, if we do get fully automated, fully self-driving cars, well, you can go do business in there and you don't have to drive. So maybe you normally have a, you know, 45-minute. commute, but if your car doesn't have a steering wheel or gas pedals and the law in your state allows it, okay, yeah, that turns into 45 more minutes, you know, of working, I guess. But, you know, we don't really cover this too much, but, you know, I think when Tesla comes out with a pretty big announcement like this, it's worth paying attention to.
Starting point is 00:19:07 So Sabbatical here says, love to see the Waymo in Atlanta. Yeah. Yeah. And Cecilia saying there are already operational autonomous cabs in the Bay Area. Yeah, absolutely. All of the innovation always strikes there first. All right. Our next piece of AI news, y'all, let's keep it going. Taking a shift here and going into health. So Microsoft has unveiled a new set of AI tools aimed at reducing burnout in health care workers. I think this one's pretty important, y'all. So Microsoft has announced a suite of new
Starting point is 00:19:45 healthcare data and artificial intelligence tools designed to alleviate the administrative burden on clinicians, a significant contributor to burnout in the industry. So right now reports say that nurses spend up to 41% of their time on documentation. That's wild. So these new Microsoft AI tools could obviously be a game changer for the healthcare industry. All right, let's talk a little bit more about what these new AI tools are.
Starting point is 00:21:16 So they include a collection of medical imaging models, a healthcare agent service, and an automated documentation solution specifically for nurses. So this comes in a set of multimodal AI models that can analyze various data types, such as medical images and clinical records, allowing healthcare organizations to develop more tailored applications for their needs. So right now, at least, Microsoft has partnered with Providence Health & Services to create
Starting point is 00:22:10 a whole-slide model for pathology, enhancing cancer mutation prediction and subtyping. So this model is expected, like I said, to be a game changer for health systems looking to improve diagnostic capabilities. Now let's talk the healthcare agent. So the healthcare agent service enables organizations to build AI agents that assist with tasks like answering clinical questions and identifying relevant clinical trials, potentially saving doctors a significant amount of time. Microsoft's automated documentation tool for nurses aims to streamline their workflows,
Starting point is 00:22:47 which differ obviously from those of clinicians. So this tool right now is in development with input from organizations like Stanford Health Care and my hometown place here, Northwestern Medicine in Chicago. So Microsoft's previous tool, DAX Copilot, has gained some traction among physicians, and the introduction of a similar tool for nurses highlights the company's commitment to supporting health care staff. So while many of these solutions are still in early development or preview stages, their
Starting point is 00:23:20 potential to transform health care workflows is significant, particularly in light of the growing demand for efficient technology-driven solutions in the sector. Yeah. So a shout out to Jay, had a great conversation with Jay a couple of weeks ago about this very thing, about AI and healthcare and burnout. So he's saying since the advent of EHRs, or electronic health records, time spent documenting has increased substantially, causing burnout for physicians and others, and a lot of time spent completing notes at home after hours. That is a huge, a huge factor here, y'all. And I think, if I'm being honest, kind of the intersection of
Starting point is 00:24:02 AI and medical and health care is ripe for disruption, right? Yeah, especially that, just note taking, right? You know, so many times in the doctor, even over the last couple of years, I'm just sitting there waiting for the doctor to type notes or the nurse to type notes, right? Luckily, in their professions, right, they're okay at typing, but it's like, why are we still living? Why are we still living in that, in that world? Also, even within the same healthcare system, right? I remember my first appointment like three years ago where the doctor's like, oh, okay, or, you know, maybe it was like two and a half years ago. The doctor said,
Starting point is 00:24:40 hey, we'd like to record this and our system will automatically transcribe. And I'm like, yeah, like, why aren't we doing that at all times, right? I'd much rather be talking face to face than talking to the physician's shoulder or the nurse's shoulder as they, you know, peck at the keyboard. That's not their fault. Obviously, the healthcare system, they are really overburdened. You know, they're overwhelmed. You know, I think especially since the pandemic, it's just resulted in this longstanding burnout for our medical system here in the U.S. Our health care system is overwhelmed. So yeah, I'm personally looking forward to this Microsoft partnership, you know, really rolling out to these other institutions,
Starting point is 00:25:25 as well as the big medical players starting to integrate this technology into their existing systems. So please, let's get this fixed. You know, the fact that, let me repeat this, nurses right now are spending 41% of their time on documentation. And it should be a fraction of that. That is what AI does best, right, to accurately transcribe words, to be able to put that in a computer. So, hey, healthcare system, medical innovators out there, let's get on board, whether it's with Microsoft, or Google has some great options for AI in the healthcare industry. But it's about time that we unburden the healthcare industry by leveraging artificial intelligence.
Starting point is 00:26:14 Yeah, we have to figure out privacy concerns. Yes, I get that private health information. PII, all that. But y'all, let's tackle that because this is a no-brainer to help not just relieve the broken health care system for doctors, for nurses, all of these people who are just overworked, but also for us, right? Us humans that want to get into a doctor and we have to wait three months, six months, longer, because the health care system is so drowned out. All right, let's keep this thing going, y'all. A couple more AI stories here. I'd say Two big ones.
Starting point is 00:26:50 We always save the best for last here on the AI news that matters, y'all. All right. So OpenAI has reported a rise in AI-generated fake content targeting U.S. elections. So OpenAI has revealed that its AI models have been increasingly exploited to create fake content aimed at influencing elections, raising concerns about the misuse of technology in the political landscape. So in 2023, OpenAI announced that they neutralized over 20 sophisticated attempts where its AI tools were used to generate misleading articles and social media comments related to elections. Notably, a set of ChatGPT accounts was identified in August that produced content on U.S. elections, highlighting the potential for AI to spread misinformation. The company also banned several accounts from Rwanda in July that were involved in generating election-related comments for social media platforms.
Starting point is 00:27:55 And y'all, this isn't like one or two. This is in the thousands. This is just bulk election misinformation. OpenAI found that many entities and state actors were using its tools for purposes that the company doesn't allow. So despite these attempts, OpenAI noted that none of the content generated achieved significant viral engagement or maintained a sustainable audience. The U.S. Department of Homeland Security has expressed growing concerns about foreign interference in the upcoming November 5th elections with countries like Russia, Iran, and China,
Starting point is 00:28:36 potentially using AI to spread divisive information. Y'all, we are in that home stretch, right? For election season, the election coming up here in the U.S. in about four weeks. And now is the time, especially if you are listening to this podcast, if you're watching this live stream, stay vigilant, right? You have to always double check anything that you read on social media. You have to double check what you read online because right now, and especially kind of in that last hour, right, this is when all of the kind of AI-powered information is going to hit its all-time high.
Starting point is 00:29:19 This, you know, because here's why, right? And there's, you know, both sides are using this, right? It's always kind of annoying to me when I, you know, read the AI news and I'm like, oh, this political party, you know, used AI to do this. And everyone's like, oh, you're biased. And I'm like, no, this is just the AI news, right? There's certain candidates and certain politicians right now that are using AI to spread a little bit more misinformation and disinformation. So you really need to stay vigilant about what you are reading.
Starting point is 00:29:51 Is it true or not? What you are sharing, right? All of these AI images now that look actually fairly real. And then these images are being used to create videos that are very real. So as the U.S. election comes up in a couple of weeks, a reminder to everyone, please always stay vigilant. get your news from trusted sources. You know, don't pay too much attention to what is spreading on Facebook and Twitter and YouTube because there is a good chance in the coming weeks. When it comes to politics, there's a good chance. A lot of it could be fake, right?
Starting point is 00:30:26 We've kind of known that this last month right before the election, it's going to be an onslaught because by the time that the general public is probably going to realize something is AI generated, it could be too late. So keep that in mind. All right. Our last piece of AI news here. An interesting one: Apple researchers are challenging the logic of leading AI models, including OpenAI's latest o1.
Starting point is 00:30:55 So a new study from Apple researchers raises significant questions about the logical capabilities of today's large language models, including OpenAI's latest reasoning model o1. So the research team at Apple developed a new evaluation tool called GSM-Symbolic, which builds on the GSM8K mathematical reasoning data set to test AI models more rigorously. So yeah, if you're a dork like me and you follow all of the benchmarks for large language models, you'll probably recognize that GSM, right? So the GSM8K is a fairly popular kind of benchmark to see how smart these models are.
Starting point is 00:31:42 So that is essentially grade school math 8K with that 8K standing for 8,000 math word problems. So yeah, these are different essentially tests for large language models to see, okay, are they actually smart? Can they actually reason? So this new one from Apple researchers is similar to the very popular GSM8K, but this one is called GSM-Symbolic. So the study from Apple researchers tested a range of models, both open source and proprietary,
Starting point is 00:32:16 including Google's, or sorry, Meta's Llama, Microsoft's Phi, Google's Gemma, Mistral, and OpenAI's GPT-4o and o1. So revealing that even top models struggle with real logic. So current accuracy scores on the GSM8K data set are deemed unreliable, with performance fluctuations observed across different models. For instance, the Llama 8B model scored between 70 and 80%, while Phi-3 varied from 75 to 90%. So yeah, huge gaps, right? 10 to 15% gaps in these tests are kind of saying they're unreliable. So researchers noticed
Starting point is 00:33:03 that as task difficulty increases, the variance in model performance also grows, suggesting that handling this variation may require exponentially more data. And despite achieving high scores on benchmarks, OpenAI's o1 model. So that is, you know, the one that originally, you know, was codenamed Q*, and then it was called Strawberry, and now it's rolled out as the o1 model. So despite achieving high scores on those benchmarks, the o1 model still exhibits performance fluctuations and makes basic errors, reinforcing the idea that these models are more advanced pattern matchers than true reasoners.
Starting point is 00:33:46 All right. So this didn't grab a lot of headlines because it's a little dorky, right? Essentially, Apple researchers went and did a bunch of very advanced benchmarking across all these models and said, hey, these models can't actually reason, right? And then they suggested a new benchmark going forward that could better test this reasoning ability. And in doing so, hopefully address that wide range, right?
Starting point is 00:34:16 So essentially what they're saying is, number one, these models can't actually reason, right? They can't actually use logic. They are just next-token predictors. And one of the rationales behind that was, well, look at this wide variance in these scores, right, scoring anywhere from a 75 to a 90, right? If you are able to reason and repeat that reasoning over and over, you wouldn't have a 15-percentage-point kind of variation in these scores.
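To make the variance argument concrete, here is a minimal Python sketch. It is not the Apple researchers' code; it only assumes the general idea that a benchmark like GSM-Symbolic regenerates the same kind of word problem with different names and numbers and then looks at how much the score moves from one variant set to the next. The template, the name list, and both solver functions are hypothetical stand-ins invented for illustration.

```python
import random
import re

# Hypothetical template in the spirit of a grade-school word problem.
TEMPLATE = "{name} has {a} apples and buys {b} more. How many apples does {name} have now?"
NAMES = ["Ava", "Ben", "Chloe", "Dev"]


def make_variant(rng):
    """Fill the template with a random name and numbers; return (question, gold answer)."""
    name = rng.choice(NAMES)
    a = rng.randint(2, 50)
    b = rng.randint(2, 50)
    return TEMPLATE.format(name=name, a=a, b=b), a + b


def make_variant_set(seed, size=100):
    """Build one reproducible set of variants from a seed."""
    rng = random.Random(seed)
    return [make_variant(rng) for _ in range(size)]


def accuracy(answer_fn, variant_set):
    """Fraction of questions in one variant set that answer_fn gets right."""
    correct = sum(1 for question, gold in variant_set if answer_fn(question) == gold)
    return correct / len(variant_set)


def accuracy_spread(answer_fn, n_sets=50, size=100):
    """Lowest and highest accuracy across many variant sets of the same template."""
    scores = [accuracy(answer_fn, make_variant_set(seed, size)) for seed in range(n_sets)]
    return min(scores), max(scores)


def perfect_solver(question):
    """Stand-in for a true reasoner: always adds the two numbers correctly."""
    return sum(int(n) for n in re.findall(r"\d+", question))


def brittle_solver(question):
    """Toy stand-in for a pattern matcher: right on small numbers, off by one otherwise."""
    a, b = (int(n) for n in re.findall(r"\d+", question))
    return a + b if max(a, b) <= 40 else a + b + 1


if __name__ == "__main__":
    print("perfect solver spread:", accuracy_spread(perfect_solver))
    print("brittle solver spread:", accuracy_spread(brittle_solver))
```

In this toy setup, a solver that really does the arithmetic scores the same on every variant set, while the brittle one moves around depending on which numbers happen to be drawn. That is the shape of the argument being described here, though real models and the real benchmark are far more involved.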
Starting point is 00:34:46 All right. So this study highlights a growing debate between leading AI research institutions, as OpenAI maintains that o1 represents a significant step toward logical agents, while Apple researchers argue that the evidence points to limitations in reasoning. Let me just cut it to you real. I think there's a lot of different ways that this study can be interpreted, right? Number one, I think we all have to understand this kind of shift that's going on in the large language model industry. Even though this is kind of OpenAI's language, we are really taking a step, right? OpenAI, you know, released slash leaked their five stages toward AGI a couple of months ago.
Starting point is 00:35:38 You know, stage one being chatbots, stage two being reasoners, stage three being agents. So OpenAI CEO Sam Altman said a couple of weeks ago that, hey, essentially OpenAI has achieved stage two, which is reasoners: models that can reason, models that, you know, in the case of the o1 model, are essentially something more than next-token prediction, right? Because right now, the best way to get the most out of large language models is a human has to continually guide it. A human has to, right? I like having visuals, right?
Starting point is 00:36:17 And you say, oh, you know, you say something to ChatGPT and then ChatGPT gives you a response. And then you go back and forth, back and forth, right? And you might be chatting and you might have, you know, 15, 20, 30 interactions with ChatGPT in the same kind of chat window, trying to guide it to better outcomes, right? And if you're doing this correctly, right, you might say, oh, that's, you know, some good prompt engineering. That's, you know, using chain-of-thought prompting. You know, there's all these different prompting techniques. And essentially with these new models that are reasoning models, essentially the big companies are saying, hey, now the models themselves are
Starting point is 00:36:56 going to be doing that work that the human would normally be doing, right? So normally, even though you start with giving ChatGPT or other large language models an end goal, it still takes a lot of finessing from humans to get them to that end goal. Some, you know, subtle guidance and some correction, right? So with these new reasoning models, the thought is, okay, these models are actually going to be doing the actions and all of those incremental steps, you know, kind of step by step, right? That is a common phrase when prompting a large language model to go through this chain-of-thought reasoning, or thinking more like a human.
Starting point is 00:37:38 So with Apple researchers here, you have to think, right, and don't get me wrong, all these big companies have some of the most brilliant minds. But you have to look past the research, right? Why was this study even greenlit? Well, you have to remember, reports last year were that Apple was spending millions, millions with an S, millions of dollars a day on developing its own internal large language model. Yet, when they announced Apple Intelligence, which has not had a very successful rollout so far, right, Apple was not able to release its own flagship model. So they have a small edge AI, right?
Starting point is 00:38:26 So they have kind of a small large language model that runs on device, right, on your computer, on your smartphone, and doesn't require the cloud. But for the most part, for most complex tasks, it's actually tapping into OpenAI's GPT-4o. So I always like to read through the, you know, the marketing speak here. So you have to think if there's some other implications here with Apple researchers. And again, Apple, spending millions of dollars a day trying to develop its own large language model, seemingly did not accomplish its goal in time for Apple Intelligence, therefore having to rely on a third party. And there's been other reports that Apple is still looking at working in the future with other large language model developers, whether that's Google or Claude, you know, Claude from Anthropic.
Starting point is 00:39:22 So you have to think if there's maybe some other things going on kind of behind the scenes as the reason why this study is coming out, with Apple researchers saying, hey, these super smart models, they can't actually think. Don't worry about them. So pretty interesting study. You know, if you're a dork like me, you might enjoy reading it. The other thing, and we're going to close with this before I recap everything, there is a bigger argument here, right, about how large language models work versus how the human brain works, right?
Starting point is 00:40:01 I think there's been so much of a focus on saying, hey, yes, large language models, even as they get more and more powerful, you know, keep in mind, they're just next-token predictors, which on the surface, yes, very true. But also, you know, people make this argument as to why AI is not super innovative, as to why AI is not going to be impactful. So the detractors always say, oh, it's just next-token prediction. It's only as smart as its dataset, right? Isn't that humans?
Starting point is 00:40:37 Isn't that how our brains work, similarly to a neural network, right? That's what we are using. We are using the dataset that we've been trained on, right? Your own life experiences, what you learned in books, what you've read, what you've been taught, right? Reinforcement learning. That's what humans, you know, go through. When I'm little and I break a lamp and my mom says, stop breaking the lamp, eventually the actions that I take from then on are probably going to reduce the likelihood that I'm going to break that lamp again, right? So, I mean, is there that much difference in the end with how
Starting point is 00:41:15 a large language model processes data? It's trained by humans, right? Reinforcement learning with human feedback. And then the data is always updated. And yes, it's next-token prediction, but that's essentially what human brains do. Right. So you're going to continue to hear a lot of this,
Starting point is 00:41:33 y'all, through the rest of 2024 and into 2025, you know, as companies like Google, you know, reportedly we talked about this last week on the AI News That Matters. They're reportedly working on their reasoning model,
Starting point is 00:41:45 right? So we're going to be really shifting hard into talking about reasoning models, models that can actually, quote unquote, think like humans, and agentic systems. So that's really what we're going to be focusing on a lot more here in the last couple of months of this year and going into 2025. So I think it's important to read studies like this, but to also know why they're coming out at the time that they're coming out.
Starting point is 00:42:09 All right. Yeah. Someone said here, big bogey face. Just Apple Intelligence face palm. Yeah. Not super impressed with Apple Intelligence so far. All right. So that is it
Starting point is 00:42:20 for the AI News That Matters this week. Let's do a very quick recap here. So first, we talked about how OpenAI is facing up to $44 billion in losses, according to a report from The Information, before it may achieve profitability by 2029. The Nobel Prize in Physics was awarded for groundbreaking research in artificial intelligence from John J. Hopfield and Geoffrey Hinton. Next, Google is in hot water as the U.S. Department of Justice has proposed some remedies for Google that could reshape the online giant's share in the search and AI landscape. Tesla unveiled a lot of AI-powered driving capabilities in its new Cybercab, reportedly to be released without steering wheels, without pedals by 2020.
Starting point is 00:43:18 and for under $30,000. I don't think anyone actually believes that, and Tesla's stock went into the tank. Next, Microsoft has unveiled some new AI tools aimed at reducing burnout in health care workers. Next, OpenAI reported a rise in AI-generated fake content targeting U.S. elections. And then last but not least, Apple researchers
Starting point is 00:43:44 released a new paper challenging the logical abilities of leading AI models and pointing to their inability to reason, including OpenAI's latest o1 model. All right, y'all, that was a lot. I hope this was helpful. If it was, please share this show with someone. Yeah, I know it might be your little secret at your company because now, if you listen all the time, if you read our newsletter, you're probably one of the smartest people in AI at your company. So everyone's looking at you for the answers, right?
Starting point is 00:44:15 Everyone's looking at Michael for the answers and looking at Marie for the answers and Dr. Scott, right? This might be your secret, but please, we'd appreciate it if you'd share this with others. If you're listening on the podcast, Spotify, Apple, thank you so much for tuning in. Please make sure to follow the show. Leave us a rating and a review as well. And please go to youreverydayai.com. Sign up for the free daily newsletter and join us back tomorrow and every day for more Everyday AI.
Starting point is 00:44:41 Thanks, y'all.
Starting point is 00:45:15 And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit YourEverydayAI.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
