Everyday AI Podcast – An AI and ChatGPT Podcast - EP 163: Google Gemini - ChatGPT killer or a marketing stunt?

Starting point is 00:00:00 This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. Google got so many things wrong when it came to the release and the marketing behind their new Gemini model.

Starting point is 00:00:56 It's been almost a week. Yes, I took that long to decide how I'm going to deliver today's episode. Because I think that there's a problem going on right now with Google. And I really want to talk about whether this new Google Gemini model, its latest large language model, is a ChatGPT killer or just a marketing stunt gone horribly wrong. All right. So we're going to talk about that today on Everyday AI.

Starting point is 00:01:33 This is your daily live stream podcast and free daily newsletter, helping everyday people like you and me not just learn what's going on in the world of generative AI, because there's a lot, but how we can all actually leverage it, how we can use everything that's going on day to day, these new tools, tips, techniques, and strategies, how we can use them and understand them to help grow our companies, grow our careers. My name's Jordan Wilson. I'm your host, and I want to know, I want to know your thoughts. What do you think so far of this Google Gemini?

Starting point is 00:02:06 Have you used it yet inside Google Bard? Do you think Google handled it well? I want to know from you. Today's going to be one of those shows where I ask a lot of questions of our live audience. If you are joining us on the podcast, thank you as always. We appreciate your support. Please follow us, leave us a rating. But also check the show notes.

Starting point is 00:02:24 Every single day we leave show notes in there. So you can come back here. You can join the conversation going on on LinkedIn and other social media. So please, I want to hear from everyone out there because I think today's episode on Google Gemini is extremely important to talk about, because it's actually about the bigger picture. It's not just about Google Gemini. It's about the company, Google.

Starting point is 00:02:44 It's about the state of large language models. So I do want to hear from you. So before we get started, as we always do, let's talk about what's going on in the world of AI news. So it was only a matter of time, but AI can officially read our minds. Scientists have developed a new groundbreaking technology that can read human thoughts

Starting point is 00:03:02 and convert them into text through an EEG cap and an AI model. So this technology was developed by the GrapheneX-UTS Human-centric Artificial Intelligence Centre at the University of Technology Sydney in Australia, and it was recently featured at a prestigious AI conference. That was a mouthful, right? But this system uses an EEG cap to record and decode brain activity and an AI model named DeWave to translate EEG signals into coherent words and sentences.

Starting point is 00:03:36 Yes, this is not science fiction. This is real science. But yeah, apparently this was just unveiled today. Yeah, AI can read our minds and actually put it into coherent words and sentences. I'm actually happy for this for those times when I'm thinking about something AI-related and can just get it down on paper, right? All right, next piece of news. How prominent are hallucinations in AI models? Well, they're so prominent that the word hallucinate is now officially the dictionary

Starting point is 00:04:06 Word of the Day or sorry, Word of the Year. Yeah, Word of the Year. So hallucinate, if you don't know, it means when an AI model gives you false or inaccurate information. So every single year, Dictionary.com, toward the end of the year, names their word of the day or word of the year. And this year, it's hallucinate, right? It's something we talk about on the everyday AI show all the time.

Starting point is 00:04:28 We've done multiple episodes specifically on how to get large language models to hallucinate less. So if you're like every other person out there and you're worried about hallucinations, make sure to go to youreverydayai.com. Search the word hallucination or hallucinate, and you'll find so many different episodes that will help you. And so a little more about hallucinate. So this term emerged from technical jargon in the 1970s,

Starting point is 00:04:51 which I didn't know. And it has seen a 46% increase in lookups this year. Oh, man. I used to side note, I used to just look up words all the time in the dictionary for fun. Weird, right? All right. Last but not least in AI news, political robocalling is changing due to AI. So a Democratic congressional candidate in Pennsylvania's 10th district is using an AI powered phone campaign to reach voters and fundraise, making it the first of its kind, officially known at least,

Starting point is 00:05:23 in U.S. politics. So the company is Civox, and that's the company behind the AI campaign, which hopes to start a public conversation about the role of AI in politics. So I'm not sure how y'all feel about this, but especially here in the U.S., political robocalls and robotexts are terribly annoying, but they're extremely prominent, especially when we get into that election season, with an election now less than 11 months away here in the U.S. So if you haven't started getting calls from actual AIs that you can talk back and forth with, unfortunately, I think you're going to have to expect that soon. All right.

Starting point is 00:06:02 So there's more. There's always more news, more of what's happening in the world and how it affects you. Make sure you go to youreverydayai.com. Sign up for that free daily newsletter where every single day, we break down the conversation that we have on the podcast, on the live stream, but also we cover just about everything else. We have our AI news, which is, you know, some of the things that I previewed as well as

Starting point is 00:06:26 others, fresh finds, which are just different important things happening in the AI world, kind of across the internet. And we have a whole lot more as well as previewing other shows that we have coming up on everyday AI. All right, big wind up this morning. If you're joining us, let me know. I want to know from you. How hot should this hot take be?

Starting point is 00:06:51 How hot should this hot take Tuesday be when I'm talking about Google Gemini? Let me know if you're joining us live. Should I take it easy on them? Should I tell it how it really is? Or should I just burn all the bridges? Should I just go full hot-take mode? And make Google never want to talk to the Everyday AI show. Let me know.

Starting point is 00:07:11 I want to know from you. And thank you, everyone, for joining us as always. Mike joining us. Good morning. Woozy and Tara joining us as well. Thank you all. Mike joining us. Liz.

Starting point is 00:07:21 So many people. Thank you. But let me know. What are your thoughts so far on Google Gemini? I'm going to get into it. I'm going to get into it, but I always want to hear from our audience. So what are your thoughts? What are your questions?

Starting point is 00:07:38 I love this from Douglas saying, quick, DeWave, what am I thinking now? Yes, that AI brain model. If I could get it live as well, that would be fun. All right, but let's talk a little bit. And so far, the votes are in. It's a lot of flame emojis. People want full, full heat.

Starting point is 00:07:58 Full heat. All right. Well, Cecilia says never burn your bridges, but everyone else is giving me like 50 flame emojis. So I'm going to have to go with the majority of people here. So all right. We're going full hot take. But let's talk high level overview. Let's talk high level overview on Google Gemini, what this is, what it means.

Starting point is 00:08:21 So yes, I've waited almost a full week. So they announced this almost, let's see, it's been six days. Today's day seven. So about a week ago, Google announced Gemini. So here's the overview. So Google Bard has been using the PaLM 2 large language model. So Google is replacing that with a much more sophisticated and a much better model in Gemini. All right.

Starting point is 00:08:47 So PaLM is out, Gemini is in. So there's essentially three flavors of Gemini. So you have Ultra, Pro, and Nano. All right? So Ultra is essentially for the largest, you know, use cases. Pro is kind of like the day-to-day use cases. And Nano is a much smaller version of the model that you can actually fit on a physical device. So Google is going to be putting it in their hardware. So let's start off with the good things. That's awesome, right? I love seeing smaller,

Starting point is 00:09:15 more capable models that don't even necessarily need to be connected to the internet and can run locally on a device. That is one of the big pieces of the future of large language models. So let's start with the good, right? Before I just start, you know, busting out the flame, Let's start with the good. Also, it is the first true multimodal-based model. Okay? So what that means is, you know, when you think of GPT, the model from OpenAI, it is multimodal. So being able to input, not just text, but images and audio, and then on the output, being able to receive, you know, kind of multiple medias. So, you know, being able to input and output, text, photo, audio. But GPT was built as a text model first, and then kind of these multimodal functionalities were built on top of it, so to speak.

Starting point is 00:10:04 So Gemini is, at least according to all reporting and everything that I can see, the first true multimodal-first model. So that's a big step in the correct direction on generative AI and large language models being the future of how we work. So another good thing is obviously built-in integration with Google services. All right. And then like we talked about, the ability to run on physical devices. So what most, well, what most people do know is Ultra is not out yet, right? So that's what everyone's talking about, this Ultra mode, you know, inside Google Gemini. But if you go on to Bard right now, it is actually the Pro mode. So more on that later, but keep that in mind. Google launched by giving people access inside of Google Bard to this middle-tier mode. That's extremely important. It's extremely important. All right. So now as I sip my good morning coffee, let me know.

Starting point is 00:11:00 What do you, I mean, are you all drinking coffee when we go over this? Are you walking the dog? Are you on the treadmill? Let me know. Getting caffeinated up because I saw a lot of people wanting the hot takes. All right.

Starting point is 00:11:12 So let's talk about this. Google pretty much lied about Gemini's capabilities. I said pretty much, all right? I'm not going to say they straight up lied. We'll say they hallucinated. All right. We'll say they hallucinated. They did not tell the full truth.

Starting point is 00:11:31 All right. So let's look at, let's look at why and what they said. All right. So much of this was based around a video release. All right. And everyone everywhere, if you care about generative AI,

Starting point is 00:11:44 if you're on Twitter, if you're on LinkedIn, if you follow AI news, you saw this video. Okay. So Google released a short video. And it essentially showed a person interacting with Gemini in real time. Right.

Starting point is 00:12:00 And I have some screenshots of it that I'll show you here in a second. But essentially, it was someone on the left side interacting. And there was an overhead video with different objects, asking questions and talking to the Gemini model. And Gemini was seeing what was going on on the screen and interacting in real time. All right. And so the only note that Google put up initially was on the video. They said this. They said, we've been testing the capabilities of Gemini, our new multimodal AI model. We've been capturing footage.

Starting point is 00:12:36 Okay. To test it on a wide range of challenges, showing it a series of images and asking it to reason about what it sees. All right. So the thing that I want to point out is interactions, right? That's what I have underlined there. So here's one example. Here's one example of what was shown. Okay. So again, this is a video. So this is a screenshot from a video. But essentially in the video, the actor or the main participant who is talking with Gemini does paper, rock, scissors, right? And yes, I say paper rock scissors. I know that's weird. I know 99.9% of

Starting point is 00:13:18 the world says rock, paper, scissors. I just can't. I grew up calling it paper rock scissors. So the person's playing paper rock scissors. And then Google Gemini says, I know what you're doing. You're playing rock paper scissors. And the collective internet lost its mind. Everyone was going nuts. But here's the problem. None of that was actually in real time like Google said. It wasn't, right? The person was not actually talking to Google Gemini. Google Gemini was not actually listening. But that is the exact message, and it was a very well done marketing video, right? And that is what people really started running with. Immediately when Google announced this video, just about everyone who's anyone in the world of AI, right, I was going to call people out by name.

Starting point is 00:14:16 Yeah, I was going to call them out, all those, you know, quote unquote influencers on Twitter and LinkedIn, those 22-year-old crypto bros that you're buying your ChatGPT prompts from. They all said, ChatGPT killer. Look at this model in real time. This is the future. This is the best thing ever. Google is taking over. But none of it was actually in real time. Yeah.

Starting point is 00:14:43 And here, now, the former journalist in me has a lot of problems with this. All right? A lot of problems. Because, yeah, there were falsehoods, untruths, white lies all over the place. The only other thing that Google said, aside from that message on the screen, was this in the YouTube video description.

Starting point is 00:15:09 It said, for the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity. That is nonsense. That is hot garbage untruths. That is a hallucination. That is terribly inaccurate. Because let's break it down, shall we? Let's break down Google's own words and let them know that no.

Starting point is 00:15:39 Wrong. BS detector is blaring, blaring. Latency has been reduced. Yeah, latency has been reduced by completely reconstruct, completely reconstructing how this was done. It was not done in real time. I'm going to show you. And then it says,

Starting point is 00:15:59 and Gemini outputs have been shortened for brevity. No, they haven't. They've been completely reproduced in a different way that was not represented in this demo video that was seen by millions of people, that was shared by tens of thousands of the biggest names in tech and AI. All those people you're following but should stop following, because everyone is just blindly reposting anyone's marketing messages without digging into it.

Starting point is 00:16:28 Y'all, I saw this. I was at the AI summit in New York City speaking. When I saw this Wednesday, I saw the demo video. Two thoughts immediately came through my mind. Number one, I said, that's impressive. Number two, I said, I got to dig into that because I don't think it's possible. Right? I wasn't going to share it with you.

Starting point is 00:16:48 I wasn't going to say, hey, this new Gemini model is the best thing ever. We talked about its release, but we didn't say anything about it being a ChatGPT killer, or that it is correctly doing this in real time. It's not. Let's take a look at how they actually did it. And break down those words. Yeah. So latency has been reduced. No, it wasn't. The video was produced. Gemini was not in real time. And then when they said, Gemini outputs have been shortened for brevity. No. Those multimodal inputs and outputs were faked. They were text and photos. They were not real time.

Starting point is 00:17:26 So we're going to leave links in the newsletter. So make sure if you're listening on the podcast or on the live stream, if you are not already signed up for the newsletter, what the heck is going on? Go to youreverydayai.com. Sign up for the newsletter. We will leave a link to the paper because essentially on Google's developer blog, they eventually, I don't know the timing.

Starting point is 00:17:45 I believe it was after, after people started questioning the validity of this video, but I'm not sure on that because I don't think it's timestamped. However, on the Google developer blog, to their credit, Google did release how this video was actually made and how it was put together. So this paper, rock scissors. Yes, paper rock scissors. Where in the video, the person goes paper, rock scissors, and Google says, oh, I see what, or Google Gemini says, oh, I see what you're doing.

Starting point is 00:18:14 That's not how it worked. There was no real video. What they did is they recorded the video. Then they took many screenshots, and then they uploaded those screenshots with text into Google Gemini. So no, for this demo that set the internet world on fire, there was no seeing. Google Gemini was not seeing anything in real time. It was not hearing anything in real time. It was just the same as every other model. They took screenshots, many screen grabs,

Starting point is 00:18:56 go look at it on the developer's blog that we'll link to. And in this example, so we have the three photos, one of a paper, one of a rock, and one of scissors. It's someone's hand doing these different techniques. So not only did they not just upload the video, right? Look at how many levels of hallucinations from Google's marketing, right? They didn't upload it. They didn't even, number one, it wasn't real time. Number two, they didn't upload a video into Gemini. Number three, they didn't just do screenshots of the paper rock scissors.

Starting point is 00:19:32 They did photos, three photos, one of a hand making the paper, one of a hand making the rock, one of a hand making the scissors. And guess what? Even at that point, they couldn't just, upload it and Google Gemini could tell them. They had to say, what do you think I'm doing? Hint, it's a game.
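To make the point concrete, here is a minimal sketch of what that kind of input looks like: three still images followed by a text question with a hint. It does not call Gemini, GPT-4, or any real API. The `Prompt` and `PromptPart` classes and the file names `paper.jpg`, `rock.jpg`, and `scissors.jpg` are hypothetical, made up only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PromptPart:
    """One piece of a multimodal prompt (hypothetical structure)."""
    kind: str      # "image" or "text"
    content: str   # file path for an image, raw string for text


@dataclass
class Prompt:
    """An ordered list of image and text parts sent together as one request."""
    parts: List[PromptPart] = field(default_factory=list)

    def add_image(self, path: str) -> None:
        self.parts.append(PromptPart(kind="image", content=path))

    def add_text(self, text: str) -> None:
        self.parts.append(PromptPart(kind="text", content=text))


def build_game_prompt(image_paths: List[str], question: str) -> Prompt:
    """Still photos first, then the text question: no video, no audio."""
    prompt = Prompt()
    for path in image_paths:
        prompt.add_image(path)
    prompt.add_text(question)
    return prompt


if __name__ == "__main__":
    # Hypothetical file names standing in for the three hand photos.
    prompt = build_game_prompt(
        ["paper.jpg", "rock.jpg", "scissors.jpg"],
        "What do you think I'm doing? Hint: it's a game.",
    )
    for part in prompt.parts:
        print(part.kind, "->", part.content)
```

The takeaway is that the whole exchange is static images plus typed text, which is the same kind of input other multimodal chat models already accept.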

Starting point is 00:19:54 Hmm. Hmm. What do y'all think?

Starting point is 00:20:55 Is Google's pants on fire? Did they lie? Did they hallucinate?

Starting point is 00:21:23 Is this marketing? What I would like to say is it's not truthful. It is so far from the truth. Jerry says don't hold back. More flame emojis. All right. All right, Jerry. Yeah.

Starting point is 00:21:41 Mike says the hot takes are scorching. All right. Well, we'll keep it going then. We'll keep it going. But let's keep investigating. And let me tell everyone this. Yeah, since people are wanting the hot takes, stop listening

Starting point is 00:21:55 to other people on the internet, right? I know that sounds weird, like, oh, only listen to me. I'm a former journalist. When a company tells me something, I never just talk about it on the everyday AI show. I never put it blindly in the newsletter where thousands of people are reading and trusting what we're putting out there is true.

Starting point is 00:22:16 If your mother says she loves you, get it in writing. Y'all, I was an award-winning journalist. When I see this kind of stuff, I know it is hot marketing garbage. and I'm never going to go on and just trumpet what all these other crypto bros are doing to get you to sign up for their newsletter so they can sell you some crap AI products that are going to go out of business anyways. That's not me.

Starting point is 00:22:40 That's not everyday AI. If you're listening, if you're watching, if you're reading our newsletter, we are giving you the truth. All right? And this was not the truth. From Google, it was not the truth. So, yeah. In other words, the entire premise of the video was faked.

Starting point is 00:22:59 Okay. It's not like Google was announcing, you know, new features on a camera or Google was announcing, you know, some new Gmail feature. No. The whole premise of this video and the whole secret sauce or the whole USP, the unique selling proposition of this new Gemini model, was the multimodality. Right? It was the fact that presumably Gemini could see and hear and talk in real time. And guess what? It was no better or no different than what we can already do with other models. And eventually, when people caught on days later, Google got scorched. Rightfully so.

Starting point is 00:23:50 You see these articles. One says, Google's Gemini marketing trick. An article from CNBC: Google faces controversy over edited Gemini AI demo video. Yes. But the initial reaction was Google's stock shot up through the roof. And now since that happened, and since people realized, yeah, Google, you faked this, this wasn't truthful, their stock has been going down and I think it should continue to go down. But also I'm curious for our live stream audience.

Starting point is 00:24:30 Did you believe the video? 99% of people did. I'm a skeptic. I'm a little bit cynical at times. The old journalist in me just didn't believe a single thing. But I think most people, even very smart people that I trust, were duped. And Google has been getting dragged through the mud ever since. Now, you thought the takes were hot before.

Starting point is 00:24:57 Here we go. I'm settling in my seat because the takes are going to get even hotter. Here's the thing, Google. Why would you put out a video like this? Did you really think that no one was going to notice? Did you think that no one was going to dig in? Did you think that no one was going to explore this seemingly revolutionary piece of technology? Y'all,

Starting point is 00:25:24 If it worked like that live and in real time, that is not only bigger than ChatGPT. That's one of the biggest technological innovations since the smartphone. But it wasn't true. Here's the thing. No one's talking about this. You can do the exact same thing in ChatGPT. I literally took the exact same prompt and the same pictures. And obviously, I uploaded the same thing.

Starting point is 00:25:54 I said, what do you think? The same photo inside of ChatGPT, inside of their multimodal GPT-4, I uploaded the photo. I said, what do you think I'm doing? Hint, it's a game. The exact same thing that Google did. It wasn't a video. They uploaded a screenshot. They uploaded photos and text.

Starting point is 00:26:13 My gosh, this is embarrassing for them. But guess what? GPT-4 did it just fine, obviously. So here's my other bone to pick, right? Because right now, we talk about the three models, Ultra, Pro, and Nano. Right now, Google Bard has the Gemini Pro model, which is not very good, right? So we are comparing, even in these screenshots, we are comparing this Ultra model that's not even out yet. It will reportedly be available sometime in 2024.

Starting point is 00:26:48 So we are comparing an unknown model that no one out there can test or use to GPT-4, which is already out there. So guess what? We could have, in theory, recreated this entire thing in GPT-4 that's already out. And I'm showing you an example of it. Huge fail from Google marketing. Huge fail.

Starting point is 00:27:14 But that might not be the most concerning part, right? The fact that, hey, it actually wasn't live. Hey, we actually just took a bunch of screenshots and had to do a lot of prompting, a lot of prompting inside Gemini to get this to work correctly. And then we took those responses and narrated them back. Oh my gosh. So deceitful. But that might not be the most concerning part because like I talked about, the public

Starting point is 00:27:42 perception was already set by the time, three days later, that people realized, wait. This isn't how it works. It wasn't real time, right? Because every single person out there, every little influencer on Twitter trying to get you to buy something crappy from them, every single news outlet just blindly said, you know, Gemini, more powerful than GPT-4. And, you know, everyone on Twitter essentially put the same thing out there. Here is the ChatGPT killer. No. Gemini was not a ChatGPT killer.

Starting point is 00:28:26 They were promoting a model that none of us can see, use, or benchmark. All right. So here's the problem. Here's the problem. Yeah. Like Douglas said, I thought it was an impressive video, but we need to see it, touch it, feel it before we can believe it. Absolutely. The truth is important, like Brian is saying.

Starting point is 00:28:51 Cecilia brings up a great point. She said, did Google lie or did we all just assume, and you know what that makes out of you and me. Yeah. I think it was both, Cecilia. I think the majority of people just assumed and took, you know, their spoon-fed information from big Google and just reposted it to try to get people to sign up for whatever crappy product or service that they're offering ultimately in the end. But I think also, yes, Google faked it. Google represented that this was happening live and in real time.

Starting point is 00:29:30 And not only did they misrepresent it, but even their language, right, when they said, hey, these are interactions. Those weren't interactions. Those were not interactions, right? And when they said it's been edited for brevity, no, it wasn't. No, it wasn't.

Starting point is 00:29:50 It was just screenshots. It was completely reproduced and reconstructed from scratch. none of those live interactions actually happened. Like I said, that's why I said probably lied in parentheses, but they faked it. This was full-blown big tech hallucinations. Bad.

Starting point is 00:30:16 It was bad. So the problems. So Gemini Ultra, which was quote unquote better than ChatGPT, isn't even out yet. Okay. Gemini Pro, which we're going to get to in a second from a benchmarks perspective. So what can we all go use right now?

Starting point is 00:30:34 Yeah, you can go use Gemini inside Google Bard. Like I said, they swapped out the PaLM 2 model for Gemini Pro. It is in line with GPT-3.5. Okay? The benchmarks were not favorable. So the combination of those things, number one, the best model you can't use. Number two, the model you can use is testing like GPT-3.5. And number three, the benchmarks were not favorable.

Starting point is 00:31:02 Let me ask Google. Why did you release this now? This was a huge, huge failure releasing this now. Not just the way that you released it, but the timing of it all. Because actually, about four days before this announcement, there was all this reporting that was saying that the Gemini model was getting delayed until 2024. Right? So I'm guessing it might have been a, you know, shareholder response.

Starting point is 00:31:32 Hey, we need to get something out there. You know, initially this was supposed to be a live event. And instead, it just got released through a press release and a YouTube video. So obviously there's a lot of things going on behind the scenes that you and I don't have access to. However, anyone can see now that this release was disastrous. not only the delivery, the delivery and what I will say is the deceit, but the timing. Why would Google release this when we can't even use or test this model that they said is so fantastic? And when the one that we can use is comparatively running at the same speed as models and the same power as models that are almost two years old.

Starting point is 00:32:23 Let's look at those benchmarks. Let's dig in. Let's dig in, shall we? So here we go. I got to make my screen bigger for this. So the big thing is there's all these different benchmarks. Okay. And again, we're going to have the link and you can go read this paper.

Starting point is 00:32:39 It's a 70, 60 page PDF. I read through most of it. Specifically, people are talking about the benchmarks in this table here. Okay. Where it's comparing Gemini Ultra, Gemini Pro, GPT-4, GPT-3.5, and some other models across different benchmarks. And what Google said was Gemini Ultra was ahead of GPT-4 in 30 out of 32 benchmarks. Is that the truth?

Starting point is 00:33:15 No. No. It's not the truth. Not the full truth. Let's take a look. All right. So now, again, if you're listening on the podcast, I'm going to do my best to describe this. but it is some kind of technical benchmarking.

Starting point is 00:33:36 Okay, but right now, keep in mind, Google is comparing a model in Ultra. No one has access to Ultra. You only have access to Pro, okay? That no one can test, and they're also using different testing methodologies. So I have highlighted here Gemini Ultra and Gemini, or sorry, Gemini Ultra and GPT4.

Starting point is 00:34:00 And you'll see that for the most part, in every single one of these except HellaSwag, where GPT-4 wiped the floor with Gemini Ultra, and which is actually a very unique test. But in every other test, Gemini Ultra outperformed GPT-4. Well, number one, I would hope so. I would hope that an unreleased model would outperform the model that's already been in production

Starting point is 00:34:32 for almost a year. Gosh, I would hope. How embarrassing would that be? Okay. But look in the details, y'all. Look in the details. I know it's kind of hard to see. So let's look at this test, which is GSM8K, which is grade school math.

Starting point is 00:34:51 All right. The difference is that Gemini Ultra went through a 32-shot. And the same thing with the MMLU, which we're going to talk about here in a second. It went through a chain of thought 32-shot. So without getting too technical and too into the weeds, chain of thought is a methodology of prompting that increases the likelihood of a model getting it correct. So a 32-attempt chain of thought.

Starting point is 00:35:19 That's what they are comparing to GPT-4, where they're looking at the five-shot. Okay. So they're kind of cherry-picking. Right. So when you look at the GSM8K, which is grade school math, it's a test. All of these are different tests and benchmarks where you can see definitively how models perform against each other. So you can say, oh, this one's best at A, B, and C. This one's best at D, E, and F, right?
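To see why the number of attempts matters, here is a minimal toy simulation. It assumes the "32" means sampling 32 answers per question and taking the most common one (majority voting), which is one common way to use many attempts and may not match exactly what Google ran. The `noisy_solver` function is a hypothetical stand-in for a model that gets a question right 60% of the time; it is not a real model or a real benchmark.

```python
import random
from collections import Counter


def noisy_solver(correct_answer: int, p_correct: float, rng: random.Random) -> int:
    """Hypothetical stand-in for one sampled model attempt at a math question."""
    if rng.random() < p_correct:
        return correct_answer
    # Wrong answers are spread across a few nearby distractors.
    return correct_answer + rng.choice([-2, -1, 1, 2])


def majority_vote(answers: list) -> int:
    """Return the most common answer among the sampled attempts."""
    return Counter(answers).most_common(1)[0][0]


def accuracy(num_questions: int, attempts: int, p_correct: float, seed: int = 0) -> float:
    """Fraction of toy questions answered correctly after voting over `attempts` samples."""
    rng = random.Random(seed)
    hits = 0
    for question_id in range(num_questions):
        truth = question_id  # toy ground truth, one integer answer per question
        samples = [noisy_solver(truth, p_correct, rng) for _ in range(attempts)]
        if majority_vote(samples) == truth:
            hits += 1
    return hits / num_questions


if __name__ == "__main__":
    single = accuracy(num_questions=1000, attempts=1, p_correct=0.6)
    voted = accuracy(num_questions=1000, attempts=32, p_correct=0.6)
    print(f"1 attempt per question:  {single:.1%}")
    print(f"majority of 32 attempts: {voted:.1%}")
```

The same underlying solver scores noticeably higher when it gets 32 tries and a vote, so a score produced that way is not directly comparable to a score produced under a lighter protocol.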

Starting point is 00:35:48 But when you even look at the GSM8K, Gemini Ultra did a 32 shot and GPT4 did a five shot. So you're comparing apples and orange slices here, Google, and no one else can run these tests to confirm them. Let's keep looking. Because here's the real story that no one is talking about. What we can all use right now, which is Gemini Pro. It gets straight up smoked in almost all of these benchmarks by GPT4. Google, what's the logic behind releasing a model?

Starting point is 00:36:35 You know, all this hype, all this marketing and say, oh, well, number one, we didn't actually do this. It wasn't actually real time. And that's our way better model and you can't use it yet. And obviously everyone rushes out to go try Gemini Pro inside Google Bard and it's straight garbage. Anyone that used it said, oh, yeah, it's way worse than GPT4 and the benchmarks even confirm it. So Google, why

Starting point is 00:36:59 on earth would you even? I would not have released Gemini Pro. Number one, I wouldn't have done this announcement at all. It was botched on every level. Number two, if you're going to do this, don't put Gemini Pro out there at all. Again, I know there's a lot of things that are above my pay grade and I don't understand. But if you put out and you release something and people go out and all they can use is Pro, all they can use is Pro. Number one, the average consumer is not going to know the difference.

Starting point is 00:37:29 I'm not the average consumer. Most of the people listening to the show aren't. But the average person, they saw this on social media and they go into their Google Bard and they start using it and they see it's still hot garbage. It is outperformed by GPT on almost every single benchmark. And it is not even close. It is not even close. Not close.

Starting point is 00:37:54 All right. And here's something that no one else is talking about. So MMLU testing. That is Massive Multitask Language Understanding. So this is a newer benchmark and what a lot of people who are smarter than me call the gold standard for evaluating large language models. So it's essentially, think of it as like an SAT, right? It's not that simple. But this is kind of a standardized test to see how well different models can do in this massive multitask language understanding.

Starting point is 00:38:26 Okay. So even when we look at Gemini Ultra. Okay, the model that's not out yet. And if you look at the same testing methodology, a five shot, right, which is much more realistic. A 32 shot chain of thought, that's not how people use large language models. You know, a five shot test is much more realistic. That is much more indicative or truthful of how most people might be using one of these

Starting point is 00:38:56 models, right? So y'all, when you're comparing average use case across the quote unquote gold standard of large language model testing, Gemini Ultra scored an 83.7% on this MMLU, whereas GPT4 scored a much higher score at 86.4%. Again, Google, did you not think? Did you not think this through? Because here's the other reality. This is not even out yet. This is not out yet. Okay.

Starting point is 00:39:36 So by the time it comes out, you know, whether it's first quarter of 2024, second quarter, how long until OpenAI releases GPT5, right? Yes, it looks like Ultra is going to outperform GPT4 on some metrics and through some testing methodologies, but for the most part, I think your best case is the average person sees it as pretty similar to GPT4. Why would you do that? Why did Google release Gemini? I think, y'all, it was a massive misstep.

Starting point is 00:40:17 It was a masterclass in what not to do in marketing, and I actually think it set the Gen AI industry back, right? Because you had thousands of news organizations first reporting out, oh, hey, this is the new ChatGPT killer, right? Because they're essentially just copying and pasting Google's press release. Stop doing that, tech journalists.

Starting point is 00:40:44 Investigate. Because it is not a ChatGPT killer because we don't know. We can't use it. All we can use right now is Gemini Pro. It is not good. It is not good at all. Google shouldn't have released Gemini. They shouldn't.

Starting point is 00:41:04 And they dropped the ball with Bard. Again. Again. Yeah. Kevin, Kevin says GPT 4.5 will probably be released this month. Yeah. Kevin, if you know anything, let me know. I'm always following to see when things are going to be updated and released.

Starting point is 00:41:22 But regardless, Google dropped the ball with Bard. Again. So I don't know if y'all remember this. But when Google first announced Google Bard, right? They did it in a big way, but it was also a huge failure. Because in the demo video, if you all don't remember this, in the actual demo video that they showed that millions, millions of people ended up seeing, there was inaccurate information that took all of five seconds to disprove,

Starting point is 00:42:04 right? So there was some, you know, something flashed up on screen about, you know, being able to see something with a telescope. And it got it wrong. And Google didn't even realize until after the fact, didn't even fact check. They just took whatever Bard produced in this demo as truth, just like they wanted us all to do with Google Gemini Ultra. They botched the Bard release. They botched the Ultra release. And I think they're going to pay for it.

Starting point is 00:42:47 And what I mean by that is their stock. All right. So if you look at what's called the magnificent seven tech stocks, which now make up a majority of the S&P 500 here in the U.S. So we're talking Microsoft, Apple, Meta, Nvidia, Amazon, Google, Tesla. So Tesla is kind of almost in a different category than the rest of them. Yes, Tesla's in AI with their auto drive,

Starting point is 00:43:08 but they're having issues that are completely different than what these other companies are going through. But if we look at the other six, Microsoft, Apple, Meta, Nvidia, Amazon, and Google, those six companies are heavily involved in generative AI. They're heavily involved in large language models, whether it's their own or investing billions of dollars in other companies that are creating them. And Google, at least out of those six, is the only one whose stock is down in the last three months. Microsoft crushing it, up 12%. Apple crushing it, up 10%.

Starting point is 00:43:48 Meta eating people for lunch, up 8%. Nvidia up 4%. Amazon up 3%. Tesla is obviously getting crushed, down 10%. But they have some other issues with their auto driving scandal, but Google is down 1%. Here's what happened. When the announcement first came out, their stock shot up.

Starting point is 00:44:13 And it stayed up for a day or two. And then when the reporting came out that said, oh, actually, this was faked. This video was faked. Their stock has continued to go down and I think it will continue to go down. As more and more people talk about this, as more and more people report about this, and as you look at the big picture that Google had, months, months to get this right. After completely fumbling the Google Bard release,

Starting point is 00:44:43 they did it again with Google Gemini. There's no transparency there. There's no trust. And that is paramount, right? If you want big companies to start using your product, if you want enterprise companies to use and promote and to feel good about your product, you have to establish trust.

Starting point is 00:45:03 You have to be transparent. And what Google did is the exact opposite of that. All right. So they not only hurt themselves, they hurt their stock, they hurt their credibility, but I think this was a shot to the generative AI industry. Right. Because they are now planting seeds of distrust that will impact all other companies. Right. Because now when you have big Fortune 100, big Fortune 500 companies who were already on the fence about adopting

Starting point is 00:45:31 Gen AI, now they're just going to be like, oh, no, we can't trust this. Look at the Google stuff. Right? This gave large language models a black eye because Google fumbled it. Google fumbled it. Hey, thank you for the support, Michael.

Starting point is 00:45:57 He says, I'm a force to be reckoned with. You know what? I think my background sets me up to do this, right? I've been a tech geek my whole life. I have a digital strategy company. We've been using different generative AI tools with our clients for four years. We've been using the GPT technology since it was publicly released.

Starting point is 00:46:21 You know, in what was that? Copy AI was the first third party service to offer it back in 2020. So we've been using the GPT technology now for almost three years. For ourselves, for our clients. So I'm not new to this, right? I'm not an expert, but I'm not new. But y'all, you have to investigate. When a company says, oh, look at this great new thing, don't just repost it.

Starting point is 00:46:46 Don't just copy and paste and say, oh, they're a ChatGPT killer. No, take your time and investigate. All right. I hope this was helpful, y'all. If it was, please, as always, go to youreverydayai.com. Why? Well, we have a newsletter that's actually based in facts, right?

Starting point is 00:47:17 We're not like everyone else. We don't just take whatever big tech company says and give it to you. We investigate everything. We talk to the experts. We ask them the hard questions because if you are going to grow your company, grow your career with generative AI, you have to actually know the truth. You have to actually know how to use it. All right.

Starting point is 00:47:37 So please, if you haven't already, go to youreverydayai.com, sign up for that free daily newsletter. Hit us back, right? I answer all messages. It does take me a while, especially the LinkedIn DMs, but drop me an email. I try to respond to every single message. Whatever your question is, I try to respond. We try to point you in the right direction.

Starting point is 00:47:56 We are here to help you learn and leverage generative AI to grow your company and to grow your career. If this was helpful, please share this episode with a friend. Leave us a rating, but also share this episode with a friend. So if you're on social media, hit that repost, share, whatever button is on social media. I don't know. But also, if you're listening on the podcast, if this was helpful, please share it with a friend. And I hope to see you back tomorrow and every day for more everyday AI. Thanks, y'all.

Starting point is 00:48:41 And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep

Starting point is 00:49:11 us going. For a little more AI magic, visit youreverydayai.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
