The AI Daily Brief: Artificial Intelligence News and Analysis - BREAKING: Google Launches Gemini

Starting point is 00:00:00 Today on The AI Breakdown, breaking news: Google has officially launched Gemini, and they claim it beats GPT-4. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to breakdown.network for more information about our YouTube, our Discord, and our newsletter. Welcome back to The AI Breakdown. Wouldn't you know it, we had completed our entire episode today, a brief, a main, all of that good stuff, when the news dropped that Google had released Gemini. Now, this was surprising and not surprising at the same time. Not surprising in the sense that they have been under incredible pressure to do so. They have been widely seen as far behind OpenAI and falling further behind, but surprising

Starting point is 00:00:49 because we kept getting reports of delays around this product. Most recently, at the end of last week, The Information and others were reporting that Google had actually canceled a set of in-person preview events for Gemini that were unannounced but meant to take place this week. Now, that reporting was updated to say that maybe there would be a virtual preview, but it was still quite a surprise to see them drop the actual, full-on announcement earlier today. So for a TL;DR, let's look at AI entrepreneur Bindu Reddy, who writes, yay, Google is finally in the arena. They just announced Gemini and it has some impressive benchmark scores.

Starting point is 00:01:22 Gemini beats GPT-4 on several benchmarks, except HellaSwag. The MMLU beat is significant. All in all, this is a GPT-4 class model. Finally, we have a model that can dethrone the king. And indeed, a lot of the early tweets and comments were exactly some version of that. Santiago writes, Google Gemini versus OpenAI GPT-4. It's a bloodbath. Is this the end of an era? And what he was referring to was this big list of benchmarks, where on basically all of them, Gemini Ultra beat GPT-4. That includes a number of reasoning benchmarks. It includes math, code, image, video, audio. But as we'll see, the story may not be quite as clear as it seems. Still, let's start with the information from Google about what Gemini

Starting point is 00:02:04 actually is and get a sense of how they are positioning it relative to GPT-4. I will say right up front that the two words or phrases you need to keep in mind are, one, multimodal, and two, reasoning. The blog post they dropped this morning was called Introducing Gemini: our largest and most capable AI model. Alphabet CEO Sundar Pichai kicks it off and says, Every technology shift is an opportunity to advance scientific discovery, accelerate human progress, and improve lives. I believe the transition we are seeing right now with AI will be the most profound in our lifetimes, far bigger than the shift to mobile or to the web before it. AI has the potential to create opportunities from the everyday to the extraordinary for people

Starting point is 00:02:42 everywhere. It will bring new waves of innovation and economic progress and drive knowledge, learning, creativity, and productivity on a scale we haven't seen before. So, big, bombastic language, really setting this up as the next big thing. From there we get to the Gemini-specific introduction from Demis Hassabis, who is of course the CEO and co-founder of Google DeepMind. A couple of the key lines. He writes, it was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across, and combine different types of information, including text, code, audio, image, and video. Indeed, a little later on, he describes how their approach to multimodality is different. Until now, he writes, the standard approach to creating multimodal models

Starting point is 00:03:22 involved training separate components for different modalities and then stitching them together to roughly mimic some of this functionality. These models can sometimes be good at performing certain tasks, like describing images, but struggle with more conceptual and complex reasoning. We designed Gemini to be natively multimodal, pre-trained from the start on different modalities. Then we fine-tuned it with additional multimodal data to further refine its effectiveness. This helps Gemini seamlessly understand and reason about all kinds of inputs from the ground up, far better than existing multimodal models, and its capabilities are state-of-the-art

Starting point is 00:03:51 in nearly every domain. Now, one other interesting note about Gemini 1.0 that will become relevant in just a moment: there are actually three different sizes of the model. Gemini Nano, which is their most efficient model for on-device tasks; Gemini Pro, which they describe as our best model for scaling across a wide range of tasks; and Gemini Ultra, our largest and most capable model for highly complex tasks. Keep in mind that Gemini Pro is a GPT-3.5 type of model, and it is Gemini Ultra specifically that is outperforming GPT-4 on many of these benchmarks.

Starting point is 00:04:23 But let's get into the features and capabilities that they chose to highlight in this announcement post. In the main launch video, it is all about multimodality. Through and through and through multimodal from the ground up, natively multimodal, that is clearly what they are trying to differentiate around. Now, when they get into their capabilities, the other thing that they start to focus on is this idea of sophisticated reasoning. They write, Gemini 1.0's sophisticated multimodal reasoning capabilities can help make sense of complex written and visual information. This makes it uniquely skilled at uncovering knowledge that can be difficult to discern in vast amounts of data. Its remarkable ability to extract insights from hundreds of

Starting point is 00:04:57 thousands of documents through reading, filtering, and understanding information will help deliver new breakthroughs at digital speeds in many fields from science to finance. They highlight this in a video where Google Gemini was used to update a database of papers that refer to a specific type of information, pointing out that over a lunch period it could analyze 200,000 papers, figure out which were actually relevant for the particular database, and then extract the key information from those relevant papers. Now, to make sure we get our dash of multimodal here, it was also able to take that information and extend charts and figures that were in the original paper based on the new information that it had added. In other words, we've got a little bit of reasoning with a dash of

Starting point is 00:05:33 multimodality. Next up, Google discusses how Gemini understands text, images, audio, and more. Again, this is that ground-up native multimodality that they've been talking about throughout. The example they use is someone who's helping their kid with physics homework. With just a screenshot of the homework, Gemini is able to identify what went wrong in a problem that the student got wrong, and can then be asked to explain why a certain piece of information, in this case the height of a line, was incorrect, and it can do this in a way that actually explains its reasoning. As Google writes, it better understands nuanced information and can answer questions relating to complicated topics. This makes it especially good at explaining reasoning in complex

Starting point is 00:06:07 subjects like math and physics. Next up, they talk about one of their most high-profile features, which is advanced coding. They write, our first version of Gemini can understand, explain, and generate high-quality code in the world's most popular programming languages like Python, Java, C++, and Go. Its ability to work across languages and reason about complex information makes it one of the leading foundation models for coding in the world. Now, as part of this section, they also have a video about how Gemini is not only good at coding but gets better when assisted by humans, and they also introduce AlphaCode 2, which is for competitive coding. Now, basically, they argue in this video that the reason competitive coding matters

Starting point is 00:06:41 is not that it's some big sport that Google wants to be a part of, but that it calls for different, more advanced reasoning than normal coding otherwise would. They basically argue that other LLMs are good at the implementation stage of problems, but they're not good at requirement analysis and system design. In other words, there is a whole level of additional reasoning that needs to go into problem solving, and AlphaCode in particular is better at it. Now, they kind of treat the new tensor processing units as an afterthought, announcing, quote, the most powerful, efficient and scalable TPU system to date, Cloud TPU v5p, designed for training cutting-edge AI models. On another day, this would be big news, but surrounded

Starting point is 00:07:17 by the rest of Gemini, it is definitely buried in the announcement. Now, they didn't make any cute video about it, but they do talk in this announcement about responsibility and safety as well. They talk about conducting novel research into potential risk areas like cyber offense, persuasion, and autonomy. They say they're using benchmarks such as RealToxicityPrompts, which is a set of 100,000 prompts with varying degrees of toxicity pulled from the web. And basically they say, this is something that we care about. Now, there were a couple of things that weren't on this announcement page but were included in their videos that were also really cool. One of these was a bespoke UI creator, where one of the Googlers showed how Gemini was able to create a custom

Starting point is 00:07:52 UI to better deliver answers to a particular problem. In this case, the problem that he was trying to solve was around birthday party ideas for his daughter. Gemini said, sure, I can help you. Can you tell me a little bit more about her interests? From there, he said, she likes animals, and we want the party to be outside. Gemini responded with not just a list of text ideas but an actual interface with a set of animal-themed birthday party ideas, including under the sea, farm animals, dinosaurs, and unicorns, that the user piloting this query could then click on to get more ideas. The demo also showed off the reasoning going on under the hood and how Gemini was deciding what type of interface was needed and what type of information would come back.

Starting point is 00:08:29 When the user homed in on farm animals, he was able to click on an idea for cupcakes and ask, again in natural language, for step-by-step instructions for baking them. That produced yet another bespoke user interface designed around those step-by-step instructions. This, of course, is showing off not only reasoning but also the multimodal capabilities, where instead of just a plain-text recipe, it was an actual scrollable list of steps with interesting images. Jason Calacanis wrote, based on this video, Google seems to have caught up to OpenAI, maybe even slightly improved on the slick ChatGPT interface. Another example showed this native multimodality in a bunch of different ways.

Starting point is 00:09:02 An engineer from Google slowly drew an image of a duck, asking Gemini what it was seeing at each step, with Gemini catching on more and more and providing additional information and context as it went. There are a lot of different things in this experiment, but basically they were all about this sort of natively multimodal interface, which impressed people. Amjad Masad, the CEO of Replit, said, this fundamentally changes how humans work with computers. So where is this live? Well, Gemini Pro is already available now in a set of Google's products.

Starting point is 00:09:30 Gemini is now being used inside the Search Generative Experience, and as of today, Bard is also using a fine-tuned version of Gemini Pro. Gemini Nano, it looks like, is coming to the Pixel 8 Pro, and some people were noticing a difference with Bard. Ethan Mollick writes, for the first time I am seeing signs of cleverness in Bard, which has been upgraded to Gemini Pro (GPT-3.5 levels of reasoning). I gave it the apple test, ending sentences with the word apple. It still mostly fails, but when asked to visualize it, the results are interesting.

Starting point is 00:09:56 Others had less impressive experiences. When someone asked, what are the latest updates about the war in Israel and Gaza? The response from Bard was, the conflict in Israel and Gaza is complex and changing rapidly. If you'd like up-to-date information, try using Google Search. Now, what people quickly started to realize is that once you got past the excitement of those benchmarks, everything was perhaps not all that it seemed. The TL;DR is that the Gemini that is actually available and launching today is not Gemini Ultra, the version of this model that theoretically beats GPT-4.

Starting point is 00:10:26 Instead, it is Gemini Pro, which aims to compete with GPT-3.5. So already we have a moment where everyone who got excited about finally seeing something at GPT-4 level realizes that they can't actually use it yet. Nick Dobos writes, so has anyone actually tried Gemini yet? Pro only in Bard, Bard's still not actually updated, only text capabilities, no multimodal, no API yet, so none of the demo videos are actually possible to try. Google once again has a blog post with a waitlist for Gemini Ultra. Where is the actual thing? But don't worry, we have a chart that says 0.2% better when using an entirely more complex prompting scheme. Total vaporware, lol. I hope I'm proven

Starting point is 00:11:05 wrong, but wow, do I hate tech companies announcing nonsense. Google in particular has a huge habit of announcing and never following through. So there's lots to follow up on here, but let's go back to this part of that assessment: But don't worry, we have a chart that says 0.2% better when using an entirely more complex prompting scheme. What he's referring to is the fact that a big part of the announcement, perhaps the most emphasized benchmark in the entire thing, was that Gemini Ultra outperformed human experts on MMLU and also beat the state of the art from GPT-4, scoring 90.0% versus GPT-4's 86.4%. The human expert baseline, at 89.8%, was that 0.2 points behind Gemini Ultra. Andrew Carr, however, writes, huge congrats to the team. It looks awesome. But I have to point out what's going on

Starting point is 00:11:48 with the y-axis here. Now, what he's referring to is that the methodology behind GPT-4's 86.4% MMLU score is totally different from the one used for Gemini Ultra. As Philipp Schmid points out, the 86.4% was 5-shot, meaning five examples in the prompt, while the 90% was 32 samples using few-shot prompting combined with chain of thought. In other words, a totally different type of process. His conclusion? Never trust marketing content. The AI Safety Memes account says, Hold up, Gemini only hits 90% on MMLU with 32 examples and chain of thought. GPT-4 just used five examples, no chain of thought. This seems hella sketch, but I could definitely be misunderstanding something. Can someone steelman? Now, in the paper that was released alongside this,

Starting point is 00:12:29 it actually showed that when Gemini Ultra used the same five-shot methodology that GPT-4 had, it scored 83.7%, lower than GPT-4's 86.4%. Now, that paper also did say that with chain of thought and 32 samples, Gemini Ultra still outperformed GPT-4, but the GPT-4 number in that comparison was obtained via the API, so it's very questionable all in all. Or rather, maybe a better way to put it is that even if it doesn't undermine Google's claim that Gemini Ultra outperforms GPT-4, it's still a sketchy, apples-to-oranges way to present this information that doesn't really befit Google, and I think it definitely says something about where they are and the pressure they are under. Indeed, the more time went on after this

Starting point is 00:13:08 announcement, the more people started to come around to this point of view. Haseeb Qureshi wrote, Google's much-hyped Gemini model just dropped but looks underwhelming. SOTA model is Ultra, which is not yet released (Q1). Pro model, which is available, is GPT-3.5 level. They claim Gemini Ultra edges out GPT-4, but benchmarks look very cherry-picked. Now, some are speculating that the reason Gemini isn't out here crushing GPT-4 is maybe not just the fault of Google but some fundamental barrier being hit. Gary Marcus writes, hot take on Google Gemini and GPT-4. Google Gemini seems to have by many measures matched or slightly exceeded GPT-4, but not to have blown it away. From a commercial standpoint, GPT-4 is no longer unique. That's a huge problem for OpenAI, especially post-drama,

Starting point is 00:13:50 when many customers are now seeking a backup plan. From a technical standpoint, the key question is, are LLMs close to a plateau? Note that Gates and Altman have both been dropping hints, and GPT-5 isn't here after a year despite immense commercial desire. The fact that Google with all its resources did not blow away GPT-4 could be telling. For a lot of others, the big question now is when OpenAI makes its next announcement. Nate Chan writes, calling it, OpenAI releasing GPT-4.5 before Christmas. Then there was the Jimmy Apples account. Now, this is an account that has claimed to have inside information about OpenAI before, which has been validated on several occasions. And while the post has now been deleted, in the immediate aftermath of Gemini being released,

Starting point is 00:14:29 he posted something along the lines of, OK, Sam, so now are you finally going to release what you've been sitting on, rather than just this slow drip, implying heavily that OpenAI had something much more advanced than what is publicly available. Siqi Chen writes, how many weeks do you reckon OpenAI will allow Gemini to be the best model in the world? Now, we haven't gotten any confirmation or comment from OpenAI members that I've seen,

Starting point is 00:14:51 but we did have a now-deleted tweet from Ilya Sutskever, who of course is a co-founder and the chief scientist at OpenAI, who was integral in the removal of Sam but then also switched sides and wanted him to come back. After the Google Gemini announcement, he posted the wildly cryptic tweet, I learned many lessons this past month. One such lesson is that the phrase, the beatings will continue until morale improves,

Starting point is 00:15:10 applies more often than it has any right to. This set off an absolute frenzy of people trying to figure out what he was talking about. Was it something having to do with Google Gemini, or was the timing just a coincidence? In any case, it's been deleted now, so we will just have to guess. AI YouTuber Matt Wolfe sums up my feelings perfectly when he writes, I no longer know how to describe these weeks we're having in the world of AI. Craziest week ever, most insane week we've seen yet, et cetera.

Starting point is 00:15:33 I always think we're having a week that can't be outdone. This week has been another one that I don't know how to describe. And indeed, given that I had to scrap an entire podcast for this one (which, don't worry, you guys will get at some point), I couldn't agree more. But here we are. Gemini Ultra isn't out, but Gemini is finally confirmed. We've got more information.

Starting point is 00:15:51 We're seeing what they're presenting. The next question, of course, will be whether it can actually perform in the real world. You better believe developers are going to be waiting hungrily for the opportunity to test it. But for now, that will wrap this breaking coverage of Gemini's announcement. Until next time, peace.