The AI Daily Brief: Artificial Intelligence News and Analysis - With Gemini 2.0, is Google So Back Baby?

Starting point is 00:00:00 Google drops a slew of new AI features showing just how far the company's AI strategy has come this year. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Quick note, friends, before we dive in today, this episode was caught up in the travel dragnet. And so once again, I am doing just a main episode. I think that probably on Friday we will do an extended news episode to try to catch up on all the headlines that we missed. A little wobbly to end the year, but we are making it happen, and at least you are not missing episodes. So what we are talking about today is an absolute slate of new announcements from Google.

Starting point is 00:00:41 It is very clear that they were not content letting OpenAI have all of its fun with its 12 days of OpenAI or Shipmas or whatever they were calling it, and really wanted to come in and steal some of that thunder. We're going to talk first about what was actually announced, and then towards the end of the episode, I'm going to spend a little bit of time talking about what it all reflects in terms of where Google sits heading into 2025, vis-a-vis this AI race. As I said, there was a ton that was announced, so it's going to take a minute to get through it all. The big banner headline was that this was Gemini 2.0. Almost exactly one year after their original frontier model, a model which at the time was trying to capture energy and attention as the first natively multimodal model, it's very clear

Starting point is 00:01:23 where their heads are at when it comes to Gemini 2.0. It's right there in the subtitle of the blog post, our new AI model for the Agentic Era. So what's actually in Gemini 2.0? First of all, it has native image and multilingual audio generation. It also features what Google are calling native intelligent tool use, meaning it can directly interface with Google products like Search and even execute code. It also is the first model to accept streaming video as an input. And so when you take it all together,

Starting point is 00:01:50 Google now has a model that can view something in real time, hold a conversation, and take actions in the background. This release centered around improvements to Gemini Flash, which is the version of the model that's designed to be fast and cheap. The first generation of Flash was text only, but it is now fully multimodal and has all the features of the larger models. That means it can accept images, videos, and audio as inputs alongside text and produce audio responses.

Starting point is 00:02:15 Tulsi Doshi, the head of product for Gemini, said, we know Flash is extremely popular with developers for its balance of speed and performance. And with 2.0 Flash, it's just as fast as ever, but now it's even more powerful. Based on Google's benchmarking, Gemini 2.0 Flash is significantly improved in areas like coding and image analysis over Gemini 1.5 Pro. Google is in fact so confident that Flash will be the best model for most jobs that it's replacing Pro as the flagship model in the lineup. Demis Hassabis, the CEO of Google DeepMind, said, effectively it's as good as the current Pro model is, so you can think of it as one whole tier better for the same cost efficiency and performance efficiency and speed. We're really happy with that. The audio generation feature, which is new to Flash, was described as steerable and customizable.

Starting point is 00:02:55 It features eight different voices which are optimized across a range of languages and accents. Doshi said, you can ask it to talk slower, you can ask it to talk faster, or you can even ask it to say something like a pirate. The product version of 2.0 Flash will be released in January, but developers can access the full multimodal API already and start building. The response to this was pretty good. Dan Mack on Twitter writes, I kind of hate when AI influencers try to engagement bait by saying this is insane, but I must say this is in fact insane. Google beat OpenAI to the punch by allowing real-time video and audio interaction on your desktop with Gemini 2.0 Flash. This is for sure a new era of the AI age. And while a massive update to the foundation model is a big deal, even they pointed out this is all

Starting point is 00:03:37 about the agentic era. And so perhaps unsurprisingly, Google showcased three prototype agents built on the new model. The first is Project Astra, an updated version of their universal AI assistant. The assistant is now fully speech-to-speech. Google demonstrated its ability to keep up with complex conversations, transition between different languages, and access other Google tools. The assistant can now access real-time information through Google Search, Maps, and Lens, which is a feature we haven't seen from an AI assistant to date. Astra now has 10 minutes of in-session memory and can recall conversations you've had in the past to enhance personalization. The second agent is a coding assistant called Jules,

Starting point is 00:04:15 and Jules demonstrates what happens when you combine reasoning models with agentic capabilities. Jules can create multi-step plans to address issues, modify multiple files, and prepare pull requests for Python and JavaScript coding tasks and GitHub workflows. And if this agent is what's behind the announcement last quarter that more than a quarter of all code created at Google is now generated by AI, then we could be in for something great. Google has designed Jules with a lot of human in the loop, frankly, likely more than they need in order to ensure safety.

Starting point is 00:04:45 Jules will present a suggested plan before taking action. Users can monitor progress, and permission is requested before merging any changes. Jaclyn Konzelmann, the director of product management at Google Labs, said, we're early in our understanding of the full capabilities of AI agents for computer use. Jules is only available to a select group of trusted testers at the moment, but will be rolled out more broadly early next year. A third agent is the web browsing assistant called Project Mariner, and this gets at one of the most important UX shifts that we're seeing, where instead of trying to adapt ourselves to what AI and agents can do,

Starting point is 00:05:20 we're just trying to get agents to behave more like us. Anthropic made a bunch of news earlier this year when they showed their version of a very nascent agent that could actually point and click on your screen, and Mariner is of a similar ilk. The model can take control of the Chrome browser, clicking buttons, filling out forms, and using the web much like a person would.

Starting point is 00:05:38 Google leaders called this a fundamentally new UX paradigm shift that we're seeing right now. Quote, we need to figure out what is the right way for all of this to change the way users interact with the web and the way publishers can create experiences for users as well as for agents in the future. The demonstration showed the agent building out an online shopping cart based on a grocery list. The process was painfully slow, with around five seconds of delay between cursor movements. The agent also got stuck and asked for assistance multiple times. For now, the agent can't use the checkout by itself, a safety limit

Starting point is 00:06:07 so it doesn't need to handle credit card details. And from a functional standpoint, the agent does work like Anthropic's computer use mode, taking constant screenshots to determine its next move. Because of this, Mariner can only use the visible tab in Chrome, so you can't use the computer for other things while the agent is in control. Google feels very comfortable with this, though. DeepMind CTO Koray Kavukcuoglu said, because the AI is now taking actions on a user's behalf, it's important to take this step by step. You as an individual can use websites, and now your agent can do everything that you do on a website as well. As an added bonus to preview what comes next, Google said they're testing agents that understand video games.

Starting point is 00:06:44 They said the agents can, quote, reason about the game based solely on the action on the screen and offer up suggestions for what to do next in real-time conversation. If you get stuck, the agents can also access Google Search to figure out what you should do next. Google is testing the agents on games like Clash of Clans and Hay Day. Today's episode is brought to you by Plumb. Want to use AI to automate your work but don't know where to start? Plumb lets you create AI workflows by simply describing what you want. No coding or API keys required.

Starting point is 00:07:10 Imagine typing out AI, analyze my Zoom meetings and send me your insights in Notion and watching it come to life before your eyes. Whether you're an operations leader, marketer, or even a non-technical founder, Plumb gives you the power of AI without the technical hassle. Get instant access to top models like GPT-4o, Claude Sonnet 3.5, AssemblyAI, and many more. Don't let technology hold you back. Check out Use Plumb, that's Plumb with a B, for early access to the future of workflow automation. Today's episode is brought to you by Vanta. Whether you're starting or scaling your company's security program, demonstrating top-notch security practices and establishing trust is more important than ever. Vanta automates compliance for ISO 27001, SOC 2, GDPR, and leading AI frameworks like ISO 42001,

Starting point is 00:07:53 and NIST AI Risk Management Framework, saving you time and money while helping you build customer trust. Plus, you can streamline security reviews by automating questionnaires and demonstrating your security posture with a customer-facing trust center, all powered by Vanta AI. Over 8,000 global companies like LangChain, Leela AI, and Factory AI use Vanta to demonstrate AI trust and prove security in real time. Learn more at vanta.com slash NLW. That's vanta.com slash NLW. Today's episode is brought to you, as always, by Superintelligent. Have you ever wanted an AI daily brief but totally focused on how AI relates to your company? Is your company struggling with AI adoption, either because you're getting stalled figuring out what use cases

Starting point is 00:08:35 will drive value or because the AI transformation that is happening is siloed in individual teams, departments, and employees, and not able to change the company as a whole? Superintelligent has developed a new custom internal podcast product that inspires your teams by sharing the best AI use cases from inside and outside your company. Think of it as an AI daily brief, but just for your company's AI use cases. If you'd like to learn more, go to besuper.ai slash partner and fill out the information request form. I am really excited about this product, so I will personally get right back to you.

Starting point is 00:09:07 Again, that's besuper.ai slash partner. Still, we are not done. Because alongside the agents, Google is also introducing a new reasoning mode for Gemini 1.5 Pro, which they're calling Deep Research. This seems to be closer to a long-form research tool than a competitor to OpenAI's o1 model. In Deep Research mode, Gemini responds to a prompt with a multi-step research plan. Once revised and approved, the model then spends a few minutes searching for and compiling

Starting point is 00:09:35 information. It then repeats the process several times, iterating on the information learned. Once complete, the model generates a report on the key findings along with full citations of academic sourcing. Google is calling it an agent as technically it completes this process using Google Search. David Citron, product director for the Gemini app, said, we built a new agentic system that uses Google's expertise of finding relevant information on the web to direct Gemini's browsing and research. Deep Research saves you hours of time. Wharton professor Ethan Mollick, who has gone deep on advanced academic uses of AI, seems impressed. He wrote,

Starting point is 00:10:06 The new deep research feature from Google feels like one of the most appropriately Googly uses of AI to date, and it is quite impressive. I've had access for a bit, and it does very good initial reports on almost any topic. The paywalls around academic sources put some limits. He did also include, I wish they had stats on the hallucination rate. I suspect better than an undergraduate, and it is more likely to miss subtle things than to get stuff completely wrong. He continued, one warning to instructors is that the new Google deep research feature solves most of the issues with AI-created research assignments. Pretty solidly well-organized and written with accurate citations, it makes it very easy for students to skip or automate their research work.

Starting point is 00:10:44 Bilawal Sidhu called it essentially Perplexity on steroids. Last couple of announcements. Google is of course deploying these new model capabilities everywhere, and one of the first uses is an upgrade to Google's AI Overviews. The company says that the tool will now be able to handle, quote, more complex topics as well as multimodal and multi-step searches. They also said it can answer questions about math and programming. You'll remember that AI Overviews were part of the narrative challenge for Google at the beginning of the year. Initially, they were widely mocked online due to things like suggesting glue as a pizza topping. Still, Google CEO Sundar Pichai said, our AI Overviews now reach 1 billion people, enabling them to ask entirely new types of questions,

Starting point is 00:11:24 quickly becoming one of our most popular search features ever. We'll continue to bring AI Overviews to more countries and languages over the next year. Lastly, on the hardware side, Google has unveiled the sixth generation of their Trillium AI chip. The chip is used for training and inference, competing with Nvidia GPUs alongside the new Trainium chip from Amazon. They claim the performance improvements could fundamentally alter the economics of AI training. They say that it delivers a 4x improvement in training performance compared to its predecessor, as well as a significant reduction in energy use. As a more tangible metric, Google is claiming a 2.5x improvement in training performance per dollar. Gemini 2.0 was trained exclusively on a Trillium cluster. And Google

Starting point is 00:12:02 disclosed that they have built a 100,000 chip cluster, which they claim is one of the most powerful AI supercomputers. In their announcement, Google didn't provide any comparisons to rival chipmakers, so it's a little hard to know how the new silicon stacks up. However, the chips are now generally available to Google Cloud users, so it probably won't take long for us to find out. Taking a step back, Google's brand story across the last couple years of AI has been a really fascinating one. I think if you had gone a few years back, Google was the default leader, both from a real and an imagined perspective when it came to generative AI. The launch of ChatGPT and the ascendance of OpenAI really upset the apple cart. And it wasn't just that. Not only was there now a consumer product out

Starting point is 00:12:43 ahead of Google, but in early 2023, Meta also carved out a totally different space because of their approach to open source. For most of 2023, Google felt distinctly behind when it came to generative AI. Indeed, even one year ago when Gemini 1.0 was launched, the broad perception was that their hand had been forced, that the model really wasn't as far along and wasn't competitive yet with GPT-4, and wouldn't be until they released the most performant version of it early in 2024. Basically, Google had to do something, and so they had to announce Gemini 1.0 earlier than they might otherwise have wanted to. Then in the beginning of this year, while we did get a GPT-4 class model in Gemini, we also got what I was just mentioning,

Starting point is 00:13:27 AI Overviews in search that told people to put glue on pizza, and of course, the whole controversy and dust-up around the historically inaccurate image generation, which forced diversity into situations in history which were very undiverse. Think black Nazis. In other words, it was a pretty brutal beginning of the year for Google. Slowly but surely, though, that has changed. Undeniably, one of the big reasons for that is that Google got a breakout AI product hit in NotebookLM. The addition of the podcast summarization feature, which opened up this totally new set of use cases and ways of consuming information never before available, really got this ship pointed in the right direction

Starting point is 00:14:07 and a ton of narrative juice back in the Google House. That set the tone, I think, for this announcement, which was comprehensive, had a lot of great stuff in it, and was received incredibly positively. People are excited about these new features, they're excited about Astra. They're not dealing with this cynically. And importantly, from a brand perspective,

Starting point is 00:14:27 it's more of a return to form than anything else. In other words, people are saying, Oh, that Google that we know that we would have assumed would be a leader in this space, they are back. And that, I think, is exactly where Google wants its brand to be. The company has an incredible number of advantages when it comes to the AI wars. They've got a slate of products to integrate AI into and to capture data from that potentially make their AI products not only very useful, but already plugged into the systems

Starting point is 00:14:56 that people are using today. And so if they can continue this momentum, they could be poised for an even bigger 2025. That's not to say that there aren't challenges, because as we've been discussing when it comes to agents, it's sort of like all bets are off, and everything is up for grabs once again. Still, you've got to think that the folks over at Google are a lot happier heading into 2025 than they were heading into 2024, and I think that they should be. For now, though, that is going to do it for today's AI Daily Brief. Appreciate you listening or watching as always, and until next time, peace.
