The AI Daily Brief: Artificial Intelligence News and Analysis - With Gemini 2.0, is Google So Back Baby?
Episode Date: December 13, 2024
NLW explores the latest announcements from Google, including Gemini 2.0, a set of new agents, and why the company is heading into 2025 much stronger than they came into 2024. Brought to you by: Vanta... - Simplify compliance - https://vanta.com/nlw The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614 Subscribe to the newsletter: https://aidailybrief.beehiiv.com/ Join our Discord: https://bit.ly/aibreakdown
Transcript
Google drops a slew of new AI features showing just how far the company's AI strategy has come this year.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
To join the conversation, follow the Discord link in our show notes.
Quick note, friends, before we dive in today, this episode was caught up in the travel dragnet.
And so once again, I am doing just a main episode.
I think that probably on Friday we will do an extended news episode to try to catch up on all the headlines that we missed.
A little bobbly to end the year, but we are making it happen, and at least you are not missing episodes.
So what we are talking about today is an absolute slate of new announcements from Google.
It is very clear that they were not content letting OpenAI have all of its fun with its 12 days of OpenAI or Shipmas or whatever they were calling it,
and really wanted to come in and steal some of that thunder.
We're going to talk first about what was actually announced, and then towards the end of the episode,
I'm going to spend a little bit of time talking about what it all reflects in terms of where
Google sits heading into 2025, vis-a-vis this AI race. As I said, there was a ton that was
announced, so it's going to take a minute to get through it all. The big banner headline was that
this was Gemini 2.0. Almost exactly one year after their original frontier model, a model which at the time
was trying to capture energy and attention as the first natively multimodal model, it's very clear
where their heads are at when it comes to Gemini 2.0. It's right there in the subtitle of the blog post,
our new AI model for the agentic era.
So what's actually in Gemini 2.0?
First of all, it has native image and multilingual audio generation.
It also features what Google are calling native intelligent tool use,
meaning it can directly interface with Google products like search and even execute code.
It also is the first model to accept streaming video as an input.
And so when you take it all together,
Google now has a model that can view something in real time,
hold a conversation, and take actions in the background.
This release centered on improvements to Gemini Flash,
which is the version of the model that's designed to be fast and cheap.
The first generation of Flash was text only, but it is now fully multimodal and has all the
features of the larger models.
That means it can accept images, videos, and audio as inputs alongside text and produce audio
responses.
Tulsi Doshi, the head of product for Gemini, said, we know Flash is extremely popular
with developers for its balance of speed and performance.
And with 2.0 Flash, it's just as fast as ever, but now it's even more powerful.
Based on Google's benchmarking, Gemini 2.0 Flash is significantly improved in areas like coding and image analysis over the Gemini 1.5 Pro.
Google is in fact so confident that Flash will be the best model for most jobs that it's replacing Pro as the flagship model in the lineup.
Demis Hassabis, the CEO of Google DeepMind, said, effectively it's as good as the current Pro model is, so you can think of it as one whole tier better for the same cost efficiency and performance efficiency and speed.
We're really happy with that.
The audio generation feature, which is new to Flash, was described as steerable and customizable.
It features eight different voices which are optimized across a range of languages and accents.
Doshi said, you can ask it to talk slower, you can ask it to talk faster, or you can even ask it to say something like a pirate.
The product version of 2.0 Flash will be released in January, but developers can access the full multimodal API already and start building.
The response to this was pretty good. Dan Mack on Twitter writes,
I kind of hate when AI influencers try to engagement bait by saying this is insane, but I must say this is in fact insane.
Google beat OpenAI to the punch by allowing real-time video and audio
interaction on your desktop with Gemini 2.0 Flash. This is for sure a new era of the AI age.
And while a massive update to the foundation model is a big deal, even they pointed out this is all
about the agentic era. And so perhaps unsurprisingly, Google showcased three prototype agents
built on the new model. The first is Project Astra, an updated version of their universal
AI assistant. The assistant is now fully speech-to-speech. Google demonstrated its ability to keep up
with complex conversations, transition between different languages and access other Google
tools. The assistant can now access real-time information through Google Search,
Maps, and Lens, which is a feature we haven't seen from an AI assistant to date.
Astra now has 10 minutes of in-session memory and can recall conversations you've had in the past
to enhance personalization. The second agent is a coding assistant called Jules,
and Jules demonstrates what happens when you combine reasoning models with agentic capabilities.
Jules can create multi-step plans to address issues, modify multiple files,
and prepare pull requests for Python and JavaScript coding tasks and GitHub workflows.
And if this agent is what's behind the announcement last quarter
that more than a quarter of all code created at Google is now generated by AI,
then we could be in for something great.
Google has designed Jules with a lot of human in the loop,
frankly, likely more than they need in order to ensure safety.
Jules will present a suggested plan before taking action.
Users can monitor progress,
and permission is requested before merging any changes. Jaclyn Konzelmann, the director of product
management at Google Labs, said, we're early in our understanding of the full capabilities of AI
agents for computer use. Jules is only available to a select group of trusted testers at the moment,
but will be rolled out more broadly early next year. A third agent is the web browsing assistant
called Project Mariner, and this gets at one of the most important UX shifts that we're seeing,
where instead of trying to adapt ourselves to what AI and agents can do,
we're just trying to get agents to behave more like us.
Anthropic made a bunch of news earlier this year
when they showed their version of a very nascent agent
that could actually point and click on your screen,
and Mariner is of a similar ilk.
The model can take control of the Chrome browser,
clicking buttons, filling out forms,
and using the web much like a person would.
Google leaders called this a fundamentally new UX paradigm shift
that we're seeing right now.
Quote, we need to figure out what is the right
way for all of this to change the way users interact with the web and the way publishers can create
experiences for users as well as for agents in the future. The demonstration showed the agent
building out an online shopping cart based on a grocery list. The process was painfully slow,
with around five seconds of delay between cursor movements. The agent also got stuck and asked for
assistance multiple times. For now, the agent can't use the checkout by itself, a safety limit
so it doesn't need to handle credit card details. And from a functional standpoint, the agent does
work like Anthropic's computer use mode, taking constant screenshots to determine its next move.
Because of this, Mariner can only use the visible tab in Chrome, so you can't use the computer
for other things while the agent is in control. Google feels very comfortable with this, though.
DeepMind CTO Koray Kavukcuoglu said, because the AI is now taking actions on a user's behalf,
it's important to take this step by step. You as an individual can use websites, and now your
agent can do everything that you do on a website as well. As an added bonus to preview what comes next,
Google said they're testing agents that understand video games.
They said the agents can, quote, reason about the game based solely on the action on the screen
and offer up suggestions for what to do next in real-time conversation.
If you get stuck, the agents can also access Google Search to figure out what you should do next.
Google is testing the agents on games like Clash of Clans and Hay Day.
Today's episode is brought to you by Plumb.
Want to use AI to automate your work but don't know where to start?
Plumb lets you create AI workflows by simply describing what you want.
No coding or API keys required.
Imagine typing out AI, analyze my Zoom meetings and send me your insights in Notion and watching
it come to life before your eyes. Whether you're an operations leader, marketer, or even a non-technical
founder, Plumb gives you the power of AI without the technical hassle. Get instant access to top
models like GPT-4o, Claude Sonnet 3.5, AssemblyAI, and many more. Don't let technology hold you back.
Check out Use Plumb, that's Plumb with a B, for early access to the future of workflow automation.
Today's episode is brought to you by Vanta. Whether you're starting or scaling your company's
security program, demonstrating top-notch security practices and establishing trust is more important
than ever. Vanta automates compliance for ISO 27001, SOC 2, GDPR, and leading AI frameworks like ISO 42001
and the NIST AI Risk Management Framework, saving you time and money while helping you build customer
trust. Plus, you can streamline security reviews by automating questionnaires and demonstrating
your security posture with a customer-facing trust center, all powered by Vanta AI. Over 8,000 global
companies like LangChain, Lila AI, and Factory AI use Vanta to demonstrate AI trust and prove security
in real time. Learn more at vanta.com slash NLW. That's vanta.com slash NLW.
Today's episode is brought to you, as always, by Superintelligent. Have you ever wanted an
AI daily brief but totally focused on how AI relates to your company? Is your company
struggling with AI adoption, either because you're getting stalled figuring out what use cases
will drive value or because the AI transformation that is happening is siloed at individual teams,
departments, and employees, and not able to change the company as a whole?
Superintelligent has developed a new custom internal podcast product that inspires your
teams by sharing the best AI use cases from inside and outside your company.
Think of it as an AI daily brief, but just for your company's AI use cases.
If you'd like to learn more, go to besuper.ai slash partner and fill out the information
request form.
I am really excited about this product, so I will personally get right back to you.
Again, that's besuper.ai slash partner.
Still, we are not done.
Because alongside the agents, Google is also introducing a new reasoning mode for Gemini 1.5
Pro, which they're calling deep research.
This seems to be closer to a long-form research tool than a competitor to OpenAI's
o1 model.
In deep research mode, Gemini responds to a prompt with a multi-step research plan.
Once revised and approved, the model then spends a few minutes searching for and compiling
information. It then repeats the process several times, iterating on the information learned.
Once complete, the model generates a report on the key findings along with full citations of
academic sourcing. Google is calling it an agent as technically it completes this process using
Google Search. David Citron, product director for the Gemini app, said, we built a new agentic system
that uses Google's expertise of finding relevant information on the web to direct Gemini's
browsing and research. Deep research saves you hours of time. Wharton professor Ethan Mollick,
who has gone deep on advanced academic uses of AI,
seems impressed. He wrote,
The new deep research feature from Google feels like one of the most appropriately Googly uses
of AI to date, and it is quite impressive. I've had access for a bit, and it does very good
initial reports on almost any topic. The paywalls around academic sources put some limits.
He did also include, I wish they had stats on the hallucination rate. I suspect better than an
undergraduate, and it is more likely to miss subtle things than to get stuff completely wrong.
He continued, one warning to instructors is that the new Google deep research feature solves most of the
issues with AI-created research assignments. Pretty solidly well-organized and written with accurate
citations, it makes it very easy for students to skip or automate their research work.
Bilawal Sidhu called it essentially Perplexity on steroids. Last couple of announcements,
Google is of course deploying these new model capabilities everywhere and one of the first
uses is an upgrade to Google's AI overviews. The company says that the tool will now be able to
handle, quote, more complex topics as well as multimodal and multi-step searches. They also said it can
answer questions about math and programming. You'll remember that AI overviews were part of the
narrative challenge for Google at the beginning of the year. Initially, they were widely mocked online
due to things like suggesting glue as a pizza topping. Still, Google CEO Sundar Pichai said,
our AI overviews now reach 1 billion people, enabling them to ask entirely new types of questions,
quickly becoming one of our most popular search features ever. We'll continue to bring AI
overviews to more countries and languages over the next year. Lastly, on the hardware side,
Google has unveiled the sixth generation of their AI chip, Trillium. The chip is used for training
and inference, competing with Nvidia GPUs alongside the new Trainium chip from Amazon. They claim
the performance improvements could fundamentally alter the economics of AI training. They say that it
delivers a 4x improvement in training performance compared to its predecessor, as well as a significant
reduction in energy use. As a more tangible metric, Google is claiming a 2.5x improvement
in training performance per dollar. Gemini 2.0 was trained exclusively on a Trillium cluster. And Google
disclosed that they have built a 100,000 chip cluster, which they claim is one of the most powerful
AI supercomputers. In their announcement, Google didn't provide any comparisons to rival chipmakers,
so it's a little hard to know how the new silicon stacks up. However, the chips are now generally
available to Google Cloud users, so it probably won't take long for us to find out. Taking a step back,
Google's brand story across the last couple years of AI has been a really fascinating one. I think
if you had gone a few years back, Google was the default leader, both from a real and an imagined
perspective when it came to generative AI. The launch of ChatGPT and the ascendance of OpenAI
really upset the apple cart. And it wasn't just that. Not only was there now a consumer product out
ahead of Google, but in early 2023, Meta also carved out a totally different space because of their
approach to open source. For most of 2023, Google felt distinctly behind when it came to generative
AI. Indeed, even one year ago when Gemini 1.0 was launched, the broad perception
was that their hand had been forced, that the model really wasn't as far along and wasn't
competitive yet with GPT-4, and wouldn't be until they released the most performant version of it
early in 2024. Basically, Google had to do something, and so they had to announce Gemini
1.0 earlier than they might otherwise have wanted to. Then in the beginning of this year,
while we did get a GPT-4 class model in Gemini, we also got what I was just mentioning,
AI overviews in search that told people to put glue on pizza, and of course, the whole
controversy and dust up around the historically inaccurate image generation, which forced diversity
into situations in history which were very undiverse. Think black Nazis. In other words, it was a
pretty brutal beginning of the year for Google. Slowly but surely, though, that has changed.
Undeniably, one of the big reasons for that is that Google got a breakout AI product hit in
NotebookLM. The addition of the podcast summarization feature, which opened up this totally new set of use cases,
and ways of consuming information never before available,
really got this ship pointed in the right direction
and a ton of narrative juice back in the Google House.
That set the tone, I think, for this announcement,
which was comprehensive, had a lot of great stuff in it,
and was received incredibly positively.
People are excited about these new features,
they're excited about Astra.
They're not dealing with this cynically.
And importantly, from a brand perspective,
it's more of a return to form than anything else.
In other words, people are saying,
Oh, that Google that we know that we would have assumed would be a leader in this space,
they are back.
And that, I think, is exactly where Google wants its brand to be.
The company has an incredible number of advantages when it comes to the AI wars.
They've got a slate of products to integrate AI into and to capture data from
that potentially make their AI products not only very useful, but already plugged into the systems
that people are using today.
And so if they can continue this momentum, they could be poised for an even bigger
2025. That's not to say that there aren't challenges, because as we've been discussing when it
comes to agents, it's sort of like all bets are off, and everything is up for grabs once again.
Still, you've got to think that the folks over at Google are a lot happier heading into 2025
than they were heading into 2024, and I think that they should be. For now, though, that is going
to do it for today's AI Daily Brief. Appreciate you listening or watching as always, and until next time,
peace.
