The AI Daily Brief: Artificial Intelligence News and Analysis - Could Google's Gemini Be A ChatGPT Killer?
Episode Date: August 20, 2023
AI Twitter is buzzing with cryptic posts about how the LLM world is set to change in the next few months. On today's episode, NLW looks at everything we know about Google Gemini.
ABOUT THE AI BREAKDOWN
The AI Breakdown helps you understand the most important news and discussions in AI.
Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Join the community: bit.ly/aibreakdown
Learn more: http://breakdown.network/
Transcript
Today on the AI Breakdown, we're talking about why everyone is getting so hyped for new developments in AI seemingly coming this fall.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to breakdown.network for more information about our YouTube, our Discord, and our newsletter.
Welcome back to the AI Breakdown.
For this weekend episode, I wanted to do something a little bit different and dive into the realm of speculation.
Now, usually, especially on weekdays, we have
so much new news coming in every single day that it really basically takes all the space that
we have just to keep up with the developments. Today, we are taking a rare detour into the realm
of innuendo, rumor, prognostication, and potential. But I think it makes sense to do this now,
given that this week, one of the big themes that we've come back to is whether the AI hype has died
down. Now, if you listen to that episode where I discussed exactly that, one of the things that I
spent some time on is what might bring AI hype back. Not that we necessarily care all that
much about it coming back. The thing that was right at the top of the list is some big new
advancement in technology, something that once again wows our senses of wonder and imagination
and feels like it has fundamentally changed everything all over again. The most common thing I think
that people believe could do that is whatever GPT-5 will be, but given that OpenAI continues to say,
or at least up until recently had continued to say, that they're not even training GPT-5 yet,
it seems unlikely that that's going to be the first catalyst for some new excitement.
And yet, for those paying attention on AI Twitter,
there is an absolute bubbling of barely contained excitement,
vague cryptic messages about things that are coming.
AI entrepreneur Matt Shumer on August 15th wrote,
The LLM landscape is going to look very different in a few weeks.
Robert Scoble retweeted that saying,
that is what I am hearing too, multimodal LLMs.
This fall will be busy.
A couple days later, Brian Roemmele added his voice to the chorus.
By November 2023, he writes,
just about everything we thought was LLMs and GPT will change.
There is a very new technology on the horizon.
Now, Maximum Chroma's response, I think, to Brian, is a little reasonable.
They said, whenever I see people talk like this, I keep a close eye on my wallet.
But for the sake of today's weekend episode, let's not view
this with derision and skepticism, but openness, and try to parse out a best guess at what they
might be referring to. One has to think, given what we just discussed as relates to GPT-5,
that the thing that people are most anticipating right now is Google's Gemini.
The rumor mill around Gemini really started to kick up at the end of June. On June 26, Wired published
a piece called Google DeepMind's CEO Says Its Next Algorithm Will Eclipse ChatGPT. Demis Hassabis says
the company is working on a system called Gemini that will tap
techniques that helped AlphaGo defeat a Go champion in 2016.
Now, this was one of the first times that Hassabis had talked about Gemini in any sort of detail.
He said,
At a high level, you can think of Gemini as combining some of the strengths of AlphaGo-type systems
with the amazing language capabilities of large models.
We also have some new innovations that are going to be pretty interesting.
Now, by way of background for people who aren't as familiar with AlphaGo, Wired writes,
AlphaGo was based on a technique DeepMind has pioneered called reinforcement learning,
in which software learns to take on tough problems that require choosing what actions to take,
like in Go or video games, by making repeated attempts and receiving feedback on its performance.
It also uses a method called Tree Search to explore and remember possible moves on the board.
Now from there it goes widely into the realm of speculation.
In a section titled New Thinking, Wired writes,
training a large language model like OpenAI's GPT-4 involves feeding vast amounts of curated text from books,
web pages, and other sources into machine learning software known as a transformer.
It uses the patterns in that training data to become proficient at predicting the letters and words that should follow a piece of text,
a simple mechanism that proves strikingly powerful at answering questions and generating text or code.
An important additional step in making ChatGPT and similarly capable language models is using reinforcement learning based on feedback from humans on an AI model's answers to finesse its performance.
DeepMind's deep experience with reinforcement learning could allow its researchers to give Gemini novel capabilities.
Hassabis and his team might also try to enhance LLM technology with ideas from other areas of AI.
So not a ton of information, just this prognostication that Gemini should be even better than
ChatGPT, which is certainly enough to be going with when it comes to the industry getting excited.
Also worth noting is that another thing that happened around this time or just a little bit earlier
was that Google combined its two major AI labs into one, forcing a perhaps uncomfortable cultural challenge
in order to have all of their resources focused firmly in the same direction.
But that brings us to the news from this week.
The Information wrote a piece called How Google Is Planning to Beat OpenAI.
And effectively, it's a story of how these two teams within Google
have been forced together to try to produce something that helps Google not only
catch up with its competitor in OpenAI, but greatly exceed them.
Jon Victor writes,
In April, Alphabet CEO Sundar Pichai took an unusual step,
merging two large artificial intelligence teams with distinct cultures and code
to catch up to and surpass OpenAI and other rivals.
Now the test of that effort is coming, with hundreds of people scrambling
to release a group of large machine learning models, one of the highest stakes products the company
has ever built. The models, collectively known as Gemini, are expected to give Google the ability
to build products its competitors can't, according to a person involved with Gemini's development.
Now, the TLDR on what these sources are saying that Gemini will do that GPT-4 can't is, in short,
multimodality. Gemini is supposed to combine the text capabilities of LLMs like GPT-4 with AI image
generators such as Midjourney and Stable Diffusion. As The Information points out, this was the first time
that those image capabilities had been reported. They also wrote, quote, Google employees have also
discussed using Gemini to offer features like analyzing charts or creating graphics with text descriptions
and controlling software using text or voice commands. Part of that sounds a little bit like the code
interpreter features, which some folks have considered such a huge advance for GPT-4 that it actually
represents, quietly, GPT-4.5. Now, of course, once released, Gemini will power everything in the Google suite
of applications. It'll be the AI not only in the Bard chatbot, but the AI in Google Docs and
slides. Now, one of the things that potentially gives Google an advantage is the unique data that it
has access to. For example, The Information had previously reported that Google had been training
Gemini on a huge corpus of YouTube video transcripts, but given these new multimodal capacities,
there's no reason theoretically that they couldn't have actually trained it on the video and audio
itself. This would, as The Information put it, give them multimodal capabilities many researchers
believe are the next frontier in AI. Models trained on YouTube videos could, for example,
help a mechanic diagnose a problem with a car repair based on a video. They also might generate
software code based on someone's sketch of a website or app they want to create, a capability
OpenAI has previewed but hasn't launched. Other byproducts of training on YouTube video
could be features akin to those that people have been excited about in startups like
RunwayML. For example, text-to-video software that could generate videos just based on descriptions.
Now, there aren't necessarily a ton more details about what Gemini can do, but there are more
details that The Information had about how it's being built.
For example, they had effectively an org chart for who was in charge of the project and how
these two teams had combined.
They even had information about which software from the different units that were combined
were used in different parts of the training process.
The Information also confirmed that Google co-founder Sergey Brin has been intimately
involved in the project.
As they put it, he has been, quote, running his own evaluations of the models and helps
with aspects of training them. For example, quote, after the team discovered Gemini had been
trained on potentially offensive content, which researchers had meant to exclude, Brin weighed in
on technical decisions to retrain the models. The last part of the story is what other Silicon
Valley Notables think, particularly venture capitalists, and by and large, it seems like the
sentiment is, finally Google has some fireback. Finally, they are going to actually compete.
Now, the only other notable thing is that The Information also reports that some of the compromises
these teams have had to make in order to figure out how to work together are
not necessarily optimal for developing great software. But of course, ultimately, how much of a cost
or a challenge those issues actually were will really be borne out or not in the software they release.
I think in many ways the real question is just how much it feels to consumers, like multimodality,
which is so obviously the next frontier when it comes to these LLMs and these generative AI tools
in general, feels like a natural evolution or something that reignites a new hype cycle
because of that old idea that sufficiently advanced technology is indistinguishable from magic.
I don't think it's bad either way. To the extent that we get another hype cycle pop,
it'll create a context for a lot more people to get their eyes on these tools and start experimenting,
which I believe is probably worth the cost that it comes with of things like over-hypey influencers
and the other natural negative externalities when it comes to a buzzy area.
If on the other hand it doesn't get that hypey pop,
my guess is that it's not going to be because the technology isn't incredibly impressive or useful.
My guess instead is that that will simply represent that we have really well and truly moved on to a new phase,
a new phase in which the prioritization is not on how good things sound, but what they can actually do.
I also don't think it would be the worst thing in the world if the release of a truly multimodal LLM in the form of Google Gemini
just helps people who are already bought into this space integrate it even more deeply into the work that they're already doing.
Either way, I'm excited that people are excited for the coming fall.
It has been ultimately a fairly quiet summer, but in a field this dynamic, I don't think quiet lasts for long.
Anyways, friends, that is going to do it for today's AI Breakdown.
I hope you are having a wonderful weekend.
If you enjoyed this, do me such a huge favor.
Click the like button, click the subscribe button.
And if you're listening as a podcast, go consider leaving a review or a five-star rating.
Until next time, friends, peace.
