The AI Daily Brief: Artificial Intelligence News and Analysis - Could Google's Gemini Be A ChatGPT Killer?

Episode Date: August 20, 2023

AI Twitter is buzzing with cryptic posts about how the LLM world is set to change in the next few months. On today's episode, NLW looks at everything we know about Google Gemini. ABOUT THE AI BREAKDOWN: The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/

Transcript
Starting point is 00:00:00 Today on The AI Breakdown, we're talking about why everyone is getting so hyped for new developments in AI seemingly coming this fall. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to breakdown.network for more information about our YouTube, our Discord, and our newsletter. Welcome back to The AI Breakdown. For this weekend episode, I wanted to do something a little bit different and dive into the realm of speculation. Now, usually, especially on weekdays, we have so much new news coming in every single day that it really basically takes all the space that we have just to keep up with the developments. Today, we are taking a rare detour into the realm
Starting point is 00:00:47 of innuendo, rumor, prognostication, and potential. But I think it makes sense to do this now, given that this week, one of the big themes that we've come back to is whether the AI hype has died down. Now, if you listened to that episode where I discussed exactly that, one of the things that I spent some time on is what might bring AI hype back. Not that we necessarily care all that much about it coming back. The thing that was right at the top of the list is some big new advancement in technology, something that once again wows our senses of wonder and imagination and feels like it has fundamentally changed everything all over again. The most common thing I think that people believe could do that is whatever GPT-5 will be, but given that OpenAI continues to say,
Starting point is 00:01:29 or at least up until recently had continued to say that they're not even training GPT-5 yet, it seems unlikely that that's going to be the first catalyst for some new excitement. And yet, for those paying attention on AI Twitter, there is an absolute bubbling of barely contained excitement, vague cryptic messages about things that are coming. AI entrepreneur Matt Shumer on August 15th wrote, "The LLM landscape is going to look very different in a few weeks." Robert Scoble retweeted that saying,
Starting point is 00:01:59 "That is what I am hearing too, multimodal LLMs. This fall will be busy." A couple days later, Brian Roemmele added his voice to the chorus. "By November 2023," he writes, "just about everything we thought was LLMs and GPT will change. There is a very new technology on the horizon." Now, Maximum Chroma's response, I think, to Brian, is a little reasonable. They said, "Whenever I see people talk like this, I keep a close eye on my wallet."
Starting point is 00:02:25 But for the sake of today's weekend episode, let's not view this with derision and skepticism, but openness and try to parse out a best guess at what they might be referring to. One has to think, given what we just discussed as relates to GPT-5, that the thing that people are most anticipating right now is Google's Gemini. The rumor mill around Gemini really started to kick up at the end of June. On June 26, Wired published a piece called Google DeepMind's CEO Says Its Next Algorithm Will Eclipse ChatGPT. Demis Hassabis says the company is working on a system called Gemini that will tap techniques that helped AlphaGo defeat a Go champion in 2016.
Starting point is 00:03:02 Now, this was one of the first times that Hassabis had talked about Gemini in any sort of detail. He said, "At a high level, you can think of Gemini as combining some of the strengths of AlphaGo-type systems with the amazing language capabilities of large models. We also have some new innovations that are going to be pretty interesting." Now, by way of background for people who aren't as familiar with AlphaGo, Wired writes, AlphaGo was based on a technique DeepMind has pioneered called reinforcement learning, in which software learns to take on tough problems that require choosing what actions to take,
Starting point is 00:03:32 like in Go or video games, by making repeated attempts and receiving feedback on its performance. It also uses a method called tree search to explore and remember possible moves on the board. Now from there it goes widely into the realm of speculation. In a section titled New Thinking, Wired writes, training a large language model like OpenAI's GPT-4 involves feeding vast amounts of curated text from books, web pages, and other sources into machine learning software known as a transformer. It uses the patterns in that training data to become proficient at predicting the letters and words that should follow a piece of text, a simple mechanism that proves strikingly powerful at answering questions and generating text or code.
Starting point is 00:04:06 An important additional step in making ChatGPT and similarly capable language models is using reinforcement learning based on feedback from humans on an AI model's answer to finesse its performance. DeepMind's deep experience with reinforcement learning could allow its researchers to give Gemini novel capabilities. Hassabis and his team might also try to enhance LLM technology with ideas from other areas of AI. So not a ton of information, just this prognostication that Gemini should be even better than ChatGPT, which is certainly enough to be going with when it comes to the industry getting excited. Also worth noting is that another thing that happened around this time or just a little bit earlier was that Google combined its two major AI labs into one, forcing a perhaps uncomfortable cultural challenge in order to have all of their resources focused firmly in the same direction.
Starting point is 00:04:50 But that brings us to the news from this week. The Information wrote a piece called How Google Is Planning to Beat OpenAI. And effectively, it's a story of how these two teams within Google have been forced together to try to produce something that helps Google not only catch up with its competitor in OpenAI, but greatly exceed them. Jon Victor writes, In April, Alphabet CEO Sundar Pichai took an unusual step, merging two large artificial intelligence teams with distinct cultures and code
Starting point is 00:05:16 to catch up to and surpass OpenAI and other rivals. Now the test of that effort is coming, with hundreds of people scrambling to release a group of large machine learning models, one of the highest-stakes products the company has ever built. The models, collectively known as Gemini, are expected to give Google the ability to build products its competitors can't, according to a person involved with Gemini's development. Now, the TL;DR on what these sources are saying that Gemini will do that GPT-4 can't is, in short, multimodality. Gemini is supposed to combine the text capabilities of LLMs like GPT-4 with AI image generators such as Midjourney and Stable Diffusion. As The Information points out, this was the first time
Starting point is 00:05:53 that those image capabilities had been reported. They also wrote, quote, Google employees have also discussed using Gemini to offer features like analyzing charts or creating graphics with text descriptions and controlling software using text or voice commands. Part of that sounds a little bit like the Code Interpreter features, which some folks have considered such a huge advance for GPT-4 that it actually represents, quietly, GPT-4.5. Now, of course, once released, Gemini will power everything in the Google suite of applications. It'll be the AI not only in the Bard chatbot, but the AI in Google Docs and Slides. Now, one of the things that potentially gives Google an advantage is the unique data that it has access to. For example, The Information had previously reported that Google had been training
Starting point is 00:06:34 Gemini on a huge corpus of YouTube video transcripts, but given these new multimodal capacities, there's no reason theoretically that they couldn't have actually trained it on the video and audio itself. This would, as The Information put it, give them multimodal capabilities many researchers believe are the next frontier in AI. Models trained on YouTube videos could, for example, help a mechanic diagnose a problem with a car repair based on a video. They also might generate software code based on someone's sketch of a website or app they want to create, a capability OpenAI has previewed but hasn't launched. Other byproducts of training on YouTube video could be features akin to those that people have been excited about in startups like Runway
Starting point is 00:07:09 ML. For example, text-to-video software that could generate videos just based on descriptions. Now, there aren't necessarily a ton more details about what Gemini can do, but there are more details that The Information had about how it's being built. For example, they had effectively an org chart for who was in charge of the project and how these two teams had combined. They even had information about which software from the different units that were combined was used in different parts of the training process. The Information also confirmed that Google co-founder Sergey Brin has been intimately
Starting point is 00:07:37 involved in the project. As they put it, he has been, quote, running his own evaluations of the models and helps with aspects of training them. For example, quote, after the team discovered Gemini had been trained on potentially offensive content, which researchers had meant to exclude, Brin weighed in on technical decisions to retrain the models. The last part of the story is what other Silicon Valley notables think, particularly venture capitalists, and by and large, it seems like the sentiment is, finally Google has some fireback. Finally, they are going to actually compete. Now, the only other notable thing is that The Information also reports that some of the compromises
Starting point is 00:08:09 these teams have had to make in order to figure out how to work together are not necessarily optimal for developing great software. But of course, ultimately, how much of a cost or a challenge those issues actually were will really be borne out or not in the software they release. I think in many ways the real question is just how much it feels to consumers, like multimodality, which is so obviously the next frontier when it comes to these LLMs and these generative AI tools in general, feels like a natural evolution or something that reignites a new hype cycle because of that old idea that sufficiently advanced technology is indistinguishable from magic. I don't think it's bad either way. To the extent that we get another hype cycle pop,
Starting point is 00:08:47 it'll create a context for a lot more people to get their eyes on these tools and start experimenting, which I believe is probably worth the cost that it comes with of things like over-hypey influencers and the other natural negative externalities when it comes to a buzzy area. If on the other hand it doesn't get that hypey pop, my guess is that it's not going to be because the technology isn't incredibly impressive or useful. My guess instead is that that will simply represent that we have really well and truly moved on to a new phase, a new phase in which the prioritization is not on how good things sound, but what they can actually do. I also don't think it would be the worst thing in the world if the release of a truly multimodal LLM in the form of Google Gemini
Starting point is 00:09:25 just helps people who are already bought into this space even more deeply integrated into the work that they're already doing. Either way, I'm excited that people are excited for the coming fall. It has been ultimately a fairly quiet summer, but in a field this dynamic, I don't think quiet lasts for long. Anyways, friends, that is going to do it for today's AI breakdown. I hope you are having a wonderful weekend. If you enjoyed this, do me such a huge favor. Click the like button, click the subscribe button. And if you're listening as a podcast, go consider leaving a review or a five-star rating.
Starting point is 00:09:55 Until next time, friends, peace.
