The AI Daily Brief: Artificial Intelligence News and Analysis - Gemini Delayed: Is Google In the Innovator's Dilemma?
Episode Date: November 17, 2023The Information reports that much-anticipated Gemini has been officially delayed. NLW explores interpretations for why Google has struggled to keep pace with OpenAI and other AI competitors. Today's ...Sponsors: Listen to the chart-topping podcast 'web3 with a16z crypto' wherever you get your podcasts or here: https://link.chtbl.com/xz5kFVEK?sid=AIBreakdown Notion - Notion AI. Knowledge, answers, ideas. One click away. - https://notion.com/aibreakdown ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/
Transcript
Discussion (0)
Today on the AI breakdown, Google Gemini is officially delayed and Google appears to be stuck in the innovator's dilemma.
Before that of the brief, Discord has shut its chatbot down, but does that matter?
The AI breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to Breakdown.network for more information about our Discord, our YouTube channel, and our newsletter.
Welcome back to the AI breakdown brief, all the AI headline news you need in around five minutes.
Well, today we have, as we pretty much have every day, a bunch of stories.
of new AI features and tools being released and blowing people's minds and changing the way that we
work. But we also have this story, which we're kicking off with, of Discord, shutting down its
experimental AI chatbot. Now, I can see this making a lot of news as a potential tide turner
or shift representing a move away from AI hype, but let's talk about it a little bit and find out
how we should actually interpret it. So, first of all, Clyde is an experimental AI chatbot. It has
been around for a few months. It has not been available for everyone. And it was built
on top of OpenAI's models. It basically allowed Discord users to use some version of chat
GPT directly from inside the app. And that Discord had planned to make it a fundamental part of the app,
and as the Verge writes, it's not clear why Clyde is suddenly shutting down. Quote, it's possible
the chatbot may return as a paid Nitro-only feature in the future, or perhaps Discord
has learned enough from its testing period and decided an AI chatbot doesn't need to be
baked into its service. And here's what I think. I do not think that this represents some tide-turning,
move away from generative AI or past the peak hype cycle or anything like that, I think that the
specific format and user experience of a chatbot just being dropped into every other experience
has always been an obvious thing to try, but inevitably an experience where in most cases
it's not going to be a core part of the experience. There are absolutely some types of interactions
and existing apps and platforms where a chatbot is a really valuable addition. What's more,
there's an argument that over time, natural language interactions that come in the form of chatbots
may be the default way that we interact with computers. However, we are not there yet. Our default way of
interacting with applications and computers is the way that we've been trained to do so forever,
pointing, clicking, typing, etc. Just slapping a chatbot in things just because it might work,
doesn't mean it's going to work. And I think that what we have here is Discord being a smart enough
company to say, it didn't work. In other words, I wouldn't look into it any farther than that.
But what about new tools that have come out?
Well, moving over to Meta, they are teasing a new AI-powered editing suite that is coming to Facebook
and Instagram.
And this, I think, could be really cool.
Speaking of shifting the default interface for how we interact with computers to natural language,
this is kind of an example of that.
The new tool uses Meta's emu model, and in one of the picture examples they give,
the user prompts the tool to turn the dog in the picture into a panda, which it dutifully does.
So some of this isn't totally new.
Adobe, Google, and Canva have all already put out tools by which people
can use natural language to edit photos. So, for example, removing or replacing objects,
taking people out of photos, basically doing all the things that you might have previously
been able to do with Photoshop, or at least a combination of Photoshop and in-painting, but doing
so with natural language. This is not an insignificant shift. When this sort of tool first was
announced by Adobe earlier in the year, people absolutely freaked out because they saw how
instantly useful it actually was. What this represents is that this is now absolute table stakes,
and that all platforms that interact with images in any meaningful way are going to have to have
something like this built in because people are just going to start to expect it.
And of course, that gets back to the idea that these natural language interfaces will become more
default because if every service that you use to interact with images always allows you to
use natural language to edit those images, well, guess what?
You're going to get really used to editing images with natural language.
So that was Emu Edit, that was one of two tools that were announced by Mark Zuckerberg yesterday.
Now, to the extent that they are pitching something different from that existing suite of tools,
It's that they believe that the emu model is sophisticated enough that it can identify what you're asking it to change or modify
without having to actually manually select it.
So, for example, when we were talking before about turning a dog into a panda,
you wouldn't have to select the dog in the photo for it to get the gist.
Instead, you can just say, turn the dog into a panda,
and emu should be sophisticated enough to actually identify what the dog in the photo is.
Which is, yes, a shift in scale, not a shift in kind,
but something that brings the natural language interface for editing much closer to what the average user would expect.
Now, the second tool that was announced was called Imu Video.
This is a video generator from text prompts and reference images, i.e. something along the lines
of Pika Labs or runway, and so far the results are a little bit farther behind those tools.
As the Verge writes, the results seem far from realistic.
However, as they point out, they also look like a step up from the rough animations
that Metas make-a-video system was producing last year.
Overall, I think this is a great example of what we've talked about a lot on this show,
which is the fact that we are moving into an integration phase,
where instead of just releasing an endless stream of new tools,
the capacities that AI has are being integrated into existing workflows where users actually spend their time.
In other words, maybe there's nothing that different about what meta's emu edit can do,
as compared to, for example, Adobe.
But the fact that it's going to be natively integrated in Facebook and Instagram where people are already sharing their photos
means that it's likely to get a heck of a lot more usage than the scenario where people would have had to take their photos out,
put them into Photoshop or another Adobe or Google tool, and then bring them back to Instagram where they could then post them.
Next up is a tool that is not from one of the big companies, but is a prototype
that has been absolutely lighting up Twitter slash X.
It's a collaborative whiteboard app called TL Draw
that's already a very cool experience.
When you go to TL Draw, you're literally presented
with the whiteboard that you can then draw on
Hi, AI, breakers.
And then you can do stuff with it.
You can put in shapes, you can put in arrows.
Basically, you can design actual mockups
and basic user experiences.
But what has people really excited
is the new Make It Real feature.
Make It Real is a little button
where when you press Make It Real,
The system will automatically take what you have mocked up using its very simple tools and turn it into real, quote-unquote, looking UI.
Now, this is being powered by the OpenAI API, so to get access to make it real, you have to input your own API key.
And of course, this takes advantage of the new GPT4 vision, which is a version of GPT4 that actually can interpret visual images and use those pictures as prompts.
Simon Willison explained on Twitter how this works.
He says, found the system prompt that drives this thing here.
It works by generating a base 64 encoded PNG of the draw components, then power.
passing that to GPT4 Vision with that system prompt and instructions to turn this into a single
HTML file using Tailwind. However, one of the things that makes this even more interesting
is that you can actually use annotations in the wireframes that you're creating to let
GPT4V understand what it's supposed to do. So it's more sophisticated even than just copying an image
and turning it into HTML, which would have been cool enough on its own right. Overall, for as cool
as this is, it's not so much that it's groundbreaking technology, in the sense that we already
knew about GPT4 Vision. It's just a great example of how these models are going to be increasing
implemented into specific use cases and workflows around those use cases that make them dead
simple and totally transformational to use. As ours, technical writes, it feels like we've been
given a preview of a possible future mode of software development or interface design at the very
least. We're creating a working prototype as simple as making a visual mock-up and having an
AI model do the rest. Now, a couple other examples of how AI is being integrated into existing
professions. Axios writes, AI helps defense attorneys sift through police body cam videos. So basically,
this is about a new company called Justice Text, which was founded by two University of Chicago
undergrads, and what it does is it takes advantage of the fact that police departments around the
country are now in many cases requiring officers to wear body cams that capture their interactions
throughout the day. That can be an incredible trove of information for defense attorneys and
public defenders and basically the people who have to defend suspects and who could use that
information to defend suspects or the police officers themselves, and can use that information
to figure out what actually went down rather than just having things deteriorate to what he said
she said. However, sifting through thousands and thousands of hours of video is not an easy task.
So, Justice Text is basically an AI model that is optimized for helping attorneys review transcripts
of body cam footage as well as interrogation videos and phone calls that happen from jail.
By converting audio and video into text, lawyers can look for keywords such as hands up or get
down and find important moments that happen during police encounters. To me, it's a great
example of a very simple, but very cool use case of how AI is going to change and improve how
things happen, even in a boring administrative context. Lastly, a nice little story to close out a week
where there has been a lot of discussion of the geopolitics of AI. Of course, with China's President
Xi visiting President Biden in San Francisco, Google CEO Sondar Pichai has compared AI to
climate change at the APEC CEO summit on Thursday. The comparison point is basically that it affects
everyone, and that, quote, AI advances will get out to all the countries and so it's naturally
the kind of technology that I don't think there's any unilateral safety to be had. In other words,
if it goes wrong in one country, it's going to impact other countries. Thus, he said, in some ways,
it's like climate change in the planet. We all share a planet, and I think that's true for AI. You have to
start building the frameworks globally. And so perhaps if you agree with that take, maybe it was the
right idea for Rishi Sunak in the UK AI Safety Summit to invite the Chinese after all. In any case,
Google is in fact the subject of our main episode. So for that, stick around because it's coming up next.
And now a word from today's sponsor. Are you interested in how two-tested in how two-trial
Top of mind trends, AI and crypto can work together?
If so, I have the perfect podcast recommendation for you.
Web 3 with A16Z Crypto, the chart-topping show brought to you by venture firm Andresen Horowitz.
Web3 with A16Z Crypto is your definitive resource for the future of the internet.
Whether you're already building in these spaces or simply curious about what's next.
If you need a place to start, they recently released an excellent episode with Stanford
Cryptography Professor Dan Bonay and former Google Xer Aliya in conversation with host
Sonal Choxi about the intersection of AI and Crypto.
From fighting deepfakes and proving humanity to large language models like ChatGBT, BT, they cover
it all.
I highly recommend checking it out, especially if you'd like to learn more about how
AI and crypto will impact our everyday lives.
Beyond Crypto and AI, this show is for creators seeking more ways to truly own their work,
for business leaders trying to prepare for the future today, and for innovators exploring
trending tech topics.
So go ahead, listen to Web3 with A16Z Crypto wherever you get your podcasts.
And now a quick word from today's sponsor. I am a huge Notion user. We're talking multiple accounts
for multiple projects. I use it for everything from applicant tracking to note taking to project
management, to sharing public documents, to frantically capturing ideas I have while out hiking or
just driving around. Given that and given the topic of the AI breakdown, I was excited to learn that
they've launched a new AI tool called Q&A. It's like a personal assistant that responds in seconds with
exactly what you need. Notion AI can give you instant answers to your questions using information
from across your wiki, projects, docs, and meeting notes. For someone like me who makes dozens
of notes per day around a huge array of topics, having a built-in AI tool to help recall that is
incredibly useful. Now, beyond that use case, think about this. Have an urgent question you normally
turn to a coworker to answer? Just ask Q&A instead. It'll search through thousands of documents in
seconds and answer your question in clear language no matter how large or complex your workspace is.
Plus, you can trust your data is secure because Notion AI is designed to protect your information.
No AI models are trained with your information, the data is encrypted, and answers will never
use information from pages you don't have access to. With Notion AI, it's even easier to do your most
meaningful work. Try Notion AI for free when you go to notion.com slash AI breakdown.
That's all lowercase letters, notion.com slash AI breakdown to try the powerful, easy to use
notion AI today. And when you use our link, you're supporting the show.
One more time, that's Notion.com slash AI breakdown.
On Monday, November 6th, OpenAI held what was unarguably one of the most exciting days in
AI development this fall, which was their OpenAI Dev Day.
On that day, we got new GPT4 Turbo with a 128K context window.
We got the assistance API.
We got custom GPs.
We had all of these things showing just how quickly OpenAI is moving.
Then just earlier this week, we heard full.
from Microsoft, and they had made their latest AI announcements. Meta is constantly in the news
with some feature or another, and Amazon is rumored to be working on something as well,
and the question that this has left, summed up by playground founders who hail,
tweeting on that day, the day of the OpenAI Dev Day, where is Google? Well, where Google is,
is officially delayed with the release of their Gemini model. So what we were doing today is we are
talking all about what's going on, why the delays are happening, and what we know right now.
First of all, let's talk about what this model is. Gemini is not Bard. BARD is, of course,
Google's currently available model, and to its credit, Bard continues to get updates.
We talked earlier this week about the fact that BARD now has improved mathematical capabilities,
and has been officially released to more age groups. Specifically, it has been made available
for younger people after a bunch of testing and working with experts to make sure it is more safe.
Bard is also starting to get code interpreter or advanced data analysis type of data
visualization capacities, which should make it even more useful as well.
But what people have been really waiting for is not barred, but a more advanced model,
which many have assumed would be the first to actually challenge the supremacy of GPT4 from a performance standpoint.
And indeed, the last reporting that we had suggested that Gemini was right around the corner.
On September 14th, the information reported that Google was nearing the release of its Gemini AI model
based on the fact that it had started to give access to companies outside of Google versions of the model.
According to three people who had direct knowledge of the matter,
giving outside developers access to Gemini suggested that Google was getting close to officially launching the thing.
Now, there are a bunch of things that people have been looking forward to when it comes to Gemini.
First of all, it was supposed to be natively multimodal.
And second, as the information writes,
Gemini has an advantage over GPD4 in at least one respect said a person who has tested it.
The model leverages reams of Google's proprietary data from its consumer products
in addition to public information from the web.
As a result, the model should be especially accurate when it comes to understanding users' intentions with particular queries,
and it appears to generate fewer incorrect answers known as hallucinations.
The information had also previously reported that Gemini had greatly improved
co-generating capabilities, which is, of course, a major initial use case.
And on top of all of that, there's just a speculation around how much power had actually
gone into this model.
Semi-analysts wrote a post that very clearly got under Sam Altman's skin because he seemed
to comment on it on Twitter later.
On August 27th, they published a piece called Google Gemini Eats the World.
Gemini smashes GpT4 by 5X, the GPU pours.
Now, semi-analysts is a blog that follows the semiconductor and chip industry, and what they wrote in this piece was that the sleeping giant Google has woken up, and they are iterating on a pace that will smash GPT4 total pre-training flops by 5x before the end of the year. The path is clear to 20x by the end of next year given their current infrastructure buildout. Now, they noted whether Google has the stomach to put out these models without publicly neutering their creativity or their existing business model is a different discussion. Now, this blog post was not claiming that just having more computing power necessarily meant making a better model.
They were just identifying that Google was, as they put it, the most compute-rich firm in the world.
So, the point that I'm trying to convey here is that there has been a lot of hype around Gemini.
However, every time things get pushed back, the bar that it needs to clear gets a little bit higher.
As Jim Fan from Nvidia wrote, expectations for Google Gemini is now ridiculously high.
Gemini has to check off at least one of the following, 120% IQ of textual GPT4, or 100% of GPT4 but at half the cost,
2x speed of turbo or 100% of visual GPT4 or natively supports long videos and ship the API in
Q1 of 2024.
Now, Jim suggests that they had to ship the API in Q1 of 2024 to meet expectations, but I think
we are already well past the point where everything feels delayed.
And indeed, that is sort of the thrust of the information article which was published
just yesterday, Google delays release of Gemini AI that aims to compete with OpenAI.
The piece begins, Google's company-defining effort to catch up to check up to check.
AppGPT creator OpenAI is turning out to be harder than expected.
And apparently, it wasn't just reporting from the information, but Google representatives had
actually told their business partners and cloud customers that they would have access to
the new Gemini model around November. Well, here we are in November, and the company has recently
told them to not expect it until the first quarter of next year. There are a bunch of reasons
this is problematic. One of them, which isn't just related to how cool people think they are in
the AI space, is their actual bottom line. In recent earnings reports, while Microsoft's Azure
cloud business, outperformed analyst estimates, Google had their slowest quarter of growth since
something like 2019. In other words, having access to things like OpenAI's models is making a
meaningful difference in how Google is competing on that core cloud business. Now, what's more,
the information notes that the delays around Google having a credible competitor to chat GPT,
and more specifically the GPT4 API, means that not only are they losing out on that consumer
brand opportunity, but they're also losing the affinity of developers. The more that people start building
on top of GPT4 and GPT3.5 and OpenAI's APIs in general, the harder it's going to get for Google
to win developers over to build this next generation set of applications on their tools instead.
There is a real platform lock-in that can happen, which can be extremely challenging to try
to dislodge. There's also the fact that success when it comes to use of artificial intelligence
tools is somewhat self-reinforcing, in that the more that OpenAI's chat GPT gets used,
the better data open AI has to improve chat GPT in the future. Now, spokespeople for
the company have sort of tried to delay and deflect. Spokesperson Catherine Watson says we're not
commenting on rumors or speculation, and Google CEO Sundar Pichai said at a public event that they were,
quote, focused on getting Gemini one point out as soon as possible, make sure it's competitive
state of the art and will build from there on. As recently as an investor call on October 24th,
Pichai said that they are, quote, laying the foundation of what I think of as a next generation of
models will be launching all throughout 2024. So what are the possibilities here? Well, one of them is, of course,
something similar to what caused Amazon's shifted plans last year after ChatGPT came out.
Remember, Amazon was planning on releasing a model called Bedrock at their annual event last November,
but at the last minute decided to delay.
Lucky they did because in the middle of that event, ChatchipetT was announced,
and it just blew what was then Amazon Bedrock out of the water.
Of course, that caused a switch in strategy, and Bedrock instead became the Enterprise Sandbox,
through which Amazon helps its enterprise and business customers customize their own models
and select from a variety of different models,
designed for their own specific purposes,
but perhaps Google Gemini is going through something similar.
In other words, as painful as it is to delay,
it is still infinitely better to delay
rather than put something out that isn't actually as good,
or frankly, isn't meaningfully better than Open AI's models.
Now, the other thing is that there are a lot of different pieces of software
that Gemini has to actually power.
The information writes,
Google has high hopes for Gemini that go beyond boosting enterprise software sales.
The company wants it to power new tools for creators on YouTube,
such as the ability to generate custom backgrounds for videos,
and improve the capabilities of BARD as well as Google Assistant,
Google Siri-like voice assistant for phones and other devices.
Apparently, according to people with knowledge of the project,
Google has developed several different versions of Gemini,
including some smaller versions of the model that have been tested by outside developers,
but that primary main biggest version of Gemini remains incomplete.
Kind of reiterating what I just said,
the information writes,
A key challenge for the Gemini team is making sure the primary model
is as good as or better than OpenAI's most advanced LLL.
GPD4. It's not clear whether Google has met that standard, said a person who's been involved in the
effort. Now, other speculations outside of just raw performance relative to GPT4 is that some
specific use cases for Gemini involve different types of capacities that just might not be ready
yet. For example, they talk about advertising, which is, of course, the vast, vast majority
of Google's revenue. Here's how Gemini describes the challenge. Gemini also has a longer memory
of a user's interactions compared to earlier Google LLM, such as Palm 2, which currently powers barred
and generative AI results in Google search.
The longer memory could allow advertisers
to compare the performance of campaigns over time.
For instance, an advertiser could use the model
to create new variants of the best performing ad copy
over the last month. Now, of course, that's not a trivial
thing to do. Increasing memory
and contextual understanding is a big
challenge. Still, it seems like part of
the biggest reason for the delays are more
on the organizational side than anything else.
Basically, two teams that were separate.
Google Brain and Google DeepMind
had to come together to work on this model.
That merging is a real challenge.
even if everyone is totally aligned and totally in sync, which is very hard to achieve on its own
right, bringing two totally separate teams together to actually work productively,
especially when they're both bringing their fits and starts and things that they've
experimented with and things that have worked and haven't and biases and ideas and basically their
past track record. It's no mean feat. Now also apparently Google co-founder, Sergey Brin has been
involved in the conversations, not as a formal decision maker, but as offering criticism and
feedback on Gemini. And as much as some people are reporting that, like this shows how serious
Google is about this effort and how big a deal it is, which is true. I also don't know that it's the
best strategy to have a founder with a controlling stake, but no actual decision-making authority
other than implied decision-making authority involved in running a team that needs to be highly
efficient and directed. There's nothing to indicate that this is happening, but I can see a
million pitfalls around leadership and who's actually in charge and who has authority, and it just seems
like a big challenge to me, and not necessarily a sign of strength, let's put it that way. To many,
what this looks like is a classic situation of Google being trapped in the innovator's dilemma.
A blog post recently came out what I learned getting acquired by Google, and excerpts from that
seemed to give a picture of just how hard things are to get done inside that company.
Shrean's Benzali, who wrote the piece, writes,
Google is an ever-shifting web of goals and efforts.
Amazing things are possible at Google if the right people care about them, a VP that gets it,
a research team with a related charter or compatibility with an org's goals.
Navigating this mess of interest is half of a PM's job.
And then you need the blessing of approvers like privacy, trust and safety, and infracapacity.
It takes dozens of conversations to know if an idea is viable and hundreds more to make it a reality.
And that's the happy path. Amazing things are possible at Google if the right people keep caring about them.
Team goals can change in any given quarter and entire teams can disappear in the face of a reorg.
A phenomenon so common that Googlers can look past the tragedy and see the comedy in it.
Suppose you dodge all those bullets, you might still wake up to find that while you were working on your project,
two distant teams were also working on the same idea, and the time is
come to fight it out because now only one can proceed. The internal users of the losing projects
will find that the API they depended on is now deprecated, but the replacement, well, it's not quite
ready yet. Googleers wanted to ship great work, but often couldn't. While there were undoubtedly
people who came in for the food, worked three hours a day and enjoyed their early retirements,
all the people I met were earnest hardworking and wanted to do great work. Would beat them
down were the gauntlet of reviews, the frequent reorgs, the institutional scar tissue from past
failures, and the complexity of doing even simple things on the world stage. Startups can afford to
ignore many concerns, Googlers rarely can. Jim Fan again extends this thinking into the world of
AI and writes, this sheds light on why OpenAI spearheaded the LLM revolution, even though most
of the foundation tech originated at Google. Google is right in the thick of the innovator's
dilemma. I'm sure they know very well the power of foundation models. After all, they invented
transformer, AlphaGo, and Flamingo, the key ingredients of GPs. Yet it is so difficult to justify
diverting resources away from the ongoing profitable products or potentially even cannibalized search to promote
LLMs. Siting the article, Google is an ever-shifting web of goals and efforts. You need all the relevant
higher-ups to agree on the agenda, actively fight for enormous resources, and push back against
all the other parties that want the minimal disruption. Too many stars need to align. That is a very
tall order. That being said, I don't think it's fair to blame anyone. The institutional bureaucracy
is a natural emergent property of companies at this scale. 10 years from now, open AI and
Anthropic may suffer from the same if they grow to that order of magnitude. I think Chibb is right,
But I also think it doesn't matter.
Ultimately, this battle is now, not 10 years from now.
And right now, Google is losing.
However, counting them out too early seems a little premature to me.
And so for now, we're just going to have to wait and see what actually comes out with Gemini.
That's going to do it for today's AI breakdown.
I appreciate you listening or watching as always.
Until next time, peace.
