The AI Daily Brief: Artificial Intelligence News and Analysis - Gemini Delayed: Is Google In the Innovator's Dilemma?

Starting point is 00:00:00 Today on the AI breakdown, Google Gemini is officially delayed and Google appears to be stuck in the innovator's dilemma. Before that of the brief, Discord has shut its chatbot down, but does that matter? The AI breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information about our Discord, our YouTube channel, and our newsletter. Welcome back to the AI breakdown brief, all the AI headline news you need in around five minutes. Well, today we have, as we pretty much have every day, a bunch of stories. of new AI features and tools being released and blowing people's minds and changing the way that we work. But we also have this story, which we're kicking off with, of Discord, shutting down its

Starting point is 00:00:44 experimental AI chatbot. Now, I can see this making a lot of news as a potential tide turner or shift representing a move away from AI hype, but let's talk about it a little bit and find out how we should actually interpret it. So, first of all, Clyde is an experimental AI chatbot. It has been around for a few months. It has not been available for everyone. And it was built on top of OpenAI's models. It basically allowed Discord users to use some version of chat GPT directly from inside the app. And that Discord had planned to make it a fundamental part of the app, and as the Verge writes, it's not clear why Clyde is suddenly shutting down. Quote, it's possible the chatbot may return as a paid Nitro-only feature in the future, or perhaps Discord

Starting point is 00:01:22 has learned enough from its testing period and decided an AI chatbot doesn't need to be baked into its service. And here's what I think. I do not think that this represents some tide-turning, move away from generative AI or past the peak hype cycle or anything like that, I think that the specific format and user experience of a chatbot just being dropped into every other experience has always been an obvious thing to try, but inevitably an experience where in most cases it's not going to be a core part of the experience. There are absolutely some types of interactions and existing apps and platforms where a chatbot is a really valuable addition. What's more, there's an argument that over time, natural language interactions that come in the form of chatbots

Starting point is 00:02:04 may be the default way that we interact with computers. However, we are not there yet. Our default way of interacting with applications and computers is the way that we've been trained to do so forever, pointing, clicking, typing, etc. Just slapping a chatbot in things just because it might work, doesn't mean it's going to work. And I think that what we have here is Discord being a smart enough company to say, it didn't work. In other words, I wouldn't look into it any farther than that. But what about new tools that have come out? Well, moving over to Meta, they are teasing a new AI-powered editing suite that is coming to Facebook and Instagram.

Starting point is 00:02:35 And this, I think, could be really cool. Speaking of shifting the default interface for how we interact with computers to natural language, this is kind of an example of that. The new tool uses Meta's emu model, and in one of the picture examples they give, the user prompts the tool to turn the dog in the picture into a panda, which it dutifully does. So some of this isn't totally new. Adobe, Google, and Canva have all already put out tools by which people can use natural language to edit photos. So, for example, removing or replacing objects,

Starting point is 00:03:02 taking people out of photos, basically doing all the things that you might have previously been able to do with Photoshop, or at least a combination of Photoshop and in-painting, but doing so with natural language. This is not an insignificant shift. When this sort of tool first was announced by Adobe earlier in the year, people absolutely freaked out because they saw how instantly useful it actually was. What this represents is that this is now absolute table stakes, and that all platforms that interact with images in any meaningful way are going to have to have something like this built in because people are just going to start to expect it. And of course, that gets back to the idea that these natural language interfaces will become more

Starting point is 00:03:35 default because if every service that you use to interact with images always allows you to use natural language to edit those images, well, guess what? You're going to get really used to editing images with natural language. So that was Emu Edit, that was one of two tools that were announced by Mark Zuckerberg yesterday. Now, to the extent that they are pitching something different from that existing suite of tools, It's that they believe that the emu model is sophisticated enough that it can identify what you're asking it to change or modify without having to actually manually select it. So, for example, when we were talking before about turning a dog into a panda,

Starting point is 00:04:04 you wouldn't have to select the dog in the photo for it to get the gist. Instead, you can just say, turn the dog into a panda, and emu should be sophisticated enough to actually identify what the dog in the photo is. Which is, yes, a shift in scale, not a shift in kind, but something that brings the natural language interface for editing much closer to what the average user would expect. Now, the second tool that was announced was called Imu Video. This is a video generator from text prompts and reference images, i.e. something along the lines of Pika Labs or runway, and so far the results are a little bit farther behind those tools.

Starting point is 00:04:34 As the Verge writes, the results seem far from realistic. However, as they point out, they also look like a step up from the rough animations that Metas make-a-video system was producing last year. Overall, I think this is a great example of what we've talked about a lot on this show, which is the fact that we are moving into an integration phase, where instead of just releasing an endless stream of new tools, the capacities that AI has are being integrated into existing workflows where users actually spend their time. In other words, maybe there's nothing that different about what meta's emu edit can do,

Starting point is 00:05:01 as compared to, for example, Adobe. But the fact that it's going to be natively integrated in Facebook and Instagram where people are already sharing their photos means that it's likely to get a heck of a lot more usage than the scenario where people would have had to take their photos out, put them into Photoshop or another Adobe or Google tool, and then bring them back to Instagram where they could then post them. Next up is a tool that is not from one of the big companies, but is a prototype that has been absolutely lighting up Twitter slash X. It's a collaborative whiteboard app called TL Draw that's already a very cool experience.

Starting point is 00:05:29 When you go to TL Draw, you're literally presented with the whiteboard that you can then draw on Hi, AI, breakers. And then you can do stuff with it. You can put in shapes, you can put in arrows. Basically, you can design actual mockups and basic user experiences. But what has people really excited

Starting point is 00:05:44 is the new Make It Real feature. Make It Real is a little button where when you press Make It Real, The system will automatically take what you have mocked up using its very simple tools and turn it into real, quote-unquote, looking UI. Now, this is being powered by the OpenAI API, so to get access to make it real, you have to input your own API key. And of course, this takes advantage of the new GPT4 vision, which is a version of GPT4 that actually can interpret visual images and use those pictures as prompts. Simon Willison explained on Twitter how this works. He says, found the system prompt that drives this thing here.

Starting point is 00:06:16 It works by generating a base 64 encoded PNG of the draw components, then power. passing that to GPT4 Vision with that system prompt and instructions to turn this into a single HTML file using Tailwind. However, one of the things that makes this even more interesting is that you can actually use annotations in the wireframes that you're creating to let GPT4V understand what it's supposed to do. So it's more sophisticated even than just copying an image and turning it into HTML, which would have been cool enough on its own right. Overall, for as cool as this is, it's not so much that it's groundbreaking technology, in the sense that we already knew about GPT4 Vision. It's just a great example of how these models are going to be increasing

Starting point is 00:06:50 implemented into specific use cases and workflows around those use cases that make them dead simple and totally transformational to use. As ours, technical writes, it feels like we've been given a preview of a possible future mode of software development or interface design at the very least. We're creating a working prototype as simple as making a visual mock-up and having an AI model do the rest. Now, a couple other examples of how AI is being integrated into existing professions. Axios writes, AI helps defense attorneys sift through police body cam videos. So basically, this is about a new company called Justice Text, which was founded by two University of Chicago undergrads, and what it does is it takes advantage of the fact that police departments around the

Starting point is 00:07:26 country are now in many cases requiring officers to wear body cams that capture their interactions throughout the day. That can be an incredible trove of information for defense attorneys and public defenders and basically the people who have to defend suspects and who could use that information to defend suspects or the police officers themselves, and can use that information to figure out what actually went down rather than just having things deteriorate to what he said she said. However, sifting through thousands and thousands of hours of video is not an easy task. So, Justice Text is basically an AI model that is optimized for helping attorneys review transcripts of body cam footage as well as interrogation videos and phone calls that happen from jail.

Starting point is 00:08:01 By converting audio and video into text, lawyers can look for keywords such as hands up or get down and find important moments that happen during police encounters. To me, it's a great example of a very simple, but very cool use case of how AI is going to change and improve how things happen, even in a boring administrative context. Lastly, a nice little story to close out a week where there has been a lot of discussion of the geopolitics of AI. Of course, with China's President Xi visiting President Biden in San Francisco, Google CEO Sondar Pichai has compared AI to climate change at the APEC CEO summit on Thursday. The comparison point is basically that it affects everyone, and that, quote, AI advances will get out to all the countries and so it's naturally

Starting point is 00:08:37 the kind of technology that I don't think there's any unilateral safety to be had. In other words, if it goes wrong in one country, it's going to impact other countries. Thus, he said, in some ways, it's like climate change in the planet. We all share a planet, and I think that's true for AI. You have to start building the frameworks globally. And so perhaps if you agree with that take, maybe it was the right idea for Rishi Sunak in the UK AI Safety Summit to invite the Chinese after all. In any case, Google is in fact the subject of our main episode. So for that, stick around because it's coming up next. And now a word from today's sponsor. Are you interested in how two-tested in how two-trial Top of mind trends, AI and crypto can work together?

Starting point is 00:09:14 If so, I have the perfect podcast recommendation for you. Web 3 with A16Z Crypto, the chart-topping show brought to you by venture firm Andresen Horowitz. Web3 with A16Z Crypto is your definitive resource for the future of the internet. Whether you're already building in these spaces or simply curious about what's next. If you need a place to start, they recently released an excellent episode with Stanford Cryptography Professor Dan Bonay and former Google Xer Aliya in conversation with host Sonal Choxi about the intersection of AI and Crypto. From fighting deepfakes and proving humanity to large language models like ChatGBT, BT, they cover

Starting point is 00:09:49 it all. I highly recommend checking it out, especially if you'd like to learn more about how AI and crypto will impact our everyday lives. Beyond Crypto and AI, this show is for creators seeking more ways to truly own their work, for business leaders trying to prepare for the future today, and for innovators exploring trending tech topics. So go ahead, listen to Web3 with A16Z Crypto wherever you get your podcasts. And now a quick word from today's sponsor. I am a huge Notion user. We're talking multiple accounts

Starting point is 00:10:19 for multiple projects. I use it for everything from applicant tracking to note taking to project management, to sharing public documents, to frantically capturing ideas I have while out hiking or just driving around. Given that and given the topic of the AI breakdown, I was excited to learn that they've launched a new AI tool called Q&A. It's like a personal assistant that responds in seconds with exactly what you need. Notion AI can give you instant answers to your questions using information from across your wiki, projects, docs, and meeting notes. For someone like me who makes dozens of notes per day around a huge array of topics, having a built-in AI tool to help recall that is incredibly useful. Now, beyond that use case, think about this. Have an urgent question you normally

Starting point is 00:10:57 turn to a coworker to answer? Just ask Q&A instead. It'll search through thousands of documents in seconds and answer your question in clear language no matter how large or complex your workspace is. Plus, you can trust your data is secure because Notion AI is designed to protect your information. No AI models are trained with your information, the data is encrypted, and answers will never use information from pages you don't have access to. With Notion AI, it's even easier to do your most meaningful work. Try Notion AI for free when you go to notion.com slash AI breakdown. That's all lowercase letters, notion.com slash AI breakdown to try the powerful, easy to use notion AI today. And when you use our link, you're supporting the show.

Starting point is 00:11:36 One more time, that's Notion.com slash AI breakdown. On Monday, November 6th, OpenAI held what was unarguably one of the most exciting days in AI development this fall, which was their OpenAI Dev Day. On that day, we got new GPT4 Turbo with a 128K context window. We got the assistance API. We got custom GPs. We had all of these things showing just how quickly OpenAI is moving. Then just earlier this week, we heard full.

Starting point is 00:12:06 from Microsoft, and they had made their latest AI announcements. Meta is constantly in the news with some feature or another, and Amazon is rumored to be working on something as well, and the question that this has left, summed up by playground founders who hail, tweeting on that day, the day of the OpenAI Dev Day, where is Google? Well, where Google is, is officially delayed with the release of their Gemini model. So what we were doing today is we are talking all about what's going on, why the delays are happening, and what we know right now. First of all, let's talk about what this model is. Gemini is not Bard. BARD is, of course, Google's currently available model, and to its credit, Bard continues to get updates.

Starting point is 00:12:45 We talked earlier this week about the fact that BARD now has improved mathematical capabilities, and has been officially released to more age groups. Specifically, it has been made available for younger people after a bunch of testing and working with experts to make sure it is more safe. Bard is also starting to get code interpreter or advanced data analysis type of data visualization capacities, which should make it even more useful as well. But what people have been really waiting for is not barred, but a more advanced model, which many have assumed would be the first to actually challenge the supremacy of GPT4 from a performance standpoint. And indeed, the last reporting that we had suggested that Gemini was right around the corner.

Starting point is 00:13:22 On September 14th, the information reported that Google was nearing the release of its Gemini AI model based on the fact that it had started to give access to companies outside of Google versions of the model. According to three people who had direct knowledge of the matter, giving outside developers access to Gemini suggested that Google was getting close to officially launching the thing. Now, there are a bunch of things that people have been looking forward to when it comes to Gemini. First of all, it was supposed to be natively multimodal. And second, as the information writes, Gemini has an advantage over GPD4 in at least one respect said a person who has tested it.

Starting point is 00:13:53 The model leverages reams of Google's proprietary data from its consumer products in addition to public information from the web. As a result, the model should be especially accurate when it comes to understanding users' intentions with particular queries, and it appears to generate fewer incorrect answers known as hallucinations. The information had also previously reported that Gemini had greatly improved co-generating capabilities, which is, of course, a major initial use case. And on top of all of that, there's just a speculation around how much power had actually gone into this model.

Starting point is 00:14:20 Semi-analysts wrote a post that very clearly got under Sam Altman's skin because he seemed to comment on it on Twitter later. On August 27th, they published a piece called Google Gemini Eats the World. Gemini smashes GpT4 by 5X, the GPU pours. Now, semi-analysts is a blog that follows the semiconductor and chip industry, and what they wrote in this piece was that the sleeping giant Google has woken up, and they are iterating on a pace that will smash GPT4 total pre-training flops by 5x before the end of the year. The path is clear to 20x by the end of next year given their current infrastructure buildout. Now, they noted whether Google has the stomach to put out these models without publicly neutering their creativity or their existing business model is a different discussion. Now, this blog post was not claiming that just having more computing power necessarily meant making a better model. They were just identifying that Google was, as they put it, the most compute-rich firm in the world. So, the point that I'm trying to convey here is that there has been a lot of hype around Gemini. However, every time things get pushed back, the bar that it needs to clear gets a little bit higher.

Starting point is 00:15:20 As Jim Fan from Nvidia wrote, expectations for Google Gemini is now ridiculously high. Gemini has to check off at least one of the following, 120% IQ of textual GPT4, or 100% of GPT4 but at half the cost, 2x speed of turbo or 100% of visual GPT4 or natively supports long videos and ship the API in Q1 of 2024. Now, Jim suggests that they had to ship the API in Q1 of 2024 to meet expectations, but I think we are already well past the point where everything feels delayed. And indeed, that is sort of the thrust of the information article which was published just yesterday, Google delays release of Gemini AI that aims to compete with OpenAI.

Starting point is 00:16:00 The piece begins, Google's company-defining effort to catch up to check up to check. AppGPT creator OpenAI is turning out to be harder than expected. And apparently, it wasn't just reporting from the information, but Google representatives had actually told their business partners and cloud customers that they would have access to the new Gemini model around November. Well, here we are in November, and the company has recently told them to not expect it until the first quarter of next year. There are a bunch of reasons this is problematic. One of them, which isn't just related to how cool people think they are in the AI space, is their actual bottom line. In recent earnings reports, while Microsoft's Azure

Starting point is 00:16:33 cloud business, outperformed analyst estimates, Google had their slowest quarter of growth since something like 2019. In other words, having access to things like OpenAI's models is making a meaningful difference in how Google is competing on that core cloud business. Now, what's more, the information notes that the delays around Google having a credible competitor to chat GPT, and more specifically the GPT4 API, means that not only are they losing out on that consumer brand opportunity, but they're also losing the affinity of developers. The more that people start building on top of GPT4 and GPT3.5 and OpenAI's APIs in general, the harder it's going to get for Google to win developers over to build this next generation set of applications on their tools instead.

Starting point is 00:17:13 There is a real platform lock-in that can happen, which can be extremely challenging to try to dislodge. There's also the fact that success when it comes to use of artificial intelligence tools is somewhat self-reinforcing, in that the more that OpenAI's chat GPT gets used, the better data open AI has to improve chat GPT in the future. Now, spokespeople for the company have sort of tried to delay and deflect. Spokesperson Catherine Watson says we're not commenting on rumors or speculation, and Google CEO Sundar Pichai said at a public event that they were, quote, focused on getting Gemini one point out as soon as possible, make sure it's competitive state of the art and will build from there on. As recently as an investor call on October 24th,

Starting point is 00:17:50 Pichai said that they are, quote, laying the foundation of what I think of as a next generation of models will be launching all throughout 2024. So what are the possibilities here? Well, one of them is, of course, something similar to what caused Amazon's shifted plans last year after ChatGPT came out. Remember, Amazon was planning on releasing a model called Bedrock at their annual event last November, but at the last minute decided to delay. Lucky they did because in the middle of that event, ChatchipetT was announced, and it just blew what was then Amazon Bedrock out of the water. Of course, that caused a switch in strategy, and Bedrock instead became the Enterprise Sandbox,

Starting point is 00:18:24 through which Amazon helps its enterprise and business customers customize their own models and select from a variety of different models, designed for their own specific purposes, but perhaps Google Gemini is going through something similar. In other words, as painful as it is to delay, it is still infinitely better to delay rather than put something out that isn't actually as good, or frankly, isn't meaningfully better than Open AI's models.

Starting point is 00:18:46 Now, the other thing is that there are a lot of different pieces of software that Gemini has to actually power. The information writes, Google has high hopes for Gemini that go beyond boosting enterprise software sales. The company wants it to power new tools for creators on YouTube, such as the ability to generate custom backgrounds for videos, and improve the capabilities of BARD as well as Google Assistant, Google Siri-like voice assistant for phones and other devices.

Starting point is 00:19:07 Apparently, according to people with knowledge of the project, Google has developed several different versions of Gemini, including some smaller versions of the model that have been tested by outside developers, but that primary main biggest version of Gemini remains incomplete. Kind of reiterating what I just said, the information writes, A key challenge for the Gemini team is making sure the primary model is as good as or better than OpenAI's most advanced LLL.

Starting point is 00:19:28 GPD4. It's not clear whether Google has met that standard, said a person who's been involved in the effort. Now, other speculations outside of just raw performance relative to GPT4 is that some specific use cases for Gemini involve different types of capacities that just might not be ready yet. For example, they talk about advertising, which is, of course, the vast, vast majority of Google's revenue. Here's how Gemini describes the challenge. Gemini also has a longer memory of a user's interactions compared to earlier Google LLM, such as Palm 2, which currently powers barred and generative AI results in Google search. The longer memory could allow advertisers

Starting point is 00:20:01 to compare the performance of campaigns over time. For instance, an advertiser could use the model to create new variants of the best performing ad copy over the last month. Now, of course, that's not a trivial thing to do. Increasing memory and contextual understanding is a big challenge. Still, it seems like part of the biggest reason for the delays are more

Starting point is 00:20:17 on the organizational side than anything else. Basically, two teams that were separate. Google Brain and Google DeepMind had to come together to work on this model. That merging is a real challenge. even if everyone is totally aligned and totally in sync, which is very hard to achieve on its own right, bringing two totally separate teams together to actually work productively, especially when they're both bringing their fits and starts and things that they've

Starting point is 00:20:38 experimented with and things that have worked and haven't and biases and ideas and basically their past track record. It's no mean feat. Now also apparently Google co-founder, Sergey Brin has been involved in the conversations, not as a formal decision maker, but as offering criticism and feedback on Gemini. And as much as some people are reporting that, like this shows how serious Google is about this effort and how big a deal it is, which is true. I also don't know that it's the best strategy to have a founder with a controlling stake, but no actual decision-making authority other than implied decision-making authority involved in running a team that needs to be highly efficient and directed. There's nothing to indicate that this is happening, but I can see a

Starting point is 00:21:15 million pitfalls around leadership and who's actually in charge and who has authority, and it just seems like a big challenge to me, and not necessarily a sign of strength, let's put it that way. To many, what this looks like is a classic situation of Google being trapped in the innovator's dilemma. A blog post recently came out what I learned getting acquired by Google, and excerpts from that seemed to give a picture of just how hard things are to get done inside that company. Shrean's Benzali, who wrote the piece, writes, Google is an ever-shifting web of goals and efforts. Amazing things are possible at Google if the right people care about them, a VP that gets it,

Starting point is 00:21:49 a research team with a related charter or compatibility with an org's goals. Navigating this mess of interest is half of a PM's job. And then you need the blessing of approvers like privacy, trust and safety, and infracapacity. It takes dozens of conversations to know if an idea is viable and hundreds more to make it a reality. And that's the happy path. Amazing things are possible at Google if the right people keep caring about them. Team goals can change in any given quarter and entire teams can disappear in the face of a reorg. A phenomenon so common that Googlers can look past the tragedy and see the comedy in it. Suppose you dodge all those bullets, you might still wake up to find that while you were working on your project,

Starting point is 00:22:23 two distant teams were also working on the same idea, and the time is come to fight it out because now only one can proceed. The internal users of the losing projects will find that the API they depended on is now deprecated, but the replacement, well, it's not quite ready yet. Googleers wanted to ship great work, but often couldn't. While there were undoubtedly people who came in for the food, worked three hours a day and enjoyed their early retirements, all the people I met were earnest hardworking and wanted to do great work. Would beat them down were the gauntlet of reviews, the frequent reorgs, the institutional scar tissue from past failures, and the complexity of doing even simple things on the world stage. Startups can afford to

Starting point is 00:22:53 ignore many concerns, Googlers rarely can. Jim Fan again extends this thinking into the world of AI and writes, this sheds light on why OpenAI spearheaded the LLM revolution, even though most of the foundation tech originated at Google. Google is right in the thick of the innovator's dilemma. I'm sure they know very well the power of foundation models. After all, they invented transformer, AlphaGo, and Flamingo, the key ingredients of GPs. Yet it is so difficult to justify diverting resources away from the ongoing profitable products or potentially even cannibalized search to promote LLMs. Siting the article, Google is an ever-shifting web of goals and efforts. You need all the relevant higher-ups to agree on the agenda, actively fight for enormous resources, and push back against

Starting point is 00:23:32 all the other parties that want the minimal disruption. Too many stars need to align. That is a very tall order. That being said, I don't think it's fair to blame anyone. The institutional bureaucracy is a natural emergent property of companies at this scale. 10 years from now, open AI and Anthropic may suffer from the same if they grow to that order of magnitude. I think Chibb is right, But I also think it doesn't matter. Ultimately, this battle is now, not 10 years from now. And right now, Google is losing. However, counting them out too early seems a little premature to me.

Starting point is 00:24:01 And so for now, we're just going to have to wait and see what actually comes out with Gemini. That's going to do it for today's AI breakdown. I appreciate you listening or watching as always. Until next time, peace.

The AI Daily Brief: Artificial Intelligence News and Analysis - Gemini Delayed: Is Google In the Innovator's Dilemma?

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.