The AI Daily Brief: Artificial Intelligence News and Analysis - OpenAI Building AI Agents as Google Launches Gemini Advanced

Starting point is 00:00:00 Today on the AI Breakdown, Google has officially changed Bard to Gemini and released the most advanced version of their Gemini model. Before that on the Brief, OpenAI is working towards some seriously advanced agents. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information about our YouTube, our Discord, and our newsletter. Welcome back to the AI Breakdown Brief. All the AI headline news you need in around five minutes. As you might have guessed, our main story today is going to be about Google changing the Bard brand to Gemini and giving access to their most advanced Ultra model, which just kicked off today.

Starting point is 00:00:41 However, The Information has some really interesting reporting from behind the scenes at OpenAI in a piece that they titled, OpenAI Shifts AI Battleground to Software That Operates Devices and Automates Tasks. Now, if you were an AI Breakdown listener last year, you heard me talk a lot about AI agents. One of the huge themes among developers was trying to move to an era where instead of just answering people's questions, AI agents actually had the capacity to solve their problems. In other words, you could give them a problem and the agent could figure out what tasks were needed to solve that problem or accomplish that goal, including potentially leveraging other agents to do so. Now, this is a reality that is not here yet. There are lots and lots of companies trying and

Starting point is 00:01:24 experimenting and building towards that, some of which are getting increasingly high capacity in some specific functions, but there is not yet a generalist AI agent winner or even really a leader. OpenAI appears to be determined to change that. Writes The Information, OpenAI is developing a form of agent software to automate complex tasks by effectively taking over a customer's device. The customer could then ask the ChatGPT agent to transfer data from a document to a spreadsheet for analysis, for instance, or to automatically fill out expense reports and enter them into accounting software. These kinds of requests would trigger the agent to perform the clicks, cursor movements, text typing, and other actions humans take as they

Starting point is 00:02:00 work with different apps. Now, apparently they are actually working on two different types of AI agents, the one that we just described, which would take over a person's specific device, and another which would be specifically for web-based tasks. Now, this makes sense, given that Sam Altman has discussed his vision for ChatGPT as ultimately a, quote, super smart personal assistant for work. However, as The Information also points out, it could bring OpenAI increasingly into competition with Microsoft, who are, of course, positioning Copilot as exactly this sort of thing. Although how far along Microsoft is in any sort of plans or developments around agent-like behavior isn't clear at all.

Starting point is 00:02:35 There are also real questions about whether users will be comfortable with this. Right now, the only types of software that take over people's computers are malware and viruses, and so getting over that impression could be really difficult. Now, these are not new efforts, apparently. It appears that they've actually been in development for more than a year. However, there are some indications that employees within OpenAI think that these tools are going to be a really, really big deal. One of The Information's sources, for example, pointed out a tweet from

Starting point is 00:03:02 Ben Newhouse, who is an OpenAI employee who this source said had worked on computer-using agents, and Ben on Twitter posted, building what I think could be an industry-defining zero-to-one product that leverages the latest and greatest from our upcoming models. Adding even more hype to that cryptic announcement, Peter Welinder, OpenAI's vice president of product, added that this product that Ben was describing will, quote, change everything. Now, of course, there are other indicators. These are certainly the lines along which people are thinking. In many ways, if you go back and look at how OpenAI framed custom GPTs, it was as a very first step towards an agent-like future. They, of course, also launched at their

Starting point is 00:03:38 Dev Day event the Assistants API, which is explicitly about helping developers build light agent-type experiences in their applications. So right now, this is just a behind-the-scenes report. There's no indication that anything is coming soon, but it's consistent with other things we've heard, and it certainly seems to suggest where the AI arms race could be headed next. It is certainly something that, if nothing else, I will be watching closely and letting you know if I hear anything more, or frankly, probably even less definitive. Now, another company whose AI strategy we've been guessing at, and who are finally starting to tease it themselves, as of Tim Cook talking to investors recently, is, of course,

Starting point is 00:04:15 Apple. One of the indicators that Apple is getting deeper and deeper into its own AI strategy is the fact that they have been increasing their open source releases in the space. The latest is something called MLLM-Guided Image Editing, or MGIE. It's a model that lets users use plain language to edit a photo without any photo editing software. So think of the sort of inpainting that got people so excited in Midjourney and Adobe image applications last year. Want to be wearing a different color shirt in a photo? Just say I want my shirt to be a different color.

Starting point is 00:04:44 Now, according to The Verge, the model blends two different uses of multimodal language models. First, it learns how to interpret user prompts, then it imagines what the edit would look like. Now, part of what makes this different is there is reasoning involved. So, for example, if you had a picture of a pepperoni pizza and you typed in the prompt make it more healthy, MGIE would add vegetable toppings. This is very different than having to use a prompt like add vegetable toppings. Now, if you are interested in trying this out, you can download it from GitHub or you can try a web demo over on Hugging Face.

Starting point is 00:05:15 Lastly today, another interesting survey about U.S. worker attitudes towards artificial intelligence. This one comes from Rutgers University's Heldrich Center for Workforce Development, which has multi-decade experience of surveying Americans around the impact of new technology in the workplace. Now, one thing that's perhaps not surprising is that there is meaningful concern among people about their job being eliminated by AI. Three in ten have that worry. However, a far more dominant worry, at least right now, has to do with the, quote, hidden hand of AI being involved in human resource decision making, i.e. hiring and firing. When it comes to those issues for AI, seven in ten U.S. workers say they're very or somewhat concerned. As the

Starting point is 00:05:57 center's director, Carl Van Horn, puts it, a concern about the hidden hand out there, that I'm not going to get a chance to really discuss my virtues with the hiring officer or with my boss. Instead, there'll be some algorithm that tells me whether I stay or go. So some pretty interesting nuance we're getting into when it comes to people's perceptions and fears around AI. Always interesting to see these new stats, although, of course, take them as one tiny piece of evidence in a much larger world. For now, though, that is going to do it for today's AI Breakdown Brief. I'll be back soon with the main AI Breakdown. Welcome back to the AI Breakdown.

Starting point is 00:06:31 Earlier this week, an Android developer found a changelog message that suggested that we would be getting the most advanced model of Gemini this week, and that along with it, Google would be making a big marketing change, moving the Bard brand to Gemini. Now, this is something that we've seen as sort of a trend among these big companies. They start with one brand and ultimately start to settle on a cross-cutting brand that refers to everything that they're touching with AI. I told the whole story of Bing shifting to Copilot yesterday, for example, which is, of course, encapsulated by their Super Bowl ad, which was just released. Well, like I said at the top, the rumors are true.

Starting point is 00:07:06 Bard is now Gemini, and Ultra, their most advanced model, is actually available. Jack Krawczyk from the Google Bard team says, today Bard becomes Gemini, available on web and mobile, new app in the Play Store, starting to roll out today, and introducing Gemini Advanced, access to our most capable model, Ultra 1.0. Jack continues, Bard was built to be the direct way to access Google's AI models. Last week, Gemini Pro went worldwide and completed the transition into the Gemini era. Gemini is more than state-of-the-art models. It's an ecosystem you will see through our products and APIs. Hence, Bard is now Gemini.

Starting point is 00:07:41 Gemini Advanced provides access to our most capable model, Ultra 1.0. We worked with 100-plus AI expert trusted testers across multiple disciplines. They've told us they prefer Gemini Advanced for its longer context conversations, 32K, and the ability to role play complex scenarios. It also doesn't interrupt your flow with low rate limits. Now, from there, he goes on to a lot of other details, including discussing the new app experience on Android and iOS in the Google app, as well as another new feature called Double Check,

Starting point is 00:08:08 which allows users to double-check the information that's coming back from Gemini, and finally discusses the pricing structure for this most advanced Ultra 1.0 model. Users can try it for two months for free, and then it is $20 a month after. Well, $19.99, technically. Now, there are some things that are not there yet that make it not as feature complete as ChatGPT. These include multimodal upgrades, interactive coding, deeper data analysis, file uploads, multilingual, and more. Now, there is a lot that makes this interesting. As The Verge points out, quote, Google is famous for having a million similar products with confusingly different names and seemingly nothing in common. That pattern, however, has been broken

Starting point is 00:08:46 with the release of Gemini. The Verge also points out the stakes. They write, it's not a surprise that Google is so all in on Gemini, but it does raise the stakes for the company's ability to compete with OpenAI, Anthropic, Perplexity, and the growing set of other powerful AI competitors on the market. In our tests just after Gemini launched last year, the Gemini-powered Bard was very good, nearly on par with GPT-4, but it was significantly slower. Now Google needs to prove it can keep up with the industry as it looks to both build a compelling consumer product and try to convince developers to build on Gemini and not with OpenAI. Only a few times in Google's history has it seemed like the entire company was betting on a single thing. Once that turned into Google Plus, and we know how

Starting point is 00:09:24 that went. But this time, it appears Google is fully committed to being an AI company, and that means Gemini might be just as big as Google. Now, the only thing that I disagree with from this Verge analysis is the idea that going all in on Gemini raises the stakes for the company's ability to compete. Those stakes were raised by the very existence of leaders in the AI space that weren't Google. For a company that has been at the very forefront of innovation in this space, it was shocking last year to see how far it was behind throughout the entire year. It was, in fact, to many people stunning. Indeed, it put them in a position where they really had to announce Gemini in December even though they couldn't make Ultra available at that time. The exciting thing is, of course,

Starting point is 00:10:05 that because we didn't have access to the Ultra version of the model back in December, we simply had to take their assertion that it matched or exceeded GPT-4 in numerous areas at face value or choose not to believe it. Now we get to test it for ourselves, but there are some people who have already had a little bit longer to dig into it. Popular creator Marques Brownlee says, OK, so I've been testing out Google's Gemini on a few phones for a few weeks now. Some things I've noticed that stood out.

Starting point is 00:10:32 Upsides, tons of useful new generative features, can write letters, craft trip plans, create images, et cetera, all the good stuff. Notably better semantic understanding of random fact-based questions. Downsides, it's missing some classic Google Assistant features like home control and adding to shopping lists. The new pop-up UI is a little more complex, but you can get used to it. Renaming it is going to confuse a lot of people for no reason. Bindu Reddy from Abacus writes, My initial thoughts: still continues to be somewhat nerfed and refuses to answer questions. Refused to generate a simple illustration of George Clooney, ChatGPT is better. Missing PDF upload. Answers do seem better than the previous version. Seems to have a

Starting point is 00:11:05 reasoning vibe. However, it does not answer some hard questions that GPT does. For example, it didn't get, in a room I have only three sisters. Anna is reading a book, Alice is playing a match of chess, what's the third sister Amanda doing? The answer is the third sister is playing chess. GPT-4 nails it. Overall, we plan to do a lot more analysis, but first impressions are good, but not great. TLDR, I don't think it will make a material difference to how Bard was doing before, especially if their plan is to charge for this. However, it's always good to have more players in the market. Now, someone who has a much more positive take is Zvi Mowshowitz, who writes, As someone who had early access, I can say that Gemini Ultra is damn impressive. When it is good,

Starting point is 00:11:44 it is excellent, and this includes the most common queries, especially learning and looking up facts. I've switched to it as my default LLM. Now, Zvi is a prolific blogger and super compelling thinker, so I am inclined to take his take on this with a little bit more confidence than some of the others. And then, of course, there's Professor Ethan Mollick from Wharton. Ethan has had access to Gemini for the last six weeks and wrote an extensive post on his One Useful Thing blog, giving his perspective. The post is called Google's Gemini Advanced: Tasting Notes and Implications. Subtitle, And then there were two.

Starting point is 00:12:16 Now, one thing that Ethan makes clear is that he is not trying to test Gemini on the basis of benchmarks. Instead, he wanted to give a subjective mix of opinions based on his usage. He writes, let me start with the headline. Gemini Advanced is clearly a GPT-4 class model. The statistics show this, but so does a month of our informal testing. And this is a big deal, because OpenAI's GPT-4,

Starting point is 00:12:38 the paid version of ChatGPT and Microsoft Copilot, has been the dominant AI for well over a year, and no other model has come particularly close. Prior to Gemini, we had only one advanced AI model to look at, and it is hard drawing conclusions with a dataset of one. Now there are two, and we can learn a few things. Now, I just want to stop here and put a fine point on this. It really is remarkable that for an entire year

Starting point is 00:13:01 in this insanely fast-moving space, there was nothing that could really come close to GPT-4. I've actually argued in the past that having another GPT-4 class model on the scene and actually available represents a major transitional moment for the industry, from the period that was entirely defined by ChatGPT, from November 2022 when it launched to basically today, to this new era that's coming, whatever it happens to look like. Now that said, Ethan continues that Gemini Advanced does not obviously blow away GPT-4. He writes, it is really good, but I would concur with the test that suggests it is roughly equivalent. When it comes to the various strengths and weaknesses of these platforms,

Starting point is 00:13:40 he writes, GPT-4 is much more sophisticated about using code and accomplishes a number of hard verbal tasks better. Gemini is better at explanations and does a great job integrating images and search. Both are weird and inconsistent and hallucinate more than you would like. And that gets to a really interesting section of the piece called It's Full of Ghosts. Ethan writes, no one has a great definition of sentience, which is okay because LLMs are in no way sentient. They are software systems designed to create human-like language. But there is a weirdness to GPT-4 that isn't sentience, but also isn't like talking to a program. A weirdness that only comes out after you spend enough hours playing with the AI and getting

Starting point is 00:14:14 unnerved or delighted or both by its unexpected abilities and seeming intelligence. There was a famous controversial paper put out by Microsoft Research soon after the release of GPT-4 called Sparks of Artificial General Intelligence that tried to put this argument into scientific terms, but ended up just calling it sparks of artificial general intelligence. It is the illusion of a person on the other end of the line, even though there is nobody there. GPT-4 is full of ghosts. Gemini is also full of ghosts. Seriously, if you use the system for a while, I can almost guarantee at least one moment

Starting point is 00:14:44 when you stand up from your desk, walk around the room, and wonder what is going on. Now, his takeaway is that the sparks that we saw in GPT-4 are not based on GPT-4, but are byproducts of GPT-4 class models. From a tone and personality perspective, he suggests that GPT-4 is more bland, whereas Gemini is more friendly, agreeable, and has a, quote, tendency towards wordplay. But ultimately, he says these models are really, really similar. Now, the other thing that he argues about Gemini is that it, quote, illuminates a vision of AI as a powerful integrated personal assistant.

Starting point is 00:15:15 He basically argues that the Bard integration with the Google ecosystem of Gmail, Google Docs, travel tools, etc. was interesting but too dumb to actually use. Whereas now, he says, with a smarter brain in the form of Gemini Advanced, you can start to do some really interesting things that, at their best, seem magical. Go through my emails, tell me which are important, and draft replies for each. Look up my next conference and plan a trip I would like. He says it isn't there yet, but it is very much closer to being an actual assistant rather than the limited Siris and Alexas we have seen in the past.

Starting point is 00:15:42 That is in part why I suspect that Gemini Advanced is the start, not the end, of a wave of AI development. We can start to see a world where AI agents act on our behalf. A GPT-4 class model is not quite strong enough to power these agents, but we are getting close. So really interesting stuff, and ultimately what we come back to is that we are now living in a two-GPT-4-level world, which I think is going to have some fairly significant implications that we will discover as it happens.

Starting point is 00:16:09 For now, though, an exciting day in the AI world. I appreciate you listening or watching as always, and until next time, peace.
