The AI Daily Brief: Artificial Intelligence News and Analysis - LEAKED: First Look at Gemini Multimodal?

Starting point is 00:00:00 Today on the AI Breakdown, we're looking at Google Gemini leaks as well as an unannounced AI developer tool. Before that on the brief, a new tool that artists can use to poison their images before they get to AI training models. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information about our YouTube channel, our Discord, and our newsletter. Welcome back to the AI Breakdown Brief, all the AI headline news you need in around five minutes. And boy, do we have an interesting one to start with today. In fact, one of the big themes across this entire brief is the realignment of the internet and internet contributors in the context of massive AI companies crawling and in many

Starting point is 00:00:46 people's minds at least stealing their data. Now, for some, the remediation here is policy, right? They are waiting for people in Washington, politicians and policymakers, to tell the AI companies what they can and can't do. To be fair to those AI companies, they're also increasingly allowing people to opt out. Apparently something like 535 big publishers are now blocking OpenAI from scraping their data to train their future models. But the MIT Technology Review wrote about an even more dramatic step that is based in technology itself that many folks who have been up in arms about AI are quite excited about. The tool in question is called Nightshade, and it's being referred to as a data poisoning tool. Now, if nothing else, Nightshade is an excellent

Starting point is 00:01:26 and evocative name. But basically what it does is it allows people who are uploading images, particularly artists who are uploading images of their art, to change pixels in such a way that it confuses the data that an AI model would get. These changes are invisible to the naked eye, but they can, as the MIT Technology Review puts it, cause the resulting model to break in chaotic and unpredictable ways. Effectively, these hidden pixels tell the AI models that are being trained on them that things in the images are not what they actually are. Again, as this report describes it, dogs become cats, cars become cows, and so forth. Now, this project is being led by researchers from the University of Chicago led by Ben Zhao. The team involved say they view it as a

Starting point is 00:02:07 power balancing tool. And effectively, they're trying to create an incentive for AI companies to come back to the table and actually find some way to compensate people for the data their models are trained on. Now, this is not the first project from this team at the University of Chicago that plays in a similar space. They also developed a tool called Glaze, which works in a similar way. It allows artists to mask their personal style so that the AI model thinks it looks different than it actually does. For those of you listening, I suggest you hop on over to the YouTube, which you can find at YouTube.com slash at the AI breakdown. There's an image they show of how these tools work in practice. For Nightshade, they show a variety of items like dog, car, handbag, hat, fantasy art, cubism, cartoon, concept art,

Starting point is 00:02:49 that decay in serious ways once these poison samples are introduced. By the time 300 poison samples are introduced into the model, a dog becomes a cat, a car becomes a cow, a handbag becomes a toaster, a hat becomes a cake, the fantasy art style becomes pointillism, cubism becomes anime, cartoon becomes impressionism, and concept art becomes abstract. They also give an example when Glaze is used. Basically, the pixels tell the AI model that the art being imitated is actually of a very different style than it is. So for example, it might encode the idea that it's an oil painting by Vincent van Gogh, when actually it's just a sepia photorealistic photo. Now, when it comes to the question of whether this

Starting point is 00:03:26 is the right way to fight against these models or to try to readjust the balance, I'm not really sure. Obviously, having an AI-focused show, you can guess that to some extent I'm bullish on the way that AI opens up new creative pathways. At the same time, I do think that this is an existential struggle for a lot of artists, and that kind of makes any and all tactics totally reasonable. The big AI labs certainly have a bigger balance of resources, but regardless, it is going to be an interesting fight to watch. I do think it's reflective of an assumption that perhaps policy isn't really going to be the right path. And indeed, there's still every chance that courts rule that training AI models is a version of fair use and does not trigger copyright rules,

Starting point is 00:04:04 in which case artists will definitely need something like this if they're trying to prevent their works from being trained upon. Certainly there is much enthusiasm on Twitter as this article went hyper-viral. Now, speaking of companies who are taking a strong stand against AI training, Reddit is apparently in deep discussions with the big AI labs about being compensated for training on Reddit's vast trove of conversations and information. Now, of course, earlier this year, they changed API access pricing, which was a hugely polarizing move. Many third-party developers who didn't have anything to do with AI were caught up in the changes, which of course led many in the community to be extremely frustrated. At the same time, Reddit basically said that they had to do

Starting point is 00:04:44 this to prevent big generative AI companies from using their data to train LLMs. Citing an anonymous source, the Washington Post said that if these conversations with these top generative AI companies don't go well, Reddit is potentially ready to take some dramatic actions. From the Washington Post, if a deal can't be reached, Reddit is considering blocking search crawlers from Google and Bing, which would prevent the forum site from being discovered in searches and reduce the number of visitors to the site. But the company believes the trade-off would be worth it, saying, Reddit can survive without search. Now, part of why Reddit may be emboldened is that earlier this year, thousands of subreddits went dark, which apparently led to discontent among Google search users.

Starting point is 00:05:21 A leaked audio recording of an internal company meeting saw a Google SVP saying that Google users were unhappy about not having access to Reddit through the search site. Now, on the flip side, Similarweb suggests that around 49% of Reddit's traffic comes from search engines, which means that if they did take this move, it would be extremely dramatic and have significant impact on their usage and ultimately probably their bottom line as well. If anything, this just dramatizes how high companies see the stakes around this question of AI training. Now, moving over to the big tech side of the AI world, Microsoft has made a major announcement that they're making their biggest investment in Australia in 40 years. The company is going to invest

Starting point is 00:06:00 around $5 billion over the next couple years to boost AI in the country. The biggest part of that will be a 45% increase in Microsoft-owned data centers in the country growing from 20 to 29. But then on top of that, they will also be establishing a Microsoft Data Center Academy and also collaborating on a cybersecurity initiative. The announcement was made as part of the Australian Prime Minister's visit to the U.S. this week. Now, moving over to Apple again, obviously if you've been following along here, Apple's AI strategy has been much discussed in the news lately, not that they've made any announcements, but there have been a lot of reports and analyses from people who are watching, for example, their supply chains that suggest an approach to AI coming into view.

Starting point is 00:06:38 John Gruber, who writes the Daring Fireball blog, which is one of the best known Apple trackers in the world, wrote about that report from Bloomberg that we talked about yesterday on Apple's AI strategy. I think Mark Gurman's summary does get to an essential truth. If I asked you which companies are at the forefront of AI-powered products, I doubt you'd put Apple on the list. And AI is proving so useful and yet it is a nascent field that Apple needs to soon be on the list, lest their products begin to fall behind competitively. Which companies are best at integrating AI into products is going to be like which companies are best at creating hardware at scale, and which companies are best at human interface design. Now, he also commented on Gurman's report

Starting point is 00:07:14 that a person with knowledge inside the company said, there's a lot of anxiety about this and it's considered a pretty big miss internally. Gruber writes, what I have heard from little birdies in Cupertino is not that there was a miss on this already. Apple is almost never at the forefront of stuff like this. They're a deliberate company. Their goal, as with any new technology, is to integrate it into products in meaningful ways best, not first. That's why there is no internal anxiety that they've already missed anything related to AI. The anxiety inside Apple is that many people inside do not believe Apple's own AI/ML team can deliver, but that the company, if only for privacy reasons, is only going to use what comes from their own AI/ML team. So basically saying this

Starting point is 00:07:51 a little bit differently, the report that Gruber is getting is that there is concern and anxiety around AI strategy, not because Apple is already behind, but because they don't have faith in who's working on the problem internally to actually deliver something great. Interesting little wrinkle and twist. Meanwhile, as a sign of how serious this is, Apple analyst Ming-Chi Kuo has written a widely circulated report saying he expects Apple to spend up to $4.75 billion on AI servers in 2024. Now, in terms of where these numbers are coming from, it's a little bit hard to ascertain. I kind of feel like it's that meme of "what's the source? I made it up," but it's a respected analyst and lots of people are talking about it. And I think that even if the exact numbers are wrong,

Starting point is 00:08:30 the fact that these reports keep making news suggests just how much people are paying attention to what the Cupertino giant is going to do next. Anyways, friends, that is going to do it for today's AI Breakdown Brief. Next up, the main AI Breakdown. Are you interested in how two top-of-mind trends, AI and crypto, can work together? If so, I have the perfect podcast recommendation for you. Web3 with A16Z Crypto, the chart-topping show brought to you by venture firm Andreessen Horowitz. Web3 with A16Z Crypto is your definitive resource for the future of the internet.

Starting point is 00:09:04 Whether you're already building in these spaces or simply curious about what's next. If you need a place to start, they recently released an excellent episode with Stanford cryptography professor Dan Boneh and former Google Xer Ali Yahya in conversation with host Sonal Chokshi about the intersection of AI and crypto. From fighting deepfakes and proving humanity to large language models like ChatGPT, they cover it all. I highly recommend checking it out, especially if you'd like to learn more about how AI and crypto will impact our everyday lives. Beyond crypto and AI, this show is for creators seeking more ways to truly own their work,

Starting point is 00:09:36 for business leaders trying to prepare for the future today and for innovators exploring trending tech topics. So go ahead, listen to Web3 with A16Z Crypto wherever you get your podcasts. Welcome back to the AI Breakdown. Today we are wandering into the realm of speculation, but I think it's worthwhile. An internet denizen and Medium user going by Bedros Pamboukian about a week and a half ago posted an article called Gemini is coming to MakerSuite and so are Stubbs. Now, I came across this today in the excellent Ben's Bites newsletter, and it seems like this may be what's getting this article attention now. I will caveat this piece that it is at least a little bit "what's the source? I made it up." But one, I do think it seems at least a little bit

Starting point is 00:10:22 credible, and two, I'm going to situate it in the larger conversation about the battle for developers that I'm quite confident is important, even if the details contained in this article aren't fully accurate. So first off, let's talk about what is actually in this piece. The post is called Gemini is coming to MakerSuite and so are Stubbs. So there are two big pieces of this and we'll take them in turn. Gemini is, of course, Google's more advanced forthcoming model. Many are expecting it to be the first model to meet or exceed GPT-4, and frankly, if it doesn't, Google is in some serious trouble. Now, to understand the context for this specific post, we have to look at another Google tool called MakerSuite. Google introduced it on September 26th in a blog post on the Google for Developers site.

Starting point is 00:11:06 The company writes, we're always on the lookout for tools and technologies that bring innovative solutions to our developer community. MakerSuite is a fast, easy way to start building generative AI apps. It provides an efficient UI for prompting some of Google's latest models and easily translates prompts into production-ready code you can integrate into your applications. Today, we've removed the wait list so anyone in 179 countries and territories can use MakerSuite. Okay, so basically this is exactly what it sounds like. It is a platform for building generative AI apps using Google's AI tools. It's only been widely available for about a month now.

Starting point is 00:11:37 Now, one thing that is notable is that when it was announced it was clearly just text-based. This was not multimodal. The three types of prompts that they talk about in this blog are text prompts, which they say provide a flexible and free-form experience that allows you to express yourself creatively through prompts. Data prompts, which they say are the go-to choice when you have examples to help you specify precisely what you want from the model. These being good for applications that require consistent input and output,

Starting point is 00:12:00 such as data generation and translation, and finally, chat prompts for building conversational experiences. So one part of the thrust of Bedros's piece is that multimodal Gemini is coming to this MakerSuite platform. Some of the leaked screenshots seem to show how the new Gemini-powered MakerSuite model can handle things like text recognition, object recognition, captioning, understanding image inputs, and more. They show a screenshot of run settings that allows the user to select between a text or multimodal model, as well as a checkbox to be able to include images as an output type. One of the contentions that the author makes is that Gemini is not just an addition to Bard, but is its own entirely separate model.

Starting point is 00:12:38 Another screenshot appears to show an integration with Google Drive, from which users can grab images. Another screenshot shows a user testing their prompt with images. And what's more, remember we talked about how there were text prompts, data prompts, and chat prompts in the first version of MakerSuite. Another screenshot appears to show that data prompts also support multimodality. A reminder menu in the screenshot says, images to your prompt. Try tasks like captioning and image understanding. And finally, the author

Starting point is 00:13:02 shows a snippet of the code, which clearly has the identification Gemini in it. Now, while the author assumed that the Gemini leak here would overshadow the other part, let's talk a little bit about this idea of Stubbs. Stubbs are basically a feature by which users can create generative AI apps that live directly in a site. The author describes it as akin to AI-generated Figma prototypes. Note, apparently these Stubbs do not generate full code. It really is just the Figma-style prototype. However, one interesting feature is that you can see a public view of other people's Stubbs that have been made public and can even save and remix them. This is through a community gallery feature. Now, the excitement here is, of course, that this could unlock a lot of creativity.

Starting point is 00:13:42 It could radically increase the speed with which people are able to prototype new ideas, and just generally contributes to the momentum of how AI is unlocking, not just developer creativity, but the ability of a wider array of people to build new ideas and applications. Now, the reason that I think that this matters is that it harkens to one of the most important battles in the AI arms race, which is the battle for the affinity and affiliation of AI developers. This has been a key focus all year. Probably my most referenced article of the entire history of this podcast is that internal Google memo that was leaked in May called We Have No Moat, and Neither Does OpenAI. The thrust of that was, of course,

Starting point is 00:14:20 that the rise of open source AI developers, particularly in the Meta Llama ecosystem, had totally changed the nature of competition. It wasn't just a Google versus OpenAI battle anymore, but a Google versus OpenAI versus a legion of indie hackers who are actually making some surprising and serious progress. The author of that memo said, The uncomfortable truth is we aren't positioned to win this arms race and neither is OpenAI. While we've been squabbling, a third faction has been quietly eating our lunch. I'm talking, of course, about open source. Plainly put, they are lapping us. Things we consider major open problems are solved and in people's hands today.

Starting point is 00:14:54 Now, of course, Meta's momentum in that space started in some ways when their full Llama model was leaked, but was really extended when they announced Llama 2, which came officially with a commercial option. Since then, you've seen the big AI labs spend even more time on developer recruiting. In addition to the Google tools that we just talked about, the ones that have been publicly announced in terms of MakerSuite, as well as the ones that seem like they're coming with Stubbs and this Gemini integration, OpenAI is also trying to clearly win

Starting point is 00:15:20 and or retain the affiliation of developers with their OpenAI Dev Day, which is coming up on November 6th. Now, when they announced Dev Day, Sam Altman went to pains to make sure that people didn't think that we were going to get GPT-4.5 or GPT-5, but that people would still be really excited. Subsequent to the announcement, speculation has fallen in a few key areas. One has to do with lower cost options. One of the biggest barriers for people building in the OpenAI ecosystem is the incredibly high cost of API access, especially relative to some of the open source options that are out there. A second area of speculation is around a fundamentally new tool set, specifically for AI agents, which of course has had a ton of developer energy and attention

Starting point is 00:15:58 throughout the year. We are just a couple weeks away from that event, so I expect to see a lot more leaks of information about what might be coming in not too long. Now, zooming farther out, there's also been a lot of chatter recently about Apple's plans around generative AI, and while we don't have anything conclusive, it does sound like part of the effort is around developer recruitment as well. From a Bloomberg article, Apple's software engineering teams are also looking at integrating generative AI into development tools like Xcode, a move that could help app developers write new applications more quickly. That would bring it in line with services like Microsoft's GitHub Copilot, which offers auto-complete suggestions to developers while they write code. Now, of course,

Starting point is 00:16:32 it's not just the big guys that are going after developers, but also other more independent players who are trying to reimagine the developer experience for the AI age. Replit recently announced the expansion of its open source AI developer tools to all of its users. Over the course of the last year, they've been rolling out generative AI tools, including the Ghostwriter AI code completion tool, but up until two weeks ago, that had been limited to a testing group of users. As of October 9th, however, Replit integrated Ghostwriter into their core platform and made that tool available to all of its users, an effort they called AI for All. At the same announcement, they also shared a new version of their custom-built LLM focused on coding. Now, interestingly,

Starting point is 00:17:09 that Replit announcement happened at the AI Engineer Summit in San Francisco a couple of weeks ago. This was an event brought to you by the folks over at the Latent Space podcast and was just a huge success with a ton of energy and excitement around it. So much so that they actually announced another event, the AI Engineer World's Fair, next year. The point of all this is that one of the big vectors of competition in any new technology space, but especially in artificial intelligence, is going to be what ecosystems developers build in and what tools and models they build on top of. This is hugely important to the long-term trajectories of the companies that are competing in this space, and so it's not surprising to see so much effort being put into cultivating relationships with developers across all of these companies. These Google MakerSuite leaks certainly suggest that it's a priority for Google to keep innovating

Starting point is 00:17:55 in this area, and so of course it is something that we will keep an eye on. Like I said, take it with a grain of salt that all of this is leaked information, which A could be wrong and B could be subject to change, but still an interesting little insight into where the space is. That, however, is going to do it for today's AI breakdown. I appreciate you listening or watching as always. Until next time, peace.

