The AI Daily Brief: Artificial Intelligence News and Analysis - LEAKED: First Look at Gemini Multimodal?
Episode Date: October 24, 2023A Medium user has posted a set of screenshots that purport to show Google's forthcoming Gemini model in their recently opened Makersuite AI builder tool. NLW explores the leaks along with the broader ...competition for AI developers. Before that on the Brief, a new tool that allows artists to "poison" their digital images to mess with AI training models. Today's Sponsors: Listen to the chart-topping podcast 'web3 with a16z crypto' wherever you get your podcasts or here: https://link.chtbl.com/xz5kFVEK?sid=AIBreakdown ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/
Transcript
Discussion (0)
Today on the AI Breakdown, we're looking at Google Gemini leaks as well as an unannounced
AI developer tool. Before that on the brief, a new tool that artists can use to poison their
images before they get to AI training models. The AI breakdown is a daily podcast and video about
the most important news and discussions in AI. Go to Breakdown.network for more information about
our YouTube channel, our Discord, and our newsletter. Welcome back to the AI breakdown brief.
all the AI headline news you need in around five minutes. And boy, do we have an interesting one
to start with today. In fact, one of the big themes across this entire brief is the realignment
of the internet and internet contributors in the context of massive AI companies crawling and in many
people's minds at least stealing their data. Now, for some, the remediation here is policy, right?
They are waiting for people in Washington, politicians and policymakers to tell the AI companies
what they can and can't do. To be fair to those AI companies, they're also increasingly allowing
people to opt out. Apparently something like 535 big publishers are now blocking open AI from
scraping their data to train their future models. But the MIT Technology Review wrote about an
even more dramatic step that is based in technology itself that many folks who have been up in
arms about AI are quite excited about. The tool in question is called Nightshade, and it's being
referred to as a data poisoning tool. Now, if nothing else, Nightshade is an excellent
an evocative name. But basically what it does is it allows people who are uploading images,
particularly artists who are uploading images of their art, to change pixels in such a way that it
confuses the data that an AI model would get. These changes are invisible to the naked eye,
but they can, as the MIT Technology Review puts it, cause the resulting model to break in chaotic
and unpredictable ways. Effectively, these hidden pixels tell the AI models that are being
trained on them that things in the images are not what they actually are. Again, as this report
describes it, dogs become cats, cars become cows, and so forth. Now, this project is being led
by researchers from the University of Chicago led by Ben Zhao. The team involved say they view it as a
power balancing tool. And effectively, they're trying to create an incentive for AI companies
to come back to the table and actually find some way to compensate people for the data their
models are trained on. Now, this is not the first project from this team at the University of Chicago
that plays in a similar space. They also developed a tool called Glaze, which works in a similar way.
It allows artists who mask their personal style so that the AI model thinks it looks different than it actually does.
For those of you listening, I suggest you hop on over to the YouTube, which you can find at YouTube.com slash at the AI breakdown.
There's an image they show of how these tools work in practice.
For Nightshade, they show a variety of items like dog, car, handbag, hat, fantasy art, cubism, cartoon, concept art,
that decay in serious ways once these poison samples are introduced.
By the time 300 poison samples are introduced into the model,
a dog becomes a cat, a car becomes a cow, a handbag becomes a toaster, a hat becomes a cake,
the fantasy art style becomes pointillism, cubism becomes anime, cartoon becomes impressionism,
and concept art becomes abstract. They also give an example when glaze is used. Basically,
the pixels tell the AI model that the art being imitated is actually of a very different style
than it is. So for example, it might encode the idea that it's an oil painting by Vincent Van Gogh,
when actually it's just a sepia photo realistic photo. Now, when it comes to the question of whether this
is the right way to fight against these models or to try to rejust balance? I'm not really sure.
Obviously, having an AI-focused show, you can guess that to some extent I'm balanced on the way
that AI opens up new creative pathways. At the same time, I do think that this is an existential
struggle for a lot of artists, and that kind of makes any and all tactics totally reasonable.
The big AI labs certainly have a bigger balance of resources, but regardless, it is going to be
an interesting fight to watch. I do think it's reflective of an assumption that perhaps policy
isn't really going to be the right path. And indeed, there's still every chance that courts rule
that training AI models is a version of fair use and does not trigger copyright rules,
in which case artists will definitely need something like this if they're trying to prevent
their works from being trained upon. Certainly there is much enthusiasm on Twitter as this
article went hyper-viral. Now, speaking of companies who are taking a strong stand against AI training,
Reddit is apparently in deep discussions with the big AI labs about being competent.
for training on Reddit's vast trove of conversations and information. Now, of course, earlier this
year, they changed AP access pricing, which was a hugely polarizing move. Many third-party developers
who didn't have anything to do with AI were caught up in the changes, which of course led many
in the community to be extremely frustrated. At the same time, Reddit basically said that they had to do
this to prevent big generative AI companies from using their data to train LLMs. Citing an anonymous
source, the Washington Post said that if these conversations with these top generative AI companies don't go
well, that Reddit is potentially ready to take some dramatic actions. From the Washington Post,
if a deal can't be reached, Reddit is considering blocking search crawlers from Google and Bing,
which would prevent the forum site from being discovered in searches and reduce the number of
visitors to the site. But the company believes the trade-off would be worth it, saying,
Reddit can survive without search. Now, part of why Reddit may be emboldened, because earlier this
year, thousands of subreddits went dark, which apparently led to discontent among Google search users,
A leaked audio recording of an internal company meeting saw a Google SVP saying that Google users were
unhappy about not having access to Reddit through the search site.
Now, on the flip side, similar web suggests that around 49% of Reddit's traffic comes from
search engines, which means that if they did take this move, it would be extremely dramatic
and have significant impact on their usage and ultimately probably their bottom line as well.
If anything, this just dramatizes how high companies see the stakes around this question of AI training.
Now, moving over to the big tech side of the AI world, Microsoft has made a major announcement
that they're making their biggest investment in Australia in 40 years. The company is going to invest
around $5 billion over the next couple years to boost AI in the country. The biggest part of that
will be a 45% increase in Microsoft-owned data centers in the country growing from 20 to 29. But then
on top of that, they will also be establishing a Microsoft Data Center Academy and also collaborating
on a cybersecurity initiative. The announcement was made as part of the Australian
Prime Minister's visit to the U.S. this week. Now, moving over to Apple again, obviously if you've
been following along here, Apple's AI strategy has been much discussed in the news lately, not that they've
made any announcements, but there have been a lot of reports and analyses from people who are
watching, for example, their supply chains that suggest an approach to AI coming into view.
John Gruber, who writes the Daring Fireball blog, which is one of the best known Apple trackers in the
world, wrote about that report from Bloomberg that we talked about yesterday on Apple's AI
strategy. I think Mark German's summary does get to an essential truth. If I asked you which
companies are at the forefront of AI powered products, I doubt you'd put Apple on the list. And
AI is proving so useful and yet it is a nascent field that Apple needs to soon be on the list,
lest their products begin to fall behind competitively. Which companies are best at integrating
AI into products is going to be like which companies are best at creating hardware at scale,
and which companies are best at human interface design. Now, he also commented on German's report
that a person of knowledge inside the company said, there's a lot of anxiety about
this and it's considered a pretty big miss internally. Gruber writes, what I have heard from
Little Birdies and Cupertino is not that there was a miss on this already. Apple is almost never at the
forefront of stuff like this. They're a deliberate company. Their goal, as with any new technology,
is to integrate it into products in meaningful ways best, not first. That's why there is no internal
anxiety that they've already missed anything related to AI. The anxiety inside Apple is that many people
inside do not believe Apple's own AIML team can deliver. And but that the company, if only for
privacy reasons is only going to use what comes from their own AIML team. So basically saying this
a little bit differently, the report that Gruber is getting is that there is concern and anxiety
around AI strategy, but not because Apple is already behind, but because they don't have faith in
who's working on the problem internally to actually deliver something great. Interesting little
wrinkle and twist. Meanwhile, as a sign of how serious this is, Apple analyst Ming Chai Quo has written
a widely circulated report that they expect Apple to spend up to 4.75 billion on AI servers in
24. Now, in terms of where these numbers are coming from, it's a little bit hard to ascertain.
I kind of feel like it's that meme of what's the source I made it up, but it's a respected
analyst and lots of people talking about it. And I think that even if the exact numbers are wrong,
the fact that these reports keep making news suggests just how much people are paying attention
to what the Cooper Tino Giant is going to do next.
Anyways, friends, that is going to do it for today's AI breakdown brief.
Next up, the main AI breakdown.
Are you interested in how two top-of-mind trends AI and crypto can work together?
If so, I have the perfect podcast recommendation for you.
Web 3 with A16Z Crypto, the chart-topping show brought to you by venture firm Andresen Horowitz.
Web 3 with A16Z Crypto is your definitive resource for the future of the internet.
Whether you're already building in these spaces or simply curious about what's next.
If you need a place to start, they recently released an excellent episode with Stanford
Cryptography Professor Dan Bonay and former Google Xer Aliya in conversation with host Sonal Choxi
about the intersection of AI and crypto.
From fighting deepfakes and proving humanity to large language models like Chatchipit, they cover it all.
I highly recommend checking it out, especially if you'd like to learn more about how
AI and crypto will impact our everyday lives.
Beyond Crypto and AI, this show is for creators seeking more ways to truly own their work,
for business leaders trying to prepare for the future today and for innovators exploring trending
tech topics. So go ahead, listen to Web3 with A16Z Crypto wherever you get your podcasts.
Welcome back to the AI breakdown. Today we are wandering into the realm of speculation, but I think it's
worthwhile. An internet denizen and medium user going by Bedros Pambukian, about a week and a half ago
posted an article called Gemini is coming to MakerSuite and so are.
Stubbs. Now, I came across this today in the excellent Ben Spites newsletter, and it seems like
this may be what's getting this article attention now. I will caveat this piece that it is at least
a little bit, what's the source I made it up? But one, I do think it seems at least a little bit
credible, and two, I'm going to situate it in the larger conversation about the battle for
developers that I'm quite confident is important, even if the details contained in this article
aren't fully accurate. So first off, let's talk about what is actually in this piece. The post is called
Gemini is coming to MakerSuite and so are Stubs. So there are two big pieces of this and we'll take them in turn.
Gemini is, of course, Google's more advanced forthcoming model. Many are expecting it to be the
first model to meet or exceed GPT4, and frankly, if it doesn't, Google is in some serious trouble.
Now, to understand the context for this specific post, we have to look at another Google tool called
MakerSuite. Google introduced it on September 26th on a blog post on the Google for developer's site.
The company writes, we're always on the lookout for tools and technologies that bring innovative
solutions to our developer community. MakerSuite is a fast, easy way to start building
generative AI apps. It provides an efficient UI for prompting some of Google's latest models
and easily translates prompts into production-ready code you can integrate into your applications.
Today, we've removed the wait list so anyone in 179 countries and territories can use MakerSuite.
Okay, so basically this is exactly what it sounds like.
It is a platform for building generative AI apps using Google's AI tools.
It's only been widely available for about a month now.
Now, one thing that is notable is that when it was announced it was clearly just text-based.
This was not multimodal.
The three times of prompts that they talk about in this blog are text prompts,
which they say provide a flexible and free-form experience that allows you to express yourself creatively
through prompts.
Data prompts, which they say are the go-to choice when you have examples to help you specify
precisely what you want from the model.
These being good for applications that require consistent input and output,
such as data generation and translation, and finally, chat prompts for building conversational experiences.
So one part of the thrust of Beidros' piece is that multimodal Gemini is coming to this MakerSuite
platform. Some of the league screenshots seem to show how the new Gemini Powered MakerSuite model
can handle things like text recognition, object recognition, captioning, understanding image inputs, and more.
They show a screenshot of run settings that allows the user to select between a text or multimodal model,
as well as a checkbox to be able to include images as an output type.
One of the contentions that the author makes is that Gemini is not just in addition to Bard,
but is its own entirely separate model.
Another screenshot appears to show an integration with Google Drive,
from which users can grab images,
another screenshot shows their user testing their prompt with images,
and what's more, remember we talked about how there were text prompts,
data prompts, and chat prompts in the first version of MakerSuite.
Another screenshot appears to show that data prompts also support multimodality.
A reminder menu in the screenshot says,
images to your prompt. Try tasks like captioning and image understanding. And finally, the author
shows a snippet of the code, which clearly has the identification Gemini in it. Now, while the author
assumed that the Gemini leak here would overshadow the other part, let's talk a little bit about
this idea of Stubbs. Stubs are basically a feature by which users can create generative AI apps
that live directly in a site. The author describes it as akin to AI generated Figma prototypes.
Note, apparently these stubs do not generate full code. It really is just the Figma-style prototype.
However, one interesting feature is that you can see a public view of other people's stubs
that have been made public and can even save and remix them. This is through a community gallery
feature. Now, the excitement here is, of course, that this could unlock a lot of creativity.
It could radically increase the speed with which people are able to prototype new ideas,
and just generally contributes to the momentum of how AI is unlocking, not just developer
creativity, but the ability of a wider array of people to build new ideas and applications.
Now, the reason that I think that this matters is that it harkens to one of the most important
battles in the AI arms race, which is the battle for the affinity and affiliation of
AI developers. This has been a key focus all year. Probably my most referenced article of the
entire history of this podcast is that internal Google memo that was leaked in May called
We Have No Mote and Neither Does Open AI. The thrust of that was, of course,
that the rise of open source AI developers, particularly in the Meta-Lama ecosystem,
had totally changed the nature of competition. It wasn't just a Google versus OpenAI battle anymore,
but a Google versus OpenAI versus a legion of indie hackers who are actually making some
surprising and serious progress. The author of that memo said,
The uncomfortable truth is we aren't positioned to win this arms race and neither is OpenAI.
While we've been squabbling a third faction has been quietly eating our lunch.
I'm talking, of course, about open source. Plainly put, they are lapping us.
Things we consider major open problems are solved and in people's hands today.
Now, of course, meta's momentum in that space started in some ways when their full Lama model
was leaked, but was really extended when they announced Lama 2, which came officially with a
commercial option.
Since then, you've seen the big AI lab spend even more time on develop recording.
In addition to the Google tools that we just talked about, the ones that have been
publicly announced in terms of MakerSuite, as well as the ones that seem like they're
coming.
With Stubbs in this Gemini integration, OpenAI is also trying to clearly win
and or retain the affiliation of developers with their OpenAI Dev Day, which is coming up on
November 6th. Now, when they announced Dev Day, Sam Altman went to pains to make sure that people
didn't think that we were going to get GPT4.5 or GPT5, but the people would still be really excited.
Subsequent to the announcement, speculation has fallen in a few key areas. One has to do with
lower cost options. One of the biggest barriers for people building in the OpenAI ecosystem is
the incredibly high cost of API access, especially relative to some of the open source options.
that are out there. A second area of speculation is around a fundamentally new tool set,
specifically for AI agents, which of course has had a ton of developer energy and attention
throughout the year. We are just a couple weeks away from that event, so I expect to see a lot
more leaks of information about what might be coming in not too long. Now, zooming farther out,
there's also been a lot of chatter recently about Apple's plans around generative AI, and while we
don't have anything conclusively, it does sound like part of the effort is around developer
recruitment as well. From a Bloomberg article, Apple's software engineering teams are also looking
at integrating generative AI into development tools like Xcode, a move that could help app developers
write new applications more quickly. That would bring it in line with services like Microsoft's GitHub
copilot, which offers auto-complete suggestions to developers while they write code. Now, of course,
it's not just the big guys that are going after developers, but also other more independent
players who are trying to reimagine the developer experience for the AI age. Replit recently
announced the expansion of its open source AI developer tools to all of its users. Over the course
at the last year, they've been rolling out generative AI tools, including the Ghostwriter
AI code completion tool, but up until two weeks ago, that had been limited to a testing
group of users. As of October 9th, however, Replit integrated Ghostwriter into their core platform
and made that tool available to all of its users, the effort they called AI for all. At the same
announcement, they also shared a new version of its custom-built LLM focused on coding. Now, interestingly,
that Replit announcement happened at the AI Engineer Summit in San Francisco a couple of weeks ago.
This was an event brought to you by the folks over at the Latenspace podcast and was just a huge success with a ton of energy and excitement around it.
So much so that they actually announced another event, the AI Engineer World's Fair, next year.
The point of all this is that one of the big vectors of competition in any new technology space, but especially in artificial intelligence, is going to be what ecosystems developers build in and what tools and models they build on top of.
This is hugely important to the long-term trajectories that the companies that are competing in this,
space, and so it's not surprising to see so much effort being put into cultivating relationships
with developers across all of these companies.
These Google MakerSuite leaks certainly suggest that it's a priority for Google to keep innovating
in this area, and so of course it is something that we will keep an eye on.
Like I said, take it with a grain of salt that all of this is leaked information, which
A could be wrong and B could be subject to change, but still an interesting little insight into
where the space is.
That, however, is going to do it for today's AI breakdown.
I appreciate you listening or watching as always.
Until next time, peace.
