The AI Daily Brief: Artificial Intelligence News and Analysis - Is Open Source AI Dangerous?
Episode Date: July 23, 2023
Last week, Meta announced its Llama 2 model, one of the most powerful open source LLMs yet. Today on The AI Breakdown, NLW explores arguments that releasing powerful open source AI is dangerous, along with counterpoints including a reading of a recent Op-Ed from Meta head of global affairs Nick Clegg. ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/
Transcript
Today on The AI Breakdown, we're asking whether open source AI is dangerous.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to Breakdown.network for more information about our Discord, newsletter, and YouTube.
Hello, friends, welcome back to a long reads episode of The AI Breakdown.
This week has been kind of interesting when it comes to open source AI and AI safety.
And there are a couple reasons for that.
When it comes to open source specifically, the company that has put the most emphasis on open sourcing its AI developments, which is of course Meta, released their Llama 2 model this week.
It is in many ways the most important and biggest open source AI release so far.
And if you go back and listen to my episode about that Llama 2 release, you'll hear the degree to which the last few months of development in the AI space have been shaped by the original release of Llama 1 and the subsequent leak of the full model.
Given that, we're going to read an essay from a Meta team member about why openness is the right option.
At the same time, the other thing that happened this week is that a set of companies were slated to make a very public pronouncement around voluntary principles regarding AI safety.
It feels then a worthwhile time to ask how open source development relates to safety.
Does open source development create a bulwark against concentration in the hands of a few hyper-powerful corporations?
Or do open source models make it easier for powerful AI to get into the hands of the bad guys,
however we choose to define them?
Can both be true simultaneously, and if they are, what do we do about it?
Now, Meta's flag in the ground around this issue has really shaped the discourse around it for the last several months.
On May 18th, the New York Times published an article called,
In Battle Over AI, Meta Decides to Give Away Its Crown Jewels.
The tech giant has publicly released its latest AI technology so people can build their own chatbots.
Rivals like Google say that approach can be dangerous.
From that piece, quote,
Google, OpenAI, and others have been critical of Meta,
saying an unfettered open source approach is dangerous.
AI's rapid rise in recent months has raised alarm bells about the technology's risks,
including how it could upend the job market if not properly deployed.
And within days of Llama's release, the system leaked onto 4chan,
the online message board known for spreading false and misleading information.
Now, while this article did portray Google and OpenAI's perspective on this,
it also did point out that open source AI might be a competitive threat as well.
They pointed to the same leaked internal Google memo that we've read so many times on this show.
They argued that the real threat to Google was not OpenAI,
but was in fact open source software being built in large part in the Meta ecosystem.
Now, one of the things that some have pointed out is that Meta's emphasis on open source
as compared to OpenAI's relative closedness makes the very name OpenAI a little bit ironic.
In an interview around the launch of GPT-4, Sam Altman said,
A thing that I do worry about is we're not going to be the only creator of this technology.
There will be other people who don't put some of the safety limits that we put on it.
Society, I think, has a limited amount of time to figure out how to react to that, how to regulate that, how to handle it.
Another piece this week after the launch of Lama 2 appeared in Fortune and was titled,
Mark Zuckerberg just made Meta's AI models open source.
OpenAI used to do that until backtracking because it was, quote, just not wise.
That piece referenced an interview that OpenAI chief scientist and co-founder Ilya
Sutskever gave with The Verge just after the release of GPT-4 as well.
In that interview, Ilya said,
These models are very potent, and they're becoming more and more potent.
At some point, it will be quite easy if one wanted to cause a great deal of harm with those models.
And as the capabilities get higher, it makes sense that you don't want to disclose them.
The Verge article wrote, when asked why OpenAI changed its approach to sharing its research,
Sutskever replied simply,
We were wrong. Flat out, we were wrong.
If you believe, as we do, that at some point AI, AGI, is going to be extremely, unbelievably potent,
then it just does not make sense to open source. It is a bad idea. I fully expect that in a few years,
it's going to be completely obvious to everyone that open sourcing AI is just not wise.
Now, in his announcement of Llama 2, Mark Zuckerberg touched on the safety aspect of open sourcing the software only briefly.
He wrote, open source drives innovation because it enables many more developers to build with new technology.
It also improves safety and security because when software is open, more people can scrutinize it to identify and fix potential issues.
I believe it would unlock more progress if the ecosystem were more open, which is why we're
open sourcing Llama 2.
However, an even more full-throated articulation of their opinion came from an op-ed by Nick Clegg
in the Financial Times.
Clegg is the president of global affairs at Meta and was a former high-ranking UK cabinet
official.
Nick's piece was titled, Openness on AI is the Way Forward for Tech.
The case for transparency is growing as the best way to combat fears on the developing technology.
The piece reads,
Underlying much of the excitement and trepidation about advances in generative artificial intelligence
lurks a fundamental question. Who will control these technologies? The big tech companies that have the
vast computing power and data to build new AI models, or society at large? This goes to the heart
of a policy debate about whether companies should keep their AI models in-house or make them available
more openly. As the debate rumbles on, the case for openness has grown. This is in part because of
practicality. It's not sustainable to keep foundational technology in the hands of just a few
large corporations. And in part, because of the record of open sourcing. It's important to distinguish
between today's AI models and potential future models. The most dystopian warnings about AI are really
about a technological leap, or several leaps. There's a world of difference between the chatbot-style
application of today's large language models and the supersized frontier models theoretically capable
of sci-fi-style superintelligence. We're still in the foothills debating the perils we might find
at the mountaintop. If and when these advances become more plausible, they may necessitate a different
response. But there's time for both the technology and the guardrails to develop. Like all
foundational technologies, from radio transmitters to internet operating systems, there will be a
multitude of uses for AI models, some predictable and some not. And like every technology,
AI will be used for both good and bad ends by both good and bad people. The response to that
uncertainty cannot simply rest on the hope that AI models will be kept secret. That horse has
already bolted. Many large language models have already been open sourced, like Falcon-40B,
MPT-30B and dozens before them. And open innovation isn't something to be feared. The infrastructure
of the internet runs on open source code, as do web browsers and many of the apps we use every day.
While we can't eliminate the risks around AI, we can mitigate them. Here are four steps I believe
tech companies should take. First, they should be transparent about how their systems work.
At Meta, we have recently released 22 system cards for Facebook and Instagram, which give people
insight into the AI behind how content is ranked and recommended in a way that does not require deep
technical knowledge. Second, this openness should be accompanied by collaboration across industry,
government, academia, and civil society. Meta is a founding member of Partnership on AI,
alongside Amazon, Google, DeepMind, Microsoft, and IBM. We are participating in its framework for
collective action on synthetic media, an important step in ensuring guardrails are established
around AI-generated content. Third, AI systems should be stress-tested. Ahead of releasing the next
generation of Llama, our large language model, Meta is undertaking red-teaming. This process, common in cybersecurity,
involves teams taking on the role of adversaries to hunt for flaws and unintended consequences.
Meta will be submitting our latest Llama models to the DEF CON conference in Las Vegas next month,
where experts can further analyze and stress test their capabilities.
A mistaken assumption is that releasing source code or model weights makes systems more vulnerable.
On the contrary, external developers and researchers can identify problems that would take teams
holed up inside company silos much longer.
Researchers testing Meta's large language model BlenderBot 2 found it could be tricked into remembering misinformation.
As a result, BlenderBot 3 was more resistant to it. Finally, companies should share details of their work as it develops, be it through academic papers and public announcements, open discussions of the benefits and risks, or, if appropriate, making the technology itself available for research and product development.
Openness isn't altruism. Meta believes it's in its interest. It leads to better products, faster innovation, and a flourishing market, which benefits us as it does many others. And it doesn't mean every model can or should be open sourced. There's a role for both proprietary and open AI
models. But ultimately, openness is the best antidote to the fears surrounding AI. It allows for
collaboration, scrutiny, and iteration. And it gives businesses, startups, and researchers access
to tools they could never build themselves, backed by computing power they can't otherwise access,
opening up a world of social and economic opportunities. So a couple things. One, Clegg's piece is
more about what's good about being open than about mitigating the risks of being open, right?
Effectively, he punts on the risks of being open, relegating them to something for
bigger future models. Now, this is something that others have talked about as well. In his congressional
testimony, Sam Altman made sure to differentiate between smaller, lower-powered open-source models
and higher-powered foundational models above a certain capabilities threshold, which were the ones
in question when he said that OpenAI would support a licensing regime. Where those lines get drawn
feels extremely important, however. And the other thing that this piece doesn't totally address
is the extent to which bad actors can use tools even of current capabilities. Right now, there
is growing conversation about something called WormGPT. The AI Notkilleveryoneism
Memes account on Twitter, which is @AISafetyMemes, aggregated a number of different quotes
from sources including The Independent and SlashNext about what was going on with this tool.
They write, days after Meta open sources Llama 2, we have WormGPT, an AI tool taking off across
cybercrime forums on the dark web. A quote from one of the articles,
We instructed WormGPT to generate an email to pressure an unsuspecting account manager into paying a
fraudulent invoice. The results were unsettling. WormGPT produced an email that was not only
remarkably persuasive but also strategically cunning, showcasing its potential for sophisticated
phishing and BEC, or business email compromise, attacks. Another quote comes from security researcher
Daniel Kelley. Daniel said, this tool presents itself as a black hat alternative to GPT models,
designed specifically for malicious activities. What does that mean for the rest of us? Essentially,
it boils down to the speed and number of scams a language model can generate at once, which is
obviously worrying when you consider how fast language models can generate text. This makes cyber
attacks such as phishing emails particularly easy to replicate when put in the hands of even a novice
cybercriminal. The use of generative AI democratizes the execution of sophisticated BEC attacks.
Even attackers with limited skills can use this technology, making it an accessible tool for a broader
spectrum of cyber criminals. Now, one of the things that I'm struck by personally is that this
is a conversation that is extraordinarily difficult in the abstract and really does demand
specificity. For example, I think that there are probably many people who frankly would deal with
WormGPT for the upside of ChatGPT. However, the question is where that line changes. What if it's not
a phishing attack trying to get money, but a biological attack that's actually meant to hurt people?
This is another example that's a favorite of some people to use. And then, of course,
there's the question around national security and national competitiveness. Specifically,
how compatible is the goal to, quote-unquote, stay ahead of China with the mechanism of releasing
advanced, sophisticated open source models? I find myself extremely worried about concentrations of power,
and in general, very natively on the side of thinking that open source is an important
counterweight and bulwark to that concentration of power. I think that's particularly true in the
context of a technology that is so data-hungry that it has a natural tendency to reward those
who already have resources. And so because of that natural disposition, what I find myself
trying to do, rather than just leaving this a dialectic between open source or not,
is to try to understand where the real lines are for me personally and where I think they should
be for society. And on top of that, I guess, trying to assess how much we have the ability to
actually control getting up to that line without going over it. In other words, if we stopped right
now, how much farther would the bads, however we define them, be able to take the things they
already have access to? I don't know the answers to these questions, but I think it's a conversation
we should be having, and I'm glad it's starting to get a little bit louder.
For now, guys, you know, I want to know what you think.
This is a perfect use for the Discord.
I'm going to create a special thread for exactly this question.
Come check it out.
It's at bit.ly/aibreakdown.
And let's together see if we can figure it out.
For now, thanks as always for listening or watching.
And until next time, peace.
