The AI Daily Brief: Artificial Intelligence News and Analysis - Is Open Source AI Dangerous?

Starting point is 00:00:00 Today on the AI Breakdown, we're asking whether open source AI is dangerous. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information about our Discord, newsletter, and YouTube. Hello, friends, welcome back to a long reads episode of the AI Breakdown. This week has been kind of interesting when it comes to open source AI and AI safety. And there are a couple reasons for that. When it comes to open source specifically, the company that has put the most emphasis on open sourcing its AI developments, which is of course Meta, released their Llama 2 model this week. It is in many ways the most important and biggest open source AI release so far.

Starting point is 00:00:46 And if you go back and listen to my episode about that Llama 2 release, you'll hear the degree to which the last few months of development in the AI space have been shaped by the original release of Llama 1 and the subsequent leak of the full model. Given that, we're going to read an essay from a Meta team member about why openness is the right option. At the same time, the other thing that happened this week is that a set of companies were slated to make a very public pronouncement around voluntary principles regarding AI safety. It feels then a worthwhile time to ask how open source development relates to safety. Does open source development create a bulwark against concentration in the hands of a few hyper-powerful corporations? Or do open source models make it easier for powerful AI to get into the hands of the bad guys, however we choose to define them? Can both be true simultaneously, and if they are, what do we do about it?

Starting point is 00:01:37 Now, Meta's flag in the ground around this issue has really shaped the discourse around it for the last several months. On May 18th, the New York Times published an article called, In Battle Over AI, Meta Decides to Give Away Its Crown Jewels. The tech giant has publicly released its latest AI technology so people can build their own chatbots. Rivals like Google say that approach can be dangerous. From that piece, quote, Google, OpenAI, and others have been critical of Meta, saying an unfettered open source approach is dangerous.

Starting point is 00:02:05 AI's rapid rise in recent months has raised alarm bells about the technology's risks, including how it could upend the job market if not properly deployed. And within days of Llama's release, the system leaked onto 4chan, the online message board known for spreading false and misleading information. Now, while this article did portray Google and OpenAI's perspective on this, it also did point out that open source AI might be a competitive threat as well. They pointed to the same leaked internal Google memo that we've read so many times on this show. They argued that the real threat to Google was not OpenAI,

Starting point is 00:02:34 but was in fact open source software being built in large part in the Meta ecosystem. Now, one of the things that some have pointed out is that Meta's emphasis on open source as compared to OpenAI's relative closedness makes the very name OpenAI a little bit ironic. In an interview around the launch of GPT-4, Sam Altman said, A thing that I do worry about is we're not going to be the only creator of this technology. There will be other people who don't put some of the safety limits that we put on it. Society, I think, has a limited amount of time to figure out how to react to that, how to regulate that, how to handle it. Another piece this week after the launch of Llama 2 appeared in Fortune and was titled,

Starting point is 00:03:09 Mark Zuckerberg just made Meta's AI models open source. OpenAI used to do that until backtracking because it was, quote, just not wise. That piece referenced an interview that OpenAI chief scientist and co-founder Ilya Sutskever gave with The Verge just after the release of GPT-4 as well. In that interview, Ilya said, These models are very potent, and they're becoming more and more potent. At some point, it will be quite easy if one wanted to cause a great deal of harm with those models. And as the capabilities get higher, it makes sense that you don't want to disclose them.

Starting point is 00:03:39 The Verge article wrote, when asked why OpenAI changed its approach to sharing its research, Sutskever replied simply, We were wrong. Flat out, we were wrong. If you believe, as we do, that at some point AI, AGI, is going to be extremely, unbelievably potent, then it just does not make sense to open source. It is a bad idea. I fully expect that in a few years, it's going to be completely obvious to everyone that open sourcing AI is just not wise. Now, in his announcement of Llama 2, Mark Zuckerberg touched on the safety aspect of open sourcing the software only briefly. He wrote, open source drives innovation because it enables many more developers to build with new technology.

Starting point is 00:04:15 It also improves safety and security because when software is open, more people can scrutinize it to identify and fix potential issues. I believe it would unlock more progress if the ecosystem were more open, which is why we're open sourcing Llama 2. However, an even more full-throated articulation of their opinion came from an op-ed by Nick Clegg in the Financial Times. Clegg is the president of global affairs at Meta and a former high-ranking UK cabinet official. Nick's piece was titled, Openness on AI Is the Way Forward for Tech.

Starting point is 00:04:42 The case for transparency is growing as the best way to combat fears on the developing technology. The piece reads, underlying much of the excitement and trepidation about advances in generative artificial intelligence lurks a fundamental question. Who will control these technologies? The big tech companies that have the vast computing power and data to build new AI models, or society at large? This goes to the heart of a policy debate about whether companies should keep their AI models in-house or make them available more openly. As the debate rumbles on, the case for openness has grown. This is in part because of practicality. It's not sustainable to keep foundational technology in the hands of just a few

Starting point is 00:05:17 large corporations. And in part, because of the record of open sourcing. It's important to distinguish between today's AI models and potential future models. The most dystopian warnings about AI are really about a technological leap, or several leaps. There's a world of difference between the chatbot-style application of today's large language models and the supersized frontier models theoretically capable of sci-fi-style superintelligence. We're still in the foothills debating the perils we might find at the mountaintop. If and when these advances become more plausible, they may necessitate a different response. But there's time for both the technology and the guardrails to develop. Like all foundational technologies, from radio transmitters to internet operating systems, there will be a

Starting point is 00:05:53 multitude of uses for AI models, some predictable and some not. And like every technology, AI will be used for both good and bad ends by both good and bad people. The response to that uncertainty cannot simply rest on the hope that AI models will be kept secret. That horse has already bolted. Many large language models have already been open sourced, like Falcon-40B, MPT-30B, and dozens before them. And open innovation isn't something to be feared. The infrastructure of the internet runs on open source code, as do web browsers and many of the apps we use every day. While we can't eliminate the risks around AI, we can mitigate them. Here are four steps I believe tech companies should take. First, they should be transparent about how their systems work.

Starting point is 00:06:31 At Meta, we have recently released 22 system cards for Facebook and Instagram, which give people insight into the AI behind how content is ranked and recommended in a way that does not require deep technical knowledge. Second, this openness should be accompanied by collaboration across industry, government, academia, and civil society. Meta is a founding member of Partnership on AI, alongside Amazon, Google, DeepMind, Microsoft, and IBM. We are participating in its framework for collective action on synthetic media, an important step in ensuring guardrails are established around AI-generated content. Third, AI systems should be stress-tested. Ahead of releasing the next generation of Llama, our large language model, Meta is undertaking red-teaming. This process, common in cybersecurity,

Starting point is 00:07:09 involves teams taking on the role of adversaries to hunt for flaws and unintended consequences. Meta will be submitting our latest Llama models to the DEF CON conference in Las Vegas next month, where experts can further analyze and stress test their capabilities. A mistaken assumption is that releasing source code or model weights makes systems more vulnerable. On the contrary, external developers and researchers can identify problems that would take teams holed up inside company silos much longer. Researchers testing Meta's large language model BlenderBot 2 found it could be tricked into remembering misinformation. As a result, BlenderBot 3 was more resistant to it. Finally, companies should share details of their work as it develops, be it through academic papers and public announcements, open discussions of the benefits and risks, or, if appropriate, making the technology itself available for research and product development.

Starting point is 00:07:53 Openness isn't altruism. Meta believes it's in its interest. It leads to better products, faster innovation, and a flourishing market, which benefits us as it does many others. And it doesn't mean every model can or should be open sourced. There's a role for both proprietary and open AI models. But ultimately, openness is the best antidote to the fears surrounding AI. It allows for collaboration, scrutiny, and iteration. And it gives businesses, startups, and researchers access to tools they could never build themselves, backed by computing power they can't otherwise access, opening up a world of social and economic opportunities. So a couple things. One, Clegg's piece is more about what's good about being open than about mitigating the risks of being open, right? Effectively, he punts the risks of being open as relegated to something for bigger future models. Now, this is something that others have talked about as well. In his congressional

Starting point is 00:08:43 testimony, Sam Altman made sure to differentiate between smaller, lower-powered open-source models and higher-powered foundational models above a certain capabilities threshold, which were the ones in question when he said that OpenAI would support a licensing regime. Where those lines get drawn feels extremely important, however. And the other thing that this piece doesn't totally address is the extent to which bad actors can use tools even of current capabilities. Yet right now, there is a growing conversation about something called WormGPT. The AI Notkilleveryoneism Memes account on Twitter, which is @AISafetyMemes, aggregated a number of different quotes from sources including The Independent and SlashNext about what was going on with this tool.

Starting point is 00:09:22 They write, days after Meta open sources Llama 2, we have WormGPT, an AI tool taking off across cybercrime forums on the dark web. A quote from one of the articles, we instructed WormGPT to generate an email to pressure an unsuspecting account manager into paying a fraudulent invoice. The results were unsettling. WormGPT produced an email that was not only remarkably persuasive but also strategically cunning, showcasing its potential for sophisticated phishing and BEC, or business email compromise, attacks. Another quote comes from security researcher Daniel Kelly. Daniel said, this tool presents itself as a black hat alternative to GPT models, designed specifically for malicious activities. What does that mean for the rest of us? Essentially,

Starting point is 00:10:01 it boils down to the speed and number of scams a language model can generate at once, which is obviously worrying when you consider how fast language models can generate text. This makes cyber attacks such as phishing emails particularly easy to replicate when put in the hands of even a novice cybercriminal. The use of generative AI democratizes the execution of sophisticated BEC attacks. Even attackers with limited skills can use this technology, making it an accessible tool for a broader spectrum of cyber criminals. Now, one of the things that I'm struck by personally is that this is a conversation that is extraordinarily difficult in the abstract and really does demand specificity. For example, I think that there are probably many people who frankly would deal with

Starting point is 00:10:40 WormGPT for the upside of ChatGPT. However, the question is where that line changes. What if it's not a phishing attack trying to get money, but a biological attack that's actually meant to hurt people? This is another example that's a favorite of some people to use. And then, of course, there's the question around national security and national competitiveness. Specifically, how compatible is the goal to, quote-unquote, stay ahead of China with the mechanism of releasing advanced, sophisticated open source models? I find myself extremely worried about concentrations of power, and in general, very natively on the side of thinking that open source is an important counterweight and bulwark to that concentration of power. I think that's particularly true in the

Starting point is 00:11:20 context of a technology that is so data-hungry that it has a natural tendency to reward those who already have resources. And so because of that natural disposition, what I find myself trying to do, rather than just leaving this a dialectic between open source or not, is to try to understand where the real lines are for me personally and where I think they should be for society. And on top of that, I guess, trying to assess how much we have the ability to actually control getting up to that line without going over it. In other words, if we stopped right now, how much farther would the bad guys, however we define them, be able to take the things they already have access to? I don't know the answers to these questions, but I think it's a conversation

Starting point is 00:11:57 we should be having, and I'm glad it's starting to get a little bit louder. For now, guys, you know, I want to know what you think. This is a perfect use for the Discord. I'm going to create a special thread for exactly this question. Come check it out. It's at bit.ly slash AI breakdown. And let's together see if we can figure it out. For now, thanks as always for listening or watching.

Starting point is 00:12:18 And until next time, peace.

