CyberWire Daily - A dark side to LLMs. [Research Saturday]

Episode Date: April 8, 2023

Sahar Abdelnabi from the CISPA Helmholtz Center for Information Security sits down with Dave to discuss her team's work, "A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models." The capabilities of Large Language Models (LLMs) are advancing rapidly, and they are being integrated into many systems, including integrated development environments (IDEs) and search engines. The research states, "The functionalities of current LLMs can be modulated via natural language prompts, while their exact internal functionality remains implicit and unassessable." The same flexibility that makes LLMs adaptable to unseen tasks also leaves them susceptible to targeted adversarial prompting. The researchers demonstrate these attacks to show that LLMs may need new defense techniques. The research can be found here: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models. Learn more about your ad choices. Visit megaphone.fm/adchoices

Transcript
Starting point is 00:00:00 You're listening to the Cyber Wire Network, powered by N2K. Like many of you, I was concerned about my data being sold by data brokers. So I decided to try Delete.me. I have to say, Delete.me is a game changer. Within days of signing up, they started removing my personal information from hundreds of data brokers. I finally have peace of mind knowing my data privacy is protected. Delete.me's team does all the work for you with detailed reports so you know exactly what's been done. Take control of your data and keep your private life private. Thank you. Hello, everyone, and welcome to the CyberWire's Research Saturday. I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down the threats and vulnerabilities, solving some of the hard problems of protecting ourselves in a rapidly
Starting point is 00:01:45 evolving cyberspace. Thanks for joining us. We all have seen the news after ChatGPT and we all have seen how people are starting to get really interested and hyped by the new technology. And that motivated our colleagues and inspired us to think that there might be an issue with this integration. That's Sahar Abdelnabi from the CISPA Helmholtz Center for Information Security. The research we're discussing today is titled A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models.
Starting point is 00:02:45 are really not noticing when we put these large language models in other applications and rely on their output and also rely on the input that they might digest in real time from other untrusted or unverified sources. Well, let's go through the research together here. Can you take us through exactly how you got your start? And let's go through the findings. Yeah, sure. your start and let's go through the findings. Yeah, sure.
Starting point is 00:03:05 So currently the main way people have been interacting with ChatGPT, before the plugins and before Bing Chat and so on, is that you go to ChatGPT, you enter a question or anything that you would like to ask
Starting point is 00:03:22 for and then ChatGPT answers. That was the main way of communication. There is a clear input, there is a clear output. And with that, there were still some risks because there were some people that could circumvent the filtering and maybe generate some harmful output or malicious output. And there were also some risks that people rely on the information from chat GPT as trusted or factual, while in some cases it's not. However, it was a clear scenario. There is a clear input and clear output. Now, when we integrate LLMs or large-division models
Starting point is 00:04:01 with other applications, the line between the instructions that are directly given by the users and the other instructions that might be maliciously injected somewhere else can get really blurry. So I might ask LinkChat, for example, a question. And to answer my question, PinkChat can go and search online for some information or some sources or websites or whatever. However, someone out there
Starting point is 00:04:32 might plan some hidden instructions for PinkChat. And these instructions will be digested by the model and will affect how the model can communicate with me later on. So there is some hidden layer of communication of instructions that me as a user might have not been aware of.
Starting point is 00:04:56 And therefore, there is a clear violation of security boundary that could happen and could open up a lot of new attack vectors. Well, explain to me how you all went about testing this. Yeah, so when we actually tested these attacks, at that time, Big Chat was not yet available, at least for us here in Germany. I'm not sure if it was released earlier, maybe in other countries, like the case with BART nowadays. was released earlier, maybe in other countries,
Starting point is 00:05:24 like the case with Word nowadays. But when we actually wrote the paper, we didn't have chat GPT APIs. We have Bing chat, and we really had limited, sort of not state-of-the-art models. So what we did, actually, ironically, that was only less than two months ago. It's nearly one month or like five or six weeks. We had access to the latest GPT-3 model, the DaVinci model.
Starting point is 00:05:53 And we simulated the tools like the plugins, actually, that we all are seeing now. So we simulated plugins or tools like a personal assistant that can read your emails and maybe draft or send emails, which again, we now see LLMs are integrated into applications like emails. We simulated also a tool that when you ask the question, it go to Wikipedia and maybe find some relevant Wikipedia articles and read them and answer a question and so on. Again, because we just have really access to the current tools that are actually available nowadays. So we did this, like we had some instructions in the input to the model, like in the Wikipedia article, for example, that the model would be reading during the search or in the email that the model would be receiving as a personal assistant agent or so on.
Starting point is 00:06:52 And the instructions are hidden or embedded in this input to the model to simulate the case when the MLM is integrated in other applications. and the MLM is integrated in other applications. And then the user is asking the chat or the simulated chatbot that we have built also using DaVinci for... We experimented for some reason with Albert Einstein. So the user might ask the chatbot about information about Albert Einstein,
Starting point is 00:07:22 and then the chatbot will go and read the Wikipedia page, which we prepared for it. We simulated that it's the Wikipedia page, but we hid some instructions in there. And then you, unexpectedly, you might find the model speaking in a pirate accent because we told it to do so. Or you might find the model asking you for personal information because, again, we told it to do so. Or you might find the model asking you for personal information, because again, we told it to do so in the Wikipedia page that we have prepared. Later, now we have access to ChatGPT APIs, and we also have access to Bing Chat,
Starting point is 00:07:59 and we replicated a lot of these attacks with Bing Chat as well. So in that case, we created a local HTML file, for example, that contains these hidden instructions. And you might have seen that Bing chat or the Edge browser has this sidebar feature. So there is a sidebar feature. If you are browsing a certain website, you might open the sidebar feature, if you are browsing a certain website, you might open the sidebar and then open WingChat in the sidebar and start to speak to it.
Starting point is 00:08:32 Like tell it, for example, summarize the current website for me. And in that case, it reads the context of the current page or the current website that you are actually using or reading. And any instructions hidden in this page that might be hidden by any attacker would actually affect the model. So now, after the paper has released for like six weeks, we can say that we can also replicate
Starting point is 00:09:01 most of these attacks very, very effectively and even much more successful than we have imagined using the initial DaVinci model using the recent GPT-4 that's integrated into the chat. And now, a message from our sponsor, Zscaler, the leader in cloud security.
Starting point is 00:09:34 Enterprises have spent billions of dollars on firewalls and VPNs, yet breaches continue to rise by an 18% year-over-year increase in ransomware attacks and a $75 million record payout in 2024. These traditional security tools expand your attack surface with public-facing IPs that are exploited by bad actors more easily than ever with AI tools. It's time to rethink your security. Zscaler Zero Trust plus AI stops attackers by hiding your attack surface, making apps and IPs invisible, eliminating lateral movement, connecting users only to specific apps, not the entire network, continuously verifying every request based on identity and context,
Starting point is 00:10:19 simplifying security management with AI-powered automation, and detecting threats using AI to analyze over 500 billion daily transactions. Hackers can't attack what they can't see. Protect your organization with Zscaler Zero Trust and AI. Learn more at zscaler.com slash security. This reminds me of, I suppose in some ways, it reminds me of people using things like search engine optimization to try to rise to the top of Google results. But then also we hear people doing kind of SEO poisoning,
Starting point is 00:10:59 where they want malicious things to rise to the top. I mean, this strikes me as being along the same lines as that sort of thing. Is that an accurate perception on my part? That's actually a very, very accurate observation because it's also one of the things that we imagine how these attacks might be disseminated, right? So some people might use strategies like exactly SEO poisoning in order to get their website
Starting point is 00:11:29 retrieved by search engines. And if they are retrieved, then the LLM running the search engine would also be poisoned or ingested by these prompts that are hidden in their websites. So what do you recommend then? You've demonstrated this capability.
Starting point is 00:11:48 Do you have any suggestions for how we might go forward? For whom exactly? So for users or for... Well, let's do them one at a time. Why don't we start with the users? I think at the moment moment at least my recommendation would be to really be sure
Starting point is 00:12:10 to not use the models if you need 100% reliable and factual output. Yeah, you can ask Bing Chat, tell me some recipes for today. Which is fine because there is really no huge consequences that would
Starting point is 00:12:29 come out of that question. But if you really want to look for very reliable answers, I wouldn't recommend to use MLMs for this. And I would definitely recommend to verify not only if the output is factual, because this is a huge part of the whole thing, but also to verify the links that maybe Wingchat might suggest to you. So, for example, as part of the answer, Wingchat can tell you, find more information here or whatever.
Starting point is 00:13:02 But these links might be malicious because the prompts might actually tell the model and instruct the model to suggest, for example, harmful URLs. What about for developers, for folks who are out there and are eager to use these APIs? Are there warnings for them as well?
Starting point is 00:13:26 I would say yes. At the moment, it's really not so clear what the consequences of these models are. And I think there is a lot of harm that could be done by the current race of really the whole community to integrate LLMs in everything. And I think we really need to stop and ask ourselves if we are ready for the
Starting point is 00:13:52 whole safety considerations at the moment or not. Are there any things that you and your colleagues are going to work on next? Has this work led you to more interesting or additional interesting avenues to explore? Of course. I mean, as I said, this has been done only less than six weeks ago. And we actually came up with this whole paper in only one week. So we wrote it in one week, and we did all the experiments in just one week. It was crazy. It was the fastest thing I ever seen come together actually. And since then we really were not able to catch a break, honestly, because there are every day, there are new models released out there. There are new opportunities
Starting point is 00:14:40 for attacks. And honestly, things that we, when we wrote the paper, we thought they are a bit futuristic, like models that can read your emails, send automatic emails, and these emails are somehow poisoned and all these. It kind of seemed like very futuristic things and somehow a bit of sci-fi.
Starting point is 00:15:00 But now we have all of these things. I thought that this would be a little bit longer along the way when we have all of these things. I thought that this would be a little bit longer along the way when we have all these models but they are actually ready at the moment. And yeah, since then we have been working on actually testing the whole ideas on the models that are more recent such as Bing and GPT-H, GPT-4 and so on.
Starting point is 00:15:24 And actually surprisingly, the attacks work so much better when we have better models. Our thanks to Sahar Abdonabi from CISPA Hemholtz Center for Information Security. The research is titled A Comprehensive Analysis of Novel Prompt Injection Threats to Application Integrated Large Language Models. We'll have a link in the show notes. And now a message from Black Cloak. Did you know the easiest way for cybercriminals to bypass your company's defenses is by targeting your executives and their families at home? Black Cloak's award-winning digital executive protection platform
Starting point is 00:16:19 secures their personal devices, home networks, and connected lives. Because when executives are compromised at home, your company is at risk. In fact, over one-third of new members discover they've already been breached. Protect your executives and their families 24-7, 365, with Black Cloak. Learn more at blackcloak.io. The Cyber Wire Research Saturday podcast is a production of N2K Networks, proudly produced in Maryland out of the startup studios of DataTribe, where they're co-building the next generation of cybersecurity teams and technologies.
Starting point is 00:17:05 This episode was produced by Liz Ervin and senior producer Jennifer Iben. Our mixer is Elliot Peltzman. Our executive editor is Peter Kilby, and I'm Dave Bittner. Thanks for listening.
