CyberWire Daily - The next hot AI scam. [Research Saturday]
Episode Date: February 25, 2023
Andy Patel from WithSecure Labs joins Dave to discuss their study demonstrating how GPT-3 can be misused through malicious and creative prompt engineering. The research looks at how this technology, GPT-3 and GPT-3.5, can be used to trick users into scams. GPT-3 is a user-friendly tool that uses autoregressive language modeling to generate versatile natural language text from a small amount of input, which could inevitably interest cybercriminals. The research explores possible misuse of the tool, such as phishing content, social opposition, social validation, style transfer, opinion transfer, prompt creation, and fake news. The research can be found here: Creatively malicious prompt engineering. Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
You're listening to the Cyber Wire Network, powered by N2K.
That's where Domo's AI and data products platform comes in. With Domo, you can channel AI and data into innovative uses that
deliver measurable impact. Secure AI agents connect, prepare, and automate your data workflows,
helping you gain insights, receive alerts, and act with ease through guided apps tailored to
your role. Data is hard. Domo is easy. Learn more at ai.domo.com.
That's ai.domo.com.
Hello, everyone, and welcome to the CyberWire's Research Saturday.
I'm Dave Bittner, and this is our weekly conversation
with researchers and analysts
tracking down the threats and vulnerabilities,
solving some of the hard problems of protecting ourselves
in a rapidly evolving cyberspace.
Thanks for joining us.
About a month before ChatGPT came out,
so that would have probably been sometime in October,
I got access to GPT-3.
And it occurred to me at that moment
that people were probably soon going to be getting very cheap
or even free access to large language models.
And so now would be like an appropriate moment to look at how they might be used maliciously.
That's Andy Patel. He's a researcher at WithSecure.
The research we're discussing today is titled Creatively Malicious Prompt Engineering.
So I started to just play around with ideas, you know, creating phishing emails and things like that. And then I started to find more interesting things to do, and the research sort of morphed
in that direction, a prompt engineering direction, where I wanted to discover prompts that did interesting things.
And in particular, did things that could be used in a malicious manner, such as fake news, disinformation, trolling, online harassment, and those sorts of things.
So that's sort of how the research started.
Well, for folks who may not be familiar with ChatGPT, and I suspect most of our audience is,
it's certainly kind of taken the world by storm or captured people's imagination.
If you're unfamiliar with it, how would you describe it?
Well, it's a natural language generation model.
So it's essentially, how could you describe it?
As an algorithm where you give it a string of words
and then it outputs a string of words.
What it can do is continue a sentence that you give it.
It can answer a question.
It can generate lists. It can do simple mathematical problems. It can explain things.
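In code, that string-in, string-out behavior looks roughly like the minimal sketch below. It uses the open GPT-2 model from the Hugging Face transformers library as a stand-in, since GPT-3 itself is only reachable through OpenAI's API; the model choice and generation settings are illustrative assumptions, not what the research used.

    # Minimal sketch of autoregressive generation: the model repeatedly predicts
    # the next token and appends it to the prompt until it has written enough.
    # Assumes the Hugging Face transformers library and the open GPT-2 model.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "Large language models have captured people's imagination because"
    inputs = tokenizer(prompt, return_tensors="pt")

    # generate() runs the next-token loop for us: sample up to 40 new tokens.
    output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))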
Why do you suppose that this particular iteration of this kind of thing
has attracted the attention that it has? I think it's because it's now good enough to do a majority of the things that you ask it to
do. And it surprised people in many ways. It's been able to do things that people didn't expect
it could do. And so to the outside observer, it looks like our definition of artificial intelligence, right? It's able to come across as a human, almost.
It's able to answer a great deal of questions.
It's able to solve problems.
In some cases, people can also almost see
that it's able to reason to a certain extent.
It's the beginnings of what people hope will be artificial general intelligence,
an actual thinking machine.
Yeah, I've seen people say that even when it's wrong,
it states the incorrect information with absolute confidence.
Yes, it does, yes.
And that's a bit of a problem, that when people start to use it to gather facts,
and it states things that look like facts but aren't, you have to be careful.
Well, let's go through your research here together.
What are some of the areas that you explored here?
We explored several different applications of this model
in areas that we thought it might be useful
from a creative point of view.
So there are people who have used this model to generate code,
and they've also found that it can be used to generate attack code,
for instance, and that's like a cybersecurity application.
But in our case, we wanted it to create written content.
We wanted to use it creatively.
And the obvious first thing that we tried was to make phishing emails.
After that, we looked at social media messages designed to troll and harass individuals and to cause brand damage.
We then went and looked at social validation, which is this idea that if there's a lot of engagement around a topic, people buy into it. So the example we used there was the Tide Pod Eating Challenge, where we asked the model to generate some tweets asking people to take the Tide Pod Challenge.
And then we generated replies from people who had taken the Tide Pod challenge,
and then we generated replies to those from the original poster, thanking them and asking their
friends to take the challenge, and stuff like that. So that was another thing. Then we looked at
style transfer, a way of getting the model to output something that conforms to a certain written style.
And we tried some sort of extreme versions of this,
like Irvine Welsh's written style.
But then we also tried a sort of informal,
internal company chat style that people might use
when they're sending emails to each other inside a company.
And we found that it was able to transfer that style as well.
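A hedged sketch of what such a style-transfer prompt can look like: show the model a sample of the target style, then ask it to rewrite new text to match. The pre-1.0 openai Python client, the text-davinci-003 model name, and the example messages below are all assumptions for illustration; the paper's exact prompts are not reproduced here.

    # Sketch of style transfer through prompt engineering.
    # Assumes the pre-1.0 openai Python client and a GPT-3-era completion model.
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    style_example = ("hey team, quick heads up - the vpn config changed last night, "
                     "ping me if your login acts weird. cheers, sam")
    text_to_rewrite = "Please review the attached quarterly report and send feedback by Friday."

    prompt = (
        "Here is an example of an informal internal company chat message:\n"
        f"{style_example}\n\n"
        "Rewrite the following message in the same informal style:\n"
        f"{text_to_rewrite}\n"
    )

    response = openai.Completion.create(
        model="text-davinci-003",  # assumed model name
        prompt=prompt,
        max_tokens=120,
        temperature=0.7,
    )
    print(response.choices[0].text.strip())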
And then we went on to look at opinion transfer.
So we asked the model to state some facts,
which it did in a very sort of Wikipedia-like fashion.
We prepended an opinion and asked it to state the same facts,
and it stated them with that opinion in mind.
Now, we did the same thing from the point of view of politics. We tried the same
thing from both a left-wing and a right-wing perspective. Then we looked at, could we ask the model to
generate prompts themselves? Prompts are the name of the input
that you give to the model
to instruct it on what to do.
So we played around with the idea
of giving it a piece of content already
and asking it, can you write a prompt
that would generate that piece of content?
Sort of reverse engineering, in a way.
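That reverse-engineering idea can be sketched the same way: hand the model a finished piece of content and ask it for a prompt that would produce something like it. The client, model name, and sample content below are again illustrative assumptions.

    # Sketch of asking the model to propose a prompt for an existing piece of content.
    # Same assumed pre-1.0 openai client and model as the previous sketch.
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    existing_content = ("Join us this Saturday for the grand opening of our new downtown store! "
                        "Free coffee for the first 100 visitors. #grandopening #coffee")

    prompt = (
        "Below is a piece of content:\n"
        f"{existing_content}\n\n"
        "Write a prompt that, if given to GPT-3, would generate content like the above:\n"
    )

    response = openai.Completion.create(
        model="text-davinci-003",  # assumed model name
        prompt=prompt,
        max_tokens=100,
        temperature=0.7,
    )
    print(response.choices[0].text.strip())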
And the last thing we did was
we looked at generating fake news,
but fake news that the model couldn't possibly know about. So the model that we were using was
trained in June of 2021. And we went about trying to generate a fake news article
claiming that the U.S. were the ones who attacked the Nord Stream pipeline back in autumn of 2022.
So we provided it with some background information and then we asked it to write the news post, which it did quite successfully.
And now, a message from our sponsor, Zscaler, the leader in cloud security.
Enterprises have spent billions of dollars on firewalls and VPNs, yet breaches continue
to rise, with an 18% year-over-year increase
in ransomware attacks
and a record $75 million payout in 2024.
These traditional security tools
expand your attack surface
with public-facing IPs
that are exploited by bad actors
more easily than ever with AI tools.
It's time to rethink your security.
Zscaler Zero Trust plus AI
stops attackers
by hiding your attack surface,
making apps and IPs invisible,
eliminating lateral movement,
connecting users only to specific apps,
not the entire network,
continuously verifying every request
based on identity and context,
simplifying security management with AI-powered automation,
and detecting threats using AI
to analyze over 500 billion daily transactions.
Hackers can't attack what they can't see.
Protect your organization with Zscaler Zero Trust and AI.
Learn more at zscaler.com slash security.
Well, speaking of success,
what were the areas where it excelled and were there any areas where it came up short?
I think for social media content, it did a very good job.
It wrote social media posts that looked like tweets.
It automatically included hashtags.
They looked very much like the sort of thing you would see on Twitter.
For the things that it failed at,
one obvious thing was if you ask it to generate nonfiction content,
like an article, after like five or six paragraphs,
it will start to repeat itself.
So it can't continually write and add new facts.
It sort of has this limited scope that it can write on.
And in fact, we saw it sometimes sort of repeat itself
across the same
line almost. So for the purposes of automating longer content, like news articles, you wouldn't
want to automate that just in case it glitched in that way. Do you have a sense for where we stand
in terms of automating some kind of adversarial Twitter account, a Twitter account where you have bad things in mind, where you're trying to sway opinion or something like that?
Are we at the point where you could set something like that up and do it in a way that you wouldn't have to have a human running it?
Could you be confident that it would achieve what you set it out to do?
Absolutely, I think you could do that.
When we were testing online harassment,
we actually made up a fake company,
and we asked it to also make up the bio of the CEO of this fake company
who is also a fake person.
And then we asked it to harass those
and to do brand reputational damage and stuff.
But I think in terms of real-world tweets,
you could very easily write a script
that searches for certain keywords or hashtags using the API,
read in the tweets, use a predefined prompt that
basically just instructs it to write a reply opposing this tweet, make it as toxic as possible,
and then have it post those tweets. What we found you could also do is have it
rate those tweets that it wrote. So you could ask it to write 10 tweets opposing this,
make them as toxic as possible. It would generate 10 tweets and then you could ask it,
okay, rate the above 10 tweets on toxicity and it would give you scores. And so then you could
pick the most toxic one and have the script post that. So yeah, absolutely. You could be doing that already.
It's fascinating to me. One of the things that we talk about over on the Hacking Humans podcast
that I co-host with Joe Carrigan is that very often, I guess, historically with online scams,
the quality of the, let's just say English, when people are trying to go after English
language speakers, is often a tell. There's bad English. Things are improperly translated.
This strikes me as something that you could use to run your text through or indeed have it
generated from whole cloth, and it really takes away that limitation of needing good English.
Does your research support that notion?
Absolutely, yeah. I mean, not only that, but if you think about the task of trying to
imitate someone's writing style, that's quite difficult, even for a person to do.
And you can take the need to have a skilled writer away from many of these campaigns. In fact, we may get to the point where a perfectly written
email asking you to click on a link becomes the suspicious
thing. Because right now we're looking for typos,
grammar mistakes, badly written emails. But if everyone starts going towards
using a model, then it's the perfectly written English that's suspicious
because humans still make the occasional error, don't they?
Right.
This also reminds me of, like, I was looking at some forums
where kids were using the software to cheat on their homework,
and then they found a piece of software that could detect whether
something was written by AI or not, and then they were discussing how do you beat this detection thing?
And they found that by adding typos,
it would actually then not be rated as having been written by an AI anymore.
I wonder, could you tell this engine to generate something but include some typos?
Actually, I saw something today where someone asked the AI to generate text
and then they asked it to regenerate it such that it won't be detected
by something that detects content written by GPT-3.
And it rewrote it and it wasn't detected.
So you could even ask it to do that itself.
So based on the information that you've gathered here,
where do you think this puts us?
And where are we headed with this technology?
I mean, what we're going to see is
this technology being integrated everywhere.
I mean, people are already talking about
it being integrated into search engines,
it'll be integrated into Microsoft Word, Google Docs, things like that,
so that you can ask it to help you out with your writing.
And so it'll be used for a lot of legitimate, benign purposes,
as well as malicious purposes.
And so purely detecting that something is written by an AI
isn't going to be enough to determine that it's malicious.
You're going to have to still understand what it is that's written there
in order to determine, is it online harassment?
Is it trolling? Is it disinformation? Is it phishing?
And those are very difficult tasks.
It's interesting to me, you know, as a parent of a teenager who has to submit and write content in high school,
I think back to my own experience, you know, before we were doing everything on computers
and just what a different experience it is for him that these days kids are handing in papers that
everything's gone through spellcheck, everything's gone through grammar check. And we accept that as
being the modern standard. Teachers don't push back on that anymore because that's the standard.
It's where we are. And I wonder where this leads us to. If every email, if every interaction
gets run through something like this to be cleaned up, to be
polished, will that become the standard and just become the acceptable way of interacting with
people?
I mean, I suppose so. I can imagine schools going back to asking for assignments to be handwritten.
And of course, like, you know, you can have a model generate some text
and then you just copy it onto a piece of paper.
Sure.
But, I mean, I think that the way they should be approached
is that these are tools that we're going to have,
that everyone's going to have.
Eventually, this thing will run on your phone, right?
And it'll be able to help you out.
So as a creative tool, it's very useful.
It saves you time.
It gets rid of writer's block.
It comes up with suggestions for things.
I mean, it should be embraced as a way that we work on things.
And if we already have things like spelling check and grammar check
and autocomplete the next word when you're typing,
this is just the next logical progression from that.
And so if you're going to test someone with homework,
then you should do it in such a way that you appreciate the fact that these things exist already.
Yeah.
Again, forgive me, but I remember growing up and taking math class and teachers saying to us,
we couldn't use a calculator because we wouldn't always have a calculator with us.
And now I look at myself today and everyone around me, and not only do we have calculators with us all the time, we have little tiny supercomputers that have access to all the world's knowledge.
Exactly, yeah.
Yeah.
It's sort of a similar thing. I see people talking about these whiteboard programming exercises that they have to do
when they're interviewing at companies.
And in real life, if you're programming something,
you're spending half your time on Stack Overflow.
I mean, it's just natural, right?
You shouldn't expect someone to know all of that stuff
without looking things up every now and then.
It's just not the natural way of doing it, is it?
No.
So I'm curious, what's the cautionary tale here from your research?
Is there something that, particularly for folks who are in cybersecurity,
is there a message they should take away from this?
I mean, I've had a lot of questions about what do we do differently
now that people will start to have these capabilities.
And from the point of view of, for instance, phishing or disinformation,
we already have human processes, things like phishing awareness,
media literacy.
Those are going to become more important.
If you get a DHL phishing email, it's not going to have been created by GPT-3, because they are going to copy the exact same email that DHL sends,
with the exact same style
and logos and everything.
And it's only the link in there that's going to be malicious.
So that's not something that's even going to change,
but the way that we approach it is that we mouse over the link
and check that it is legitimate, or it looks legitimate.
We look at the sender field.
Those things are
still going to be very valid, if not more valid, right? And when it comes to social media,
you know, I think we might see an uptick in, like, automated harassment, things like that, maybe
spamming certain topics. I mean, you hear about the fact that nation states
employ, you know, maybe even tens of thousands of actual people to write
trolling messages, to write social media messages, right? That's probably not going to go away, or
if it is, it's going to slowly go away and sort of become automated.
But it's quite difficult to really predict when these things will happen.
I mean, when you try and predict when criminals will take this into use,
it's going to be financially motivated,
whether it's enough of a return on investment.
And as this stuff gets cheaper and easier, it's more likely to be taken up.
Another thing that I think is interesting is the fact that these models are already
good enough at what they do.
Eventually, they're going to get smaller to the point where you're going to be able
to download the weights and run it on your PC.
And when you do that, you're not going to have the safety filters
that are in place right now that exist
because you have to access it via an API.
So when you're going to be able to run these things on your own computer,
you're going to be able to do even more stuff with them
that you can't do right now because the safety filter just comes back
and says no. Computer says no, you know?
Our thanks to Andrew Patel from WithSecure for joining us.
The research is titled Creatively Malicious Prompt Engineering.
We'll have a link in the show notes.
Cyber threats are evolving every second, and staying ahead is more than just a challenge.
It's a necessity. That's why we're
thrilled to partner with ThreatLocker, a cybersecurity solution trusted by businesses
worldwide. ThreatLocker is a full suite of solutions designed to give you total control,
stopping unauthorized applications, securing sensitive data, and ensuring your organization
runs smoothly and securely. Visit ThreatLocker.com today to see how a default-deny approach can keep your company
safe and compliant.
The CyberWire Research Saturday podcast
is a production of N2K Networks,
proudly produced in Maryland out of the startup studios of DataTribe,
where they're co-building the next generation
of cybersecurity teams and technologies.
This episode was produced by Liz Irvin
and senior producer Jennifer Eiben.
Our mixer is Elliott Peltzman.
Our executive editor is Peter Kilpe, and I'm Dave Bittner. Thanks for listening.