CyberWire Daily - The next hot AI scam. [Research Saturday]
Episode Date: February 25, 2023
Andy Patel from WithSecure Labs joins Dave to discuss their study demonstrating how GPT-3 can be misused through malicious and creative prompt engineering. The research looks at how this technology, GPT-3 and GPT-3.5, can be used to trick users into scams. GPT-3 is a user-friendly tool that uses autoregressive language modeling to generate versatile natural language text from a small amount of input, which could inevitably interest cybercriminals. The research explores possible misuse of the tool, such as phishing content, social opposition, social validation, style transfer, opinion transfer, prompt creation, and fake news. The research can be found here: Creatively malicious prompt engineering. Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
You're listening to the Cyber Wire Network, powered by N2K.
That's where Domo's AI and data products platform comes in. With Domo, you can channel AI and data into innovative uses that
deliver measurable impact. Secure AI agents connect, prepare, and automate your data workflows,
helping you gain insights, receive alerts, and act with ease through guided apps tailored to
your role. Data is hard. Domo is easy. Learn more at ai.domo.com.
That's ai.domo.com.
Hello, everyone, and welcome to the CyberWire's Research Saturday.
I'm Dave Bittner, and this is our weekly conversation
with researchers and analysts
tracking down the threats and vulnerabilities,
solving some of the hard problems of protecting ourselves
in a rapidly evolving cyberspace.
Thanks for joining us.
About a month before ChatGPT came out,
so that would have probably been sometime in October,
I got access to GPT-3.
And it occurred to me at that moment
that people were probably soon going to be getting very cheap
or even free access to large language models.
And so now would be like an appropriate moment to look at how they might be used maliciously.
That's Andy Patel. He's a researcher at WithSecure.
The research we're discussing today is titled Creatively Malicious Prompt Engineering.
So I started to just play around with ideas, you know, creating phishing emails and things like that. And then I started to find more interesting things to do, and the research sort of morphed
in that direction, a prompt engineering direction, where I wanted to discover prompts that did interesting things.
And in particular, did things that could be used in a malicious manner, such as fake news, disinformation, trolling, online harassment, and those sorts of things.
So that's sort of how the research started.
Well, for folks who may not be familiar with ChatGPT, and I suspect most of our audience is,
it's certainly kind of taken the world by storm or captured people's imagination.
If you're unfamiliar with it, how would you describe it?
Well, it's a natural language generation model.
So it's essentially, how could you describe it?
As an algorithm where you give it a string of words
and then it outputs a string of words.
What it can do is continue a sentence that you give it.
It can answer a question.
It can generate lists. It can do simple mathematical problems. It can explain things.
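In code, that string-in, string-out behavior looks roughly like the minimal sketch below. It uses the open GPT-2 model from the Hugging Face transformers library as a stand-in, since GPT-3 itself is only reachable through OpenAI's API; the model choice and generation settings are illustrative assumptions, not what the research used.

    # Minimal sketch of autoregressive generation: the model repeatedly predicts
    # the next token and appends it to the prompt until it has written enough.
    # Assumes the Hugging Face transformers library and the open GPT-2 model.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "Large language models have captured people's imagination because"
    inputs = tokenizer(prompt, return_tensors="pt")

    # generate() runs the next-token loop for us: sample up to 40 new tokens.
    output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))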
Why do you suppose that this particular iteration of this kind of thing
has attracted the attention that it has? I think it's because it's now good enough to do a majority of the things that you ask it to
do. And it surprised people in many ways. It's been able to do things that people didn't expect
it could do. And so to the outside observer, it looks like our definition of artificial intelligence, right? It's able to come across as a human, almost.
It's able to answer a great deal of questions.
It's able to solve problems.
In some cases, people can also almost see
that it's able to reason to a certain extent.
It's the beginnings of what people hope will be artificial general intelligence,
an actual thinking machine.
Yeah, I've seen people say that even when it's wrong,
it states the incorrect information with absolute confidence.
Yes, it does, yes.
And that's a bit of a problem, that when people start to use it to gather facts,
and it states things that look like facts but aren't, you have to be careful.
Well, let's go through your research here together.
What are some of the areas that you explored here?
We explored several different applications of this model
in areas that we thought it might be useful
from a creative point of view.
So there are people who have used this model to generate code,
and they've also found that it can be used to generate attack code,
for instance, and that's like a cybersecurity application.
But in our case, we wanted it to create written content.
We wanted to use it creatively.
And the obvious first thing that we tried was to make phishing emails.
After that, we looked at social media messages designed to troll and harass individuals and to cause brand damage.
We then went and looked at social validation, which is this idea that if there's a lot of engagement around a topic, people buy into it. So the example we used there was the Tide Pod Eating Challenge, where we asked the model to generate some tweets asking people to take the Tide Pod Challenge.
And then we generated replies from people who had taken the Tide Pod challenge,
and then we generated replies to those from the original poster, thanking them and asking their
friends to take the challenge, and stuff like that. So that was another thing. Then we looked at
style transfer, a way of getting the model to output something that conforms to a certain written style.
And we tried some sort of extreme versions of this,
like Irvine Welsh's written style.
But then we also tried a sort of informal,
internal company chat style that people might use
when they're sending emails to each other inside a company.
And we found that it was able to transfer that style as well.
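A hedged sketch of what such a style-transfer prompt can look like: show the model a sample of the target style, then ask it to rewrite new text to match. The pre-1.0 openai Python client, the text-davinci-003 model name, and the example messages below are all assumptions for illustration; the paper's exact prompts are not reproduced here.

    # Sketch of style transfer through prompt engineering.
    # Assumes the pre-1.0 openai Python client and a GPT-3-era completion model.
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    style_example = ("hey team, quick heads up - the vpn config changed last night, "
                     "ping me if your login acts weird. cheers, sam")
    text_to_rewrite = "Please review the attached quarterly report and send feedback by Friday."

    prompt = (
        "Here is an example of an informal internal company chat message:\n"
        f"{style_example}\n\n"
        "Rewrite the following message in the same informal style:\n"
        f"{text_to_rewrite}\n"
    )

    response = openai.Completion.create(
        model="text-davinci-003",  # assumed model name
        prompt=prompt,
        max_tokens=120,
        temperature=0.7,
    )
    print(response.choices[0].text.strip())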
And then we went on to look at opinion transfer.
So we asked the model to state some facts,
which it did in a very sort of Wikipedia-like fashion.
We prepended an opinion and asked it to state the same facts,
and it stated them with that opinion in mind.
Now, we did the same thing from the point of view of politics. We tried the same
thing from both a left-wing and a right-wing perspective. Then we looked at, could we ask the model to
generate prompts themselves? Prompts are the name of the input
that you give to the model
to instruct it on what to do.
So we played around with the idea
of giving it a piece of content already
and asking it, can you write a prompt
that would generate that piece of content?
Sort of reverse engineering, in a way.
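That reverse-engineering idea can be sketched the same way: hand the model a finished piece of content and ask it for a prompt that would produce something like it. The client, model name, and sample content below are again illustrative assumptions.

    # Sketch of asking the model to propose a prompt for an existing piece of content.
    # Same assumed pre-1.0 openai client and model as the previous sketch.
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    existing_content = ("Join us this Saturday for the grand opening of our new downtown store! "
                        "Free coffee for the first 100 visitors. #grandopening #coffee")

    prompt = (
        "Below is a piece of content:\n"
        f"{existing_content}\n\n"
        "Write a prompt that, if given to GPT-3, would generate content like the above:\n"
    )

    response = openai.Completion.create(
        model="text-davinci-003",  # assumed model name
        prompt=prompt,
        max_tokens=100,
        temperature=0.7,
    )
    print(response.choices[0].text.strip())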
And the last thing we did was
we looked at generating fake news,
but fake news that the model couldn't possibly know about. So the model that we were using was
trained in June of 2021. And we went about trying to generate a fake news article
claiming that the U.S. were the ones who attacked the Nord Stream pipeline back in autumn of 2022.
So we provided it with some background information and then we asked it to write the news post, which it did quite successfully.
And now, a message from our sponsor, Zscaler, the leader in cloud security.
Enterprises have spent billions of dollars on firewalls and VPNs, yet breaches continue
to rise, with an 18% year-over-year increase
in ransomware attacks
and a record $75 million payout in 2024.
These traditional security tools
expand your attack surface
with public-facing IPs
that are exploited by bad actors
more easily than ever with AI tools.
It's time to rethink your security.
Zscaler Zero Trust plus AI
stops attackers
by hiding your attack surface,
making apps and IPs invisible,
eliminating lateral movement,
connecting users only to specific apps,
not the entire network,
continuously verifying every request
based on identity and context,
simplifying security management with AI-powered automation,
and detecting threats using AI
to analyze over 500 billion daily transactions.
Hackers can't attack what they can't see.
Protect your organization with Zscaler Zero Trust and AI.
Learn more at zscaler.com slash security.
Well, speaking of success,
what were the areas where it excelled and were there any areas where it came up short?
I think for social media content, it did a very good job.
It wrote social media posts that looked like tweets.
It automatically included hashtags.
They looked very much like the sort of thing you would see on Twitter.
For the things that it failed at,
one obvious thing was if you ask it to generate nonfiction content,
like an article, after like five or six paragraphs,
it will start to repeat itself.
So it can't continually write and add new facts.
It sort of has this limited scope that it can write on.
And in fact, we saw it sometimes sort of repeat itself
across the same
line almost. So for the purposes of automating longer content, like news articles, you wouldn't
want to automate that just in case it glitched in that way. Do you have a sense for where we stand
in terms of automating some kind of adversarial Twitter account, a Twitter account where you have bad things in mind, where you're trying to sway opinion or something like that?
Are we at the point where you could set something like that up and do it in a way that you wouldn't have to have a human running it?
Could you be confident that it would achieve what you set it out to do?
Absolutely, I think you could do that.
When we were testing online harassment,
we actually made up a fake company,
and we asked it to also make up the bio of the CEO of this fake company
who is also a fake person.
And then we asked it to harass those
and to do brand reputational damage and stuff.
But I think in terms of real-world tweets,
you could very easily write a script
that searches for certain keywords or hashtags using the API,
read in the tweets, use a predefined prompt that
basically just instructs it to write a reply opposing this tweet, make it as toxic as possible,
and then have it post those tweets. What we found you could also do is have it
rate those tweets that it wrote. So you could ask it to write 10 tweets opposing this,
make them as toxic as possible. It would generate 10 tweets and then you could ask it,
okay, rate the above 10 tweets on toxicity and it would give you scores. And so then you could
pick the most toxic one and have the script post that. So yeah, absolutely. You could be doing that already.
It's fascinating to me. One of the things that we talk about over on the Hacking Humans podcast
that I co-host with Joe Carrigan is that very often, I guess, historically with online scams,
the quality of the, let's just say English, when people are trying to go after English
language speakers, is often a tell. There's bad English. Things are improperly translated.
This strikes me as something that you could use to run your text through or indeed have it
generated from whole cloth, and it really takes away that limitation of needing good English.
Does your research support that notion?
Absolutely, yeah. I mean, not only that, but if you think about the task of trying to
imitate someone's writing style, that's quite difficult, even for a person to do.
And you can take the need to have a skilled writer away from many of these campaigns. In fact, we may get to the point where a perfectly written
email asking you to click on a link becomes the suspicious
thing. Because right now we're looking for typos,
grammar mistakes, badly written emails. But if everyone starts going towards
using a model, then it's the perfectly written English that's suspicious
because humans still make the occasional error, don't they?
Right.
This also reminds me of, like, I was looking at some forums
where kids were using the software to cheat on their homework,
and then they found a piece of software that could detect whether
something was written by AI or not, and then they were discussing how do you beat this detection thing?
And they found that by adding typos,
it would actually then not be rated as having been written by an AI anymore.
I wonder, could you tell this engine to generate something but include some typos?
Actually, I saw something today where someone asked the AI to generate text
and then they asked it to regenerate it such that it won't be detected
by something that detects content written by GPT-3.
And it rewrote it and it wasn't detected.
So you could even ask it to do that itself.
So based on the information that you've gathered here,
where do you think this puts us?
And where are we headed with this technology?
I mean, what we're going to see is
this technology being integrated everywhere.
I mean, people are already talking about
it being integrated into search engines,
it'll be integrated into Microsoft Word, Google Docs, things like that,
so that you can ask it to help you out with your writing.
And so it'll be used for a lot of legitimate, benign purposes,
as well as malicious purposes.
And so purely detecting that something is written by an AI
isn't going to be enough to determine that it's malicious.
You're going to have to still understand what it is that's written there
in order to determine, is it online harassment?
Is it trolling? Is it disinformation? Is it phishing?
And those are very difficult tasks.
It's interesting to me, you know, as a parent of a teenager who has to submit and write content in high school,
I think back to my own experience, you know, before we were doing everything on computers
and just what a different experience it is for him that these days kids are handing in papers that
everything's gone through spellcheck, everything's gone through grammar check. And we accept that as
being the modern standard. Teachers don't push back on that anymore because that's the standard.
It's where we are. And I wonder where this leads us to. If every email, if every interaction
gets run through something like this to be cleaned up, to be
polished, will that become the standard and just become the acceptable way of interacting with
people?
I mean, I suppose so. I can imagine schools going back to asking for assignments to be handwritten.
And of course, like, you know, you can have a model generate some text
and then you just copy it onto a piece of paper.
Sure.
But, I mean, I think that the way they should be approached
is that these are tools that we're going to have,
that everyone's going to have.
Eventually, this thing will run on your phone, right?
And it'll be able to help you out.
So as a creative tool, it's very useful.
It saves you time.
It gets rid of writer's block.
It comes up with suggestions for things.
I mean, it should be embraced as a way that we work on things.
And if we already have things like spelling check and grammar check
and autocomplete the next word when you're typing,
this is just the next logical progression from that.
And so if you're going to test someone with homework,
then you should do it in such a way that you appreciate the fact that these things exist already.
Yeah.
Again, forgive me, but I remember growing up and taking math class and teachers saying to us,
we couldn't use a calculator because we wouldn't always have a calculator with us.
And now I look at myself today and everyone around me, and not only do we have calculators with us all the time, we have little tiny supercomputers that have access to all the world's knowledge.
Exactly, yeah.
Yeah.
It's sort of a similar thing. I see people talking about these whiteboard programming exercises that they have to do
when they're interviewing at companies.
And in real life, if you're programming something,
you're spending half your time on Stack Overflow.
I mean, it's just natural, right?
You shouldn't expect someone to know all of that stuff
without looking things up every now and then.
It's just not the natural way of doing it, is it?
No.
So I'm curious, what's the cautionary tale here from your research?
Is there something that, particularly for folks who are in cybersecurity,
is there a message they should take away from this?
I mean, I've had a lot of questions about what do we do differently
now that people will start to have these capabilities.
And from the point of view of, for instance, phishing or disinformation,
we already have human processes, things like phishing awareness,
media literacy.
Those are going to become more important.
If you get a DHL phishing email, it's not going to have been created by GPT-3, because they are going to copy the exact same email that DHL sends,
with the exact same style
and logos and everything.
And it's only the link in there that's going to be malicious.
So that's not something that's even going to change,
but the way that we approach it is that we mouse over the link
and check that it is legitimate, or it looks legitimate.
We look at the sender field.
Those things are
still going to be very valid, if not more valid, right? And when it comes to social media,
you know, I think we might see an uptick in, like, automated harassment, things like that, maybe
spamming certain topics. I mean, you hear about the fact that nation states
employ, you know, maybe even tens of thousands of actual people to write
trolling messages, to write social media messages, right? That's probably not going to go away, or
if it is, it's going to slowly go away and sort of become automated.
But it's quite difficult to really predict when these things will happen.
I mean, when you try and predict when criminals will take this into use,
it's going to be financially motivated,
whether it's enough of a return on investment.
And as this stuff gets cheaper and easier, it's more likely to be taken up.
Another thing that I think is interesting is the fact that these models are already
good enough at what they do.
Eventually, they're going to get smaller to the point where you're going to be able
to download the weights and run it on your PC.
And when you do that, you're not going to have the safety filters
that are in place right now that exist
because you have to access it via an API.
So when you're going to be able to run these things on your own computer,
you're going to be able to do even more stuff with them
that you can't do right now because the safety filter just comes back
and says no. Computer says no, you know?
Our thanks to Andrew Patel from WithSecure for joining us.
The research is titled Creatively Malicious Prompt Engineering.
We'll have a link in the show notes.
Cyber threats are evolving every second, and staying ahead is more than just a challenge.
It's a necessity. That's why we're
thrilled to partner with ThreatLocker, a cybersecurity solution trusted by businesses
worldwide. ThreatLocker is a full suite of solutions designed to give you total control,
stopping unauthorized applications, securing sensitive data, and ensuring your organization
runs smoothly and securely. Visit ThreatLocker.com today to see how a default-deny approach can keep your company
safe and compliant.
The CyberWire Research Saturday podcast
is a production of N2K Networks,
proudly produced in Maryland out of the startup studios of DataTribe,
where they're co-building the next generation
of cybersecurity teams and technologies.
This episode was produced by Liz Irvin
and senior producer Jennifer Eiben.
Our mixer is Elliott Peltzman.
Our executive editor is Peter Kilpe, and I'm Dave Bittner. Thanks for listening.