CyberWire Daily - Detecting dating profile fraud. [Research Saturday]

Episode Date: August 17, 2019

Researchers from King’s College London, University of Bristol, Boston University, and University of Melbourne recently collaborated to publish a report titled "Automatically Dismantling Online Dating Fraud." The research outlines techniques to analyze and identify fraudulent online dating profiles with a high degree of accuracy. Professor Awais Rashid is one of the report's authors, and he joins us to share their findings. The original research can be found here: https://arxiv.org/pdf/1905.12593.pdf Learn more about your ad choices. Visit megaphone.fm/adchoices

Transcript
Starting point is 00:00:00 You're listening to the Cyber Wire Network, powered by N2K. Like many of you, I was concerned about my data being sold by data brokers. So I decided to try Delete.me. I have to say, Delete.me is a game changer. Within days of signing up, they started removing my personal information from hundreds of data brokers. I finally have peace of mind knowing my data privacy is protected. Delete.me's team does all the work for you with detailed reports so you know exactly what's been done. Take control of your data and keep your private life private. Go to JoinDeleteMe.com slash N2K and use promo code N2K at checkout. The only way to get 20% off is to go to JoinDeleteMe.com slash N2K and enter code N2K at checkout. That's JoinDeleteMe.com slash N2K, code N2K. Hello, everyone, and welcome to the CyberWire's Research Saturday.
Starting point is 00:01:36 I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down threats and vulnerabilities and solving some of the hard problems of protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us. And now, a message from our sponsor, Zscaler, the leader in cloud security. Enterprises have spent billions of dollars on firewalls and VPNs, yet breaches continue to rise, with an 18% year-over-year increase in ransomware attacks and a record $75 million ransomware payout in 2024. These traditional security tools expand your attack surface
Starting point is 00:02:19 with public-facing IPs that are exploited by bad actors more easily than ever with AI tools. It's time to rethink your security. Zscaler Zero Trust plus AI stops attackers by hiding your attack surface, making apps and IPs invisible, eliminating lateral movement, connecting users only to specific apps, not the entire network, continuously verifying every request based on identity and context, simplifying security Thank you. your organization with Zscaler Zero Trust and AI. Learn more at zscaler.com slash security. I have a longstanding interest in protecting citizens online. That's Professor Awais Rashid from the University of Bristol. The research we're
Starting point is 00:03:25 discussing today is titled Automatically Dismantling Online Dating Fraud. We have worked for many years, well over 12 to 15 years, on protecting people online such as children, as well as developing technologies to help law enforcement tackling those issues. This particular project came about because there is a colleague in psychology who was then at the University of Warwick, now based in Melbourne, and she was studying romance scams from a psychological perspective. And given our expertise in looking at more computational approaches to automatically detect attempts to victimize online,
Starting point is 00:04:04 there was a common ground. So we started to look at this as a problem and then also collaborated with researchers who were then at University College London. The general premise here was that romance scam is one of the perhaps the more underreported problems in some cases. So if you look at some of the recent data from the FBI, it notes that there is a total loss of $85 million through online romance scams in the US. The numbers could be higher because there is often quite a lot of stigma with people wanting to admit that they have been taken in by this kind of scam. And, you know, and of course, you know, they might be very traumatized to even be willing to share it. So how do online sites up to this point, how do they combat this sort of thing? So online dating platforms, in fairness, do quite a bit of work to protect the users that are on there.
Starting point is 00:04:57 But largely the techniques are manual. manual. So, you know, there are banks of human moderators whose job it is to identify where a scam profile might be created and attempted on the dating site. Of course, if people report particular activity to the dating site, then they act on it. But the key thing here is that the first step in an online romance scammers task is to create a fake dating profile and get it onto a dating site. So in order to protect people, the best thing to do a fake dating profile and get it onto a dating site. So in order to protect people, the best thing to do is to actually try and stop it at that point in time. And that's what dating sites do. But because there is a huge volume of profiles that are created, scammers are getting increasingly sophisticated. And while some of the indicators like using
Starting point is 00:05:40 stolen credit cards and so on are very easy to spot. In some cases, scam profiles are harder to detect for also humans. From our perspective, it is meant as an aid to the work that human moderators do, perhaps moving their attention to other activities so that they can really look at the more sophisticated profiles that might require a lot more human intervention. And some of the sort of stuff that can easily be caught through automation can be caught fairly quickly. What was the goal here of your research? What were you hoping to find here? So our goal was to see if we can develop AI and machine learning based techniques that can
Starting point is 00:06:21 automatically detect romance scammer profiles with a very high degree of accuracy. And the idea being that as people are creating profiles, the system works so that it can automatically flag potentially to a human moderator that something is suspect. Or, you know, depending on how the system is tuned, automatically even reject a profile. Of course, there is a balance here because while the tools are highly accurate, they are not 100% accurate as the nature of machine learning and AI techniques at the moment. So there has to be some kind of human intervention because, of course, you know, we don't want to deny regular users who want to use online dating sites from participating in these platforms. So let's walk through together. There are a number of, I guess, common attributes that are a part of a dating profile. Let's go through those together and talk about how you broke those out and how you analyze them. So we look at a number of indicators within the profile. So we look at what are the demographics information that a dating profile would have. We also look at
Starting point is 00:07:25 the description that the profiles provide about themselves, because there's always a bit of a textual field to say something about oneself on the profile. And then there is, of course, imagery, where people will share images of themselves to show the kind of person that they are. And it's quite interesting that if we look at some of these various indicators, then we can see the system can detect quite a few patterns, which may not immediately be obvious if we look at it just with the naked eye, so to speak. So if we, for example, compare the frequency with which, so we had a dataset from a dating site that has shared manually verified scam profiles online. So we have that data set and we can compare it against real dating profiles. And it's quite interesting to see that in real profiles, we don't see the same frequency with which some of what scammers
Starting point is 00:08:19 claim to be their profession. So scammers, for example, very often will claim to be in the military profession, engineers, business, for instance. And all these, if you then now think about it, you know, logically, all these are used to create a story. So military, you know, the good old sort of, you know, American GI abroad scam, you know, either the luggage is, you know, stuck somewhere and they need money to then move it across customs, or they have been injured or things like that. Again, engineer, business, it sort of shows a successful profile, but also there is always a reason why the scammer can't meet or why they're always traveling and so on. And that's for male profiles. But if you then look at female profiles, it's a very different setup in the sense that often you will see professions such as
Starting point is 00:09:06 student, carer, military also does make an appearance, but sales, but all of these are there to then create a story follow on. So as a student, you know, you potentially run out of money, you know, as a carer, you may need money to help care for someone you're caring about. And that is not to say that we don't see these in real profiles. These professions also exist in real profiles. And of course, you know, people have all sorts of lives. But the frequency with which they appear in scam profiles is much higher than they appear in the real profiles, male or female. And there is also other interesting patterns that we see that, you know, actually scammers
Starting point is 00:09:42 overshare in terms of images. see that, you know, actually scammers overshare in terms of images. So if you are not a scammer, you will share fewer images in the data set on average compared to scammers where they will share almost twice as many images. And it is perhaps a way for them to sort of, you know, try and portray that they have a sort of a very interesting life and lots and lots of interests and to attract potential victims. Now, you looked into the images themselves and found some pretty interesting patterns there as well. Yes, so we can. So our classifiers can automatically classify the images based on, you know, what is the content of the images. But also, if you look at some of the sort of fake versus real images in the scientific paper, there are some interesting examples. So you can see that they are often scammers would take
Starting point is 00:10:31 real people's images from online sites, and then change that, they will alter them. So they will superimpose someone else's head on. And there are two that are really interesting. And in one is, you know, in the original, there is a man on a hospital bed and has an injured leg. But in the fake image, which obviously at some point a scammer has shared with a victim to say, look, I'm injured and I'm in hospital and I need money. It's just the same image, but it has a woman's head imposed on it. And if you look at it cursorily, you know, on perhaps a smaller screen, it's quite easy to miss. But if you sort of, you know, blow it up, or if you can automatically analyze it, then you can see that actually it has been doctored. And talking about doctors,
Starting point is 00:11:13 the other one, which is quite interesting, is that one of the scam images shows a scammer, you know, with his friends as a doctor with sort of nurses and doctors in a hospital. But actually the image is taken from a TV show. So they basically just replaced the head of the lead actor with someone else's head. May or may not be a real, real, the scammers own imagery, unlikely to be their own imagery. But, you know, and again, you know, if you don't know,
Starting point is 00:11:42 and it's a show based in the UK, I don't know how popular it is in North America. And if you've not seen it, you know, it might look perfectly legitimate to a user. But if you have seen the show, then it is quite obvious. So there are these kinds of patterns, which we can see, but, you know, they look sort of, you know, legitimate to, you know, sort of at a quick glance. And when people are sort of glancing through these things, it is quite, quite easy to miss some of these things. Now, you did work of using automation to categorize these images. Did you also do work of reverse image searches? You're looking for, for example, like stock images that people pretended to be profile images? No, so we didn't particularly look at that, but there is other work which looks at doing this kind of reverse image search, and we can easily utilize these kind of techniques. But our focus was specifically to look at can we actually classify as something comes
Starting point is 00:12:39 in fairly quickly as to whether it is a real or a fake profile. And what did that reveal, the classifications that you assigned to things? What was the result of that work? We analyzed basically the demographic data. We analyzed the images based on the kind of content and features they're conveying. And we analyzed the profile descriptions using textual analysis and language analysis techniques. All of these use different types of machine learning technologies. So we use a combination of structured, unstructured, as well as deep learning techniques. And then we combine the outputs of the different classifiers and we tested kind of various functions to find what was the optimal one. And we can,
Starting point is 00:13:21 with a 97% accuracy, detect that an incoming profile is a scammer profile. But of course, as I said earlier on, you know, it's not 100% accurate. So there are, you know, of course, false positives in that regard. For the real life human beings who are assigned to sort through these things, if they're only tasked with then having to really take a close look at 3% of them, that really lightens the load for them and allows them to be more accurate there. Absolutely. And I think the key here is to help the human moderators by reducing the workload on them. Again, the scammers will keep trying. So if a moderator would reject the profile, it doesn't really take them very long to create another one and another one and another one and so on. Because ultimately, some of these things are automated in terms of trying to sign
Starting point is 00:14:18 up and things like that. So there's lots of different techniques that scammers also use to get themselves onto the dating site. So the more we can help the human moderators, the better it is. There is also potentially other applications in the sense that one can envisage this being a browser plugin that users use at their own, and they can use it to sort of see if the profiles that they're viewing are potentially scammer profiles. But there are, of course, sort of other issues there because one needs to be very careful because, you know, romance is a very, very personal interaction.
Starting point is 00:14:53 And one needs to be careful that, you know, people might start to sort of completely believe an AI technique. And what we do not want to do is to accuse, you know, perfectly legitimate users of being scammers. Right. Yeah. What if I actually am a military handsome man overseas who is looking for a relationship? Absolutely. And unfortunately, you know, with the high profile of that particular type of scam, it's a hard job at the moment in that regard to convince people that you're for real. But there are often telltale signs, you know, and these kind of tools are one thing in our set of tools to try and detect and prevent online romance scams. But, you know, of course, with the tools not being 100% accurate, you know, some of the profiles will get through.
Starting point is 00:15:44 Human moderators are ultimately, you know, human and may miss profiles. But there are telltale signs that users can use to protect themselves. You know, there are typical tactics. You know, scammers will try to take people off the dating site quickly. Now, again, dating sites are very unique in that sense that they are designed for strangers to talk to strangers. Okay. So you are effectively for strangers to talk to strangers. Okay.
Starting point is 00:16:08 So you are effectively by signing up to a dating site, you are saying you are actually happy to receive unsolicited messages because that's how people get in touch with you. And then once, if you've found someone you are interested in, then people do not really want to communicate through the dating site. You know, initially they might do a little bit of communication, but, you know, ultimately, you know, they would want to sort of talk to each other directly. And scammers actually utilize that as a tactic to get people
Starting point is 00:16:30 quickly off the dating site, because now there is no trail of what's going on in any shape or form. But there are other telltale signs if they are always unable to meet, if they are starting to ask for money. And some of the profiles that we looked at, there are very specific kind of language. So they often appeal to people's idealized sense of romance. So the work that our colleague in the project, Monica Witte, she did from a psychological perspective shows that people who fall victim to these kind of scams have often an idealized notion of what a romance is, that there is one person for everyone. And if you look at some of the fake
Starting point is 00:17:09 profiles, they play on that. So, you know, one example that we have is, you know, someone saying, you know, I'm actually a widower. Many a times, you know, scammers would pretend to be widowers as well, you know, and they say, you know, ever since my wife passed away, I've been celibate, you know, there was one person for me. And now I have found this kind of new person, and I want to grow old with you, and so on, and so forth, and things like that. So they appeal to that kind of idealized notion of romance. And there is those kinds of signs as well. There is also, you know, sort of often sort of they're very kind of forceful, I don't want to say forceful, I guess sort of very exaggerating in the way they want to sort of engage with people, you know, so the female profile, but talking about having
Starting point is 00:17:51 a great friendship, you know, and they will sort of encourage people to get in touch with personal email or personal messaging apps so that they can take them off the website. Now, one of the things you looked into that I found fascinating was you use natural language processing to actually analyze the number of words and the types of words that these scammers were using. And you found some real differences between the types of things that real people say and the things that the scammers say. Yes.
Starting point is 00:18:23 And that's actually a great question, because similar to people oversharing in terms of images, scammers also write more. So the average number of words that scammers would use in their profiles is well over 100, while users would normally write more brief and pithy descriptions of themselves, you know, around about 50 words less than scammers in general. Also, scammers often considerably more refer to emotions, both positive and negative. So they try to evolve this emotive response from the users, they will appeal to often sense of family, friendship, and the sort of provides sort of a certain sense of certainty. And if we contrast that with real users, then real users tend to often, for example, focus
Starting point is 00:19:11 more on their motives and drivers. They would talk about work, leisure, and those kinds of things. And also, the other interesting thing is that scammers use more formal language forms, while general users will display more informal language, such as, you know, Netspeak or short messaging speak. Scammers tend to often be more formalized, and it may often be that they're perhaps based outside the Western countries. Most of the scammers would target North America, but also Europe, so on. So perhaps they're not used to the vernacular of the particular country that they're targeting.
Starting point is 00:19:46 Can you give us an overview of what was the tech that you were using under the hood to combine this data that you gathered and then come up with your conclusions and end up with this high degree of accuracy? If you think at a very high level, how the process works is that we take in raw description of our profile. This comes from a dating site and their associated published scammer profiles that they have done. From there, we look at three different elements. We look at demographics, which is effectively occupation, gender, age, any other sort of data that is in the form fields. We look at images themselves, and then we look at the
Starting point is 00:20:26 descriptions in terms of the profiles, which is raw text. Each of these is then cleaned up. So we would normalize the data. We take, for example, if someone has said their occupation is a housewife, then we would normalize it to home and wife. If in images, we can classify things like, you know, there is a woman in a dress man playing rugby and so on and so forth. And we were earlier talking about sort of, you know, profile descriptions. And there we would analyze all the textual features by using natural language processing techniques. And then we extract the various features from these. So there are numerical features, there are particular categories that we would utilize. And for natural language processing, we found that n-grams, so portions of words based on three or four letters, were good indicators. We then feed them into a range
Starting point is 00:21:18 of different classifiers and then combine the outputs of the classifiers using a function to which we then test different weightings as to what works. How much adjustment and tweaking did you do along the way to come to the degree of accuracy that you have here? So that's the challenge with machine learning and AI techniques, that you have to sort of see which techniques work well. So we tried, of course, a range of different classifiers to see which one would perform better on particular types of data. We had some ideas, but we, of course, have to try and see which ones perform better on different types of data and then evaluate them on different types of test sets.
Starting point is 00:22:03 And once we had done that for each of the three different subcategories, so the profile description, the image, and the demographics, then once we knew which ones were the classifiers we wanted to utilize there, then we train an ensemble classifier. And then we test that as to how accurate that ensemble is going to be. So there is quite a lot of testing that goes on to reach a degree of confidence. Now, I have to hear a caveat that while the profiles we use are fairly representative of the profiles that are used in platforms in industry, but the profiles we use are from one particular platform that is open. And the reason we use them is that they have also made available a big data set, a reasonably sized data set, I should say, of scam profiles. So we have a direct comparison. We know which profiles are fake, and we also know which profiles are real on the assumption that human moderators miss nothing. So generalizing the results beyond the data set at this point in time would not be scientifically valid. Naturally, larger trials are needed on a number of other dating platforms and the kind of profiles that they see
Starting point is 00:23:17 to test if the classifiers we have trained work as effectively, what are the limitations of the classifiers? Would more fine-tuning be required before we can have confidence that this would work on a very, very large scale? This is effectively a fundamental early-stage research that needs much more further validation. So where would you like to see it go next? Would you like other folks to build off of the research you've done here? Absolutely. So our tools are publicly available. So the source code is publicly available.
Starting point is 00:23:51 We haven't shared the data for the simple ethical reason that because we use also, while publicly available, real users' profiles as well, we don't store any of that data on the basis that if users withdrew their profile from the platform, then it shouldn't live on in our data set. So the only thing that we share is as to how to collect the data, the scripts that we use, so that can be validated. And yes, of course, the idea here is really that others are welcome to take these on and build more.
Starting point is 00:24:23 We are ourselves, of course, interested in working with other platforms to do larger scale trials. As I mentioned earlier, we are also discussing as to the feasibility of perhaps a browser plugin that users can use themselves. But the key is how do we communicate that outcomes to the users so that they don't attach. is how do we communicate that, the outcomes to the users so that they don't attach, they are aware that the tools are not 100% accurate and hence don't attach a complete confidence to them, but equally do not completely ignore them. And then there is this fundamental question,
Starting point is 00:24:55 should we take just a better safe than sorry approach and just say, even if there is a small false positive range, we should just accept that it's better to be safer than sorry. And those are the more sort of fundamental research questions that really need to be investigated before we can sort of, you know, have more insights into how effective these techniques are on a large scale.
Starting point is 00:25:22 Our thanks to Professor Awais Rashid from University of Bristol. The research is titled Automatically Dismantling Online Dating Fraud. We'll have a link in the show notes. And now a message from Black Cloak. And now, a message from Black Cloak. Did you know the easiest way for cybercriminals to bypass your company's defenses is by targeting your executives and their families at home? Black Cloak's award-winning digital executive protection platform
Starting point is 00:25:56 secures their personal devices, home networks, and connected lives. Because when executives are compromised at home, your company is at risk. In fact, over one-third of new members discover they've already been breached. Protect your executives and their families 24-7, 365, with Black Cloak. Learn more at blackcloak.io. The Cyber Wire Research Saturday
Starting point is 00:26:23 is proudly produced in Maryland out of the startup studios of DataTribe, where they're co-building the next generation of cybersecurity teams and technologies. Our amazing Cyber Wire team is Elliot Peltzman, Puru Prakash, Stefan Vaziri, Kelsey Bond, Tim Nodar, Joe Kerrigan, Carol Terrio, Ben Yellen, Nick Valecki, Gina Johnson, Bennett Moe, Chris Russell, John Petrick, Jennifer Iben, Rick Howard, Peter Kilpie, and I'm Dave Bittner. Thanks for listening.
