LPRC - CrimeScience – The Weekly Review – Episode 231 Ft. Sam Yeung, PhD

Starting point is 00:00:00 Hi everyone, and welcome to Crime Science. In this podcast, we explore the science of crime and the practical application of this science for loss prevention and asset protection practitioners, as well as other professionals. Welcome, welcome, welcome to the LPRC Crime Science podcast. I am your host today. My name is Tiffany Friesen, and I am joined by Sam Yeung. He is one of our PhD research scientists here at LPRC. I will note that Sam is the facilitator of the retail fraud working group, so you can catch him there.

Starting point is 00:00:44 But today, we're going to be talking about Sam's most recent R2P, Decoding the Scam: How Narrative DNA Protects the Modern Retailer. And before I get going with questions and everything, Sam, do you want to do a little hello, shout-out to everyone? Hi, hello everyone. My official name is Camlung Yeung, or you can call me Sam. I am a cognitive psychologist by training; my PhD is in cognitive psychology. I'm also a data scientist. I'm the facilitator for both the data and analytics working group and the retail fraud working group.

Starting point is 00:01:30 I like to scrape data, collect intelligence, and analyze data, particularly text. So that's me. Awesome. Thank you, Sam. Yeah, so today we're going to be talking about your most recent R2P, Decoding the Scam. So to get started, do you want to just describe the report and the project in general, just give us an overview? Yeah, sure.

Starting point is 00:02:01 This really is a request from one of our retail partners, who recently experienced a substantial volume of different types of scams. They provided a data set that includes a fairly detailed text description of each incident. I think they have about 160 incidents, and the length of each incident description is about 180 words. So it's fairly detailed.

Starting point is 00:02:33 So if a human is to process all the text, 180 times 160, it's still not too bad if you read it as a novel. But if you want to pick out the patterns across all the incidents, then you really have to keep track of everything that you have come across, and then do the tally and do the count. That will be a fairly difficult task. On an even larger scale, it will be impossible, because humans just do not have the attention for that. So this is the use of computational data skills to try to extract the patterns across all these incidents and see what the common themes are.
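
As a rough illustration of the tallying described above, the following Python sketch counts how many incident descriptions mention each theme. It is not the method used in the report: the incident texts, the theme names, and the keyword patterns are all invented here for illustration.

```python
import re
from collections import Counter

# Hypothetical incident descriptions (the real data set is not public).
incidents = [
    "Caller said he was an FBI agent and told the associate to buy gift cards.",
    "A man claiming to be from loss prevention asked the cashier to take cash "
    "to a Bitcoin machine.",
    "Caller posing as an FBI investigator demanded gift card numbers be read "
    "over the phone.",
]

# Hypothetical mapping of theme name -> keyword pattern, fixed in advance.
THEMES = {
    "role_fbi": re.compile(r"\bfbi\b", re.IGNORECASE),
    "role_internal_lp": re.compile(r"\bloss prevention\b", re.IGNORECASE),
    "method_gift_card": re.compile(r"\bgift cards?\b", re.IGNORECASE),
    "method_bitcoin": re.compile(r"\bbitcoin\b", re.IGNORECASE),
}


def tally_themes(texts):
    """Count the number of texts in which each theme appears at least once."""
    counts = Counter()
    for text in texts:
        for theme, pattern in THEMES.items():
            if pattern.search(text):
                counts[theme] += 1
    return counts


theme_counts = tally_themes(incidents)
for theme, n in theme_counts.most_common():
    print(f"{theme}: {n} of {len(incidents)} incidents")
```

The same loop works unchanged on 160 descriptions or on many thousands, which is the scale at which manual tallying stops being practical.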

Starting point is 00:03:21 Awesome. Thank you for that, Sam. And so can we just kind of go over how this research was conducted, just in general? Yeah, sure. It really started as exploratory data analysis for text. So I tried to get the common variables that the text mentions, such as the gender of the potential offender, the tone and sound of their voice, the pretended role the bad guys are playing, as well as the cash deposit or gift card deposit method. So there are different types of variables as I try to explore

Starting point is 00:04:15 the data, maybe 100 or more than 100. And then I try to trim them down to the most relevant variables, from hundreds to maybe 20 or so, so that they will be most relevant to retail loss investigations. Okay. And I know there was the note about the offender's pretended role. Could you just expound on that a little bit? Yeah, sure. The scammers have to come up with a storyline, so they cannot just make a call and then

Starting point is 00:05:05 tell the associate, just give me the money. They have to come up with a storyline and a pretended role. And so from the analysis, one of the few common themes that emerged from the data is the FBI investigator. So, a government authority, so that the business is under its, I cannot find the right word now, but the government is a significant authoritative figure that the business should obey. So that's the power difference that they are using.

Starting point is 00:05:48 The other theme that I find interesting is that they pretend to be loss prevention investigators themselves, so that the people in the field feel they need to comply with their internal company staff. So it's in order to make it look legit. I see. Very good. And then can we also talk about the type of scams that we were looking at here? Was it phone scams or return scams, or were there any specific types? So this request from a particular retail partner is largely about phone scams. So most of the 160 incidents are in the phone scam space. And it mostly involved having the associate get a gift card and then read out the numbers to them, or take cash from the cashier to a Bitcoin machine at a certain store.

Starting point is 00:06:54 One is actually one of our retail partners as well, a name brand, a well-known pharmacy chain. So even the place for making deposits has a common theme across the 160 incidents. Okay, gotcha. Thank you for that. And then just to keep moving on here, can we go over some of the key findings of this? I know you touched a little bit on the theme of the impersonation and everything. But was there anything else to note on the findings?

Starting point is 00:07:40 The finding is that patterns can be detected using text analysis that are otherwise very difficult or impossible for a human to keep tally of. So as mentioned, FBI investigator and internal LP investigator are common themes that are used. The other noticeable variable that I find interesting is the accent of the potential offender, that there are common themes there, as well as the age description of the voice,

Starting point is 00:08:16 and the deposit location. As mentioned, one is a well-known pharmacy chain, the other one is a well-known retail superstore. So given the size of the data set, we cannot confidently say that it's from one group or two groups, etc. But if we have national-scale data of this type across multiple retailers, then we will have a lot more data to draw a lot more patterns across different variables, so that together with geospatial and temporal information, we can have higher confidence to say that a certain group is actually offending with similar tactics, similar physical description, including the voice,

Starting point is 00:09:14 including the geolocation, the time, and even the deposit location. Because criminals are usually specialists rather than generalists, if their script keeps working, it is very likely that they will just keep using it rather than changing it. Right. Just stick to what is working for the scammers, yes. Awesome. And then I did want to talk about the AI chatbots

Starting point is 00:09:49 because you do note that in the report as well. Can you talk about why those would not necessarily be the best way to work with this data? In a lot of other cases, an AI chatbot actually might be a suitable tool. But in this case of text analysis, it might not be the best one, because newer models, such as ChatGPT 5.2 or something, or Gemini Thinking, just came out within the past six months. So with the newer capabilities, even the same prompt, the instruction with which you tell the chatbot what to do, can be interpreted differently, hence the outcome will be different. So you're really introducing a lot of different variables there. But if you are in a highly

Starting point is 00:10:42 specialized field such as loss prevention, with a highly specialized type of product, such as chain pharmacy products or certain types of cosmetics, then you really know the variables that you're looking for. So you can really standardize the things that you're looking for and then write them in a programmatic way, so that there will be no ambiguity open to interpretation by the chatbot. That's why it is important not to use a chatbot

Starting point is 00:11:20 platform in this case. Gotcha. And then just because we're running close on time here, I want to get to the takeaways from this study and this report

Starting point is 00:11:36 and potentially where this could go in the future. So do you have anything on that for this project? I think the biggest takeaway is that retailers have to keep a detailed description of each incident and train their employees to keep standardized records

Starting point is 00:11:56 such as the object, the person, the place, the time, the description of the potential offender, the method that they use, etc. So even a 50-word description is better than a single sentence such as "scammer came and we lost $50 on this incident," because then the information is essentially lost. But if we keep a detailed text description, then the computer or the large language model is now advanced enough to pick up the nuances across incidents and can gather intelligence from the text data.
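
To make the idea of a standardized record concrete, here is a minimal Python sketch of one possible record layout, with the coding rules written out programmatically so that the same description always yields the same coded values. The field names, labels, and keyword rules are assumptions made up for this example; they are not taken from the report.

```python
import re
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class IncidentRecord:
    """Hypothetical standardized incident record (field names are assumed)."""
    incident_id: str
    place: str
    time: str
    description: str
    pretended_role: Optional[str] = None
    deposit_method: Optional[str] = None


# Hypothetical, explicitly ordered rules: the first matching label wins.
ROLE_RULES = [
    ("fbi_investigator", re.compile(r"\bfbi\b", re.IGNORECASE)),
    ("internal_lp", re.compile(r"\bloss prevention\b", re.IGNORECASE)),
]
METHOD_RULES = [
    ("gift_card", re.compile(r"\bgift cards?\b", re.IGNORECASE)),
    ("bitcoin_machine", re.compile(r"\bbitcoin\b", re.IGNORECASE)),
]


def first_match(rules, text):
    """Return the label of the first rule whose pattern matches, else None."""
    for label, pattern in rules:
        if pattern.search(text):
            return label
    return None


def code_incident(record):
    """Fill in the coded fields from the free-text description."""
    record.pretended_role = first_match(ROLE_RULES, record.description)
    record.deposit_method = first_match(METHOD_RULES, record.description)
    return record


record = code_incident(
    IncidentRecord(
        incident_id="demo-001",
        place="Store 12",
        time="2025-01-15T14:30",
        description=(
            "Caller claimed to be an FBI investigator and told the associate "
            "to deposit cash into a Bitcoin machine."
        ),
    )
)
coded = asdict(record)
print(coded["pretended_role"], coded["deposit_method"])
```

Because the rules are fixed in code, rerunning the script on the same records gives identical results, which is the property Sam contrasts with prompting a chatbot whose underlying model may change.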

Starting point is 00:12:40 Awesome. Yeah, I think that will be very helpful for our members to take away today. One last question I did want to get to. I thought you said that was the last one. Sorry. Did I lie? I'm so sorry.

Starting point is 00:12:56 No, no, no. I'm kidding here. That was second to last. I'll make a caveat there. So my last question, and this is the last one, I promise. If someone were to get started creating an analytical pipeline, where would you suggest that they start? That's a good question, but there is also a steep learning curve.

Starting point is 00:13:23 I think knowing some programming language will be essential here. Python is actually one of the easiest ones to learn. But now, with the advance of large language models such as ChatGPT and Gemini, we do not really need to know the details of the programming language, but the logic of it. So knowing the logic of any programming language would actually help a lot. Okay.

Starting point is 00:13:52 So that's where to start, folks. All right. Before we wrap up, is there anything else you want to say about this report, Sam, for our members? Sure. If you have interesting text data that you feel like I can derive

Starting point is 00:14:06 some information or intelligence from for you, feel free to contact me. Yeah, you can find my email on the LPRC website. Thank you very much. And just in case they haven't been able to check the website, it is just Sam at LPresearch.org, correct? Yep. Awesome.

Starting point is 00:14:29 All right. Well, thank you so much, Sam, for today. This has been very enlightening. And thank you to all of our listeners for tuning in today. Before we go, I just want to plug a few things. You can find Sam's report on our Knowledge Center, and we obviously have several other reports on there as well. And if you need any assistance with that, please reach out to me or any of the LPRC team members and we can get you plugged in.

Starting point is 00:15:07 And thank you for listening to the Crime Science podcast. We are available on, I think, almost all streaming platforms. So thank you for listening in, and we will see you all next time. Thanks for listening to the Crime Science Podcast, presented by the Loss Prevention Research Council. If you enjoyed today's episode, you can find more Crime Science episodes

Starting point is 00:15:28 and valuable information at LPresearch.org. The content provided in the Crime Science Podcast is for informational purposes only and is not a substitute for legal, financial, or other advice. Views expressed by guests of the Crime Science Podcast are those of the authors and do not reflect the opinions or positions of the Loss Prevention Research Council.
