LPRC - CrimeScience – The Weekly Review – Episode 231 Ft. Sam Yeung, PhD
Episode Date: February 20, 2026
In this episode of the LPRC CrimeScience Podcast, host Tiffany Frison sits down with Dr. Sam Yeung to unpack his latest R2P, Decoding the Scam: How Narrative DNA Protects the Modern Retailer. Discover how text analytics and cognitive science are helping retailers uncover hidden scam patterns, from impersonation tactics to gift card and bitcoin schemes. Tune in to learn how better data, smarter analysis, and standardized reporting can strengthen your fraud prevention strategy.
Transcript
Hi everyone and welcome to Crime Science. In this podcast, we explore the science of crime and the
practical application of this science for loss prevention and asset protection practitioners,
as well as other professionals. Welcome, welcome, welcome to the LPRC Crime Science podcast.
I am your host today. My name is Tiffany Frison and I am joined by Sam Yeung. He is one of our
PhD research scientists here at LPRC.
Sam, I don't know.
I will note that Sam is the facilitator of the retail fraud working group, so you can catch
him on there.
But today, we're going to be talking about Sam's most recent R2P, Decoding the Scam:
How Narrative DNA Protects the Modern Retailer.
And before I get going with questions
and everything, Sam, do you want to do a little hello, shout-out to everyone?
Hi, hello everyone. My official name is Camlung Yeung, or you can call me Sam.
I am a cognitive psychologist by training. My PhD is in cognitive psychology.
I'm also a data scientist. I'm the facilitator for both the data and analytics working group and the
retail fraud working group.
I like to scrape data, collect intelligence, and analyze data, particularly text.
So that's me.
Awesome.
Thank you, Sam.
Yeah, so today we're going to be talking about your most recent R2P, Decoding the Scam.
So to get started off, do you want to just describe the report and the project in
general, just give us an overview.
Yeah, sure.
This really was a request from one of our retail partners who recently experienced a substantial
volume of different types of scams.
They provided a data set that includes a fairly detailed text description
of each incident.
I think they have about 160 incidents,
and the length of each incident is
about 180 words.
So it's fairly detailed.
If a human is to process all the text,
180 times 160, it's still not too bad if you read it as a novel.
But if you want to pick out the pattern across all the incidents,
then you really have to keep track of everything
that you have come across, and then do the tally and do the count.
That will be a fairly difficult task.
On an even larger scale, it will be impossible because humans just do not have the attention for that.
So this is the use of computational data skills to try to extract the patterns across all these incidents to see what the common themes are.
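[Editor's sketch] The tallying Sam describes (roughly 160 incidents of about 180 words each, or close to 29,000 words) is the kind of bookkeeping a few lines of Python can do. The example below is a minimal sketch, not the method used in the report: the incident narratives are invented for illustration, and the theme keywords in `THEMES` are assumptions.

```python
from collections import Counter

# Hypothetical, hand-written narratives; not taken from the retailer's data.
incidents = [
    "Caller said he was an FBI investigator and told the associate to buy gift cards.",
    "Caller claimed to be a loss prevention investigator; cash was taken to a bitcoin machine.",
    "Man claiming to be FBI asked the associate to read gift card numbers over the phone.",
]

# Assumed theme keywords; a real project would refine these during exploration.
THEMES = {
    "fbi": ["fbi"],
    "loss_prevention": ["loss prevention"],
    "gift_card": ["gift card"],
    "bitcoin": ["bitcoin"],
}


def tally_themes(texts, themes):
    """Count how many incidents mention each theme at least once."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for theme, keywords in themes.items():
            if any(keyword in lowered for keyword in keywords):
                counts[theme] += 1
    return counts


theme_counts = tally_themes(incidents, THEMES)
print(theme_counts.most_common())
```

Counting incidents rather than raw keyword hits keeps one long narrative from dominating the tally.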
Awesome. Thank you for that, Sam.
And so can we just kind of go over how this research was conducted, just generally?
Yeah, sure.
It really started as exploratory data analysis for text.
So I tried to get the common variables that the text
has mentioned, such as the gender of the potential offender, the tone or sound of their voice,
the pretended role the bad guys are playing, as well as the cash deposit
or gift card deposit method. So there are different types of variables as I try to explore
the data, maybe 100 or more than 100.
And then I tried to trim them down to the most relevant
variables, from hundreds to maybe twenty or so,
so that they will be most relevant to retail loss prevention investigations.
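[Editor's sketch] The episode does not say how the hundred or so candidate variables were narrowed to about twenty. One plausible first screen, shown here purely as an assumption, is to keep only the variables that are actually filled in for a reasonable share of incidents. The function name `select_variables`, the 0.5 threshold, and the sample records are all hypothetical.

```python
def select_variables(records, min_coverage=0.5):
    """Keep variables that are filled in for at least min_coverage of incidents."""
    total = len(records)
    names = sorted({name for record in records for name in record})
    kept = []
    for name in names:
        # A variable counts as filled when it is present and not empty.
        filled = sum(1 for record in records if record.get(name) not in (None, ""))
        if total and filled / total >= min_coverage:
            kept.append(name)
    return kept


# Hypothetical coded incidents; None means the narrative did not mention it.
records = [
    {"pretended_role": "fbi", "deposit_method": "gift_card", "background_noise": None},
    {"pretended_role": "lp_investigator", "deposit_method": "bitcoin", "background_noise": "office"},
    {"pretended_role": "fbi", "deposit_method": "gift_card", "background_noise": None},
    {"pretended_role": "fbi", "deposit_method": None, "background_noise": None},
]

selected = select_variables(records)
print(selected)
```

Coverage only says whether a variable is usable; whether it matters for an investigation is still a judgment call for the analyst.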
Okay.
And I know there
was the note about the offender's pretend role. Could you just expound on that a little bit?
Yeah, sure. The scammers have to come up with a storyline, so they cannot just make a call and then
tell the associate, just give me the money. They have to come up with a storyline and then the
pretended role.
And so from the analysis, one of the few common themes that emerged from
the data is the FBI investigator.
So, yeah, a government authority, so that the business is under its management or, or, I cannot
find the right word now, but the government
is a significant authoritative figure that the business should obey.
So that's the power difference that they are using.
The other theme that I find interesting is that they pretend to be loss prevention
investigators themselves,
so that the people in the field need to comply with their internal company staff,
in order to make it look legit.
I see.
Very good. And then can we also talk about the type of scams that we were looking at here? Was it phone scams or return scams, or were there any specific types?
So this request from a particular retail partner is largely about phone scams. So most of the 160 incidents are phone-scam based.
And it mostly involved having the associate get a gift card and then read out the numbers to them, or withdraw cash from the cashier and take it to a Bitcoin machine at a certain store.
One is actually one of our retail partners as well, a name brand, a well-known pharmacy chain.
So even the place for making the deposit has a common theme across the 160 incidents.
Okay, gotcha.
Thank you for that.
And then just to keep moving on here, can we go over some of the key findings of this?
I know you touched a little bit on the theme
of the impersonation and everything.
But was there anything else to note on the findings?
The finding is that patterns can be detected using text analysis
that are otherwise very difficult or impossible for a human to keep a tally of.
So as mentioned, the FBI investigator and the internal LP investigator are
common themes that are used.
The other noticeable variable that I find interesting
is the accent of the potential offender,
that there are common themes there,
as well as the age description of the voice,
and the deposit location.
As mentioned, one is a well-known pharmacy chain,
and the other one is
a well-known retail superstore.
So given the size of the data set, we cannot confidently say that it's from one group or two groups, etc.
But if we have national-scale data of this type across multiple retailers, then we will have a lot more data to draw a lot more patterns across different variables. Then, together
with geospatial and temporal information, we can have higher confidence to say that a certain
group is actually offending with similar tactics and a similar physical description, including the voice,
the geolocation, the time, and even the deposit location.
Because criminals are usually specialists rather than generalists,
if their script keeps working,
it is very likely that they would just keep using it rather than changing it.
Right.
Just stick to what is working for the scammers, yes.
Awesome.
And then I did want to talk about the AI chatbots
because you do note that in the report as well.
Can you talk about why those would not necessarily be the best way to work with this data?
In a lot of other cases, an AI chatbot actually might be a suitable tool.
But in this case of text analysis, it might not be the best one, because a newer model,
such as ChatGPT 5.2 or something, or Gemini thinking, just came out within the past
six months. So with the newer capability, even the same prompt, or the instruction where you tell
the chatbot what to do, can be interpreted differently, hence the outcome will be different.
So you're really introducing a lot of different variables there. But if you are in a highly
specialized field such as loss prevention, with a highly specialized type of product such as
drug chain pharmacy or other types of cosmetics,
then you really know the variables that you're looking for.
So you can really standardize the things that you're looking for
and then write it in a programmatic way
so that there will be no ambiguity open for discussion
or interpretation by the chatbot.
That's why it is important to not use
a chatbot platform in this case.
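[Editor's sketch] Writing the extraction "in a programmatic way" can be as simple as a fixed set of rules that turns each narrative into the same structured fields every time it runs. The sketch below assumes two variables mentioned in the episode (pretended role and deposit method); the regular expressions and field names are illustrative assumptions, not the rules used in the report.

```python
import re

# Assumed patterns for illustration only; not the rules used in the report.
ROLE_PATTERNS = {
    "fbi": re.compile(r"\bfbi\b", re.IGNORECASE),
    "lp_investigator": re.compile(r"\bloss prevention\b|\bLP investigator\b", re.IGNORECASE),
}
DEPOSIT_PATTERNS = {
    "gift_card": re.compile(r"\bgift cards?\b", re.IGNORECASE),
    "bitcoin": re.compile(r"\bbitcoin\b", re.IGNORECASE),
}


def first_match(text, patterns):
    """Return the first label whose pattern appears in the text, else None."""
    for label, pattern in patterns.items():
        if pattern.search(text):
            return label
    return None


def extract(text):
    """Turn one free-text narrative into a small structured record."""
    return {
        "pretended_role": first_match(text, ROLE_PATTERNS),
        "deposit_method": first_match(text, DEPOSIT_PATTERNS),
    }


# Hypothetical narrative for demonstration.
narrative = "Caller said he was with the FBI and had the associate buy gift cards."
result = extract(narrative)
print(result)
```

Because the rules are fixed in code, the same narrative always yields the same record, which is the property a prompt sent to a changing chatbot model does not guarantee.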
Gotcha.
And then just because we're running close on time here, I want to get to the takeaways from this
study and this report and potentially where this could go in the future.
So do you have anything on that for this project?
I think the biggest takeaway is that
retailers have to keep
a detailed description of each incident and train their employees to keep standardized records,
such as the object, the person, the place, the time, the description of the potential offender,
the method that they use, etc. So even a 50-word description is better than a single
sentence such as "scammer came and we lost $50 on this incident,"
because the information is essentially lost.
But if we keep a detailed text description,
then now the computer or the large language model
is advanced enough to pick up the nuances across incidents
and can gather intelligence from the text data.
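[Editor's sketch] A standardized record can be expressed as a small data structure so every report captures the same fields. The field names below follow the items Sam lists (place, time, offender description, method) but are otherwise assumptions, as is the use of his 50-word figure as a minimum length check.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class IncidentRecord:
    """Hypothetical standardized incident report; field names are assumptions."""
    occurred_at: datetime
    location: str
    offender_description: str
    method: str
    narrative: str

    def word_count(self) -> int:
        """Number of whitespace-separated words in the narrative."""
        return len(self.narrative.split())

    def is_detailed(self, min_words: int = 50) -> bool:
        """True when the narrative meets the assumed minimum length."""
        return self.word_count() >= min_words


# Invented example of the kind of one-line report the episode warns against.
record = IncidentRecord(
    occurred_at=datetime(2026, 1, 15, 14, 30),
    location="Store 12",
    offender_description="unknown",
    method="phone",
    narrative="Scammer called and we lost $50 on this incident.",
)
print(record.word_count(), record.is_detailed())
```

A length check does not guarantee useful detail, but it flags one-sentence reports for follow-up while the incident is still fresh.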
Awesome.
Yeah, I think that will be very helpful for our members
to take away today.
One last question I did want to get to.
I thought you said that was the last one.
Sorry.
Did I lie?
I'm so sorry.
No, no, no.
I'm kidding here.
That was second to last.
I'll make a caveat there.
So my last question, and this is the last one I promise.
So if someone were to get started creating an analytical pipeline,
where would you suggest that they start?
That's a good question, but it also has a steep learning curve.
I think knowing some programming language will be essential here.
Python is actually one of the easiest ones to learn.
But now with the advance of large language models such as ChatGPT and Gemini,
we do not really need to know the details of the programming language,
but the logic of it.
So knowing the logic of any programming language
would actually help a lot.
Okay.
So that's where to start, folks.
All right.
Before we wrap up,
is there anything else you want to say
about this report, Sam, for our members?
Sure.
If you have interesting text data
that you feel like I can derive
some information or intelligence from for you,
feel free to contact me.
Yeah, you should find my email on the LPRC website.
Thank you very much.
And just in case, if they haven't been able to check the website, it is just Sam at
lpresearch.org, correct?
Yep.
Awesome.
All right.
Well, thank you so much, Sam, for today.
This has been very enlightening.
And thank you to all of our listeners for tuning in today.
Before we go, I just want to plug a few things.
So you can find Sam's report on our Knowledge Center.
We have several other reports on there, obviously, but you can find Sam's report on there.
And if you need any assistance with that, please reach out to myself or any of the LPRC team members and we can get you plugged in.
And thank you for listening to the Crime Science podcast.
We are available on, I think, almost all streaming platforms.
So thank you for listening in,
and we will see you all next time.
Thanks for listening to the Crime Science Podcast,
presented by the Loss Prevention Research Council.
If you enjoyed today's episode,
you can find more crime science episodes
and valuable information at lpresearch.org.
The content provided in the Crime Science Podcast
is for informational purposes only
and is not a substitute for legal, financial, or other advice.
Views expressed by guests of the Crime Science podcast
are those of the authors
and do not reflect the opinions or positions of the Loss Prevention Research Council.
