Heroes in Business - Experian Identity Report with Kevin Chen, Senior Vice President and Chief Data Scientist for Experian DataLabs in North America
Episode Date: November 22, 2022
Experian Identity Report with Kevin Chen, senior vice president and chief data scientist for Experian DataLabs in North America, interviewed by David Cogan, famous host of the Heroes Show and founder of... Eliances entrepreneur community. The Experian DataLabs are at the forefront of the company’s efforts to scan the horizon for opportunities to disrupt and transform the business with data. One of those areas is how privacy-preserving technologies are being used to protect users and businesses.
Transcript
Up in the sky, look, it's captivating, it's energizing, it's Eliances Heroes.
Eliances is the destination for entrepreneurs, investors, CEOs, inventors, leaders, celebrities, and startups, where our heroes in business align.
Now, here's your host flying in, David Cogan, founder of Eliances.
Yes, that's right. And again, I'm so excited today. Why?
Because we have the Experian Identity Report and we're going to be speaking with Kevin Chen.
Now, who's Kevin? He is the Senior Vice President and Chief Data Scientist for Experian DataLabs in North America.
You can reach him at Experian.com, E-X-P-E-R-I-A-N.com.
Now, the Experian DataLabs are at the forefront of the company's efforts to scan the horizon for opportunities to really disrupt and transform the business with data.
And one of those areas that we're going to speak with Kevin about today is how privacy-preserving technologies are being used to protect users and businesses.
And this is everything, by the way, from streaming services to grocery stores to online retailers. Consumers' usage data is captured and analyzed. Think of how much data that is.
And then that's used to provide customized recommendations that encourage continued patronage. And again, most of us can appreciate and see the value of these personal recommendations.
I know I do. I love when I go to sites and it recommends other things, and the next thing I know, I'm buying a ton more.
However, on the other hand, data must be kept private, especially in the increasingly digitized economy where data is shared and fraudsters are on the prowl.
So with that, Kevin, let's go right into it. Why is there such demand for privacy-preserving technologies?
Yeah, sure. Thank you, David. Well, as you pointed out, everybody knows that data is the oil of the 21st century.
So over the past 20 years, we have seen an explosion in the use of consumers' data to provide customized, personalized services, as you pointed out.
And we know that consumers love it, because by doing so, a company is able to help consumers get the products they need, reduce their frustration, and also increase their loyalty to the company.
So, you know, from the business point of view, businesses are always on the lookout for new data, additional data about a consumer.
And as a matter of fact, one trend that we observe is that many businesses have exhausted their knowledge about a consumer from their own first-party data.
So they are starting to move into using third-party data, data from other companies about a consumer, so that they can get better insight about that consumer.
So from that perspective, as you know, when a company wants to use data from other companies, there's always this trust issue, right?
You may wonder: if I were to provide my data to the other companies, obviously with the consent of the consumers, how would the other company use the data?
Would they keep the data secure?
Would they keep the data private?
And also, would they use it only for the intended purpose, or would they use it for competitive purposes?
So there's this trust issue between the companies.
And then on the other hand, given the tremendous value of the data, there's a significant increase in data breaches.
We have all heard about those data breaches coming from the telecom industry, the financial industry, e-commerce, and so on.
And the volume is actually very large, right? So each time a data compromise or data breach happens, it must cause the consumer to wonder what's happening to my data, whether my data is kept private or not, who owns my data, and so on.
So again, there's this trust issue between the consumer and the business.
So therefore, we need to find a way to balance data security, data privacy, and also the convenience that the business is able to provide to consumers by leveraging that consumer data.
And that's where, you know, you mentioned the privacy-preserving technology. That's why privacy-preserving technology is on the rise, trying to restore this trust between the consumer and the businesses that store and use the consumer's data.
And there's that trend toward data anonymization, right?
Really, where personally identifiable information is encrypted, then removed, reducing the chance, right, of identity theft and fraud.
Is that correct?
That's totally right.
Personally identifiable information, or PII as we call it, refers to, say, name, address, phone number, social security number, et cetera.
And so the assumption here is that if the consumer's data does not have PII associated with it, then even if a fraudster gets hold of the consumer's data, there's really not much the fraudster can do with it.
For example, if the fraudster got my income information and other sensitive information, but they don't know my personally identifiable information, they will not be able to leverage that information to commit fraud.
So to alleviate all those risks, at minimum, the business should always encrypt the PII data. That's the minimum. But the business should also try to separate the PII data from the consumer's data by creating a linking key to reduce the risk.
But I just want to mention, though, even with the PII removed, or what you call anonymization, it's not always guaranteed that the data is safe and private.
I want to give you an example. Actually, there's a famous piece of research back in 2006 by two researchers at UT Austin.
At the time, Netflix released movie rating data from its users, right?
It's about 10 million records by 500,000 consumers.
Obviously, Netflix removed all the PII and then released it to the public, hoping the public could help them improve their movie recommendations.
Now, these two researchers, by using consumers' information from the so-called Internet Movie Database and correlating the ratings as well as the timestamps, were able to so-called re-identify a majority of the consumers in the data set.
So anonymization is not necessarily safe.
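To make the two ideas above concrete, here is a minimal Python sketch of separating PII from behavioral data with a random linking key, followed by the kind of linkage that can undo it. Everything in it is an assumption for illustration: the records, the names `split_pii` and `reidentify`, and the matching thresholds are hypothetical and are not taken from the Netflix data or the researchers' actual method.

```python
import secrets
from datetime import date

# Hypothetical raw records: identity and behavior stored together.
RAW = [
    {"name": "Alice", "ratings": [("Movie A", 5, date(2005, 3, 1)),
                                  ("Movie B", 2, date(2005, 3, 4)),
                                  ("Movie C", 4, date(2005, 4, 10))]},
    {"name": "Bob", "ratings": [("Movie A", 3, date(2005, 3, 2)),
                                ("Movie D", 5, date(2005, 5, 1))]},
]

# Hypothetical public reviews posted under real names on another site.
PUBLIC = {
    "Alice": [("Movie A", 5, date(2005, 3, 2)),
              ("Movie C", 4, date(2005, 4, 11))],
}


def split_pii(records):
    """Store PII and behavioral data separately, joined only by a random key."""
    vault, released = {}, {}
    for rec in records:
        key = secrets.token_hex(8)       # linking key carries no identity
        vault[key] = rec["name"]         # kept private (and ideally encrypted)
        released[key] = rec["ratings"]   # the "anonymized" data set
    return vault, released


def reidentify(released, public, max_day_gap=3, min_matches=2):
    """Guess identities by matching title, rating, and approximate date."""
    guesses = {}
    for key, ratings in released.items():
        for name, reviews in public.items():
            hits = sum(
                1
                for title, stars, day in reviews
                if any(title == t and stars == s
                       and abs((day - d).days) <= max_day_gap
                       for t, s, d in ratings)
            )
            if hits >= min_matches:
                guesses[key] = name
    return guesses


vault, released = split_pii(RAW)
guesses = reidentify(released, PUBLIC)
for key, name in guesses.items():
    print(f"released row {key} looks like {name}; vault says {vault[key]}")
```

The released rows contain no names, yet two overlapping ratings with close dates are enough for the sketch to tie one row back to a person, which is the point of the example in the conversation.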
Okay.
And another example I can give you, right, another way to preserve privacy: the common thinking is that as long as I aggregate the data to a larger population, then I should be safe, right?
And that's the approach, for example, the Census Bureau has taken. They will aggregate data to, say, block level or a larger level, say, zip level, so that the data is supposed to be secure and also private.
But that's not always the case.
It's safe for the Census Bureau. However, in a scenario where you are able to query that aggregation service many, many times with different criteria, it's possible to triangulate whether a consumer's data is in the data set or not by asking different questions, right?
So that's why we still need to look into using different privacy-preserving technologies, such as, for example, data clean rooms or synthetic data and many others, to preserve privacy.
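The repeated-query risk mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four records, the zip code, and the `count` and `total_income` functions are hypothetical stand-ins for an aggregate-only query service, not a description of any real system.

```python
# Hypothetical microdata held behind an aggregate-only query service.
PEOPLE = [
    {"zip": "85001", "age": 34, "income": 52000},
    {"zip": "85001", "age": 41, "income": 61000},
    {"zip": "85001", "age": 67, "income": 48000},
    {"zip": "85001", "age": 29, "income": 75000},
]


def count(predicate):
    """Aggregate-only interface: returns how many rows match, never the rows."""
    return sum(1 for p in PEOPLE if predicate(p))


def total_income(predicate):
    """Aggregate-only interface: returns a sum over the matching rows."""
    return sum(p["income"] for p in PEOPLE if predicate(p))


def everyone_in_zip(p):
    return p["zip"] == "85001"


def everyone_but_target(p):
    # Same group, minus anyone matching what the attacker already knows
    # about the target (here, an age of 67).
    return p["zip"] == "85001" and p["age"] != 67


# Two harmless-looking aggregate queries, subtracted from each other.
target_present = count(everyone_in_zip) - count(everyone_but_target)
leaked_income = total_income(everyone_in_zip) - total_income(everyone_but_target)

print(target_present)  # 1 means exactly one matching person is in the data set
print(leaked_income)   # that person's income, recovered from sums alone
```

Each answer on its own covers a group, but the difference between two overlapping groups isolates a single person. Defenses such as minimum group sizes, query auditing, or added noise are aimed at this pattern.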
Yeah, and I definitely want to ask you about that.
So thank you for bringing that up.
And again, you're watching, listening to me, David Cogan, host of the Eliances Heroes Show.
Make sure that you go to eliances.com.
That's E-L-I-A-N-C-E-S.com, the only place where entrepreneurs align.
You can also check out past Experian episodes by going to eliancer.com.
That's E-L-I-A-N-C-E-R.com.
We have with us again, Kevin Chen, Senior VP and Chief Data Scientist for Experian DataLabs.
And of course, you can reach him by going to Experian.com.
So Kevin, let's go into that, because you mentioned data clean rooms. Explain to us, though, how new technologies like the data clean rooms that you mentioned, and synthetic data also, are being used to protect an individual's personal data, and also at the same time, right, deliver that personalized consumer experience that really we all want.
Right.
So what a data clean room is trying to do is provide a safe environment where companies and businesses can come together and explore the value and the benefit of each other's data.
A data clean room is just like a physical clean room, but instead of trying to make sure the environment is not contaminated, a data clean room focuses on keeping the users' data private and separate.
In a data clean room scenario, all the participating parties can set the parameters in terms of what information from the users' data can be seen by the others.
And in the data clean room, the consumers' data is oftentimes anonymized, as you mentioned earlier, and aggregated and organized into, say, groups and cohorts in a very controlled manner to preserve privacy.
It may leverage special hardware, such as a trusted execution environment, or leverage data permissioning and partitioning, as well as specially designed SQL languages, to enable that privacy preservation.
And it's oftentimes orchestrated by the data clean room operators to ensure privacy.
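As a rough illustration of the cohort-level controls just described, the Python sketch below joins two parties' data on a pseudonymous key and releases only aggregates for cohorts above a minimum size. It is a simplified assumption of how such a rule could look, not how any particular clean room product works: the names `pseudonym`, `cohort_report`, and `MIN_COHORT_SIZE`, the shared salt, and the sample data are all hypothetical, and a salted hash of an email is a weak pseudonym used here only to keep the example short.

```python
import hashlib
from collections import defaultdict

MIN_COHORT_SIZE = 3  # assumed policy: smaller cohorts are never released


def pseudonym(email, salt="shared-clean-room-salt"):
    """Replace an email with a salted hash so parties can join without raw PII."""
    return hashlib.sha256((salt + email.lower()).encode()).hexdigest()


# Party 1 (hypothetical retailer): spend per pseudonymous customer.
retailer_spend = {pseudonym(e): s for e, s in [
    ("a@example.com", 10.0), ("b@example.com", 20.0), ("c@example.com", 30.0),
    ("d@example.com", 40.0), ("e@example.com", 50.0),
]}

# Party 2 (hypothetical streaming service): favorite genre per customer.
streaming_genre = {pseudonym(e): g for e, g in [
    ("a@example.com", "drama"), ("B@example.com", "drama"),
    ("c@example.com", "drama"), ("d@example.com", "comedy"),
    ("e@example.com", "comedy"), ("g@example.com", "drama"),
]}


def cohort_report(spend_by_id, genre_by_id, min_size=MIN_COHORT_SIZE):
    """Join inside the clean room; release only cohort-level aggregates."""
    cohorts = defaultdict(list)
    for pid, spend in spend_by_id.items():
        if pid in genre_by_id:            # only customers both parties know
            cohorts[genre_by_id[pid]].append(spend)
    return {
        genre: {"members": len(spends), "avg_spend": sum(spends) / len(spends)}
        for genre, spends in cohorts.items()
        if len(spends) >= min_size        # suppress small, identifying cohorts
    }


report = cohort_report(retailer_spend, streaming_genre)
print(report)
```

In this sketch neither party sees the other's rows. The only output is the average spend of the "drama" cohort, while the two-person "comedy" cohort is withheld because it falls below the assumed threshold.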
So with that guarantee, the companies can explore the data from the other parties and then start to perform analysis, build machine learning models, and figure out how to segment their customers into different populations so that they can provide much more personalized services.
And then with that learning, they can take that segmentation outside of the data clean room to provide services to the customers without compromising the consumers' privacy. So that's how the data clean room works.
This is great. Great information. Very valuable. And again, make sure you go to experian.com. Excellent. Excellent. Well, boy, this is such valuable information. We've got to make sure that we have you come back. Again, you're watching, listening to David Cogan, me, that's right, host of the Eliances Heroes Show. And thank you again to Kevin Chen, Senior Vice President and Chief Data Scientist for Experian DataLabs in North America.
Make sure that you go to Experian.com and make sure you stay tuned each week for the Experian Identity Report.
Thank you so much again for being here today.
Thank you, David.