Heroes in Business - Experian Identity Report with Kevin Chen, Senior Vice President and Chief Data Scientist for Experian DataLabs in North America

Episode Date: November 22, 2022

Experian Identity Report with Kevin Chen, senior vice president and chief data scientist for Experian DataLabs in North America, interviewed by David Cogan, famous host of the Heroes Show and founder of... Eliances entrepreneur community. The Experian DataLabs are at the forefront of the company's efforts to scan the horizon for opportunities to disrupt and transform the business with data. One of those areas is how privacy-preserving technologies are being used to protect users and businesses.

Transcript
Starting point is 00:00:00 Up in the sky, look, it's captivating, it's energizing, it's Eliances Heroes. Eliances is the destination for entrepreneurs, investors, CEOs, inventors, leaders, celebrities, and startups, where our heroes in business align. Now, here's your host flying in, David Cogan, founder of Eliances. Yes, that's right. And again, I'm so excited today. Why? Because we have the Experian Identity Report and we're going to be speaking with Kevin Chen. Now, who's Kevin? He is the Senior Vice President and Chief Data Scientist for Experian DataLabs in North America. You can reach him at Experian.com, E-X-P-E-R-I-A-N.com.
Starting point is 00:00:54 Now, the Experian DataLabs are at the forefront of the company's efforts to scan the horizon for opportunities to really disrupt and transform the business with data. And one of those areas that we're going to speak with Kevin about today is how privacy-preserving technologies are being used to protect users and businesses. And this is everything, by the way, from streaming services to grocery stores to online retailers. Consumers' usage data is captured and analyzed. Think of how much data that is. And then that's used to provide customized recommendations that encourage continued patronage. And again, most of us can appreciate and see the value of these personal recommendations. I know I do. I love when I go to sites and it recommends other things, and the next thing I know, I'm buying a ton more. However, on the other hand, data must be kept private, especially in the increasingly digitized economy where data is shared and fraudsters are on the prowl. So with that, Kevin, let's go right into
Starting point is 00:02:01 it. Why is there such demand for privacy-preserving technologies? Yeah, sure. Thank you, David. Well, as you pointed out, right, everybody knows that data is the oil of the 21st century. So over the past 20 years, we have seen an explosion in the use of consumers' data to provide customized, personalized services, as you pointed out. And we know that consumers love it, because by doing so, a company is able to help consumers get the products that they need, reduce their frustration, and also increase their loyalty to the company. So, you know, from the business point of view, businesses are always on the lookout for new data, additional data about a consumer.
Starting point is 00:02:46 And as a matter of fact, one trend that we observe is that many companies, many businesses, have exhausted what they can learn about a consumer from their own first-party data. So they are actually starting to use third-party data, data from other companies about a consumer, so that they can get better insight about that consumer. So from that perspective, as you know, when a company wants to use data from other companies, there's always this trust issue, right? You may wonder, if I were to provide my data to the other companies with, you know, obviously the consent of the consumers, how would the other company use the data? Would they keep the data secure? Would they keep the data private?
Starting point is 00:03:35 And also, would they use it only for its intended purpose, or would they use it for competitive purposes, right? So there's this trust issue between the companies. And then on the other hand, given the tremendous value of the data, there's a significant increase in data breaches. We have all heard about those data breaches in the telecom industry, the financial industry, e-commerce, and so on. And the volume is actually very large,
Starting point is 00:04:06 right? So each time a data compromise or data breach happens, it must cause the consumer to wonder what's happening to my data, whether my data is kept private or not, who owns my data, and so on. So again, there's this trust issue between the consumer and the business. So therefore, we need to find a way to balance data security, data privacy, and also the convenience that the business is able to provide to consumers by leveraging that consumer data. And that's where, you know, you mentioned the privacy-preserving technology. That's why privacy-preserving technology is on the rise, trying to restore this trust between the consumer and the businesses that store and use the consumer's data. And there's that trend toward data anonymization, right?
Starting point is 00:05:05 Really, where personally identifiable information is encrypted, then removed, reducing the chance, right, of identity theft and fraud. Is that correct? That's totally right. Personally identifiable information, or PII, as we call it, refers to, say, name, address, phone number, Social Security number, and so on. And so the assumption here is that if the consumer's data does not really have PII
Starting point is 00:05:38 associated with it, then even if the fraudster gets a hold of the consumer's data, there's really not much the fraudster can do about it. For example, if the fraudster got my income information and other sensitive information, but they don't know my personally identifiable information, they will not be able to leverage that information to commit fraud. So to alleviate all those risks,
Starting point is 00:06:02 at minimum, the business should always try to encrypt the data, encrypt the PII data. So that's the minimum. But the business should also try to separate the PII data from the rest of the consumer's data by creating a linking key to reduce the risk. But I just want to mention, though, even with the PII removed, or what you call anonymization, it's not always guaranteed that the data is safe and private. I want to give you an example. Actually, there's a famous piece of research back in 2006 by two researchers at UT Austin. At the time, Netflix released movie rating data from its users, right? It's about 10 million records by 500,000 consumers.
Starting point is 00:06:57 Obviously, Netflix removed all the PII and then released it to the public, hoping the public could help them improve their movie recommendations. Now, these two researchers, they were actually able to find out by using the consumer's information from the Internet Movie Database, IMDb,
Starting point is 00:07:18 and correlating the ratings as well as the timestamps, they were able to, so to speak, re-identify a majority of the consumers from the data set. So anonymization is not necessarily safe. Okay. And another example I can give you, right, another way to preserve privacy. The common thinking is that as long as I aggregate the data together into a larger population, then I should be safe, right? And that's the approach, for example, the Census Bureau has taken. They will
Starting point is 00:07:54 aggregate data to, say, block level or a larger level, say, ZIP code level, so that the data is supposed to be secure and also private. But that's not always the case. It's safe for the Census Bureau. However, in a scenario where you are able to query that aggregation service many, many times with different criteria, it's possible to triangulate whether a consumer's data is in the data set or not by asking different questions, right?
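To make the triangulation idea concrete, here is a minimal sketch of what is often called a differencing attack. It assumes a hypothetical service that only ever returns counts and sums, never individual rows. The `Row` type, the `aggregate` function, and every value in `ROWS` are made up for illustration and are not drawn from any Experian or Census Bureau system.

```python
from typing import Callable, NamedTuple, Tuple


class Row(NamedTuple):
    zip_code: str
    birth_year: int
    income: int


# Hypothetical toy data set; all values are invented.
ROWS = [
    Row("92626", 1975, 120_000),
    Row("92626", 1982, 85_000),
    Row("92626", 1990, 62_000),
    Row("92627", 1975, 71_000),
]


def aggregate(predicate: Callable[[Row], bool]) -> Tuple[int, int]:
    """Aggregate-only query: returns (count, total income), never a row."""
    matched = [r for r in ROWS if predicate(r)]
    return len(matched), sum(r.income for r in matched)


# The attacker already knows the target lives in 92626 and was born in 1975.
# Each query below looks harmless on its own.
count_all, sum_all = aggregate(lambda r: r.zip_code == "92626")
count_rest, sum_rest = aggregate(
    lambda r: r.zip_code == "92626" and r.birth_year != 1975
)

# The difference between the two answers isolates a single person.
target_present = (count_all - count_rest) == 1
target_income = sum_all - sum_rest

print(target_present, target_income)
```

Neither query exposes a record directly, yet subtracting one answer from the other reveals both that the target is in the data set and what their income is. This is one reason aggregation alone may not be enough when the same service can be queried repeatedly.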
Starting point is 00:08:28 So that's why we still need to look into using different privacy-preserving technologies, such as, for example, data clean rooms or synthetic data and many others, to preserve your privacy. Yeah, and I definitely want to ask you about that. So thank you for bringing that up. And again, you're watching, listening to me, David Cogan, host of the Eliances Heroes Show. Make sure that you go to eliances.com. That's E-L-I-A-N-C-E-S.com, the only place where entrepreneurs align. You can also check out past Experian episodes by going to eliancer.com.
Starting point is 00:09:08 That's E-L-I-A-N-C-E-R.com. We have with us again, Kevin Chen, Senior VP and Chief Data Scientist for Experian DataLabs. And of course, you can reach him by going to Experian.com. So Kevin, let's go into that, because you mentioned data clean rooms. Explain to us, though, how new technologies like the data clean rooms that you mentioned, and synthetic data also, are being used to protect an individual's personal data while also, at the same time, right, delivering that personalized consumer experience that really we all want. Right.
Starting point is 00:09:54 So what a data clean room is trying to do is provide a safe environment where companies and businesses can come together and explore the value and the benefit of each other's data. A data clean room is just like a physical clean room, but instead of trying to make sure the environment is not contaminated, a data clean room focuses on keeping the users' data private and separate. In a data clean room scenario,
Starting point is 00:10:20 all the participating parties can actually set the parameters in terms of what information from the users' data can be seen by the others. And in the data clean room, the consumers' data is oftentimes anonymized, as you mentioned earlier, and aggregated and organized into, say, groups and cohorts in a very controlled manner to preserve privacy.
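The cohort idea described here can be sketched roughly as follows. This is an illustration under stated assumptions, not how any particular clean room product works: the two parties, the salted-hash linking key, the `cohort_report` function, and the minimum cohort size of 3 are all hypothetical. A salted hash on its own is also weak protection, so a real deployment would layer stronger controls on top.

```python
import hashlib
from collections import defaultdict
from typing import Dict

# Assumption: both parties agree on a secret salt so their keys line up.
SHARED_SALT = b"hypothetical-salt"


def pseudonymize(email: str) -> str:
    """Replace a direct identifier with a linking key (salted SHA-256)."""
    return hashlib.sha256(SHARED_SALT + email.lower().encode()).hexdigest()


# Party A, a hypothetical retailer: linking key -> loyalty tier.
retailer = {
    pseudonymize(email): tier
    for email, tier in [
        ("a@example.com", "gold"),
        ("b@example.com", "gold"),
        ("c@example.com", "gold"),
        ("d@example.com", "silver"),
    ]
}

# Party B, a hypothetical streaming service: linking key -> weekly hours.
streamer = {
    pseudonymize(email): hours
    for email, hours in [
        ("a@example.com", 10),
        ("b@example.com", 14),
        ("c@example.com", 6),
        ("d@example.com", 20),
    ]
}


def cohort_report(min_cohort_size: int = 3) -> Dict[str, float]:
    """Return average hours per tier, suppressing cohorts that are too small."""
    hours_by_tier = defaultdict(list)
    for key, tier in retailer.items():
        if key in streamer:  # join on the linking key only
            hours_by_tier[tier].append(streamer[key])
    return {
        tier: sum(hours) / len(hours)
        for tier, hours in hours_by_tier.items()
        if len(hours) >= min_cohort_size
    }


report = cohort_report()
print(report)
```

Only the "gold" cohort appears in the report, because the "silver" cohort has a single member and is suppressed. Neither party sees the other's row-level data, only the cohort-level averages that the agreed parameters allow.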
Starting point is 00:10:46 It may leverage special hardware, such as a trusted execution environment, or it may leverage data permissioning and partitioning, as well as some specially designed SQL languages, to enable that privacy preservation. And it's oftentimes orchestrated by the data clean room operators to ensure privacy. So with that guarantee, the companies can explore the data from the other parties and then start to perform analysis,
Starting point is 00:11:27 building machine learning models, trying to figure out how to segment their customers into different populations so that they can provide much more personalized services. And then with that learning, they can take that segmentation outside of the data clean room to provide services to the customers without compromising the consumers' privacy. So that's how the data clean room works. This is great. Great information. Very valuable. And again, make sure you go to experian.com. Excellent. Excellent. Well, boy, this is such valuable information. We've got to make sure that we have you come back because, again, you're watching, listening to David Cogan, me, that's right, host of the Eliances Heroes Show. And thank you again to Kevin Chen,
Starting point is 00:12:16 Senior Vice President and Chief Data Scientist for Experian DataLabs in North America. Make sure that you go to Experian.com and make sure you stay tuned each week for the Experian Identity Report. Thank you so much again for being here today. Thank you, David.
