The a16z Show - The SSN Breach: What Now?

Episode Date: August 18, 2024

In this episode, we cover the recent data breach of nearly 3B records, including a significant number of Social Security numbers. Joining us to discuss are security experts Joel de la Garza and Naftali Harris. Incredibly enough, Naftali and his team were able to get their hands on the breached dataset and validate the nature of the claims. Listen in as we explore the who, what, when, where, and why, but also how a breach of this magnitude happens and what we can do about it.

Resources:
Read 16 Steps to Securing Your Data (and Life)
Find Naftali on Twitter: https://x.com/naftaliharris
Check out SentiLink: https://www.sentilink.com/

Stay Updated:
Let us know what you think: https://ratethispodcast.com/a16z
Find a16z on Twitter: https://twitter.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Subscribe on your favorite podcast app: https://a16z.simplecast.com/
Follow our host: https://twitter.com/stephsmithio

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

Hosted by Simplecast, an AdsWizz company.
See pcm.adswizz.com for information about our collection and use of personal data for advertising.

Transcript
Starting point is 00:00:01 Hello, everyone. Welcome back to the a16z Podcast. Today, we've got a special episode covering a timely piece of news that, quite frankly, I wish we were not reporting on. In case you missed it, this week there was a reported breach of nearly 3 billion records, and not just any records. Headlines included, quote, "billions of Social Security numbers exposed," or even, quote, "did hackers steal every Social Security number?" Naturally, we wanted to bring in the experts to break down what really happened here and its expected impact. So joining us today are Joel de la Garza and Naftali Harris. Joel is an operating partner at a16z who was previously the chief security officer at Box and, before that, the global head of threat management and cyber intelligence at Citigroup. Naftali, on the other hand, is co-founder and CEO of SentiLink, a company that helps block identity theft and fraud for hundreds of financial institutions at a scale that might make you wince. We verify over a million people every day.
Starting point is 00:01:02 Incredibly enough, Naftali's team was actually able to get their hands on the breached data set, and they were actually in the room as we were recording. So you'll hear Naftali reference them as they were poking and prodding to validate the claims. Listen in as we explore the who, the what, the when, the where, the why, but also how a breach like this happens and what we can do about it. We watch these markets, like, this has been going on forever. You can see the fraudsters talking about this on the forums. Social Security numbers are the kind of things that don't change, right? You get one when you're born and you're stuck with it for a while. If you're in this breach, you're in the breach.
Starting point is 00:01:39 And probably all three of us are, frankly. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures. So Joel, Naftali, this was a pretty crazy week.
Starting point is 00:02:16 I got a Slack from one of our coworkers, Joel, that was like, have you seen this Social Security hack? And I had not at that point. And that is a pretty, you know, frightening message. So why don't we just take a second to recap what actually happened here? What was this breach, and what data was potentially at risk? Yeah, so just when you thought there wasn't any more information to leak out into the world, there's always a surprise: there's still more data to come out.
Starting point is 00:02:41 And so this week we saw there was a third-party company that collects all this information and uses it for things like validating your identity. And so they have your name, your social security number, your address. They also have nicknames. And it seems like they had it for all U.S. citizens as well as all Canadian citizens. So this is larger than just the U.S. So I'm not safe as a Canadian. I can't save you this time, unfortunately.
Starting point is 00:03:05 And so these hackers somehow got hold of this information, and then they tried to sell it on the dark web, and there weren't any takers. Nobody wanted to buy it because, like I said, I thought all this stuff was already public. And so when they couldn't sell it, they just released it for free. And so now there is this file of hundreds of gigabytes out there on the Internet that encapsulates data about all Americans and most Canadians.
Starting point is 00:03:28 So there you go. That's what happened. When I read the articles yesterday, it seemed like this was alleged reporting. People weren't confident that this was billions of data points, including Social Security numbers. How sure are we that this is the data that was hacked? Well, Steph, I can answer that quite confidently, because we actually have it. We found it ourselves on the dark web.
Starting point is 00:03:51 So to fill in a little bit of the timeline here, National Public Data, which is the company that had the breach, reported that the hack itself happened in December of 2023. And then it got released onto the dark web on a site called BreachForums by some hacker named Fenice. If I mispronounced your name, please don't come after me. But that person released it on August 6th. And we got a copy ourselves. And so we looked through it. And yeah, it's as reported.
Starting point is 00:04:18 So there are names, dates of birth, addresses. The data, I would say, is relatively messy compared to some other data breaches that you sometimes see. But no, we're confident. And it's true because we literally have it. Geez. And so when you say it's messy, if there's a name and there's a Social Security number, are those linked? And are those linked to email or any other fields that might be in there?
Starting point is 00:04:39 Yeah, I'll give you an example of the way in which it's messy. So, for example, the first six records all correspond to the same individual, a woman from Alaska. But they have different variants of her name, including nicknames and stuff like that. I believe it's across two different addresses that she had. That's one level of messiness. Another way that the data is messy is that about 10% of the SSNs are obviously fake. Like, they begin with three zeros or four zeros. So the data is not as clean as it could be, which is obviously a good thing.
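The "obviously fake" SSNs Naftali describes can be caught with a purely structural check. What follows is a minimal sketch, not SentiLink's method; the function names are hypothetical. It encodes the Social Security Administration's rules that an area number (first three digits) of 000, 666, or 900 through 999 is never issued, and neither is a group number (middle two digits) of 00 or a serial number (last four digits) of 0000. A check like this only flags numbers that could never have been issued; it says nothing about whether a valid-looking SSN belongs to the person presenting it.

```python
import re

# Nine digits, optionally written with hyphens as AAA-GG-SSSS.
SSN_PATTERN = re.compile(r"^(\d{3})-?(\d{2})-?(\d{4})$")


def is_structurally_valid_ssn(raw: str) -> bool:
    """Hypothetical helper: True if `raw` could be an SSN the SSA has issued.

    Rules: area (first 3 digits) is never 000, 666, or 900-999;
    group (middle 2) is never 00; serial (last 4) is never 0000.
    """
    match = SSN_PATTERN.match(raw.strip())
    if match is None:
        return False  # wrong length or non-digit characters
    area, group, serial = match.groups()
    if area == "000" or area == "666" or area.startswith("9"):
        return False
    if group == "00" or serial == "0000":
        return False
    return True


def fake_fraction(ssns: list) -> float:
    """Share of sampled SSNs that fail the structural check."""
    if not ssns:
        return 0.0
    bad = sum(1 for s in ssns if not is_structurally_valid_ssn(s))
    return bad / len(ssns)
```

Running `fake_fraction` over a sample of rows gives the share of structurally impossible SSNs, which is the kind of figure behind a rough "about 10% are obviously fake" estimate.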
Starting point is 00:05:11 But there's no question. There's a lot of bad stuff in there. You've obviously accessed the dataset. How long did it take you to actually get access to it? Sorry, how long did it take us, guys? Literally on August 6th. My God. Can you guys fucking believe this team?
Starting point is 00:05:25 Unbelievable. Incredibly proud of my team here. We got it, like, the day it was released. And maybe for the listeners, give us a little insight. When you get access to a data set like this, what are you looking at? Right. Because, I mean, obviously this is not your first rodeo. Yeah, when we first get a data set like this, the first thing we're trying to do is just understand what's in it. So we'll take a look at the first couple thousand rows and just understand what fields are present and where it looks like the data set actually came from. How common are the different fields? So, for example, for this particular breach, phone number
Starting point is 00:05:56 is mostly missing. It's mostly name, address, and SSN; date of birth is sometimes in there, sometimes not. By contrast, we looked at the Evolve data breach from about a month or two ago, and that one had information on ACH transactions and balances across different fintechs and stuff like that. And so, you know, that led us down a different sort of path of inquiry. And just for folks listening who aren't spending time on the dark web, how easy is it really to access the data set? It's relatively straightforward if you know where to look. And, like, we're also by far not the only people doing this. I mean, I think as of this morning, we'd seen 26,000 views on BreachForums for the thread. So, like, the fraudster community is looking at
Starting point is 00:06:36 this, and we've seen it there. We've seen it on Telegram. We've seen it on LeakBase. It's just all over the place. And so if you know where to look, it's not that hard. Obviously, folks like the three of us don't do this every day, but for fraudsters or for infosec professionals, you can find it. That's not reassuring, but I mean, it's the answer I expected. I would say that this is probably one of the big wins for sensible regulation around breach disclosures. Having worked in this space since before there were breach disclosure requirements, I think these things were always happening and no one talked about them. And consumers were just oblivious. And I think that knowledge is power, and making consumers aware of what's happened with their data is super important. And this is one of those cases
Starting point is 00:07:15 where I think forcing disclosure around breaches makes the world a safer place and makes people respond to them and handle them in a correct way. We're going to get to how this happens and obviously its impact, but maybe we could just get a sense for scale. I mean, when I heard this, it felt bigger, but I'm a layman. I hear about breaches all the time. And so how would you actually characterize the magnitude or importance of this particular breach? In terms of magnitude, it's 277 gigabytes of data uncompressed, which is a lot. That's across two different files, which total 2.7 billion rows. Now, some of the reports you've seen in the media say, oh, you know, three billion people have had their
Starting point is 00:07:59 identities stolen, which is fortunately not the case. As I mentioned, there are a lot of duplicates there. But there are 2.7 billion records. It's literally a CSV file. And so each row is some different piece of information about an individual. Now, we haven't gone through the full file, but based on sampling, we think approximately a third of the records are unique. And so if you run the math on that, it's high hundreds of millions of people. But again, we're not completely sure because we haven't seen the whole thing. So I'd say hundreds of millions of individuals, confidently, and 2.7 billion records. Joel, you've been working in security for so long. How would you characterize not only the sheer number of records, but maybe the quality of the
Starting point is 00:08:40 information, the particular kind of information? I'm unfortunately probably a little desensitized. I'm only partially being snarky. Like, I do think a lot of this information's already leaked out there. Like we've had multiple breaches of credit reporting agencies. And you have to remember that social security numbers are the kind of things that don't change, right? You get one when you're born and you're stuck with it for a while. And so not through any central repository, but just the breaches over the last 20 years, a lot of this information's already leaked. And so I don't know how unique it is.
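The triage Naftali describes (read the first couple thousand rows, see which fields are present and how often, then estimate how many records are unique) can be sketched with Python's standard `csv` module. This is a hedged illustration: the column names and the choice of (name, date of birth, SSN) as the deduplication key are assumptions, not the actual layout of the breached file. A unique share measured on the head of a file is only a rough guide, because one person's duplicates may sit together (like the six Alaska records) or be scattered far apart. The scale-up itself is simple arithmetic: 2.7 billion rows times one third unique is about 900 million people, consistent with "high hundreds of millions."

```python
import csv
import io
from collections import Counter
from itertools import islice

# Hypothetical dedup key: the breached file's real column names are not known here.
KEY_FIELDS = ("first_name", "last_name", "dob", "ssn")


def profile_sample(lines, max_rows=2000, key_fields=KEY_FIELDS):
    """Profile the first `max_rows` rows of a CSV (an iterable of text lines).

    Returns (fill_rates, unique_fraction):
      fill_rates      -- per column, share of sampled rows with a non-empty value
      unique_fraction -- distinct normalized key tuples divided by sampled rows
    """
    reader = csv.DictReader(lines)
    filled = Counter()
    keys = set()
    n = 0
    for row in islice(reader, max_rows):
        n += 1
        for column, value in row.items():
            # Skip missing cells (None) and overflow cells (lists under the None key).
            if isinstance(value, str) and value.strip():
                filled[column] += 1
        keys.add(tuple((row.get(f) or "").strip().lower() for f in key_fields))
    if n == 0:
        return {}, 0.0
    fill_rates = {column: filled[column] / n for column in reader.fieldnames}
    return fill_rates, len(keys) / n


def estimate_unique_people(total_rows, unique_fraction):
    """Naive scale-up from a sample; assumes the sample is representative."""
    return round(total_rows * unique_fraction)


if __name__ == "__main__":
    sample = io.StringIO(
        "first_name,last_name,dob,ssn,phone\n"
        "Jane,Doe,1980-01-02,123-45-6789,\n"
        "JANE,Doe,1980-01-02,123-45-6789,\n"
        "John,Roe,,234-56-7890,555-0100\n"
    )
    rates, unique = profile_sample(sample)
    print(rates)   # phone is filled in 1 of 3 rows, dob in 2 of 3
    print(unique)  # the two Jane Doe rows collapse to one key: 2 of 3 unique
    print(estimate_unique_people(2_700_000_000, 1 / 3))  # 900000000
```

Lowercasing and stripping the key fields is a deliberately crude normalization; the nickname variants in the Alaska example would still count as separate people, so a real pipeline would need fuzzier matching.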
Starting point is 00:09:11 What might be interesting is that it gives you sort of maybe a central repository where you can QA the information you already have or maybe there's some information in there that hasn't already leaked. And so that's probably going to make a little bit of a difference for folks. I agree. The bureaus have all had leaks at different points. And I think the Equifax breach from, what, five or seven years ago had something like 80% of Americans in it or something like this.
Starting point is 00:09:35 But one of the things that I'm sort of thinking about here, and actually you can see the fraudsters talking about this on the forums, is they're sort of using this as a backbone to other breaches. And the other thing, too, is, frankly, fraudsters today, folks who commit identity theft, are not limited by PII. Like, PII is already out there. It's relatively easy to get an identity that you can use as a base to steal.
Starting point is 00:09:58 But the place where breaches really get bad is when you connect the sort of core PII, so name, date of birth, SSN, address, when you connect that to other things. So if you connect that to a driver's license or a bank account or a VIN or email addresses, like, that's when you can actually start to do something interesting
Starting point is 00:10:16 from a fraudster's perspective with the information. And this data that has gotten breached here, we think, could be used as sort of a backbone to connect to all sorts of other information that's been breached. As I mentioned, on BreachForums the fraudsters are talking about this. Yeah, and it's funny when you actually read through some of this chatter from the attackers, right? Because they have a lot of the same problems that legitimate businesses have, specifically marketing companies, right? Which is, like, how do I make sure that we have the right Joel, and how do we know that we've got his right car, and do we have his right identification? Because a lot of times these guys are trying to defeat things that use personal information about you for authentication, right? They ask you what school you went to when you were five and stuff like that.
Starting point is 00:10:58 And so the more of this demographic information these folks can build up and the more accurate they can make it, the easier it is for them to subvert a lot of the security controls in place and commit fraud. Right. And as more of these breaches happen and more data is released, I mean, how much risk is there for me? Let's just say, as the average American, should I be really concerned with this new breach, or how would you measure that? I mean, I think the risk is always there. It's ever present.
Starting point is 00:11:27 I think that you should probably have a lock or a freeze on your credit, right? That's sort of step one. I think if you do that, you mitigate some of the problems from these sorts of things. I think the bigger issue is going to be, at least as you look forward
Starting point is 00:11:39 and you think about how thieves and scammers are going to use this stuff, you know, you can start to use this demographic information pretty convincingly if you could clone someone's voice using gen AI. Or you could take this in a new direction in which you get a lot more attributes about a person that let you build a much more believable profile, which then lets you replicate their presence, their identity, in a world where that's a lot more difficult to verify.
Starting point is 00:12:03 And what we've heard from folks is that this kind of fraud, this sort of next-level social engineering, is the thing that's been happening more and more. I can give the advice I typically give to my family at Thanksgiving when they ask me the same question, which is, look, at the end of the day, there's not too much that people can do to prevent fraudsters from stealing their identities. If you're in this breach, you're in the breach, and probably all three of us are, frankly. But the things that you can do are pretty basic, strong personal security things. Like, for instance, turn on two-factor authentication for all the important services that you have, and probably the ones that are not important as well. Use a password manager, so you don't have a bunch of repeated passwords everywhere.
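Many authenticator apps implement the two-factor authentication recommended here as a time-based one-time password (TOTP, RFC 6238). The service and your phone share a secret, and each derives a short code from that secret and the current 30-second window, so a leaked password on its own is not enough to log in. The sketch below uses only Python's standard library and is an illustration of the standard, not a production authenticator; the function names are hypothetical. Hardware security keys generally use other protocols, such as FIDO2/WebAuthn.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password per RFC 6238, using the HMAC-SHA1 variant."""
    now = time.time() if at is None else at
    counter = int(now // step)  # index of the current time window since the Unix epoch
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # "dynamic truncation" from RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_code(secret: bytes, submitted: str, at: Optional[float] = None,
                step: int = 30, window: int = 1) -> bool:
    """Accept codes from the current window or `window` steps either side (clock drift)."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, now + k * step, step), submitted)
        for k in range(-window, window + 1)
    )
```

For example, `totp(b"12345678901234567890", at=59, digits=8)` returns `"94287082"`, the first SHA-1 test vector in RFC 6238, Appendix B. The small acceptance window is a common design choice to tolerate clock drift between phone and server.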
Starting point is 00:12:44 And maybe use your best judgment. If something seems too good to be true, it probably is. Joel makes a good point about freezing your credit. That's a great idea. It's also a good idea to just check your accounts on a regular basis to see if there's anything that you don't expect. Yeah, and we actually have a helpful blog post that we wrote years ago called 16 Things to Protect Yourself Online
Starting point is 00:13:04 that still is applicable today even after this data breach. Yeah, Joel, I think you've probably gotten way more use out of that than any of us would hope, huh? I wish I could say that things had changed radically, but it's still the same problems. How does something like this actually happen, right? We know all of these companies have various versions of our data, some more than others, some more important than others. Is it a lack of good infrastructure or is this just the kind of thing that's bound to happen
Starting point is 00:13:28 when you put data all in one central place? If I was a gambling man, I'd bet that they had some kind of configuration issue on a data store: that they had a cloud database that probably had a guessable password or wasn't using two-factor authentication, and someone stole the credentials, right? If you look at the Snowflake breach, which impacted, I think, 137 different companies, that was all because there wasn't two-factor authentication enabled and people were able to guess or steal those usernames and passwords. And so, to be quite honest, these breaches are usually lowest common denominator, right? They don't have to pick the lock if you leave the window open, and you'd be surprised how often people leave windows
Starting point is 00:14:05 open. And that tends to be how these things happen. I mean, on that note, I'm a little bit surprised by how unsurprised you are by this breach. And so where are we in that arc? Is this just something that we expect to continue to happen? And if you frame things the way you have, with hackers basically becoming more effective as more of these happen and they can piece together different blocks, where does that put us? How does the industry need to shift, if at all? Or should we just expect a rolling cadence of this? If you go back decades, people could be secure because this data actually wasn't out there as much.
Starting point is 00:14:46 SSNs were secret, and your possession of one meant that it was probably you. I like to joke that Social Security numbers are both your username and your password, and at this point they're also public. So it's kind of the worst possible thing you could have. But so many different data breaches have completely broken that paradigm. And, you know, as I mentioned, frankly, there's so much data out there that PII being secret is no longer a control at all to prevent identity theft or other kinds of fraud.
Starting point is 00:15:19 out there is because institutions that guard against identity theft, so banks or governments or anyone that needs to verify the identities of consumers, like those institutions have controls for them. And, you know, Centrelink is one of those controls. And so actually the reason there's not more fraud out there is because of the controls that institutions take. not because there's not data breaches. Yeah, and I think, like I said, not to be overly cynical, but we've had data and databases for a really long time, and it's relatively recently that there's been a requirement
Starting point is 00:15:50 to disclose data breaches, right? California passed the CCPA. Actually, the breach disclosure law in California passed, I think, in 2005, but it wasn't nationally implemented for quite some time. And even then, there's still a patchwork of regulations, and it's the SEC that's actually driving a lot of the breach disclosure requirements. Currently, they require you, I believe, to disclose within 48 hours after a material security breach, which is only a year old, right?
Starting point is 00:16:14 So these breaches have been happening for years and years and people just never talked about them. And so when you work in the security industry, especially if you work on the cyber intelligence or the financial fraud side and you watch these markets, like, this has been going on forever. And it's only now that companies are being forced to disclose it and that consumers are becoming aware. And so I think that's really the thing that's changed. And like all of these different kinds of situations, this is very much a cat and mouse game, right? It's the attackers and the defenders and you go back and forth.
Starting point is 00:16:45 And to be quite honest, the defenders have gotten really good. We have some really excellent technology out there, SentiLink's a great example of that, where a lot of this stuff can be nipped in the bud. Even if the information is out there, you can limit the harm that it causes. Joel, you know that we verify over a million people every day? There's literally a million people a day that we help to prove who they are. That's amazing. Yeah, we're really proud of it.
Starting point is 00:17:04 The bottom line of a lot of this stuff is that, like I said, it's easy to be cynical, it's easy to get worked up about this stuff or whatever the case may be. In reality, things have actually gotten a lot better. And if you freeze your credit, if you follow the security best practices, if you use things like a YubiKey, you know, a hardware security key, you can exist online relatively safely, right? Probably more safely than walking down a city street at risk of being robbed, right? We've come a long way.
Starting point is 00:17:32 You get these headlines, the media hypes this stuff up, and people think it's the end of the world. But in reality, like, things are a lot better. They're a lot better than people would report them to be. The other really cool thing about the way the world has evolved is that with the startup ecosystem and the ability for, you know, expert founders to build technology to address these things, we've actually shifted a lot of the economics on some of these things, where you can build a successful company fighting this stuff and end up financially way better off than if you were doing this stuff, right? And I think if you look at all these different kinds of situations, and you look at any kind of crime, to be quite honest,
Starting point is 00:18:07 it's just about where the incentives lie. And if you shift the incentives in a meaningful way, you can actually really start to crack down on a lot of this stuff. That's a great point. And Naftali, that's what your company does, right? How many cases of identity fraud are you blocking per day? We stop over 20,000 a day. And who is paying for that? Is it the end customer who's paying you to monitor, or how does that work? No, it's the institution. So we serve over 300 banks, lenders, financial institutions, telcos, and governments throughout the United States to help them figure out if their customers or users are who they say they are. So, for example, before someone opens a credit card, that financial institution will ask us, hey, is this a real person? Is that identity stolen?
Starting point is 00:18:48 And we'll be able to answer that for them in real time. On the note of some of the new technologies coming online, they do open up a new vector both for, to your point, Joel, attacking and defending. I'm curious whether you're seeing any gaps in terms of places that builders should be addressing on this new frontier, as, again, the attack vector has also opened up. Everyone's talking about generative AI and the ability to do deepfakes and that sort of thing. And there's a lot of activity there. We actually have an investment in a company called Pindrop, which is really good at spotting audio deepfakes. And they sell a lot of products, as you can imagine, to financial service companies, because that's
Starting point is 00:19:26 typically where you see the threat. But it all rolls downstream. And so it's not just JPMorgan Chase and Citibank that are getting hit by these generative AI fakes; it's actually grandmas and grandpas and parents, right? They're getting fake phone calls from grandchildren and children saying that they're being held and you need to wire the money, and stuff to that effect: the virtual kidnappings, right? These are things that trickle down.
Starting point is 00:19:49 And so enterprises are doing a good job of protecting themselves from some of this. And what we need is for some of that technology to start to filter down into protecting consumers at large. Obviously, we've been using the same PII for ages, right? Like, you guys mentioned Social Security. I mean, it's also crazy to me that they send you that on a piece of paper. But in any case, is there some world where we have something similar to password managers, like forcing you to update your password every so often, or other forms of biological identification? Should we be rethinking the idea that we use name, email, address, phone, etc.? Or am I thinking about this incorrectly,
Starting point is 00:20:27 and even those have the same kinds of attack vectors? I would say, like, yes, for sure we should be thinking about this differently. Is that ever going to happen? Unfortunately, no. But, you know, frankly, public-key cryptography solves quite a bit of this. And I'm not talking about crypto blockchains or anything like that. I mean, simply every citizen having a public-private key pair and having the government or some trusted entity go and cryptographically sign those would solve a bunch of
Starting point is 00:20:54 identity verification issues. Is that going to happen in the United States? Absolutely not. But, you know, would that be an elegant solution that would solve a lot of problems? It would. There has been a dream for a really long time among a number of diehard old cryptography people that one day the U.S. government would get into proving identity. And there has been a NIST working group; the National Institute of Standards and Technology has been trying to set standards for identity proofing for decades. There was a hope that maybe one day
Starting point is 00:21:20 the post office would become the place where you go prove your digital identity and get a token or some kind of key. I think we're still as far away from it today as we were 10 years ago, maybe further. But one day, I mean, California's rolling out digital driver's licenses, right? I got a digital license plate for my car. Like, we might get there. It might happen in my lifetime. I'm hoping.
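Naftali's suggestion (every citizen holds a public/private key pair, and a trusted entity signs it) amounts to a certificate. The issuer signs a statement binding an identifier to the citizen's public key; a bank can then check the issuer's signature and ask the citizen to sign a fresh challenge with the matching private key. The toy below illustrates that flow with textbook RSA over small, publicly known Mersenne primes and no padding, so it is deliberately insecure. A real scheme would use a vetted cryptography library and a standard signature scheme such as Ed25519 or RSA-PSS. The issuer, the citizen ID format, and all function names are hypothetical.

```python
import hashlib
from typing import NamedTuple, Tuple


class KeyPair(NamedTuple):
    n: int  # public modulus
    e: int  # public exponent
    d: int  # private exponent (kept secret)


def make_toy_keypair(p: int, q: int, e: int = 65537) -> KeyPair:
    """Textbook RSA from two primes. INSECURE: toy-sized, publicly known primes."""
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse (Python 3.8+); fails if gcd(e, phi) != 1
    return KeyPair(p * q, e, d)


def _digest(message: bytes, n: int) -> int:
    # SHA-256 reduced into the modulus range; no padding, so toy use only.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes, key: KeyPair) -> int:
    return pow(_digest(message, key.n), key.d, key.n)


def verify(message: bytes, signature: int, n: int, e: int) -> bool:
    return pow(signature, e, n) == _digest(message, n)


# Hypothetical trusted issuer and one citizen (Mersenne primes 2^61-1, 2^89-1, 2^107-1, 2^127-1).
ISSUER = make_toy_keypair(2**61 - 1, 2**89 - 1)
CITIZEN = make_toy_keypair(2**107 - 1, 2**127 - 1)

Certificate = Tuple[bytes, int]


def issue_certificate(citizen_id: str, public_n: int, public_e: int) -> Certificate:
    """Issuer signs a statement binding the ID to the citizen's public key only."""
    statement = f"{citizen_id}|{public_n}|{public_e}".encode()
    return statement, sign(statement, ISSUER)


def verify_holder(cert: Certificate, challenge: bytes, response: int) -> bool:
    """Relying party: check the issuer's signature, then the holder's signed challenge."""
    statement, cert_signature = cert
    if not verify(statement, cert_signature, ISSUER.n, ISSUER.e):
        return False  # not signed by the trusted issuer, or altered afterwards
    _, n_text, e_text = statement.decode().rsplit("|", 2)
    return verify(challenge, response, int(n_text), int(e_text))


if __name__ == "__main__":
    cert = issue_certificate("citizen-0001", CITIZEN.n, CITIZEN.e)
    challenge = b"bank-login-nonce-42"   # a fresh random nonce in practice
    response = sign(challenge, CITIZEN)  # computed on the citizen's own device
    print(verify_holder(cert, challenge, response))  # True
```

The point of the design is that the SSN-style secret disappears: knowing someone's identifier and public key lets you verify them, but only the holder of the private key can answer a new challenge.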
Starting point is 00:21:43 I think to Naftali's point, the technology exists; we know how to stop this. We just need someone with the political will and desire to make this a thing and maybe go after the real problems that everyday American consumers face. So one day we'll get there. I'm optimistic. Hopefully in my lifetime. Just a few more breaches along the way. All right, if you've made it this far, thank you so much for listening. And if you like us
Starting point is 00:22:06 covering these timely topics, be sure to let us know at ratethispodcast.com/a16z. Or you can email us at podpitches@a16z.com. We'll see you on the flip side.
