a16z Podcast - The SSN Breach: What Now?

Episode Date: August 18, 2024

In this episode, we cover the recent data breach of nearly 3B records, including a significant number of social security numbers. Joining us to discuss are security experts Joel de la Garza and Naftali Harris. Incredibly enough, Naftali and his team were able to get their hands on the breached dataset and were able to validate the nature of the claims. Listen in as we explore the who, what, when, where, why… but also how a breach of this magnitude happens and what we can do about it.

Resources:
Read 16 Steps to Securing Your Data (and Life)
Find Naftali on Twitter: https://x.com/naftaliharris
Check out SentiLink: https://www.sentilink.com/

Stay Updated:
Let us know what you think: https://ratethispodcast.com/a16z
Find a16z on Twitter: https://twitter.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Subscribe on your favorite podcast app: https://a16z.simplecast.com/
Follow our host: https://twitter.com/stephsmithio

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

Transcript
Starting point is 00:00:00 Hello, everyone. Welcome back to the a16z podcast. Today, we've got a special episode covering a timely piece of news that, quite frankly, I wish we were not reporting on. In case you missed it, this week there was a reported breach of nearly three billion records, but not just any records. Headlines included, quote, billions of social security numbers exposed, or even, quote, did hackers steal every social security number? Naturally, we wanted to bring in the experts to break down what really happened here and its expected impact. So, joining us today are Joel de la Garza and Naftali Harris. Joel is an operating partner at a16z who was previously the chief security officer at Box, and before that, the global head of threat management and cyber intelligence at Citigroup. Naftali, on the other hand, is co-founder and CEO of SentiLink, a company that helps block identity theft and fraud for hundreds of financial institutions, at a scale that might make you wince.
Starting point is 00:00:59 We verify over a million people every day. Incredibly enough, Naftali's team was actually able to get their hands on the breached data set, and they were actually in the room as we were recording, so you'll hear Naftali reference them as they were poking and prodding to validate the claims. Listen in as we explore the who, the what,
Starting point is 00:01:15 the when, the where, the why, but also how a breach like this happens and what we can do about it. We watch these markets, like this has been going on forever. You can see the fraudsters talking about this on the forums. Social security numbers are the kind of things that don't change, right? You get one when you're born and you're stuck with it for a while. Yep, you're in this breach.
Starting point is 00:01:38 You're in the breach. And probably all three of us are, frankly. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com
Starting point is 00:02:06 slash disclosures. So, Joel, Naftali, this was a pretty crazy week. I got a Slack from one of our co-workers, Joel, that was like, have you seen this social security hack? And I had not at that point. And that is a pretty, you know, frightening message. So why don't we just take a second to recap what actually happened here? What was this breach and what data was potentially at risk? Yeah, so just when you thought there wasn't any more information to leak out into the world, there's always a surprise that there's still more data to come out. And so this week we saw there was a third-party company that collects all this information
Starting point is 00:02:46 and uses it for things like validating your identity. And so they have your name, your social security number, your address. They also have nicknames. And it seems like they had it for all U.S. citizens as well as all Canadian citizens. So this is larger than just the U.S. So I'm not safe as a Canadian? Well, I can't save you this time, unfortunately. And so these hackers somehow got hold of this information. And then they tried to sell it on the dark web. And there weren't
Starting point is 00:03:12 any takers. Nobody wanted to buy it because, like I said, I thought all this stuff was already public. And so when they couldn't sell it, they just released it for free. And so now there is this file of hundreds of gigabytes out there on the internet that encapsulates the data about all Americans and most Canadians. So there you go. That's what happened. Oof. When I read the articles yesterday, it seemed like this was alleged; people weren't confidently sure, per se, that this was billions of data points, including social security numbers. How sure are we that this is the data that was hacked? Well, Steph, I can answer that quite confidently, because we actually have it. Oh. So we found it ourselves on the dark web. And so to fill
Starting point is 00:03:52 in a little bit of the timeline here: National Public Data, which is the company that had the breach, reported that the hack itself happened in December of 2023. And then it got released onto the dark web, on a place called BreachForums, by some hacker named Fenice. If I mispronounced your name, please don't come after me. But that person released it on August 6th. And we got a copy ourselves. And so we looked through it. And yeah, it's as reported. So there's names, dates of birth, addresses. The data, I would say, is relatively messy compared to some other data breaches that you sometimes see. But no, we're confident it's true because we literally have it. Jeez. And so when you say it's messy, so like if there's a name and there's a social security number,
Starting point is 00:04:35 are those linked? And are those linked to email or any other fields that might be in there? Yeah, I'll give you an example of the way in which it's messy. So, for example, the first six records all correspond to the same individual, a woman from Alaska, but they have different variants of her name, including nicknames and stuff like that. I believe it's across two different addresses that she had. That's one level of messiness. Another way that the data is messy is that about 10% of the SSNs are obviously fake: they begin with three zeros or four zeros. So the data is not as clean as it could be, which is obviously a good thing, but there's no question there's a lot of bad stuff in there. You've obviously accessed the data set. How long did it
Starting point is 00:05:15 take you to actually get access to it? Sorry, how long did it take us, guys? Literally on August 6th. My God. Can you guys fucking believe this team? Unbelievable. Incredibly proud of my team here. We already got it, like, the day it was released. And maybe for the listeners, give us a little insight. When you get access to a data set like this, what are you looking at? Right? Because, I mean, obviously this is not your first rodeo. Yeah. When we first get a data set like this, the first thing we're trying to do is just understand what's in it. So we'll take a look at the first couple thousand rows and just understand what fields are present and where it looks like the data set actually came from. How common are the different fields? So, for example, for this particular breach, phone number is mostly missing. It's mostly name, address, SSN; date of birth is sometimes in there, sometimes not. For example, we looked at the Evolve data breach from about a month or two ago, and that one had information on ACH transactions and balances across different fintechs and stuff like that. And so, you know,
Starting point is 00:06:15 that led us down a different sort of path of inquiry. And just for folks listening who aren't spending time on the dark web, how easy is it really to access the data set? It's relatively straightforward if you know where to look. And we're also by far not the only people doing this. I mean, I think as of this morning, we'd seen 26,000 views on BreachForums for the thread. So the fraudster community is looking at this, and we've seen it there. We've seen it on Telegram. We've seen it on LeakBase. It's just all over the place. And so if you know where to look, it's not that hard. Obviously, folks like the three of us don't do this every day. But for fraudsters or for infosec professionals,
Starting point is 00:06:49 you can find it. That's not reassuring, but, I mean, it's the answer I expected. I would say that this is probably one of the big wins for sensible regulation around breach disclosures. Like, I think, having worked in this space since before there were breach disclosure requirements, these things were always happening and no one talked about them, and consumers were just oblivious. And I think knowledge is power, and making consumers aware of what's happened with their data is super important. And this is one of those cases where I think forcing disclosure around breaches makes the world a safer place and makes people respond to them and handle them in a correct way.
Starting point is 00:07:24 We're going to get to how this happens, and obviously its impact, but maybe we could just get a sense for scale. I mean, when I heard this, it felt bigger, but I'm a layman. I hear about breaches all the time. And so how would you actually characterize the magnitude or importance of this particular breach? In terms of magnitude, it's 277 gigabytes of data uncompressed, which is a lot. That's across two different files, which total 2.7 billion rows. Now, some of the
Starting point is 00:07:55 reporting you've seen in the media is like, oh, three billion people have had their identities stolen, which is fortunately not the case. As I mentioned, there are a lot of duplicates there. But there are 2.7 billion records. It's literally a CSV file, and so each row is some different piece of information about an individual. Now, we haven't gone through the full file, but based on sampling, we think approximately a third of the records are unique. And so if you run the math on that, it's high hundreds of millions of people. But again, we're not completely sure because we haven't seen the whole thing. So I'd say hundreds of millions of individuals, confidently, and 2.7 billion records.
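The "high hundreds of millions" figure comes from sampling: measure the share of unique records in a sample, then scale up to the full file. A minimal Python sketch of that arithmetic, assuming records are keyed by a digits-only SSN and that one person's duplicate rows sit next to each other (as with the six rows for the woman from Alaska), so a contiguous chunk of rows is used; a scattered random sample of a huge file would rarely catch both copies of a duplicate and would overstate uniqueness. The helpers `ssn_key`, `unique_ratio`, and `estimate_people` and the sample rows are hypothetical.

```python
from typing import Callable, Hashable, Sequence, Tuple

Row = Tuple[str, str]  # (name, ssn) in this toy example


def ssn_key(row: Row) -> str:
    # Normalize "123-45-6789" and "123456789" to the same identity key.
    return "".join(ch for ch in row[1] if ch.isdigit())


def unique_ratio(rows: Sequence[Row], key: Callable[[Row], Hashable]) -> float:
    # Distinct identity keys divided by total rows in the sample.
    if not rows:
        return 0.0
    return len({key(r) for r in rows}) / len(rows)


def estimate_people(total_rows: int, ratio: float) -> int:
    # Scale the sample's unique ratio up to the full file.
    return round(total_rows * ratio)


sample = [
    ("Jane Doe", "123-45-6789"),
    ("Janie Doe", "123456789"),
    ("J. Doe", "123-45-6789"),
    ("John Roe", "234-56-7890"),
    ("Johnny Roe", "234-56-7890"),
    ("Ann Poe", "345-67-8901"),
]
print(unique_ratio(sample, ssn_key))          # 0.5
print(estimate_people(2_700_000_000, 1 / 3))  # 900000000
```

With the episode's figures, 2.7 billion rows at roughly one-third unique works out to about 900 million people, in line with "high hundreds of millions". The estimate is only as good as the sample and the choice of identity key.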
Starting point is 00:08:33 Joel, you've been working in security for so long. How would you characterize maybe not only the sheer number of records, but maybe the quality of the information, the particular kind of information? I'm unfortunately probably a little desensitized. I'm only partially being snarky. Like, I do think a lot of this information's already leaked out there. Like, we've had multiple breaches of credit reporting agencies. And you have to remember that social security numbers are the kind of things that don't change, right?
Starting point is 00:08:59 You get one when you're born and you're stuck with it for a while. And so not through any central repository, but just the breaches over the last 20 years, a lot of this information's already leaked. And so I don't know how unique it is. What might be interesting is that it gives you sort of maybe a central repository where you can QA the information you already have or maybe there's some information in there
Starting point is 00:09:19 that hasn't already leaked. And so that's probably going to make a little bit of a difference for folks. I agree the bureaus have all had leaks at different points. And I think the Equifax breach from, what, five or seven years ago, had something like 80% of Americans in it or something like this.
Starting point is 00:09:35 But one of the things that I'm sort of thinking about here, and actually you can see the fraudsters talking about this on the forums, is they're sort of using this as a backbone to other breaches. And the other thing, too, is frankly, fraudsters today, folks who commit identity theft, are not limited by PII. Like, PII is already out there. It's relatively easy to get an identity that you can use as a base to steal.
Starting point is 00:09:57 But the place where breaches really get bad is when you connect the sort of core PII, so name, date of birth, SSN, and address, when you connect that to other things. So if you connect that to a driver's license or a bank account or a VIN or email addresses. Like, that's when you can actually start to do something interesting from a fraudster's perspective with the information. And this data that has gotten breached here, we think, could be used as sort of a backbone to connect to all other sorts of information that's been breached.
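Why a "backbone" matters can be shown with a toy join: once core PII is keyed consistently, attaching records from another breach is a plain lookup. A minimal Python sketch, assuming the key is a digits-only SSN plus date of birth (names are left out because, as noted earlier, they vary across rows for the same person). The `make_key` and `link` helpers and all records are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (SSN digits, date of birth)


def make_key(ssn: str, dob: str) -> Key:
    return ("".join(ch for ch in ssn if ch.isdigit()), dob)


def link(backbone: List[dict], other: List[dict]) -> List[dict]:
    # One entry per key; later backbone rows for the same key replace earlier ones.
    index: Dict[Key, dict] = {make_key(r["ssn"], r["dob"]): dict(r) for r in backbone}
    for r in other:
        k = make_key(r["ssn"], r["dob"])
        if k in index:
            # Add fields the backbone lacks (e.g. email) without overwriting existing ones.
            for field, value in r.items():
                index[k].setdefault(field, value)
    return list(index.values())


backbone = [{"name": "Jane Doe", "ssn": "123-45-6789", "dob": "1980-01-02", "address": "1 Main St"}]
other = [
    {"ssn": "123456789", "dob": "1980-01-02", "email": "jane@example.com"},
    {"ssn": "234-56-7890", "dob": "1975-05-05", "email": "someone@example.com"},
]
print(link(backbone, other)[0]["email"])  # jane@example.com
```

The join itself needs no cleverness, which is why core PII works poorly as a secret once a consistent backbone exists.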
Starting point is 00:10:27 As I mentioned, on BreachForums the fraudsters are talking about this. Yeah, and it's funny when you actually read through some of this chatter from the attackers, right, because they have a lot of the same problems that legitimate businesses have, specifically marketing companies, right, which is, how do we make sure that we have the right Joel, and how do we know that we've got his right car, and do we have his right identification? Because a lot of times these guys are trying to defeat things that are using personal information about you for authentication, right? They ask you what school you went to when you
Starting point is 00:10:56 were five and stuff like that. And so the more of this demographic information these folks can build up and the more accurate they can make it, the easier it is to subvert a lot of the security controls in place, for them to commit fraud. Right. And as more of these breaches happen and more data is released, I mean, how much risk is there for me? Like, let's just say as the average American, should I be really concerned with this new breach? Or, like, how would you measure that? I mean, I think the risk is always there. It's ever present. I think that you should probably have a lock or a freeze on your credit, right? That's sort of step one. I think if you do that, you mitigate some of the problems from these sorts of things. I think the bigger issue is going to be, at least as you look forward and you think about how thieves and scammers are going to use this stuff, you know, you can start to use this demographic information pretty convincingly if you can clone someone's voice using GenAI, or you could take this in a new direction in which you get a lot more attributes about a person that let you build a much more believable profile, that then lets you replicate their presence, their identity, in a world where that's a lot more difficult to verify.
Starting point is 00:12:03 And what we've heard from folks is that this kind of fraud, this sort of next-level social engineering, is the thing that's been happening more and more. I can give the advice I typically give to my family at Thanksgiving, who ask me the same question, which is, look, at the end of the day, there's not too much that people can do to prevent fraudsters from stealing their identity. If you're in this breach, you're in the breach, and probably all three of us are, frankly. But the things that you can do are pretty basic, strong personal security things.
Starting point is 00:12:33 Like, for instance, turn on two-factor authentication for all the important services that you have, and probably the ones that are not important as well. Use a password manager, so you don't have a bunch of repeated passwords everywhere. And use your best judgment: if something seems too good to be true, it probably is. Joel made a good point about freezing your credit. That's a great idea. It's also a good idea to just check your accounts on a regular basis to see if there's anything that you don't expect.
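Authenticator apps commonly implement two-factor authentication as time-based one-time passwords (TOTP, RFC 6238, built on HOTP from RFC 4226). The episode doesn't name a scheme; as a sketch of why a breached SSN or reused password alone isn't enough, here is a minimal standard-library TOTP in Python. The secret below is the RFCs' published test key, and `hotp`/`totp` are illustrative helper names, not a particular product's API.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    # RFC 4226: HMAC-SHA1 over the 8-byte big-endian counter...
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # ...then "dynamic truncation": the low 4 bits of the last byte pick an offset.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)


def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    # RFC 6238: the counter is the number of `step`-second intervals since the Unix epoch.
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


secret = b"12345678901234567890"  # RFC test key; never reuse in practice
print(totp(secret, at=59, digits=8))  # 94287082
```

The code depends on a secret shared only between the service and your authenticator; the PII in a breach like this one does not reveal it.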
Starting point is 00:12:58 Yeah, and we actually have a helpful blog post that we wrote years ago called 16 things to protect yourself online that is still applicable today, even after this data breach. Yeah, Joel, I think you've probably gotten way more use out of that than any of us would hope, huh? I wish I could say that things had changed radically, but it's still the same problems. How does something like this actually happen, right? We know all of these companies have various versions of our data, some more than others, some more important than others. Is it a lack of good infrastructure, or is this just the kind of thing that's bound to happen when you put data all in one central place? If I was a gambling man, I'd bet that they had some kind of configuration issue on a data store, that they had a cloud database that
Starting point is 00:13:38 probably had a guessable password or wasn't using two-factor authentication, and someone stole the credentials, right? If you look at the Snowflake breach, which impacted, I think, 137 different companies, that was all because there wasn't two-factor authentication enabled and people were able to guess or steal those passwords and usernames. And so to be quite honest, these breaches are usually lowest common denominator, right? They don't have to pick the lock if you leave the window open, and you'd be surprised how often people leave windows open. And that tends to be how these things happen. I mean, on that note, I'm a little bit surprised by how unsurprised you are by this breach. And so where are we in that arc? Is this just really something that we expect to just
Starting point is 00:14:21 continue to happen? And if you frame things the way you have, as the hackers basically becoming more effective as more of these happen and they can piece together different blocks, where does that put us? How does the industry need to shift, if at all? Or should we just expect a rolling cadence of this? If you go back decades, people could be secure because this data actually wasn't out there as much. SSNs were secret, and your possession of one meant that it was probably you. I like to joke that social security numbers are both your username and your password. And at this point, they're also public.
Starting point is 00:14:59 So it's kind of the worst possible thing you could have. But so many different data breaches have completely broken that paradigm. And, you know, as I mentioned, frankly, there's so much data out there that PII being secret is no longer a control at all to prevent identity theft or other kinds of fraud. Now, frankly, the reason why there's not more identity theft or other fraud out there is because institutions that guard against identity theft, so banks or governments or anyone that needs to verify the identities of consumers, have controls for it.
Starting point is 00:15:32 And, you know, SentiLink is one of those controls. And so actually the reason there's not more fraud out there is because of the controls that institutions take, not because there aren't data breaches. Yeah, and I think, like I said, not to be overly cynical, but we've had data and databases for a really long time, and it's relatively recently that there's been a requirement to disclose data breaches, right? California passed the CCPA. Actually, the breach disclosure law in California passed, I think, in 2005, but it wasn't nationally implemented for quite some time. And even then, there's still a patchwork of regulations. And it's the SEC that's actually driving a lot of the breach disclosure requirements. Currently, they require you, I believe, to disclose within 48 hours after a material security breach, which is only a year old, right? So these breaches have been happening for years and years, and people just never talked about them. And so when you work in the security industry,
Starting point is 00:16:21 especially if you work on the cyber intelligence or the financial fraud side, and you watch these markets, like, this has been going on forever. And it's only now that companies are being forced to disclose it and that consumers are becoming aware. And so I think that's really the thing that's changed. And like all of these different kinds of situations, this is very much a cat and mouse game, right?
Starting point is 00:16:42 It's the attackers and the defenders, and you go back and forth. And to be quite honest, the defenders have gotten really good. We have some really excellent technology out there, SentiLink's a great example of that, where a lot of this stuff can be nipped in the bud: even if the information is out there, you can limit the harm that it causes. Joel, you know that we verify over a million people every day?
Starting point is 00:16:59 There's literally a million people a day that we help to prove who they are. That's amazing. Yeah, we're really proud of it. The bottom line of a lot of this stuff is that, like I said, it's easy to be cynical, it's easy to get worked up about this stuff or whatever the case may be. But in reality, things have actually gotten a lot better. And if you freeze your credit, if you follow the security best practices,
Starting point is 00:17:18 if you use things like a YubiKey, you know, a hardware security key, you can exist online relatively safely, right? Probably more safely than you are walking through a city street at risk of being robbed, right? We've come a long way. You get these headlines and the media hypes this stuff up and people think it's the end of the world. But in reality, things are a lot better. They're a lot better than people would report them to be.
Starting point is 00:17:40 The other really cool thing about the way the world has evolved is that with the startup ecosystem and the ability for, you know, expert founders to build technology to address these things, we've actually shifted a lot of the economics on some of these things, where you can build a successful company fighting this stuff and end up financially way better off than if you were doing this stuff, right? And I think if you look at all these different kinds of situations and you look at any kind of crime, to be quite honest, it's just about where the incentives
Starting point is 00:18:08 lie. And if you shift the incentives in a meaningful way, you can actually really start to crack down on a lot of this stuff. That's a great point. And Naftali, that's what your company does, right? How many cases of identity fraud are you blocking per day? We stop over 20,000 a day. And who is paying for that? Is it the end customer who's paying you to monitor, or how does that work? No, it's the institution. So we serve over 300 banks, lenders, financial institutions, telcos, and governments throughout the United States to help them figure out if their customers or users are who they say they are. So, for example, before someone opens a credit card, that financial institution will ask us, hey, is this a real person? Is that identity stolen? And we'll be able to answer that for them in
Starting point is 00:18:51 real time. On the note of some of the new technologies coming online, they do open up a new vector, both, to your point, Joel, for attacking and defending. Curious if you see any gaps in terms of places that builders should be addressing on this new frontier, as, again, the attack vector has also opened up. Everyone's talking about generative AI and the ability to do deepfakes and that sort of thing. And there's a lot of activity there. We actually have an investment in a company called Pindrop, which is really good at spotting audio deepfakes. And they sell a lot of products, as you can imagine, to financial services companies, because that's typically where you see the threat. But it all rolls downstream. And so it's not just JPMorgan Chase and Citibank that are getting hit
Starting point is 00:19:33 by these generative AI fakes; it's actually grandmas and grandpas and parents, right? They're getting fake phone calls from grandchildren and children saying that they're being held and you need to wire the money, and stuff to that effect, the virtual kidnappings, right? These are things that trickle down. And so enterprises are doing a good job of protecting themselves from some of this.
Starting point is 00:19:53 And what we need is for some of that technology to start to filter down into protecting consumers at large. Obviously, we've been using the same PII for ages, right? Like you guys mentioned, Social Security. I mean, it's also crazy to me that they send you that on a piece of paper. But in any case, is there some world where we have something similar to password managers, like forcing you to update your password every so often, or other forms of, like, biological identification? Should we be rethinking the idea that we use name, email, address, phone, etc.? Or am I thinking about this
Starting point is 00:20:29 incorrectly, and even those have just the same kind of vectors? I would say, yes, for sure, we should be thinking about this differently. Is that ever going to happen? Unfortunately, no. But, you know, frankly, public-key cryptography solves quite a bit of this. And I'm not talking about crypto blockchains or anything like that. I mean, simply, every citizen having a public/private key pair and having the government
Starting point is 00:20:50 or some trusted entity go and cryptographically sign those would solve a bunch of identity verification issues. Is that going to happen in the United States? Absolutely not. But, you know, would that be an elegant solution that would solve a lot of problems? It would. There has been a dream for a really long time among a number of diehard old cryptography people
Starting point is 00:21:06 that one day the U.S. government would get into proving identity, and there has been a NIST working group. The National Institute of Standards and Technology has been trying to set standards for proofing for decades. There was a hope that maybe one day the post office would become the place where you could go prove your digital identity and get a token or some kind of key. I think we're still as far away from it today as we were 10 years ago, but I hold out hope that one day.
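Naftali's idea, a trusted issuer cryptographically signing each citizen's public key, can be sketched with textbook RSA. This is a toy under loudly stated assumptions: the issuer key is built from two small, well-known Mersenne primes, there is no padding scheme, and `issuer_sign` and `verify` are hypothetical names. A real system would use a vetted library and a standard scheme such as Ed25519 or RSA-PSS, and the citizen would separately prove possession of their private key. The sketch only shows the shape of the idea: anyone holding the issuer's public values (N, E) can check the attestation offline.

```python
import hashlib
import json

# Toy issuer key from two Mersenne primes (2**61 - 1 and 2**89 - 1).
# Far too small and well known to be secure; for illustration only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # issuer's public modulus
E = 65537                          # issuer's public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # issuer's private exponent (Python 3.8+)


def _digest(citizen_pubkey: str, name: str) -> int:
    # Canonical encoding of the attestation, hashed and reduced mod N.
    payload = json.dumps({"name": name, "pubkey": citizen_pubkey}, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(payload).digest(), "big") % N


def issuer_sign(citizen_pubkey: str, name: str) -> int:
    # Only the issuer, holding D, can produce this value.
    return pow(_digest(citizen_pubkey, name), D, N)


def verify(citizen_pubkey: str, name: str, signature: int) -> bool:
    # Anyone with the public (N, E) can check the attestation.
    return pow(signature, E, N) == _digest(citizen_pubkey, name)


sig = issuer_sign("citizen-public-key-123", "Jane Doe")
print(verify("citizen-public-key-123", "Jane Doe", sig))  # True
print(verify("citizen-public-key-123", "John Doe", sig))  # False
```

Unlike an SSN, the signature is useless to anyone who cannot also prove control of the matching private key, which is why leaking it matters far less.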
Starting point is 00:21:29 but I hold on hope that one day. One day, I mean, California's rolling out digital driver's licenses, right? I got a digital license plate for my car. Like, we might get there. It might happen in my lifetime. I'm hoping. I think the Naftali's point, like the technology exists. We know how to stop this. We just need someone with a political will and desire to make this a thing and maybe go after the real problems that everyday American consumers face. So one day we'll get there. I'm optimistic. Hopefully in my life. Just a few more breaches. along the way. All right, if you've made it this far, thank you so much for listening. And if you like us covering these timely topics, be sure to let us know at rate thispodcast.com or you can email us at podpitches at a16.com. We'll see you on the flip side.
