a16z Podcast - The SSN Breach: What Now?
Episode Date: August 18, 2024
In this episode, we cover the recent data breach of nearly 3B records, including a significant number of social security numbers. Joining us to discuss are security experts Joel de la Garza and Naftali Harris. Incredibly enough, Naftali and his team were able to get their hands on the breached dataset and were able to validate the nature of the claims. Listen in as we explore the who, what, when, where, why… but also how a breach of this magnitude happens and what we can do about it.
Resources:
Read 16 Steps to Securing Your Data (and Life)
Find Naftali on Twitter: https://x.com/naftaliharris
Check out SentiLink: https://www.sentilink.com/
Stay Updated:
Let us know what you think: https://ratethispodcast.com/a16z
Find a16z on Twitter: https://twitter.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Subscribe on your favorite podcast app: https://a16z.simplecast.com/
Follow our host: https://twitter.com/stephsmithio
Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
Transcript
Hello, everyone. Welcome back to the A16Z podcast. Today, we've got a special episode covering a timely piece of news that, quite frankly, I wish we were not reporting on. In case you missed it, this week, there was a reported breach of nearly three billion records, but not just any records. Headlines included, quote, billions of social security numbers exposed, or even, quote, did hackers steal every social security number?
Naturally, we wanted to bring in the experts to break down what really happened here and its expected impact.
So, joining us today are Joel de la Garza and Naftali Harris.
Joel is an operating partner at A16Z, who was previously the chief security officer at Box,
and previous to that, the global head of threat management and cyber intelligence for Citigroup.
Naftali, on the other hand, is co-founder and CEO of SentiLink,
a company that helps block identity theft and fraud for hundreds of financial institutions
at a scale that might make you wince.
We verify over a million people every day.
Incredibly enough,
Naftali's team was actually able to get their hands
on the breached data set,
and were actually in the room as we were recording,
so you'll hear Naftali reference them
as they were poking and prodding to validate the claims.
Listen in as we explore the who, the what,
the when, the where, the why,
but also how a breach like this happens
and what we can do about it.
We watch these markets, like this has been going on forever.
You can see the fraudsters talking about this on the forums.
Social security numbers are the kind of things that don't change, right?
You get one when you're born and you're stuck with it for a while.
Yep, you're in this breach.
You're in the breach.
And probably all three of us are, frankly.
As a reminder, the content here is for informational purposes only.
Should not be taken as legal, business, tax, or investment advice,
or be used to evaluate any investment or security
and is not directed at any investors or potential investors in any A16Z fund.
Please note that A16Z and its affiliates may also maintain investments in the companies discussed
in this podcast. For more details, including a link to our investments, please see
a16z.com/disclosures.
So Joel, Naftali, this was a pretty crazy week. I got a Slack from one of our
co-workers, Joel, that was like, have you seen this social security hack? And I had not at that point.
And that is a pretty, you know, frightening message.
So why don't we just take a second to recap what actually happened here?
What was this breach and what data was potentially at risk?
Yeah, so just when you thought there wasn't any more information to leak out into the world,
and then there's always a surprise that there's still more data to come out.
And so this week we saw there was a third-party company that collects all this information
and uses it for things like validating your identity.
And so they have your name, your social security number, your address.
They also have nicknames.
And it seems like they had it for all U.S.
citizens as well as all Canadian citizens. So this is larger than just the U.S.
So I'm not safe as a Canadian.
Well, I can't save you this time, unfortunately. And so these hackers somehow came about
getting this information. And then they tried to sell it on the dark web. And there weren't
any takers. Nobody wanted to buy it because, like I said, I thought all this stuff was already
public. And so when they couldn't sell it, they just released it for free. And so now there is this
hundreds of gigabyte file out there on the internet that encapsulates the
data about all Americans and most Canadians. So there you go. That's what happened.
Oof. When I read the articles yesterday, it seemed like this was alleged reporting; people
weren't confidently sure, per se, that this was billions of data points, including social
security numbers. How sure are we that this is the data that was hacked?
Well, Steph, I can answer that quite confidently, because we actually have it.
Oh.
So we found it ourselves on the dark web. And so to fill
in a little bit of the timeline here, so National Public Data, which is the company that had the
breach, they reported that the hack itself happened in December of 2023. And then it got released
onto the dark web on a place called BreachForums by some hacker named Fenice. If I mispronounced
your name, please don't come after me. But that person released it on August 6th. And we got a
copy ourselves. And so we looked through it. And yeah, it's as reported. So there's names,
dates of birth, addresses. The data, I would say, is relatively messy relative to some other data
breaches that you sometimes see. But no, we're confident that it's true because we literally have it.
Jeez. And so when you say it's messy, so like if there's a name and there's a social security number,
are those linked? And are those linked to email or any other fields that might be in there?
Yeah, I'll give you an example of the way in which it's messy. So for example, the first six
records all correspond to the same individual, a woman from Alaska. But they have different
variants on her name, including like nicknames and stuff like that. I believe it's across two different
addresses that she had. That's one level of messiness. Another way that the data is messy is
about 10% of the SSNs are obviously fake. Like they begin with three zeros or four zeros. So the
data is not as clean as it could be, which is obviously a good thing, but there's no question
there's a lot of bad stuff in there.
You've obviously accessed the data set. How long did it take you to actually get access to it?
Sorry, how long did it take us, guys? Literally on August 6th.
My God. Can you guys fucking believe this team? Unbelievable. Incredibly proud of my team here. We already got it like the day it was released.
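As an aside for readers wondering what "obviously fake" can mean in practice, here is a minimal Python sketch of a structural check. It assumes the standard 3-2-4 digit SSN layout and the long-standing rule that area numbers 000, 666, and 900 to 999, group number 00, and serial number 0000 are never issued. The function name ssn_is_plausible is hypothetical, and this is not necessarily the check Naftali's team ran.

```python
import re


def ssn_is_plausible(ssn: str) -> bool:
    """Return False for SSNs that are structurally impossible.

    Assumed rules: area (first 3 digits) is never 000, 666, or 900-999;
    group (middle 2 digits) is never 00; serial (last 4) is never 0000.
    """
    digits = re.sub(r"\D", "", ssn)  # drop dashes, spaces, other separators
    if len(digits) != 9:
        return False
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    if area in ("000", "666") or area >= "900":
        return False
    if group == "00" or serial == "0000":
        return False
    return True


print(ssn_is_plausible("000-12-3456"))  # begins with three zeros
print(ssn_is_plausible("123-45-6789"))  # structurally possible
```

Passing this check does not mean a number was ever issued to a real person, only that it is not ruled out by its structure.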
And maybe for the listeners, give us a little insight. When you get access to a data set like this, what are you looking at? Right? Because I mean, obviously this is not your first rodeo.
Yeah. When we first get a data set like this, the first thing we're trying to do is just understand like what's in it. So we'll take a look at the first couple thousand rows and just understand what fields are present and where does it look like the data set actually came from. How common are the different
fields? So, for example, for this particular breach, phone number is mostly missing. It's mostly
like name, address, social; date of birth is sometimes in there, sometimes not. For example, we looked
at the Evolve data breach from about a month or two ago, and that one had information on
ACH transactions and balances across different fintechs and stuff like that. And so, you know,
that led us down a different sort of path of inquiry.
And just for folks listening who aren't
spending time on the dark web, how easy is it really to access the data set?
It's relatively straightforward if you know where to look. And like, we're also by far not
the only people doing this. I mean, I think as of this morning, we'd seen 26,000 views on
BreachForums for the thread. So like the fraudster community is looking at this and we've seen
this there. We've seen it on Telegram. We've seen it on LeakBase. It's just, it's all over the
place. And so if you know where to look, it's not that hard. Obviously, folks like the three of us don't
do this every day. But for fraudsters or for InfoSec professionals,
you can find it.
That's not reassuring, but, I mean, it's the answer I expected.
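The first-pass look described earlier (read the first couple thousand rows, see which fields are present and how often they are filled) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name field_fill_rates, the column names, and the two sample rows are all made up, and the real file's layout is not given in this conversation.

```python
import csv
import io
from collections import Counter
from itertools import islice


def field_fill_rates(csv_file, max_rows=2000):
    """Share of the first max_rows rows in which each column is non-empty."""
    reader = csv.DictReader(csv_file)
    filled = Counter()
    total = 0
    for row in islice(reader, max_rows):
        total += 1
        for field, value in row.items():
            # Short or over-long rows yield None or list values; skip those.
            if isinstance(value, str) and value.strip():
                filled[field] += 1
    if total == 0:
        return {}
    return {field: filled[field] / total for field in reader.fieldnames}


# Made-up sample with hypothetical column names, for illustration only.
sample = io.StringIO(
    "name,address,ssn,dob,phone\n"
    "Jane Doe,1 Main St,123-45-6789,1980-01-01,\n"
    "Janie Doe,1 Main St,123-45-6789,,\n"
)
rates = field_fill_rates(sample)
print(rates)
```

On this toy sample the phone column is always empty and the date of birth is present half the time, which is the kind of pattern described for this breach.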
I would say that this is probably one of the big wins for sensible regulation around breach disclosures.
Like, I think, having worked in this space since before there were breach disclosure requirements,
these things were always happening and no one talked about them, and consumers were just oblivious.
And I think that knowledge is power, and making consumers aware of what's happened with their data is super important.
And this is one of those cases where I think forcing disclosure around breaches makes the world a safer place.
and makes people respond to them and handle them in a correct way.
We're going to get to how this happens, and obviously its impact,
but maybe we could just get a sense for scale.
I mean, when I heard this, it felt bigger, but I'm a layman.
I hear about breaches all the time.
And so how would you actually characterize maybe like the magnitude
or importance of this particular breach?
In terms of magnitude, so it's 277 gigabytes of data uncompressed,
which is a lot. That's across two different files, which totals 2.7 billion rows. Now, some of the
reporting you've seen in the media is like, oh, this is on, you know, three billion people have had
their identities stolen, which is fortunately not the case. As I mentioned, there's a lot of duplicates
there. But there are 2.7 billion records. It's literally a CSV file. And so each row is some
different piece of information about an individual. Now, we haven't gone through the full file,
but based on sampling, we think approximately a third of the records are unique.
And so if you run the math on that, it's high hundreds of millions of people.
But again, we're not completely sure because we haven't seen the whole thing.
So I'd say hundreds of millions of individuals confidently and 2.7 billion records.
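To make the "run the math" step concrete, here is a rough Python sketch of scaling a sample's unique fraction up to the full file. The 2.7 billion row count comes from the conversation; the function name estimate_unique_records and the six sample rows are hypothetical. The sketch assumes the sample is a contiguous slice in which duplicates sit next to each other, as in the example of the first six records; a scattered random sample would miss most duplicates and overestimate uniqueness.

```python
def estimate_unique_records(total_rows: int, sample: list) -> int:
    """Scale the fraction of distinct rows in a sample up to the whole file."""
    if not sample:
        raise ValueError("sample must not be empty")
    # Integer arithmetic avoids floating-point rounding on large counts.
    return total_rows * len(set(sample)) // len(sample)


# Made-up contiguous sample: six rows, two distinct records (one third unique).
sample = [
    ("Jane Doe", "1 Main St"),
    ("Jane Doe", "1 Main St"),
    ("Jane Doe", "1 Main St"),
    ("John Roe", "2 Oak Ave"),
    ("John Roe", "2 Oak Ave"),
    ("John Roe", "2 Oak Ave"),
]
estimate = estimate_unique_records(2_700_000_000, sample)
print(estimate)  # 900000000
```

Exact-match deduplication counts nickname and address variants of the same person as separate records, so the number of distinct individuals would be lower than the number of unique records.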
Joel, you've been working in security for so long.
How would you characterize maybe not only the sheer number of records, but maybe the quality of the information,
the particular kind of information?
I'm unfortunately probably a little desensitized.
I'm only partially being snarky.
Like, I do think a lot of this information's already leaked out there.
Like, we've had multiple breaches of credit reporting agencies.
And you have to remember that social security numbers are the kind of things that don't change, right?
You get one when you're born and you're stuck with it for a while.
And so not through any central repository, but just the breaches over the last 20 years,
a lot of this information's already leaked.
And so I don't know how unique it is.
What might be interesting is that it gives you
sort of maybe a central repository
where you can QA the information you already have
or maybe there's some information in there
that hasn't already leaked.
And so that's probably going to make a little bit
of a difference for folks.
I agree the bureaus have all had leaks at different points.
And I think the Equifax breach
from what, five or seven years ago
had something like 80% of Americans in it
or something like this.
But one of the things that I'm sort of thinking about here
and actually you can see the fraudsters
talking about this on the forums
is they're sort of using this as a backbone to other breaches.
And the other thing, too, is frankly, fraudsters today,
folks who commit identity theft are not limited by PII.
Like, PII is already out there.
It's relatively easy to get an identity that you can use as a base to steal.
But the place where breaches really get bad is when you connect the sort of core PII,
so name, date of birth, SSN, and address, when you connect that to other things.
So if you connect that to a driver's license or a bank account or
a VIN or email addresses.
Like that's when you can actually start to do something interesting from a fraudster's
perspective with the information.
And this data that has gotten breached here, we think could be used as sort of a backbone
to connect to all sorts of other information that's been breached.
As I mentioned, on BreachForums, the fraudsters are talking about
this.
Yeah, and it's funny when you actually read through some of this chatter with the attackers, right,
because they have a lot of the same problems that like legitimate businesses have,
specifically like marketing companies, right, which is like how do we make sure that we have
the right Joel and how do we know that we've got his right car and do we have his right
identification? Because a lot of times these guys are trying to defeat things that are using
personal information about you for authentication, right? They ask you what school you went to when you
were five and stuff like that. And so the more of this demographic information these folks
can build up and the more accurate they can make it, the easier it is to subvert a lot of the
security controls in place for them to commit fraud.
Right. And as more of these breaches happen and more data is released, I mean, how much risk is there for me? Like, let's just say as the average American, should I be really concerned with this new breach? Or like, how would you measure that?
I mean, I think the risk is always there. It's ever present. I think that you should probably have a lock or a freeze on your credit, right? That's sort of step one. I think if you do that, you mitigate some of the problems from these sorts of things. I think the bigger issue is going to be, at least as you look forward and you think about how thieves and scammers are going to use this stuff, you know, you can start to use this demographic information pretty convincingly if you could clone someone's voice using GenAI or you could take this in a new direction in which you get a lot more attributes
about a person that let you build a much more believable profile,
that then lets you replicate their presence, their kind of identity,
in a much harder-to-verify world.
And what we've heard from folks is that this kind of fraud,
this sort of next level social engineering,
is the thing that's been happening more and more.
I can give the advice I typically give to my family at Thanksgiving,
who ask me the same question, which is, look, at the end of the day,
there's not too much that people can do to prevent fraudsters from stealing their identity.
If you're in this breach, you're in the breach, and probably all three of us are, frankly.
But the things that you can do are pretty basic and strong personal security things.
Like, for instance, turn on two-factor authentication for all the important services that you have,
probably the ones that are not important as well.
Use a password manager, so you don't have a bunch of repeated passwords everywhere.
And maybe use your best judgment.
If something seems like it's too good to be true, it probably actually is.
Joel has a good point about freezing your credit.
That's a great idea.
It's also a good idea to just check your accounts on a regular basis to see if there's anything that you don't expect.
Yeah, and we actually have a helpful blog post that we wrote years ago called 16 things to protect yourself online that still is applicable today even after this data breach.
Yeah, Joel, I think you've probably gotten way more use out of that than any of us would hope, huh?
I wish I could say that things had changed radically, but it's still the same problems.
How does something like this actually happen, right?
We know all of these companies have various versions of our data, some more than others, some more
important than others. Is it a lack of good infrastructure, or is this just the kind of thing
that's bound to happen when you put data all in one central place?
If I was a gambling man, I'd bet that
they had some kind of configuration issue on a data store, that they had a cloud database that
probably had a guessable password or wasn't using two-factor authentication and someone stole the
credentials, right? If you look at the Snowflake breach, which impacted, I think, 137 different
companies. That was all because there wasn't two-factor authentication enabled and people were able
to guess or steal those passwords and usernames. And so to be quite honest, these breaches are
usually lowest common denominator, right? They don't have to pick the lock if you leave the window
open and you'd be surprised how often people leave windows open. And that tends to be how these
things happen.
I mean, on that note, I'm a little bit surprised by maybe like how unsurprised you are
by this breach. And so where are we in that arc? Is this just really something that we expect to just
continue to happen? And if you frame things the way you have as like the hackers basically
become more effective as more of these happen and they can piece together different blocks,
where does that put us? How does the industry need to shift, if at all? Or should we just expect
like a rolling cadence of this?
If you go back decades, people could be secure by this data
actually not being out there as much.
SSNs were secret and your possession of one meant that it was probably you.
I like to joke that social security numbers are both your username and your password.
And at this point, they're also public.
So it's like kind of the worst possible thing you could have.
But so many different data breaches have completely broken that paradigm.
And, you know, as I mentioned, frankly, there's so much data out there that PII being secret is no longer a control at all, frankly,
to prevent identity theft or other kinds of fraud.
No, frankly, the reason why there's not more identity theft or other fraud out there
is because institutions that guard against identity theft,
so banks or governments or anyone that needs to verify the identities of consumers,
like those institutions have controls for them.
And, you know, SentiLink is one of those controls.
And so actually the reason there's not more fraud out there is because of the controls
that institutions take, not because there's not data breaches.
Yeah, and I think, like I said, not to be overly cynical, but we've had data and databases for a really long time, and it's relatively recently that there's been a requirement to disclose data breaches, right? California passed the CCPA. Actually, the breach disclosure law in California passed, I think, in 2005, but it wasn't nationally implemented for quite some time. And even then, there's still a patchwork of regulations. And it's the SEC that's actually driving a lot of the breach disclosure requirements. Currently, they require you, I believe, to disclose within 48 hours
after a material security breach, which is only a year old, right?
So these breaches have been happening for years and years,
and people just never talked about them.
And so when you work in the security industry,
especially if you work on the cyber intelligence
or the financial fraud side,
and you watch these markets, like, this has been going on forever.
And it's only now that companies are being forced to disclose it
and that consumers are becoming aware.
And so I think that's really the thing that's changed.
And like all of these different kinds of situations,
this is very much a cat and mouse game, right?
It's the attackers and the defenders and you go back and forth.
And to be quite honest, the defenders have gotten really good.
We have some really excellent technology out there,
SentiLink's a great example of that,
where a lot of this stuff can be nipped in the bud,
even if the information is out there,
you can limit the harm that it causes.
Joel, you know that we verify over a million people every day?
There's literally a million people a day that we help to prove who they are.
That's amazing.
Yeah, we're really proud of it.
The bottom line of a lot of this stuff is that, like I said,
it's easy to be cynical, it's easy to get worked up,
about this stuff or whatever the case may be.
But in reality, things have actually gotten a lot better.
And if you freeze your credit, if you follow the security best practices,
if you use things like a YubiKey, you know, a hardware security key,
you can exist online relatively safely, right?
Probably more safe than you are walking through a city street at risk of being robbed, right?
We've come a long way.
We just, you get these headlines and the media hypes this stuff up and people think it's
the end of the world.
But in reality, like, things are a lot better.
They're a lot better than people would report them to be.
The other really cool thing about the way the world has evolved is that with the startup
ecosystem and the ability for, you know, expert founders to build technology to address these
things.
Like, we've actually shifted a lot of the economics on some of these things where you can
build a successful company fighting this stuff and end up financially way better than if you
were doing this stuff, right?
And I think if you look at all these different kinds of situations and you look at any kind of
crime, to be quite honest, it's just about where the incentives
lie. And if you shift the incentives in a meaningful way, you can actually really start to crack down on
a lot of this stuff.
That's a great point. And Naftali, that's what your company does, right? How many
cases of identity fraud are you blocking per day?
We stop over 20,000 a day.
And who is paying for that?
Is it the end customer who's paying you to monitor or how does that work?
No, it's the institution.
So we serve over 300 banks, lenders, financial institutions, telcos, governments throughout the United
States to help them figure out if their customers or users are who they say they are. So, for
example, before someone opens a credit card, that financial institution will ask us, hey, is this a
real person? Or is that identity stolen? And we'll be able to answer that for them in
real time.
On the note of some of the new technologies coming online, they do open up a new vector,
both for, to your point, Joel, attacking and defending. Curious if you see any gaps in terms of
places that builders should be addressing on this new frontier, as again, like the attack vector
has also opened up. Everyone's talking about generative AI and sort of the ability to do deepfakes
and that sort of thing. And there's a lot of activity there. We actually have an investment in a company
called Pindrop, which is really good at spotting audio deepfakes. And they sell a lot of products,
as you can imagine, to financial service companies, because that's typically where you see the threat.
But it all rolls downstream. And so it's not just J.P. Morgan Chase and Citibank that are getting hit
by these generative AI fakes, it's actually becoming
grandmas and grandpas and parents, right?
They're getting the fake phone calls from grandchildren and children,
that they're being held and you need to wire the money
and stuff to that effect, the virtual kidnappings, right?
These are things that trickle down.
And so enterprises are doing a good job of protecting themselves
from some of this.
And what we need is for some of that technology
to start to filter down into protecting consumers at large.
Obviously, we've been using the same PII for ages, right?
Like you guys mentioned, Social Security.
I mean, it's also crazy to me that they send you that on a piece of paper. But in any case,
is there some world where we have similar to password managers, like forcing you to update your
password every so often or other forms of like biological identification? Should we be
rethinking the idea that we use name, email, address, phone, etc? Or am I thinking about this
incorrectly? And even those have just like the same kind of vectors.
I would say, like, yes, for sure.
We should be thinking about this differently.
Is that ever going to happen?
Unfortunately, no.
But, you know, frankly, public-key cryptography solves, like, quite a bit of this.
And I'm not talking about, like, crypto blockchains or anything like that.
I mean, simply, every citizen having a public/private key pair and having the government
or some trusted entity go and cryptographically sign those would solve a bunch of
identity verification issues.
Is that going to happen in the United States?
Absolutely not.
But, you know, would that be an elegant solution that would solve a lot of problems?
It would.
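The idea Naftali describes (each citizen holds a key pair, and a trusted entity signs the citizen's public key) can be illustrated with a toy Python sketch. Everything here is an assumption for illustration: it uses textbook RSA with small Mersenne primes and no padding, which is not secure, and the names, the certificate format, and the challenge string are hypothetical. A real system would rely on a vetted cryptography library and standardized certificate formats.

```python
import hashlib

E = 65537  # commonly used public exponent


def make_keypair(p: int, q: int):
    """Textbook RSA key pair from two primes: returns (public, private)."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return (n, E), (n, d)


def digest(message: bytes, n: int) -> int:
    """SHA-256 of the message, reduced into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes, private_key) -> int:
    n, d = private_key
    return pow(digest(message, n), d, n)


def verify(message: bytes, signature: int, public_key) -> bool:
    n, e = public_key
    return pow(signature, e, n) == digest(message, n)


# Toy primes (Mersenne primes), far too small for real security.
authority_pub, authority_priv = make_keypair(2**61 - 1, 2**89 - 1)
citizen_pub, citizen_priv = make_keypair(2**107 - 1, 2**127 - 1)

# 1. The trusted entity signs a statement binding a name to the citizen's public key.
certificate = f"name=Jane Doe;key={citizen_pub[0]}:{citizen_pub[1]}".encode()
certificate_sig = sign(certificate, authority_priv)

# 2. A bank sends a fresh challenge; the citizen signs it with the private key.
challenge = b"bank-challenge-0001"
challenge_sig = sign(challenge, citizen_priv)

# 3. The bank checks both signatures using only public keys.
identity_ok = (verify(certificate, certificate_sig, authority_pub)
               and verify(challenge, challenge_sig, citizen_pub))
print(identity_ok)
```

The point of the design is that nothing secret is ever shared: a leaked certificate or public key lets nobody impersonate the citizen, unlike a leaked SSN.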
There has been a dream for a really long time
among a number of diehard old cryptography people
that one day the U.S. government would get into proving identity
and there has been a NIST working group.
The National Institute of Standards and Technology
has been trying to set standards for proofing for decades.
There was a hope that maybe one day the post office
would become the place where you could go prove your digital identity
and get a token or some kind of key.
I think we're still as far away from it today as we were 10 years ago,
but I hold out hope that one day. One day, I mean, California's rolling out digital driver's licenses, right? I got a digital license plate for my car. Like, we might get there. It might happen in my lifetime. I'm hoping. I think, to Naftali's point, like the technology exists. We know how to stop this. We just need someone with a political will and desire to make this a thing and maybe go after the real problems that everyday American consumers face. So one day we'll get there. I'm optimistic. Hopefully in my life.
Just a few more breaches
along the way.
All right, if you've made it this far,
thank you so much for listening.
And if you like us covering these timely topics,
be sure to let us know at ratethispodcast.com/a16z
or you can email us at podpitches@a16z.com.
We'll see you on the flip side.