Risky Business - Snake Oilers: Kodex, ClearVector and Censys
Episode Date: April 4, 2024
In this edition of Snake Oilers you’ll hear pitches from three companies:
Kodex: Makes a platform companies can use to interact with law enforcement (solves the law enforcement impersonator problem, among others).
ClearVector: Cloud security startup from former FireEye/Mandiant SVP/CTO John Laliberte.
Censys: Scans the entire internet, identifies assets you didn’t know were yours, helps you track attacker infrastructure like C2.
Transcript
Hey everyone, and welcome to this edition of Snake Oilers.
This is the podcast we do here at Risky Biz HQ a few times a year, where vendors get to
come onto the show to pitch you their wonderful wares.
This whole thing is sponsored, and yes, that means everyone you're about to hear from in
this podcast paid to be here.
If you are looking for the regular
weekly Risky Business podcast, just scroll one back in your podcast feed. They're the editions
that have the numbers after them. So you could find one there. So yeah, we're going to hear from
three vendors today. The first is Kodex, which can help you deal with law enforcement data requests
and stop law enforcement impersonators from getting their grubby hands on your data. The second is ClearVector, a startup that makes a security tool
for cloud environments that lets you detect and remediate badness. And the third is Censys,
which maps out the internet and lets you discover and identify company assets you did not know you
owned. It's also useful for CTI people and threat hunters who want to explore attacker C2
infrastructure and, you know, find Cobalt Strike beacons and things like that. That is coming up
later, but let's get into it now with our first Snake Oiler, Kodex. Now, unless you've been living
under a rock the last few years you would know that forged law enforcement data requests are a huge problem. Attackers take
over some cop's email account and then they use it to send emergency data requests to telcos,
social media companies, you know, whoever. And this is a way for them to walk data out without
having to do any actual hacking. But beyond bogus requests, you know, dealing with law enforcement
can be a lot of work and require a lot of
specialist knowledge. And that's why former FBI agent Matt Donahue founded Kodex. Kodex is a
system that basically takes over a lot of that law enforcement relations stuff from your company.
They verify law enforcement contacts and identities, and they even vet the data requests
that are being sent to your company. Now, sometimes law enforcement agencies will ask for too much data
or ask for stuff that they're not actually entitled to ask for.
So Kodex, yeah, verifies the identities of the law enforcement agencies
sending the requests or the agents sending the request
and checks the requests themselves to make sure everything is hunky-dory
before you have to do anything.
Their system even
spins up like a dedicated law enforcement portal for each customer where they can share information
with investigators like data request guides, glossaries, whatever. The whole point of Kodex is
if you're finding dealing with law enforcement difficult or risky due to impersonators,
Kodex fixes that problem. So here's Matt Donahue explaining why he started Kodex.
So Kodex basically helps companies directly and authentically communicate with government
agencies who are reaching out, requesting user data in the form of subpoenas, search warrants,
or other legal process to further their investigations. And basically, this all started through my time
doing counterterrorism intelligence with the FBI. I always joke how a lot of it was a childhood
dream fulfilled, you know, the meeting with sources in hotel rooms, given an envelope full
of cash. A lot of it really did feel like a movie. But it was also very disillusioning to
see beyond the veil and see how government operates compared
to what I think the outside observer would think. It ain't like in Hollywood, right?
The human intelligence aspect is a lot like Hollywood and it was a lot of fun. But the
first time they told me to fax a subpoena to a telecom provider in Southern California,
I genuinely thought they were just messing around. Yeah, you thought they'd asked you to go and buy some headlight fluid or a left-handed
screwdriver.
Yeah.
Literally.
I'm like, yeah, yeah.
New guy.
I get it.
How do I really do this?
And they're like, what do you mean you fax it?
I'm like, what do you mean?
This is the goddamn FBI.
I've never sent a fax in my life.
What the hell are we doing here?
And it just blew my mind that for as important
as this process is, basically sending out data requests to all these different companies and getting
the pieces of the puzzle together, it is the most archaic and antiquated and chaotic part of
the modern investigation. But it's also the most important, because even if you just look
at publicly available transparency reports, the volume of requests across all different industries,
not just the Facebooks and Googles of the world, but across social media, messaging, telecom,
traditional finance, fintech, crypto, gaming, basically every industry. Like car companies:
cars are basically data companies now, they're mapping the world. And I don't think you'd ever
expect, like, Ford Motor Company to be getting government data requests, but it affects everyone
from a 7-Eleven to a Bank of America. And I just kind of thought something like Kodex already existed
before I was in the FBI, and I was stunned to realize how chaotic it actually was. Well,
I mean, you know, that's what I mean by it ain't Hollywood, because in Hollywood they say, I'll
go put this in the system, and then bang. And then in reality you realize there is no system. You know,
there's some post-it notes and maybe a couple spreadsheets that some people have uneven access to.
But, you know, it ain't like it is portrayed.
And, you know, that said, though, these sorts of systems are only a problem when they become a problem.
And they have become a problem.
I mean, we've seen a lot of people fraudulently obtaining data via, what are they called?
Emergency data requests. Like, that's been a
huge thing over the last couple of years, where some of these, you know, cyber
crime rings and stuff are exploiting this to do various things, to get people's personal details,
of, you know, rivals or whatever, or just for swatting attacks and whatnot. So like, this thing
has really become a problem, hasn't it, over the last couple of years?
Yeah, absolutely.
And that was actually one of the straws that broke the camel's back when I was in the FBI of we were reaching out to a company who was a new social media company at the time, 2017, 2018.
And on our side, it was very frustrating because we weren't getting any response to get the digital evidence to lock these people up. And on their side, come to find out they had never received a piece of legal
process before, and they didn't believe it was actually the FBI. And I don't actually blame them
because no one builds a company with the expectation that one day the FBI or someone else is
going to come knocking. But the unfortunate reality is, is that people are going to end up abusing your product in ways in which you never wanted nor intended.
And as a result of that, like business email compromise and social engineering have been a
fan favorite of the hacking community or fraudsters for decades. And with the advent of
law enforcement requests, especially emergency requests, that gave them
a new avenue and a soft underbelly into a company that you can put up as many cybersecurity
software infrastructure as you want, but if you're leaving this gaping hole into your
company and into your users' data, there's a big flaw in your security posture there. There are 135 countries
that are using Kodex to interact with our customers to date to defend against law enforcement
impersonators who will oftentimes abuse law enforcement credentials that are traded like
Skittles on the dark web and on Telegram. Yeah. I mean, it's funny, right? Because you
just spoke about a social media company that got outreach from the FBI and they didn't believe it, which I suppose is par for the course when the person rings up and asks them for their fax number.
So, you know, like that, yeah, that makes sense. But like, why don't we talk about actually how Kodex works, who the customers are and whatnot, because, you know, essentially you've built like an identity, you know, it's like an identity service really, so that you can be much more
confident that you are talking to who you think you're talking to.
So, you know, trying to get around that issue of, you know, just one stolen credential allows
people to impersonate a law enforcement agency.
So, you know, so essentially there's a verification component to this and it's sort of like a
communications component to this.
Why don't you actually just tell us in simple terms, like what Kodex actually does, how it works?
Yeah, absolutely. [...] agents across over 12,000 agencies, as I mentioned, in more than 130 countries across the world,
where anytime someone needs to reach out to one of our companies, they now go through Kodex.
Our 24-7 verification team vets all of these users before they're ever allowed into the system,
so that once one of our customers is interacting with a government user, they know that it's legitimate, they know that it's been vetted, and they know that they're part of this herd immunity that is the Kodex network. [...] and security posture is that the same person who is using a Polizia di Stato email address to
fraudulently request information from Facebook is doing the same thing to Verizon. I don't know if
you saw the news before Christmas about something like that. They're doing it to Google. They're
doing it across companies and they don't have a way of protecting each other.
Comparing notes. Like I'm with you. Yeah. I get it. So say I'm your customer,
right? I'm running an up and coming social network. I get some sort of data request from a
cop in England. I'm your customer. What do I do then? Do I reach out to Kodex and say,
this person's trying to contact me for a data request, go and vet them, go and verify them?
And how does that gel with something like an emergency data request, which, you know, has to be expedient? So I'll start with the process
of law enforcement encountering Kodex. So basically, pre-Kodex, a company would have
an email address at best, like lawenforcement@company.com, where any government
agent from anywhere in the world would just email that singular inbox that's shared amongst up to, again, probably like 50, 60-plus analysts on the
company side in some cases. And you can imagine how complex and difficult that is, and it compounds
with the variables, including, like, there are 18,000 law enforcement agencies just in the US, not to mention the other 192 countries in the world. And so the moment they adopt Kodex,
that email address turns into an auto reply, basically saying, hey, we're no longer accepting
service through this email. Please go to our new government request portal. They're redirected
there and prompted to either sign in if they already have an account or sign up if they
do not yet have an account. And one of the things that either investors or customers have always
asked is like, how did you get the FBI or the Met Police out in the UK to use Kodex? And like,
the answer to that was really the opportunity caused by the problem of my question back in Southern California.
Why am I faxing this?
Yeah.
Because that's how they had it set up.
Well, I think it's also like you don't charge the law enforcement agencies for this, right?
So like it's solving a problem that costs companies money, adds risk.
And I'm sure, you know, the police that I know, like something like this comes along,
they're like, sure, as long as it doesn't cost us anything, because we don't have budget
for this. Exactly. And it's in their best interest to go
the path of least resistance to where if a company has a formal method of accepting these,
going outside of that method is only going to impact the timeline of their own investigation.
So hang on, speaking of, and sorry to cut you off, it's just we are kind of running a little bit out of time
at this point.
So, you know, just going back to that example
of like an emergency data request coming in.
Okay, so I'm a Kodex customer.
Someone has contacted you with an emergency data request.
You've got a 24 by 7 verification team.
But like, what's the process there
if you can't verify the person?
Is it that
someone at Kodex will reach out to someone else who is a trusted contact at that law enforcement
agency? Or, you know, tell me how that works. Yeah. So that's part of the process. It's a
multi-layered approach that has both human involvement and automation in order to accommodate
for scale as well. And so someone who signs up through one customer's portal,
for like LinkedIn, for example, they're a customer.
If someone signs up through LinkedIn's portal,
we verify and authenticate them.
They're now not just verified for LinkedIn,
but also for Binance and OpenSea
and all the other customers that we have.
So they share the network of verified government agents and agencies
that allows us to do suspicious activity monitoring
across the government agent user base.
So I guess what you're saying is the idea that someone
who'd never encountered any of your customers in the past
lodging an emergency data request, like that's going to be a corner case.
I wouldn't say it's a corner case, actually.
Like, you never know when and where in law enforcement someone's going to actually need to reach out to any of the thousands of companies, including any of the companies on Kodex. [...] points of contact across the world, but it also includes under-the-hood information of: is this
IP address known to be associated with this agency, based on the other users in that agency?
Is it a browser or device that we're expecting, based off the other users in that agency? Like,
I always joke that no one's going to be using a MacBook on dark mode logging in through
Tor if they're a legitimate government agent serving an emergency.
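The "low-hanging fruit" checks described here could be sketched roughly as follows. This is an illustration only, not Kodex's actual implementation: the function name, the flag names, and the idea of keeping a per-agency list of previously seen networks are all assumptions, and the addresses come from the reserved documentation ranges.

```python
import ipaddress


def signup_flags(source_ip, via_tor, known_networks):
    """Return a list of reasons a sign-up looks suspicious (hypothetical checks).

    source_ip      -- address the sign-up came from, as a string
    via_tor        -- True if the address is a known Tor exit node (assumed to be
                      looked up elsewhere)
    known_networks -- CIDR blocks already seen for other users at the same agency
    """
    flags = []
    addr = ipaddress.ip_address(source_ip)
    # Has anyone else at this agency ever connected from a network containing this IP?
    if not any(addr in ipaddress.ip_network(net) for net in known_networks):
        flags.append("ip_not_seen_for_agency")
    if via_tor:
        flags.append("tor_exit_node")
    return flags


agency_networks = ["192.0.2.0/24"]
print(signup_flags("192.0.2.15", False, agency_networks))   # no flags raised
print(signup_flags("198.51.100.7", True, agency_networks))  # both flags raised
```

An empty list here would only mean the base-layer checks passed; as described in the interview, human verification and ongoing monitoring still sit on top of signals like these.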
And so there is low hanging fruit like that that we accommodate for. But also, let's say it is an
emergency, they pass these base layer checks, there's a part within the signup flow to identify
like, hey, this is an emergency, lead me to the top of the queue. And so at that point, we allow
them in again, under the controls of our
suspicious activity monitoring and signals intelligence. And part of the process that
has always been so burdensome, and I think it unfortunately leads to the taboo nature
and the demonization of this process is that people like to say government data requests.
Oh, it's big brother wanting to gobble up all your data.
Anyone who's ever gotten a 13,000 page single spaced PDF data return in a search warrant
does not want all of that information.
They want the most granular and narrowly tailored piece of information that's going to drive
their investigation forward.
Very interesting conversation.
Matt Donahue, thank you so much for your time and all the best with it. It sounds like a really worthwhile endeavor that I think we're in a
better place if people use something like this. So cheers. Yeah, no, I appreciate it. Really
appreciate the time. Thanks for having me. That was Matt Donahue from Kodex there,
and you can find them at kodexglobal.com. And that's Kodex with a K. So K-O-D-E-X global.com. Next up in Snake Oilers,
we're going to hear from ClearVector. ClearVector makes a monitoring product that helps you
see how your cloud environments are actually being used by your developers. And if a breach happens,
they'll help you to understand the blast radius. There's even an isolate button,
which response teams can smack real hard. So that's always satisfying.
And I guess, you know, ClearVector's core philosophy really is that what matters is
human identities and mapping those out from the nuts and bolts of your AWS GCP or GitHub
tenancies, you know, with lambdas and IAMs and all that complexity, that map is what
you really need to make sense of everything that's going on.
So John Laliberte is an ex-FireEye/Mandiant guy and the founder of ClearVector.
And he joined me for this interview and started off by explaining what ClearVector actually is.
ClearVector hooks up to your cloud environment and looks at all the runtime activity in your cloud.
So think about that in terms of the control plane activity, the activity inside of the workload.
So think like EC2, Lambda, things like that. We also hook up to SaaS applications like GitHub and connect all of
that together and build models of the identities. And then we surface all of the risky activity to
you in the form of notifications in either Slack or Teams, and then give you the ability to stop
the activity by identity by giving you a
big red isolate button in the product that will prevent that identity or entity from doing anything
else in your cloud environment. So when we're talking about identities, are we just talking
about like user accounts and developer accounts? Or are we talking about sort of like, oh God,
what would you even call them? Like sub-identities, like, you know, access tokens that have been generated by a, you know, developer identity and then are being used to do
XYZ. Like, I mean, I guess that's probably what you're trying to address, right? It's this
sprawling diagram that stems from, you know, the real identities. Exactly. We take a look at all
that right so one of the big questions that people, they may be notified by someone or they may find out, hey, I have this access key.
Well, they can't answer the question, what happened with that access key?
Whose access key is it, right? That can be a hard question.
That's a basic question. Yeah. And so as an example, what we can tell you is the provenance
of that, right? We can tell you this is the identity that created that key, and then this
is the identity that actually used the key, right? And for something like a CI/CD role, which, you know,
you briefly mentioned in terms of kind of machine accounts or SaaS accounts, we can actually
tell you the developer that approved a particular PR that caused the activity by this role in your
cloud account, which helps out a lot of different ways, not just from a traditional availability,
like, hey, who broke production, but also from a security perspective to be able to say like, well, yes, we know where
this change originated from and be able to tie that back to the identity itself.
Yeah, right. So what sort of things are you most likely to, well, I mean, you know,
this thing's in prod, people are using it, right? So what sort of things is it most often surfacing in terms of popping up that big red button,
which says isolate?
So today it kind of falls into two camps.
The first camp is basically third party risk.
So think all those vendors that you have plugged into your cloud environment that are doing
all kinds of things.
It's pretty much every single time we plug into a new environment, it kind of all the alarms
go off, the red button pops up. And it's like, hey, did you know that this, you know, identity
is sharing this with some other AWS account, or, you know, this thing is sending your data over
here. And invariably, what that does is kind of, you know, person looks at it, and they're like,
well, that wasn't really part of how they thought about that vendor of what they were actually
doing. And so that's kind of the first case. The second case is when there
actually is an adversary in the environment doing something that's unexpected or is actually risky.
And so in that case, the isolate button does help because at most cloud accounts,
it's actually very difficult. Once you get into, 10, 20, 40, 50 different accounts.
It gets very difficult to track who did what in each account.
And when you think about isolation from that perspective, it's not as simple as just saying disable the account because there's active sessions open for that particular identity with all these different accounts.
And so that's some of the hard things that we do under the hood that kind of backs that isolate button. And obviously all of that's available via API. So you can integrate that with,
you know, any kind of a product that, that can do that.
Yeah. So I find this attribution component, like that seems to be the special thing here,
right. Which is to be able to tie an action back to, you know, the origin, right. In terms of an
account or a user. But what about, let's talk a little bit
about the detections, which are firing on funny things. You know, what are the dead giveaway
kind of things that'll happen in a cloud environment that'll result in you actually,
you know, going through that process of tracking something back to the origin? What are the,
you know, what are the things you're firing on, I guess is what I'm asking.
So first we look at all the activity and we map it back to identities all the time.
So it's not like we say, oh, this particular identity, we're going to track all of it. Yeah, and then press a button and then it goes and finds it.
It's all there.
It's all there all the time.
It's all there all the time.
And so we basically shred all of that activity, whether it's from things that the cloud providers provide or it's our own instrumentation, like our eBPF Rust-based agent.
We take all of that signal in and then we basically shred it.
We turn it into natural language.
We put it into a custom graph that we have patented.
And then we look over periods of time.
So after we go through that kind of identity attribution phase, we then have what we call
a risk engine.
And there's lots of different subsystems of the risk engine. Yeah. So this is the bit that I'm asking about, right? Because I'm
real curious to know, like once you've got all of this and yeah, if you're throwing it into some
sort of, you know, graph-based thing, yeah, you're going to see, you're going to start to see some
interesting stuff and get a sense of like when something's gone sideways, you know, you're going
to start to detect that. But yeah, I guess what I'm asking is like, what does it look like when it goes sideways and how do you detect it?
So all that activity after it goes into the graph, it goes into this risk engine and think
of the risk engine almost as like this team of cloud security experts that's looking at all this
activity. So it's like the security team you wish you could afford and build. And it's kind of this,
you know, ever present looking across everything, right? And it looks over periods of time. And it's kind of this, you know, ever present looking across everything, right? And
it looks over periods of time. So it's like as new activity is added to this model for particular
identities, it can basically look and say like, okay, what does this risk level look like over
time, and then surface that notification to you. And it doesn't do this, it's not like this rule
matching thing, right? It's a it's a pretty robust algorithm behind the scenes that's constantly being improved. Think of it like a recommendation type of an algorithm
that automatically kind of surfaces that to you. And what it looks like is a natural language
narrative. So it's kind of how you would describe to, you know, whether it's your boss or somebody
who's not a security expert of what happened. It's not the technical bits and bytes. It's not even
necessarily the IP addresses or anything like that. It says, hey, this identity did these things,
and we think it's risky for these reasons, and you should click isolate on this and stop it.
I guess I'm still going back to that original question though. You say this identity did these
risky things. What are the risky things? What is typically being surfaced by ClearVector in these environments?
So a few things.
First thing is when you take data out of the environment or you share it outside the environment.
So that's a typical indicator of either an attacker from a data theft perspective,
or it's a kind of indicator of a developer, you know, unintentionally sharing
data outside the environment.
The other thing is thinking about new vendors that are introduced to your environment.
So meaning new identities introduced to your environment without the kind of thought process
that goes into, well, should I allow this, you know, new identity in the environment?
The other thing is kind of things that are dormant, right?
Like say access keys that haven't been rotated or used in a long time, and then suddenly kind of,
you know, burst on the scene. Right. So that's a good one. That's the first one where I'm like,
wow, okay. That's, that's, you know, a dead giveaway, right? Like some old token has been
sitting around for like 12 months untouched. And then all of a sudden it's, yeah, extremely active.
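As a rough illustration of the dormant-key signal being discussed, a minimal Python sketch follows. It is not ClearVector's method (the risk engine is described in this interview as more than rule matching); the function name, the 90-day threshold, and the sample events are all hypothetical.

```python
from datetime import datetime, timedelta


def find_reactivated_keys(events, dormancy=timedelta(days=90)):
    """Return the IDs of keys that were used again after a gap longer than `dormancy`.

    `events` is an iterable of (key_id, timestamp) pairs in any order.
    """
    last_seen = {}
    reactivated = set()
    # Walk the events in time order and compare each use with the previous one.
    for key_id, ts in sorted(events, key=lambda event: event[1]):
        previous = last_seen.get(key_id)
        if previous is not None and ts - previous > dormancy:
            reactivated.add(key_id)
        last_seen[key_id] = ts
    return sorted(reactivated)


events = [
    ("key-old", datetime(2023, 1, 10)),  # last routine use
    ("key-old", datetime(2024, 2, 1)),   # sudden use roughly a year later
    ("key-ci", datetime(2024, 1, 30)),
    ("key-ci", datetime(2024, 2, 1)),    # used regularly, so not flagged
]
print(find_reactivated_keys(events))  # ['key-old']
```

A fixed threshold like this only surfaces the event. Working out who created the key and whether the use was authorized is the attribution step the conversation turns to next.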
I mean, that's going to jump out, right? Right. And so there's a lot of these kind of really hard situations to figure out of, you know,
let's take that access token as an example.
Okay, great.
How did that get there, right?
Who created it?
Like, is this authorized?
Is this unauthorized?
And so being able to say, yes, this was the identity that created it, it gives you the ability,
you know, within minutes to go talk to the actual human that created that access key, versus what normally is like, you know, jump into a SIEM or something else, right?
Do queries for a few hours, try to like, you know, get through all the hoops and do all of that
kind of on demand. So that's kind of the pain point that we saw: it turns, like, an
hours- or days-long process into, you know, hey, just 30 seconds or two minutes, because we've
been able to do that
ahead of time. But even just knowing when something dormant gets active again, right? Like actually
having something that can tell you when that happens, that's going to be useful. Just that
on its own is incredibly useful. Yeah. And the other thing is lateral movement is one of the
big things that we're able to do given all the technology that you have, right? So think about
there's a cloud control plane,
and then there's the actual kind of inside the workloads,
which is where a lot of, you know,
the traditional things that we already know in security live.
And so, you know, my belief is that there's always attackers
that live in the cracks and gaps, you know?
So that's kind of that new area for lateral movement
where historically it was more on the Active Directory side
with lateral movement.
Now in the cloud, you can laterally move between the control plane and the workloads and the
SaaS apps and kind of like all of these areas need to be combined in order to really get
a sense of who is doing what where before you can even have a conversation about kind
of what to detect or is it bad or is it risky?
And so that's kind of one of
the core things that we can also do with our graph because we're connecting, you know, what happened
inside that EC2 with what happened at the control plane layer, and then be able to connect that back,
you know, to say your SSO user, you know, from something like Okta in your IDP.
Well, it's interesting that you mentioned that, right? Because you did briefly say something
about an agent, and I'm guessing that's what you're using to look inside those EC2 instances. [...] into the environment where they want a full audit log along with the identity context, right? Of
who's logging into these boxes, what are they running on the box? And also workloads like
Lambda and other things, right? Where it's very difficult to get visibility into these serverless
workloads because by definition, right, there is no server, which is how we normally deploy a lot
of our security tooling and security software. And so that's a lot of original R&D and time that's
gone into detecting kind of all of those different conditions around lateral movement
and the traditional auditing of these cloud workloads.
So who's using this? Do you have a typical customer yet or it's still too early?
We definitely have a typical customer. So it basically looks like a fairly good sized company that makes a lot
of their money by shipping their SaaS application. And they basically got to the point where they
have a lot of data sitting in their SaaS application and they probably have a CISO,
maybe one or two security people. And security kind of is still a shared responsibility between
the ops team and the
security team. So it's big Amazon shops with a dev team and a couple of security people,
basically. Right. And that's kind of that use case of they don't have time to build out a
whole security operation center or hire a whole bunch of people. They don't have time to look
through log files. They barely have time to get the software shipped over the line and working functional. And security is definitely important, but it's kind of in service to the core mission.
The other type of customer that we currently have is really where the security hat is worn
by the engineering and ops teams.
Meaning you'd have a security champion on the team where if somebody said, hey, who's
responsible for the security of your production environment? It would be somebody on the ops team or somebody on the development team.
And it's that person's 10th job, right? It's the end of the day. So they're looking at,
hey, I don't want to look at another vulnerability management tool that's going to tell me the 10,000
or 20,000 things I have to go fix. Just tell me about the things that I need to know about right now.
And that's exactly what we do, right? Which is this happened in your environment. It's not risk
that may or may not at some later date be realized. It's like, no, this happened, right?
And this is important for you to go fix. All right. Well, John Laliberte, thank you so much
for joining us on Snake Oilers to walk through ClearVector. Very interesting stuff.
Thanks a lot.
That was John Laliberte there from ClearVector. Big thanks to him for that. [...] scans the entire internet and maps it all out.
And once they've got all that data in one place, it's useful for a bunch of things, right? So if
you're doing threat hunting and CTI work, you can get yourself some amazing visibility into attacker
infrastructure, you know, look at C2, Cobalt Strike beacons, that sort of thing. But you can also use
Censys to do asset discovery and you can find all sorts of assets you don't even know you have.
The bigger a company gets,
the harder it gets to do asset discovery via scanning
where you're expecting to find stuff.
I mean, sure, you can give scanners access
to your AWS tenant
and get it to bring back a lot of great information,
but what about that company you acquired three months ago?
This is an acquisition nobody told the security team about.
You know, what about all the cloud computing environments
that you don't know you have, right?
So you can't really ask a scanner to go look there.
So that's when these sort of, you know,
whole-of-internet discovery tools like Censys come in handy.
You know, they're constantly scanning the internet
and attributing assets to organizations,
you know, assigning them, this asset is owned by this org, through a variety of means.
And as you'll hear, Censys has even had to argue with some of its customers who have insisted, these assets attributed to us, they don't belong to us, and they are eventually proven wrong because it's that case of like an acquisition or something weird happening.
Anyway, here's Censys founder Zakir Durumeric, talking all about Censys.
So Censys is really all about mapping out what the internet looks like. What is every IP address,
every host, every website, every autonomous system. And this data gets used for all sorts
of purposes. I suspect most folks listening to this show know us through our community search
engine, but it's also used in a lot of enterprise environments to manage attack surfaces, to
understand companies during the acquisition process, also to understand the infrastructure
that threat actors have stood up, things like Cobalt Strike or C2 servers. And when those show
up in the logs, our data can also help pinpoint what those are. When we started this, we actually started as an academic research project, and we were really committed to the
quality of the data and really the consistency of that data. And that was, I think we've sort
of continued to really focus on that, focusing on making sure we find things quickly, making sure we
understand how to fingerprint different enterprise applications,
different types of adversary infrastructure. But I think you're right. The idea is very similar. We're trying to go out and map things, but we end up looking at a fairly different set of the
internet during that process. Yeah, it's less about finding just random open RDP and more about
mapping C2 and, you know, Cobalt Strike beacons and all of that sort of stuff. Tell us about the
C2 stuff, because I understand like, you know, if you're a threat intel company,
like you're pretty much guaranteed to be a Censys customer at this point.
Sure. And I think you're right there.
A lot of this data is used by many of sort of the CTI companies,
but it's also used by threat teams within different enterprises as well.
When they're trying to make sense of logs in retrospect and
understand why was there a flow to this particular asset? Was it something that we owned? Was it
something that belonged to one of our subsidiaries? Or was it something that was controlled by an
adversary? They want to be able to go look up at a very specific point in time, what was this? What
did it look like beforehand? What did it look like afterwards? Does it have evidence of running a piece of C2 software? Does it have something like Cobalt Strike on it? Are there
open directories that actually have data that was leaked out of one of these companies?
But we'll have a look at this open directory and see if our stuff is in it.
But we see that stuff sometimes. We'll see actually the tools that they're using. We'll
actually see data that's been pulled out of companies as part of an extortion scheme,
as part of the new types of ransomware we're seeing recently.
We'll oftentimes see that data actually on some of those servers.
So what does your typical customer look like?
I mean, we know that the CTI firms like to use Censys for various things, but what does
a typical enterprise using Censys look like, and what are they using it for?
Yeah, I think the largest use case we have are
mid to upper sized enterprises who are really so complex that they are trying to make sense of who
they are. What do they have exposed out there on the internet? And this might be because they are
spread across dozens of different cloud accounts or different cloud providers. It might be that they are sort of doing acquisitions quickly enough that they are trying
to bring in infrastructure that they themselves didn't set up. And they have questions when
something like one of the sort of new vulnerabilities drops that's being used by a ransomware
actor of, are we using Ivanti? Where is it? Are any of our subsidiaries using it? Are any of our offices using it? But a lot of these
folks are trying to make sense of their environments. And for a lot of these big companies,
their environments are bigger than some countries these days. They have hundreds of thousands of
assets. They're spread across dozens of nations, dozens of cloud providers. And in some cases, these will be in existing tools.
But in a lot of cases, it's not possible to connect to every cloud environment out there
and to monitor what's in it.
And you want to have something that gives you sort of this cohesive visibility into
everything that you own.
Now, that's an interesting problem, right?
Because as I mentioned,
I've always been interested in discovery tools.
And one of the trickiest things
that you have to do as an asset discovery play
is to do, especially when you're scanning internet wide,
like you are, is how to attribute assets to organizations, right?
Like it's really hard,
especially now that everything's in the cloud.
You know, you find this cloud thing, how do you then know it belongs to
Acme Corporation, right? So, I mean, it's an, as I say, it's an interesting field. Why don't you
explain to the listeners, like how you actually go about doing some of this attribution?
Yeah, this is a fantastic question. And as you said, it's really hard. It's actually much easier
to collect all the data than to figure out who owns all of these different assets. How do you do this in close to real time?
How do you handle things like auto-scaling containers in the cloud? We use really a ton
of different data sources to do this. A lot of it is that we will look directly in our global scans.
We'll look at everything from favicons to copyright statements to shared
certificates, shared keys, shared configurations of how you... All this sort of stuff, right?
Yes. Yes. And I'll say that's part of it too, is the WHOIS. But then it's the question of how
many hops out do you go? Someone with this domain registered another domain. And that domain has
websites on it and they tie to something else. We look at active
DNS data and passive DNS data. We're starting to crawl websites more. We actually connect to
every name in certificate transparency and other DNS data sets. So we're not just scanning IP
addresses. We're scanning every name out there and pulling that data in and looking for those
connections. We actually pull in data from data sources like
D&B and Crunchbase. So these business databases to figure out what subsidiaries are of a company.
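[Editor's sketch] The "how many hops out do you go" question can be illustrated with a small graph pivot. This is only a minimal sketch and not a description of how Censys does attribution: the asset names, the signal labels (`cert:`, `favicon:`, `registrant:`), the `OBSERVATIONS` table and the `attribute` function are all hypothetical. It treats any shared signal as a link and reports how many pivots it took to reach each asset from a known seed.

```python
from collections import deque

# Hypothetical observations: each asset maps to the signals seen on it.
OBSERVATIONS = {
    "www.acme.example": {"cert:ab12", "favicon:77f0", "registrant:acme-it"},
    "vpn.acme.example": {"cert:ab12"},
    "shop.acme-store.example": {"favicon:77f0", "registrant:agency-9"},
    "blog.other.example": {"registrant:agency-9"},
    "unrelated.example": {"cert:ffff"},
}


def attribute(seeds, observations, max_hops):
    """Breadth-first pivot from seed assets through shared signals.

    Returns {asset: hops}, where hops is the number of pivots needed
    to reach that asset from the nearest seed.
    """
    # Invert the table so we can look up every asset carrying a signal.
    by_signal = {}
    for asset, signals in observations.items():
        for signal in signals:
            by_signal.setdefault(signal, set()).add(asset)

    hops = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        asset = queue.popleft()
        if hops[asset] >= max_hops:
            continue  # do not pivot any further from this asset
        for signal in observations.get(asset, ()):
            for neighbor in by_signal[signal]:
                if neighbor not in hops:
                    hops[neighbor] = hops[asset] + 1
                    queue.append(neighbor)
    return hops


one_hop = attribute(["www.acme.example"], OBSERVATIONS, max_hops=1)
two_hops = attribute(["www.acme.example"], OBSERVATIONS, max_hops=2)
```

With one hop, only the assets sharing a certificate or favicon with the seed are pulled in. With two hops, `blog.other.example` also appears, linked only through a registrant it shares with another asset, which is exactly the kind of weak link that makes the hop limit a judgment call.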
I mean, some of these companies, they might even not necessarily know what they own.
I was going to say, it's a little bit upsetting, but on a somewhat regular basis, we have folks
that will say something like, that's not ours. And we'll say, actually, you acquired them last year.
Someone forgot to tell the security team.
Exactly.
Exactly.
And that happens.
It's wild, but it's true.
It happens a lot.
It happens way more than you'd sort of expect or hope.
I mean, this is the advantage of using one of these sort of internet-wide discovery tools, right? It can answer those sort of questions.
Yeah, I think that's a lot of it. I think one of the things that folks really like is that what's
in front of them is real. It's exposed. We can tell you that because we can see it. So there
are these questions of maybe this host is exposed or maybe it has these problems. What we can say
is, no, it actually is,
this is the exact time it showed up. And this is how it links back to your organization.
And these are the problems with it. People oftentimes also will combine that data with other
tools. They'll integrate us in with maybe their VM tool, or with their cloud.
So it pops up as like belonging to them, and then you can kick it off to Nessus for a scan or whatever.
Exactly. And you can ask questions like,
what do I have exposed that's not in Nessus, right? Or of the things that are coming through
in Nessus, which of these are actually exposed on the public internet? And those questions are
actually really important. We actually have to decide what are we going to fix? I think that
we operate in a world today where we don't have time to fix every single security problem to
handle every single thing that comes through our system. Well, this was going to be my next question. It's one thing to
be able to kick it out to your web app security scanner or Nessus or whatever and task that to
go have a look at it. But I'm guessing there are certain issues that when you have done a scan,
you've attributed an asset to an organization that is a customer of yours, and that asset happens to be
particularly risky, I'm guessing you flag that somehow, right? Because that's what most asset
discovery people do. That's absolutely right. And I think at the end of the day, for a lot of folks,
asset management is not where they're necessarily spending time. I think we wish that everyone was
spending more time on asset management, but I think a lot of it comes down to how do we actually
prioritize the problems or the risks that we have. And so identifying those risks is actually a huge part of our product
is saying, what is it that you actually need to fix? Which of these vulnerabilities have exploits?
Which of those exploits are actually being used by different actors? But we have folks who will
come in and say, okay, for this vulnerability that just appeared, what's affected? But we also see it on us to go out and be keeping track of what's being exploited, to understand that from other data sources and saying, here's what you really do need to go patch.
Of the stuff that you're identifying at the moment, like what's the most commonly uncovered like horror show, risky, like smack the big red button issue that enterprises, that your customers are dealing with at the moment?
Because I'd imagine there would be a handful of like things that are cropping up repetitively and consistently.
I would say a big one for our enterprise customers is actually the enterprise back office applications themselves.
Like payroll systems and things like that?
Yes.
Yeah, yeah, yeah.
And we look at sort of everything from industrial control systems and water systems and these pieces, which I think we sort of talked about.
But what gets people are sort of the boring, right?
It is the thing that is not behind SSO that has some web login.
And the amount of these applications that are out there are scary.
And I think for us is the percentage of these that really don't go patch, right?
We can track things like the patch curve for every major vulnerability,
for every major application out there.
And the amount of this that doesn't get patched
and actually the amount of it
that actually gets taken offline
when a vulnerability comes out is fascinating.
A lot of times we don't see things patched,
we just see them pulled offline,
which I think is sort of a signal
that these were applications
that weren't really being used in the first place.
Externally anyway, yeah, yeah.
They were just sort of there until someone realized.
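[Editor's sketch] The "patch curve" idea, including the observation that hosts often disappear rather than get patched, can be shown with a few lines of Python. This is a minimal sketch under stated assumptions, not Censys' method: the `patch_curve` function, the snapshot labels and the host names are hypothetical, each host in a snapshot is assumed to be labelled either `"vulnerable"` or `"patched"`, and a host missing from a later snapshot is counted as taken offline.

```python
def patch_curve(snapshots):
    """Compute the share of the baseline hosts in each state over time.

    snapshots: list of (label, {host: state}) in time order, where state
    is "vulnerable" or "patched". The first snapshot defines the baseline
    population (assumed non-empty); hosts absent later count as "offline".
    """
    _, baseline = snapshots[0]
    hosts = set(baseline)
    curve = []
    for label, seen in snapshots:
        counts = {"vulnerable": 0, "patched": 0, "offline": 0}
        for host in hosts:
            counts[seen.get(host, "offline")] += 1
        shares = {state: n / len(hosts) for state, n in counts.items()}
        curve.append((label, shares))
    return curve


# Hypothetical scan results for one application after a disclosure.
SNAPSHOTS = [
    ("day 0", {"a": "vulnerable", "b": "vulnerable", "c": "vulnerable", "d": "vulnerable"}),
    ("day 7", {"a": "patched", "b": "vulnerable", "d": "vulnerable"}),
    ("day 30", {"a": "patched", "d": "vulnerable"}),
]

curve = patch_curve(SNAPSHOTS)
```

In this made-up data, by day 30 half of the original hosts have gone offline while only a quarter were patched, which mirrors the pattern described above.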
And I'm guessing some of them, too,
they're going to be like self-hosted,
but in cloud environments.
So you might have some payroll system running in EC2
on AWS and whatever, and you've done that attribution,
and they go, oh, you know?
Yeah, no, I think that's right.
These are systems that sort of end up all over.
I think that cloud is probably
where folks see the most surprises.
Just because on-prem
you sort of have to get an exception made to actually
put something out there. But it isn't
necessarily the sort of
marketing site that someone just forgot about,
which I think is how a lot of people think about
ASM as sort of just what
fell through the cracks or just what is the
shadow IT. No, it's stuff that's being used.
It is stuff that's being used.
Yeah, the things that scare people
aren't necessarily the shadow IT.
It's actually the legitimate app that is out there
that hasn't been patched or doesn't have,
it doesn't even have a patch to apply to it yet
as we've seen in a lot of these attacks
the last year or two.
Cough, Ivanti, cough.
Yeah.
All right, well, look, let's wrap it up
there. Zakir Durumeric,
thank you so much for joining us to talk
to us all about Censys. Very interesting stuff.
Thank you.
That was Zakir Durumeric
there from Censys, and you can find them
at censys.io. That is
C-E-N-S-Y-S dot io.
they've even got like a large language
model based query builder,
which is like fun to play around with.
So you can go find that and,
you know,
turn natural language into Censys queries.
Fun times anyway.
But that is it for this edition of Snake Oilers.
I do hope you enjoyed it.
I'll be back soon with more Risky Biz for you all.
But until then,
I've been Patrick Gray.
Thanks for listening.