Risky Business - Snake Oilers: Kodex, ClearVector and Censys

Starting point is 00:00:00 Hey everyone, and welcome to this edition of Snake Oilers. This is the podcast we do here at Risky Biz HQ a few times a year, where vendors get to come onto the show to pitch you their wonderful wares. This whole thing is sponsored, and yes, that means everyone you're about to hear from in this podcast paid to be here. If you are looking for the regular weekly Risky Business podcast, just scroll one back in your podcast feed. They're the editions that have the numbers after them. So you could find one there. So yeah, we're going to hear from

Starting point is 00:00:35 three vendors today. The first is Kodex, which can help you deal with law enforcement data requests and stop law enforcement impersonators from getting their grubby hands on your data. The second is ClearVector, a startup that makes a security tool for cloud environments that lets you detect and remediate badness. And the third is Censys, which maps out the internet and lets you discover and identify company assets you did not know you owned. It's also useful for CTI people and threat hunters who want to explore attacker C2 infrastructure and, you know, find Cobalt Strike beacons and things like that. That is coming up later, but let's get into it now with our first snake oiler, Kodex. Now, unless you've been living under a rock the last few years, you would know that forged law enforcement data requests are a huge problem. Attackers take

Starting point is 00:01:26 over some cop's email account and then they use it to send emergency data requests to telcos, social media companies, you know, whoever. And this is a way for them to walk data out without having to do any actual hacking. But beyond bogus requests, you know, dealing with law enforcement can be a lot of work and require a lot of specialist knowledge. And that's why former FBI agent Matt Donahue founded Kodex. Kodex is a system that basically takes over a lot of that law enforcement relations stuff from your company. They verify law enforcement contacts and identities, and they even vet the data requests that are being sent to your company. Now, sometimes law enforcement agencies will ask for too much data

Starting point is 00:02:08 or ask for stuff that they're not actually entitled to ask for. So Kodex, yeah, verifies the identities of the law enforcement agencies sending the requests, or the agents sending the requests, and checks the requests themselves to make sure everything is hunky-dory before you have to do anything. Their system even spins up, like, a dedicated law enforcement portal for each customer where they can share information with investigators, like data request guides, glossaries, whatever. The whole point of Kodex is

Starting point is 00:02:35 if you're finding dealing with law enforcement difficult or risky due to impersonators, Kodex fixes that problem. So here's Matt Donahue explaining why he started Kodex. So Kodex basically helps companies directly and authentically communicate with government agencies who are reaching out, requesting user data in the form of subpoenas, search warrants, or other legal process to further their investigations. And basically, this all started through my time doing counterterrorism intelligence with the FBI. I always joke how a lot of it was a childhood dream fulfilled, you know, the meeting with sources in hotel rooms, giving an envelope full of cash. A lot of it really did feel like a movie. But it was also very disillusioning to

Starting point is 00:03:22 see beyond the veil and see how government operates compared to what I think the outside observer would think. It ain't like in Hollywood, right? The human intelligence aspect is a lot like Hollywood and it was a lot of fun. But the first time they told me to fax a subpoena to a telecom provider in Southern California, I genuinely thought they were just messing around. Yeah, you thought they'd asked you to go and buy some headlight fluid or a left-handed screwdriver. Yeah. Literally.

Starting point is 00:03:51 I'm like, yeah, yeah. New guy. I get it. How do I really do this? And they're like, what do you mean you fax it? I'm like, what do you mean? This is the goddamn FBI. I've never sent a fax in my life.

Starting point is 00:04:00 What the hell are we doing here? And it just blew my mind that, for as important as this process is, basically sending out data requests to all these different companies and getting the pieces of the puzzle together, it is the most archaic and antiquated and chaotic part of the modern investigation. But it's also the most important, because even if you just look at publicly available transparency reports, the volume of requests across all different industries, not just the Facebooks and Googles of the world, but across social media, messaging, telecom, traditional finance, fintech, crypto, gaming, basically every industry, like car industries,

Starting point is 00:04:47 cars are basically data companies now. They're mapping the world. And I don't think you'd ever expect, like, Ford Motor Company to be getting government data requests, but it affects everyone from a 7-Eleven to a Bank of America. And I just kind of thought something like Kodex already existed before I was in the FBI, and I was stunned to realize how chaotic it actually was. Well, I mean, you know, that's what I mean by it ain't Hollywood, because in Hollywood they say, I'll go put this in the system, and then bang. And then in reality you realize there is no system. You know, there's some Post-it notes and maybe a couple of spreadsheets that some people have uneven access to. But, you know, it ain't like it is portrayed.

Starting point is 00:05:31 And, you know, that said, though, these sorts of systems are only a problem when they become a problem. And they have become a problem. I mean, we've seen a lot of people fraudulently obtaining data via, what are they called, emergency data requests. Like, that's been a huge thing over the last couple of years, where some of these, you know, cybercrime rings and stuff are exploiting this to do various things, to get people's personal details of, you know, rivals or whatever, or just for swatting attacks and whatnot. So, like, this thing has really become a problem, hasn't it, over the last couple of years?

Starting point is 00:06:06 Yeah, absolutely. And that was actually one of the straws that broke the camel's back when I was in the FBI. We were reaching out to a company that was a new social media company at the time, 2017, 2018. And on our side, it was very frustrating because we weren't getting any response to get the digital evidence to lock these people up. And on their side, come to find out, they had never received a piece of legal process before, and they didn't believe it was actually the FBI. And I don't actually blame them, because no one builds a company with the expectation that one day the FBI or someone else is going to come knocking. But the unfortunate reality is that people are going to end up abusing your product in ways which you never wanted nor intended. And as a result of that, like, business email compromise and social engineering have been a fan favorite of the hacking community or fraudsters for decades. And with the advent of

Starting point is 00:07:01 law enforcement requests, especially emergency requests, that gave them a new avenue and a soft underbelly into a company. You can put up as much cybersecurity software and infrastructure as you want, but if you're leaving this gaping hole into your company and into your users' data, there's a big flaw in your security posture there. There are 135 countries that are using Kodex to interact with our customers to date, to defend against law enforcement impersonators who will oftentimes abuse law enforcement credentials that are traded like Skittles on the dark web and on Telegram. Yeah. I mean, it's funny, right? Because you just spoke about a social media company that got outreach from the FBI and they didn't believe it, which I suppose is par for the course when the person rings up and asks them for their fax number.

Starting point is 00:07:53 So, you know, like that, yeah, that makes sense. But, like, why don't we talk about actually how Kodex works, who the customers are and whatnot, because, you know, essentially you've built, like, an identity, you know, it's like an identity service really, so that you can be much more confident that you are talking to who you think you're talking to. So, you know, trying to get around that issue of, you know, just one stolen credential allowing people to impersonate a law enforcement agency. So, you know, essentially there's a verification component to this and there's sort of, like, a communications component to this. Why don't you actually just tell us in simple terms, like, what Kodex actually does, how it works? Yeah, absolutely. agents across over 12,000 agencies, as I mentioned, in more than 130 countries across the world,

Starting point is 00:08:47 where anytime someone needs to reach out to one of our companies, they now go through Kodex. Our 24/7 verification team vets all of these users before they're ever allowed into the system, so that once one of our customers is interacting with a government user, they know that it's legitimate, they know that it's been vetted, and they know that they're part of this herd immunity that is the Kodex network. and security posture is that the same person who is using a Polizia di Stato email address to fraudulently request information from Facebook is doing the same thing to Verizon. I don't know if you saw the news before Christmas about something like that. They're doing it to Google. They're doing it across companies and they don't have a way of protecting each other. Comparing notes. Like, I'm with you. Yeah. I get it. So say I'm your customer, right? I'm running an up-and-coming social network. I get some sort of data request from a

Starting point is 00:09:51 cop in England. I'm your customer. What do I do then? Do I reach out to Kodex and say, this person's trying to contact me for a data request, go and vet them, go and verify them? And how does that gel with something like an emergency data request, which, you know, has to be expedient? So I'll start with, like, the process of law enforcement encountering Kodex. So basically, pre-Kodex, a company would have an email address at best, like lawenforcement@company.com, where any government agent from anywhere in the world would just email that singular inbox that's shared amongst up to, again, probably like 50, 60-plus analysts on the company side in some cases. And you can imagine how complex and difficult that is, and it compounds with the variables, including, like, there are 18,000 law enforcement agencies just in the US, not to mention the other 192 countries in the world. And so the moment they adopt Kodex,

Starting point is 00:10:51 that email address turns into an auto-reply, basically saying, hey, we're no longer accepting service through this email. Please go to our new government request portal. They're redirected there and prompted to either sign in if they already have an account or sign up if they do not yet have an account. And one of the things that either investors or customers have always asked is, like, how did you get the FBI or the Met Police out in the UK to use Kodex? And, like, the answer to that was really the opportunity caused by the problem of my question back in Southern California. Why am I faxing this? Yeah.

Starting point is 00:11:28 Because that's how they had it set up. Well, I think it's also like you don't charge the law enforcement agencies for this, right? So like it's solving a problem that costs companies money, adds risk. And I'm sure, you know, the police that I know, like something like this comes along, they're like, sure, as long as it doesn't cost us anything, because we don't have budget for this. Exactly. And it's in their best interest to go the path of least resistance to where if a company has a formal method of accepting these, going outside of that method is only going to impact the timeline of their own investigation.

Starting point is 00:12:01 So hang on, speaking of, and sorry to cut you off, it's just we are kind of running a little bit out of time at this point. So, you know, just going back to that example of, like, an emergency data request coming in. Okay, so I'm a Kodex customer. Someone has contacted you with an emergency data request. You've got a 24 by 7 verification team. But, like, what's the process there

Starting point is 00:12:23 if you can't verify the person? Is it that someone at Kodex will reach out to someone else who is a trusted contact at that law enforcement agency? Or, you know, tell me how that works. Yeah. So that's part of the process. It's a multi-layered approach that has both human involvement and automation in order to accommodate for scale as well. And so someone who signs up through one customer's portal, like LinkedIn, for example, they're a customer. If someone signs up through LinkedIn's portal,

Starting point is 00:12:53 we verify and authenticate them. They're now not just verified for LinkedIn, but also for Binance and OpenSea and all the other customers that we have. So they share the network of verified government agents and agencies that allows us to do suspicious activity monitoring across the government agent user base. So I guess what you're saying is the idea that someone

Starting point is 00:13:13 who'd never encountered any of your customers in the past lodging an emergency data request, like that's going to be a corner case. I wouldn't say it's a corner case, actually. Like, you never know when and where in law enforcement someone's going to actually need to reach out to any of the thousands of companies, including any of the companies on Kodex. points of contact across the world, but it also includes under-the-hood information of: is this IP address known to be associated with this agency based on the other users in that agency? Is it a browser or device that we're expecting based off of the other users in that agency? Like, I always joke that no one's going to be using a MacBook on dark mode logging in through Tor if they're a legitimate government agent serving an emergency.
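
The checks described here (is the IP address familiar for this agency, is the device what other users at that agency use, is the login coming through Tor) can be illustrated with a minimal sketch. This is not Kodex's implementation; the names `AgencyProfile`, `SignupAttempt`, and `risk_signals`, the signal labels, and the idea of a per-agency baseline object are all assumptions made up for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class AgencyProfile:
    """Hypothetical baseline built from an agency's already-verified users."""
    known_networks: list   # list of ipaddress network objects seen for this agency
    known_platforms: set   # device/OS families seen for this agency


@dataclass
class SignupAttempt:
    """Hypothetical record of one signup or login attempt."""
    ip: str
    platform: str
    via_tor: bool = False


def risk_signals(attempt, profile):
    """Return the list of signals that make this attempt look unusual."""
    signals = []
    addr = ipaddress.ip_address(attempt.ip)
    # Signal 1: the source address is outside every network seen for this agency.
    if not any(addr in net for net in profile.known_networks):
        signals.append("unfamiliar_ip")
    # Signal 2: the device family has not been seen among this agency's users.
    if attempt.platform not in profile.known_platforms:
        signals.append("unfamiliar_device")
    # Signal 3: the connection arrives through an anonymizing network.
    if attempt.via_tor:
        signals.append("tor_exit")
    return signals


profile = AgencyProfile(
    known_networks=[ipaddress.ip_network("198.51.100.0/24")],
    known_platforms={"windows"},
)
suspicious = risk_signals(SignupAttempt("203.0.113.7", "macos", via_tor=True), profile)
routine = risk_signals(SignupAttempt("198.51.100.10", "windows"), profile)
```

In this sketch the signals are only collected, not scored; a real system would presumably weigh them alongside human review, as the multi-layered process described in the interview suggests.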

Starting point is 00:14:06 And so there is low hanging fruit like that that we accommodate for. But also, let's say it is an emergency, they pass these base layer checks, there's a part within the signup flow to identify like, hey, this is an emergency, lead me to the top of the queue. And so at that point, we allow them in again, under the controls of our suspicious activity monitoring and signals intelligence. And part of the process that has always been so burdensome, and I think it unfortunately leads to the taboo nature and the demonization of this process is that people like to say government data requests. Oh, it's big brother wanting to gobble up all your data.

Starting point is 00:14:46 Anyone who's ever gotten a 13,000-page, single-spaced PDF data return in a search warrant does not want all of that information. They want the most granular and narrowly tailored piece of information that's going to drive their investigation forward. Very interesting conversation. Matt Donahue, thank you so much for your time and all the best with it. It sounds like a really worthwhile endeavor, and I think we're in a better place if people use something like this. So cheers. Yeah, no, I appreciate it. Really appreciate the time. Thanks for having me. That was Matt Donahue from Kodex there,

Starting point is 00:15:19 and you can find them at kodexglobal.com. And that's Kodex with a K. So k-o-d-e-x global.com. Next up in Snake Oilers, we're going to hear from ClearVector. ClearVector makes a monitoring product that helps you see how your cloud environments are actually being used by your developers. And if a breach happens, they'll help you to understand the blast radius. There's even an isolate button, which response teams can smack real hard. So that's always satisfying. And I guess, you know, ClearVector's core philosophy really is that what matters is human identities and mapping those out from the nuts and bolts of your AWS, GCP, or GitHub tenancies, you know, with Lambdas and IAMs and all that complexity. That map is what

Starting point is 00:16:01 you really need to make sense of everything that's going on. So John Laliberte is an ex-FireEye Mandiant guy and the founder of ClearVector. And he joined me for this interview and started off by explaining what ClearVector actually is. ClearVector hooks up to your cloud environment and looks at all the runtime activity in your cloud. So think about that in terms of the control plane activity, the activity inside of the workload. So think, like, EC2, Lambda, things like that. We also hook up to SaaS applications like GitHub and connect all of that together and build models of the identities. And then we surface all of the risky activity to you in the form of notifications in either Slack or Teams, and then give you the ability to stop

Starting point is 00:16:43 the activity by identity by giving you a big red isolate button in the product that will prevent that identity or entity from doing anything else in your cloud environment. So when we're talking about identities, are we just talking about, like, user accounts and developer accounts? Or are we talking about sort of, like, oh God, what would you even call them? Like sub-identities, like, you know, access tokens that have been generated by a, you know, developer identity and are then being used to do XYZ? Like, I mean, I guess that's probably what you're trying to address, right? It's this sprawling diagram that stems from, you know, the real identities. Exactly. We take a look at all that, right? So one of the big questions that people have: they may be notified by someone or they may find out, hey, I have this access key.

Starting point is 00:17:28 Well, they can't answer the question, what happened with that access key? Whose access key is it, right? That can be a hard question. That's a basic question. Yeah. And so as an example, what we can tell you is the provenance of that, right? We can tell you this is the identity that created that key, and then this is the identity that actually used the key, right? And for something like a CI/CD role, which, you know, you briefly mentioned in terms of kind of machine accounts or SaaS accounts, we can actually tell you the developer that approved a particular PR that caused the activity by this role in your cloud account, which helps out in a lot of different ways, not just from a traditional availability,

Starting point is 00:18:02 like, hey, who broke production, but also from a security perspective to be able to say like, well, yes, we know where this change originated from and be able to tie that back to the identity itself. Yeah, right. So what sort of things are you most likely to, well, I mean, you know, this thing's in prod, people are using it, right? So what sort of things is it most often surfacing in terms of popping up that big red button, which says isolate? So today it kind of falls into two camps. The first camp is basically third party risk. So think all those vendors that you have plugged into your cloud environment that are doing

Starting point is 00:18:38 all kinds of things. Pretty much every single time we plug into a new environment, kind of all the alarms go off, the red button pops up. And it's like, hey, did you know that this, you know, identity is sharing this with some other AWS account, or, you know, this thing is sending your data over here? And invariably, what that does is, you know, a person looks at it and they're like, well, that wasn't really part of how they thought about that vendor and what they were actually doing. And so that's kind of the first case. The second case is when there actually is an adversary in the environment doing something that's unexpected or is actually risky.

Starting point is 00:19:15 And so in that case, the isolate button does help, because in most cloud accounts it's actually very difficult. Once you get into 10, 20, 40, 50 different accounts, it gets very difficult to track who did what in each account. And when you think about isolation from that perspective, it's not as simple as just saying disable the account, because there are active sessions open for that particular identity with all these different accounts. And so that's some of the hard things that we do under the hood that kind of backs that isolate button. And obviously all of that's available via API, so you can integrate that with, you know, any kind of product that can do that. Yeah. So I find this attribution component, like, that seems to be the special thing here, right? Which is to be able to tie an action back to, you know, the origin, right? In terms of an

Starting point is 00:20:02 account or a user. But what about, let's talk a little bit about the detections, which are firing on funny things. You know, what are the dead giveaway kind of things that'll happen in a cloud environment that'll result in you actually, you know, going through that process of tracking something back to the origin? What are the, you know, what are the things you're firing on, I guess is what I'm asking. So first we look at all the activity and we map it back to identities all the time. So it's not like we say, oh, this particular identity, we're going to track all of it. Yeah, and then press a button and then it goes and finds it. It's all there.

Starting point is 00:20:36 It's all there all the time. It's all there all the time. And so we basically shred all of that activity, whether it's from things that the cloud providers provide or it's our own instrumentation, like our eBPF Rust-based agent. We take all of that signal in and then we basically shred it. We turn it into natural language. We put it into a custom graph that we have patented. And then we look over periods of time. So after we go through that kind of identity attribution phase, we then have what we call

Starting point is 00:21:02 a risk engine. And there's lots of different subsystems of the risk engine. Yeah. So this is the bit that I'm asking about, right? Because I'm real curious to know, like once you've got all of this and yeah, if you're throwing it into some sort of, you know, graph-based thing, yeah, you're going to see, you're going to start to see some interesting stuff and get a sense of like when something's gone sideways, you know, you're going to start to detect that. But yeah, I guess what I'm asking is like, what does it look like when it goes sideways and how do you detect it? So all that activity after it goes into the graph, it goes into this risk engine and think of the risk engine almost as like this team of cloud security experts that's looking at all this

Starting point is 00:21:36 activity. So it's like the security team you wish you could afford and build. And it's kind of this, you know, ever-present looking across everything, right? And it looks over periods of time. So it's like, as new activity is added to this model for particular identities, it can basically look and say, like, okay, what does this risk level look like over time, and then surface that notification to you. And it's not like this rule-matching thing, right? It's a pretty robust algorithm behind the scenes that's constantly being improved. Think of it like a recommendation type of an algorithm that automatically kind of surfaces that to you. And what it looks like is a natural language narrative. So it's kind of how you would describe to, you know, whether it's your boss or somebody

Starting point is 00:22:19 who's not a security expert what happened. It's not the technical bits and bytes. It's not even necessarily the IP addresses or anything like that. It says, hey, this identity did these things, and we think it's risky for these reasons, and you should click isolate on this and stop it. I guess I'm still going back to that original question though. You say this identity did these risky things. What are the risky things? What is typically being surfaced by ClearVector in these environments? So a few things. First thing is when you take data out of the environment or you share it outside the environment. So that's a typical indicator of either an attacker from a data theft perspective,

Starting point is 00:23:00 or it's a kind of indicator of a developer, you know, unintentionally sharing data outside the environment. The other thing is thinking about new vendors that are introduced to your environment. So meaning new identities introduced to your environment without the kind of thought process that goes into, well, should I allow this, you know, new identity in the environment? The other thing is kind of things that are dormant, right? Like say access keys that haven't been rotated or used in a long time, and then suddenly kind of, you know, burst on the scene. Right. So that's a good one. That's the first one where I'm like,

Starting point is 00:23:33 wow, okay. That's, that's, you know, a dead giveaway, right? Like some old token has been sitting around for like 12 months untouched. And then all of a sudden it's, yeah, extremely active. I mean, that's going to jump out, right? Right. And so there's a lot of these kind of really hard situations to figure out of, you know, let's take that access token as an example. Okay, great. How did that get there, right? Who created it? Like, is this authorized?
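
The dormant-key example is simple enough to sketch. The following is a plain threshold rule, not ClearVector's method (the interview says their risk engine is not rule matching); the function name `dormant_keys_reactivated`, the 90-day default, and the key IDs are assumptions chosen for illustration. It assumes you already have usage events as `(key_id, timestamp)` pairs, for example extracted from cloud audit logs.

```python
from datetime import datetime, timedelta


def dormant_keys_reactivated(events, dormancy=timedelta(days=90)):
    """Flag keys used again after a quiet period of at least `dormancy`.

    `events` is an iterable of (key_id, timestamp) usage records in any order.
    Returns a list of (key_id, gap) tuples, one per reactivation.
    """
    last_seen = {}
    alerts = []
    # Process events chronologically so each gap is measured between
    # consecutive uses of the same key.
    for key_id, ts in sorted(events, key=lambda e: e[1]):
        prev = last_seen.get(key_id)
        if prev is not None and ts - prev >= dormancy:
            alerts.append((key_id, ts - prev))
        last_seen[key_id] = ts
    return alerts


events = [
    ("KEY_OLD", datetime(2023, 1, 1)),   # last routine use
    ("KEY_CI", datetime(2024, 1, 1)),
    ("KEY_CI", datetime(2024, 1, 2)),    # used daily, never dormant
    ("KEY_OLD", datetime(2024, 1, 5)),   # suddenly active after about a year
]
alerts = dormant_keys_reactivated(events)
```

Here only `KEY_OLD` is flagged, with a gap of 369 days. A key that has never been used before is not flagged by this rule; catching that case would need the key's creation time as well.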

Starting point is 00:23:55 Is this unauthorized? And so being able to say, yes, this was the identity that created it, it gives you the ability, you know, within minutes to go talk to the actual human that created that access key, instead of what normally is, like, you know, jump into a SIEM or something else, right? Do queries for a few hours, try to, like, you know, get through all the hoops and do all of that kind of on demand. So that's kind of the pain point that we saw: it turns, like, an hours- or days-long process into, you know, hey, just 30 seconds or two minutes, because we've been able to do that ahead of time. But even just knowing when something dormant gets active again, right? Like actually

Starting point is 00:24:31 having something that can tell you when that happens, that's going to be useful. Just that on its own is incredibly useful. Yeah. And the other thing is lateral movement is one of the big things that we're able to do given all the technology that you have, right? So think about there's a cloud control plane, and then there's the actual kind of inside the workloads, which is where a lot of, you know, the traditional things that we already know in security live. And so, you know, my belief is that there's always attackers

Starting point is 00:24:55 that live in the cracks and gaps, you know? So that's kind of that new area for lateral movement where historically it was more on the Active Directory side with lateral movement. Now in the cloud, you can laterally move between the control plane and the workloads and the SaaS apps and kind of like all of these areas need to be combined in order to really get a sense of who is doing what where before you can even have a conversation about kind of what to detect or is it bad or is it risky?

Starting point is 00:25:23 And so that's kind of one of the core things that we can also do with our graph, because we're connecting, you know, what happened inside that EC2 with what happened at the control plane layer, and then we're able to connect that back, you know, to, say, your SSO user, you know, from something like Okta in your IdP. Well, it's interesting that you mentioned that, right? Because you did briefly say something about an agent, and I'm guessing that's what you're using to look inside those EC2 instances. into the environment where they want a full audit log along with the identity context, right? Of who's logging into these boxes, what are they running on the box? And also workloads like Lambda and other things, right? Where it's very difficult to get visibility into these serverless

Starting point is 00:26:14 workloads because by definition, right, there is no server, which is how we normally deploy a lot of our security tooling and security software. And so that's a lot of original R&D and time that's gone into detecting kind of all of those different conditions around lateral movement and the traditional auditing of these cloud workloads. So who's using this? Do you have a typical customer yet or is it still too early? We definitely have a typical customer. So it basically looks like a fairly good-sized company that makes a lot of their money by shipping their SaaS application. And they basically got to the point where they have a lot of data sitting in their SaaS application and they probably have a CISO,

Starting point is 00:26:58 maybe one or two security people. And security kind of is still a shared responsibility between the ops team and the security team. So it's big Amazon shops with a dev team and a couple of security people, basically. Right. And that's kind of that use case of they don't have time to build out a whole security operation center or hire a whole bunch of people. They don't have time to look through log files. They barely have time to get the software shipped over the line and working functional. And security is definitely important, but it's kind of in service to the core mission. The other type of customer that we currently have is really where the security hat is worn by the engineering and ops teams.

Starting point is 00:27:38 Meaning you'd have a security champion on the team where if somebody said, hey, who's responsible for the security of your production environment? It would be somebody on the ops team or somebody on the development team. And it's that person's 10th job, right? It's the end of the day. So they're looking at, hey, I don't want to look at another vulnerability management tool that's going to tell me the 10,000 or 20,000 things I have to go fix. Just tell me about the things that I need to know about right now. And that's exactly what we do, right? Which is this happened in your environment. It's not risk that may or may not at some later date be realized. It's like, no, this happened, right? And this is important for you to go fix. All right. Well, John Laliberte, thank you so much

Starting point is 00:28:23 for joining us on Snake Oilers to walk through ClearVector. Very interesting stuff. Thanks a lot. That was John Laliberte there from ClearVector. Big thanks to him for that scans the entire internet and maps it all out. And once they've got all that data in one place, it's useful for a bunch of things, right? So if you're doing threat hunting and CTI work, you can get yourself some amazing visibility into attacker infrastructure, you know, look at C2, Cobalt Strike beacons, that sort of thing. But you can also use Censys to do asset discovery and you can find all sorts of assets you don't even know you have. The bigger a company gets,

Starting point is 00:29:08 the harder it gets to do asset discovery via scanning where you're expecting to find stuff. I mean, sure, you can give scanners access to your AWS tenant and get it to bring back a lot of great information, but what about that company you acquired three months ago? This is an acquisition nobody told the security team about. You know, what about all the cloud computing environments

Starting point is 00:29:28 that you don't know you have, right? So you can't really ask a scanner to go look there. So that's when these sort of, you know, whole-of-internet discovery tools like Censys come in handy. You know, they're constantly scanning the internet and attributing assets to organizations, you know, assigning them, this asset is owned by this org, through a variety of means. And as you'll hear, Censys has even had to argue with some of its customers who have insisted, these assets attributed to us, they don't belong to us, and they are eventually proven wrong because it's that case of, like, an acquisition or something weird happening.

Starting point is 00:30:03 Anyway, here's Census' founder, Zakir Durumeric, talking all about Census. So Census is really all about mapping out what the internet looks like. What is every IP address, every host, every website, every autonomous system. And this data gets used for all sorts of purposes. I suspect most folks listening to this show know us through our community search engine, but it's also used in a lot of enterprise environments to manage attack surfaces, to understand companies during the acquisition process, also to understand the infrastructure that threat actors have stood up, things like Cobalt Strike or C2 servers. And when those show up in the logs, our data can also help pinpoint what those are. When we started this, we actually started as an academic research project, and we were really committed to the

Starting point is 00:30:49 quality of the data and really the consistency of that data. And that was, I think we've sort of continued to really focus on that, focusing on making sure we find things quickly, making sure we understand how to fingerprint different enterprise applications, different types of adversary infrastructure. But I think you're right. The idea is very similar. We're trying to go out and map things, but we end up looking at a fairly different set of the internet during that process. Yeah, it's less about finding just random open RDP and more about mapping C2 and, you know, Cobalt Strike beacons and all of that sort of stuff. Tell us about the C2 stuff, because I understand like, you know, if you're a threat intel company, like you're pretty much guaranteed to be a Censys customer at this point.

Starting point is 00:31:32 Sure. And I think you're right there. A lot of this data is used by many of sort of the CTI companies, but it's also used by threat teams within different enterprises as well. When they're trying to make sense of logs in retrospect and understand why was there a flow to this particular asset? Was it something that we owned? Was it something that belonged to one of our subsidiaries? Or was it something that was controlled by an adversary? They want to be able to go look up at a very specific point in time, what was this? What did it look like beforehand? What did it look like afterwards? Does it have evidence of running a piece of C2 software? Does it have something like Cobalt Strike on it? Are there

Starting point is 00:32:10 open directories that actually have data that was leaked out of one of these companies? But we'll have a look at this open directory and see if our stuff is in it. But we see that stuff sometimes. We'll see actually the tools that they're using. We'll actually see data that's been pulled out of companies as part of an extortion scheme, as part of the new types of ransomware we're seeing recently. We'll oftentimes see that data actually on some of those servers. So what does your typical customer look like? I mean, we know that the CTI firms like to use Censys for various things, but what does

Starting point is 00:32:42 a typical enterprise using Censys look like, and what are they using it for? Yeah, I think the largest use case we have are mid to upper sized enterprises who are really so complex that they are trying to make sense of who they are. What do they have exposed out there on the internet? And this might be because they are spread across dozens of different cloud accounts or different cloud providers. It might be that they are sort of doing acquisitions quickly enough that they are trying to bring in infrastructure that they themselves didn't set up. And they have questions when something like one of the sort of new vulnerabilities drops that's being used by a ransomware actor of, are we using Ivanti? Where is it? Are any of our subsidiaries using it? Are any of our offices using it? But a lot of these folks are trying to make sense of their environments. And for a lot of these big companies,

Starting point is 00:33:35 their environments are bigger than some countries these days. They have hundreds of thousands of assets. They're spread across dozens of nations, dozens of cloud providers. And in some cases, these will be in existing tools. But in a lot of cases, it's not possible to connect to every cloud environment out there and to monitor what's in it. And you want to have something that gives you sort of this cohesive visibility into everything that you own. Now, that's an interesting problem, right? Because as I mentioned,

Starting point is 00:34:05 I've always been interested in discovery tools. And one of the trickiest things that you have to do as an asset discovery play is to do, especially when you're scanning internet wide, like you are, is how to attribute assets to organizations, right? Like it's really hard, especially now that everything's in the cloud. You know, you find this cloud thing, how do you then know it belongs to

Starting point is 00:34:29 Acme Corporation, right? So, I mean, it's an, as I say, it's an interesting field. Why don't you explain to the listeners, like how you actually go about doing some of this attribution? Yeah, this is a fantastic question. And as you said, it's really hard. It's actually much easier to collect all the data than to figure out who owns all of these different assets. How do you do this in close to real time? How do you handle things like auto-scaling containers in the cloud? We use really a ton of different data sources to do this. A lot of it is that we will look directly in our global scans. We'll look at everything from favicons to copyright statements to shared certificates, shared keys, shared configurations of how you... All this sort of stuff, right?

Starting point is 00:35:11 Yes. Yes. And I'll say that's part of it too, is the WHOIS. But then it's the question of how many hops out do you go? Someone with this domain registered another domain. And that domain has websites on it and they tie to something else. We look at active DNS data and passive DNS data. We're starting to crawl websites more. We actually connect to every name in certificate transparency and other DNS data sets. So we're not just scanning IP addresses. We're scanning every name out there and pulling that data in and looking for those connections. We actually pull in data from data sources like D&B and Crunchbase. So these business databases to figure out what subsidiaries are of a company.
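
The "how many hops out do you go" question can be illustrated with a minimal Python sketch. It assumes attribution is modelled as a breadth-first walk over assets that share an observed attribute (a registrant, a favicon, a certificate), cut off at a fixed hop count. The observation data, the attribute labels and the `attribute_assets` function are all hypothetical, made up for illustration; this is not a description of how Censys actually implements attribution.

```python
from collections import deque

# Hypothetical observations: each asset maps to the attribute values seen on it.
OBSERVATIONS = {
    "acme.example": {"registrant:acme-corp", "favicon:ab12"},
    "shop.acme-pay.example": {"favicon:ab12", "cert:77fe"},
    "203.0.113.10": {"cert:77fe"},
    "198.51.100.5": {"cert:0c44"},
    "unrelated.example": {"favicon:ffff"},
}


def attribute_assets(seeds, observations, max_hops):
    """Return {asset: hops} for assets linked to the seeds within max_hops."""
    # Invert the observations into attribute -> assets that share it.
    by_attr = {}
    for asset, attrs in observations.items():
        for attr in attrs:
            by_attr.setdefault(attr, set()).add(asset)

    hops = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        asset = queue.popleft()
        if hops[asset] >= max_hops:
            continue  # do not expand beyond the hop budget
        for attr in observations.get(asset, ()):
            for neighbor in by_attr[attr]:
                if neighbor not in hops:
                    hops[neighbor] = hops[asset] + 1
                    queue.append(neighbor)
    return hops


one_hop = attribute_assets(["acme.example"], OBSERVATIONS, max_hops=1)
two_hops = attribute_assets(["acme.example"], OBSERVATIONS, max_hops=2)
print(sorted(one_hop))
print(sorted(two_hops))
```

With one hop only the domain sharing the favicon is pulled in; with two hops the IP address sharing that domain's certificate is attributed as well. A larger hop budget finds more assets but also raises the chance of attributing something on the strength of a weak link, which is the trade-off the question points at.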

Starting point is 00:35:53 I mean, some of these companies, they might even not necessarily know what they own. I was going to say, it's a little bit upsetting, but on a somewhat regular basis, we have folks that will say something like, that's not ours. And we'll say, actually, you acquired them last year. Someone forgot to tell the security team. Exactly. Exactly. And that happens. It's wild, but it's true.

Starting point is 00:36:16 It happens a lot. It happens way more than you'd sort of expect or hope. I mean, this is the advantage of using one of these sort of internet-wide discovery tools, right? It can answer those sort of questions. Yeah, I think that's a lot of it. I think one of the things that folks really like is that what's in front of them is real. It's exposed. We can tell you that because we can see it. So there are these questions of maybe this host is exposed or maybe it has these problems. What we can say is, no, it actually is, this is the exact time it showed up. And this is how it links back to your organization.

Starting point is 00:36:50 And these are the problems with it. People oftentimes also will combine that data with other tools. They'll integrate us in with maybe their VM tool, or with their cloud. So it pops up as like belonging to them, and then you can kick it off to Nessus for a scan or whatever. Exactly. And you can ask questions like, what do I have exposed that's not in Nessus, right? Or of the things that are coming through in Nessus, which of these are actually exposed on the public internet? And those questions are actually really important. We actually have to decide what are we going to fix? I think that we operate in a world today where we don't have time to fix every single security problem to

Starting point is 00:37:23 handle every single thing that comes through our system. Well, this was going to be my next question. It's one thing to be able to kick it out to your web app security scanner or Nessus or whatever and task that to go have a look at it. But I'm guessing there are certain issues that when you have done a scan, you've attributed an asset to an organization that is a customer of yours, and that asset happens to be particularly risky, I'm guessing you flag that somehow, right? Because that's what most asset discovery people do. That's absolutely right. And I think at the end of the day, for a lot of folks, asset management is not where they're necessarily spending time. I think we wish that everyone was spending more time on asset management, but I think a lot of it comes down to how do we actually

Starting point is 00:38:03 prioritize the problems or the risks that we have. And so identifying those risks is actually a huge part of our product is saying, what is it that you actually need to fix? Which of these vulnerabilities have exploits? Which of those exploits are actually being used by different actors? But we have folks who will come in and say, okay, for this vulnerability that just appeared, what's affected? But we also see it on us to go out and be keeping track of what's being exploited, to understand that from other data sources and saying, here's what you really do need to go patch. Of the stuff that you're identifying at the moment, like what's the most commonly uncovered like horror show, risky, like smack the big red button issue that enterprises, that your customers are dealing with at the moment? Because I'd imagine there would be a handful of like things that are cropping up repetitively and consistently. I would say a big one for our enterprise customers is actually the enterprise back office applications themselves. Like payroll systems and things like that? Yes.
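
The two comparisons raised in this exchange (what is exposed that the vulnerability scanner has never seen, and which findings sit on exposed assets with exploits in active use) can be sketched with plain set operations in Python. Everything here is an assumption for illustration: the asset lists, the placeholder CVE identifiers and the function names are hypothetical, and no real Censys or Nessus API is used.

```python
# Hypothetical inventory: internet-facing assets attributed to the organization.
exposed = {"203.0.113.10", "203.0.113.11", "198.51.100.5"}
# Hypothetical inventory: assets the vulnerability scanner already covers.
scanned = {"203.0.113.10", "10.0.0.7", "10.0.0.8"}


def coverage_gaps(exposed_assets, scanned_assets):
    """Split assets into exposed-but-unscanned and scanned-and-exposed."""
    return {
        "exposed_not_scanned": exposed_assets - scanned_assets,
        "scanned_and_exposed": exposed_assets & scanned_assets,
    }


def prioritize(findings, exposed_assets, exploited_cves):
    """Order findings: exposed assets first, then actively exploited CVEs."""
    def sort_key(finding):
        return (
            finding["asset"] not in exposed_assets,  # False sorts first
            finding["cve"] not in exploited_cves,
            finding["asset"],
            finding["cve"],
        )
    return sorted(findings, key=sort_key)


# Placeholder identifiers, not real CVEs.
findings = [
    {"asset": "10.0.0.7", "cve": "CVE-0000-0001"},
    {"asset": "203.0.113.10", "cve": "CVE-0000-0002"},
    {"asset": "203.0.113.10", "cve": "CVE-0000-0001"},
]
exploited = {"CVE-0000-0001"}

gaps = coverage_gaps(exposed, scanned)
ranked = prioritize(findings, exposed, exploited)
print(sorted(gaps["exposed_not_scanned"]))
print(ranked[0])
```

In this toy data two exposed addresses are missing from the scanner, and the finding that is both internet-facing and tied to an exploited identifier sorts to the top. Putting exposure ahead of exploitation in the sort key is a design choice made for the sketch; a real program might weight them differently.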

Starting point is 00:39:11 Yeah, yeah, yeah. And we look at sort of everything from industrial control systems and water systems and these pieces, which I think we sort of talked about. But what gets people are sort of the boring, right? It is the thing that is not behind SSO that has some web login. And the amount of these applications that are out there are scary. And I think for us is the percentage of these that really don't go patch, right? We can track things like the patch curve for every major vulnerability, for every major application out there.

Starting point is 00:39:45 And the amount of this that doesn't get patched and actually the amount of it that actually gets taken offline when a vulnerability comes out is fascinating. A lot of times we don't see things patched, we just see them pulled offline, which I think is sort of a signal that these were applications

Starting point is 00:40:00 that weren't really being used in the first place. Externally anyway, yeah, yeah. They were just sort of there until someone realized. And I'm guessing some of them, too, they're going to be like self-hosted, but in cloud environments. So you might have some payroll system running in EC2 on AWS and whatever, and you've done that attribution,

Starting point is 00:40:17 and they go, oh, you know? Yeah, no, I think that's right. These are systems that sort of end up all over. I think that cloud is probably where folks see the most surprises. Just because on-prem you sort of have to get an exception made to actually put something out there. But it isn't

Starting point is 00:40:33 necessarily the sort of marketing site that someone just forgot about, which I think is how a lot of people think about ASM as sort of just what fell through the cracks or just what is the shadow IT. No, it's stuff that's being used. It is stuff that's being used. Yeah, the things that scare people

Starting point is 00:40:48 aren't necessarily the shadow IT. It's actually the legitimate app that is out there that hasn't been patched or doesn't have, it doesn't even have a patch to apply to it yet as we've seen in a lot of these attacks the last year or two. Cough, Ivanti, cough. Yeah.

Starting point is 00:41:04 All right, well, look, let's wrap it up there. Zakir Durumeric, thank you so much for joining us to talk to us all about Censys. Very interesting stuff. Thank you. That was Zakir Durumeric there from Censys, and you can find them at censys.io. That is

Starting point is 00:41:19 c-e-n-s-y-s.io. They've even got like a large language model based query builder, which is like fun to play around with. So you can go find that and, you know, turn natural language into Censys queries. Fun times anyway.

Starting point is 00:41:33 But that is it for this edition of Snake Oilers. I do hope you enjoyed it. I'll be back soon with more Risky Biz for you all. But until then, I've been Patrick Gray. Thanks for listening.
