Risky Business - Risky Biz Soap Box: Mike Wiacek on lazy mode threat hunting
Episode Date: July 17, 2024
This Soap Box edition of the show is with Mike Wiacek, the CEO and Founder of Stairwell. Stairwell is a platform that creates something similar to an NDR, but for file analysis instead of network traffic. The idea is you get a copy of every unique file in your environment to the Stairwell platform, via a file forwarding agent. You get an inventory that lists where these files exist in your environment, at what times, and from there you can start doing analysis. If you find a dodgy file you can do all the usual malware analysis type stuff, but you can also do things like immediately find out where else that file is in your organisation, or even where else it was. From there you can identify other files that are similar – variants of those files – and search for those. And you can unpack all this very, very quickly. This is the type of tool that EDR companies use internally to do threat hunting, but it’s just for you and your org – you can drive it. And as you’ll hear, the idea of a transparent, customisable and programmable security stack is something that’s on-trend at the moment. Mike lays out the case that doing this sort of file analysis in your organisation makes a whole lot of sense.
Transcript
Hey everyone and welcome to this special Soapbox edition of the show. My name's Patrick Gray.
And for those of you who don't know, these Soapbox editions of the podcast are wholly
sponsored and that means everyone you hear in one of these Soapbox shows paid to be here.
Today's Soapbox is with Mike Wiacek, the Chief Executive and Founder of Stairwell.
Mike is an ex-Googler who was also the founder of Chronicle, the company that was spun out of Google and then later reacquired by Google because they're galaxy brain strategic geniuses, I guess.
But yeah, after the dust settled on all of that, he spun up Stairwell. And Stairwell is a really interesting platform that
basically seeks to create something similar to a network detection and response platform,
but for file analysis instead of network traffic. The idea is you get a copy of every unique file
in your environment, you know, historically, and as they arrive, you get all of those files,
you stick them in the Stairwell platform, and you can do that via what is essentially a file forwarding agent. And then you
set up an inventory also that lists where these files exist in your environment, at what times,
and so on and so forth. And from there, once you've brought all of this stuff together, you can start
doing some analysis. So if you find a dodgy file, you can do all of the usual
malware analysis type stuff, but you can also do things like immediately find out where else that
file is in your organization or even where else it was. And from there, you can identify other
files that are similar, variants of those files. So you can identify the whole family and then
search from those and outwards and onwards and outwards and onwards.
And you can unpack all of this very, very quickly.
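As a rough illustration of the inventory idea described here (every unique file stored once, plus a record of where and when it was seen), a minimal Python sketch might look like the following. The class name `FileInventory`, the `Sighting` record, and the device names are all hypothetical; nothing here reflects how Stairwell actually implements it.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    device: str      # hypothetical device identifier
    path: str        # where the file was observed on that device
    first_seen: int  # observation time, seconds since some epoch


class FileInventory:
    """Toy inventory: one stored copy per unique file, many sightings."""

    def __init__(self):
        self._blobs = {}                     # sha256 -> file bytes
        self._sightings = defaultdict(list)  # sha256 -> list of Sighting

    def ingest(self, device, path, content, seen_at):
        digest = hashlib.sha256(content).hexdigest()
        self._blobs.setdefault(digest, content)  # keep only the first copy
        self._sightings[digest].append(Sighting(device, path, seen_at))
        return digest

    def where_seen(self, digest):
        """Every place this exact file is (or was), oldest sighting first."""
        return sorted(self._sightings.get(digest, []), key=lambda s: s.first_seen)


inv = FileInventory()
h1 = inv.ingest("laptop-01", "C:/tmp/a.exe", b"dodgy bytes", 100)
h2 = inv.ingest("server-07", "/opt/b.bin", b"dodgy bytes", 50)
devices = [s.device for s in inv.where_seen(h1)]
print(devices)
```

Because identical content hashes to the same digest, the second ingest adds a sighting but not a second stored copy, which is what makes "where else is this file, or where else was it" a simple lookup.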
So this is the type of tool that EDR companies use internally on their back ends to do threat
hunting.
But Stairwell is for you and your organization.
You can drive it.
And as you'll hear, the idea of a transparent, customizable, and programmable security stack
is something that's kind of on trend at the moment.
And I actually think that's quite positive.
So here's Mike starting out by laying out the case that doing this sort of file analysis
in your organization makes a whole lot of sense.
Enjoy.
I'm a guy who built systems to play with YARA at scale, studied data in VirusTotal over and over
and over again, always constantly evolving. And I realized that from a data perspective,
I knew more about the files that exist in a feed like VirusTotal than I did about the files on the
computer two cubicles down from me. And that's very strange, right? Like I had a better ability to apply YARA
over billions of files in this data set,
but not on the thousands of files
on the computer two computers over.
And how do I flip that?
Because I actually had a really cool sense
of information through this window,
but I couldn't look out that window.
And as we started trying to figure out what Stairwell would do, we decided to say,
let's stop. Let's do the opposite of what we were doing with Chronicle. Instead of collecting logs,
I'm pretty certain when I left, they were in the exabytes of logs. And if they were to tell me
today they're in the tens of exabytes of logs stored within Chronicle's platform at Google, I would not be surprised.
And so when you start thinking about that much data, you start thinking about that volume,
what is the actual signal in that cost? I did some math the other day, and this is absolutely crazy. If you were storing
logs on four terabyte hard drives, and I'm talking three and a half inch SATA hard drives,
to get to one exabyte, and if you stacked
these hard drives on top of each other, like flat, flat, flat, going up, what would be a good
representation of how big that tower would be? Right? And it's actually
250,000 hard drives, and if you were to stack them up on top of each other, it ends up being
3.157 miles tall. That I was trying to come...
That is tall. I mean, that's, you know, it's not as cool
as stacking them up and they go to the moon, which is the usual metric when we're stacking things, Mike.
But, you know, that's still a lot of hard drives.
That's a lot of hard drives.
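The arithmetic behind the hard drive tower can be checked in a few lines of Python. Two assumptions are not stated in the episode: decimal units (1 EB = 10^18 bytes, 4 TB = 4 x 10^12 bytes) and a drive thickness of 0.8 inches, which is simply the value that reproduces the quoted 3.157 miles. Many 3.5-inch drives are closer to an inch thick, which would make the tower taller.

```python
EXABYTE_BYTES = 10**18       # assumption: decimal exabyte
DRIVE_BYTES = 4 * 10**12     # assumption: decimal 4 TB drive
DRIVE_HEIGHT_INCHES = 0.8    # assumption: inferred from the quoted result
INCHES_PER_MILE = 63_360     # 5,280 feet x 12 inches

drives = EXABYTE_BYTES // DRIVE_BYTES
height_miles = drives * DRIVE_HEIGHT_INCHES / INCHES_PER_MILE

print(drives)                  # number of drives needed for one exabyte
print(round(height_miles, 3))  # height of the stack in miles
```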
I mean, I guess that's an interesting point, right? Collecting all of the files in your environment on every single endpoint is actually less data intensive than
handling logs, which is wild when you think about it. It's so counterintuitive. People are like,
oh, but you're collecting all of the files. It's terrible. Isn't that a lot of data? And it's like,
it's not small amounts of data by any means. Yeah. But compared to what you're putting in your logs,
it's actually incredibly smaller.
And if you want proof of that, just go check your latest Splunk invoice,
because they're going to tell you exactly how many bytes.
Yeah. I like to say, it's like logs are not worthless, but they're worth a lot less than
we think they are, is basically one of the insights that we had. If I start collecting
the files and I start storing those files, that opens up so many more opportunities because now
what I have is in some sense, ground truth data, right? Like logs give me metadata about what
happened around some sort of time. But usually when you start getting down to files or, you know, thinking about like,
you know, the actual contents of a Python script that's running or PowerShell script that's
actually running, those are the actual commands that are running. That is the ground truth.
The logs are telling you about what happened when that thing ran. I'd rather have
the thing that actually ran so I can go over and look at it and be able to reconstruct it and do analysis over it after the fact.
Sure, sure. But I mean, you're not going to find everything
with files, right? Because obviously there's various types of attacks where, you know,
you're not really doing anything with files, but yeah, for the vast majority of stuff,
a hundred percent, absolutely. But you know, what occurs to me too about Stairwell is that what you've done is create something similar to the types of tools that are actually used by the EDR companies on their end to do analysis across all of the files that they encounter.
But instead, you're sort of moving that, the possibility of doing that sort of work into people's enterprises, which makes me think, I mean, there's a whole bunch of startups at the moment who are doing this sort of thing, right?
So you've got ones like Sublime Security and Sublime are doing email security, but now you
can actually throw your own rules into it, right? So it's not like one of these black box, you know,
cloud-based email security firms, you know, you actually then can come on-prem, do threat hunting on your own data, you know, spin up your own rules,
things like that.
I mean, it's, it seems like, you know, there's another vendor too, that's kind of like a
little bit stealth, so I won't mention them, but it does seem like there's this trend now
of, you know, getting more programmable, transparent and configurable security tools into your own
premises that allow for greater flexibility, right? So that's a big part of, I'm guessing,
what Stairwell's bringing for a lot of your customers.
I think that's totally true. I think
we're entering a new stage when I think about security capabilities. Before it was like,
give me some black box, which I install or I send data to, and then it tells me what's bad. And I just
act on that. And a personal soapbox of mine for a long time has been like, too many security
professionals are just consumers of alerts from security products. We should be practitioners. You would not go to a lawyer
who simply typed everything you said into ChatGPT, asked for legal advice,
and then just regurgitated back to you what the box told him to say. You actually want
someone to actually think about this. But in security, we've almost pushed the other direction.
It's like, just do what the thing says.
Do what my EDR says.
Do what my firewall's alerts are telling you to do.
And I've actually had a customer say this to me the other day,
which I thought was great.
He said, Stairwell is the one platform I don't want to SOAR to death.
And I took that as a badge of honor. But he's like, no, I don't want to just
plug you in with an API and call it done. He goes, I get a lot of value by going in and looking around.
And we can talk about some of the cool things we've been building in a bit, but
that's one of those cases where you show them that
there's a workbench where they can understand how this particular thing fits into everything else
that's happening in their environment, how things may be moving around. Give them that visibility.
Most of the time, that's it. They have one alert and that's it. But in cases where there's not,
or in cases where there's co-occurring things or anything happening on other devices, you may know about
this piece of the puzzle, but you don't know about that piece of the puzzle. And being able to
construct that together across time where I'm not wholly reliant on signatures, heuristics,
an AI model, pick your magic black box of filtering good or bad,
watching the here and now and only the here and now. And so since we're basically collecting all
of these files and storing them in perpetuity, any new bit of information almost essentially
colors the graph of what's bad and good, different shades of gray at any point in time. And so
you're always having new adaptations. And that's one of the things that customers actually often
are kind of shocked by is like, well, this was good yesterday. I'm like, yeah, well,
it's bad today. Well, this was bad yesterday. Well, now it's good today, where you start
understanding the fluidity of what's happening. And so you start looking for trend lines much
more importantly, not trying to solve every problem, right? Like, I don't view
Stairwell as a replacement for EDR. You should have your EDR. Stairwell provides a different level of
detection above what EDR is capable of doing.
Well, it gives you the flexibility, right? And I think
so much of how we handle things these days is driven by there being a tight labor market for really specialist people in security.
So we kind of outsource a lot of these decisions, you know, in-the-moment decisions, to companies like your CrowdStrikes and your SentinelOnes and whatever, who have teams who are trying to do this at scale for everyone all the time.
And it's actually a model that makes sense. But I guess what I'm getting at is if you're a larger company or a more attacked organization and you just need that
little bit more flexibility, yeah, I just think this is an interesting trend towards more of a,
you know, programmable and transparent security tech stack.
Yeah. I mean, I actually think, in some sense, black boxes are a problem in security. They
benefit the attackers. Because here's an honest question, or it's not a question, more of an
anecdote. It would be, when I was at Google and we were looking at some AV solutions,
this is years and years and years ago, we were looking at different AV vendors that were out
there. And we asked the Project Zero team to say, which of these products do you think actually
holds up well to an attacker going after them? And none of them did. We won't name names or say anything like this,
and this is over a decade ago, so it's a long time ago, but none of them did.
Like, in fact, they came back to us and they simply said, you know, all of them,
we found remote code execution vulnerabilities in all of them. And so you're like, wow,
by installing the security software, I've made myself less secure.
How many companies get down to that level where you would have like a Project Zero engineer tearing apart a security control or system you're thinking about deploying before you do it?
And the answer is like none. Nobody does that. That's ridiculous. But do you know
what? The groups who are writing prolific ransomware, they do that. They're taking these
things apart. Conti is taking this stuff apart. Three-letter agencies around the world, three,
four-letter agencies, depending on what country you're in, they're taking it apart. And so the
question there is like you're putting your eggs in the basket of a product, and the people you probably are most afraid of, who pose the most risk to your business,
they know how these tools work inside and out and you don't. It becomes, in some sense, a self-fulfilling prophecy of like, I'm hoping
this thing protects me, even though I know that they already have figured out how to
work around whatever its capabilities are.
Going back to the Stuxnet anecdote of like code paths for different security products
on device.
That's strange, right?
And so like one of the things for us is like, we want to show you this.
We want to tell you, like, this is based on, you know, high entropy. This is based
on this. This is based on how common it is. This is based on these YARA signatures, which you can go
look at. This is what's happening.
You know, definitely lots of interesting value there.
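High entropy is one of the signals mentioned here. A common way to measure it is Shannon entropy over a file's bytes, where values close to 8 bits per byte suggest packed, compressed, or encrypted content. The sketch below is a generic illustration in Python; whether Stairwell computes entropy this way is an assumption, and the function name is made up for the example.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (uniform content) to 8.0 (random-looking)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of p * log2(1/p) over every byte value that occurs.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


low = shannon_entropy(b"A" * 1024)              # one repeated byte
high = shannon_entropy(bytes(range(256)) * 4)   # every byte value equally often
print(low, high)
```

On its own a high score proves nothing, since legitimate installers and archives are also high entropy, which fits the point made here that it is one explainable input among several (prevalence, YARA matches) rather than a verdict.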
The most fundamentally valuable thing that you get out of Stairwell, too, is that if you're
trying to set up for detection, you're doing the analysis somewhere other than the endpoint.
Bingo.
Right? So doing the scanning on the actual endpoints, when it
comes to file scanning, honestly, it only makes sense because that's the way we've always
done it. Like, as you said earlier, like imagine green fields,
like should we scan the dangerous things on the target devices?
Yeah, probably not.
That seems crazy, actually.
Do you have to beep that?
Yes, I do.
No, that is, yes, that's exactly the point.
It's like the big thing for Stairwell, when we think about it in that particular case, is that we're place-shifting where the analysis happens.
Yes.
There is no feedback.
In some sense, that's actually one of the strangest things that people will first realize is they're like, well, how do you block something?
And I'm like, we don't.
Right.
That is where your EDR has an incredibly valuable role. That is the active change agent
on a device that you're responsible for. Work with them to block a process, kill off a file,
do what you need to do. We are completely passive. We collect data, we store it privately and
securely in our platform.
You can see everything there is to see about it. It's there forever, or as long as you want it to be. At the end of the day, if something is believed to be malicious, we will tell you and
we will show you where and we'll tell you why and do all of this type of stuff. But at the end of
the day, you build the integration or we'll build it for you to actually go take that action on the
device. But it's not like
there's, using an academic term, an oracle on that machine that's saying good, bad,
good, bad. There is no way to see what Stairwell thinks about a file without providing
it to Stairwell in the first place. And then it gets stored. And then that opens up all sorts of
interesting opportunities. You actually see detection, detection, detection, then a variant of the file that fails detection, and you realize they've worked around it. So now you can actually study the evolution of files the way people do inside of VirusTotal. But you can actually do that on devices in your enterprise. And that actually becomes a really, really interesting thing. And the barrier to entry for tackling some of that does not require you to be a reverse engineer anymore.
Like we've been able to like really bring down the level of sophistication required that, you know,
I've been teaching my 11 year old son how to discern malware from non-malware, just looking
at metadata that we have, like in some of our download environments. And it's like when you
can teach an 11 year old boy who has the attention span of a race of Mario Kart
to start doing some of this work,
you start realizing that you're actually really bringing down
that barrier to entry.
And so as you said earlier, decisions
made because of human resource constraints can go away.
That is where you get to start to have a lot of fun
with security again, right?
You're basically setting up your platform to be able to do
threat hunting fairly trivially, right? And I don't currently know, I mean, I know that the
EDR companies do this on behalf of their customers. But again, you know, you're talking about spreading a small number of people across just epic scale.
This is much more about, you know, doing it yourself, using a whole bunch of automated tools based on files that are appearing in your organization.
I mean, you can even do something as simple as like, there is this executable file or DLL in my organization that is unique.
No one else in the world has ever seen this file. Geez, you know, maybe I should just do some
analysis on it and see if there's variants of this file elsewhere in my organization. And, you know,
just off you go from there. But it, you know, it's much easier when you set up for it with
something like Stairwell, like that's clearly true.
That's the value of preserving those files up front, building that data lake. You're
future-proofing your ability to threat hunt at any point in time. You're future-proofing your
ability to train AI models at any point in time. Like, one of the things I find really interesting
with the way security companies are using AI is they're training it on this relatively,
not universally, but relatively the same set of data. They're buying the same data, the same feeds,
they're getting it from the same people, and then they're training on it. And so what you end up
with is a bunch of different AIs from different vendors that all work the same. And so one of the
cool things is, again, for any AI model,
the training data defines how good your model is going to be at the end of the day, garbage in,
garbage out. What's cool about the way that we're working, since we're building unique data lakes
per customer, you actually have the ability at any point in time to take that data and train
whatever you want on it. And so when you start thinking about it like that, you can take what
our global view is, and then you could take what your local view is and build a hybrid model
that gives unique data for you. If you're walking down the street and you find my Visa card and you
find Eric Schmidt's Visa card, one of them could probably go buy a Tesla on the credit card and
one of them will not. I'll leave it to you to decide which credit card could buy the Tesla
on a Visa card, right?
It's not me.
And so when you start thinking about it like that, like credit card, they have fraud models which are tailored to the customer,
tailored to the customer's income, buying purchases and so forth.
But security seems to be much more, again, not universally, but much more on the one-size-fits-all approach.
And that homogeneity, that groupthink does not benefit the defenders. It benefits the threat actors.
Well, yeah, this is again, this is like the third time I've come back to this, but again,
this is why I'm seeing this push towards much more configurable and programmable security tools, right?
It's happening across the board.
I mean, I think the trend that was like the last five years
was towards security tools all becoming, all getting an API, right?
So that there was at least some level of automation that you could,
you know, or SOAR that you could do, right?
So you could take something like Tines
or whatever automation platform you want
and start gluing this stuff together.
But now the trend seems to be, you know,
to be able to adjust what that API actually does now, right?
So everything's becoming much more programmable.
So the first stage was getting all of the APIs working.
Now it's like, okay, how do we actually start tuning these APIs or, you know, configuring things in the background so that the APIs do what we want,
you know? And I can think of like five companies off the top of my head that fall into this
category of being, you know, programmable security stack play.
So let's jump from there, right? So
it's like, when you think about APIs in security, right, they're useful. [...] a file, delete a file.
You have a very simplistic API for S3 at the high level.
And that serves it really well.
The question, though, is that the right API for people to use for a higher level challenge like security on a daily basis?
Like storing a file, anyone can go store files like in S3 real easy.
The question is, what do you, for security purposes,
it's not just get the file.
There's a lot of analysis that you want on that file.
There's a lot of data you want on it.
So at what level of like the bare,
from the bare bottom of get, store, delete,
do you have all the way up to like,
tell me which of the files I've stored is malicious
via a threat actor based in the Middle East, right?
Like where on that stack do you do that?
And then how do you plug into it?
Well, I guess the point that you're making
is that perhaps, you know,
it's less about automations with you and more about like actually having
someone there to manually work through it, you know, in a way that's very efficient.
I think it's a bit of both. And I actually think it's like doing some of that work automatically
may lead to interesting novel detections, which could be, should go to people after the fact.
It's the level of where we look at it. What is the unit of granularity? Are we looking at the
behavior of one particular process?
Are we looking at the byte sequences of one particular file?
How do they all kind of interconnect?
This actually led to a new feature that we've been prototyping and building lately.
We've been showing with customers.
I was going to talk about it with you here today. And it kind of came about from a conversation. We hired a new VP of marketing recently,
and I was walking him through, and he asked me, how would a SOC analyst use Stairwell?
They get an alert from their EDR. What do they do? And I said, oh, okay, interesting.
I've never storyboarded this out before. So I grabbed a marker and went up to a whiteboard
with him. And I said, this is what I would do. I'd probably take the hash of the file from the
EDR alert. I'd come into Stairwell and I'd first see what machines have that exact
file. If it's from an EDR,
I know at least one does. So what other machines have that? And then I would say, let me extract
out the IOCs from that file. I'll detonate it. I'll pull in host names, IPs, registry keys. I'll
save those and we can come back to those later. I'd then probably see whether it matches any
YARA rules. And in our platform, we have tens of thousands of YARA rules floating around. So I'm looking for any high-precision rules that match, that give me
a quick idea about what it is. I'll check that. Again, detonate that in a sandbox, try and get
a constrained read on what its dynamic behavior is. Then we have a feature called variant discovery
where we can, in two or three seconds, show you from our corpus
of almost a billion files what other files are very similar to that one in near real time. So
it's like, do I have any variants of this file on any other computer? And it ended up being like a
10-stage process. And I'm sitting there with him, and we're looking at it, and I'm like,
yeah, there's more, but that
about paints out that picture. And then this light bulb in the back of my head goes off.
I was like, well, why don't we just do all of that in one step automatically? Why
make you go through 10 steps to get there? And I ran over, grabbed some engineers,
and we started prototyping something literally that afternoon.
And it's a feature that we're calling Run to Ground.
And honestly, I'm giddy to be able to show it.
Maybe we'll do another one of the video demos with you to actually show a demo of this.
When we have a non-internal ugly UI of the capability.
There's like a polished one.
So it's like, give me a hash.
Here's what run to ground does.
And it kind of, it's awesome.
It basically, it's like,
take that file,
run it through a variant discovery process,
find me the variants of it.
And then do that recursively
like three or four more times.
So if you think about family members,
it's like, find me the suspect,
their siblings,
their first cousin,
second cousins,
third cousins.
So it basically blows that out into a giant set of potential files,
some far distant from the original,
but you have a very large set.
We then go over and we say,
find me every computer at my company.
That has anything from this family.
Any of them.
Yep.
For each one of those, the time that that file was first seen on
each such device defines a reference time and a device.
And so then what we do is, since we have a timeline of everything on every machine with
the files intact, we say, okay, go back n units of time and go forward n units of
time on each device, centered around the time when that file showed up, and then filter away any file
first seen on that device around that time that's very common. So toss those away.
I have a timeline of very uncommon files, hours to days before that file and hours to days after
that file, for every such thing.
And yeah, we had the data to do this.
And we had the infrastructure to pull that off very quickly at scale.
And so we threw it together.
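The Run to Ground steps described above can be sketched roughly as follows. This is only an illustration of the described flow, not Stairwell's implementation: the `variants_of` mapping stands in for the variant discovery service, `sightings` is a hypothetical list of `(device, file_hash, first_seen)` records, `prevalence` is a made-up count of how widely each file is seen, and using the earliest family sighting per device as the reference time is a simplification.

```python
from collections import defaultdict


def expand_family(seed, variants_of, depth=3):
    """Recursively collect variants of variants, up to `depth` hops from the seed."""
    family, frontier = {seed}, {seed}
    for _ in range(depth):
        frontier = {v for f in frontier for v in variants_of.get(f, ())} - family
        if not frontier:
            break
        family |= frontier
    return family


def run_to_ground(seed, variants_of, sightings, prevalence, window, common_threshold):
    family = expand_family(seed, variants_of)

    # Reference time per device: earliest first-seen time of any family member.
    reference = {}
    for device, file_hash, first_seen in sightings:
        if file_hash in family:
            reference[device] = min(first_seen, reference.get(device, first_seen))

    # Keep family members and uncommon files first seen inside the time window.
    timelines = defaultdict(list)
    for device, file_hash, first_seen in sightings:
        if device not in reference:
            continue
        near = abs(first_seen - reference[device]) <= window
        uncommon = prevalence.get(file_hash, 0) < common_threshold
        if near and (file_hash in family or uncommon):
            timelines[device].append((first_seen, file_hash))
    return {device: sorted(events) for device, events in timelines.items()}


variants_of = {"evil_v2": ["evil_v1"], "evil_v1": ["evil_v0"]}
sightings = [
    ("host-a", "dropper.ps1", 95),
    ("host-a", "evil_v2", 100),
    ("host-a", "chrome_update", 101),
    ("host-a", "stage2.dll", 102),
    ("host-b", "evil_v1", 40),
    ("host-b", "notes.txt", 900),
    ("host-c", "chrome_update", 100),
]
prevalence = {"chrome_update": 5000}  # anything missing is treated as rare
result = run_to_ground("evil_v2", variants_of, sightings, prevalence,
                       window=10, common_threshold=100)
print(result)
```

In this toy data, the EDR-flagged `evil_v2` on host-a pulls in the older variant on host-b, and the per-device timeline keeps the rare files that appeared around it while dropping the very common `chrome_update`.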
And we started asking some of our longtime customers to come in and say, run some of your alerts from your EDR through this thing.
And it has what I call the Craigslist UI.
It's pure text.
But run it through.
See what you found.
And within like a day, we actually found stuff that was going undetected by EDR on their systems.
And one of them was actually an APT infection that we found.
And what it was, was their EDR flagged it on one machine. And we found a variant that was on a
different machine, like a day before, that EDR did not flag and actually had managed to take hold on.
And so by being able to take
new information and apply it across all time, because we have the bytes of the files,
we're not trying to match coarse behaviors, you end up with this amazing ability to almost
self-inoculate your entire history and forward history as well for anything that you know to
be bad.
And when you get that, you're like, boom, oh yeah.
What happened around the time that file was on there?
You actually saw a script run, which hit a URL, which downloads the file,
saves it as the one that you found.
And then after that, you found a second stage show up about two seconds after it started executing.
And so you boom, boom, boom.
You almost have that forensic timeline of what
happened on every machine in like, again, five to 10 seconds after you start loading it up.
And that's where it's like, you know, I think about it like this:
if I was at a different company and we had a breach and we found something that we
believed to be APT, we'd end up calling Mandiant or someone
like that up for breach help. They'd come in, they'd be there for a week, two, three, more,
and we'd probably be writing a six, seven figure check to get that data that we just gave you in five seconds.
Yeah. So, I mean, how common is it that you're finding, like, you know, APT tools
that are going undetected? I mean, you just gave us an example there. Where's the threat report,
Mike? That's normally what happens, is someone finds something like this and there's a threat
report that goes out and a big song and dance.
We don't own that data, right? I think, you know,
at the end of the day, it's like that customer, that's their files, that's their data. We don't
have that data available in a public source where we can go talk about it. So at the end of the day,
we want to respect people's privacy as well.
You wanted to talk about it and I wouldn't let you,
is what I'm getting from this.
That's their call, right? If they were willing to do that,
I'm all for it. I mean, I have some really cool stories. We've been finding interesting malware
that was capable of jumping air gaps and so forth. And it was on USB drives that were being moved
around. And that's another one. When we capture files, we're able to capture the GUID attached to
the file system of a USB drive as it's moved around. So not only can we tell you what was on a drive that was plugged into a machine,
we can tell you every single machine that that particular drive has ever been plugged into as well.
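The two lookups described here (what was on a drive, and every machine that drive has been plugged into) amount to indexing sightings by the volume's GUID. A minimal Python sketch, with a hypothetical `VolumeTracker` class and made-up GUIDs and host names, might look like this:

```python
from collections import defaultdict


class VolumeTracker:
    """Toy index of removable-volume GUID -> machines seen on and files seen on it."""

    def __init__(self):
        self._devices = defaultdict(set)  # volume GUID -> machines it was plugged into
        self._files = defaultdict(set)    # volume GUID -> hashes of files seen on it

    def record(self, volume_guid, device, file_hash):
        self._devices[volume_guid].add(device)
        self._files[volume_guid].add(file_hash)

    def devices_for(self, volume_guid):
        return sorted(self._devices.get(volume_guid, ()))

    def files_on(self, volume_guid):
        return sorted(self._files.get(volume_guid, ()))


tracker = VolumeTracker()
tracker.record("vol-1234", "lab-pc-1", "hash-aaa")
tracker.record("vol-1234", "airgap-07", "hash-aaa")
tracker.record("vol-1234", "airgap-07", "hash-bbb")
tracker.record("vol-9999", "lab-pc-1", "hash-ccc")

print(tracker.devices_for("vol-1234"))
print(tracker.files_on("vol-1234"))
```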
And so when you start thinking about this, like having this giant source of not just the collection of the files from all of these things,
but the metadata about what computers had seen those files intersected together is literally you're operationalizing.
Think about like a Mandiant level IR engagement.
And you can operationalize that 1,000, 5,000 times a day on every single alert that you actually have to look at.
And you get this full historic view, which is just like, to me, it's a security candy store.
Yeah, yeah, yeah.
That's exactly what I've always wanted.
Mike Wiacek, thank you so much for joining us for that conversation.
All very interesting stuff.
And we'll look forward to doing it again sometime soon.
Cheers.
Cheers.
That was Mike Wiacek of Stairwell there.
And you can find them at stairwell.com.
I do hope you enjoyed that podcast.
Thanks for listening.