Grey Beards on Systems - 152: GreyBeards talk agent-less data security with Jonathan Halstuch, Co-Founder & CTO, RackTop Systems
Episode Date: August 8, 2023. Sponsored By: RackTop Systems. Once again we return to our ongoing series with RackTop Systems, and their Co-Founder & CTO, Jonathan Halstuch (@JAHGT). This time we discuss how agent-less, storage-based security works and how it can help secure the many organizations with (IoT) endpoints they may not control or can't deploy agents on.
Transcript
Hey everybody, Ray Lucchesi here with Keith Townsend.
Welcome to another sponsored episode of the Greybeards on Storage podcast,
a show where we get Greybeards bloggers together with storage and system vendors
to discuss upcoming products, technologies, and trends affecting the data center today.
This Greybeards on Storage episode, again, is brought to you today by RackTop Systems.
And now it's my great pleasure to once again introduce Jonathan Halstuch,
co-founder and CTO of RackTop Systems.
Jonathan, it's your fourth time on the show today, and we are certainly glad to have you back again.
Why don't you tell us a little bit about agentless security and how it works with RackTop Systems?
Thanks, Ray. It's great to be back. I love doing this podcast. So
when we talk about agentless security, what we're really talking about is not that you don't want agents on your endpoints to monitor what's happening there and look for malicious processes and things. But what we're able to do at RackTop is provide full end-to-end monitoring of how users or applications are interacting with files and data, without requiring any sort of agent or other software to be installed on the client or endpoint, right? And so that's important
because what an adversary is going to try to do is sidestep those devices that have an agent on
them or try to
disable the agent. And we've seen cases of that. We've even seen cases where the agents have been
co-opted by the adversary and used to steal data. So with BrickStore, we're eliminating the need to maintain those agents, and we're able to see all that user activity without requiring any new software to be deployed.
You said end-to-end monitoring and that sort of thing.
Can you kind of explain how that works from a storage perspective?
Sure.
So with BrickStore, we're providing a NAS capability, right?
And so users and applications are interacting with data and files stored on BrickStore over
SMB or NFS protocols.
And what we're doing is monitoring that activity when users or applications open a file,
modify a file, or read a file.
And so we're basically able to audit and record the IP address that they came from, the user account they used, the time, the file path of the file they operated on, and also how
many operations.
Did they do a lot of reads?
Did they read the whole file?
Did they overwrite the file?
All that kind of stuff.
And so that's what I mean by end-to-end.
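To make that audit trail concrete, a single record of the kind Jonathan describes might look like this. This is a minimal sketch; the field names are illustrative, not BrickStore's actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class FileAuditRecord:
    """One audited file operation, captured at the SMB/NFS protocol layer."""
    client_ip: str     # IP address the request came from
    user: str          # account the client authenticated as
    path: str          # file path that was operated on
    operation: str     # "open", "read", "write", "delete", ...
    bytes_moved: int   # how much of the file the operation touched
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

# Example: a client reads an entire 4 MiB spreadsheet.
rec = FileAuditRecord("10.0.0.42", "alice", "/share/finance/q3.xlsx",
                      "read", 4 * 1024 * 1024)
```

Aggregating records like these per user and per client is what enables the "did they read the whole file? how many operations?" questions above.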
So effectively, anytime anything in the system accesses a file, you're sitting there, you're
tracking and logging all this telemetry data for them to try to understand what's going on, whether they're acting inappropriately
or not, I guess.
Is that what you're saying?
Exactly, right?
So the first thing is just recording it so that you can potentially analyze it and look
back or a user can say, hey, did this user access this file or what happened in the past?
Maybe something comes to your attention that seems suspicious about a user or machine, and now you want to retrospectively investigate.
So you have that full record of what's happened in the past. But also what's really unique is our
active defense capability that can take proactive measures. So when it sees something suspicious,
like a user application overwriting a bunch of files and encrypting them,
something like a ransomware attack, we could actually block and stop that. And so we're not
relying on stopping it at the endpoint or with an agent. We're actually blocking their interaction
with the files over the protocol. So, John, I have a pretty good handle on the bad parts of an agent. One, I got to get definition files out
to the agent. I have to make sure the agent is communicating with the controller. If there's a
break in that control plane, I can't update the agent. I can't deploy new rules. But there's a lot to like about agents. I can get really granular. Am I missing out on some of those capabilities when I go with a centralized agent-less approach?
You still need to have agents in these cases that are monitoring the endpoint, right? So you still want to monitor that Windows desktop for malicious processes and bad activity that's happening
locally on that device. But what you don't want to do is rely 100% on that and not have something
that's protecting the data itself. So I guess one of the things that pops into my mind is kind of a distributed attack.
If malware is spread across several different machines, endpoints, and they're all making changes to directories or files independently, my agent may not be able to recognize and catch that type of activity. Is that an activity that this solution can catch?
Yes, we would be able to see that and see that all in the context of multiple systems or machines
or multiple accounts trying to access stuff that could be anomalous to how they normally access the
data. And so we would be able to catch that for sure. And a big fear that you also have in the
environment, especially when you talk about
healthcare or other solutions where you start to have IoT devices, there's really no way to
install an agent on something like a radiology machine or something like that. And so you have
to rely on something external to protect that data. Yeah. I've had this dance a lot of times,
whether it's a manufacturing environment, health care, financial services, the infamous appliance.
This thing is an appliance, but, you know, it has a login to AD.
It has a login to the NAS or storage profile and is either placing files there, removing files.
It has some natural activity that is really difficult to protect using some type of agent-based system.
Matter of fact, we see exceptions, like an electron microscope that still works, a very, very expensive piece of equipment, but it runs a very outdated version of Windows that only supports, you know, CIFS, SMB1, right? And so you still need to use that. They're not going to
replace the electron microscope, but we can scrutinize how that device is interacting with
the file system, what data it's writing or reading.
And so it gives you that visibility into what's happening and how that device is behaving too.
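Jonathan's earlier point about blocking a ransomware-style burst of encrypting overwrites at the protocol layer can be sketched crudely. This is a toy illustration, assuming a simple per-client counter and a Shannon-entropy heuristic for "encrypted-looking" data; it is not RackTop's actual detection logic:

```python
import math
import os
from collections import defaultdict

OVERWRITE_LIMIT = 50        # high-entropy rewrites tolerated per client
ENTROPY_THRESHOLD = 7.5     # bits/byte; near 8.0 looks encrypted

def shannon_entropy(data: bytes) -> float:
    """Average bits of information per byte of `data`."""
    if not data:
        return 0.0
    counts = defaultdict(int)
    for b in data:
        counts[b] += 1
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

class ActiveDefense:
    """Block a client that mass-overwrites files with encrypted-looking data."""
    def __init__(self):
        self.rewrites = defaultdict(int)  # client_ip -> suspicious rewrites
        self.blocked = set()

    def on_overwrite(self, client_ip: str, new_bytes: bytes) -> bool:
        """Called per overwrite at the protocol layer; True = allow."""
        if client_ip in self.blocked:
            return False
        if shannon_entropy(new_bytes) > ENTROPY_THRESHOLD:
            self.rewrites[client_ip] += 1
            if self.rewrites[client_ip] >= OVERWRITE_LIMIT:
                self.blocked.add(client_ip)  # looks like ransomware: block
                return False
        return True

# Plain-text writes pass; a burst of random-looking overwrites gets blocked.
ad = ActiveDefense()
ok = ad.on_overwrite("10.0.0.9", b"quarterly report draft " * 40)
```

The key design point from the discussion is where this runs: at the storage choke point, so it works even for clients, like that electron microscope, that can never host an agent.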
And just having a central agentless monitoring capability, I mean, the challenge with agents, obviously, as they're deployed throughout the environment, is there could be thousands of them out there, and, you know, updating them and managing them and stuff like that. By having a centralized agent-less approach, are you able to maintain it more easily and keep it more current?
Is that sort of thing also one of the benefits?
Yeah, that's definitely the benefit of having a single location and instance: as you update and upgrade, everything is protected, right? You're putting the protections as close to the data as possible. So instead of having to find
every place that somebody could access the data from, you're saying, I'm going to put these
protections as close to the data as possible. And anything that wants to access this data,
it's like a choke point. You have to go through this data firewall to get access to that data. And we're able to observe
that and determine what actions we want to take as you're trying to access the data. So I think
it's definitely a valuable approach when you start to think about filers, NAS, unstructured data,
where you have a large set of data and you want to protect it, right? If you're protecting
money at a bank, you put the money in the vault. You don't try to put protections around each user
that's trying to access the money, right? And so it just makes more logical sense with
how people operate in the physical world as well.
So talk to me about that workflow. Detection is fairly simple. If I'm at the vault, I can determine if someone's trying to access the vault that shouldn't be. You know, if someone comes in with a key and they're not typically the person with that key, I'm going to be on alert.
My senses are going to be heightened.
I'm going to ask more questions.
If I decide that this is an intrusion,
I can look at the person or intruder and react.
How does that work when it comes to this virtual use case
where you're the vault, you're the data,
and you can see that someone's accessing the data that shouldn't be accessing.
But from a SOC perspective, a security operations center, how do I determine who that intruder is and shut down that root cause?
Right. Yeah. So there's two cases there. One is kind of that breach where we have an adversary
that's compromised an account or has taken over ownership or has gotten inappropriate access on a
system and is now trying to go and steal data. So we could look for anomalies in behavior, right? All of a sudden,
you're looking at a high level of data exfiltration potential. You're reading a lot more data than you
normally do. You're going to data sets or folders that you don't normally access. So all those
behaviors, we're using behavior analytics to determine, hey, this seems unusual or anomalous,
so this should be investigated further.
And we have some new capabilities coming out that will take more proactive action to stop and alert on that. But even today, we can alert on those excessive file activities,
like a high number of reads or writes or deletes. Maybe somebody's doing something malicious.
But then also, we have to think about the insider threat. Just like in the bank,
if the employee turns rogue or does something illicit, we want to know about that too.
So we're also monitoring even a normal user that starts to do things anomalous.
Maybe they're looking at stuff out of their normal work pattern at night, coming from
a device they haven't used before.
They're reading lots of data.
It seems like they have a much higher read-to-write ratio than normal.
So all those types of things can be used to determine anomalous activity.
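The time-of-day, new-device, and read-volume signals Jonathan lists can be combined into a toy anomaly score. A minimal sketch; the signals, thresholds, and the idea of a per-user baseline are illustrative, not RackTop's actual behavior analytics:

```python
from dataclasses import dataclass

@dataclass
class Baseline:
    """What 'normal' looks like for one user (e.g. learned by observation)."""
    usual_hours: range      # e.g. range(8, 18) for an 8am-6pm worker
    known_ips: set          # devices/addresses the user normally uses
    avg_read_bytes: float   # typical data read per session

def anomaly_score(b: Baseline, hour: int, client_ip: str,
                  read_bytes: int) -> int:
    """Count how many independent signals look unusual (0..3)."""
    score = 0
    if hour not in b.usual_hours:
        score += 1                        # off-hours access
    if client_ip not in b.known_ips:
        score += 1                        # unfamiliar device
    if read_bytes > 10 * b.avg_read_bytes:
        score += 1                        # far more reading than usual
    return score

baseline = Baseline(range(8, 18), {"10.0.0.42"}, 50_000_000.0)
# 2am, unknown IP, 5 GB read: every signal fires.
suspicious = anomaly_score(baseline, 2, "172.16.9.9", 5_000_000_000)
normal = anomaly_score(baseline, 10, "10.0.0.42", 40_000_000)
```

A score above some threshold would then drive the "alert, investigate further, or block" decisions discussed above.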
What we also see in the insider threat world is the use of escalated or admin-level privileges when they shouldn't be used. And we often see that in both cases: a malicious attacker trying to gain access and elevate privilege, as well as the
insider threat. And so we can immediately alert on the use of an admin credential or privileged
account to access data and investigate that. And that allows us to take the detection time
average down closer to zero. Right now, the average detection time before an adversary or
a breach is detected is about nine months.
You need to get that way down.
Yeah, you mentioned the word data firewall,
which I thought was a very interesting concept.
My challenge with, you know, firewalls, and I'm kind of a local network user here, is maintaining that, allowing, you know, new devices to come in and not others. It takes some configuration and manipulation and some knowledge, actually, to maintain a firewall and a network and stuff like that.
But in your case, it seems like all that's internal.
Is that true?
It is internal, but there is the same kind of concept where there's a rules engine. So we have different assessors that are looking for different types of malicious and suspicious activity. And you configure those rules in a way that says, hey, if you see this activity, this is what I want you to do: I want you to alert only, or I want you to block. You can configure this. And when you deploy BrickStore the first time, you can put it into an observation-only mode.
And so you can look to see how the environment's behaving and how applications are accessing
data, who's using administrative accounts, and then determine, hey, should this admin account be used to do this?
Maybe I want to restrict that activity so that I can say, yeah, we're going to use this
admin account.
It's only going to be allowed during this time period and on this data set. Or maybe you realize there's some bad hygiene
things going on and we want to clean up our hygiene because that'll help us reduce our
overall risk if that account gets compromised. And so we create special accounts to do those
jobs. And so it's not as complicated, I would say, as a traditional network firewall. We have
the benefit of only having two protocols. And then it comes down to saying, okay, what accounts are going to be allowed to do this?
What are the rules of the road for, you know, when this should be happening,
who's going to be allowed to do that? And then kind of reviewing that. And then if you do
determine somebody, you know, triggers a rule and it goes off, you also have a very easy workflow
there to follow to say, Hey, I want to allow this and I want to make this a permanent rule to allow this behavior or a temporary rule. Maybe somebody's
going to be doing data migration and they need to use admin credentials to do that. We can make it
very specific to say this particular admin and admin account can be used to move data from this
client IP or IPs to this data set or data sets and then have it auto expire as well, you know, on Tuesday,
so that it's not a permanent rule, and nobody has to remember to go back and remove that rule as
well. So we've tried to make it easier. And that's based on our understanding as well as,
you know, being in the business of providing, you know, network attached storage infrastructure for
both small and large scale environments.
Yeah, so I can see this as both an advantage and a disadvantage.
Let's kind of roll a real world scenario to you.
There's a, you know,
a database account that typically has a certain write pattern, or write-to-read pattern, and once a year, not randomly, once a year, the DBA runs a level-zero backup.
Now, my policy, in general, is that in our organization, you don't run level-zero backups.
You let the backup software do what it's supposed to do.
But I can see a scenario in which this would trip a rule, say, oh, this shouldn't be happening. This database account never writes or reads this much data. Something is going wrong. Block. So one, the advantage I see is, you know, as the data protection person in my organization, that's an opportunity for me to enforce our policy. But what happens if that's legitimately something that we want to happen?
You know, we've run it in observation mode for six months, but this has never happened.
Now it happens.
What's the remediation?
How do I now enable that account to do this business function that's allowed within our
policy? That's very unusual, right? It's not as periodic as like a monthly thing or something.
Yeah, it's just, some DBA says, you know how DBAs are. None of us are DBAs, so we get to pick on them on this podcast. You know, they're like, ah, you
know, I trust the backup software, but I only trust it so much. I'm going to have my own immutable
copy. Yep. Yeah. And actually that would be even valuable for some of the cyber vault features we
have coming out where you can have a vaulted immutable copy of something like that backup. And so what does that mean? So basically, you know,
there's going to be times where a rule gets flagged and it shouldn't have, or it's not the
normal behavior, but it's expected or a one-time thing. And so there's a couple of ways to handle
that. In one case, if you were actually blocking and it got blocked because it triggered the rule and nobody was prepared for that, then you unblock. It's very
easy through the incident management workflow. It'll send a web hook and quickly alert everybody.
Hey, this user has been blocked. This host IP has been blocked. And this is why: this was the rule that was triggered. And this is the activity that happened around that. And you'd
quickly see, oh, they ran a backup and it was a lot more data. So I can see that. And so
you could quickly unblock them and that can be done in under 30 seconds. The other thing though,
is as you educate people about how the system works and that there is a system in place that's
monitoring and evaluating this is that you can, as part of the change control board, right?
Hopefully when you're doing these backups or doing these things, there's some sort of configuration management
and people realize, oh, I'm going to do something that's going to be out of the norm.
You can actually go in ahead of time into that data firewall and say, hey, I'm going to do work.
I'm going to take a maintenance window during this period of time. So I want to put a rule in
that says I'm going to allow data to be written during this period of time. And I expect it to be higher than normal and basically put a whitelist type of rule in there and have it expire after that work time is over.
So if it's a four hour maintenance window, it would only be valid for those four hours.
And then it would go back to the normal monitoring and enforcement.
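The auto-expiring, tightly scoped rule Jonathan describes, a specific admin account, from specific client IPs, on a specific dataset, valid only for the maintenance window, could be modeled like this. A sketch only; the field names and scoping are illustrative, not the product's actual rule format:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class TempAllowRule:
    """A whitelist rule scoped to an account, client IPs, and a dataset,
    that expires on its own so nobody has to remember to remove it."""
    account: str
    client_ips: frozenset
    dataset: str            # path prefix the rule covers
    expires: datetime

    def permits(self, account: str, client_ip: str, path: str,
                now: datetime) -> bool:
        return (now < self.expires
                and account == self.account
                and client_ip in self.client_ips
                and path.startswith(self.dataset))

# A four-hour maintenance window for a data migration.
start = datetime(2023, 8, 8, 20, 0, tzinfo=timezone.utc)
rule = TempAllowRule("migration-admin", frozenset({"10.0.0.5"}),
                     "/share/archive/", start + timedelta(hours=4))
```

After `expires`, every check fails and the system falls back to normal monitoring and enforcement, which is exactly the behavior described for the maintenance window.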
Oh, that's nice. That's nice.
So if you knew it was going to happen, you could actually create a new rule that could be temporarily active for a duration, tied to a particular user, tied to an IP, et cetera, et cetera.
Exactly. And tie that to a ticket or change control process approval and things like that. So you have that whole audit history and control. Right, right. That's nice. That's nice. You mentioned that the alert goes out with a webhook that you could
potentially use to re-enable the access, I guess,
because it's blocked at that point.
So it's just a web URL link kind of thing that allows you to click on it
once and fill in some information and let it rip?
You'd be allowed to do that if you're an authorized user to control the security stuff.
Your security or the SOC could see that and say, okay, I'm going to allow that.
After I've investigated it, it's quick to enable a user to unblock an IP or unblock an account.
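A rough idea of what such a webhook alert might carry, enough context (who, where, which rule, what activity) for an authorized responder to investigate and unblock. The payload shape is hypothetical, not RackTop's actual format:

```python
import json

def build_block_alert(user: str, client_ip: str, rule_id: str,
                      rule_name: str, activity: dict) -> str:
    """Assemble the webhook body sent when a user/IP gets blocked."""
    return json.dumps({
        "event": "user_blocked",
        "user": user,
        "client_ip": client_ip,
        "rule": {"id": rule_id, "name": rule_name},
        "recent_activity": activity,   # context that triggered the rule
        "actions": ["unblock_user", "unblock_ip", "keep_blocked"],
    })

alert = build_block_alert(
    "dba-svc", "10.0.0.7", "R-17", "excessive-write-volume",
    {"writes": 120_000, "bytes_written": 900_000_000})
```

A SOC tool receiving this payload has what it needs to show "they ran a backup and it was a lot more data" and offer the one-click unblock described above.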
And the information in the security alert would identify the IP, identify the rule that's being violated, identify potentially the
user. I mean, it would give enough information to supply a cognizant individual in authority to say
yay or nay, I guess. That's correct. And we're constantly listening to our customers about the
workflows and the situation, seeing what happens and then making that workflow
smoother for everybody.
And, you know, as the world goes on, everybody's responsibility kind of has a security component within it. So, you know, as a storage administrator or infrastructure manager, more and more of that security role is becoming part of the everyday aspect of your responsibility.
Right. So how are the machine learning models and the definitions associated with them updated?
Yeah. So we have assessors that are built in or are loaded into the product, right? So
they can be updated at any time.
So you don't have to update the whole operating system or anything,
but similar to probably how people are used to with, you know,
virus definitions or security updates in that way,
we have the ability to upload new assessor definitions and new assessors.
And so once those are downloaded from our MyRackTop portal, they then start enforcing. And you can choose when you apply those updates as a user. And then for
entities that aren't connected to the internet, they can download those assessors through a
secure supply chain format to their environment and then load them into the BrickStore and then
get the advantage of those
new capabilities. And so there isn't like a scheduled period for when we're releasing new
assessors, but it's based on obviously new things that come out in the wild, new threats that we see
or new capabilities we feel are worthwhile to update and get those out to our customers as
soon as possible. Right, right, right, right. And then we can't
talk about security without talking about the performance overhead associated with that.
What's the performance impact of doing this real-time analysis?
Yeah, so we've designed the system from the beginning to have security and compliance
enabled while operating. So we've scaled and designed the
systems to have the security running while you're doing that. So if they were replacing,
let's say a legacy NAS solution or file solution with Brickstore, there should be no impact to the
user experience or application performance, right? We're able to deliver the latency,
the IOPS and the throughput,
but also include that security compliance. And we do that because we're leveraging a modern
architecture and we take advantage of, you know, RAM as well as the multiple cores within the
latest x86 processors from Intel and AMD. So we're delivering that highly parallelized
IO workflow with low latency. And we can back, you know, for the backend capacity,
we can use a SAN from another, you know, enterprise OEM. We can use a hybrid storage pool, which
leverages some flash and spinning hard drives, or we can use an all flash pool for the, you know,
lowest latency, high throughput, and consistent performance.
All right. Well, Keith, are there any last questions for Jonathan?
No, I'm always interested to know how customers, I guess there is one last question. How are
customers using this in production? Has there been any kind of use cases that popped up that
had you folks kind of scratching your head and thinking, oh, I didn't think
that someone would use this security
feature in this type of workflow?
So I would say I always want people to use it in every workflow.
So I wouldn't say I'm surprised, right? The whole goal is to protect your data wherever it is. But
I think we, I would say there's a wide variety of workflows and use cases and a wide variety of
verticals, right? From manufacturing, right? We
talked about that where you're interacting with IoT devices, robots that are dealing with a lot
of sensitive data and intellectual property as they build systems and manufacture components,
et cetera. You have healthcare where you have, you know, critical life-saving type operations
that have to operate 24 seven and protect data against theft and ransomware
and destruction. And then you have, you know, the traditional corporate use cases with, you know,
file data and sensitive information, they don't want to get out to their competitors or to the
public. So it's a pretty wide range. And it's enabled by, you know, having all flash solutions
or hybrid solutions or large archives.
But as we know, data has become super valuable.
And so as people start to recognize that, and that's really what the bad guys are after,
people are looking for ways to protect it.
All right.
Hey, Jonathan, is there anything you'd like to say to our listening audience before we close?
No, it's been a pleasure.
I always enjoy my time on here in the conversation.
All right. Well, this has been great, Jonathan. Thanks again for being on our show today.
Thanks for having me.
And thanks to RackTop Systems for sponsoring this podcast.
That's it for now. Bye, Keith. Bye, Jonathan. Until next time.
Next time, we will talk to another system storage technology person.
Any questions you want us to ask, please let us know.
And if you enjoy our podcast, tell your friends about it.
Please review us on Apple Podcasts, Google Play, and Spotify, as this will help get the word out. Thank you.