Grey Beards on Systems - 152: GreyBeards talk agent-less data security with Jonathan Halstuch, Co-Founder & CTO, RackTop Systems

Episode Date: August 8, 2023

Sponsored By: Once again we return to our ongoing series with RackTop Systems and their Co-Founder & CTO, Jonathan Halstuch (@JAHGT). This time we discuss how agent-less, storage-based security works and how it can help secure the many organizations with (IoT) endpoints they may not control or can't deploy agents on.

Transcript
Starting point is 00:00:00 Hey everybody, Ray Lucchesi here with Keith Townsend. Welcome to another sponsored episode of the Greybeards on Storage podcast, a show where we get Greybeards bloggers together with storage and system vendors to discuss upcoming products, technologies, and trends affecting the data center today. This Greybeards on Storage episode, again, is brought to you today by RackTop Systems. And now it's my great pleasure to once again introduce Jonathan Halstuch, co-founder and CTO of RackTop Systems. Jonathan, it's your fourth time on the show today, and we are certainly glad to have you back again.
Starting point is 00:00:41 Why don't you tell us a little bit about agentless security and how it works with RackTop Systems? Thanks, Ray. It's great to be back. I love doing this podcast. So when we talk about agentless security, what we really are talking about, it's not that you don't want to have agents on your endpoints to monitor what's happening on the endpoint and look for malicious processes and things. But what we're able to do at RackTop is provide full end-to-end monitoring of how users or applications are interacting with files and data and not require anything to be installed on the client or any sort of agent on the endpoint, right? And so that's important
Starting point is 00:01:19 because what an adversary is going to try to do is sidestep those devices that have an agent on them or try to disable the agent. And we've seen cases of that. We've even seen cases where the agents have been co-opted by the adversary and used to steal data. So with Brickstore, we're eliminating the need to maintain the agents and able to see all that user activity without requiring any new software be deployed. You said end-to-end monitoring and that sort of thing. Can you kind of explain how that works from a storage perspective? Sure.
Starting point is 00:01:52 So with BrickStore, we're providing a NAS capability, right? And so users and applications are interacting with data and files stored on BrickStore over SMB or NFS protocols. And what we're doing is monitoring that activity when users or applications open a file, modify a file, or read a file. And so we're basically able to audit and record the IP address that they came from, the user account they used, the time, the file path of the file they operated on, and also how many operations. Did they do a lot of reads?
Starting point is 00:02:29 Did they read the whole file? Did they overwrite the file? All that kind of stuff. And so that's what I mean by end-to-end. So effectively, anytime anything in the system accesses a file, you're sitting there, you're tracking and logging all this telemetry data for them to try to understand what's going on, whether they're acting inappropriately or not, I guess. Is that what you're saying?
Starting point is 00:02:52 Exactly, right? So the first thing is just recording it so that you can potentially analyze it and look back or a user can say, hey, did this user access this file or what happened in the past? Maybe something comes to your attention that seems suspicious about a user or machine, and now you want to retrospectively investigate. So you have that full record of what's happened in the past. But also what's really unique is our active defense capability that can take proactive measures. So when it sees something suspicious, like a user application overwriting a bunch of files and encrypting them, something like a ransomware attack, we could actually block and stop that. And so we're not
Starting point is 00:03:29 relying on stopping it at the endpoint or with an agent. We're actually blocking their interaction with the files over the protocol. So, John, I have a pretty good handle on the bad parts of an agent. One, I got to get definition files out to the agent. I have to make sure the agent is communicating with the controller. If there's a break in that control plane problem, I can't update the agent. I can't deploy new rules. But there's a lot to like about agents. I can get really granular. Am I missing out on some of those capabilities when I kind of go with a centralized agent-less approach? still need to have agents in these cases that are monitoring the endpoint, right? So you want to still monitor that Windows desktop for malicious processes and bad activity that's happening locally on that device. But what you don't want to do is rely 100% on that and not have something that's protecting the data itself. So I guess one of the things that pops into my mind is kind of a distributed attack. If malware is spread across several different machines, endpoints, and they're all making changes to directories or files independently, my agent may not be able to recognize and catch that type of activity. Is that an activity that this solution can catch? Yes, we would be able to see that and see that all in the context of multiple systems or machines
Starting point is 00:05:12 or multiple accounts trying to access stuff that could be anomalous to how they normally access the data. And so we would be able to catch that for sure. And a big fear that you also have in the environment, especially when you talk about healthcare or other solutions where you start to have IoT devices, there's really no way to install an agent on something like a radiology machine or something like that. And so you have to rely on something external to protect that data. Yeah. I've had this dance a lot of times, whether it's a manufacturing environment, health care, financial services, the infamous appliance. This thing is an appliance, but, you know, it has a login to AD.
Starting point is 00:05:53 It has a login to the NAS or storage profile and is either placing files there, removing files. It has some natural activity that is really difficult to protect using some type of agent based system. Matter of fact, we get exceptions to have an electron microscope that still works very, very expensive piece of product, but it runs a very outdated version of windows that even just supports, you know, SIFS SMB1, right? And so you still need to use that. They're not going to replace the electron microscope, but we can scrutinize how that device is interacting with the file system, what data it's writing or reading. And so it gives you that visibility into what's happening and how that device is behaving too. And just having a central agentless monitoring capability, I mean, is it something that you can, I mean, the challenge with agents, obviously, as it deployed throughout the environment is there could be thousands of them out there and, you know, updating them and managing them and stuff like that.
Starting point is 00:07:09 By having a centralized agent list approach, are you able to maintain it more currently and easier? Is that sort of thing also one of the benefits? Yeah, it's definitely the benefit of having a single location and instance where as you update and upgrade, everything is protected, right? That data, you're putting the protections as close to the data as possible. So instead of having to find every place that somebody could access the data from, you're saying, I'm going to put these protections as close to the data as possible. And anything that wants to access this data, it's like a choke point. You have to go through this data firewall to get access to that data. And we're able to observe
Starting point is 00:07:48 that and determine what actions we want to take as you're trying to access the data. So I think it's definitely a valuable approach when you start to think about filers, NAS, unstructured data, where you have a large set of data and you want to protect it, right? If you're protecting money at a bank, you put the money in the vault. You don't try to put protections around each user that's trying to access the money, right? And so it just makes more logical sense with how people operate in the physical world as well. So talk to me about that, you know, workflow. Detection is fairly simple. I can, if I'm at the vault, I can determine if someone's trying to access the vault that shouldn't be accessing the vault. You know, there's someone with a key that comes in and they're not typically the person with that key, I'm going to be hiding. My senses are going to be heightened. I'm going to ask more questions.
Starting point is 00:08:49 If I decide that this is a intrusion, I can look at the person or intruder and react. How does that work when it comes to this virtual use case where you're the vault, you're the data, and you can see that someone's accessing the data that shouldn't be accessing. But from a SOC perspective, a security operating center, operation center, how do I determine who that intruder is in shutting down that root cause. Right. Yeah. So there's two cases there. One is kind of that breach where we have an adversary
Starting point is 00:09:32 that's compromised an account or has taken over ownership or has gotten inappropriate access on a system and is now trying to go and steal data. So we could look for anomalies in behavior, right? All of a sudden, you're looking at a high level of data exfiltration potential. You're reading a lot more data than you normally do. You're going to data sets or folders that you don't normally access. So all those behaviors, we're using behavior analytics to determine, hey, this seems unusual or anomalous, so this should be investigated further. And we have some new capabilities coming out that will take more proactive action to stop and alert on that. But even today, we can alert on those excessive file activities, like a high number of reads or writes or deletes. Maybe somebody's doing something malicious.
Starting point is 00:10:18 But then also, we have to think about the insider threat. Just like in the bank, if the employee turns rogue or does something illicit, we want to know about that too. So we're also monitoring even a normal user that starts to do things anomalous. Maybe they're looking at stuff out of their normal work pattern at night, coming from a device they haven't used before. They're reading lots of data. It seems like they're doing a lot more higher level of read to write ratio. So all those types of things can be used to determine anomalous activity.
Starting point is 00:10:49 Also, what we also see in the insider threat world is the use of escalated or admin level privileges when they shouldn't be used. And we often see that in a lot of these type of both a malicious attacker trying to gain access and elevate privilege, as well as the insider threat. And so we can immediately alert on the use of an admin credential or privileged account to access data and investigate that. And that allows us to take the detection time average down closer to zero. Right now, the average detection time before an adversary or a breach is detected is about nine months. You need to get that way down. Yeah, you mentioned the word data firewall,
Starting point is 00:11:31 which I thought was a very interesting concept. My challenge is with, you know, firewalls, and I'm kind of a local network user kind of here, is that, you know, maintaining that and allowing, you know, new devices to come in and not, It takes some configuration and manipulation and some knowledge, actually, to maintain a firewall and a network and stuff like that. But in your case, it seems like all that's internal.
Starting point is 00:11:58 Is that true? It is internal, but there is the same kind of concept where there's a rules engine. So we have different assessors that are looking for different types of malicious and suspicious activity. And you configure those rules in a way that says, hey, if you see this activity, this is what I want you to do. I want you to alert only on this or I want you to can configure this. And when you deploy Brickstore the first time, you can put it into an observation only mode. And so you can look to see how the environment's behaving and how applications are accessing data, who's using administrative accounts, and then determine, hey, should this admin account be being used to do this? Maybe I want to restrict that activity so that I can say, yeah, we're going to use this admin account. It's only going to be allowed during this time period and on this data set. Or maybe you realize there's some bad hygiene
Starting point is 00:12:48 things going on and we want to clean up our hygiene because that'll help us reduce our overall risk if that account gets compromised. And so we create special accounts to do those jobs. And so it's not as complicated, I would say, as a traditional network firewall. We have the benefit of only having two protocols. And then it comes down to saying, okay, what accounts are going to be allowed to do this? What are the rules of, of the road for, you know, when this should be happening, who's going to be allowed to do that? And then kind of reviewing that. And then if you do determine somebody, you know, triggers a rule and it goes off, you also have a very easy workflow there to follow to say, Hey, I want to allow this and I want to make this a permanent rule to allow this behavior or a temporary rule. Maybe somebody's
Starting point is 00:13:28 going to be doing data migration and they need to use admin credentials to do that. We can make it very specific to say this particular admin and admin account can be used to move data from this client IP or IPs to this data set or data sets and then have it auto expire as well, you know, on Tuesday, so that it's not a permanent rule, and nobody has to remember to go back and remove that rule as well. So we've tried to make it easier. And that's based on our understanding as well as, you know, being in the business of providing, you know, network attached storage infrastructure for both small and large scale environments. Yeah, So I can see this as both an advantage and disadvantage. Let's kind of roll a real world scenario to you.
Starting point is 00:14:13 There's a, you know, a database account that typically has a write pattern or writing to read pattern and randomly once a year, not randomly, once a year, not randomly, once a year, the database, the DBA, runs a zero backup. Now, my policy is, in general, is that in our organization, you don't run zero backups. You let the backup software do what it's supposed to do. But I can see a scenario in which this would fill a rule, say, oh, this shouldn't be happening. This database account never writes this or reads this
Starting point is 00:14:53 much data. Something is going wrong. Block. So one, the advantage I see is, you know, as a the data protection person in my organization, I can enforce, that's an opportunity for me to enforce our policy, but as a, uh, function of, well, what happens if that legitimately a policy that we want to happen? You know, we've ran it in observation for six months, but this has never happened. Now it happens. What's the remediation? How do I now enable that account to do this business function that's allowed within our
Starting point is 00:15:32 policy? That's very unusual, right? It's not as periodic as like a monthly thing or something. Yeah, it's just, it would be, some DBAs says, you know how DBAs are. None of us are DBAs. We get to pick on them on this podcast. You know, they're like, ah, you know, I trust the backup software, but I only trust it so much. I'm going to have my own immutable copy. Yep. Yeah. And actually that would be even valuable for some of the cyber vault features we have coming out where you can have a vaulted immutable copy of something like that backup. And so what does that mean? So basically, you know, there's going to be times where a rule gets flagged and it shouldn't have, or it's not the normal behavior, but it's expected or a one-time thing. And so there's a couple of ways to handle that. One case, if you were actually blocking that and it got blocked because it fired, it triggered the rule and nobody was kind of prepared to do that, then unblock. It's very
Starting point is 00:16:30 easy through the incident management workflow. It'll send a web hook and quickly alert everybody. Hey, this user has been blocked. This host IP, if so, has been blocked. And this is why this was the rule that was triggered. And this is the activity that happened around that. And you'd quickly see, oh, they ran a backup and it was a lot more data. So I can see that. And so you could quickly unblock them and that can be done in under 30 seconds. The other thing though, is as you educate people about how the system works and that there is a system in place that's monitoring and evaluating this is that you can, as part of the change control board, right? Hopefully when you're doing these backups or doing these things, there's some sort of configuration management
Starting point is 00:17:08 and people realize, oh, I'm going to do something that's going to be out of the norm. You can actually go in ahead of time into that data firewall and say, hey, I'm going to do work. I'm going to take a maintenance window during this period of time. So I want to put a rule in that says I'm going to allow data to be written during this period of time. And I expect it to be higher than normal and basically put a whitelist type of rule in there and have it expire after that work time is over. So if it's a four hour maintenance window, it would only be valid for those four hours. And then it would go back to the normal monitoring and enforcement. Oh, that's nice. That's nice. So you could actually create if you knew it was going to come into existence, you can create a new rule that could be temporarily active for a duration tied to a particular user, tied to an IP, et cetera, et cetera.
Starting point is 00:17:54 Exactly. And tie that to a ticket or change control process approval and things like that. So you have that whole audit history and control. Right, right. That's nice. That's nice. You mentioned that the alert goes out with a webhook that you could potentially use to re-enable the access, I guess, because it's blocked at that point. So it's just a web URL link kind of thing that allows you to click on it once and fill in some information and let it rip? You'd be allowed to do that if you're an authorized user to control the security stuff. Your security or the SOC could see that and say, okay, I'm going to allow that. After I've investigated it, it's quick to enable a user to unblock an IP or unblock an account.
Starting point is 00:18:38 And the information in the security alert would identify the IP, identify the rule that's being violated, identify potentially the user. I mean, it would give enough information to supply a cognizant individual in authority to say yay or nay, I guess. That's correct. And we're constantly listening to our customers about the workflows and the situation, seeing what happens and then making that workflow smoother for everybody. And, you know, as the world goes on, it's everybody's kind of responsibility has a security responsibility within it. So, you know, as a storage administrator or infrastructure manager, you know, you're more
Starting point is 00:19:19 and more of that security role is becoming part of everyday aspect of your responsibility. Right. So how is this machine learning and the definitions associated with that updated? Yeah. So we have assessors that are built in or are loaded into the product, right? So they can be updated at any time. So you don't have to update the whole operating system or anything, but similar to probably how people are used to with, you know, virus definitions or security updates in that way, we have the ability to upload new assessor definitions and new assessors.
Starting point is 00:20:01 And so once those are downloaded from our MyRacktop portal, they then start enforcing in that aspect. And you can choose when you apply those updates as a user. And then for entities that aren't connected to the internet, they can download those assessors through a secure supply chain format to their environment and then load them into the brick store and then get the advantage of those new capabilities. And so there isn't like a scheduled period for when we're releasing new assessors, but it's based on obviously new things that come out in the wild, new threats that we see or new capabilities we feel are worthwhile to update and get those out to our customers as
Starting point is 00:20:41 soon as possible. Right, right, right, right. And then we can't talk about security without talking about the performance overhead associated with that. What's the performance impact of doing this real-time analysis? Yeah, so we've designed the system from the beginning to have security and compliance enabled while operating. So we've scaled and designed the systems to have the security running while you're doing that. So if they were replacing, let's say a legacy NAS solution or file solution with Brickstore, there should be no impact to the user experience or application performance, right? We're able to deliver the latency,
Starting point is 00:21:23 the IOPS and the throughput, but also include that security compliance. And we do that because we're leveraging a modern architecture and we take advantage of, you know, RAM as well as the multiple cores within the latest x86 processors from Intel and AMD. So we're delivering that highly parallelized IO workflow with low latency. And we can back, you know, for the backend capacity, we can use a SAN from another, you know, enterprise OEM. We can use a hybrid storage pool, which leverages some flash and spinning hard drives, or we can use an all flash pool for the, you know, lowest latency, high throughput, consistent consistency. All right. Well, Keith, is there any last questions for Jonathan?
Starting point is 00:22:05 No, I'm always interested to know how customers, I guess there is one last question. How are customers using this in production? Has there been any kind of use cases that popped up that had you folks kind of scratching your head and thinking, oh, I didn't think that someone would use this security feature in this type of workflow. So I would say I always want people to use it in every workflow. So I wouldn't say I'm surprised, right? The whole goal is to protect your data wherever it is. But I think we, I would say there's a wide variety of workflows and use cases and a wide variety of verticals, right? From manufacturing, right? We
Starting point is 00:22:45 talked about that where you're interacting with IoT devices, robots that are dealing with a lot of sensitive data and intellectual property as they build systems and manufacture components, et cetera. You have healthcare where you have, you know, critical life-saving type operations that have to operate 24 seven and protect data against theft and ransomware and destruction. And then you have, you know, the traditional corporate use cases with, you know, file data and sensitive information, they don't want to get out to their competitors or to the public. So it's a pretty wide range. And it's enabled by, you know, having all flash solutions or hybrid solutions or large archives.
Starting point is 00:23:28 But as we know, data has become super valuable. And so as people start to recognize that, and that's really what the bad guys are after, people are looking for ways to protect it. All right. Hey, Jonathan, is there anything you'd like to say to our listening audience before we close? No, it's been a pleasure. I always enjoy my time on here in the conversation. All right. Well, this has been great, Jonathan. Thanks again for being on our show today.
Starting point is 00:23:49 Thanks for having me. And thanks to Racktop Systems for sponsoring this podcast. That's it for now. Bye, Keith. Bye, Jonathan. Until next time. Next time, we will talk to the most system storage technology person. Any questions you want us to ask, please let us know. And if you enjoy our podcast, tell your friends about it. Please review us on Apple Podcasts, Google Play, and Spotify, as this will help get the word out. Thank you.
