Grey Beards on Systems - 150: GreyBeard talks Zero Trust with Jonathan Halstuch, Co-founder & CTO, RackTop Systems

Episode Date: June 22, 2023

Sponsored By: This is another in our series of sponsored podcasts with Jonathan Halstuch (@JAHGT), Co-Founder and CTO of RackTop Systems. You can hear more in Episode #147 on RansomWare protection and... Episode #145 on proactive NAS security. Zero Trust Architecture (ZTA) has been touted as the next level of security for a while now. …

Transcript
Starting point is 00:00:01 Hey everybody, Ray Lucchesi here. Welcome to another sponsored episode of the Greybeards on Storage podcast, a show where we get Greybeards bloggers together with storage and system vendors to discuss upcoming products, technologies, and trends affecting the data center today. This Greybeards on Storage episode is again brought to you by RackTop Systems. And now it is my great pleasure to once again introduce Jonathan Halstuch, co-founder and CTO of RackTop Systems. So, Jonathan, why don't you tell us a little bit about yourself and what Zero Trust Security means for storage? Hey, Ray, thanks for having me on the show. Excited to talk about that today.
Starting point is 00:00:48 So, yeah, my background's in defense and intelligence and protecting data and also understanding how adversaries and nation states go after data for their own purposes. And really, it's all about collecting data, right? When we think about cybersecurity, sometimes people think about the network first and endpoints, but really, the bad guys are after your data. So what we're thinking about is putting the protections as close to the data as possible where the data lives. And if you think about your largest asset, it's that unstructured data that typically makes up 80 to 90% of an enterprise environment. And that's the stuff that lives on a NAS or a file share. And so we wanted to take a zero trust approach to that security, right?
Starting point is 00:01:31 Everybody's talking about zero trust and what that means. And really, if you break down zero trust, it moves away from that kind of implicit trust evaluation, kind of one time saying, yeah, I'm going to give you permission to read and write to this data, and then you always have permission, to a more dynamic evaluation of trust where you're basically evaluating trust for each transaction to an enterprise resource. And in this case, we're talking about that biggest enterprise resource, your unstructured data. So in real time, after a user is given read/write permissions to a folder or read permissions to a folder,
Starting point is 00:02:03 we're then evaluating, hey, do we want that file operation to happen in real time? And if it seems nefarious or suspicious or malicious, we can alert on it or block it. And we're doing it where the data lives so that you don't have to require endpoints to be deployed to monitor this or agents. You can actually do this right from the NAS or the file share itself. So my understanding of zero trust architecture is it's mutual authentication happening almost all the time during transactions that go on and stuff like that. The challenge with storage is, you know, there's lots of transactions that go between a client and the storage server. So how does one actually implement something like zero trust architecture in a
Starting point is 00:02:46 storage environment? Yeah. So from that aspect, a lot of that capability is built into the protocol and we're not changing how that works, right? So you have different protocols from different eras. So NFS v3 is a stateless protocol. And so every transaction is going through that log on type of event. You have newer protocols like NFS 4.0 and later, as well as all the SMB protocols popular around Windows that are stateful. And with those, you get the concept of a log on and authentication, and then a session that you're logged into where you have that session for a period of time. And different policies will dictate how long that session can go on for before you need to reauthenticate to create a new session. And both the client could log off and end the session or
Starting point is 00:03:34 the server could say, Hey, it's been too long. We need you to reauthenticate and start a new session. And so that's kind of the base of the protocol. So where the real difference is, in the real zero trust approach that we bring to the table, though, is what happens after you authenticate, right? So somehow either that credential was gotten, you know, it's the right, it's the actual user and they're using their credentials, or maybe the credential has been compromised and it's a bad actor that's using that credential. And then it's looking at what are they doing to access the data on the storage and then making that zero trust evaluation for those operations, right? The read of the file, the modify, the write, the delete, that type of stuff.
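[Editor's sketch] The idea described above, where a session is already authenticated but each file operation is still evaluated, could look roughly like the following. This is a minimal illustration only, not RackTop's implementation: the `FileOp` fields, the `acl` layout, the `suspicion` score, and the 0.5 / 0.9 thresholds are all assumptions made up for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    ALERT = "alert"
    BLOCK = "block"


@dataclass(frozen=True)
class FileOp:
    user: str
    client_ip: str
    action: str  # "read", "write", "modify", or "delete"
    path: str


def has_permission(op, acl):
    """Static check: was this user granted this action on the parent folder?"""
    folder = op.path.rsplit("/", 1)[0]
    return op.action in acl.get((op.user, folder), set())


def evaluate(op, acl, suspicion):
    """Evaluate one operation inside an already authenticated session.

    `suspicion` is a hypothetical 0.0-1.0 score from some behavior analysis;
    the thresholds below are illustrative, not taken from the episode.
    """
    if not has_permission(op, acl):
        return Verdict.BLOCK  # the traditional permission check still applies
    if suspicion >= 0.9:
        return Verdict.BLOCK  # permitted on paper, but looks malicious
    if suspicion >= 0.5:
        return Verdict.ALERT  # let it through, but notify security
    return Verdict.ALLOW
```

The point of the sketch is that `has_permission` alone is the traditional model; the zero trust step is the second look at every read, write, modify, or delete.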
Starting point is 00:04:16 So that logon stuff's kind of handled at the protocol layer. We don't really have to do anything to be compliant with zero trust from that aspect, but then it's that user entity behavior analytics on top of that afterwards. So this is based on trying to understand how bad actors would differ from a normal actor looking at a file or something like that. Is that how this plays out? Yeah, essentially. Because if you think about it, a bad actor, if they steal a login credential and have access to get to that file share somehow, or they co-opt even a multi-factor authentication login, once they're in and they have the permissions, right, you've logged in now, even things like encryption tend to be tied to credentials and things. So they're going to be able to decrypt data and things like that. So then we have to monitor the behaviors of those users to see if they're doing something bad. And it was another thing that was brought up, something like, I think it was policy-based access controls. Is that something you guys have support for? I mean,
Starting point is 00:05:14 different sorts of identification or authentication capabilities based on types of data, I guess. I'm not sure quite how it all plays out. Yeah, so policy-based access controls and attribute-based access controls kind of tie together. And that is something we are supporting among different organizations. One of the things that was started really by the government, specifically the DOD,
Starting point is 00:05:39 but has become more kind of relevant in the mainstream is basically having policies about who can access a particular type of data with an attribute. And maybe that attribute is PII data or HIPAA data. So it would be the ability to access that type of data as well as from what either system or machine or location and tying all that together to determine kind of a policy approach beyond just our traditional kind of role-based access controls. And so that gets more involved and can be valuable even in the private sector. So in the government, they do it and they call it multi-level security where you can have unclassified data, secret data, top secret data.
Starting point is 00:06:26 When you're on a top secret machine and top secret network, you can browse down and read secret data and unclassified data and write to the top secret level. If you're on a secret level machine and network, you can only see secret data and below. You wouldn't be able to see the top secret data. So all that policy engine kind of gets tied in and rolled in. And so it's typically happening outside the storage, but the storage is questioning the policy engine. Do I allow this access to this particular file or not? Back to kind of, is it chatty or does it require more IO? It does, but it's definitely doable and can be architected with caching and sessions to still be performant and even be used in high-performance type storage applications. Yeah.
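[Editor's sketch] The multi-level security rule described above (read at your level and below, write at your own level) and the idea of caching answers from an external policy engine can be illustrated as follows. The level names come from the conversation; the `CachedPolicyClient` class, its `query_engine` callable, and the 30-second TTL are hypothetical and only show one way the chattiness could be reduced.

```python
import time

LEVELS = {"unclassified": 0, "secret": 1, "top_secret": 2}


def can_read(session_level, data_level):
    """Read down: a session may read data at its own level or below."""
    return LEVELS[data_level] <= LEVELS[session_level]


def can_write(session_level, data_level):
    """As described in the episode, a session writes at its own level."""
    return LEVELS[data_level] == LEVELS[session_level]


class CachedPolicyClient:
    """Caches decisions from an external policy engine for a short time."""

    def __init__(self, query_engine, ttl_seconds=30.0, clock=time.monotonic):
        # query_engine is a hypothetical callable:
        # (user, session_level, data_level) -> bool
        self._query_engine = query_engine
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # key -> (decision, time the decision was fetched)

    def is_allowed(self, user, session_level, data_level):
        key = (user, session_level, data_level)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]  # fresh cached answer, no round trip
        decision = self._query_engine(user, session_level, data_level)
        self._cache[key] = (decision, now)
        return decision
```

A short TTL is a trade-off: a longer one means fewer round trips to the policy engine, but a revoked permission takes longer to become effective.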
Starting point is 00:07:11 So, I mean, there's a, you know, I do SSH kind of calls quite often to servers and stuff like that. And every once in a while it comes back with, you know, a hash has changed or something like that. And I kind of typically ignore those sorts of things, but you can't ignore these sorts of things in a real secure environment. Right. You're saying like the fingerprints change that somebody changed the servers or something like that. Yeah. And I think we all know kind of security is all about layers and defense in depth.
Starting point is 00:07:39 And so, you know, as you get into more secure environments or places with more hygiene and rigor, you know, you get into the fact, to your point, of like, it's not just self-generated certificates or self-signed certificates. You're using certificates from a certificate authority. You have potentially, you know, centralized key management and other things to ensure all of the keys and the certificates are being managed. They're being rotated. They have, you know, expirations after a certain amount of time. You're revoking certificates. So I think it's, it becomes a whole ecosystem, right? As you start to get into these more advanced security models,
Starting point is 00:08:13 you can't just do one thing. You need to do a complement of things and do it in a way that it can be managed and monitored. So, so from a zero trust perspective, the types of things that look different are the sort of policy-based, attribute-based access controls, and that sort of thing, a little bit more stringent credential authentication kinds of things. Is that what it means from a storage perspective? From a storage perspective, yeah, it could mean leveraging attributes on the file. So not only are we being rigorous about what device and network you're coming from, we're restricting where that can happen. Then we're looking up the policy about you as an entity and your access to these specific files that might contain different categories or classifications of information like PII sensitive information or financial information, and then using those
Starting point is 00:09:10 attributes and rules to determine if you're going to allow that access to happen. And then that zero trust approach, it kind of fits into that, right? Because with zero trust, it's really evaluating trust for each transaction. So in the case of just the true ABAC, it's just doing it to say, hey, do you have access and are you authorized to do this? I think the zero trust approach takes it up another level and says, do you have access to do this? Can you see this file? But then does this seem normal or does this seem suspicious? It's that constant evaluation, remediation, or not remediation, but constant
Starting point is 00:09:45 mediation to say, yeah, this person has access. This seems like it really is the legitimate person and they do have access to this file. So we're going to allow the read to happen. But in the zero trust approach, in the context of everything else, you might say, well, this seems suspicious because it's not a normal time of day for this person to be accessing this data or they're accessing an excessive amount of data. So maybe we want to alert the security operations center to investigate this and observe what's going on. Or even further, if we're more concerned, maybe we want to block that activity until it can be investigated.
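[Editor's sketch] The contextual checks mentioned above (an unusual time of day, an excessive amount of data) and the graduated response (alert, or block until investigated) can be sketched as below. The `Baseline` fields, the `excess_factor` of 10, and the rule that two signals together trigger a block are assumptions for illustration, not something stated in the episode.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    usual_hours: range           # hours of the day (0-23) this user normally works
    typical_files_per_hour: int  # rough volume this user normally touches


def assess(hour, files_this_hour, baseline, excess_factor=10):
    """Return (action, reasons) for an access the user is permitted to make."""
    reasons = []
    if hour not in baseline.usual_hours:
        reasons.append("unusual time of day")
    if files_this_hour > excess_factor * baseline.typical_files_per_hour:
        reasons.append("excessive file access")
    if len(reasons) == 2:
        return "block", reasons  # several signals at once: stop until investigated
    if reasons:
        return "alert", reasons  # one signal: let the SOC take a look
    return "allow", reasons
```

For example, with `Baseline(range(8, 18), 40)`, a read at 10:00 at normal volume is allowed, the same read at 02:00 raises an alert, and thousands of files at 02:00 is blocked.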
Starting point is 00:10:19 So from a zero trust perspective, it's more than just logins and credentials and things of that nature. You're now starting to get into modus operandi. How are people or how are applications actually referencing the data? Is this something that's normal or abnormal? And then what should we do about it? Exactly right. Yeah. It's that kind of what's normal, what's not normal and observing that and learning that and then responding to that. And, and I think that's, that's the big, that's the real big difference.
Starting point is 00:10:49 And that's the zero trust approach to things. And that's not traditionally what we've seen in network attached storage. Traditionally, what you're seeing is somebody gets read/write privileges to a folder. They have privileges to do anything they want to read and write files in that folder until somebody goes in and removes their credentials at some point. Right, right, right. So how does a storage system learn what's normal and what's abnormal in a storage environment? Is this something that's done offline or is this something that's done in real time at the system, at the customer's environment or both? So it's a little bit of both. So some types of behaviors are kind of, can be determined right away and can be done without really, it can be, the models can be trained and learned ahead of time and then deployed to the system. So we can also look for things like the use of admin credentials. We know right away if, you know, an admin credential's being used, which is a potential threat that, hey, it's an admin credential just because of the way we interact with AD or LDAP, we know that's a privileged user. So we can alert on that and take actions based on that.
Starting point is 00:11:47 We also know when you're starting to overwrite files, that's a behavior that's different than what traditional applications and users do. When you get into kind of long-term analysis and trends for user types and groups, that's where we kind of have to learn per customer. So you'd learn what those trends are in the customer environment in a learning mode, and then go into an enforcement
Starting point is 00:12:10 mode or past observation mode where you essentially say, okay, when we start to see these anomalies, we're going to take actions. And as a customer organization, you can define what you want those actions to be. It could just be provide an alert or it could be you want to block the access of the user account or client IP to further data until it can be investigated. It's almost like learning, training an AI system
Starting point is 00:12:34 and then using it to perform inferencing and stuff like that. Exactly. And so there's actually a learning mode as well as an enforcement mode in the storage system. Exactly. Yep. And everyone's starting to get used to the concept of using AI and machine learning in order to aid or assist us.
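[Editor's sketch] The learning mode and enforcement mode described above can be pictured as one monitor that first records what is normal per user and only later acts on deviations. This is a toy model under stated assumptions: the `ActivityMonitor` class, the mean-plus-three-standard-deviations limit, and the choice to alert on users with no baseline are all hypothetical and far simpler than a real machine learning pipeline.

```python
import statistics
from collections import defaultdict


class ActivityMonitor:
    def __init__(self, action="alert", sigmas=3.0):
        self.mode = "learning"
        self.action = action  # what the customer chose: "alert" or "block"
        self.sigmas = sigmas
        self._history = defaultdict(list)  # user -> observed file counts

    def enforce(self):
        """Switch from learning (observe only) to enforcement."""
        self.mode = "enforcement"

    def observe(self, user, files_touched):
        if self.mode == "learning":
            self._history[user].append(files_touched)
            return "allow"  # never interfere while still learning
        samples = self._history.get(user, [])
        if len(samples) < 2:
            return "alert"  # assumption: no baseline means ask a human
        limit = statistics.mean(samples) + self.sigmas * statistics.stdev(samples)
        return self.action if files_touched > limit else "allow"
```

The customer-chosen `action` mirrors the point made in the conversation that the organization decides whether an anomaly produces only an alert or also a block.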
Starting point is 00:12:55 I definitely don't think it's replacing us, but obviously you see that it's been able to be a big tool that we can use. And we see AI ops growing within the IT sector. And so we're leveraging that the same way in our product to help, you know, storage admins and security admins alike, uh, with, with the storage. From a, from a zero trust perspective, does it introduce any, um, I was going to say, I'm not sure if the RackTop system has a separate storage JBOD or if it's integrated into the appliance or I think in some cases it's software only, so it must be different to some extent. Are there some sort of protocols between the storage server and the JBOD in this case? Yeah, so it depends on the architecture. So the software we deploy, whether you deploy it as a virtual machine or as a physical instance with bare metal, is the same.
Starting point is 00:13:48 But the physical capacity that we use to store the data can vary depending on the deployment, right? So in a virtual machine, it could be an iSCSI LUN being presented to it. We could be attaching elastic block storage in the case of an AWS instance, or on an on-prem deployment, we might have a bare metal appliance with direct attached disk where we're talking over SAS, or what we see to be even more common these days is where we're working with the leading OEM SAN providers and taking iSCSI or Fibre Channel LUNs from them, presenting those to our controllers or our software, and then laying a file system down on those LUNs, and then storing the data
Starting point is 00:14:31 on those LUNs provided by the SAN. And we can leverage, you know, one SAN or many SANs to the same set of controllers. Multiple controllers gives you high availability and can also scale out performance that way too, as you start to scale up in the need for, you know, either bandwidth or IOPS, et cetera. Those cases on the iSCSI protocols would be, you know, the dominant solution to provide logins and stuff like that. Is that what you're saying? iSCSI between us and the capacity, but the users and the data is going to be served over SMB or NFS, right? Yeah.
Starting point is 00:15:09 Exactly. I got you. It can be flash or hybrid, right? Whatever they want, right? We have the ability to use hybrid architectures with a little bit of flash and spinning hard drives that give you the economics
Starting point is 00:15:23 of the hard drives and capacity, as well as the performance of the flash. Or for those really demanding workloads, we can do an all flash tier. So we give you that flexibility. And then all of our solutions leverage RAM as a primary caching tier to deliver that low latency and high IOPS. But it also helps with being able to perform all these security operations as users are reading and writing data to the system and providing that zero trust evaluation. And this sort of this learning and enforcement modes, those really depend on not just, you know, it could be an application, it could be the user, it could be, you know, it's very specific to a specific IO activity, I guess. Is that how I read this? I mean, because I mean, different things are going to be bad for others.
Starting point is 00:16:12 Right. Like applications are going to have certain ways of doing things. You know, they might work on specific types of files and file extensions. The way they handle those files is going to be one way. The way a typical user opens files, reads files, modifies files is going to be typically different. And then what we're going to be looking for is those long-term trends as well as those users are members of groups. We kind of know that through group membership and Active Directory, where they're coming from. We know the machines and all that stuff. So we're able to tie those pieces together, which is why things like machine learning and trend analysis are very helpful because there's multiple factors we can leverage to make those decisions. And that's kind
Starting point is 00:16:53 of the value of tying all those pieces together and understanding that. And also understanding not everything's the same, right, to your point. So it's not like you treat the whole system identically, right? It could be this data set and this type of data has this type of trend and pattern. This user and these groups have these trends and patterns and they work on these data sets. So there's lots of things you can do to segment this and isolate it as well and to simplify the problem. And some of the things that we do as part of our real-time analysis, you know, don't really require training and learning.
Starting point is 00:17:29 Out of the box, the customer gets that benefit, you know, to detect things like ransomware, privileged access abuse, and excessive file activity. And those would be standard regardless of the user environment and that sort of stuff. I mean, those are things that you've learned offline that you would deploy automatically as part of the system. Exactly. Yep. And we have assessors, for instance, for specific types of ransomware, as well as generic assessors looking for a zero-day type ransomware attack. The difference being that on the specific assessors,
Starting point is 00:18:00 we can build up confidence much more quickly to then say, hey, we think it's this specific variant of ransomware. And these are the affected, you know, user account and machine. Versus generic, it might take us a little bit longer to build up confidence that it is a ransomware attack, but we will still stop that attack as well and make it easy for you to remediate and recover from that as well. And you mentioned that the actions that are actually taken are something that is almost configurable by the customer. Is that true as well? That is true. So out of the box, there's a default configuration that we provide that we recommend. But you can think of it like a data firewall where there's a rules engine built into the user interface that has the
Starting point is 00:18:41 default settings, but then an organization can go in and change those settings as well as add new rules. And also as they find things in their environment, like they might be doing something that requires the use of a privileged account, and they're going to be restricting access to a particular data set. And maybe it's a long-time rule, or maybe it's a temporary rule that they want to allow where they're doing something like a data migration. And the system allows you to put in a permanent rule or a temporary rule and then choose the actions you want to happen with that rule. Ignore the behavior, alert on it, or block it, or do all three. It seems fairly sophisticated and challenging to correctly set up.
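[Editor's sketch] The "data firewall" rules engine described above, with default rules, customer rules, permanent or temporary lifetimes, and a chosen action, can be sketched like this. The `Rule` fields, the behavior names, and the first-match-wins ordering are assumptions for illustration; the episode does not describe how the real engine orders or stores rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Rule:
    behavior: str                       # e.g. "privileged_access" (hypothetical name)
    dataset: Optional[str]              # None matches any data set
    action: str                         # "ignore", "alert", or "block"
    expires: Optional[datetime] = None  # None means a permanent rule

    def matches(self, behavior, dataset, now):
        if self.expires is not None and now >= self.expires:
            return False  # a temporary rule that has lapsed
        return self.behavior == behavior and self.dataset in (None, dataset)


def decide(rules, behavior, dataset, now, default="alert"):
    """First matching rule wins, so customer rules are listed before defaults."""
    for rule in rules:
        if rule.matches(behavior, dataset, now):
            return rule.action
    return default


# Usage: allow privileged access on one share for a week-long data migration,
# while the default rule keeps blocking it everywhere else.
now = datetime(2023, 6, 22, 12, 0)
rules = [
    Rule("privileged_access", "/share/migration", "ignore",
         expires=now + timedelta(days=7)),
    Rule("privileged_access", None, "block"),
]
```

Once the temporary rule expires, the same privileged access on `/share/migration` falls through to the default block rule without anyone having to remember to remove the exception.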
Starting point is 00:19:29 Are there some tools that you have in place to make this, I mean, obviously the defaults and things of that nature, but to make this easier for the customers to deploy it in a secure fashion? Yeah. So I think having observed some of the sales calls, I think people get a little concerned, especially when it's been, you know, because a lot of times it's the storage admin or the infrastructure operations team that's going to be purchasing and managing the BrickStor. Sometimes they're a little concerned, hey, is this going to be a big time requirement for me? Is this going to be very challenging or difficult? But we've been successfully able to show them that it actually is not that difficult to configure.
Starting point is 00:20:03 It's kind of easy. And then once you configure it, it's not like some of the other file management tools that have lots of false positives because of the way we do it. So you're not going to be barraged with that. But we also make it very easy in that observation mode to see what's going on. So in the observation or learning mode, you're seeing, hey, here's a case where a privileged account is being used. It's accessing data here and it's doing this. Normally, that would fire off an incident and put an enforcement action in potentially. With this, you can right from there create something like an ignore rule where you choose what you want to happen right from there. Yeah, that would be great from a perspective. So it's like the customer actually gets to determine whether it's an observation mode or enforcement mode for how long.
Starting point is 00:20:50 And then once it's in enforcement mode, all those things start to really trigger actions and things of that nature. That's great. And what we've seen, too, with a lot of customers is they get visibility they haven't had before. So they can see, hey, I kind of remember we did that and we meant to go back and clean that up and we haven't. So sometimes they'll go and make those changes and improve the hygiene right then. Other times, that's just the way it's going to be
Starting point is 00:21:14 or it's going to be for a while. So they'll just allow that behavior to happen. This is great. All right. Well, Jonathan, is there anything you'd like to say to our listening audience before we close? I appreciate it. It was a great opportunity to talk about Zero Trust and Storage.
Starting point is 00:21:28 We'll be at HPE Discover later in June. And if you're there, please stop by our booth. Great. Great. Well, this has been great, Jonathan. Thanks again for being on our show today. And thanks again to RackTop Systems for sponsoring this podcast. Thank you.
Starting point is 00:21:43 That's it for now. Bye, Jonathan. Bye, Ray. Until next time. Next time, we will talk to another system storage technology person. Any questions you want us to ask, please let us know.
Starting point is 00:21:55 And if you enjoy our podcast, tell your friends about it. Please review us on Apple Podcasts, Google Play, and Spotify, as this will help get the word out.
