Grey Beards on Systems - 88: A GreyBeard talks DataPlatform with Jon Hildebrand, Principal Technologist, Cohesity at VMworld 2019
Episode Date: August 30, 2019. Sponsored by: This is another sponsored GreyBeards on Storage podcast and it was recorded at VMworld 2019. I talked with Jon Hildebrand (@snoopJ123), Principal Technologist at Cohesity. Jon's been a long time friend from Tech Field Day days and has been working with Cohesity for ~14 months now. For such a short time, Jon's seen a lot.
Transcript
Hey everybody, Ray Lucchesi here.
Welcome to another sponsored episode of the Greybeards on Storage podcast.
This Greybeards on Storage podcast is brought to you today by Cohesity and is being recorded
at VMworld in August 2019.
We have with us here today Jon Hildebrand, Principal Technologist at Cohesity.
Jon, please tell us what's new at Cohesity and what's of interest for you at VMworld.
Thanks a lot for having us, Ray.
So, a lot of things happening over there at Cohesity.
I know the last time we had
conversations it was roughly around February, so many, many new things have
started to crop up, especially in Cohesity land. Recently we just had a
major release, even though the version number doesn't seem to indicate it; a
pretty major release here in the last month. Another expansion is some applications on the application framework for our
system. The two of them being a runbook automation capability: taking
that backup of, say, a VMware VM and being able to utilize it in a
runbook fashion, spinning that workload up as an EC2
instance, as an example, inside of AWS.
So, something like a sandbox solution, or something like that,
to be able to bring it up as a sandbox test environment?
It can be. The point of development is to eventually get that to be more of a DR solution at the same time.
But with version ones that tend to come out, sometimes the functionality is a little behind.
I know from my perspective there are parts of it where I wish we could have some better integration.
But that will come in time. Yeah. If you think about it, DR, moving the data,
is just as important as making sure that the plumbing is correct on the other end.
So still working through some of those components.
At least at the moment we're proving that the data can be moved over.
And right now we're providing a scripting engine to kind of handle some of those components.
But over the course of time, hopefully those become native components
that are built into the canvas, to be able to create the workflow in question.
And then on top of that, our application CyberScan.
So what this application does is utilize the backup copies that we create on our device;
within the application framework, we create a space for that virtual machine
to be brought up in.
And from there, we're subjecting it
with our partners at Tenable
to be able to run security and vulnerability scans
against those copies.
So basically-
The application and the data?
Or, oh really?
Yeah, so the idea with our partners at Tenable,
we have access to their, essentially their threat matrix, along with all the CVE articles that are out there.
So the idea is that not only can we scan the operating system layer, but also the applications installed within, to give folks a better idea, especially with point-in-time recoveries.
If you're able to associate that, think like in a ransomware recovery.
The idea being that maybe you can identify, especially since ransomware is getting crafty,
we'll put it that way.
In the past, as soon as you got it, it would definitely start doing its encryption thing.
Now, think of it: ransomware also kind of doubles as a bit of a Trojan horse to some degree as well.
So it sits around for a while and doesn't do anything
until it's nefarious throughout your whole environment?
Yeah, it's basically doing all the discovery to find all your weak points,
what's already in there, and it knows exactly its attack vector
that it's going to take in your enterprise.
So the idea there is with the scans of the backup data,
if it's been sitting latent,
you might be able to discover it as sitting latent before it has a chance to activate.
So maybe your recovery,
you now know which of your backup copies
is going to be the one that obviously doesn't have
the ransomware sitting on it,
but also gives you an idea of, in fact,
when you bring that back up,
what you specifically need to patch to ensure that that doesn't happen again.
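The recovery-point selection being described, scan each retained backup copy and pick the newest one whose scan came back with no malicious findings, can be sketched roughly as follows. The `Snapshot` shape and the CVE-style findings here are illustrative stand-ins, not Cohesity's or Tenable's actual APIs:

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Optional

@dataclass
class Snapshot:
    taken: date
    findings: List[str]  # scanner findings (e.g. CVE IDs) reported for this copy

def last_clean_snapshot(snapshots: List[Snapshot],
                        is_malicious: Callable[[str], bool]) -> Optional[Snapshot]:
    """Walk the backup copies newest-to-oldest and return the first one
    whose scan reported no malicious findings."""
    for snap in sorted(snapshots, key=lambda s: s.taken, reverse=True):
        if not any(is_malicious(f) for f in snap.findings):
            return snap
    return None  # every retained copy is compromised

# Pretend the two newest copies carry a latent dropper flagged by a CVE ID.
snaps = [
    Snapshot(date(2019, 8, 25), []),
    Snapshot(date(2019, 8, 26), ["CVE-2019-0708"]),
    Snapshot(date(2019, 8, 27), ["CVE-2019-0708"]),
]
clean = last_clean_snapshot(snaps, is_malicious=lambda f: f.startswith("CVE-"))
print(clean.taken)  # → 2019-08-25
```

The same walk also yields the patching hint Jon mentions: the findings on the newest dirty copy tell you what to remediate after you restore the clean one.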
Well, that's very nice.
So, yeah, we're utilizing, again, the application framework; as with most systems,
we're not constantly backing up data 24/7, so let's throw some of that compute power at some of these other use cases. And
the marketing tagline is, derive more from your data. So here, definitely helping from a security
standpoint to be able to, again, I mean, within the last week or two, how many ransomware public
notices have we heard? I think every county department in Texas has
been hit by ransomware, or something like that. So yeah, it's a very important
topic, not just for the largest of enterprises; even small
companies are getting hit big time. It's never about someone
specifically targeting an individual enterprise in some cases. I mean, some of these attacks are literally spray and pray,
hoping they can find something; so the more they have out there, the more
they can ask for. I wanted to mention, I tagged this thing called data is never
at rest, or something like that. So you're taking that data, even though it might be
backed up and sitting there pretty much not doing anything, and you can take that
data plus the applications and actually do stuff with it.
That's great news.
So again, basically trying to bring forth the power
of our application extensibility.
Especially from a Cohesity standpoint,
the backup portion's always been out there.
That's been one of the first use cases.
When we were doing a Storage Field Day back in February, we tried to do more about
the NAS side of the equation, and then the application portion brings that
next level into it. As you know, the first two are kind of the gathering of
the data, and the application component is more of a way to, again, try to extract
the value out of that data.
Bring something to the forefront based off that latent data.
Right, right.
Well, that's very interesting.
And that's a new feature coming out
in the last version of it?
Yeah, so a 6.4 we just announced.
It is GA, so it is out there and available for customers.
Although our internal data says that most of our customers like to stay on the...
It takes a while for them to migrate over to a new version and stuff.
LTS versions are very popular for a reason, because they provide the most stability.
So 6.4 being the newest, folks are definitely interested in that.
I've had, especially in the booth, plenty of conversations around the
cyber scan side of the equation.
And so it's just one of those that
definitely folks are understanding
what those things are bringing to the table now.
And, oh yeah, the interest is huge.
So again, not just backup, not just NAS,
but now some of these ways to kind of crack that onion open.
Right, right.
You mentioned LTS, so big releases like 6.0 or 7.0,
those would be LTS releases?
What do you call them?
Well, it's one of those sorts of things; it's very interesting the way that
our engineering department sometimes approaches this.
I've always seen it that a major release usually means a new number, and you start from there.
So it's kind of odd from my perspective to be talking about our brand-new major release, 6.4.
What it's showing, though, is the fact that our engineering department
is definitely pushing out a lot of new features in a much more rapid fashion.
How often do you guys release major releases or versions of the system and stuff like that?
Well, just to give you an example, so I started at Cohesity about 14 months ago,
and version 6.0 came out about two, not two weeks, two months after I started.
So within those 12 months, roughly once a quarter,
there's basically been a major release.
The idea is to try to accelerate also those LTS releases to the newer packages as they come along.
But 6.4, I'm personally, I call it a very audacious release.
We saw something on the order of hundreds of new or updated features that were added to it.
And again, over the course of time,
we're going to expect folks will adopt that
and things like the runbook and the cyber scan component,
hopefully pushing that forward.
I used to work with the disk technology guys,
and they would take years to move up to a new release,
stuff like that; very conservative stuff.
Application, backup, data protection,
higher levels of storage, maybe a lot quicker
and stuff like that.
Nowadays too, they're much more used to seeing
a lot more releases and stuff.
Well, I mean, all you got to do is take a look at it.
I mean, how many times I fire up an Office application
and like seemingly every week there's a new patch available.
No kidding.
Yeah, I see this all the time actually. Yeah, we don't need
to go there. It's obvious it's changing a lot and by providing functionality that customers
actually want and see value in, you can have a fairly quick transition of the field to a new
release, I think. I mean, some of the expectations, I mean, you see it across the industry, idea to public release
can be in 90-day cycles or even 60-day cycles, depending upon the engineering effort or the
time that's available. Right, right, right. That's interesting. We spent most of our days,
when I was doing that stuff, it was like half development, half validation. I mean,
it was intense activity and stuff like that.
So you mentioned like 100 plus features in the 6.4.
What's another significant functionality coming out?
Runbook, CyberScan, and?
Something that might interest you,
especially on the storage side of the equation.
So we're definitely starting on the NAS side.
One of the other gentlemen on my team, Mike Letschin, has been working on it heavily.
As his pedigree, he came from Nexenta,
so he's had some insight into some of the customer requests.
And I will always claim, I've always claimed, especially to Steven as well, I am not a storage guy.
So at the end of the day, I just want to
consume it and make sure that it works. But some of the features we've been
noticing in the use cases are for folks who have NAS devices that are
serving more IO-intensive workloads; you know, they're spending a lot of money for,
I wouldn't say high capacity, but data that maybe needs to be accessed more frequently.
What's happened is they've also thrown data of infrequent utilization on it.
So they've mixed the profiles, per se, kind of thing.
And I'm sure you've probably heard of the term, but: stubbing.
There are a lot of other storage arrays that do basically tiering to some degree.
But the idea is that we can do the stubbing on some of these devices.
We can point at a device based off of a policy and say, you know what, if these files haven't been changed, or they're so old,
let's just move them to the Cohesity platform instead, basically leave a symlink out there,
and basically free up space on some of those devices.
So in that case, you're working with other NAS vendors
that have higher performance, higher expensive solutions
and trying to free up some of the space and capacity
that they would devote to this inactive data, effectively?
Basically.
So the idea is move that data off of those expensive workhorses and use it for
something.
Well, if you think about it, some of those companies,
they have very expensive refresh cycles.
So if you're slowing the pace at which you actually need capacity, or the change rate for
those devices in question, why not kick that data off to something else?
Plus then, you know, it goes without saying, now you're on a device that can basically already back up those files to begin with.
Yeah, so, you know, to be able to help with that, we've got plenty of use cases, not just with some of the tried and true, like your Isilons and some of your NetApp series, but basically
a generic NAS mount feature that we have, where, you know, even at home, if I
wanted to, with my little Synology device, as long as I can communicate on that
protocol, I can get to that particular mount point and perform that
same operation.
So in this case, the metadata would still exist on the NAS filer or something like that, and
they would do a directory list, they'd still see it there, but effectively when they went
to access it, it would automatically be accessed behind the scenes through the Cohesity
system?
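The stub-and-symlink mechanic described here can be sketched generically. This assumes a plain POSIX filesystem and an access-time policy, which are illustrative choices on my part, not Cohesity's actual implementation:

```python
import shutil
import time
from pathlib import Path

def stub_cold_files(source_dir: str, archive_dir: str, max_idle_days: float) -> list:
    """Move files not accessed within `max_idle_days` from the primary tier
    to the archive tier, leaving a symlink behind so clients still see
    (and can transparently open) each file at its original path."""
    cutoff = time.time() - max_idle_days * 86400
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    stubbed = []
    for f in Path(source_dir).iterdir():
        # Skip existing stubs so a second pass doesn't re-archive them.
        if f.is_file() and not f.is_symlink() and f.stat().st_atime < cutoff:
            target = archive / f.name
            shutil.move(str(f), str(target))  # data now lives on the archive tier
            f.symlink_to(target)              # stub left behind on the primary tier
            stubbed.append(f.name)
    return stubbed
```

A directory listing on the primary tier still shows every name; opening a stub just follows the link to the archived copy, which is the behavior Ray is asking about.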
Yeah, and a point I should also make: some folks believe that as soon
as I access it, I would move it back to that NAS. But in this particular case, at least
at the moment, they're accessing it from the Cohesity side instead.
Now, if there's a reason to move it back, like for instance the access frequency goes up, it becomes
more active, then you can just as easily take it from the Cohesity side of the equation,
basically delete the symlink, and put it back where it was.
I think at some particular point there may be a way; right now it's one-way. But I
would expect, as time goes on, we'd have the ability to say, oh, put it back;
make it policy-based, to say if it's been accessed this many times over
this amount of frequency, then maybe put it back.
So, Jon, let's say, and I'm not sure
it's the right term, let's say it's an archive file sitting on Cohesity with a
symlink over here, and it's time to back up the original directory. Are you going to
know enough to go out to the symlink itself, grab the data, and
back it up from there as well, even though it might be Cohesity
that you're backing it up to?
Yeah, so the backup is going to be a combination
of what is on that NAS device
with what is on the Cohesity device,
in a single protection job at that point.
So yeah, it's definitely to kind of, again,
simplify that whole thing.
We know where the data is located.
You may not have to know to say, oh, I've got to back up two spots at the same time now.
So the idea is that, especially over the course of time, we'll get all of that integrated into a single view.
Because at the end of the day, as far as what the end users see, it still looks like the same mount point that they're connecting to.
Yeah.
Yeah. So, I mean, you know, as a storage guy,
I figured you probably would appreciate,
we're trying to utilize and come up with some really good use cases that...
Make a lot of sense and provide some real value.
You know, we always looked at archive as a way to reduce the backup load to some extent,
reduce the time it takes to do all that,
and reduce the capacity load, right?
And if you can do that sort of thing with your solution,
I think it's a high value.
Well, and you know, through that last statement,
those were basically the three reasons
why you would have to go through a refresh
or add more, and basically to kind of slow down the cost
of that workhorse sitting out front.
I mean, if at the end of the day,
it keeps you from having to spend two,
three million dollars on a refresh,
then slow it down a bit.
Now, again, now those funds can be moved to something else
other than just keeping up with your data growth.
Right, right, right, right.
Well, that's interesting.
Yeah, so you mentioned the NAS solution.
So this was introduced earlier this year, or was that?
So one of the things that folks haven't quite realized
with our solution, or they're starting to now,
is that the NAS system powers a lot of even our data protection.
So it's kind of behind the scenes
for a lot of the stuff that you're doing already, right?
Yeah, it's our way to interact with, essentially, the Google-style file system
running underneath on the lowest layers. So whenever we state that we do, say, any backup
operation, or, say, I think we've talked about SnapTree. So, when we do a VM backup,
we're not doing an incremental merge.
We're not worried about doing a chain-based backup
and putting things together.
So when we say we've created a snapshot
and put it in the tree,
that is actually a full image,
and that is stored as, like, a sliver image inside of there.
So the idea is that we access our own file system through, now, it may not be publicly accessible,
but we call upon the file system and it goes, oh, so you want this back in a VMware environment.
We will present it as an NFS mount point back to VMware.
And from there, we'll start the process of what we dub instant mass recovery, the idea that we can power the VM on on that
NFS mount point, and then on top of it, move it off to whatever storage
you wanted it on originally, or to restore to. Correct. But at the end of the
day, that NAS system underneath has been able to provide that to other solutions. Another example: we announced,
not in this version but in a prior one, Active Directory granular support. So basically,
the idea is that we can pull up an image and present it to an Active
Directory controller. An Active Directory controller image? Yeah, so basically it gets mounted as kind of a VHD file, almost, to that device.
So it's presented using the protocols to that device, along with the data within,
and we're able to do comparisons between
what's currently in Active Directory and what we have in the backup.
And if you want to, say somebody deleted 10 user accounts,
you would be able to identify, oh, there are 10 user accounts in this backup.
So you can see the delta, you can do searches for it,
find whether it's a computer object or another Active Directory object,
and then state, yeah, I'd like to bring that over.
And basically, it'll perform the operation of, if it can, bringing the object over.
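The delta comparison described here boils down to set arithmetic over the account objects in the two images. A minimal sketch, using hypothetical account names rather than real AD object handles:

```python
def account_delta(backup_accounts, live_accounts):
    """Diff the account names seen in a backup image against the live
    directory: which accounts vanished since the backup was taken,
    and which were created after it."""
    backup, live = set(backup_accounts), set(live_accounts)
    return {
        "deleted_since_backup": sorted(backup - live),
        "created_since_backup": sorted(live - backup),
    }

# "bob" and "carol" were deleted after the backup; "dave" was added.
delta = account_delta(
    backup_accounts=["alice", "bob", "carol"],
    live_accounts=["alice", "dave"],
)
print(delta["deleted_since_backup"])  # → ['bob', 'carol']
print(delta["created_since_backup"])  # → ['dave']
```

The "deleted since backup" set is the list of objects a restore operation would offer to bring back, subject to the password-hash limitation discussed next.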
Now, there are some limitations.
Microsoft doesn't want you to bring it back verbatim
with the password and the hashes already built into it.
But if you're able to at least bring the account back
and say, this is your default first password,
well, there goes a lot of work, especially en masse;
if I have to recreate hundreds of user accounts,
well, here you go. I've brought it back from the dead, essentially; it's back on
that system, and now here you go, here's your first password, and you're back in
the system.
Well, that's pretty impressive, actually. And behind the scenes, it's
this NAS file server that you use for all this stuff?
I mean, again, it's one of those things we haven't quite put at the forefront,
but when you combine all those things together,
when we claim it's a platform,
from my perspective, again, I work for the company,
but we're not kidding when we say
it's a platform across the board.
Right, right, right, right, right.
Well, we had those discussions at Storage Field Day
and stuff like that,
and it was kind of surprising to me what all you were trying to do with the platform,
but I didn't realize that there was some sophistication behind it that you've been using all along.
So that's an interesting point.
So what else?
What do you think about what's going on in VMworld with Container Land?
I was going to say, it's not VMworld.
It's K8s world at this particular point.
So, well, yeah, I mean, what we're seeing now is,
at least from VMware's perspective,
the realization that, I think, we've saturated
the virtual machine construct.
Well, they've done real well, obviously.
I don't know how much percentage of the market share they have, but I'm thinking 85, 95%-ish kind of numbers.
So I don't really want to call containers and Docker and Kubernetes emerging, but we're now
getting to the point where enterprises are starting to adopt them, not just the startup
unicorns that we hear about out there developing new apps.
And because VMware is a heavy enterprise player,
now they have to start talking that talk,
with the recent acquisitions
that they've gone through
and things like that.
Plus, honestly,
our booth right now here on the floor,
Google Cloud is sitting right next to it.
I'm sure there's probably a booth devoted just to GKE,
the Google Kubernetes Engine.
So it's pervasive all over the place,
and folks, I've had plenty of conversations,
just like, say, in the vExpert community.
Folks are, I don't know if they're quite at the interest stage,
but they're curious.
I've seen one startup, well, I don't know if they're a startup, more of a mid-range customer,
that had been moving a lot of new functionality to containers, and they started to move back because of the complexity.
I mean, containers, to some extent, are intended for systems with hundreds or a thousand container types:
fairly complex, multi-level tiered
architectures and stuff like that. It just added complexity, and it's
difficult to manage and put together and run and stuff like that.
But you know, the stuff that VMware is trying to do with Tanzu
Mission Control and all the other pieces; you can start seeing that if they can make it
simpler, they can have the same sort of impact they've had with virtualization. Well, again, what it comes down to is, at the end of
the day, if you can get more enterprise license agreements on top of it, it's good for VMware.
Yeah. So, so yeah. All right. So Jon, do you have anything you'd like to say to our listening
audience? I think this has been a great conversation.
Well, like I told you beforehand,
it's been since February,
so it feels like, what, 30 months ago?
Yeah, it's not that long.
But just to try to keep up
with the pace that we're putting things out,
I mean, the website, Cohesity.com,
and I can't have any sort of interview
without talking about our truck.
Okay, what's the truck?
So the idea there is, we ended up putting a data center in the back of a semi-trailer.
The idea was, if we couldn't bring folks out to the EBC...
You could bring the EBC out to them?
Yeah, so this thing has been rolling now since last December.
So, Cohesity.com slash tour; you can see where it may be at.
Sign up in Denver to see what's going on?
Yeah, correct.
Yeah, you can sign up.
They'll have the event information, where it's going to be located.
Come on out.
Chris Colotti and I put a fully functioning data center in the back of this thing,
running off the diesel generator that's attached to it.
It is definitely one of the most interesting projects that I've had a chance to work on.
But again, since it's mostly a dark site while it rolls around,
we try to do our best to make sure quarterly
it gets updated with some of the latest stuff.
So, you know, in the last 12 months,
we've had to do some major updates to it, yep.
All right, well, this has been great.
Thank you very much, Jon, for being on our show today at VMworld 2019,
and thanks to Cohesity for sponsoring our podcast.
Next time we'll talk to another system storage technology person.
Any questions you want us to ask, please let us know.
If you enjoy our podcast, tell your friends about it.
Please review us on iTunes and Google Play,
as this will also help get the word out.
That's it for now.
Bye, Jon.
Bye, Ray.
Until next time.
Thank you.