Grey Beards on Systems - 64: GreyBeards discuss cloud data protection with Chris Wahl, Chief Technologist, Rubrik

Episode Date: June 21, 2018

Sponsored by: In this episode we talk with Chris Wahl, Chief Technologist, Rubrik. This is our second time having Chris on our show. The last time was about three years ago (see our Chris on agentless... backup podcast). Talking with Chris again was great, and there's been plenty of news since we last spoke with him.

Transcript
[00:00:00] Hey everybody, Ray Lucchesi here, with Howard Marks here. Welcome to another sponsored episode of the Greybeards on Storage podcast, the show where we get Greybeards storage and system bloggers to talk with storage and system vendors to discuss upcoming products, technologies, and trends affecting the data center today. This Greybeards on Storage podcast is brought to you today by Rubrik and was recorded on May 10, 2018. We are very glad to have with us here today, for a repeat performance, Chris Wahl, Chief Technologist for Rubrik. So Chris, why don't you tell us a little bit about yourself and what's new at Rubrik since we last talked? Hey, yeah, it's been a while, man. It's been like three years. It's good to
[00:00:50] catch up with y'all and your audience again. So yeah, Chris Wahl, as you said, Chief Technologist at Rubrik. Basically, I was in the field doing consulting and customer-type stuff for most of my career, and then about three years ago I joined Rubrik to lead up our technical marketing team. And kind of, that's me in a nutshell. I also podcast at Datanauts, so maybe we have some crossover there for people listening to this show. Well, we hope we do. Yeah.
[00:01:17] So what's happened at Rubrik? When we last talked, we thought of you guys as a backup company, but it seems a bit more now. Certainly it's at the heart of our DNA, right? So providing data protection, which is backup and recovery and long-term retention, all that jazz, is certainly at the heart of what we do. I'll say, kind of beyond just the tech, since we talked last: we were 50 people, plus or minus, out of a very small office in Palo Alto. It was a great kind of startup, you know, DNA soup in there. And since then, we've had 10 releases come out. We started with 1.0, now we're at 4.1,
[00:01:53] and a new release will be coming out soon. We're over 1,000 employees, so there's 950 people I've had to meet over the last three years. That's good. But nobody left. Good for you. Very few. Yeah, very, very few. A very small amount of attrition, which I think speaks well of the company.
[00:02:12] And we've had a lot of money invested over four rounds. We're at $292 million sunk into the company, with a run rate right around $300 million annual. And now we have a new office. Oh, you're a real company now. Yeah, it's like, oh my gosh, we have, like, HR and all that normal stuff, you know. So, uh, yeah, global campuses now all over the world, throughout Cork and London and Amsterdam, and a much bigger place in Palo Alto. So it's been quite the adventure of three years. And you guys swallowed up one of my favorite little companies, Datos.
[00:02:41] Yeah, that was, I tell you, there's nothing quite like going through the acquisition of a company when you start with a company around 50 people. It's definitely a moment that makes you say, wow. But their technology was compelling, the people that are working there are awesome, and really being able to kind of take their protection of NoSQL and distributed databases, with their semantic dedupe, really blended nicely with what we were doing with kind of the traditional data center application. You know, we talked to Datos before on our webcast, podcast I should say.
[00:03:18] Yeah, they're still rocking and rolling. And over time, we should have more to showcase as the two products kind of get melded into one another and the solution becomes a little more seamless with the traditional cloud data management product that Rubrik offers and Rubrik Datos IO. Yeah, I'm looking forward to that. Is it separately branded like that? Yeah, we have three kind of product suites at this point.
[00:03:39] There's the Rubrik Cloud Data Management software that runs all of our data protection on physical, virtual, whatever. There's Rubrik Datos IO, which is its own thing at this point and will eventually kind of be folded into the main release cycle. And then there's Rubrik Polaris, which is our SaaS offering. What's Polaris about? Is this new? It is, yeah. So, um, just kind of level setting: Cloud Data Management is the software that we came out with back in 2015. Originally it was called Converged Data Management; we rebranded it. There are actually quite a few compelling reasons to run it directly in the cloud, but the Cloud Data Management software is what's doing all the data protection for physical, virtual, and even just kind of application-level protection
[00:04:25] across on-prem, cloud, colo, et cetera. And then Polaris, or Rubrik Polaris, is literally a SaaS offering. You just get a turnkey: here's your URL, log in, assign users and whatnot to it. And its job is to absorb all the metadata out of your Rubrik Cloud Data Management instances globally, so that you can start doing interesting things with that using what we call data management applications. The first one of those was called Polaris GPS,
[00:04:52] and GPS takes all the data and gives you runway as far as capacity planning, lets you understand how you're doing as far as your SLA audits globally, how the actual tasks of taking backups are being handled globally per SLA, per job, et cetera. It just really gives you kind of a bird's-eye view of everything, and then you can drill down as deep as you need and start investigating if
[00:05:14] there's an issue or if you're not meeting an SLA, et cetera. So for my cloud, I run Cloud Data Management appliances in my data center and in AWS to protect my EC2 instances, and that all reports up to Polaris as a central monitoring and control point. Exactly. Think of it as like a global control plane that just happens to be running as SaaS, so no one has to deal with it. You consume it. And it's through the APIs that are native to the cloud data management
[00:05:49] software, that's how Polaris talks to your physical, your virtual, your cloud instances of Rubrik to gather that data. And then Polaris offers APIs on top of itself for those data management applications to hook into and consume. So GPS was the first one that we released, and this was a couple months ago that we made this announcement and went GA with it. And since then, we're going to offer more data management applications, both things that Rubrik is building as well as things third parties are building.
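For listeners who automate, the pattern Chris describes is easy to picture in code: a central control plane polls each cluster's REST API for metadata and rolls it up into one view. Here's a minimal sketch; the cluster URLs, token, endpoint path, and field names are illustrative assumptions, not Rubrik's actual CDM or Polaris API.

```python
import requests

# Hypothetical cluster addresses and API token; a real deployment would
# authenticate per cluster against the vendor's documented endpoints.
CLUSTERS = ["https://rubrik-dc1.example.com", "https://rubrik-aws1.example.com"]
API_TOKEN = "REPLACE_ME"

def fetch_compliance(cluster_url: str) -> dict:
    """Pull an SLA-compliance summary from one cluster (assumed endpoint)."""
    resp = requests.get(
        f"{cluster_url}/api/v1/report/sla_compliance",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def global_view() -> None:
    """Aggregate per-cluster metadata into one GPS-style report."""
    for url in CLUSTERS:
        summary = fetch_compliance(url)
        print(f"{url}: {summary.get('compliant', '?')} objects in SLA, "
              f"{summary.get('non_compliant', '?')} out of SLA")

if __name__ == "__main__":
    global_view()
```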
[00:06:29] Okay, so you mentioned the cloud data management. I think when last we talked, it was physical and virtual, but you mentioned that it also runs in the cloud too. Yeah, when it first came out, it was physical, and it's your standard let's-do-a-turnkey-appliance, so that you can get it into the hands of a customer really, really simply and easily. That's pretty much still the case, except if you want to deploy it on a specific vendor's suite of hardware, such as HP or Lenovo, you can do that too, and we'll go ahead and get that all packaged and prepped for you. The virtual machine comes in a couple different flavors now.
[00:06:51] So you can run it kind of as yourself for a remote office, branch office, using what we call Rubrik Edge. And that's a bit of a more lightweight, single-node virtual machine whose job in life is to capture data at those edge endpoints, you know, things like file servers and maybe a domain controller, something like that that's running at your edge site. And the goal is to archive it or do long-term retention to public cloud, or have it come back to a master site in your data center or your colo. But that's Edge. There's also an Air version of Rubrik that is meant for MSPs, to kind of do the same thing, so that if you don't have a very large footprint physically,
[00:07:26] maybe you're a smaller shop, you can work with an MSP to provide you a Rubrik Air instance, and that can handle more workloads, more intense workloads, and bring that back to the MSP for long-term retention and DR and things like that. And then Cloud Cluster is where you're deploying a four-plus-node cluster of Rubrik instances that run natively as either Azure VMs or AWS AMIs running in EC2.
[00:07:49] So any one of those is valid, and there's no requirement to run the physical version of Rubrik if you're doing, let's say, a Cloud Cluster implementation. And the backend storage for the Azure or the AWS is object storage? It would be block storage. It would be block storage, okay. Yeah, and think of that as where the actual,
[00:08:11] like, services and applications that run in Rubrik are at, as well as potentially some short-term holding of the data as we grab it and then figure out how to make it efficient and whatnot. Most customers, however, will go ahead and do long-term retention, or even short-term retention in some cases, to object, or even to another cloud environment, because that's a perfectly valid
[00:08:29] choice too. And that's where you could go with any of the Rubrik solutions, I guess. Yeah, yeah, it's pretty common. Like, one thing, why we try to steer away, like, we're not looking to be your storage solution for the long term. The idea is you have it come into Rubrik from a control plane perspective, whether it's an appliance or Cloud Cluster or whatnot, and we understand what the application was doing, and the ACLs and the permissions and the users and all that jazz. It comes into our system. We then fingerprint it, figure out how to dedupe it and compress it,
[00:09:00] where to put it based on the SLAs you generate. And then, more often than not, it's going to leave the Rubrik system and go somewhere else, but we control that action, so that it can go to Amazon S3 or Azure Blob Storage or Scality or Cloudian, something like that, which offers a really cheap-and-deep object storage repository. That's pretty interesting. So if I was using the traditional appliances, would there be, like, the current round of backups stored internal to the appliances, and everything else in the object store? With metadata, of course, across all of that, right? Yeah. I mean, the most common scenario that we see is folks will set up an SLA saying, I want one year of retention on this particular application, and within that year, I want two to four weeks of that data to live on-prem in the
[00:09:53] appliance. And then after that, send everything out to the object store, keeping in mind that we need to make sure that the object store can survive even if the appliance doesn't. So we obviously have all the data that we need to reconstruct the application in the object store, and we're just pruning what we don't need out of the metadata table into that environment. So it's not like there's a dependency on the appliance in case there's an issue with it. The object store contains all the metadata. We back up the Rubrik appliance itself to it, and we can reconstruct the workload to any point in time based on your SLA requirements, even if it's in the long-term retention environment. That's pretty neat.
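To make the tiering arithmetic concrete, here's a toy sketch of how an SLA like the one Chris describes (one year of total retention, with roughly the first month held locally) might classify a snapshot by age. The thresholds and the function are illustrative, not Rubrik's actual placement logic.

```python
from datetime import datetime, timedelta, timezone

# Illustrative SLA: one year of total retention, with the first 30 days
# held on the local appliance for fast restores before archival.
LOCAL_WINDOW = timedelta(days=30)
TOTAL_RETENTION = timedelta(days=365)

def place_snapshot(taken_at: datetime) -> str:
    """Decide where a snapshot should live under the SLA above."""
    age = datetime.now(timezone.utc) - taken_at
    if age > TOTAL_RETENTION:
        return "expire"          # past retention: prune it
    if age > LOCAL_WINDOW:
        return "object-store"    # archived, but still fully recoverable
    return "local-appliance"     # recent: keep on-prem for fast restores

# Example: a snapshot taken 45 days ago has aged out of the local window.
print(place_snapshot(datetime.now(timezone.utc) - timedelta(days=45)))
```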
[00:10:29] It's close to the solution I've been thinking we need to move to. Yeah, I've been talking about how object storage is the right place to be backing things up for a while now. And the problem with object storage is frequently, well, the initial purchase is 700 terabytes; it doesn't make any sense smaller than that. And while some of your developers may be saying, I want to write applications that use the S3 interface, those applications don't justify the capacity. Yeah, sub-terabyte stuff.
[00:11:13] So backup's a great way to be the foot in the door and go, oh, I can invest in Cloudian or Scality or something, because it's cheaper than Data Domains. Yeah. You guys have been growing like weeds. What's going on? I mean, 50 to 1,000 employees? This is pretty impressive in, what, two or three years? Yeah, yeah. It's been, I don't know, about 10 or 11 quarters, something like that.
[00:11:32] So just right around three years since I joined, and that's when the first version of the product came out. I think it comes down to the fact that what we do, at a high level, is take away the operational investment that people have had to make in data protection. And by that I mean, most customers, when they come to me and tell me what they love about the product, what they tell me is the fact that they don't have to deal with it. Not just on a day-to-day basis, but week to week or even month to month. It just goes away. If I were still doing consulting, I would hate you for that.
[00:12:11] Well, so that's the boring part. We're basically removing the boring part, where a lot of times it's set up a bunch of jobs and try really hard to chain them together so that they logically make sense, versus, you know, all the weirdness and change and the dynamic nature of the data center, which means it's, like, impossible. Well, it's not even that. It's how many media servers do I need, and where's the master? You know, the engineering of an enterprise backup system used to be a nice gig. I'd have to spend two or three weeks figuring that stuff out and billing by the hour. You guys are killing that off. I don't do that anymore. There's not a whole bunch of Fibre Channel arbitrated loop installers anymore either. I mean, there's certainly... Are you sure? Yeah,
[00:12:50] that's not good. The change is that now it's about, okay, you have SLAs that mirror the business language being used by C-levels and directors and whatnot, and it's doing what you need it to do. Now it's about having a conversation around: we have all this metadata, what can we do with that? And that's where Polaris comes in and starts showcasing, you know, just information on growth patterns, changes, opportunities to move data or applications from one environment to the next.
[00:13:20] And also, I just feel like there's a lot of conversations that, as an MSP or a service provider or a VAR, you can have with customers around plugging data protection into your cloud-first strategy via the RESTful APIs. Because that's really one of the magic-sauce pieces of the product: everything is a RESTful API, and the GUI is just calling those endpoints. So if you want to automate with it, I'm not aware of any other solution that has that breadth of kind of native integration with their API, so that if you're looking to do ServiceNow or, you know, Cloud.com or OpenStack or vRealize, or just any kind of software-based management portal or cloud management platform, you can literally just plug this thing in with, you know, a very small amount of work. It's a very small investment. Even, you know, if I'm instantiating a new application via a Chef recipe, maybe I should include hooking it up too. Yeah, yeah, yeah, yeah.
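Howard's provisioning point is worth a sketch: if everything is a RESTful endpoint, the same workflow that builds a workload can register it for protection as its last step. The endpoint paths and payload below are assumptions for illustration, not a documented API.

```python
import requests

BASE = "https://rubrik.example.com/api/v1"    # hypothetical cluster URL
HEADERS = {"Authorization": "Bearer REPLACE_ME"}

def protect_new_vm(vm_id: str, sla_name: str = "Gold") -> None:
    """Assign an SLA to a freshly provisioned VM (illustrative endpoints)."""
    # Look up the SLA domain by name.
    lookup = requests.get(f"{BASE}/sla_domain", params={"name": sla_name},
                          headers=HEADERS, timeout=30)
    lookup.raise_for_status()
    sla_id = lookup.json()["data"][0]["id"]
    # Attach the VM to that SLA so protection starts with no manual step.
    update = requests.patch(f"{BASE}/vmware/vm/{vm_id}",
                            json={"configuredSlaDomainId": sla_id},
                            headers=HEADERS, timeout=30)
    update.raise_for_status()

# e.g. called as the final step of a Chef- or script-driven provision:
# protect_new_vm("vm-1234")
```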
[00:14:06] Yeah, I wrote a pretty long piece. The folks at TechTarget asked me to write something, kind of a DevOps-ian piece, around protecting legacy or traditional applications, but making that happen in kind of a DevOps flavor.
[00:14:24] And I was like, oh my gosh, I can't keep this at 500 words. I think I had, like, 2,000 words to say about it. Because what you're saying is exactly right. You try to take these, like, workflow-based, continuous-integration-based, you know, very, very lightweight sets of tools that are stateless, and you try to throw those at, like, traditional backup, especially around, you know, less modern applications, and it falls apart. Well, less modern applications, and here's where you always hit the, oh look, the backup guys haven't been included in the we're-going-DevOps workflow conversation. Yeah, yeah. So it's the one thing that never gets carried forward. And I'm saying
[00:15:01] to those folks, like, yeah, you can do all these things that you want to do, the things that in the kind of application and DevOps world you just assume are reality. Then you look at a legacy backup product and you're like, oh, we can't do any of those things. You start putting Rubrik into the equation and it makes a lot more sense, or something in the vein of Rubrik. So they tend to get really excited about it, and so do I. You know, a lot of this container stuff now is moving from stateless to more stateful opportunities and stateful applications and such, so that's where, you know, data protection is going to make a big difference. Exactly. Does Kubernetes have the equivalent of the vStorage API for data protection? Gosh, I haven't dug that deep. I don't know. But, you know, they are starting to provide stateful container solutions in that space, Mesosphere and Kubernetes and others,
[00:15:54] so it's only a matter of time before, well, I mean, with the RESTful APIs that you guys have, you can probably plug it into the scripts and fire it up for containers anyway. But there's more to this game. Yeah, there's certainly opportunity. That's the cool thing. It's kind of one of those things where, as you're looking to try out different solutions in the enterprise, you're not handcuffed by the solution you're working with. I mean, maybe you'd want to try something
[00:16:21] we don't have a direct product integration with at day zero, but you can just fire up the API Explorer and say, oh, okay, I just need this application to quiesce itself for a moment, and then call the API to grab the bits and bring them over, and then release the I/O so it can flow again. Just traditional dump-and-scrape could even be day zero, just to get it off the ground.
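That quiesce-grab-release flow can be as plain as a pre/post script around a dump. Here's a generic day-zero sketch, where `appctl` stands in for whatever hypothetical quiesce and dump commands your application actually provides:

```python
import subprocess
from datetime import datetime, timezone

DUMP_DIR = "/backup/staging"   # assumed staging area swept up by the backup job

def dump_and_scrape(app: str) -> None:
    """Quiesce, dump, resume: the simplest day-zero protection for an
    application with no native integration ('appctl' is hypothetical)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    subprocess.run(["appctl", "quiesce", app], check=True)   # pause writes
    try:
        # Grab a consistent copy while I/O is held.
        subprocess.run(
            ["appctl", "dump", app, f"{DUMP_DIR}/{app}-{stamp}.dump"],
            check=True,
        )
    finally:
        # Always release I/O, even if the dump failed.
        subprocess.run(["appctl", "resume", app], check=True)

dump_and_scrape("inventory-db")
```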
[00:16:39] Or you can go with, like, a Rubrik Datos IO-type product, where we can talk to these distributed, kind of shared-nothing... You know, I'm trying to find a quorum, and it is tough. Yeah, the semantic dedupe, and dealing with, you know, a shifting level of quorum to do application consistency, is tough, and that's why we got Datos. Backing up anything that's only eventually consistent is a challenge. Ouch. Ouch. I mean, step one in making a backup is put everything in a consistent state, and a lot of these things don't do it. So, you know, and Datos is just so much smarter about
[00:17:13] what it's dealing with in the data. Yeah, it understands that environment a lot better than many of the other products out there, I would say. And I think the neat thing on the back end is that the way Rubrik was designed, from a cloud data management perspective, was that everything is strictly consistent or application consistent, whichever term you prefer, even to the point where, when we live mount a workload and give you a copy of it to play with, if a node fails or something like that under the covers, it's still strictly consistent. There's no data loss that occurs on our end. So it doesn't really matter what application you're working with; we're making sure that what we present you with is always strictly consistent. Interesting. So what makes Rubrik different from the competition, Chris?
[00:18:01] I think the biggest one that tends to resonate with people, just because of, I guess, Howard, what you were talking about, the complexity of building an enterprise backup offering, is the simplicity. I'll say, after three years of developing the product, it's still kind of a rotisserie chicken: you set it and you forget it. It just does its thing. Even though we now protect 40-plus different types of workloads across physical, virtual, and cloud, all of those objects are protected basically the same way. You build an SLA, you assign the SLA to an object, which is a VM or an app or a database or whatever, and it does the work. There's really no extra heavy
[00:18:42] lifting for different types of applications. And so once you've learned the product, and that takes the better part of 10 minutes, you're good. You can handle just about any type of application. And that simplicity is reinforced throughout the product. And I think that user experience, that simplicity, and that sense that I'm not going to be in here very much, because everything kind of does itself or manages itself intelligently, is the major differentiator with the product. You know, it was designed that way from ground zero. It's not like under the covers there's this nasty SDK being consumed to call, you know,
[00:19:15] like, old C# code or some Windows application or something. Like, these are all little services running on a distributed cluster. Well, as long as all the moving parts keep working, I'm okay with stuff lashed up, but something always ends up breaking. That's the other thing. You know, the simplicity also boils down to what you as the operator are having to face, because I think the one thing I had to deal with that I really hated was the morning triage, where every morning I get this report of backup job failed, or just failures, you know, and the failures are very cryptic and arbitrary. And I would spend a couple hours basically every morning just fixing, like a janitor, you know, going in
[00:19:56] and unclogging the toilets from these old backup job failures. And half the time, you know, the backup job failed because it couldn't back up the paging file, and so it's not really a failure. You just have to redefine the backup job to go: don't tell me when you can't back that up, because it wasn't important anyway. Yeah, exactly. And that's kind of the thing we're talking about.
[00:20:17] The second kind of strength that I feel differentiates Rubrik in the market is the control plane. Under the covers it's called Cerebro, and that's literally the brains. All the algorithms, all the intelligence, all of the smarts that go into the product are at the Cerebro layer. It's a distributed services architecture that's running across all of the nodes, whether they're physical or virtual, doesn't matter. And its role in life is to assess the complexity of what you're trying to protect, and assess the complexity of how you've built your topology, you know, where the cluster's running, where the applications are at, where the archives are, all that jazz, and literally make the decisions to make sure that those SLAs are
[00:20:59] hit. You know, if you want a four-hour RPO, it's going to assess the workload, see how long that takes, kind of crunch the numbers, and do its best effort to get that done correctly, and learn from itself if it slightly deviates. You know, please tell me it's smart enough that when I say I want a zero RPO, it comes back and says: nope, can't do that, please invest $4 billion. There is no zero RPO, um, with the product. Yeah, it does yell at you if you're like, here, take 500 exabytes and make that happen in a one-hour RPO. It's going to say, that's not possible, but here's what I can do. What if I had 5,000 Rubrik nodes? Well, yeah, I'm sure the sales guy would love to do that with you. I'm assuming you don't have 5,000. If you have one appliance with four nodes, it's probably not going to work.
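As a toy version of that sanity check (nothing like Cerebro's real algorithms): given a daily change rate and an ingest throughput, there's a hard floor on the achievable RPO, and a scheduler can refuse requests below it. The numbers here are made up for illustration.

```python
def min_achievable_rpo_hours(changed_gb_per_day: float,
                             ingest_gb_per_hour: float) -> float:
    """Floor on RPO: you can't protect data faster than you can ingest it."""
    hourly_change_gb = changed_gb_per_day / 24.0
    if hourly_change_gb >= ingest_gb_per_hour:
        raise ValueError("Change rate exceeds ingest capacity; add nodes.")
    # Time needed to capture one hour's worth of change, at steady state.
    return hourly_change_gb / ingest_gb_per_hour

requested_rpo = 4.0  # hours
floor = min_achievable_rpo_hours(changed_gb_per_day=2000,
                                 ingest_gb_per_hour=400)
if requested_rpo < floor:
    print(f"Nope, can't do that: best effort is ~{floor:.2f}h.")
else:
    print(f"A {requested_rpo}h RPO is feasible (floor ~{floor:.2f}h).")
```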
[00:21:42] And so that's a big one, not having to... I don't like systems that require me to be all-knowing and omnipresent about my data center, because there's no way I'm going to know everything. That's just not possible, unless I have a very tiny 10-server environment, and even then I might not know a few things. I don't know everything in my lab.
[00:22:01] Your lab is like the size of a medium-sized company. So true. I like to consider myself omniscient, but we shouldn't go there. It's omnipotence we really want to work on. That's probably it. Yeah. And so that's its role in life, to take all of the guesswork out of it. I try to equate it to: imagine if a city like Chicago or LA or whatnot were to try to run with, you know, traffic cops instead of stoplights at each intersection directing all the traffic. That's basically what we're asking people to do with traditional backup systems, or even
[00:22:42] kind of the more quote-unquote next-gen backup systems, where they have really crude scheduling technologies. And Cerebro is extremely advanced, because we recognized that as a problem; it was one of the reasons we wanted to build Rubrik. Yeah. You know, all those things being stipulated, I'm still impressed with the rate of growth that you guys have, because traditionally backup applications are just so sticky. People may have hated NetWorker or NetBackup or TSM, but they knew that was their retention method, and so even if they stopped making backups with them, they'd have to keep that infrastructure around for years.
[00:23:22] True. Yeah, I think it's like having six arms and punching all at the same time. We've got the simplicity and the control plane, so we're telling them it's going to be really easy to get up and running, and you're not going to have to deal with it day to day. If you're spending a couple hours a week, that's probably going to go to a couple minutes a week. There's APIs to integrate it with your cloud-first initiative or your cloud management platform or whatever you're trying to do, private or public. All the search is native, meaning it's
[00:23:52] no harder than Google search to find just about anything you want: a file, a version of a file, a whole database, whatever. We can move the data to the cloud, so cloud mobility is a pretty strong facet of the technology. And we can actually build virtual machines from what you set to retain in a public cloud. Like, you can back up a VM that's running on a vSphere cluster locally, archive that out to Amazon S3, and then go to it and say, oh, you know what? I want an EC2 instance out of that. Build it for me. We can do that. And you guys do the conversion? We do the conversion. We do the sizing. We put it in the correct
[00:24:32] security group, the right VPC. We give you the actual globally unique ID for the workload and pass that through the interface, so that you don't even have to potentially give any access to AWS to the developer. Without VMware on AWS running? I mean, this is pretty impressive. We don't need anything. As long as you're doing a Hyper-V or a VMware backup on-prem, we can basically transform that into a native EC2 instance or an Azure virtual machine instance running in AWS or Azure, respectively.
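Sketching that on-demand conversion flow, with hypothetical endpoint names and request shape (the real API and job semantics will differ): pick an archived snapshot and ask for a native EC2 instance in a given VPC and security group.

```python
import requests

BASE = "https://rubrik.example.com/api/v1"      # hypothetical
HEADERS = {"Authorization": "Bearer REPLACE_ME"}

def instantiate_in_ec2(snapshot_id: str) -> str:
    """Ask the platform to build a native EC2 instance from an archived
    snapshot (illustrative request shape, not the real API)."""
    job = requests.post(
        f"{BASE}/snapshot/{snapshot_id}/instantiate_cloud",
        json={
            "provider": "aws",
            "instanceType": "m5.large",          # sizing could be inferred
            "vpcId": "vpc-0abc1234",             # assumed target VPC
            "securityGroupId": "sg-0def5678",    # assumed security group
        },
        headers=HEADERS,
        timeout=30,
    )
    job.raise_for_status()
    return job.json()["id"]   # job ID to poll for the resulting instance

print("conversion job:", instantiate_in_ec2("snap-42"))
```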
[00:25:03] That is impressive. That's very nice. So think of the move-and-improve initiatives. I mean, not everyone likes lift and shift. I get it; in a perfect world I wouldn't do that either. But if you're hitting the wall and you need to provision and you can't get the local stuff, or you just want to play with it. And then finally, security. Everything's encrypted, and it's all your keys. Also, this is the level we commit to security: even the node-to-node communication is using SSL certificates to one another, in a zero-trust model. So we don't even trust our own nodes; natively, we make sure everything has a certificate and is assigned to one another. We don't even trust ourselves. Yeah: I don't know who you are, give me a certificate. All the data is encrypted, the backup data, all
[00:25:40] the object archive, all that stuff is encrypted. Yeah, there's no reason not to, and especially with GDPR coming around the horizon, like, having the data unencrypted just feels like a big, big no-no. Oh, but now you're opening the right-to-be-forgotten can of worms. Yeah, well, even though it's encrypted, you know, we still have access to all the metadata; we still control the whole flow from ingest to archive, right? Um, but you provide the keys. You know, we as Rubrik can't see those keys; we just use them to execute your will when you're archiving. At rest? Yes. I like that term. Yeah. And make sure that you write that key down, store that key in several safe places for emergencies. Write it down physically, put it in a security box, you know. Don't save it on
[00:26:22] your enterprise storage array that may fail. Yeah. Yeah. All right. So this has been great. Howard, any last questions for Chris? No, I think I got it. Hey, Chris, any last comments to make to our listening audience? I just want to let you know that we've put together
[00:26:38] a special URL that we're going to use specifically for your Greybeards on Storage audience. So if you want to get a free kit with some goodies in it and learn some more about Rubrik, go to rubrik.com/GBOS. Rewards for listening to Greybeards on Storage. I love it. Well, this has been great. Thank you very much, Chris, for being on our show today. It was my pleasure. Thank you. And if you enjoy our podcast,
[00:27:04] please tell your friends about us, and take some time to review us on iTunes, as this will also help get the word out. That's it for now. Bye, Howard. Bye, Ray. Until next time.
