Grey Beards on Systems - GreyBeards talk VMware agentless backup with Chris Wahl, Tech Evangelist, Rubrik
Episode Date: August 19, 2015. In this edition we discuss Rubrik's converged data backup with Chris Wahl (@ChrisWahl), Tech Evangelist for Rubrik. You may recall Chris as a blogger on a number of Tech, Virtualization and Storage Field Days (VFD2, TFD Extra at VMworld 2014, SFD4, etc.), which is where I met him. Chris is one of the bloggers that complains …
Transcript
Hey everybody, Ray Lucchesi here and Howard Marks here.
Welcome to the next episode of the Greybeards on Storage monthly podcast,
a show where we get greybeard storage and system bloggers to talk with storage and system vendors
to discuss upcoming products, technologies, and trends affecting the data center today.
Welcome to the 23rd episode of Greybeards on Storage, which was recorded on August 5th, 2015.
We have with us here today Chris Wahl, Tech Evangelist at Rubrik, Inc.
Why don't you tell us a little bit about yourself and Rubrik, Chris?
Oh, absolutely. Thanks for having me on.
So my name is Chris Wahl. I'm @ChrisWahl on the Twitters. I've done a number of things over the years around VMware virtualization. I'm a
VMware Certified Design Expert for data center and network virtualization. I wrote a book called
Networking for VMware Administrators, and I run a blog called wahlnetwork.com.
That's kind of the things about me that I think are somewhat interesting.
As Sean Connery would say, I read that book.
Was it any good, Howard?
Halsey was an idiot.
Oh, wait, sorry.
Your book, yeah, your book was fun.
All right.
Gosh, okay.
So what about Rubrik?
What the heck is this Rubrik thing?
Yeah, I'm like three and some change weeks fresh at the Rubrik group.
So you should know everything then.
I've been trying to drink the Kool-Aid. It's pretty tasty, I've got to say.
I think it's fruit punch flavored, which is my favorite.
Oh, God. Okay.
I'm not a marketing kind of guy. I'm very technical, so we'll keep it straightforward.
Rubrik is all about data protection.
They market it as converged data
management. And the idea is to make a lot of the complexity around data protection, you know,
doing backups, being able to offer restores, archival, et cetera, go away, make it really
simple, offer it as a 2U appliance where there's no backup software even required and offer cloud as an archive choice instead of tape.
So I'd say that's the elevator pitch of what Rubrik is.
Yeah, I saw something on your website.
Take the backup out of recovery.
What the heck is that?
How can you not do backup?
Well, it's taking the backup software out of recovery.
You don't have any software to deploy
because that's the converge part of the converge
data management. You take all of the backup software that you used to have to run and the
proxy servers and catalog servers and search servers and all those servers that you'd have
to use in order to ingest data and index it. And you bake that into an appliance. And then you've
taken care of the storage problem as well, because we'll ingest the data, catalog it, index it, figure out what the metadata needs to be, as well as putting the data on the platform and doing all those fancy data efficiency things that everyone loves, you know, dedupe, compression, all that jazz.
And then we'll archive it off to cloud, and that also gets the benefits of the dedupe.
So you're not just a backup appliance, but you're also a storage appliance?
It's almost a convergence between storage and backup. Is that what you've got?
Yeah, you're getting it. We've been seeing a lot of, you know,
I mean, really, backup appliances have to include storage
unless you're, you know, here's the server with NetBackup
pre-installed. Good luck.
But, you know, I'm kind of interested in the, we don't need any backup
software because some of us have heard this before. Yeah. Well, go ahead with your question.
It goes deeper. There's a whole series of things I rely on backup software for.
And making the copy is one,
but not the most important of those things.
The cataloging, you know, and the finding, you know,
so what happens if I, you know,
inadvertently delete my Oracle database or something like that?
Is it automatically being copied to some off-site location and stuff like that?
Or how does that work?
Yeah, I think it would help if we go in kind of that workflow a little bit.
And that may shake out questions.
Let's start at the beginning.
Where do you collect your data, young man?
All right.
So today we're a VMware-associated appliance.
We'll do backups for a VMware environment.
And in the future, we're going to do it for Hyper-V, KVM, bare metal, et cetera. But that's what we are today.
And so when you set up the appliance, you tell it, hey, here's where my vCenter server or servers
are. You give it, you know, name and password, all that kind of jazz. It will say, okay, you have
these number of virtual machines. What do you want to do with them? And you build a policy.
And so the policy will say, I want to take backups three times a day and twice a week and four times a month, et cetera. And that's the SLA that we call it. The SLA then overlays on the virtual machine.
And at that point, Rubrik takes care of figuring out when and how to backup the virtual machines.
So there's no backup jobs that you create.
It's all SLA-driven.
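Since the SLA-driven model is the core idea here, a rough Python sketch may help make it concrete. All names and fields (`SlaPolicy`, `snapshot_every_hours`, `next_due`) are hypothetical illustrations, not Rubrik's actual data model or API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SlaPolicy:
    """Declarative protection policy: how often to snapshot, how long to keep."""
    name: str
    snapshot_every_hours: int   # e.g. take a backup every 4 hours
    keep_local_days: int        # retention on the appliance itself
    keep_archive_days: int      # total retention; the tail lives in cloud archive

def next_due(policy: SlaPolicy, last_snapshot_hour: int, now_hour: int) -> bool:
    """The system, not an admin-defined backup job, decides when a backup is due."""
    return now_hour - last_snapshot_hour >= policy.snapshot_every_hours

# A hypothetical "gold" SLA: every 4 hours, 90 days local, 7 years total.
gold = SlaPolicy("gold", snapshot_every_hours=4,
                 keep_local_days=90, keep_archive_days=7 * 365)
```

The point of the sketch is that the admin declares outcomes (frequency and retention) and the scheduler derives the jobs, rather than the admin defining jobs directly.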
But you're also creating the storage for those virtual machines?
I mean, the VMDKs, VVols, whatever the case may be?
There's no Magic LUN, VVols-type stuff.
There's no storage to configure at all.
It's all handled for you.
So you don't have to build RAID groups or anything like that.
Right, because your appliance is the target. Yeah. And we're scale out storage. It was written by
these really smart guys at Google and Facebook and, you know, the typical next gen startup kind
of portfolio of talent. And so they've written the scale out file system as well as the, you know,
master list system so that every node within the 2U appliance can be a backup server. And they
all handle dedupe and the metadata amongst one another. So it'll reach out using VADP, the vStorage
APIs for Data Protection, to the host that has the virtual machine that is associated with that SLA
domain and grab a backup using VADP. So it'll send the data over the wire. The first full is all the data,
and then everything after that uses VMware's change block tracking, or CBT.
Okay.
Pretty standard on that front, right?
And I can tell it, you know, I don't have to tell it by policy when to run the backup.
I just tell it, this is the level of protection I want.
But I can tell it, I can
throttle back the load you're presenting,
right? Yeah, there's a couple
nerd knobs that you kind of get for that.
You know, if you wanted to just let it...
Yeah, it's one of my favorite terms.
If you want to just let it do
its thing, you can say exactly what you
were intending, Howard.
Do it every four hours and it'll just figure it out.
But you can also put bumpers on it and say only run at night or only run in the morning if that is your requirement.
So are you acting as the VMDK or VVOL storage at that point?
Or are you just a backup solution for the VM itself?
Yeah, we will grab.
So we act like a backup target.
Typically, we just look like an NFS data store,
but it doesn't really matter for the case of doing the VADP backup
because it's just going over the wire to one of the nodes,
sending the VMDKs of the virtual machine,
and we can crack those open once they've hit the appliance
because they're actually being written to a flash drive
because every node inside of our system has a combination of flash and disk.
So we ingest to the flash, try to make it super quick,
and every node is doing an ingest,
so there's no, like, choke point.
And at that point, we can crack open the VMDK
and index all the files that are in there
and then share that index in a distributed manner.
So we know that we have the virtual machine,
and we also know we have all the files
within the virtual machine,
and they're all searchable in real time, no matter if they're on the appliance or in the cloud
So it really does converge a lot of those, you know, backup servers, proxies, catalog databases, all that stuff into one platform.
Okay. Does that index extend to common applications like Exchange and SharePoint?
That's a goal, right? So right now we have, with version 2.0,
which came out on August 19th,
we have the ability to use our own VSS providers
to do very granular and quality quiescing
of applications like Exchange and SQL
and Active Directory and so forth.
So we can make sure to do transactional,
consistent backups of those workloads.
I think in the future the goal is to offer more support
around single table restore,
single mailbox restore, etc.
Okay. Go ahead.
You mentioned cloud as a solution here.
So behind the appliance scale-out file system is cloud?
Yeah, there's a couple options that you have today.
So remember I talked about that SLA policy for doing data protection?
Within the policy, you can also say, all right, I want, you know,
let's say you have a seven-year retention policy on this SLA.
You could say I want three months of it to be local, and everything after that, I want to be an Amazon S3
or OpenStack Swift or any other, you know, S3-compatible Swift object store. So you just
make that part of your policy, and it'll figure out, okay, I've hit three months on this particular
data point. Let me start shuffling it off to whatever that archival cloud destination may be. And that's all handled automatically for you.
So it kind of prunes it out or ages it out to your cloud target.
Yeah. In the case of Swift, it could even be local, a local, on-premises cloud versus using Amazon's public cloud.
Back to VSS for a second. So your VSS provider does application support and log truncation and the like?
Yep. Basically, you just provide some credentials so that the Rubrik appliance can tap into the
guest operating system and pop up or drop in the VSS provider. And then we'll call that instead
of using the... I know you love VMware snapshots. So we don't use that one unless we have to.
Right. Okay.
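Going back to the archival policy Chris described, age-based pruning from the appliance out to an S3 or Swift target, the tiering decision can be sketched roughly like this. The function name and threshold are hypothetical, not Rubrik's implementation:

```python
def tier_snapshots(snapshot_ages_days, keep_local_days=90):
    """Split restore points by age: recent ones stay on the appliance,
    older ones get shuffled off to the S3/Swift archive target.
    Both tiers remain searchable; only placement changes."""
    local = [age for age in snapshot_ages_days if age <= keep_local_days]
    archive = [age for age in snapshot_ages_days if age > keep_local_days]
    return local, archive
```

With a 90-day local window, a snapshot taken yesterday stays on the box while a one-year-old snapshot is a candidate for the cloud archive.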
So do you support, I would call it, replication of the backup store?
So one of the reasons I do backup, in my case,
is for disasters here at the Silverton Consulting worldwide headquarters.
So how would I do that from an off-site perspective?
Is it something I would automatically do it to the cloud?
Archive it to the cloud?
Archive has been around since we came out.
We GA'd in May of 2015.
And actually, when we did our announcement on August 19th, what we did introduce
was replication. So three months after GA, we've already got replication of the product.
It's a box-to-box, so you can go from primary to secondary site, and they all have those SLAs
associated with them. So the replication's baked into the SLA. So if you have a gold SLA and you
tell it, archive after three months, but I also want all
the data replicated for the next three weeks, you can do that. You can get kind of complex with that.
At the same time, the destination that you're going to, so that secondary site, can also be going to
the cloud as an archive. So there's no limitation saying this is only a, you know, replication
target. It can't do anything else. Ah. That's very nice.
Well, thank you.
Synchronous or asynchronous? I'm assuming asynchronous, since it's a backup,
so that would be a reasonable way to go.
Yeah, asynchronous replication based on...
We don't want to hose your network
connection between the WAN, so
we'll make sure that what we're sending is
data efficient. We're just sending the changes.
Okay, so you're doing hash exchange and...
Yeah, it's whatever's left after dedupe compression
and, you know, whatever we actually need to send it to the side gets sent.
Nothing more, nothing less.
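That kind of dedupe-aware replication, where a hash exchange decides what actually crosses the WAN, might look something like this in outline. This is a sketch, not Rubrik's wire protocol:

```python
import hashlib

def replicate(local_chunks, remote_hashes):
    """Send only chunks whose fingerprints the secondary site doesn't
    already hold; everything else is skipped (nothing more, nothing less)."""
    to_send = []
    for chunk in local_chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in remote_hashes:
            to_send.append(chunk)
            remote_hashes.add(digest)  # the remote now holds this chunk
    return to_send
```

On a second replication pass, any chunk already seen by the target contributes zero bytes to the transfer.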
And you mentioned dedupe and compression.
So is it...
The dedupe can have different regionality perspectives of it.
You know, it can be global dedupe across the whole, you know, file system.
It could be across a vSphere cluster.
It could be across the replicated instances of the data.
Do you have, how does that work in your scenario here?
Is it at a VM level or?
Yeah, that's usually a pain point, right, is all these little silos of dedupe.
It's something I've certainly not liked.
The dedupe with Rubrik is global.
It expands from the local copies of the data that we have across every single SLA.
There's no boundary for the deduplication, and it also extends into the cloud if you archive to cloud.
Okay.
So generally, as I build a retention scheme,
I want less granularity as I go further back in time.
Yes, I want to be able to go back seven years,
but I don't want to go back to every day of seven years ago.
Right.
You guys can consolidate backups together, right? Oh, absolutely. Yeah. I think our gold SLA by default is something like every four hours and then once a day and then three or four times
a week. And the system will go through that. It'll prune out. If I have 10 backups and only
need two of them, it'll go ahead and remove what it doesn't need. It'll prune those as the backup, you know, hits that older demarcation point. Okay, we have all these
weeklies. Let's get rid of all but the one that we need. And that's all automated. The same goes
for if you deleted the virtual machine. There's nothing you have to do in Rubrik to say that you
deleted it. It'll still hold on to that virtual machine based on the SLA policy until it's aged
out and then eventually flushed away. And the SLA policy is at a virtual machine level?
Well, it is its own thing. You can assign it to a virtual machine. You can assign it to
a folder or a host or data center. It's really just saying anything within this logical object
picks up the SLA policy. And then once that's happened, you're absolutely correct, Ray.
It would be a per VM type of protection.
Because we're not going to say, oh, this disk is an SLA policy of gold
and this one's silver.
That would be a little mucky.
So you wouldn't want to protect a VM half on gold and half on silver.
It wouldn't make a lot of sense.
So we'll take the whole VM as the granular.
Well, I update the database every 20 seconds,
but I only patch the operating system once a day.
Yeah.
I think it might be.
I can see it.
I can see where you're going, Howard.
But at the same time, I'm thinking that's a lot of complexity.
I'm not saying it would be stupid.
It's just.
All right. All right. So
deduplication at the file
level, block level,
chunk level.
I believe it's fixed block, but...
Fixed block. Yeah, it's...
But it is global, like you're saying.
By comparison to, say, a data domain
that has to be variable block,
fixed block works
fine for this because everything is already aligned because you're reading
the VMDKs.
Right, right, right.
Yeah, from my background, it felt like the fixed versus variable block thing was kind
of an old school discussion when the storage array wasn't really involved in the data stream.
But now it is, so it's not a huge deal.
What the variable is really needed for is when your backup software thinks it's talking to a tape drive.
Yeah.
And, you know, so first it backs up the root folder,
and then it backs up the lowest alphabetical folder,
and then it hits stuff it's backed up before,
but that's all shifted over four bytes,
so now it doesn't dedupe anymore.
And so data domain was to address that problem.
But if you're using vStorage API DP,
you're getting everything aligned on 4K blocks already.
You got it.
You mentioned change block tracking.
So, I mean, one of the challenges with backups, obviously,
is this big scan they have to go to find out what's changed.
With VADP, you get sort of a free
view of just the changes to a VMDK or a VM? Yeah, it'll list all the blocks that have changed since
we last grabbed a copy of the backup. But we'll still, when we ingest the data that's changed,
we'll still crack it open, check and see what's changed from a file level, figure out what we actually need to write to disk.
The Flash layer is just kind of a landing pad.
I mean, we keep other stuff at that layer as well.
We keep metadata so that you can search in real time
without having to pull off a disk.
But for the most part, we ingest,
we figure out what we need to write,
then we flush it down to disk.
Yeah.
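The CBT-based incremental flow, taking the prior point-in-time image and overlaying only the blocks VMware reports as changed, reduces to something like this. It's a toy model keyed by block offset, not the real ingest path:

```python
def apply_cbt(previous: dict, changed_blocks: dict) -> dict:
    """Build the next restore point by overlaying only the CBT-reported
    changed blocks onto the prior point-in-time image. The prior image
    is left untouched, so both restore points remain usable."""
    image = dict(previous)          # copy, don't mutate the older point
    image.update(changed_blocks)    # overlay just what changed
    return image
```

The first full transfers everything; every subsequent point is just the changed-block overlay, which is why no full rescan of the VMDK is needed.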
I mean, the other big challenge with backup, of course,
is the recovery. Is it instant? How is that handled? How's it done? I mean, who's got access to it? Is it user level? Is it admin level? vCenter? Those sorts of things. So what does the recovery environment look like for Rubrik?
This is where it gets a little bit different, I think. You know, you've got your traditional
recovery type, you know, processes and workflows that you have in your head where,
okay, I want a file or a VM, and I'll go through that real quick. So if you wanted to grab a whole
virtual machine out, you can do what we call an instant restore. And that's where you go to,
you know, rubric, you type in the search engine, hey, I want this whatever VM, it's going to list
every point where we have a backup. And at that point, you can kind of click on one and say,
I want the VM at this point, and I'm doing a recovery,
so I don't want the old one anymore.
We'll actually mark the existing VM that's running there as a deprecated virtual machine.
We'll change the name.
We'll power it off.
And then we'll restore the backup that you're restoring from into the environment.
So we just kind of replace the old one with the new one,
or actually the fresher one with the backed up older copy.
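So the instant-restore behavior is roughly: deprecate and power off the running VM rather than overwrite it, then bring in the restored copy under the original name. A hypothetical sketch of that workflow, with invented field names:

```python
def instant_restore(inventory: dict, vm_name: str, backup_copy: dict) -> dict:
    """Keep the existing VM (renamed and powered off) instead of
    overwriting it, then register the restored copy under the
    original name and power it on."""
    old = inventory.pop(vm_name)
    old["name"] = vm_name + "-deprecated"   # mark, don't destroy
    old["powered_on"] = False
    inventory[old["name"]] = old
    restored = dict(backup_copy, name=vm_name, powered_on=True)
    inventory[vm_name] = restored
    return inventory
```

The deprecated copy sticks around until someone deliberately deletes it, which is the safety net Howard was thanking Chris for.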
Thank you so much. Thank you so much for not overwriting.
Yeah. Yeah. So, and that's helpful because you might, you know, say, ooh, I really porked this thing, but I'm not even sure if the two day old copy is the one I want. So it gives you the
opportunity to kind of make sure that it's the one you want. We'll power it on and it'll run and, you know, typical restore.
Or you can do single file restore.
And that works just like Google search.
You'll type in a search bar.
Hey, this is roughly the name that I'm looking for.
And what I like is that it'll start auto filling in what it thinks you're trying to search for.
Just like Google, you know, try to figure it out.
And it's all, you know, real time, no matter if it's local or in the cloud.
So I've done this a number of times where I'm grabbing
files and it could be forever ago
in the cloud. And that way you can restore
it based on whatever you want to do
with it, into the virtual machine, on your desktop, whatever.
Wait a minute.
Let me go back.
Howard mentioned
something about you not overwriting the current VMDKs when you do a restore like that. What does that mean?
So let's say you wanted to restore a copy of the virtual machine from a week ago.
Yeah.
You know, typically that would mean you blow away the current one that's running in the system.
Right, right.
But that may not always be the greatest idea. You might actually find out that the one you restored, A, doesn't work, or B, that you left something on the original one that you wanted.
So it just gives you more choice.
You could just throw it away.
You don't have to keep it around.
Oh, okay.
But we want to keep it safe.
I like the extra step that I have to decide to throw it away,
and I can't throw it away accidentally.
Yeah.
Yeah.
And this is on the primary storage.
We're not just talking...
So the backups are not an issue,
but you were saying the primary storage
at the point of recovery
creates a new VMDK for the VM and fires that up?
Oh, no, we can run the virtual machine's data
right on Rubrik.
So we'll give you the VMDKs directly on Rubrik
running on that flash tier.
Oh, I didn't know that.
Yeah, and then if you want it... if you find it's going to be good.
Just skip the important stuff, Chris. It's fine.
Sorry. When did that show up? Did I miss that?
It's misdirection. I have the rabbit over on my left,
and I'm pointing to this thing on my right.
Yeah, that's also part of that Converge story is, you know,
we got that flash and capacity disk combo.
So we'll run it on our flash for you,
and, you know, you can get anywhere from 10 to 30K IOPS on a 2U box. So it's not going to hurt. So you can run it there, and if you decide you like it, just Storage vMotion it over into your environment.
Okay, I got you. And so some portion of that flash is used as a cache for the running VMs?
Yeah, there's a pretty
significant amount of the SSD left over for running virtual machines
on. Only a small amount has to be used for ingesting data and our metadata and the operating
system, things like that. Right. Right. Because you're scale out. And so that keeps moving across
multiple nodes. You don't have to keep all the metadata in one place. Exactly. So I've said it
gets a little bit different. Here's where it gets a little different.
We also have the concept of instant mount.
And so just like I said earlier, where we can run a virtual machine from a storage perspective on Rubrik,
we can actually kind of do what would be called maybe a clone or a fork of the virtual machine. We can run a completely separate copy and not even touch the primary with the instant mount at any point in time. So you
could say, I have a backup from last night. I want to run a copy of it. We'll actually spin up a copy
of that virtual machine. We'll mount Rubrik to VMware as if it were an NFS data store. We'll
power on the virtual machine, but we'll disconnect its NIC, the virtual machine's NIC, so that we
don't have an IP conflict. And at that point, you can log in and either change the IP and reconnect the
NIC or just do a test on it, whatever you want. And there's also no limit on the number that you
can spin up. You could say, I want 20 of these things, and we'll go ahead and fire up 20 copies
of the virtual machine for you. And it's space efficient because we only need to write any changes that occur.
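The instant-mount clones are essentially copy-on-write forks of a shared, read-only restore point, each starting with its NIC disconnected. A toy sketch of that structure, with assumed names rather than Rubrik's code:

```python
def instant_mount(backup_image: dict, count: int) -> list:
    """Spin up N independent, space-efficient clones from one restore
    point. Each clone starts with its NIC disconnected to avoid IP
    conflicts, and records only its own writes (copy-on-write)."""
    clones = []
    for _ in range(count):
        clones.append({
            "base": backup_image,    # shared, read-only restore point
            "delta": {},             # per-clone writes land here
            "nic_connected": False,  # admin reconnects after fixing the IP
        })
    return clones

def read_block(clone: dict, offset: int) -> bytes:
    """Reads prefer the clone's own writes, falling back to the shared base."""
    return clone["delta"].get(offset, clone["base"].get(offset, b""))
```

Twenty clones cost roughly one base image plus twenty (initially empty) delta maps, which is why there's no practical limit on how many you spin up.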
So, I mean, you're effectively cloning the VM from a backup that you've created.
Yeah.
I mean, frankly, while these are both great features, we've seen this before.
Yeah.
The guys in Columbus can do this.
Yeah, or whoever they are.
But the problem is that if you used a backup program and a backup target, you always scaled the performance of the backup target to match the backup, not to match, and now I have to run this VM for three days till the weekend when I can afford the Storage vMotion.
And so the flash in the Rubrik is new. You know, I've been telling people for a year or two that the only reason to use the VMware Flash Read Cache is to speed up an instant recovery from your Veeam that's just too slow.
Yeah, yeah.
Yep, and you hit the nail on the head there.
I mean, you're getting really great performance on it,
and we think that for pre-production workloads, dev tests,
you might have some golden backups that you just spin up 30 copies for the day,
let your developers go nuts on them, check in their code, throw them away the next day,
and spin up a whole fresh new set of copies. And you're doing it all on your backup appliance. I
mean, how crazy is that? But you're getting awesome performance, so who cares?
Well, you're secondary storage, but secondary here doesn't necessarily mean performance. It's just,
you know, if I need to actually deal with something that failed, those test and dev
machines have to go power down. Exactly. Or you just have the headroom, you know,
you may be running enough nodes that you don't care, you know, unless you're the whole data
center went down and you're like, all right, we need to recover it. There comes a point where I would start accusing you of selling me too much Flash.
Sure.
That, you know, to say, yes, and you can do a recovery when that whole rack failed
and still run your test and dev at the same time, I think that was over-specced a little.
Yeah, it depends on the dev work, right?
Okay, okay, okay.
So when the data is written to the Rubrik appliance, you can have multiple of the appliances.
Is the data replicated across or mirrored across those appliances?
So if one of the appliances goes down, I can still do these sorts of things, or how does that work?
Yeah, and it'll depend on how many nodes you have.
By default, it's four nodes in the 2U,
and we triple mirror the data to our SATA disks as well as across the nodes. So when we ingest to
the SATA layer, we'll go ahead and mirror across the three nodes, basically stripe across all three
of the capacity disks so that we get pretty good write speeds, as well as copying all that data to
two other nodes so that you have three
complete copies of the data.
And so as you scale out more nodes, obviously just more targets for that data to be written
to from a replication perspective, yeah.
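Triple mirroring here just means every ingested block lands on three distinct nodes, so two node failures still leave a complete copy. A simplified round-robin placement sketch follows; the real system's placement logic is surely more involved:

```python
def place_copies(block_id: int, nodes: list, copies: int = 3) -> list:
    """Pick three distinct nodes for each ingested block, rotating the
    starting node by block ID so writes stripe across the cluster.
    Losing any two of the chosen nodes still leaves one full copy."""
    assert len(nodes) >= copies, "need at least as many nodes as copies"
    start = block_id % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]
```

Adding nodes simply widens the pool of placement targets, which is the scale-out behavior Chris describes.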
All right.
Well, triple mirror is good.
It's good.
And so you're also striping across all the three disks in a node.
So in a typical node, there would be three SATA disks and a flash SSD?
Yep.
Yeah, there's two choices.
You can do three disks at four terabytes a pop
or three disks at eight terabytes a pop.
So those are your choices for how much capacity you want, basically.
Right, right, right, right, right, right.
Now, that's a lot of smarts for that much storage.
You want more storage or less smarts?
Um, both. Well, you know, on the one hand, having a lot of smarts means you can do a lot of things.
Yeah, yeah.
Um, you know, and I want all the smarts that I'm willing to get, as long as I can afford it.
Yeah, yeah.
You know, it's like the relationship between selling price and what you have to pay Intel is disconnected enough that I'm not necessarily making an assumption.
Yeah, yeah.
You didn't mention how much SSD is in that.
Is it a 4-terabyte SSD?
If there is such a thing, it's 400 gigabytes.
It's not an insanely huge SSD.
400 gig per node, along with three 4- or 8-terabyte capacity drives.
And SanDisk does make a 4-terabyte SSD, but it's got really bad write endurance.
Yeah, no doubt.
Isn't it the price of a Jaguar or something?
No, no, no.
It's under $3,000, actually.
Oh, okay.
A 20-year-old SUV or something like that, yes.
I saw someone posting on Twitter, like, oh, it's the price of a midsize luxury car.
I was like, what SSD is this?
Yeah, yeah.
Fusion IO.
Yeah.
Oh, okay.
Naturally, naturally. Okay, so backup and recovery and replication and instant recovery.
What else do I need?
I need a full-text index.
You want to crack open the files.
For the archive use case, well, in the backup use case, I care about the catalog, I care about where the data came from. In the archive use case, somebody from legal just came down to me and said, who was John Smith, when did he work here, and what did he do? And so now I need to find all the files that say John Smith inside them.
Yeah, I've heard that one a few times. So, I mean, I'll be perfectly blunt.
For right now, we look at the file itself,
but not inside the file.
So if you knew, you know, John Smith's files,
we could grab them.
But if you wanted us to search inside of a Word doc
for the word John Smith, it's no bueno today.
But I think that's a good idea, though.
Yeah, yeah.
Well, in the indexing, there's a cost to that, obviously, but it's all in the cloud anyway.
They got a lot of smarts in those boxes.
Yeah, yeah.
At the very least, if you wanted to search, you know, every, if you knew the file and you wanted to grab a copy of it from six years ago, it's literally type in the search box, click a button on the file, and it comes to you. It's not, let's go find the tape that has the right catalog for that particular year
and then ask Iron Mountain to ship you a tape from 2008.
Although we weren't around back in 2008, so you'll have to do that for a little while.
But you get the drift.
The problem is just I put in the perfect system,
and now I've got at least seven years of running both of them.
Yeah.
You mentioned data compression and deduplication.
So you're compressing the data as well?
That's correct.
Yeah, it all happens when we ingest it.
We pretty much take it and do all the data efficiency procedures
while it's in the flash layer and then write whatever comes out.
So it's kind of
lazy deduplication.
Isn't it always?
The problem is the state of the data, right?
As opposed, you know, it's not
really
post-process because the delay
is measured in seconds
at most.
But it's not really in line either.
Yeah.
The goal is just to make sure that we spend as little time dealing with the virtual machine as possible. You know,
kick off the VADP process, grab that snapshot, you know, of the underlying VMDK, pull it in as
quickly as possible, you know, avoid stunning the VM as much as we can, give it back to
the virtual machine, and then we do our thing. And every node can do that. So it's actually a lot less stressful on the virtual machine. If you think
about this, because it doesn't have to go through a proxy to a backup server to a storage target,
we've eliminated a lot of the middlemen so we can reduce the stun on the virtual machine pretty
significantly. Yeah. A question on the appliance. Is it like a dual controller appliance, or is it a single controller environment, or what does that look like?
It's an infinite controller appliance.
No, no, I understand. You can have scalable nodes, you can have multiple nodes, but what does one particular node look like?
It's four nodes in a 2U. It's using the Supermicro, I think it's the Twin Pro, you know, the pretty standard 2U Supermicro box.
Right? It's not like a NetApp controller or EMC, you know, where it's got dual heads, that kind of thing.
Every node in the, you know, every 2U has four nodes.
Every node is, you know, a master, quote-unquote.
There's no master realistically.
Right.
So they all share kind of a job scheduling system and then figure out based on the SLAs,
okay, node one, go back up VM number three.
Node two, I'm going to grab VM number four, et cetera.
Right.
I think the challenge is when you start operating as primary storage,
even if it's only for a certain period of time,
the availability of the node becomes one of the crucial aspects here.
Even though it's scalable and the data is located in three other nodes,
maybe that's enough.
I don't know.
Well, it's better than vSAN.
Oh, well, I won't go there.
Guns blazing already.
Well, if you run vSAN in the default mode,
it's only two-way replication across the nodes.
Yeah, yeah, yeah.
And those nodes are busy running VMware and all those applications,
so they've got a lot more ways to fail.
I'm glad you brought that up, Howard.
I did see some folks saying this is hyper-converged backup or what is hyper-converged.
Hyper-converged would be running the VMs on the compute on Rubrik.
We only run the storage, so you'll still need servers somewhere to run the compute.
You mentioned that converge word. I mean, something like Unitrends,
where when you do the instant recovery,
they run KVM on their box and spin up the VM.
You could call that hyper-converged backup,
but what a stupid term.
Yeah, I agree. I don't like the term hyper-converged backup.
So I've been trying to, I don't know where that came from.
I'm trying to make sure that it dies quickly.
Yeah.
So is there a limit to the number of nodes?
I mean, 64 nodes, is that a reasonable configuration here?
We don't have any published limit.
There shouldn't be a limit.
You know, we obviously, you know, we can only test so much hardware being a startup,
but we've tested quite a few just fine.
Okay, the node address is only 64 bits eventually.
Yeah.
How many nodes can you address with 64 bits?
That's a lot of address space.
Okay, another question.
Can one, I'll call it a Rubrik cluster, can it span vSphere clusters?
Or does it have to be limited to one vSphere cluster?
You can have as many vSphere clusters or vCenters as you want.
Talking to the Rubrik cluster.
Yeah.
Well, so one Rubrik cluster can be as big as you want.
But, yeah, you can plug in.
When you go to plug in that association with vCenter,
you can plug in multiple vCenters.
So you can have eight vCenters all talking to one Rubrik fabric and that's fine.
You could even apply the SLA policy at a cluster layer and say, you know, this cluster's
gold, this one's silver, and delineate with clusters
based on SLA, if that makes sense.
Well, this is the VDI cluster.
Back it up once a month.
Yeah.
Never back it up.
No one likes that cluster.
Or just back up the file server serving those user files.
Right.
Another example.
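The per-cluster SLA idea above, gold for one vSphere cluster, silver for another, with several vCenters feeding one backup fabric, could be sketched as a simple mapping (invented structure, not Rubrik's actual configuration schema):

```python
# Hypothetical sketch: SLA domains applied at the vSphere cluster level,
# with multiple vCenters all pointed at one backup fabric.
sla_domains = {
    "gold":   {"rpo_hours": 1,  "retention_days": 90},
    "silver": {"rpo_hours": 4,  "retention_days": 30},
    "bronze": {"rpo_hours": 24, "retention_days": 7},
}

# One fabric, many vCenters, an SLA assigned per vSphere cluster.
cluster_slas = {
    ("vcenter1", "prod-cluster"): "gold",
    ("vcenter1", "vdi-cluster"):  "bronze",   # nobody likes that cluster
    ("vcenter2", "test-cluster"): "silver",
}

def sla_for(vcenter, cluster):
    """Look up the SLA domain protecting every VM in a given cluster."""
    return sla_domains[cluster_slas[(vcenter, cluster)]]
```

The point of delineating by cluster is that new VMs inherit protection automatically instead of needing per-VM backup jobs.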
It's supposed to be stateless.
Yeah, that's the goal, right?
Yeah, yeah.
And you mentioned VSS providers.
VMware snapshots, how does that play into this discussion?
Or you guys pretty much avoid all that?
We shouldn't really ever have to use the VMware snapshot
unless the virtual machine doesn't have any VMware tools installed upon it
or we're just not able to install our VSS provider into the virtual machine.
Without VMware Tools, it would use a plain VMware snapshot.
So the VMware Tools is required so that we can put our VSS provider into the guest OS.
Otherwise, we don't have a vehicle to get in there.
Yeah, but if they just have VMware Tools, then it would create a VMware Snapshot until
you finish copying the change blocks.
Yeah, it'll definitely always look like a snapshot's kicking off in vCenter.
You know, you'll see a snapshot operation occur every time just so that we can shift
the IO off the VMDK and grab a backup, that kind of jazz.
It's just a matter of how do we quiesce the IO from the application?
And that's where the VSS provider comes into play.
Right.
Well, it's also how you, it's also how you create the stable image
that you're going to back up.
Yeah.
Correct.
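The quiesce decision Chris describes, prefer the guest VSS provider and fall back when VMware Tools or the provider isn't available, could be modeled roughly like this (a hypothetical sketch with invented field names, not Rubrik's actual logic):

```python
# Hypothetical sketch of the quiesce decision: the most application-
# consistent method available wins. Field names are invented.
def choose_quiesce_method(vm):
    """Pick how to quiesce a VM before the snapshot-based backup."""
    if vm.get("vmware_tools") and vm.get("vss_provider_installed"):
        # VMware Tools gives a path into the guest, so the backup vendor's
        # VSS provider can flush application writers before the snapshot.
        return "app-consistent (guest VSS provider)"
    if vm.get("vmware_tools"):
        # Tools present but no VSS provider: fall back to the file-system
        # quiesce that VMware Tools itself offers.
        return "fs-consistent (VMware Tools quiesce)"
    # No Tools at all: the snapshot can only be crash-consistent.
    return "crash-consistent (plain VMware snapshot)"
```

Either way a VMware snapshot still kicks off in vCenter; the decision only changes how consistent the frozen image is.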
Yeah, I just know...
Where do you send the writes while the backup's running?
Yeah, so you'll have that.
When that snapshot kicks off, you'll have that, you know,
snap, basically, delta file there
that just, you know, basically captures the writes as they come in.
Yeah.
That's your temporary holding pen for writes,
and then we, you know, it comes back together
and writes it down onto the underlying VMDK when the snap is done.
Yeah.
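The redirect-on-write behavior described here, writes land in a delta file while the base VMDK stays fixed for the backup and then merge back on consolidation, can be modeled in a toy form (illustrative only, not how VMware implements it internally):

```python
# Toy model of what the transcript describes: during a VMware snapshot,
# guest writes go to a delta file while the base VMDK stays fixed for
# the backup; deleting the snapshot merges the delta back.
class ToyVMDK:
    def __init__(self, blocks):
        self.base = dict(blocks)   # stable image the backup reads
        self.delta = {}            # redirected writes during the snapshot
        self.snapshotted = False

    def snapshot(self):
        self.snapshotted = True

    def write(self, block, data):
        if self.snapshotted:
            self.delta[block] = data   # base stays untouched mid-backup
        else:
            self.base[block] = data

    def read_backup(self):
        return dict(self.base)         # fixed content while snapshot exists

    def consolidate(self):
        self.base.update(self.delta)   # merge the delta back into the VMDK
        self.delta = {}
        self.snapshotted = False
```

This is why the backup engine sees fixed content: reads come from the base while writes pile up in the transient delta.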
And so those transient snaps, as I will call them, from VMware,
they exist only for the moment,
and then they are effectively gone after the backup's completed?
Is that how it works?
Correct. Yeah, as soon as the backup's done.
That's how the storage API works.
Yeah, yeah, I got you.
It creates a snapshot so that the backup engine can mount the VMDK directly from the storage
and it'll be fixed content. Yeah. What about SRM and stuff like that? Do you guys have any support
built in for that? No, not today. You know, it definitely is something that's tickling my brain
as a potential future that you have to have an SRA, you know, plugged into
SRM in order to be one of their targets.
Right, right. Especially
when you start playing primary storage
games and stuff like that. Oh, we're secondary
storage. We're good.
Yeah, sometimes you're secondary
storage. I put my virtual hands up.
Secondary, but not necessarily, you know,
down at the end of the chain,
like secondary with the guy shrugging next to it,
like, I don't know.
look
if you're going to have backup systems
that you're afraid to use
it doesn't do you any good.
Yeah.
Sure.
And that's one of the nice things about it.
You can actually, it's not just data that sits there and basically lies dormant forever.
Well, it's not just insurance.
Right.
So you mentioned the ultimate goal of doing this for bare metal
and other virtualization environments.
Hyper-V would be the next one from my perspective
that somehow needs
to be there. You see this playing in that space? Well, I can't give you any spoilers on specifically
what the prioritization of the backlog is, but definitely it has been brought up. Hyper-V, KVM,
bare metal, they're all three things we've publicly come out and said we're going to go after those
use cases as time permits, basically. Just because it's rough. You know, you got some products that
are VM only. You got some that are physical that sort of do virtual machines through an agent or
some other crusty stuff. And we'd rather be able to do all of that seamlessly and not require agents.
Yeah, yeah, yeah. It's a bit, it's much more of a challenge in a bare metal environment, in my mind, to try to do this agentless.
Yeah, I'm interested to see how we kick that off.
We've got some pretty brilliant folks there.
Everybody's got a PhD or something.
I'm just the tech evangelist, right?
I just try to disseminate the wizardry that they do and turn it into stuff that we can consume.
I'm sure you guys could talk one-to-one.
Sure. I could just listen upon and gaze.
No doubt.
I still wake up in a cold sweat every once
in a while about
the system that ran
an HR app that Arthur
Anderson consultants created
and
ran Windows NT and nobody knew how
to maintain it at all.
And then Arthur Anderson went out of business.
Oh, God, yeah.
And so there was just this box that we had to figure out how to deal with.
I was so happy when we virtualized that sucker.
Yeah, you would think.
It's like, oh, look, we haven't been able to buy that server model for six years,
and the operating system has been out of support for two.
Oh, God.
Well, if you're having bad dreams or nightmares, I should say, I got one thing that I think you'll appreciate is that as you get a pretty decent backup set,
let's say you've got seven years of backups that you're managing and you've got all these SLAs that you're being held to as a business.
I want X number of dailies, weeklies, monthlies, et cetera.
We'll actually real-time give you a compliance report based on the SLA compliance and policy that you built into the system.
So we'll tell you, okay, over the last seven years, are you compliant?
Where are you not compliant?
What are you missing?
There's no work to do that.
You just go into the system and the report tells you exactly what's going on with
that particular workload. So I thought that was pretty snazzy because there's no, you know,
there's no, it's not just telling you, all right, you have 300 backups or you have backups for seven
years, but you don't know exactly what you have. You can look across the whole ecosystem or across
the whole fabric and say, here's exactly what I have as far as dailies, weeklies, et cetera.
And how does it compare to my SLA?
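The compliance report idea, comparing the backups you actually hold against the dailies and weeklies the SLA demands, might be sketched like this (invented schema, not Rubrik's actual report):

```python
# Hypothetical sketch of an SLA compliance check: given the dates you have
# backups for and a retention policy, report which expected points are missing.
from datetime import date, timedelta

def compliance_report(backup_dates, policy, today):
    """policy: {'dailies': n, 'weeklies': m} -> missing expected backups."""
    missing = []
    # Expect one backup on each of the last n days.
    for i in range(policy.get("dailies", 0)):
        day = today - timedelta(days=i)
        if day not in backup_dates:
            missing.append(("daily", day))
    # Expect at least one backup in each of the last m calendar weeks.
    for w in range(policy.get("weeklies", 0)):
        week_start = today - timedelta(days=today.weekday(), weeks=w)
        week = {week_start + timedelta(days=d) for d in range(7)}
        if not (week & set(backup_dates)):
            missing.append(("weekly", week_start))
    return {"compliant": not missing, "missing": missing}
```

The appeal is exactly what Chris says: you don't just know you have 300 backups, you know which SLA points they do and don't satisfy.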
So has Rubrik been around seven years now?
Am I missing something?
Well, I invented time travel.
In seven years, if we buy Rubrik today, we will be very happy.
I like where this guy's going.
He's a good guy.
How long has Rubrik been out now? So when did you guys get general availability?
GA was in May.
The company was founded January of 2014.
So we're a young, scrappy startup.
Yeah, yeah, yeah, yeah.
We're all hungry and excited.
Mere infants.
That's right.
I mean, it's not bad considering, what is that, 17 months to go GA and then all the features.
No, it's actually quick.
Three, four months. Yeah, yeah, yeah.
Okay, we're about to the end of the
podcast. Howard, do you have any final questions
for my man Chris here?
No, I'm just going to miss having
Chris sitting on my left at
field day. What are we going to do now?
We'll have to find somebody else to complain about my
keyboard usage.
I'll keep complaining.
I won't be there.
Alright, Chris, do you have anything else
you'd like to tell the audience?
Definitely, if you're at VMworld, come out to the booth.
I've got stickers and stuff I'm giving away.
Say hi. Tell me what you like and don't like
and we'll have a virtual beer.
A real one.
Speaking of VMworld,
ah.
Greybeards on Storage will actually be a session at VMworld this year.
That's correct.
Our friends at VMworld have decided that this podcast is just so wonderful
that at 2 p.m. on Monday, August 31st,
Greyhairs on Storage, live on stage,
will be Ray and I
and a veritable murderer's row
of storage luminaries,
including Paula Long,
who we just could not make wear a beard,
so we had to change the name of the session.
Gray Hairs, yes.
Yes.
Well, thanks for that plug, Howard, and I really appreciate it.
I should have done it myself, I guess.
Well, this has been great.
It's been a pleasure to have Chris with us here on our podcast.
Next month, we'll talk to another startup storage technology person.
Any questions you have, please let us know.
That's it for now.
Bye, Howard.
Bye, Ray.
Bye, Chris.
Until next time, thanks again, Chris.
Hasta luego. Have a good day.