Grey Beards on Systems - 49: Greybeards talk open convergence with Brian Biles, CEO and Co-founder of Datrium
Episode Date: August 15, 2017
Sponsored By: Datrium
In this episode we talk with Brian Biles, CEO and Co-founder of Datrium. We last talked with Brian and Datrium in May of 2016, and at that time we called it deconstructed storage. These days, Datrium offers a converged infrastructure (C/I) solution, which they call “open convergence”. Datrium’s C/I solution stores persistent data …
Transcript
Hey everybody, Ray Lucchesi here with Howard Marks.
Welcome to a sponsored episode of Greybeards on Storage monthly podcast, a show where we
get greybeard storage and system bloggers to talk with storage and system vendors to
discuss upcoming products, technologies, and trends affecting the data
center today. This Greybeards on Storage podcast is brought to you by Datrium and was recorded on
July 19th, 2017. We have with us here today Brian Biles, CEO and co-founder of Datrium. So Brian,
why don't you tell us a little bit about yourself and what's new with Datrium?
Hey guys, I'm Brian Biles.
You may have known me as a founder of Data Domain some years ago; these days I'm a co-founder and CEO at Datrium.
And we've been selling our product for a little over a year.
We discussed in the last Greybeards podcast the kind of product that we're building.
It's a converged infrastructure with a couple of twists.
And if you want, I can go over that quickly.
And we're here today to talk about some new announcements having to do with Linux support and balanced scaling.
Yeah, I think a short recap of the basic architecture would come in handy.
Okay, great.
So we're a converged infrastructure.
We run storage operations on hosts, kind of in the spirit of software-defined storage.
What we do differently is all of the persistent data is stored off-host in a separate kind of enclosure.
So hosts scale speed with software that runs on them and local flash drives for high read locality,
and they use host-local CPU for things like calculating erasure codes and dedup and compression
and snapshots and cloning and so on.
But all the data is written off-host for persistence to a separate place,
so the hosts don't actually talk to each other,
making scaling very easy and linear and predictable because you don't get noisy-neighbor interference across hosts.
The hosts can fail in any combination or be brought down for maintenance
because that doesn't affect
persistent data access unlike a hyper-converged model. But the place you keep all the persistent
data while it looks kind of like an array is somewhat less intelligent than that because
the hosts are doing all the thinking, right? That's exactly right. So it's first kind of
low cost because it doesn't have much CPU to
do anything. But second of all, it's just a very scalable model for the CPU required for things
like IO. And as you add hosts, you don't run into the problem that arrays have of controllers
bottlenecking on just the compute involved in storage access. So that also plays out in replication and that sort of stuff as well, right?
That's right.
So last quarter, we announced a whole series of features for VM-specific snapshotting and
cloning and replication.
All of that also uses the software on hosts to do the work.
In replication, it means that all of the hosts are doing the data movement for replication between sites.
So if you have some number of hosts go down, replication keeps going, which is different from, again, a hyperconverged model.
That's great.
So what's new with Datrium?
Great question.
We have two major things that we're announcing.
First, on the host side, we've always been VMware-centric and VM-centric.
We just announced two pretty significant deltas from that.
We're now multi-hypervisor. So we've partnered with Red Hat and Docker in support of simultaneously being able
to support not only VMware VMs, but also Linux KVM VMs, as well as Docker containers, all with
the same data services that we've had for VMware in the past. So snapshotting is on a per VM basis
or now per container.
Same with cloning, same with, you know,
when you do replication,
you set up policy groups of, you know,
related VMs or containers,
and they all have the same snapshot policy settings
for timing and retention
and which replicas, you know, to send data to
and that sort of thing.
Okay, can I mix some vSphere hosts and some KVM hosts with the same back-end shelf?
Yeah, absolutely. The shelf is now called a data node. That's a change in terminology.
But yes, that is exactly the idea. A lot of our customers have, you know, maybe VMware as the driving IT focus, but they might have a, you know, development arm that's doing some Linux-based work or container-based work.
Our new approach allows you to manage all of those from the same pane of glass.
So they're just VMs.
A given host will be one or the other, right?
With, you know, if we're supporting Linux,
it's on bare metal.
It's not Linux as a guest.
So the data node is shared across the mixed hypervisors?
And it's a full partnership, you know.
We're partnering with Red Hat
to support sort of ongoing ideas
about how to interact with all the stack development that they're doing.
So all the folks that are taking Linux seriously with a support agreement with Red Hat,
this comes as a really pleasant way to consolidate all of those workloads.
You guys install your code and look like a block driver to Linux, and then?
Like with VMware and our approach there, we look like an NFS share.
So we've optimized and tested
for virtual disk style files.
So, you know, virtual disks for KVM
or persistent volumes for Docker.
Not kind of any file,
but it uses the sort of NFS friendliness
of how to set up a share.
Yeah, and certainly Docker and NFS get along very well.
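For readers who want to see what that looks like in practice, here is a minimal sketch of an NFS-backed Docker persistent volume using the docker Python SDK. The NFS server address and export path are hypothetical placeholders, not Datrium-specific values or tooling.

```python
# Minimal sketch: an NFS-backed Docker volume via the docker Python SDK.
# The server address and export path below are hypothetical placeholders.
import docker

client = docker.from_env()

# Create a named volume whose data lives on an NFS export rather than
# on the container host's local disk.
vol = client.volumes.create(
    name="appdata",
    driver="local",
    driver_opts={
        "type": "nfs",
        "o": "addr=192.0.2.10,rw,nfsvers=3",   # hypothetical NFS server
        "device": ":/exports/appdata",          # hypothetical export path
    },
)

# Any container that mounts the volume sees the same persistent data,
# regardless of which host it is scheduled on.
client.containers.run(
    "alpine",
    "sh -c 'echo hello > /data/hello.txt'",
    volumes={vol.name: {"bind": "/data", "mode": "rw"}},
)
```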
Right.
So we're really excited about it.
We have a bunch of new dialogues going on with customers
who are focused not only on containers, as an example, for lightweight development, but also on, you know, KVM, because they have some efforts in doing cloud-based development.
A lot of the best work on public clouds is also Linux-based or container-based.
So this gives a whole new market access to our kind of simplified data services.
Does your system run in the public cloud?
I mean, how would that work, I guess?
Well, no, it doesn't today.
We've announced that we're going to have backup sort of snap archiving
in the public cloud at the end of the year.
And that'll be, it just looks like one more replica site to our policy engine
for doing snapshots and replication.
And that'll be on Amazon.
Okay.
And that'll also support snapshots of containers or Linux VMs as well.
Yeah.
Certainly when you say KVM, part of me thinks OpenStack.
Cinder driver coming soon, I hope?
We're not announcing it today, but we have some investigation going on into it. The fans of OpenStack and Cinder
are clear-voiced and...
I like that term, yeah.
...have, you know, significant interest in it.
It's not as broad as the general Linux space,
so we're starting with the sort of bigger territory,
and then we'll, you know, see where the action is.
Okay.
You know, I did some research recently
about data locality in the HCI world,
and the more I looked at it
and the more I thought about
what all the ramifications
of trying to keep data local
to the host where the VM is running
when that VM is one of 200 clones
from a single golden master
or you have data deduplication.
And the more I dug into it,
the more I liked the way
you guys manage it, where
you have the performance advantage because the local flash in the host keeps a copy of the data
of those VMs, but you don't have the movement downsides that real HCI does where, you know,
that host went offline and now I have to rebuild and copy things around.
Yeah, I appreciate that. We always thought the same way. If you sort of entangle hosts with all
the problems of storage, it seems like you're kind of taking a step backward. So that said, the speed advantages of on-host Flash are significant.
So we found our balance was, I don't know, it made more sense.
It means hosts can be managed as servers used to be in the array world.
You can take them up or down as you need to for maintenance without having to worry about things like rebuilds. Yeah, for me, it was, you know, one of the nicest things about virtualizing was that the amount of
state in my hosts went down substantially. And so I didn't have to work nights and weekends to do
CPU maintenance. And then HCI seems to have taken that away and I want it back.
Okay, we can help you with that.
That was an open question.
So Docker containers, I mean, there are gazillions of these things out there.
You know, if you have a persistent volume, you know, on a per container basis,
I mean, how many of these things can you actually have?
Well, actually, yeah, quite a lot.
It's sort of an order of magnitude more than VMs, which is maybe an order of magnitude more than physical OSs on hosts would otherwise be.
So it's a lot.
Part of the wizardry of our engineering team was to build a data management system that anticipated that. So we can have literally millions of snapshots in our
system, and it doesn't affect performance at all. So it's a, you know, you have to think differently
from the ground up to get to that. So, for example, if you're setting snapshot
policies, you set them on related groups of VMs or containers. And that can be either a list, or it can use what we call dynamic binding:
by file name, you can set a pattern.
And if a new clone comes up that matches the pattern,
it'll be automatically bound into the policy that has that pattern.
So you don't have to go back and rework your sort of backup policies to deal with all of these small granularity changes.
So it's based on like a nomenclature specification that you could say that, you know,
for these guys, you want a snapshot on an eight-hour basis or something like that.
And anybody that fits into that pattern, file name pattern, gets that automatically applied.
Right, like, you know, database VM star.
Yeah.
So in the same way, first of all, you need a lot of granularity of snapshot definition
and metadata to deal with all of that.
But second, you need accelerators like these kind of dynamic binding approaches or search.
In our approach, you can search for any of these names across many, many hundreds of thousands of instances
and find them quickly.
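A small illustrative sketch of the dynamic-binding idea described above, not Datrium's actual policy engine or API: a protection policy carries a filename-style pattern, and any new VM, clone, or container whose name matches is bound to that policy automatically. The policy fields, schedules, and names are hypothetical.

```python
# Illustrative sketch of "dynamic binding": a policy carries a filename-style
# pattern, and any new VM, clone, or container whose name matches inherits the
# policy's snapshot schedule automatically. All names here are hypothetical.
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class ProtectionPolicy:
    pattern: str                 # e.g. "database-vm-*"
    snapshot_every_hours: int
    retention_days: int
    replicas: list[str] = field(default_factory=list)
    members: set[str] = field(default_factory=set)


def bind(instance_name: str, policies: list[ProtectionPolicy]) -> None:
    """Attach a newly created VM/container to every policy it matches."""
    for policy in policies:
        if fnmatch(instance_name, policy.pattern):
            policy.members.add(instance_name)


policies = [
    ProtectionPolicy("database-vm-*", snapshot_every_hours=8,
                     retention_days=30, replicas=["dr-site"]),
    ProtectionPolicy("build-container-*", snapshot_every_hours=24,
                     retention_days=7),
]

# A new clone shows up and is bound without any manual policy edits.
bind("database-vm-042", policies)
assert "database-vm-042" in policies[0].members
```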
Yeah, and before you adopted the Linux support, I would have said, but how about tags in vCenter?
But cross-platform, that breaks down.
So, yep, we'll put it in the name.
Yeah, I mean, it's possible.
We're also looking at tags as a possible way to do cross-platform.
I wouldn't say we've nailed it yet,
but there's some interesting dialogue about how it might be done.
Why Docker versus Kubernetes or, you know, the container world in general, I'd say?
Great question.
And it's really just sort of a timing issue.
We are working on Kubernetes.
It'll probably come out a little later.
But we didn't have the sort of final frosting on it.
And that world in general is changing very quickly.
It was only a couple of weeks ago, it felt like, that the Kubernetes guys
put in a couple of things that make our approach fit much easier. So Kubernetes will be following
soon.
Yeah, but you thought the container world was ripe for this sort of thing?
I'd say it's an evolving turf. It's certainly the right time to get in. Containers are, you know, certainly an emerging success story. The
methodology for using them across different environments is still evolving pretty quickly,
and the thing that is maybe earliest in the cycle of best practices
is how to deal with persistence.
The original idea for containers was,
we'll run these ephemeral microservices,
so if I need to transcode this from MP3 to MP4,
I'll spin up a container, it'll transcode the one file, and it'll go away. And now I'm starting
to see applications just distributed as Docker containers, so that the software vendor
doesn't have to have installation instructions for 14 different Linux distributions and doesn't have to
worry about all of the dependencies.
Just say, here, it's a Docker container.
It'll work.
Right.
So one place where even our own internal software development tools folks have used it is in
things like our own test and dev, where they're regularly updating code and then trying a new harness with it.
Having the new ability to be able to clone or snapshot a persistent volume for a container
and have it be in the namespace of all the other hosts in the group,
to be able to reuse that and plug things differently quickly
is a major step forward.
So they had been doing it with VMs;
doing it with containers is a very quickly evolving landscape,
and we're happy to jump in.
Yeah.
So in the press release,
I saw there was another feature, split provisioning.
I don't quite get it.
So could you tell us what that is?
Yeah, as it turns out, that's a very separate topic.
All the stuff we've been talking about so far has been on the host side.
And if you recall, we have this host side, which does all the data service work and CPU and caching.
And then we have a separate layer that does persistence.
Right.
In the persistence layer, it's a separate enclosure with drives for storing data durably.
It also, because that's where the writes go,
that's where write bandwidth is gated.
So it's a combination of how much capacity... right now we support one of those things; it has 12 drives, so it can
support about 800 megabytes per second in write throughput, and it's 30 terabytes post erasure coding and
spares, so, you know, before deduplication and compression. So we usually say 100 terabytes effective capacity.
Yeah, that's probably fair.
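Back-of-the-envelope arithmetic on those per-data-node figures; the data-reduction factor is simply whatever makes the quoted 30 TB usable line up with the "100 terabytes effective" claim, and real reduction will vary by workload.

```python
# Rough per-data-node capacity math from the figures quoted above.
usable_tb = 30                  # per data node, after erasure coding and spares
implied_reduction = 100 / 30    # implied dedup + compression factor, ~3.3x
effective_tb = usable_tb * implied_reduction
print(f"~{effective_tb:.0f} TB effective per data node")   # ~100 TB
```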
In our new release, instead of a host doing erasure-coded striping across those 12 drives,
it can do an inline load-balanced erasure-coded stripe across 10 of those enclosures.
So up to a petabyte of capacity. And the stripes are written directly to individual drives. We've got some pretty advanced techniques to make sure all the hosts don't run into each
other. They balance writes across all the drives, but because it's all one group, we have these amazing sort of
beneficial side effects. One is sort of obvious: you get a lot of capacity. Another is it's
incrementally scalable, and in a balanced way. So if you need more, you know, either CPU or
read IOPS, you add a host. That comes just sort of with every host. If you need more write
bandwidth or durable capacity, that comes with every data node. The write bandwidth is also a
sort of linear growth. So as you add more, it just gets faster by that same share. Because
all the hosts are working together in some respects
for administrative things, things like rebuilds use all the resources across
the group. So if you have, you know, a group that's four times the size of our
current product, rebuilds go four times faster.
Right. So you said stripe across 10 data nodes.
I assume you meant up to 10.
Up to 10.
Yeah, because I don't scale from 1 directly to 10, right?
Is it still a 12-drive stripe across those nodes?
The stripe is actually 2 parity and 8 data.
So it's 10 sort of chunks to ten drives.
Okay.
And so then once we get to ten enclosures, the...
Well, that's all we've tested to.
Right, but at that point,
you could lose a whole enclosure
and still have eight plus one.
In our current sort of failure model,
each data node has two controllers,
two small motherboards.
The group of data nodes, or a pool,
can support up to one controller failure per data node
plus two drive failures across the group at the same time.
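A tiny sketch of the stripe geometry and fault tolerance just described; it models layout and overhead only, under the figures quoted in the conversation, not the actual erasure-coding math.

```python
# Illustrative geometry of the pool-wide stripe: 8 data chunks + 2 parity
# chunks, each landing on a different drive across the data-node pool.
DATA_CHUNKS = 8                                 # data chunks per stripe
PARITY_CHUNKS = 2                               # parity chunks per stripe
STRIPE_WIDTH = DATA_CHUNKS + PARITY_CHUNKS      # 10 chunks, one per drive

storage_efficiency = DATA_CHUNKS / STRIPE_WIDTH     # 80% of raw holds data
drive_failures_survived = PARITY_CHUNKS             # any 2 drives in the pool

print(f"stripe width       : {STRIPE_WIDTH} drives")
print(f"capacity efficiency: {storage_efficiency:.0%}")
print(f"survives           : {drive_failures_survived} concurrent drive failures "
      "(plus one controller per data node)")
```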
Okay.
And those also, they all include NVRAM, so writes are very fast.
Right.
So I guess the other question, so you guys support NVMe SSDs without any problem, anything
that runs on the host, I guess, right?
That's right.
We've had production customers for more than a year with NVMe on hosts. Okay, that's great. Collectively also, we've grown the number of hosts. In our
first release, it was 32 hosts max and one of these data nodes. In the new release, it's up to 128
hosts across up to 10 of the data nodes. Each host is somewhere north of 100k IOPS.
So it's like 12 million IOPS in the whole thing.
Somewhere north of 8 gigabytes per second in write bandwidth.
As my uncle Groucho once said,
that's okay for me, but I got a partner.
Right.
Yeah, it gets pretty big.
You said 8 gigabytes per second bandwidth, right?
Across the 10 nodes, yeah. Each one is about 800 gig. Oh, that's great. 800 meg, rather, I said.
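Multiplying out the maximums quoted above gives the headline numbers; the per-unit figures are the podcast's, and the arithmetic is just illustrative.

```python
# Pool-scale totals at the quoted maximums.
hosts, iops_per_host = 128, 100_000      # "somewhere north of 100k" per host
data_nodes, write_mb_s = 10, 800         # ~800 MB/s write per data node
effective_tb_per_node = 100              # ~100 TB effective per data node

print(f"read IOPS : ~{hosts * iops_per_host / 1e6:.1f} million")              # ~12.8 M
print(f"write BW  : ~{data_nodes * write_mb_s / 1000:.0f} GB/s")              # ~8 GB/s
print(f"capacity  : ~{data_nodes * effective_tb_per_node / 1000:.1f} PB effective")  # ~1.0 PB
```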
Yeah. Yeah, yeah, yeah. And if I remember your go-to-market properly, you guys will sell me the
compute nodes, or I can use the servers from my favorite server vendor and my favorite SSDs, right?
That's right, yeah.
We announced that last quarter.
So we offer both a fully turnkey system offering where we supply the data nodes as well as the compute nodes,
or the compute node software is available, with a compatibility list, for all the leading vendors' servers and their flash drives.
Mm-hmm.
Yeah, well, that's nice because one of my, you know,
my other problem with HCI is you end up paying storage markups for your compute.
Yes, you do.
It's priced with a gross margin model of arrays.
And even on our compute nodes, we don't do that.
Our compute nodes are kind of like a web price for a leading vendor server with a very small transaction fee.
Right.
And if you add more SSDs, that's up to you.
Our license fee is flat per host.
Oh, okay. So I buy data nodes, and then I can buy compute nodes, which gives me one throat to choke.
Or I can keep using my Dell or Lenovo servers and pay you guys a per host license.
That's right.
Cool.
So back to this petabyte of storage.
And I'm not sure what's the right term.
Can a persistent volume be up to a petabyte of storage?
You mean in a container sense?
Well, yeah, even in a Linux KVM sense.
Yeah, I don't know the answer to that.
In VMware, a virtual disk can't get that big.
Yes, I understand that.
That's a good question.
I'll look up on containers.
I'm not sure.
Yeah, okay.
Yeah, but it is the VM, not the data store, that's concerned about this.
Right.
Right.
Yeah, I got you.
I got you.
Well, it's kind of an off-the-wall question.
Yeah, when you go, yes, and I need one VMDK, and it has to be four petabytes.
Yeah, we don't run into that often.
You might want to look at some application issues there.
Yeah.
Yes, it's my roll-the-world application.
Yeah.
You know, we currently support up to 16 terabytes of raw flash per host,
and that's inline deduplicated and compressed and isn't
RAIDed. So you really get all of that plus some data reduction factor.
Yeah, which really means if that host is dealing with about 50 terabytes of data,
it's all going to be read from the local flash.
It's all fast. And we've had one customer who's in the sort of fast financial calculation world who required that. And so we did that. It's a synthetic limit. We could grow bigger. We just haven't run into accounts that needed it yet.
Right.
Of course, it could be NVMe too, which is...
Of course, yeah.
And that customer probably is.
No doubt.
Maybe.
No doubt.
Right, and those are the customers who won't actually tell you as well, exactly.
Yeah, I'm not allowed to say who it is.
No doubt, no doubt.
All right, well, gents, this has been great.
Howard, is there anything you'd like to ask as final question?
No, I think we got it covered.
I mean, there's a good set of announcements for you guys, Brian. I appreciate it. You've broken the scaling limit. You've added support
for Linux directly and therefore KVM and containers. So this is really good.
I appreciate it. And always a pleasure to talk to you guys.
Yeah. Brian, is there anything you'd like to say to our audience before we cut off here?
You know, check us out. A lot of people think that, you know, hyperconvergence is kind of winding down on R&D.
And, you know, that might be true. We're really not hyperconverged, but there's a lot of other stuff going on. So check us out.
Yeah, you have an interesting solution that has many of the advantages of
hyperconvergence without some of its disadvantages. Yeah. You know, clearly you're not a ROBO solution,
but that's fine. Yeah. And so, you know, we're getting a lot of great pickup from it. Our last
quarter, we grew 130% from the prior quarter. There's a lot of good dialogue going on.
Can't complain about that.
No, you can't.
All right, gents.
Well, this has been great.
Brian, thanks to you and Datrium for sponsoring our podcast today.
Our pleasure.
Always a pleasure to talk to you.
On our next monthly podcast, we'll talk to another storage technology startup person.
Any question you want us to ask, please let us know.
That's it for now.
Bye, Howard.
Bye, Ray. Until next time.