Grey Beards on Systems - 49: Greybeards talk open convergence with Brian Biles, CEO and Co-founder of Datrium
Episode Date: August 15, 2017
Sponsored By: Datrium
In this episode we talk with Brian Biles, CEO and Co-founder of Datrium. We last talked with Brian and Datrium in May of 2016, and at that time we called it deconstructed storage. These days, Datrium offers a converged infrastructure (C/I) solution, which they call “open convergence”. Datrium’s C/I solution stores persistent data …
Transcript
Hey everybody, Ray Lucchesi here with Howard Marks.
Welcome to a sponsored episode of Greybeards on Storage monthly podcast, a show where we
get greybeard storage and system bloggers to talk with storage and system vendors to
discuss upcoming products, technologies, and trends affecting the data
center today. This Greybeards on Storage podcast is brought to you by Datrium and was recorded on
July 19th, 2017. We have with us here today Brian Biles, CEO and co-founder of Datrium. So Brian,
why don't you tell us a little bit about yourself and what's new with Datrium?
Hey guys, I'm Brian Biles.
You may have known me as a founder of Data Domain some years ago; these days I'm a co-founder and CEO at Datrium.
And we've been selling our product for a little over a year.
We discussed in the last Greybeards podcast the kind of product that we're building.
It's a converged infrastructure with a couple of twists.
And if you want, I can go over that quickly.
And we're here today to talk about some new announcements having to do with Linux support and balanced scaling.
Yeah, I think a short recap of the basic architecture would come in handy.
Okay, great.
So we're a converged infrastructure.
We run storage operations on hosts, kind of in the spirit of software-defined storage.
What we do differently is all of the persistent data is stored off-host in a separate kind of enclosure.
So hosts scale speed with software that runs on them and local flash drives for high read locality,
and they use host-local CPU for things like calculating erasure codes and dedup and compression
and snapshots and cloning and so on.
But all the data is written off-host for persistence to a separate place,
so the hosts don't actually talk to each other,
making scaling very easy and linear and predictable because you don't get noisy-neighbor interference across hosts.
The hosts can fail in any combination or be brought down for maintenance
because that doesn't affect
persistent data access unlike a hyper-converged model. But the place you keep all the persistent
data while it looks kind of like an array is somewhat less intelligent than that because
the hosts are doing all the thinking, right? That's exactly right. So it's first kind of
low cost because it doesn't have much CPU to
do anything. But second of all, it's just a very scalable model for the CPU required for things
like IO. And as you add hosts, you don't run into the problem that arrays have of controllers
bottlenecking on just the compute involved in storage access. So that also plays out in replication and that sort of stuff as well, right?
That's right.
So last quarter, we announced a whole series of features for VM-specific snapshotting and
cloning and replication.
All of that also uses the software on hosts to do the work.
In replication, it means that all of the hosts are doing the data movement for replication between sites.
So if you have some number of hosts go down, replication keeps going, which is different from, again, a hyperconverged model.
That's great.
So what's new with Datrium?
Great question.
We have two major things that we're announcing.
First, on the host side, we've always been VMware-centric and VM-centric.
We just announced two pretty significant deltas from that.
We're now multi-hypervisor. So we've partnered with Red Hat and Docker in support of simultaneously being able
to support not only VMware VMs, but also Linux KVM VMs, as well as Docker containers, all with
the same data services that we've had for VMware in the past. So snapshotting is on a per VM basis
or now per container.
Same with cloning, same with, you know,
when you do replication,
you set up policy groups of, you know,
related VMs or containers,
and they all have the same snapshot policy settings
for timing and retention
and which replicas, you know, to send data to
and that sort of thing.
Okay, can I mix some vSphere hosts and some KVM hosts with the same back-end shelf?
Yeah, absolutely. The shelf is now called a data node. That's a change in terminology.
But yes, that is exactly the idea. A lot of our customers have, you know, maybe VMware as the driving IT focus, but they might have a, you know, development arm that's doing some Linux-based work or container-based work.
Our new approach allows you to manage all of those from the same pane of glass.
So they're just VMs.
A given host will be one or the other, right?
With, you know, if we're supporting Linux,
it's on bare metal.
It's not Linux as a guest.
So the data node is shared across the mixed hypervisors?
And it's a full partnership, you know.
We're partnering with Red Hat
to support sort of ongoing ideas
about how to interact with all the stack development that they're doing.
So all the folks that are taking Linux seriously with a support agreement with Red Hat,
this comes as a really pleasant way to consolidate all of those workloads.
You guys install your code and look like a block driver to Linux, and then?
Like with VMware and our approach there, we look like an NFS share.
So we've optimized and tested
for virtual disk style files.
So, you know, virtual disks for KVM
or persistent volumes for Docker.
Not kind of any file,
but it uses the sort of NFS friendliness
of how to set up a share.
Yeah, and certainly Docker and NFS get along very well.
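For readers who want to see what that looks like in practice, here is a minimal sketch of an NFS-backed Docker persistent volume using the docker Python SDK. The NFS server address and export path are hypothetical placeholders, not Datrium-specific values or tooling.

```python
# Minimal sketch: an NFS-backed Docker volume via the docker Python SDK.
# The server address and export path below are hypothetical placeholders.
import docker

client = docker.from_env()

# Create a named volume whose data lives on an NFS export rather than
# on the container host's local disk.
vol = client.volumes.create(
    name="appdata",
    driver="local",
    driver_opts={
        "type": "nfs",
        "o": "addr=192.0.2.10,rw,nfsvers=3",   # hypothetical NFS server
        "device": ":/exports/appdata",          # hypothetical export path
    },
)

# Any container that mounts the volume sees the same persistent data,
# regardless of which host it is scheduled on.
client.containers.run(
    "alpine",
    "sh -c 'echo hello > /data/hello.txt'",
    volumes={vol.name: {"bind": "/data", "mode": "rw"}},
)
```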
Right.
So we're really excited about it.
We have a bunch of new dialogues going on with customers
who are focused not only on containers, as an example, for lightweight development, but also on, you know, KVM, because they have some efforts in doing cloud-based development.
A lot of the best work on public clouds is also Linux-based or container-based.
So this gives a whole new market access to our kind of simplified data services.
Does your system run in the public cloud?
I mean, how would that work, I guess?
Well, no, it doesn't today.
We've announced that we're going to have backup sort of snap archiving
in the public cloud at the end of the year.
And that'll be, it just looks like one more replica site to our policy engine
for doing snapshots and replication.
And that'll be on Amazon.
Okay.
And that'll also support snapshots of containers or Linux VMs as well.
Yeah.
Certainly when you say KVM, part of me thinks OpenStack.
Cinder driver coming soon, I hope?
We're not announcing it today, but we have some investigation going on into it. The fans of OpenStack and Cinder
are clear-voiced and...
I like that term, yeah.
...have, you know, significant interest in it.
It's not as broad as the general Linux space,
so we're starting with the sort of bigger territory,
and then we'll, you know, see where the action is.
Okay.
You know, I did some research recently
about data locality in the HCI world,
and the more I looked at it
and the more I thought about
what all the ramifications
of trying to keep data local
to the host where the VM is running
when that VM is one of 200 clones
from a single golden master
or you have data deduplication.
And the more I dug into it,
the more I liked the way
you guys manage it, where
you have the performance advantage because the local flash in the host keeps a copy of the data
of those VMs, but you don't have the movement downsides that real HCI does where, you know,
that host went offline and now I have to rebuild and copy things around.
Yeah, I appreciate that. We always thought the same way. If you sort of entangle hosts with all
the problems of storage, it seems like you're kind of taking a step backward. So that said, the speed advantages of on-host Flash are significant.
So we found our balance was, I don't know, it made more sense.
It means hosts can be managed as servers used to be in the array world.
You can take them up or down as you need to for maintenance without having to worry about things like rebuilds. Yeah, for me, it was, you know, one of the nicest things about virtualizing was that the amount of
state in my hosts went down substantially. And so I didn't have to work nights and weekends to do
CPU maintenance. And then HCI seems to have taken that away and I want it back.
Okay, we can help you with that.
That was an open question.
So Docker containers, I mean, there are gazillions of these things out there.
You know, if you have a persistent volume, you know, on a per container basis,
I mean, how many of these things can you actually have?
Well, actually, yeah, quite a lot.
It's sort of an order of magnitude more than VMs, which is maybe an order of magnitude more than physical OSs on hosts would otherwise be.
So it's a lot.
Part of the wizardry of our engineering team was to build a data management system that anticipated that. So we can have literally millions of snapshots in our
system, and it doesn't affect performance at all. So it's a, you know, you have to think differently
from the ground up to get to that. So, for example, if you're setting snapshot
policies, you set them on related groups of VMs or containers. And that can be either a list, or it can use what we call dynamic binding:
by file name, you can set a pattern.
And if a new clone comes up that matches the pattern,
it'll be automatically bound into the policy that has that pattern.
So you don't have to go back and rework your sort of backup policies to deal with all of these small granularity changes.
So it's based on like a nomenclature specification that you could say that, you know,
for these guys, you want a snapshot on an eight-hour basis or something like that.
And anybody that fits into that pattern, file name pattern, gets that automatically applied.
Right, like, you know, database VM star.
Yeah.
So in the same way, first of all, you need a lot of granularity of snapshot definition
and metadata to deal with all of that.
But second, you need accelerators like these kind of dynamic binding approaches or search.
In our approach, you can search for any of these names across many, many hundreds of thousands of instances
and find them quickly.
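A small illustrative sketch of the dynamic-binding idea described above, not Datrium's actual policy engine or API: a protection policy carries a filename-style pattern, and any new VM, clone, or container whose name matches is bound to that policy automatically. The policy fields, schedules, and names are hypothetical.

```python
# Illustrative sketch of "dynamic binding": a policy carries a filename-style
# pattern, and any new VM, clone, or container whose name matches inherits the
# policy's snapshot schedule automatically. All names here are hypothetical.
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class ProtectionPolicy:
    pattern: str                 # e.g. "database-vm-*"
    snapshot_every_hours: int
    retention_days: int
    replicas: list[str] = field(default_factory=list)
    members: set[str] = field(default_factory=set)


def bind(instance_name: str, policies: list[ProtectionPolicy]) -> None:
    """Attach a newly created VM/container to every policy it matches."""
    for policy in policies:
        if fnmatch(instance_name, policy.pattern):
            policy.members.add(instance_name)


policies = [
    ProtectionPolicy("database-vm-*", snapshot_every_hours=8,
                     retention_days=30, replicas=["dr-site"]),
    ProtectionPolicy("build-container-*", snapshot_every_hours=24,
                     retention_days=7),
]

# A new clone shows up and is bound without any manual policy edits.
bind("database-vm-042", policies)
assert "database-vm-042" in policies[0].members
```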
Yeah, and before you adopted the Linux support, I would have said, but how about tags in vCenter?
But cross-platform, that breaks down.
So, yep, we'll put it in the name.
Yeah, I mean, it's possible.
We're also looking at tags as a possible way to do cross-platform.
I wouldn't say we've nailed it yet,
but there's some interesting dialogue about how it might be done.
Why Docker versus Kubernetes or, you know, the container world in general, I'd say?
Great question.
And it's really just sort of a timing issue.
We are working on Kubernetes.
It'll probably come out a little later.
But we didn't have the sort of final frosting on it.
And that world in general is changing very quickly.
It was only a couple of weeks ago, it felt like, that the Kubernetes guys
put in a couple of things that make our approach fit much easier. So Kubernetes will be following
soon.
Yeah, but you thought the container world was ripe for this sort of thing?
I'd say it's an evolving turf. It's certainly the right time to get in. Containers are, you know, certainly an emerging success story. The
methodology for using them across different environments is still evolving pretty quickly,
and the thing that is maybe earliest in the cycle of best practices
is how to deal with persistence.
The original idea for containers was,
we'll run these ephemeral microservices,
so if I need to transcode this from MP3 to MP4,
I'll spin up a container, it'll transcode the one file, and it'll go away. And now I'm starting
to see applications just distributed as Docker containers, so that the software vendor
doesn't have to have installation instructions for 14 different Linux distributions and doesn't have to
worry about all of the dependencies.
Just say, here, it's a Docker container.
It'll work.
Right.
So one place where even our own internal software development tools folks have used it is in
things like our own test and dev, where they're regularly updating code and then trying a new harness with it.
Having the new ability to be able to clone or snapshot a persistent volume for a container
and have it be in the namespace of all the other hosts in the group,
to be able to reuse that and plug things differently quickly
is a major step forward.
So they had been doing it with VMs;
doing it with containers is a very quickly evolving landscape,
and we're happy to jump in.
Yeah.
So in the press release,
I saw there was another feature, split provisioning.
I don't quite get it.
So could you tell us what that is?
Yeah, as it turns out, that's a very separate topic.
All the stuff we've been talking about so far has been on the host side.
And if you recall, we have this host side, which does all the data service work and CPU and caching.
And then we have a separate layer that does persistence.
Right.
In the persistence layer, it's a separate enclosure with drives for storing data durably.
It also, because that's where the writes go,
that's where write bandwidth is gated.
So it's a combination of how much capacity... right now we support one of those things; it has 12 drives, so it can
support about 800 megabytes per second in write throughput, and it's 30 terabytes post erasure coding and
spares, so, you know, before deduplication and compression. So we usually say 100 terabytes effective capacity.
Yeah, that's probably fair.
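Back-of-the-envelope arithmetic on those per-data-node figures; the data-reduction factor is simply whatever makes the quoted 30 TB usable line up with the "100 terabytes effective" claim, and real reduction will vary by workload.

```python
# Rough per-data-node capacity math from the figures quoted above.
usable_tb = 30                  # per data node, after erasure coding and spares
implied_reduction = 100 / 30    # implied dedup + compression factor, ~3.3x
effective_tb = usable_tb * implied_reduction
print(f"~{effective_tb:.0f} TB effective per data node")   # ~100 TB
```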
In our new release, instead of a host doing erasure-coded striping across those 12 drives,
it can do an inline load-balanced erasure-coded stripe across 10 of those enclosures.
So up to a petabyte of capacity. And the stripes are written directly to individual drives. We've got some pretty advanced techniques to make sure all the hosts don't run into each
other. They balance writes across all the drives, but because it's all one group, we have these amazing sort of
beneficial side effects. One is sort of obvious: you get a lot of capacity. Another is it's
incrementally scalable, and in a balanced way. So if you need more, you know, either CPU or
read IOPS, you add a host. That comes just sort of with every host. If you need more write
bandwidth or durable capacity, that comes with every data node. The write bandwidth is also a
sort of linear growth. So as you add more, it just gets faster by that same share. Because
all the hosts are working together in some respects
for administrative things, things like rebuilds use all the resources across
the group. So if you have, you know, a group that's four times the size of our
current product, rebuilds go four times faster.
Right. So you said stripe across 10 data nodes.
I assume you meant up to 10.
Up to 10.
Yeah, because I don't scale from 1 directly to 10, right?
Is it still a 12-drive stripe across those nodes?
The stripe is actually 2 parity and 8 data.
So it's 10 sort of chunks to ten drives.
Okay.
And so then once we get to ten enclosures, the...
Well, that's all we've tested to.
Right, but at that point,
you could lose a whole enclosure
and still have eight plus one.
In our current sort of failure model,
each data node has two controllers,
two small motherboards.
The group of data nodes, or a pool,
can support up to one controller failure per data node
plus two drive failures across the group at the same time.
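A tiny sketch of the stripe geometry and fault tolerance just described; it models layout and overhead only, under the figures quoted in the conversation, not the actual erasure-coding math.

```python
# Illustrative geometry of the pool-wide stripe: 8 data chunks + 2 parity
# chunks, each landing on a different drive across the data-node pool.
DATA_CHUNKS = 8                                 # data chunks per stripe
PARITY_CHUNKS = 2                               # parity chunks per stripe
STRIPE_WIDTH = DATA_CHUNKS + PARITY_CHUNKS      # 10 chunks, one per drive

storage_efficiency = DATA_CHUNKS / STRIPE_WIDTH     # 80% of raw holds data
drive_failures_survived = PARITY_CHUNKS             # any 2 drives in the pool

print(f"stripe width       : {STRIPE_WIDTH} drives")
print(f"capacity efficiency: {storage_efficiency:.0%}")
print(f"survives           : {drive_failures_survived} concurrent drive failures "
      "(plus one controller per data node)")
```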
Okay.
And those also, they all include NVRAM, so writes are very fast.
Right.
So I guess the other question, so you guys support NVMe SSDs without any problem, anything
that runs on the host, I guess, right?
That's right.
We've had production customers for more than a year with NVMe on hosts. Okay, that's great. Collectively also, we've grown the number of hosts. In our
first release, it was 32 hosts max and one of these data nodes. In the new release, it's up to 128
hosts across up to 10 of the data nodes. Each host is somewhere north of 100k IOPS.
So it's like 12 million IOPS in the whole thing.
Somewhere north of 8 gigabytes per second in write bandwidth.
As my uncle Groucho once said,
that's okay for me, but I got a partner.
Right.
Yeah, it gets pretty big.
You said 8 gigabytes per second bandwidth, right?
Across the 10 nodes, yeah. Each one is about 800 gig. Oh, that's great. 800 meg, rather, I said.
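Multiplying out the maximums quoted above gives the headline numbers; the per-unit figures are the podcast's, and the arithmetic is just illustrative.

```python
# Pool-scale totals at the quoted maximums.
hosts, iops_per_host = 128, 100_000      # "somewhere north of 100k" per host
data_nodes, write_mb_s = 10, 800         # ~800 MB/s write per data node
effective_tb_per_node = 100              # ~100 TB effective per data node

print(f"read IOPS : ~{hosts * iops_per_host / 1e6:.1f} million")              # ~12.8 M
print(f"write BW  : ~{data_nodes * write_mb_s / 1000:.0f} GB/s")              # ~8 GB/s
print(f"capacity  : ~{data_nodes * effective_tb_per_node / 1000:.1f} PB effective")  # ~1.0 PB
```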
Yeah. Yeah, yeah, yeah. And if I remember your go-to-market properly, you guys will sell me the
compute nodes, or I can use the servers from my favorite server vendor and my favorite SSDs, right?
That's right, yeah.
We announced that last quarter.
So we offer both a fully turnkey system offering where we supply the data nodes as well as the compute nodes,
or the compute node software is available, with a compatibility list, for all the leading vendors' servers and their flash drives.
Mm-hmm.
Yeah, well, that's nice because one of my, you know,
my other problem with HCI is you end up paying storage markups for your compute.
Yes, you do.
It's priced with a gross margin model of arrays.
And even on our compute nodes, we don't do that.
Our compute nodes are kind of like a web price for a leading vendor server with a very small transaction fee.
Right.
And if you add more SSDs, that's up to you.
Our license fee is flat per host.
Oh, okay. So I buy data nodes, and then I can buy compute nodes, which gives me one throat to choke.
Or I can keep using my Dell or Lenovo servers and pay you guys a per host license.
That's right.
Cool.
So back to this petabyte of storage.
And I'm not sure what's the right term.
Can a persistent volume be up to a petabyte of storage?
You mean in a container sense?
Well, yeah, even in a Linux KVM sense.
Yeah, I don't know the answer to that.
In VMware, a virtual disk can't get that big.
Yes, I understand that.
That's a good question.
I'll look up on containers.
I'm not sure.
Yeah, okay.
Yeah, but it is the VM, not the data store, that's concerned about this.
Right.
Right.
Yeah, I got you.
I got you.
Well, it's kind of an off-the-wall question.
Yeah, when you go, yes, and I need one VMDK, and it has to be four petabytes.
Yeah, we don't run into that often.
You might want to look at some application issues there.
Yeah.
Yes, it's my roll-the-world application.
Yeah.
You know, we currently support up to 16 terabytes of raw flash per host,
and that's inline deduplicated and compressed and isn't
RAIDed. So you really get all of that plus some data reduction factor.
Yeah, which really means if that host is dealing with about 50 terabytes of data,
it's all going to be read from the local flash.
It's all fast. And we've had one customer who's in the sort of fast financial calculation world who required that. And so we did that. It's a synthetic limit. We could grow bigger. We just haven't run into accounts that needed it yet.
Right.
Of course, it could be NVMe too, which is...
Of course, yeah.
And that customer probably is.
No doubt.
Maybe.
No doubt.
Right, and those are the customers who won't actually tell you as well, exactly.
Yeah, I'm not allowed to say who it is.
No doubt, no doubt.
All right, well, gents, this has been great.
Howard, is there anything you'd like to ask as final question?
No, I think we got it covered.
I mean, there's a good set of announcements for you guys, Brian. I appreciate it. You've broken the scaling limit. You've added support
for Linux directly and therefore KVM and containers. So this is really good.
I appreciate it. And always a pleasure to talk to you guys.
Yeah. Brian, is there anything you'd like to say to our audience before we cut off here?
You know, check us out. A lot of people think that, you know, hyperconvergence is kind of winding down on R&D.
And, you know, that might be true. We're really not hyperconverged, but there's a lot of other stuff going on. So check us out.
Yeah, you have an interesting solution that has many of the advantages of
hyperconvergence without some of its disadvantages. Yeah. You know, clearly you're not a ROBO solution,
but that's fine. Yeah. And so, you know, we're getting a lot of great pickup from it. Our last
quarter, we grew 130% from the prior quarter. There's a lot of good dialogue going on.
Can't complain about that.
No, you can't.
All right, gents.
Well, this has been great.
Brian, thanks to you and Datrium for sponsoring our podcast today.
Our pleasure.
Always a pleasure to talk to you.
On our next monthly podcast, we'll talk to another storage technology startup person.
Any question you want us to ask, please let us know.
That's it for now.
Bye, Howard.
Bye, Ray. Until next time.