Grey Beards on Systems - GreyBeards deconstruct storage with Brian Biles and Hugo Patterson, CEO and CTO, Datrium

Episode Date: May 10, 2016

In this our 32nd episode we talk with Brian Biles (@BrianBiles), CEO & Co-founder, and Hugo Patterson, CTO & Co-founder, of Datrium, a new storage startup. We like to call it storage deconstructed, a new... view of what storage could be based on today's and future storage technologies. If I had to describe it succinctly, …

Transcript
Starting point is 00:00:00 Hey everybody, Ray Lucchesi here with Howard Marks here. Welcome to the next episode of Greybeards on Storage, a monthly podcast, a show where we get Greybeards storage and system bloggers to talk with storage and system vendors to discuss upcoming products, technologies, and trends affecting the data center today. This is our 32nd episode of Greybeards on Storage, which was recorded on April 29, 2016. We have with us here today Brian Biles and Hugo Patterson, CEO and CTO of Datrium. Brian, why don't you tell us a little bit about Datrium? Sure. Thanks for having us. First, just a little bit about our background. I was a
Starting point is 00:00:45 founder of a company called Data Domain that got bought by EMC a couple of years ago. Hugo was the chief architect there, and he can tell you a little bit more about his background. So we all left EMC after a couple of years. Hugo and I and his sort of second in command, a guy named Cezala Reddy, had agreed to try to think of something to do together afterward. While we were taking our time to do that, we were contacted, actually, Hugo was contacted by Diane Green, who, you know, from VMware days, now running Google infrastructure. She had a couple of guys that had been working for 12 years at VMware as their sort of VP level principal engineers, who had just gone through a cycle of trying to design a database
Starting point is 00:01:27 and finding that storage performance was holding them back. It was too sort of inelastic for all the properties they could think of for their database. So they'd like to develop a new storage system. They saw a gap in the market. One thing led to another. We hung out for six months, and Daytrium happened. That's a little bit of background on me and that story. Hugo? Yeah, I've been in the storage area for a long time, starting with a PhD
Starting point is 00:01:53 from Carnegie Mellon and then went to NetApp and built SnapVault there and, you know, really helped get the near store product line going and then join data domain and serve there as chief architect and then CTO. So I'm excited by this new opportunity. I think I never imagined we could do something as comparable to, to that, but I actually, you know, now that we're into this, I think Datrium is even more exciting. So I'm really happy about it. So this is kind of an outgrowth of a Google problem with their database?
Starting point is 00:02:34 No, no, no, VMware. And it wasn't even a VMware problem. Ganesh Venkateshwaran and Boris Weissman were the two guys at VMware that we met. They were just leaving VMware. They had been building a database project and that product came out. They were just frustrated in their interactions with possible storage systems that storage was kind of too rigid for all the things they thought a database should
Starting point is 00:02:56 be able to do. So that was their starting point. So Flash doesn't really fix everything? Well, it didn't at that time. And still, depending on your implementation, there are trade-offs in different kinds of array structures. So we were looking at something that could do the job a different way. We had come from a couple other debates that were also, I think, formative. I was really interested when it first came out in Fusion.io, but I kept wondering, where's the storage system that goes with that? Where's the data services and how do you orchestrate a cluster to work collaboratively and so on? And it just didn't feel like there was an early answer.
Starting point is 00:03:36 They had other problems with commoditization. And what do you do when the server crashes? It was always my problem is servers are inherently unreliable. That's right. You can put an SSD in a server and, you know, that thing will go pretty fast. But yeah, it's just a storage device. It's not a storage system. And so you don't get the enterprise, you know, storage features that most people want. Okay, so what's different about Datrium then? SAN or Ethernet, and then on the other side of a controller, just defeats the whole point of the low latency flash. So flash belongs in the server. But really what Datrium is about is, you know,
Starting point is 00:04:33 turning that flash in the server into an enterprise class storage system. So adding the resiliency that tolerates host failure, adding data management features to it, and, you know, really turning just what would otherwise just be a storage device into a storage system. And in the process of doing that, you know, we learned, figured out some other kind of interesting things like, gee, those hosts have a lot of cores. And really, that's a lot cheaper place to do storage processing than in a siloed storage control. In a virtualized infrastructure environment, the whole idea is to have a nice, flexible, expandable pool of compute resources. And in the face of that you the standard array
Starting point is 00:05:28 architecture plonks down a single purpose inflexible uh very expensive storage controller and that just seemed like not a great fit just to put this back into perspective this was 2012 when we were starting to have these discussions. A lot of things in other parts of infrastructure and technology were moving to take further advantage of growing numbers of cores and RAM and virtualization. You know, we were talking about in-memory databases basically moving all reads to be local from RAM, but still writing to something off-host. Network function virtualization, you know, bandied around then by other names, but still kind of in a nice era, moving more of the control plane of networking into host software, but still leaving, you know, switching on appliances and TCAM if needed and so on.
Starting point is 00:06:23 So it seemed like, you know, done in the right way, there was a lot of impetus to reconsider storage, and Flash in particular was an interesting point of inflection. But we were concerned with how to both keep some of the dependable properties of enterprise storage for quality of service, for troubleshooting, for, you know, management, you management, not taken in a direction that wouldn't allow it to scale in a very manageable way. Okay.
Starting point is 00:06:51 And what did you end up with? So the Datrium DBX product really has two pieces. And so the first piece is a hyperdriver software, which gets installed as a vib on a VMware ESX host. And this software leverages local flash within that host to serve IOs at super high speed. And it also organizes the data for compression and deduplication and RAID and then writes it out to the second piece of the product, which is an external net shelf storage appliance. Basically, it's a network attached shelf of drives, but with a little bit of smarts in there to make sharing among the hosts nice and smooth. So the hyperdriver does most of the computing that you normally think of as being done inside of an array controller, but none of the durable storage is there.
Starting point is 00:08:01 All of the durable capacity is in an external dedicated net shell. God, it's almost like you're taking this thing and taking an inside out view of this thing. It's the latest thing in the culinary world. We deconstructed storage. It is. It is. You've, God. Well, you know, everybody else was in a couple of years back. They were all talking about Flash and the server as a caching layer. And there was some smarts that was going to be developed for it across different hosts and everything. But all the real smarts was in the storage. You've kind of moved all the smarts back to the host, but the only thing that's in the storage is the persistence layer. Is that a read?
Starting point is 00:08:42 Pretty much. Just a footnote to that. The net shelf is an appliance. Mechanically, it's a little bit like an array. It has dual controllers for HA. It has NV RAM for fast write persistence. We took advantage of the basic tricks. And hot swap everything for supportability, maintenance, and so forth. Right. But what it doesn't have is all the sort of compute cores and RAM and software that you'd normally put in an array to do the heavy lifting.
Starting point is 00:09:14 All of that moved to hosts to use local flash for caching and local cycles for processing, which makes the speed grow as you add hosts instead of, you know, a normal array where the speed is divided if you add hosts. Yeah, and that's a big deal these days because storage computing is so much more involved than it used to be. You know, it used to be an array controller kind of mapped directly to physical spindles and, you know, it computed parity for RAID 5 or something, or maybe not even, and it just did mirroring. But now we have Reed-Solomon coding for RAID 6. Data gets compressed. It gets fingerprinted for deduplication and, you know, snapshotted and all the rest of it. And all of these operations are a lot of sense now to leverage that same compute pool of the hosts for these operations. And even with the storage controllers getting beefier and
Starting point is 00:10:35 beefier, we still have the problem that with a lot of systems, you can't turn all the features on without running out of CPU overhead. Of course. I mean, there's always a limit. I think that there's also an important observation that comes in having the external device. So it's nice that we can scale these things the way we do. Once you go to write, we really wanted to incorporate the levels of isolation, not just the locality of Flask,
Starting point is 00:11:01 but the isolation of data elements that you see in a sort of bigger scale environment. And by that, I mean, it didn't make sense to us any way we added it up to do writes to other hosts. It made sense to write that to a separate place off host, because otherwise, you know, there are many examples today of emerging product that don't go that way. We felt that it was hard to get management isolation for troubleshooting and performance tuning and so on if you were to get the neighbor noise of post writing to each other. That's one of the principal reasons to have a separate device to do.
Starting point is 00:11:36 So does the back end have flash or is it a disk base or is it both? And does it support DRAM caching? And how do you do snaps snapshots because snapshots is kind of a you know a management level functionality but it involves a lot of metadata stuff. Start from the basics the net shelf is disks it has a little bit of flash just for NVRAM vaulting in case of a power outage but it's not writing or reading from flash in a normal IO event. And it's really not where the caching happens. So there's not a huge DRAM cache. It's really just, you know, you want to store stuff persistently on disk,
Starting point is 00:12:14 you write to it, you want to read from the disk, you read from it. The Flash on the hosts and RAM on the hosts is really where all the caching action happens. And so all of those IOs are absorbed by the host and they don't have to go over the network at all. So that's a huge win from a performance standpoint. Right. And just to put a little bit of a packaging face on that, when we sell a system, you get this net shelf and up to 32 hosts worth of licenses for the hyperdrivers that run on servers, which is as much as support today. You don't have any fees for the hyperdrivers. We don't sell the flash on hosts. You buy those from your server vendor.
Starting point is 00:12:59 So we don't sell the servers. We don't sell the servers either. So it's meant for existing infrastructure with the assumption that you're probably going to get flash cheaper from your server vendor than you are from any array vendor. It's all commoditizing at this point. And we don't want to get in the way of that. We want to promote it. So if we support up to eight SSDs or eight terabytes of flash per host. Yeah, you're also going to get those servers cheaper than you would from a hyper-converged vendor. Right. Yeah, certainly the margins for some of the hyper-converged appliances look a bit steep. Well, they're trending toward array margins, right? Right. We decided that that business was just going away. It was going to be commodity.
Starting point is 00:13:43 It was going to be server vendor territory, and we should support that and make just going away. It was going to be commodity. It was going to be server-vendor territory, and we should support that and make it work better. Yeah, so the 8 terabytes of flash are the read and write cache at the host layer. At the host. Yeah. At a host. At a host, and it could be any number of virtual machines, obviously, supporting that.
Starting point is 00:13:59 Do hyperdrivers support VMware, vSphere? Does it support things like other hypervisors? Not today. Today, our entry product is just VMware. Okay. I'm a little confused about the persistence model now. Yeah. So, Ray, you mentioned that the write cache was on the host.
Starting point is 00:14:18 And really, while written data is added to the host side cache, all writes go through to the net shelf. Synchronously. Synchronously. That's right. So there's NVRAM on the net shelf so that write latencies are low. Performance is good. But every single write is persistent on the net shelf before it is acknowledged to the VM. Okay, so the host is a write-through cache. That's correct. Okay. And that means there's no unique data on the host, so if a host crashes,
Starting point is 00:14:53 we haven't lost anything. That's correct. In fact, if all hosts crash simultaneously, you haven't lost anything. Okay. Frankly, that, I've discovered in testing, is a problem with some of the hyper-converged solutions. They don't properly deal with the data center-wide power failure problem. I defer to your testing. You know, NVRAM, and I think it's an important part of what an enterprise storage system is. And so our NetShelf appliance includes it. And in that sense, provides really that kind of availability and durability that you expect from an enterprise storage system that, you know, just a software running on a generic server really can't achieve. So the protocol between the host and the net shelf is your own protocol?
Starting point is 00:15:52 Is it like iSCSI or is it? No, it's our own. In the hyperdriver, the software we run on the host, that presents itself to vSphere locally looking like NFS, like an NFS map. So we capture the data locally with NFS, NFS terminates there, and then we use our own protocol to write to the disks on the back end. Part of the reason for this is we do all the RAID computing on the hosts, so we had to be able to identify each drive more or less directly.
Starting point is 00:16:23 Okay, so what happens when there's a drive failure in the net shelf? Does a host have to do the rebuild? There's two parts of that. So if the drive has failed and it hasn't been reconstructed yet, then on the fly, the host will have to do the rebuild for that data. It also does various kinds of error correcting and so forth. But then to reconstruct the failed drive, yes, the RAID is a distributed process. And so all of the hosts would participate together to reconstruct any failed drives. Okay. It's a nice benefit.
Starting point is 00:17:02 You know, the bigger the cluster, the faster it goes. Right, right. failed drives. Okay. It's a nice benefit. You know, the bigger the cluster, the faster it goes. Right. Right. And it's a many to many rebuild from many engines. Right. Correct. Very unusual. Quite frankly,
Starting point is 00:17:13 I, I, I see some performance advantages. I see some performance disadvantages from doing that sort of rebuild. I understand the scalability aspect of it, but you end up moving the data to and from the storage device to the host and back again. Yeah, it's going to generate a lot of network traffic.
Starting point is 00:17:28 Right, right, right. The following question was a snapshot. So do you support snapshots? Probably the first question. And the second question is, where is that metadata, if it is metadata management, done? The current release doesn't support snapshots. We couldn't get it in in time for our first release, but it is in active develop now and the system was architected for it from the beginning. So it should be coming soon. Yeah, but we do have clones. We do have clones,
Starting point is 00:18:00 correct. And we support the AI interface for that. So in that sense, we support that in the VMware environment. As far as the metadata for it, we have a log-structured file system, and so it's sort of right into a new location. We don't do the kind of read, move, write kind of style of some, some systems. And so the tracking of the data in the snapshots is done in the same way for, you know, as normal, right. So it's all of this. I mean, today the metadata along with everything else is stored in the net shelf, as is the namespace where NFS file service. There is, though the NFS is terminated on each host, it is one data store for the whole cluster.
Starting point is 00:18:57 So if things like vMotion work seamlessly, metadata is stored with the data in the nutshell. So a log structured file system, to some extent, you're maintaining some pointers or some free space kind of thing. Is that sort of maintained across all the hosts? I mean, obviously, the metadata has to be in the storage itself. But if I'm doing a rebuild, for instance, I have to know which devices are currently working, which ones aren't currently working working and where the new data is going to go and all that stuff. Yeah. I mean, that's a lot of the secret sauce of the is exactly how that is handled. And the key thing is that we've figured out a way where, you know, you can have one host doing a write and another host could be reading the same data and the hosts still don't have to talk to each other to coordinate to get this to happen.
Starting point is 00:19:55 Not that that's a problem in the vSphere world because one host owns one VMDK. That's right, but you may vMotion it from one to another, and then another host needs to be able to quickly pick up where the other one left off. But yes, you're absolutely right. There's not concurrent writing to the same vDisk. But to lay the data out on the disk and so forth is all done in a way that doesn't require host chatter back and forth. And this is really why we have proprietary protocol. Yeah, I'm seeing something with a logical log for each host and the net shelf allocating space. Yeah, so we have a different kind of naming for data that makes sharing much easier. Okay.
Starting point is 00:20:47 SANS block addresses, the traditional SCSI kind of interface, was not designed with sharing in mind. And it really was designed for a single writer. And so it's not a good fit for a product like this. No, although, you know, an NFS is better, but... NFS is better, but it's the wrong level, right? So we have more like a volume management kind of internally. Right, but you get the additional data that NFS is giving you, even though you use it differently. Yes, that's right. From a VM management perspective, we have a lot more insight into what a VM is and so forth. And this allows us to report, for example, stats on individual VMs.
Starting point is 00:21:37 So we have a very VM-centric kind of management model. You can see IOPS and latency and so forth on each individual VM, which makes it a lot easier to tell what's going on with your environment. Sure does. So when you implement snapshots, I assume that you're going to do that on a VM, not a data store level, right? Correct. Okay. And do you know if the first implementation of that will support the vVols UI in SPBM? Or if you were attaching with NFS, you don't have to. You could kind of fake it the way Tintree did. That's right.
Starting point is 00:22:21 NFS, if you're smart about it, you don't really need the vBalls. It's all there available for you. And already in our performance reporting metrics, we take advantage of that. So no, it won't support vBalls, but you'll have all the same kinds of functionality and more. Okay. What happens when I outgrow a net shelf? In our first product, you just, you know, it's not growable. Next year, we'll add both capacity and write speed expansion.
Starting point is 00:22:55 That's not. Okay. And how much capacity do I get in a net shelf? It's about 30 terabytes usable post, you know, and spare allocation and so on. It's a global dedupe domain. So all writes are compressed and then there's dedupe on top. So you figure with thin data, maybe 100 terabytes.
Starting point is 00:23:19 Depends on the dedupe and compression ratio. So it's a global dedupe realm managed across N hosts. Hash management becomes interesting. Since I jumped into that tarpit. There are two dedupe domains. So just to start with compression, all data is compressed in all places, on the flash, in the host, as well as across the network and in the nutshell. There are two dedupe domains. The host flash domain is deduped inline and is local to that host. We don't have a global pool of host flash and try to dedupe across the hosts.
Starting point is 00:23:59 So it's local and inline. The writes are, to the net shelf shelf compressed but not yet deduped. The global compression on the net shelf is done as a post-write space reclamation process that takes a few hours after the write is laid down. Oh, okay. So it's deduped on the host at local level and then it's written to the back end and then deduped as a space reclamation solution, garbage collection kind of thing. Yeah. Totally. Yeah.
Starting point is 00:24:31 In a sort of post-process. You know, within a few hours, within a day, it doesn't take that long. Yeah. It does complicate capacity planning, though. A little bit. But it's worth understanding. So, you know, we always make sure we explain it. We actually have quite a, say, sophisticated space reporting to simplify that.
Starting point is 00:24:53 So you really don't have to be too expert in it. We have a view to how much space we'll get back, and we take that all into account when we present the space available. And so actually in practice, it's pretty straightforward. Yeah, it was helpful in this case to have not only Hugo, but his following replacements from data domain. Deep knowledge. I kind of assume Hugo understands deduplication. He's only got, what, half a dozen patents on it, Hugo? Half a dozen dozen, yeah.
Starting point is 00:25:25 I was actually in a meeting a couple months ago where they said that you can't do dedupe on disk. I said, what about data domain? Give me a break. Yeah. I would say we're among the very, very few products in primary storage that dedupes by default on disk. Yeah, I would say so. A lot of them have it as an option, and you have to think about it pretty carefully. In our case, it's, you know, all the time. You think that's because you've got all that CPU cores that are available
Starting point is 00:25:56 at the host that you can tap into? Doesn't hurt. Yeah. So that's definitely part of it, but then also understanding the intersection of disk performance and deduplication based on 10 years of building and supporting this kind of systems. Yeah, that might help too. And so these are 7200 RPM disks, right? Correct. And so there are some corner cases where you could really hammer those positioners. Yeah. And so it's a RAID 5 backend or is it RAID 6? RAID 6. But backend, of course, not really the right word here. Yeah. I'm trying to get my handle on what it actually is here.
Starting point is 00:26:37 The data is stored in RAID 6. Data is stored with RAID 6, with each host writing the full RAID 6 stripes. Okay. And that post-process dedupe uses the host processors, right? Naturally. Yeah, we have a distributed execution engine, if you will. So it's basically a way that we can leverage the host to do these kinds of storage tests and whether we use it for running the space reclamation process, but also the disk rebuild process, but also various other kinds of things. You guys mirror data across, I mean, do you support disaster recovery scenarios kind of thing? We currently don't have our own built-in replication.
Starting point is 00:27:27 We're partnering with vendors that do in the short term. In the same time frame that we're talking about snapshots, we'll start to introduce some of those features. Okay. And then so we've got RAID rebuild jobs and space reclamation jobs that are allocated to the hosts, are you load balancing that and taking into account the other CPU load on the hosts? Yeah, so in general, the hyperdriver constrains itself to a particular CPU footprint, and it doesn't exceed that amount, that slice that is allocated to it. So you won't see the hyper driver like taking over your host and the VMs suffering. You just, you know, there's a slice allocated for us and we live within that.
Starting point is 00:28:27 Okay, Yuko, you've got to tell us how big that is. It's about 20% of available cores, and that's done by reservations. If it turns out we're not using them, then VMs can use them. But we have dibs if we need them. Okay. Optionally, you can increase that to insane mode and go even faster. Right. That would be insane.
Starting point is 00:28:51 But it's kind of cool, actually. So worldwide averages for VM CPU utilization are about 30%, 35%. So this won't be for everybody. Some people try to go to much higher levels. Yeah, but most people run out of RAM before they run out of CPU cycles. Yeah. So if you have capacity, we give you an optional button to go to insane mode, and we would reserve 40% of the cores. And you shouldn't do that if you're using more than 40% for your own VMs. But if you have the space, that allows us to, you know, and you have the offered load for storage IO processing, we can go much
Starting point is 00:29:26 faster by doing it. So in this environment, is there a master host configuration? I mean, you talk about this distributed task engine and that sort of stuff. No, it's the net shelf. It's the net shelf that has the distributed task engine that's divvying up the workload that's required to all the hosts that it has access to. And it knows kind of what those hosts are doing on a periodic basis so it can determine which ones are the ones that are idle. Didn't sound like it is that sophisticated yet, Ray. So it does divvy the jobs up to all of the hosts that are part of the DVX cluster. But, you know, it does spread the load over those hosts
Starting point is 00:30:09 pretty much evenly with them, you know, according to the resource allocated to them. It doesn't yet, it's not too sophisticated yet about the load on the host, but that's really because we don't expand to use more of the host CPU than what's been allocated. Yeah, I only meant to say as a sort of simplifier for coordination, the net shelf is a place where we can sort of centralize some of that feature set. So we don't have to ask the admin to make decisions about which host to
Starting point is 00:30:45 put it on and how they all you know interact so the dvx cluster does it have to span the whole vm v-sphere cluster or can it be a portion of the v-sphere cluster or can it span v-sphere clusters so it can be any any subset of vms that you want i mean not vms hosts that you want. I mean, not VMs hosts that you want. Okay. So the vMotion environment that could span across vSphere clusters then, almost, if you did it, right? Oh, so that's a vSphere... Limitation, yeah. Yeah, but it's like, to vSphere, it would look like an NFS mount point on a NetApp. You could have one cluster that uses it or two.
Starting point is 00:31:25 But just thinking about vMotion, obviously it's a very different context than an array in a lot of ways. But one of the things that this model allows us to do is sort of dynamic adaptation of speed for performance, you know, quality of service reasons. We mentioned insane mode. That's sort of a dynamic button on a single host. Our most normal sort of planning discussion with customers is about vMotion. If you run out of performance in this model, you're typically running out of either CPU or Flash. You can add Flash, but the easiest thing to do is look around the group
Starting point is 00:32:04 for a host that has headroom and just do a v motion standard v motion the data in the transition we've done one optimization normally hosts don't talk to each other in this case they talk a little when you v motion from an originator to a destination host the destination will do reads from the origin host's flash cache until it warms up. So it warms up the cache right off the old cache? You're kidding. That's pretty good. Yeah. Well, it cuts down a lot on the demand on the disks in the net shelf. Absolutely. That's right. So it's a little bit of transient host-to-host traffic, but it goes away. The physical interface on the net shelf is 10 gigabit Ethernet?
Starting point is 00:32:46 Correct. So how much memory does the hyperdriver consume? I mean, you talked about up to 8 terabytes of flash, but you didn't mention DRAM. It's 7.5 gig of RAM for the first 1 terabyte of flash, and then 2.5 gig for every terabyte thereafter. Need room for those hash tables. Basically, yes.
Starting point is 00:33:08 Yeah. And that seven and a half includes all the sort of fundamentals, you know, dealing with RAID and there's all the sort of standard operation. Yeah, and running the VMs, managing, you know, all of the structures for each, the many VMs. Right, right. And how many 10-gig ports on the net shelf? Two per controller, and there are two controllers.
Starting point is 00:33:33 And even I can do that math, four. Yeah. As we all can, Howard. I'm just saying. Okay, so we talked about snaps. We talked about replication. We talked about QoS. You mentioned there was some QoS capabilities in the system, but at a VM level, are you able to?
Starting point is 00:33:53 Yeah, not maybe in the way that you're asking the question. I mentioned it in a general sense. We can, because the hosts are configurable specifically, if you have a host that's doing databases, you can add more CPU and RAM as well as more flash. For hyperdriver, yeah, yeah, yeah. So in a particular configuration, you can customize the storage compute and read capacity quite a bit. Separately, if you run out of either of those things for storage, it's a very simple transition to either dynamically use more CPU with this sort of insane mode approach, or the motion to a host with more headroom.
Starting point is 00:34:36 And in this respect, quality of service, because it's more dynamically manageable, I didn't mean it in the sense of setting like an upper or lower limit. Yeah. manageable i didn't mean it in the sense of setting it like an upper or lower limit yeah yeah i meant it more in the sense of uh simple uh manageability it's manageable and tunable i calling it qos is a little bit of a stretch yeah no no no it's it it it's better because you know in a traditional array where you're taking the resources of of a controller and you want to sort of protect the performance of of a client right of either one server or one vm well first of all in a traditional array they have no idea what a VM is, so just forget that part. They don't even typically know about servers, but they may know about LUNs. They may know about volumes.
Starting point is 00:35:32 But in any event, you end up trying to tune settings to reserve resources for this or that application. Maybe you pin this or that LUN into RAM or something like that. And so you're having to go in and manage it like that. Whereas here, sort of just by virtue of the host not sharing, not pooling across them, whatever resources are on that host are used for that VM or that set of VMs. And you don't have to go tune some other knob, right? And so first of all, you add more hosts, you get more of these resources. But whatever resources are there on that host, they're for
Starting point is 00:36:21 the VMs that they're hosting. And so it's a very simple correspondence of resources to application that's using it that's just way easier to think about. So VMware is, you know, tries to train people to think about what a VM is as, you know, some number of cores and some amount of RAM. And if your host doesn't have enough cores or RAM, then, you know, some number of cores and some amount of RAM. And if your host doesn't have enough cores or RAM, then, you know, vMotion the VM to some host that does. And we extend that model to storage to say, well, if it doesn't have enough storage performance, that is, it doesn't have enough of a local flash cache, then move it to a host that does have enough. Or cores. Or cores, right?
Starting point is 00:37:12 So manage your storage performance as part of VM performance generally in the same way that you already manage VM performance. So you just do it with using the exact same kind of thinking and mechanisms, and you don't have to become expert in in an arcane art of of array you know tuning right but we've spent 30 years getting expert in the arcane art of array tuning and we are that's why we're graybeard i would say also just on on you know the the marketing use of quality qos mostly that where i've seen that, that's been like in all flash arrays where you set, you know, there's some peak that's possible and you can de-rate against that peak. You know, you can find some lower number that you can set as your target SLA. In this case, you can actually go higher by going somewhere else. So it's a very different way of thinking.
Starting point is 00:38:01 Now you've got me wishing that DRS and storage DRS could take 20 more parameter inputs from stuff like yours to make those decisions. We remain cautiously optimistic that we can make that case with VMware. Yeah, and it's not even unique to you. It's just storage DRS makes its decisions entirely based on latency and DRS just looks at CPU and RAM. And, you know, anytime you start saying, and I've got a flash resource in the server. Right. Then when I'm deciding I want to move things around, I might want to move things around because I'm out of flash on this server and I want to move it to the one that's got a lot of flash on that server. And so now I want DRS to be smart about those resources too.
Starting point is 00:38:46 Right. All right. We are running out of time here. Well, just one final thing on that same vein is there's the amount of flash, but there have a huge amount of flash that is super high performance and other hosts can have garden variety consumer SSDs. And you can choose, you know, which host to host which app. And in the DVX model, that kind of heterogeneity, that kind of variety among all of the hosts is very natural and easy. And just a final commercial element, because we only do caching on the host instead of durable data, it works great with blade servers that only have two drive bays, which is very difficult for
Starting point is 00:39:38 some of the other sort of emerging models that try to use host resources like a hyper- Yeah, I remember having interesting Twitter conversations about vSAN on Blade servers being a bad idea and getting the fanboys going, but you could do it. Could and should are different, guys. Right.
Starting point is 00:39:56 We're a great model for those environments. Howard, do you have any last questions? No, I think I got it. We live in interesting storage times. God, I'll have to say that, yeah. Ten years ago, it was like, well, do you want the dual controller modular array or do you want the big thing? It was really the only architectural decision you had, and now a thousand flowers are blooming, and I like it.
Starting point is 00:40:22 Yeah, you're right. They have deconstructed storage and reconstructed in a whole different world here. I applaud you guys for doing this. Let's, let's hope it, it, it makes it for you guys.
Starting point is 00:40:35 Well, I appreciate the thought. Yeah. Thanks for the time today. Well, this has been great. It's been a pleasure to have Brian Hugo with us on our podcast next month. We'll talk to another storage startup technology person. Any questions you want
Starting point is 00:40:46 us to ask, please let us know. That's it for now. Bye, Howard. Bye, Ray. Until next time, thanks again, Brian and Hugo. Thanks, guys. Thank you.
