Grey Beards on Systems - Graybeards talk hyper-convergence with Kelly Murphy, Founder & CTO, Gridstore

Episode Date: November 10, 2014

In our 14th podcast we return to hyper-converged systems and talk with Kelly Murphy, Founder and CTO of GridStore. Gridstore is a startup supplying hyper-converged systems for Microsoft (Hyper-V) virtualization environments. Howard and I had a chance to talk with Gridstore at SFD4, just about a year ago. Gridstore has recently added an all-flash version of their …

Transcript
Starting point is 00:00:00 Hey everybody, Ray Lucchesi here and Howard Marks here. Welcome to the next episode of Greybeards on Storage, the monthly podcast show where we get greybeard storage and system bloggers to talk with storage and system vendors to discuss upcoming products, technologies and trends affecting the data center today. Welcome to our 14th episode of Greybeards on Storage, which was recorded on October 31, 2014. We have with us here today Kelly Murphy, CTO of GridStore. Why don't you tell us a little bit about yourself and your company, Kelly? All right.
Starting point is 00:00:42 Thanks. Thank you for having me on the show today. As Howard knows, we are the leading manufacturer of chocolate-covered pickles. What are chocolate-covered pickles, by the way? Congratulations, Kelly. You managed to get two of my favorites into the one sentence. And contrary to popular opinion, they taste great. Oh, God.
Starting point is 00:01:08 I was in Florida and there was, what is it? Tomatoes. Fried tomatoes. Fried green tomatoes. Oh, God. Back to the technology, Kelly. Right. So we are a leading
Starting point is 00:01:23 manufacturer of all-flash hyper-converged infrastructure. All-flash hyper-converged infrastructure. I've heard a lot about hyper-converged infrastructure, but very few talk about all-flash these days. So why don't you give us a little idea of what you're doing there? Well, before Kelly goes on, the chocolate-cover covered pickle line comes from a blog post of mine where about a month ago before anybody announced an all flash hyper converged system i said i didn't think it was a great idea so and then there was at least two announcements yes that was a little plug for howard that's good that's good um But yeah, so what we have done is we've taken
Starting point is 00:02:07 probably the two hottest trends or concepts or directions in storage and put the two together. And when you think about it, what we've really done is put Flash as close to the workload as possible. So when you hyperconverge, you're putting compute and storage together into the same box. You're eliminating a lot of complexity and cost. And, you know, from a performance perspective, what we've done is put Flash as close to that workload as possible. So it's, you know, and there are a number of challenges
Starting point is 00:02:40 with that, but when you actually see the results, it's a very powerful architecture. And this flash is like SSD flash or PCIe flash or both or neither? Correct. Yeah, correct. So if you're familiar with like the VMware Evo product and the kind of architecture behind that, so the appliance is four independent compute, all connecting out over Ethernet. They're high-powered compute nodes. And 24 1-terabyte SSDs in the front of the chassis.
Starting point is 00:03:15 Six of those for each of the nodes. So you've got six terabytes per node. And we can grow that from kind of a starting position of that single appliance of four nodes up to 64 of those appliances or 256 independent nodes. So you can kind of scale that compute and storage into something that looks like a very powerful compute pool and storage pool. So the storage SSDs are separate nodes in your configuration? Is that how you say it? No, they're combined. So the compute, you know, in our software architecture, we have, and this is, you guys, you've seen the architecture before.
Starting point is 00:03:57 So we have a, we call it virtual controller. That sits in the host or on the compute side. That's what we use to access the distributed pool of storage that's coming from other nodes in the grid. On each of those, call it storage nodes, when they were separate, we have another process that's running there that's listening for virtual controllers making requests to it. We effectively put those two pieces of software now on the same host or compute node. One is worrying about handling the storage requests coming in. The other is about handling the storage requests coming from that particular host that it's actually running on.
Starting point is 00:04:37 So effectively, you've got the storage and the compute all in the same node running both client and server software, as it were. Correct. In a configuration and that sort of stuff. And both run on the Windows Server R2 2012. Oh, okay, okay, okay. Kelly, I think we should take a step back a little. Sure. And talk about, A, the fact that you guys really are Windows-centric and the storage architecture, because you guys have only been in the hyper-converged market a little while.
Starting point is 00:05:10 You were selling the same technology as a scale-out storage cluster. Correct. Let's talk about how that works a little. Yeah, so it is the exact same architecture. I mean, one of the interesting things that we've had a couple of things that we've been working towards getting to this position, but the underlying architecture that we have started with is exactly the same. Okay, so as Ray kind of pointed out, there's a client portion to this and a server portion, and they're distributed, right? So it's a true peer-to-peer architecture. Prior to the hyper-converged, as I said, we would put that
Starting point is 00:05:51 virtual controller on a number of hosts that want to access storage. What that presents, and it runs in the Windows kernel, it presents a local SCSI device, a block device to that host. The host sees it, it can put its file system on it, it can put a clustered shared volume on top of that, and that device can be shared by many different hosts. So it handles the SCSI reservations and so on. Behind that can be any number of storage nodes, up to 256 of storage nodes, up to 256, uh, independent storage
Starting point is 00:06:27 nodes, which again are just, uh, in the old model, they were one-use servers and we still have these. They were one-use servers, you know, had between 12 to, uh, we have a two-use unit that has 48 terabytes of capacity in it. And they were hybrid, uh, storage nodes. And each of those guys would contribute their storage into a pool, and then the virtual controllers, when they write data, are breaking data up. They use erasure coding to write to those nodes. So say if they're writing to six nodes, they might
Starting point is 00:06:57 break that up into six parts, encode those parts so that they could lose two of them, okay, and then distribute the six parts in parallel across six nodes. And that would be, you know, how it would write. When it would read, looking for that block, it would talk to those particular six nodes, put those pieces back together again, and serve them up to the host. And that gave us a very scalable architecture in that we could start with a minimum of three of these nodes and then grow that, as I said, as far up to 256 nodes in a single pool.
Starting point is 00:07:33 Yeah, and as George and I proved by blowing one of them up with some thermite... Wait a minute! Did you get this on video? Oh, yeah, yeah, yeah. You haven't seen it yet? I will guess I'll have to go back and take a look. No, it's GridStore's CEO, George Simons, and I went out in the desert and pretended we were the Mythbusters
Starting point is 00:07:55 and took out one node of a four-node cluster while it was running. And because the virtual controller runs on the compute node, the whole detection of failure and failover happens much faster. Yeah, you would think. And so... Yeah, and so it's a resilient, fault-tolerant architecture. We can lose multiple nodes concurrently. We could lose a connection, a network connection, a disk drive, you know, or entire nodes as some people like to pour thermite onto them.
Starting point is 00:08:31 Just in case, right? It makes good video. Oh, yeah, yeah. HP did the same thing with, like, you know, a data center, I think. They took out a data center, which was, you know. Well, they used actual explosives. Yeah, well. I don't have those licenses.
Starting point is 00:08:46 Thank God, Howard. I'm not sure I trust my podcast with you if you did and stuff like that. And I think Howard's done this before because, from what I know, he had to go a long way out into the desert in order to do this. So I think the authorities are watching Howard closely. From now on, yeah. Unfortunately true. Yeah, probably for more than one reason, Mike. So you're now using a super micro twin-two style cabinet
Starting point is 00:09:13 like EvoRail does, where there's four nodes in a two-U cabinet. Correct. Of the 24 terabytes of raw capacity, how much of that do I get to use? That depends on, again, how wide the stripes are. But if you, say, just took that starting unit, that starting appliance, you could use three of the four. So you would have three data nodes and one redundant node. So 75% of it. Okay. Well, that's certainly better than EvoRail where I could use half if I liked two-way mirroring and I don't.
Starting point is 00:09:50 Yes. Or if you wanted a three-way mirror, you'd get a third of it. Right. Yeah. So I guess it begs the question, and we've perhaps talked about this before, but why just Windows? Why not VMware or KVM and the other players in that space? These are some of the bets that an early stage company makes. So when we came to market, there was a lot of players already in the VMware space.
Starting point is 00:10:18 And for the things that we wanted to do, which was to operate in the kernel, which is essential. When you see the architecture, which was to operate in the kernel, which is essential. When you see the architecture, that's an essential component of it, is to operate at that level. And Windows, as an architecture, is very well suited for that. It has all the proper APIs and hooks to allow you to do that. And so we were able to achieve what we wanted to, but when you look at it from a go-to-market perspective,
Starting point is 00:10:53 we were able to achieve what we wanted to, but when you look at it from a go-to-market perspective, we were able to bring something to market that is focused on the Windows ecosystem where we had a lot less competition to deal with. And it's really that simple. And now with the hyper-converged, the all-flash hyper-converged, we've taken that up another level. And there isn't really another product in this class. There's one other competitor that we deal with with a similar product, but we're very competitive against. And just so it makes the playing field a lot smaller for us to deal with. So do you use any of the, you know, so in Windows Storage 2012, they've got a lot of new capabilities in SMB and stuff like that. Do you use any of that
Starting point is 00:11:26 stuff? We, you know, this is a philosophy of ours is that, you know, this is really storage that is built for virtualization, okay? And as a whole architecture now, when we put the compute and the storage together, it is really built to give you a scalable, elastic infrastructure for a highly virtualized environment. And part of that, everything that we do is managed top-down through System Center. So that's the first aspect. When you start to talk about the scale of these environments and the complexity in these environments, you need a single place to manage that. And for us, that's system center. The other thing that we've seen happen and we believe is the right course
Starting point is 00:12:10 is that a lot of the traditional data services that used to live in the traditional storage array, okay, on the other side of the network that really had no concept of what's happening in the virtualized environment, those services are moving into the hypervisor or up into that layer. And they're being driven from, in the case of Windows, from the Windows server or the hypervisor. And you're able to drive those on a per VM basis. So you can replicate a VM, you can snapshot a VM,
Starting point is 00:12:39 you can dedupe a VM. And we believe that's the right way. Instead of duplicating that functionality in the hardware, what we've focused our engineering effort on is leveraging all of those types of services that are coming from the hypervisor and focusing on how we can provide a truly elastic infrastructure that's very cost-effective and high-performance and simple to use. Yeah, I was thinking more like the SMB Direct and, you know, the RDMA kinds of capabilities that are out there and stuff like that. Do you use any of that stuff? It's our own protocol that's connecting between the two. So, you know, to that host, as I said, we provide what looks like a SCSI block device, right?
Starting point is 00:13:24 Very standard, okay? It supports the SCSI 3 reservations and so on. But behind it, it's our protocol, okay? And this gives us a number of advantages. You know, most protocols were not designed for virtualization, right? Or for scale-out, for that matter. Or for scale-out, yeah. So you're trying to cobble, you know, something from a few decades ago into these architectures.
Starting point is 00:13:49 You just you know, there's just a lot of workarounds. And a good example of what we're able to do is we can see where the IOs are actually coming from because we're in the kernel we can see you know block by block which vm is actually producing these ios and we create an isolation lane for each virtual machine and we continue that lane that isolation all the way from the host uh through to the underlying storage nodes that it's talking to and we literally have like a pair of sockets, inbound, outbound sockets for each of these VMs. So we truly keep the IOs separate across these. That alleviates one of the biggest problems in virtual environments, which is the IO blender and all this IO blending together. You have no visibility, and if you can't see it, you can't control it. And that's really the second capability that we bring is once we're able to isolate that I.O. into these separate lanes, then we can actually turn a dial and say, okay, I'm going to throttle this guy down, and I want to give this other one more. So we can provide a true end-to-end quality of service for each of these virtual machines.
Starting point is 00:15:03 And when it comes to these very scalable environments, and this is, I think, a very important point, is that when you build a, you know, if you rolled out the infrastructure, as I talked about, say, you know, 256 of these nodes, that's an awful lot of compute power and an awful lot of storage power if it's all Flash. That's a petabyte and a half of Flash.
Starting point is 00:15:25 But it's this giant machine, right? If you can't control how you want to allocate that resource, it's kind of pointless, right? So if you want to run, you know, hundreds or thousands of virtual machines on top of this, you need to be able to allocate that resource in real time where you want it to. So if you think about service providers and stuff like that who want to offer different levels of service on a truly shared infrastructure, that's an essential capability for these guys to have so that they can actually make a promise,
Starting point is 00:15:54 they can guarantee the delivery of that, and then over time they can actually track that that was actually delivered and show that result back to their customers. Yeah, yeah, yeah. Okay. Now, we've been talking about the all-flash version. You guys haven't decided that you're going to pull a Nimbus and stop selling spinning
Starting point is 00:16:16 disks, right? No, we have spinning disks. You know, so we have, and, you know, just to step back, one other aspect is the, you know, one of the challenges that we've seen with hyperconverged as a general concept, the early version, you know, VMware would be included in that, is that you're forced to scale your compute at the rate of storage growth. And worse, you know, multiply that by three if you have three replicas. So you're growing in this unit that doesn't make sense. And you guys know you've been, you are the gray beards and you know storage has grown whatever, five or six times the rate of compute for the past number of decades. That's only accelerating. That's not slowing down, right? And so it doesn't make sense that you would actually scale your infrastructure in this way.
Starting point is 00:17:17 Because of our architecture, we can scale the storage element of this independently. So we don't have to bind them together, right? And that's where we came from. That was our architecture from the start. We could separate these components or we can run them on the same piece. It's a peer-to-peer architecture. They're just IP addresses to us in a network as to where the locations of those pieces are, and it can pull them together. Whether it's on its own node or five others somewhere in the grid. And so that allows us to scale the storage independently of the compute. And the second aspect is we have different flavors of that storage. So we have an all-flash capability.
Starting point is 00:17:53 We also have a hybrid capability, so a mix of spinning disk and flash. And we have pure SATA as well. So think about the tiers of storage for the different workloads or different aspects of those workloads or, you know, moving things over time. You might grow those pools in various sizes to match what your requirements are over time. Yeah. I mean, the challenge was having, you know, A, separate storage than compute and B, having multiple types of storage configurations, I'll call it, is that there's a is that you add a lot more complexity in configuring the system and stuff like that. How do you guys handle that aspect of this? Because, I mean, with all the various, you've got at least,
Starting point is 00:18:34 well, you must have at least four different nodes, right? Yeah, there's three, really. The flash node is the same as, you know, that is your flash tier. Okay, so if you have a hybrid, you have a second pool. If you have pure capacity, there's a third pool. And they just grow in pools. And so because everything is driven through System Center, when you want to provision a VM,
Starting point is 00:19:02 you just say what type of storage does it want for this VHDX. It will find and put that into the correct pool and allocate some space for it. So, yeah, it's actually very minimal setup. The setup is actually one of the things that we've really focused on is making that simple. And once it's set up, it becomes completely automated. Yeah. One of my pet peeves with
Starting point is 00:19:28 hyperconvergence is that it doesn't provide file access on most systems. If you run EvoRail, that storage is for those VMs. And then they say, and you'll use it for VDI. And I say, where do I put my documents directory? And they say, you you'll use it for VDI, and I say, where do I put my documents directory? And they say you need a net app for that, or you run a pair of Windows servers to be the file servers. Can I run the file services on the root partition of your nodes, or do I have to run a VM to be a file server? No, you can run, you know, we can run on bare metal, right?
Starting point is 00:20:06 So it doesn't, we don't, I mean, we are actually running at bare metal. So we're running on Windows Server itself, which is presenting the disk up to the hypervisor. Equally, it can present disk to a scale-out file service, right? So a number of those nodes, if you want it to be configured that way, you could have an option of putting an SMB server onto that and scale out file services, as well as the hypervisor, or you might just leave them independent. So yeah, you can create a pool of just file storage, no problem. Okay, well, you know, that brings us to the delicate question.
Starting point is 00:20:43 How much does this marvelous hunk of technology cost? You know, they're actually very affordable. You know, you asked the question earlier about the hybrid, which we do sell, obviously, and that's been kind of where we came from. But what we are seeing is that we've brought the cost of flash so close to the hybrid that people are automatically jumping to the all flash. So there's probably about a less than a $10,000 difference between the two nodes and slightly less capacity. Okay. But from a price point, what they're seeing is I can actually get into all flash today. And then I can grow this incrementally into the future as I need more. So our nodes are very cost effective.
Starting point is 00:21:33 On a list price basis, they're around $45,000. Per node? Per node. So that's a high-end dual Xeon, 24 cores, 256 of RAM, a couple 10-gig connections in it, and 6 terabytes of flash on a per-node level. Well, that's an interesting number because it puts your all-flash configuration just about even with the Super super micro version of the EvoRail, which we can only assume is going to perform less well once you get to where your data exceeds the size of the flash in it. Right. Yeah, and we can scale from there. The architecture allows you to kind of keep growing this.
Starting point is 00:22:26 You know, one of the one of the limits, if you look at the difference between our architecture and VMware EvoRail in particular, you know, is that we don't use the three way replica and we don't need that three way replica. What that translates to is that we literally use 50% less of the nodes, okay? Because it's very interesting now because you can see an apples-for-apples comparison here, okay? So we're using the same chassis, as you pointed out, same amount of disks that can go into those chassis, same disks can go into the chassis, right? We use less than half.
Starting point is 00:23:03 And we're not talking about just replicating capacity here, right? When three-way replicas came about, they were based on the principle that capacity is cheap. Spinning disk, cheap SATA disk, if you think about
Starting point is 00:23:19 the Web 2.0 companies and so on, disk was relatively the cheapest resource in the infrastructure. So, okay, we can replicate that. In no small part, it was that, you know, one terabyte disk capacity was cheaper than the software to manage it better. Correct. And now today, when you look at this, okay, there's two things that have changed. One, we're using all flash.
Starting point is 00:23:43 Okay, so flash is not cheap. And if I want to use a three-way replica and throw away two-thirds of that Flash, that's an expensive thing to do. But worst, I'm actually now replicating the entire infrastructure stack on top of this. That's the whole server, the whole license stack. You think about the number of cores and stuff that you're replicating there. Think about putting Oracle on this and the cost of that. Right, but in a hyper-converged environment, I'm using those resources to run my compute as well.
Starting point is 00:24:15 Although, I mean, I have to tell you that... But you may not need them. Right, and frankly, the point you're making is exactly why I called all-flash hyper-convergence a chocolate-covered pickle. If flash capacity is a major component of your total cost of goods, then doing three-way replication on flash just doesn't make sense to me. It's just wasteful. And so this is the key tenant of our architecture, is that we use erasure coding. That's why I said earlier, the fact that we sit in the kernel, we get the block right at the source, we can perform the erasure coding there, we can distribute them and parallel
Starting point is 00:24:55 the fragments. You know, we end up using 50% less infrastructure, we can give you the exact same usable capacity, and the exact same protection level, right? Two failures. But we use 50% less of the stack, right? And that's a very significant cost up front, as well as ongoing operating cost, and especially as you start to do this at scale. Yeah, so you guys don't do any sort of what I'll call data reduction for flash or anything like that? It's just a straight Flash block that's being written, although it's erasure-coded and stuff like that? Yeah, we do that, and then you have the option.
Starting point is 00:25:31 You can run Ddupe on the back of that if you wanted to. So if it makes sense on the data set that you have to run Ddupe, you can do that. But that's the Windows Ddupe, right? Correct, right, but we're in the same box now, so it makes perfect sense to leverage that. Right. It's just a post-processing, you know. Correct.
Starting point is 00:25:50 It's not world-class dedupe. You might argue with Microsoft on that, but that's a different discussion. Yeah, I'm going to say, you know, at the same time, I mean – I would argue that one right from the beginning, especially in an all-flash environment because doing the dedupe inline means that you avoid writing to those disks and you extend their write endurance and maybe that means you can buy the next step lower SSD. So there's lots of good reasons. Yeah. Well, I think there's two ways to look at it, Howard. A, you can look at the cost, and you can see that we're at a very aggressive price point. We made it actually affordable for people to use All Flash. But we're making a different bet, which if you look at the past
Starting point is 00:26:43 two to three years, the all-flash companies have invested a colossal amount of money in trying to perfect these techniques, okay? So it's all in line. If you talk to Pure Storage or, you know, one of the early all-flash companies, you know, there are like five different ways that they do things. And there's a lot of processing going on, okay? Yes. In this, you don't even have the option to turn them off, right? Whether that processing makes sense for the working set, right? If it's a transactional processing, right? There's very little reduction in that, okay? There's very little dedupe in OLTP. There's some compression, okay? But you don't have a choice. So you're doing all this extra work
Starting point is 00:27:27 adding all this extra overhead adding all this extra cost on top of the you know um underlying flash resource you know and we're making the assumption that look at if you just look at economic curve of anything uh in in computing right these costs are coming down. They're coming down very aggressively. Not to mention that since you're hyper-converged, every CPU cycle you use to run the storage is one that you don't have available to run the guests. Correct. So you don't want to be stealing cycles. You want to be minimizing how you utilize that. And that is an important point. And so if you make the assumption that over this next couple of years that that flash resource is going to be getting cheaper and cheaper and cheaper,
Starting point is 00:28:11 and we've seen that happen already in the last three years, okay, investing heavily in those kind of dedupe technologies, if you want to call it that, or data reduction technology, is the wrong place to invest your money right now, right? Because it's going to get cheaper. And what we've focused on is how do we do this at scale, okay? How do we leverage the fact that I actually want to spread this resource around, you know, in a very flat infrastructure, okay? So there's a lot behind what we've actually been able to do, and that's where we've been investing our money as opposed to what I think is going to be a redundant technology in a couple of years' time.
Starting point is 00:28:49 Yeah. I guess the question I would have, you know, getting off a different tangent is, do you guys support things like snapshots and replication of data and things like that? Yes, yes. So again, all of that is coming from the Windows server itself. So you're using DRS replication kinds of things?
Starting point is 00:29:12 Is that? Correct. You can do replicate a VM snapshot. We also do work if we're outside of that environment because, again, we can work on physical servers. So we do also embed DoubleTake in those situations. Joy of working in the Windows environment. You've got an ecosystem to play on.
Starting point is 00:29:33 Yeah, and then it's a matter of what makes sense for the customer. Right. Okay, so what about performance? Do you provide some sort of statements of performance in this type of environment? So how many VMs, for instance, can I run on a four-node cluster, hybrid and or flash kind of thing? That would depend what the VMs are doing. Obviously, we haven't as of yet gone and done performance benchmarks,
Starting point is 00:30:02 so any of the ones that I know you're familiar with. So we're early in that process. But you do know where to find me. Yeah. And we will get to that. But one of the interesting things when we made this move is that when we actually talk to customers and they see, okay, it's all Flash and it's close to the workload,
Starting point is 00:30:25 nobody asks us what are the number of IOPS? What are IOPS going to do? That question, and I'm quite conscious of it, seems to have disappeared. We used to get that asked all the time when we're talking about hybrid, where you get into all these tricks of, well, what's the size of the working set? If the working set? If the working set sits in the cache and it's, it's cacheable, uh, you know, your IOPS are X, but what happens when you go outside of this working set? Okay. And then it starts to fall off a cliff and, uh,
Starting point is 00:30:54 you get the performance of spinning disc and, you know, it's all these questions. And right now people don't ask us, they just see it's all flash and go okay that's going to be fast that's an interesting market observation you know it kind of jibes with one i've been saying for a while which is that budgets are binary yeah and so if people can't afford the all flash solution they're buying the all flash solution not based on need but just based on it fits in the budget just like just like if you can afford a mercedes sl 500 you don't drive a hyundai right um and it comes down to your budget and if you can afford it and and i think this is this is my reading of it is that people see the hybrid being a a stopping point on the way to all flashflash. It's an affordable option, okay?
Starting point is 00:31:46 So as you said, that's the Hyundai version right now. And now what we're making is an affordable Mercedes, if you want to call it that, right? Where people would love to get into that, and they see that as being future-proof. That was the C-Class, right? Okay. I don't drive a Mercedes, so I wouldn't know.
Starting point is 00:32:04 So speaking a Mercedes, so I wouldn't know. So speaking of Mercedes, what do you consider your target market for your solution? We are predominantly selling to mid-sized enterprises, mid-sized to large enterprise. But very interesting, and this is one of the things that we're doing, is we're working. So we kind of have this idea of you know you have building blocks and they're like Lego building blocks I like to tell the story of you know when you were a kid you'd have those basic building blocks and you'd kind of put something together
Starting point is 00:32:33 you're building a dog and you'd go and show it to your mama and she'd look at that and say that's a nice dinosaur you'd be a little bit disappointed right? Remember when Ray and I were kids there were red bricks and white bricks oh god i can't remember that far back quite frankly the whole the whole lego star wars thing is that yeah well that's so now now your son walks up to you and he's got the millennium falcon built out of lego right and so what we're doing is actually working with partners on the go-to
Starting point is 00:33:07 market to really build in uh solutions onto a set of blocks so that to the point where you can literally stand up a solution a great example of that we're working with a company called aprenda who sell a platform as a service focused on DevOps. They're selling predominantly to Fortune 100 companies. So it's kind of like a puppet in a box? Yeah. Cool. And so they're selling to J.P. Morgan, Goldman Sachs, Boeing.
Starting point is 00:33:41 They're very focused on Windows as we are. Apparently, Boeing has more dotnet developers than Microsoft does so you can imagine the development environments there and you know we've built that on to the platform so we can literally go out and stand up a POC that is pre-configured you know predictable in the way it's going to work they can run and have a very high success rate. They're not having to go and try and get onto the existing infrastructure and stuff like that. They can stand this up inside of an hour and have it up and working.
Starting point is 00:34:15 They run their POC for a couple of weeks and now they're into a rollout phase and they can just add more of these blocks as they need to. And so that has all of a sudden started to take us into some much larger customers, which is a surprising aspect of this. And so there are more of these types of solutions in the works? Is that what I'm hearing here? Yeah. We're looking at other ones in, say, legal services.
Starting point is 00:34:43 LexisNexis is a platform that we're working on. We're also looking at more traditional ones I would say like VDI. So, you know, standing up VDI very quickly being able to scale that out, you know, having a cost per desktop, right, of an all flash desktop, you know, infrastructure wise for under $200, right? You know, that's the entire infrastructure, not just the storage aspect of it. Put your license stack on top of that, and you're looking at something like $350 for the entire desktop. And it's an all-flash, high-performance desktop. So we're looking at solutions like that where we can go into organizations,
Starting point is 00:35:24 stand up solutions very quickly, and deploy those very quickly. Makes sense. Remote office, branch office is another one. Dev test is a good one. And private cloud is kind of where we came from. So layering on top of it, Azure and also for service providers, we're focused on that. Okay. But you guys really only scale the hyper-converged environment down to three nodes, right?
Starting point is 00:35:48 Yeah. Okay. Yeah, hyper-convergence is at first glance a brilliant idea for remote offices and branch offices. I just wish more solutions would scale down further. Right. So in that example, that's a good point, Howard, is that we can put one – we have different size compute nodes, first of all. So we can put a small compute node in. If they actually wanted failover in that branch office, they could put two small compute nodes in and a third storage node. So in the same chassis, we just don't put a compute workload on the storage node. So in the same chassis, we just don't put a compute workload on the storage node.
Starting point is 00:36:26 It's a much significantly smaller footprint in terms of CPU, and it just has the disk. So you get the cost of it. Smaller CPU, lighter copy of Windows. Exactly. Yeah, we use Windows embedded in our storage nodes. So we don't have the full license cost of a Windows stack on top of the storage node. Right. And your compute nodes come with data center, right? Correct. Yeah. The other cost people don't figure into the KVM or VMware-based systems. It's like,
Starting point is 00:36:54 oh yeah, now you need to buy data center for all those nodes too. Yeah. Now that is a separate cost, just to be clear, because some customers will have enterprise agreements, so they actually don't want to pay for it twice. But you just put your license key into it, and then it fires up. Yeah, or even bigger, you know, the campus agreements where you just go, we have this many employees, and we get to use anything Microsoft sells for a flat fee. Correct, yeah. But if you do not have that, then yes, we have a SKU with that in it. Okay. Well, you know, I've asked this question before. Why is hyperconvergence coming out these days as such a critical technology?
Starting point is 00:37:30 It's been a long time evolving in my mind. That's a very good question, right? You know, when I founded GridStore, you know, we started my last company, I founded in 1998. I'm actually not from the storage industry. I forgot to say that up front. You guys have a lot more depth and experience in the industry. But Kelly, you have that storage
Starting point is 00:37:58 industry accent. No, that's Irish. Sorry. You threw me off, Howard. Sorry. Yeah, we came from a platform as a service. So we did software as a service in Europe. We ran it for 30,000 companies.
Starting point is 00:38:23 Storage was a very big obstacle for us. And we shared a data center with Google at the time. And the guy who ran our data center told me the story about how it was a Sunday afternoon and the guy was going through. All the racks were open in Google. And there was trash bins at the bottom of all of them. And this guy was going through and literally pulling out dead motherboards. They would take the boards out of the chassis and put them on trays and slide them in really tight to each other. And they would literally cook each other over time.
Starting point is 00:38:56 And it's like, you know, they don't care. They just don't care. They would take them out on a Sunday afternoon and drop them in a trash bin. And there was a guy coming through with a trolley of fresh ones and sliding those in and attaching them and when he told me the story I said we got to do this for storage right because storage for us was a huge cost we had 30,000 companies running on a you know a single platform and scaling was very difficult you know we were using EMC storage we were outgrowing it all the time and having to do the forklift upgrade, all that kind of stuff. And it was a real nightmare when you're trying to run an on-demand service, right?
Starting point is 00:39:33 And you had to literally take these kind of outages to get from A to B just to have more capacity. And so he started working on this for me. We never solved it when we were in that company. But this is the reason we founded GridStore. It was this this for me. We never solved it when we were in that company, but this is the reason we founded GridStore. It was this guy with me. But when, you know, his observation was, look, if I could just take all the storage sitting in all these servers, right, because they all come, and most of the time we would take all the disks out and stuff. He said, what if we just left the disks in and just pooled all that together? So he started working on that as the premise.
Starting point is 00:40:06 And that's why we have this peer-to-peer architecture. And we started with that right from the very beginning. But when I looked at the market and I thought, this will never sell. And this was back in 2009. I just thought this is an interesting idea, but it's a big leap for people to take. And there's a lot of complexity behind doing it. So we kind of said, okay, let's split it apart, and we'll start with this. And we'll see how the market evolves, and we can always put the two together if it makes sense. And, you know... And luckily, Microsoft finally got Hyper-V working. Yes, and that was the other aspect to it.
Starting point is 00:40:47 But you can't try and guess the market. There's too many things that change, and what I wanted to have was the flexibility. So as the market did move to this, and we're very surprised at the take-up. What I think really drives it, Ray, is simplicity. If you look at an infrastructure stack today, even the converged solutions, and Microsoft has just released one with Dell, it's three layers of infrastructure. You've got JBODs at the bottom. You've got SMB servers attached you know, attached to those to serve out a scale-up file service. And then you've got a layer of Hyper-V server sitting on top of that.
Starting point is 00:41:36 And you've got to wire all this stuff together. It's quite complex and so on. And every, you know, every other converged infrastructure follows that same architecture. But when you collapse all three of those layers into one, you literally eliminate a lot of cost and a lot of complexity. And for the mid-sized data center who's been spending a lot of time trying to keep the lights on with all this stuff, it's a breath of fresh air, right? They just see it as being, okay, I can just add these nodes. Everything's in the box. I don't have to worry about integrating
Starting point is 00:42:10 all these different pieces and figuring out what's not working anymore. And it makes my life a lot simpler. And, you know, depending on which analyst you talk to, that can be an order of magnitude reduction in complexity. And that's pretty significant for people who are, you know, overwhelmed in these data centers trying to keep the lights on. And then the cost on top of that, right? You're seeing, you know, depending on, again, which analysts, probably a two to four X cost reduction because you're eliminating those
Starting point is 00:42:42 layers of infrastructure. So there's a... Oh, I could argue that one. You could argue anything, Howard. Yeah, but... And I actually sat down when I first found... I was at an event, and I forget which one because they all blur together. And one of your erstwhile competitors asked me if I had seen what the pricing was on EvoRail,
Starting point is 00:43:10 that they were seeing $200,000 a unit. And I went and I just went off and calculated what four Dell R720s would cost with an equivalent set of software. And it's substantially less than that. In fact, it's enough less than that that when we start talking about, well, you need 12 servers, 12 R720s
Starting point is 00:43:44 and a Tintree T620, which is about the right size storage for that, costs about what two Evo Rails cost and provides the performance of three. Now, you know, you're going to have to pay somebody to put it all together, but as a consultant, I would be perfectly happy to take $30,000, not $150,000 for that one-day job. I'm sure you can find somebody cheaper than me. So there are people – and I think this is a very interesting point in time. As I said, I didn't come from the storage industry. I'm more of an entrepreneur. I'm looking at disruptive opportunities. And what I see happening right now is the infrastructure market is going to be commoditized.
Starting point is 00:44:38 And when you put the two of these things together, because you had – you've got to remember, VMware came from EMC. There's a lot of EMC DNA there. So the storage industry, you know, for the past decade has lived on very fat margins, right? 65% gross margins. The server industry for the past decade has been getting crushed because of virtualization and the consolidation that it's driven. And they operate on probably
Starting point is 00:45:11 single digit gross margins. Now, the interesting thing occurs when you put the two of those things together. And I don't believe that anybody's going to be able to keep maintaining these 65% gross margin. And so I think this, when you put, when you throw hyper-converged into the mix, this really changes the kind of economic cost structure. Now, what you're referring to there, Howard, is people who are kind of maintaining the existing cost structures. Yeah, yeah, yeah. Right. They're taking servers and trying to get storage-style margins on them. Exactly.
Starting point is 00:45:49 And if I deconstruct it and say, I'm only going to pay server-level margins on the servers, then I can find the savings. Yes, yes. the savings. But clearly, the cost of goods sold on a hyper-converged system is lower than on a conventional system. And eventually, we have to start seeing that reflected in the selling price. Yes. And people will hold on to high margins in early days when there's less competition, but there's clearly more and more competition coming into this market. And with competition will come increased pricing pressure, right? So margins will shrink.
Starting point is 00:46:35 And, you know, if you're starting from a position of not having those kind of high gross margins, like we are as a startup company, you can drive that change. So that will be an aggressive part of our building this business in a way that is able to survive on significantly lower gross margin. It's still a very healthy business. Okay. We're about at the end of the podcast. Howard, do you have any last questions? No, I'm pretty satisfied.
Starting point is 00:47:06 I did a vBrownBag yesterday where we talked about hyperconvergence and I didn't actually have a grid store in mind when I said that I thought Hyper-V might turn out to be a really nice platform for hyperconvergence
Starting point is 00:47:22 just because of things like, and you can run the file services in the root partition too. Alright, Kelly, you have anything else you want to mention? No, just I enjoyed having a chat with the two of you guys today. Thank you very much for having me on.
Starting point is 00:47:38 Alright, well this has been great. Thank you, Kelly, for being on our call. We enjoyed it immensely. Next month we will talk to another startup storage technology person. Any questions you have, please let us know. That's it for now. Bye, Howard. Bye, Ray.
Starting point is 00:47:51 Bye, Kelly. And thanks again, Kelly. All right. Thank you, guys. Until next time.
