Grey Beards on Systems - 42: GreyBeards talk next gen, tier 0 flash storage with Zivan Ori, CEO & Co-founder E8 Storage.
Episode Date: March 15, 2017
In this episode, we talk with Zivan Ori (@ZivanOri), CEO and Co-founder of E8 Storage, a new storage startup out of Israel. E8 Storage provides a tier 0, next generation all flash array storage solution for HPC and high end environments that need extremely high IO performance, with high availability and modest data services.
Transcript
Hey everybody, Ray Lucchesi here with Howard Marks here.
Welcome to the next episode of the Greybeards on Storage monthly podcast,
the show where we get Greybeard storage system bloggers to talk with storage system vendors
to discuss upcoming products, technologies, and trends affecting the data center today.
This is our 42nd episode of Greybeards on Storage, which was recorded on March 13, 2017.
We have with us here today Zivan Ori, CEO and co-founder of E8 Storage.
So Zivan, why don't you tell us a little bit about yourself and your company?
Sure.
E8 Storage was founded two and a half years ago with the
premise of creating a shared NVMe appliance without compromising on the performance that
you could get from such an appliance compared to today's all-flash arrays, using strictly
off-the-shelf componentry and not going to the length of customizing flash modules, networks, or any kind
of hardware like many other products that we see
in the landscape today.
Well, one fewer than were on the landscape last week.
Yes.
You can hear the bells ringing for that one. Ah, we're talking about a famous, uh, major storage vendor
that just canceled one of their solutions.
Yeah. So I first talked to the folks at E8 and Zivan back when I was preparing for last year's Flash Memory Summit, where I was doing a presentation that I called The New Tier Zero, about this new class of direct memory access, very high-performance storage systems. And, of course, the big story there was DSSD and how everything was very proprietary.
Turns out the market didn't seem to like all that very proprietary.
EMC has canceled the DSSD project.
So it will be interesting to see how these things play out.
But I'm sure Zivan is not all that heartbroken about this occurrence, is he?
Well, you know, there's something to be said about a big company validating your technology, or at least your idea, or your market space.
But yeah, I can understand that.
So Zivan, would you call yourself tier zero storage? So just a quick note about DSSD. I think it's a bit ironic that, having started later, we have access to things like NVMe SSDs that are ubiquitous today and 100 gig Ethernet networks.
So that kind of reduces the need to develop custom flash modules like DSSD had to do, and to develop all sorts of proprietary high-bandwidth,
low-latency networks like DSSD had to do.
And I think that pretty much accelerated their demise
because we could get the same performance as them
using commodity hardware, essentially.
So it was very hard to justify the complexity and the cost.
Even the power consumption was three times ours.
So I think...
It's a story we hear over and over again in our business
that developing hardware takes so long
that somebody who can do it in software will come along
and just beat you out on costs later
because they can get as a merchant product
the technology you had to build.
Well, we've always thought that.
I've come from IBM's XIV storage product.
I managed XIV's development for five years.
I grew that team from 13 people when I joined to over 100 people.
And XIV's philosophy had been to stick to off-the-shelf componentry.
And you could see how all sorts of legacy arrays, not flash arrays at the time,
just storage arrays, became deprecated as more and more products like HP's 3PAR or Dell's
Compellent and IBM's XIV were simply using commodity components. And we always thought
that the same thing is going to happen with flash. So you could see just three years ago that the
leading products in the all-flash landscape were very much hardware-centric, like Violin Memory, Skyera, Fusion-io,
and these sorts of products have all but died out, to be replaced by software-centric products
like EMC's XtremIO or Pure Storage or NetApp SolidFire.
And the same thing is going to happen with what we call the Gen 2 all-flash arrays.
So products like DSSD, or other competitors building hardware solutions to extract the true performance of flash, will find themselves deprecated by software-centric solutions like ours.
And I think we're seeing it happen at an even faster pace than people anticipated.
So let's talk a little bit about the product.
What does it look like?
How do I connect?
What services does it provide?
Where would I use this very fast kind of array?
Okay, a lot of questions. So let's start with what is the product.
We've chosen to focus on the NVMe 2.5-inch
hot-swappable form factor
because of its ease of field replacement.
And we've chosen to focus on the 2U enclosure
because we thought that's the smallest enclosure that can provide high availability.
So our product is essentially a 2U height, rack-mounted, 24 NVMe drive,
no single point of failure enclosure.
It's the first of its kind in the industry.
It supports both dual-port NVMe SSDs
and single-port NVMe SSDs through an interposer.
And it has got two controllers,
which are really Intel motherboards,
nothing special about them.
Those motherboards are connected on one hand
to the NVMe drives,
and on the other hand,
to 40 gig slash 100 gig Ethernet ports
courtesy of Mellanox.
Sounds a lot like Storage Bridge Bay for the next generation.
Storage Bridge Bay didn't take off as a standard
mostly because it was always the same vendor
building the front and the back of the enclosure.
So nobody really cared if you're compliant with the standard or not.
So it looks like an SBB kind of enclosure.
The main benefits are that the controllers are hot-swappable,
and we have full HA and redundancy.
Both controllers can view all drives.
Okay, so there's a PCIe switch fabric between the controllers and the backplane that U.2 drives plug into?
Or they're just dual-ported, right?
Yes, we have support for dual-port NVMe drives.
I believe that we're the first product in the industry to support dual-port NVMe drives.
And we work closely with the SSD vendors to bring up their SSD drives because they also realize that we're the first product that they connect the drives
to. Right.
I remember
talking to Steve Sicola
when he was working on what became
X-IO's ISE
and he kept saying that
it turned out that was the first controller that ever used
T10 DIF
and the drive vendors
all wanted to test with it
because they never had a test bed before.
Yeah, so we are in a situation like that with dual-port NVMe drives.
Okay.
The PCIe fabric is extremely simple.
It's really just a PCIe switch, static topology,
connected through a passive midplane.
Yeah, you really just need it to act as
a lane expander because
you've got more lanes than you can get off the
motherboard. It's a lane expander.
It's also a port expander because the Intel
CPU complex itself doesn't support so many
PCI ports.
It is also pretty convenient
in terms of error isolation
and downlink contamination prevention. There are many kinds of benefits.
Yeah, but it's acting as a fan-out, not a full fabric, and that's just simpler.
Exactly. It's merely a static fan-out.
The topology never changes.
It is very, very solid, and we don't see many issues with it.
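For a rough sense of why a fan-out switch is needed at all, the lane arithmetic looks something like the sketch below; the per-drive and per-controller lane counts are illustrative assumptions, not E8's published spec.

```python
# Rough PCIe lane budget for a 24-drive NVMe enclosure (illustrative numbers only).
DRIVES = 24
LANES_PER_DRIVE = 4          # U.2 NVMe drives are typically PCIe x4
CPU_LANES_AVAILABLE = 48     # assumed usable lanes per controller after NICs/boot; varies by platform

lanes_needed = DRIVES * LANES_PER_DRIVE      # 96 lanes just for the drives
print(f"drive lanes needed: {lanes_needed}, CPU lanes available: {CPU_LANES_AVAILABLE}")

# A static PCIe switch fans the downstream drive lanes into a narrower uplink,
# and also presents the CPU with one port instead of 24 separate root ports.
oversubscription = lanes_needed / CPU_LANES_AVAILABLE
print(f"fan-out / oversubscription ratio: {oversubscription:.1f}x")
```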
We also have two power supplies, two battery units.
All the drives are swappable.
The controllers are swappable.
The PSUs are swappable.
We built this box to be rugged, high-duty cycle.
And that's really the mission of building fault-tolerant storage.
What does the battery unit power?
Do you have like a DRAM cache?
Why do we have a battery unit?
Yes.
Yeah, it's for the dirty DRAM
cache and the flushing
metadata and other data
structures. It only backs the motherboards
and not the drives.
So you're using standard DRAM
as the write cache and then
protecting the whole motherboard?
Yes.
We've always picked simple
and pragmatic solutions.
There are many fancy technologies out there that make everything a lot more complicated.
And we picked very simple things that we can stabilize and productize.
So is that...
The enclosure itself is not dissimilar to SAS SBB enclosures that are quite common today. All we asked our vendor that built the enclosure to do
is really replace the SAS fabric with an NVMe fabric,
and that's about it.
Other than that, it will look exactly like any other kind of SAS enclosure.
It will be hard to tell it apart, but it's a pure NVMe enclosure.
That's what I thought.
So you're using DRAM as the write buffer.
Do you have flash to flush that to
during power fail,
or enough battery to write
through? Yeah, we
flush it to
our boot devices. Okay.
Fine. I just wanted to make
sure we weren't going back to the old days when
the generator wouldn't start
and you'd have the big stopwatch going,
the battery on the DRAM cache is going to fail in 12 hours.
So the interfaces internally to the drives are all NVMe fabric sorts of things.
What does the interface between the host and the storage look like?
Okay, so we install a host-side software package, which is a small kernel driver and
mostly a user space process. And we use that to essentially offload data path activities that would
choke the controller. And through that, we're able to show how we can scale performance as you connect more and more servers to E8.
The more servers you connect, basically, the higher the performance you're going to get.
And you're going to consume some client CPU to manage some tasks?
We consume usually one core on the server.
In a large-scale deployment, that will probably saturate the E8 box. For demos,
we can consume a lot more cores and then use fewer servers to achieve the same level of performance.
So it's really horizontally scalable as far as the number of cores you dedicate to it.
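A back-of-the-envelope model of that host-side scaling claim might look like this; the per-core agent throughput is an assumed figure for illustration, while the 10 million IOPS ceiling is the number quoted later in the episode.

```python
# Toy model: aggregate IOPS grows with the number of application servers,
# each donating one core to the host-side agent, until the 2U box itself saturates.
PER_AGENT_CORE_IOPS = 150_000     # assumed per-core agent throughput (illustrative only)
BOX_LIMIT_IOPS = 10_000_000       # ~10M IOPS ceiling quoted later in the episode

def aggregate_iops(num_servers: int, cores_per_server: int = 1) -> int:
    offered = num_servers * cores_per_server * PER_AGENT_CORE_IOPS
    return min(offered, BOX_LIMIT_IOPS)

for n in (8, 24, 48, 72, 96):
    print(f"{n:3d} servers -> ~{aggregate_iops(n):,} IOPS")
```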
So you're saying when you mention servers, you're talking about host servers,
not the storage server, right? Yes, the application servers.
Right, right, right.
Okay, and how do you do data protection?
We implemented a form of RAID 6 on the NVMe drives,
and it is implemented in a combination of the software that we have running inside the box
and the software that we have running on the customer hosts.
Okay.
So the client driver and the software on the target box cooperate to do that kind of stuff.
Yeah, exactly.
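For readers who want the parity math behind that cooperation, here is a minimal RAID 6-style dual-parity sketch (P as XOR, Q as Reed-Solomon over GF(2^8)); how E8 actually splits this work between the agent and the controllers isn't public, so treat it as purely conceptual.

```python
# Minimal sketch of RAID 6-style dual parity (P = XOR, Q = Reed-Solomon over GF(2^8)).
# This only illustrates the parity arithmetic, not E8's implementation.

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) using the 0x11d polynomial (the one Linux md RAID 6 uses)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D
        b >>= 1
    return p

def raid6_parity(blocks: list[bytes]) -> tuple[bytes, bytes]:
    """Compute P and Q parity for one stripe of equal-length data blocks."""
    length = len(blocks[0])
    p = bytearray(length)
    q = bytearray(length)
    for i, block in enumerate(blocks):
        coeff = 1
        for _ in range(i):            # coefficient g^i with generator g = 2
            coeff = gf_mul(coeff, 2)
        for j, byte in enumerate(block):
            p[j] ^= byte              # P: simple XOR across the stripe
            q[j] ^= gf_mul(coeff, byte)  # Q: weighted sum over GF(2^8)
    return bytes(p), bytes(q)

data = [bytes([i] * 8) for i in range(4)]   # four tiny 8-byte "drive" blocks
P, Q = raid6_parity(data)
print(P.hex(), Q.hex())
```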
The stuff that runs on the agents is only related to that particular server.
It is kind of initiator-side processing.
There's never any awareness of the
servers between each other. They don't know of each other's existence. They only ever talk to
the box. So the network topology is strictly north-south, and that is what allows us to scale
horizontally very well. Some of the work is carried by the controller itself. Every feature that we design,
we often get asked, so what happens inside the box and what happens outside the box? It really
varies from feature to feature. Every feature that we design, we cast onto this unique architecture
and determine what should reside inside the enclosure and what can happen on the agents.
And since the agents are broadly distributed,
you've got much more resources there in aggregate.
Yes, but we don't want to disrupt the host,
so we never do maintenance operations on the agents.
We never do all sorts of control plane operations on the agent.
That will always happen inside the box.
Okay.
More than that, and that has been our primary design goal,
is that we don't really care if those customer servers get turned off or die for whatever reason.
One of the huge problems of hyperconverged solutions has been that the customer servers become, you know, part of the storage that you rely on.
Well, they become stateful.
They become stateful.
And one of the great things about server virtualization was that it made my server stateless.
And I could vMotion workloads around and upgrade firmware and add RAM.
And I didn't have to talk to the fakakta change control committee anymore.
And as soon as you start using them as storage, well, if I take that host down, now at the very
least I'm reducing performance and resiliency.
At worst, I might be, you know, really exposing things to bad situations.
Yes.
And that has been, I think, the Achilles' heel of hyperconverged solutions.
And you can go to what one might call
extreme scenarios where you kill the servers.
What happens if you kill one server?
The hyperconverged solution can somehow work around it.
What happens if you kill two servers?
You're pretty much toast
with any kind of hyperconverged solution.
So you might say that's not a very common scenario
or whatever, but let me give you a very simple
scenario. I'm upgrading my Linux.
It's patch
Tuesday, whatever. What happens
to those servers? They reboot.
And what happens to your storage?
You now face storage
outage because of a patch Tuesday.
So we've always thought
that... This is why I have to manage patches.
But that's a whole other story.
But that's usually not the storage admin's duty
to manage the application server patches.
So the way we designed E8 is that
even though we took the liberty of running some stuff
on the customer's host,
we never assumed that territory is safe or stable.
So you can turn off a customer server.
We never store volatile metadata on that server.
So nothing happens to your storage.
And you can turn off all customer servers,
and that happens during power outages.
Everything is stored safely on the E8 box.
Nothing happens.
We just clean up the orphaned IOs or orphaned handles
and everything is back to normal
as soon as you turn on those servers.
And that is extremely handy in many kinds of outages.
But even again, just in the Linux booting or Linux patching.
And what we see today in large data centers,
they buy such cheap servers
because they don't care that much about their reliability, because there are so many of them
that they just crash left and right. In one of our deployments, there are 72 servers connected to
E8. On average, only 66 of those servers are actually alive. And that's got nothing to do
with the E8. That's the nature of the beast.
Okay, I believe in fail in place,
but that's a little bit much.
It's not that bad.
It's not what?
But fail in place just as a philosophy,
again, is best suited to things that are stateless.
Exactly.
If you have 10,000 web servers
and 1,000 of them go offline, nobody cares.
So, Zivan, you mentioned scaling of the system,
but I think you were talking about scaling the host.
Is the E8 storage a cluster environment or a clustered storage system?
No.
Each dual controller is managed separately. And if you have more than one, you can stripe across them using an LVM kind of approach.
So it's a very loose kind of association.
Is that literally LVM or are you handling it in your client?
Right now it's literally LVM.
But as we mature the product, it's going to be a more kind of organic feature.
The boxes are loosely associated, what we call stacking, and it allows you to put as many as you want.
Today we get almost 180 terabytes in a single box, so it's not a huge problem.
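The LVM-style stacking described above amounts to striping logical addresses round-robin across independently managed boxes; here is a small sketch of that address mapping, with an arbitrary stripe size and box count chosen purely for illustration.

```python
# Sketch of LVM-style striping across independently managed E8 boxes ("stacking").
# The stripe size and box count are arbitrary illustration values, not E8 defaults.
STRIPE_SIZE = 1 << 20        # 1 MiB stripes (assumed)
NUM_BOXES = 3                # three stacked 2U enclosures (assumed)

def map_lba(byte_offset: int) -> tuple[int, int]:
    """Map a logical byte offset in the striped volume to (box index, offset within that box)."""
    stripe = byte_offset // STRIPE_SIZE
    box = stripe % NUM_BOXES
    # Offset on the target box: which of "its" stripes this is, plus the position inside it.
    box_offset = (stripe // NUM_BOXES) * STRIPE_SIZE + (byte_offset % STRIPE_SIZE)
    return box, box_offset

print(map_lba(0))                        # (0, 0)
print(map_lba(STRIPE_SIZE))              # (1, 0)
print(map_lba(3 * STRIPE_SIZE + 4096))   # (0, 1052672): second stripe on box 0, 4 KiB in
```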
Okay.
Do you support data compression,
deduplication, any of those data
reduction services, thin provisioning?
No,
not presently.
So it's 180 terabytes
after RAID protection kind of thing?
Yes.
I got you.
180 usable?
Oh, usable, okay.
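A quick check on how 24 NVMe drives land near 180 TB usable after RAID 6; the per-drive capacity below is an assumption, since the drive size wasn't stated on the show.

```python
# Back-of-envelope: 24 NVMe drives landing near 180 TB usable after RAID 6.
DRIVES = 24
DRIVE_TB = 8                   # assumed 8 TB NVMe SSDs (not stated on the show)
PARITY_DRIVES_WORTH = 2        # RAID 6 spends two drives' worth of space on P and Q

raw_tb = DRIVES * DRIVE_TB
usable_tb = (DRIVES - PARITY_DRIVES_WORTH) * DRIVE_TB
print(f"raw: {raw_tb} TB, usable after RAID 6: {usable_tb} TB")   # 192 TB raw, 176 TB usable
```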
Our RAID 6 is part of our V1.
It's already in production with two customers.
We also support RAID 5 and RAID 0,
but we've come from the IBM school of hard knocks,
so we built something that is extremely robust.
So not only can you lose or disconnect all the customer hosts,
you can knock out a controller, you can knock out two SSDs,
you can turn off the power, you can do all of those things concurrently
and nothing happens to your storage.
That's the strength and robustness of the E8 architecture.
My kind of storage product.
Yeah, I was thinking of something that Howard would love to get his hands on
in one form or another.
Yeah, I mean, I think building storage
is extremely, it's a nasty business.
You go to customers
and they don't care that you're a startup,
they don't care that you're a V1,
they compare you to an EMC VMAX.
And you have the slightest glitch
and they get extremely upset.
They have zero tolerance for failure.
And, you know, a lot of my team has come from IBM's XIV and Diligent acquisitions
and we've gone through a lot.
We've supported the largest tier one customers in the world at IBM.
And if there's one thing we learned, it's that they have really zero tolerance
for any kind of problems or outages.
Right.
So that's why we built the storage this way.
You know, this is the life we've chosen.
This is the life we've chosen.
We didn't choose this life.
It chose us.
Okay.
There you go.
So the client is for Linux, right?
Yes, but because we wrote most of it in user space,
it's highly portable,
and we have some projects on porting it to other operating systems.
Okay, so your plan to support Windows and or vSphere
is to port the client
rather than support a standard protocol like NVMe over Fabric?
That's a good question.
The problem with NVMe over Fabric is it doesn't really have primitives in place
for things that we do like RAID 6 and high availability and network multipathing.
You kind of need to add that anyway, which is why we're not sticking to NVMe over Fabrics right now.
We're kind of on the fence
to see where that standard goes and what kind
of adoption it takes.
Certainly you can play the trick
that we've been playing since
the SAN industry existed, which is
just emulate
an NVMe device and do the RAID 6
underneath. Or on top of
or something like that.
On the target side?
Well, I mean, that's what every SAN array does.
It says, look, I'm a SCSI drive.
But it's not. It's a RAID array.
But you can do that irrespective of any of your fabrics.
The problem is that all flash arrays today
are capped at a certain level of performance
because they do that kind of emulation
and they do all the features inside their dual controllers.
Right, all of those layers of abstraction
each take a little cost, but they add up.
Yeah, and I mean, what we can show today
is that probably one of our main competitors
is something like Pure Storage.
And from a hardware perspective,
it looks extremely similar to what we have.
But from a performance perspective,
we get something like 10 times their throughput
and one-fifth of their latency
using pretty much the same BOM structure
and same cost structure.
They're still using SAS SSDs for the bulk of their data.
The SAS SSDs are not their bottleneck, so if you open it up...
They're mostly CPU bound.
They're CPU bound. So if you replace all of their SAS SSDs with NVMe SSDs...
Arguably.
...it will not really change anything in their bandwidth or throughput, and will probably
knock down their latency by maybe 20 percent. That's the kind of industry estimate.
Whereas we, using the same hardware,
but a very different software architecture,
are able to get something like 10 or 20 times their bandwidth and throughput and something like one-fifth or even lower of their latency.
And it's using the same hardware.
It's essentially at cost parity.
What sort of numbers are we talking about with respect to 4K IOPS and response time
latency?
You're talking about average latency or minimum or maximum?
I guess I'd say average.
We usually measure that by average.
Sometimes it gets measured by a 95th percentile, but it's pretty similar.
We essentially reflect the underlying behavior of the SSD,
so it does vary from SSD to SSD that we qualify.
We will always choose the best SSD for our customers based on their workloads,
and we will offer several different SKUs of E8 based on performance to cost.
So some of the SSDs I've looked at, NVMe SSDs,
are on the order of 200 microsecond random read latency.
Is that the kind of numbers we're talking about here?
No.
NVMe SSDs are basically NAND latency, which is around 80 or 90 microsecond or maybe 100 microsecond.
We hike that up to about 120 microsecond because of the RAID 6 and the network.
We measure latency end-to-end from the
application on down. I think that with the SSDs we use today, we see about 120 microsecond latency
on 4K reads. As you increase the queue depth, the latency will start increasing as well. For something like
5 million IOPS, we're at about 300 microsecond latency. We saturate our network at about 10 million IOPS.
That's the highest we've recorded on our 2U box, is about 10 million IOPS. But then the latency
starts going up as well. But at about half of that, around 5 million IOPS, it's about 300 microseconds.
And at low queue depths, it's about 120 microseconds, which pretty much reflects what you would get from that kind of SSD if you used it locally.
Even though we hike up the latency a bit, because of our wide striping, we actually reduce
the latency variation. So the net effect is actually positive for customers. They're more
sensitive to consistency of latency rather than the absolute average latency. So because we spread it out on many SSDs and then you got many hosts that peak at different times,
our consistency of latency is actually better
than using local SSDs.
So that is a pretty cool side effect of the RAID striping.
But the numbers, so in terms of IOPS,
you can peak at 10 million
and anything below that we hit easily.
And latency, as I said. The bandwidth is about 40 gigabytes a second, maybe a bit higher.
Write bandwidth depends on block size, but at a 32K block size, we can get about 30 gigabytes a second.
And write IOPS, on small random 4K IOs, is between 1 million and 2 million.
There's different application factors that actually impact that. So one thing you have
to bear in mind when you test a local SSD, you put it in a server and you test it. That's it.
When you test E8, it really varies on the number of servers that you have, the kind of network that
you have, the kind of network topology, if it's one switch or two switches, cascaded switches.
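It's worth replaying the quoted figures through Little's Law (outstanding IOs = IOPS × latency) and a 4 KiB transfer size; the sketch below simply restates the numbers from the conversation rather than adding any new ones.

```python
# Sanity check of the quoted numbers using Little's Law and a 4 KiB transfer size.
def outstanding_ios(iops: float, latency_seconds: float) -> float:
    """Little's Law: average number of IOs in flight needed to sustain a given rate."""
    return iops * latency_seconds

def bandwidth_gb_per_s(iops: float, io_bytes: int = 4096) -> float:
    return iops * io_bytes / 1e9

print(outstanding_ios(5_000_000, 300e-6))    # ~1500 IOs in flight, spread across all hosts
print(outstanding_ios(10_000_000, 120e-6))   # ~1200 needed even at the unloaded 120 us latency
print(bandwidth_gb_per_s(10_000_000))        # ~41 GB/s of 4K reads, roughly where the network tops out
```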
Oh, Zivon, remember, you're preaching to the benchmarking choir here.
Yeah, well, you know, I was just going to say, you know, 10 million IOPS seems like stratospheric numbers here.
Even 5 million IOPS at 300 microseconds seems very good. This is a 2U
box. I mean, gosh, it's probably less than $20,000 for the hardware here. I don't know. I guess I
should ask, what's the sales price for something like this? We sell for about two to three bucks
a gig. And we try to be very competitive with the other all-flash arrays and still provide 10 times their performance.
So that's the way for a startup to make a dent in this market.
And for those of you with data sets that don't reduce very well, that's a bargain.
So Pure Storage is really the poster child of data reduction, and you can see that if in VDI they get 10 to 1, or even over that,
then in VSI it goes down to 4 to 1 or 5 to 1, and then in databases, be it structured or unstructured,
they publish something like 1.4 to 1. This is all off their website, right? So the opportunity for
dedupe keeps decreasing as you move from just virtual machines into databases and datasets.
But more than that, as the cost of media goes down, the motivation to do dedupe in the first place is eroded.
Dedupe consumes CPU and RAM, which are expensive.
Their cost is not going down.
And the media you're saving keeps getting cheaper.
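That economics argument can be made concrete with a little arithmetic; the E8 figure below is the $2-3/GB quoted earlier in the conversation, while the deduplicating-array sticker price is a placeholder assumption purely to show the calculation.

```python
# Effective cost per GB of stored data, after data reduction, versus a flat $/GB price.
E8_PRICE_PER_GB = 2.5            # midpoint of the $2-3/GB quoted on the show
DEDUP_ARRAY_PRICE_PER_GB = 6.0   # assumed sticker price for a deduping AFA (illustrative)

def effective_price(price_per_gb: float, reduction_ratio: float) -> float:
    """Price per GB of data actually stored, once the reduction ratio is applied."""
    return price_per_gb / reduction_ratio

# Reduction ratios are the ones cited from Pure's website in the conversation.
for workload, ratio in [("VDI", 10.0), ("VSI", 4.5), ("database", 1.4)]:
    dedup = effective_price(DEDUP_ARRAY_PRICE_PER_GB, ratio)
    print(f"{workload:9s}: deduping array ~${dedup:.2f}/GB vs flat ${E8_PRICE_PER_GB:.2f}/GB")
```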
We have a joke at E8: if one of my
guys says, I need to order 10 SSDs, I ask him, can you wait until Monday? Because, well, it will be cheaper.
It will be cheaper by then.
So, no, actually, SSD prices have been trending upward the past few
months. The foundries are running at capacity
and the transition to 3D is causing a temporary price bump.
It brings up a great point.
Long term, I'm with you, but short term...
It brings up a great point.
One of the segments of SSDs is kind of high performance,
high reliability SSDs like we use.
So there's two features that
the vendors keep pushing there that you would find very hard to use without something like E8.
One is capacity. As you grow the capacity of the SSD, if you use it in local storage,
that creates huge failure domains. But put those big SSDs in E8, and now you've got HA, now you've got RAID 6.
You can use bigger SSDs.
The other thing is high-performance SSDs.
So Samsung announced the 1 million IOPS SSD.
There's no application that can really drive 1 million IOPS, right?
We see applications at most maybe driving 80,000 or 100,000 IOPS on the server.
So by putting such SSDs in E8,
we fan out their performance to maybe 100 servers
so we can really leverage and benefit
from these high-performance, high-capacity SSDs.
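The fan-out arithmetic is simple but worth seeing; the per-application-server figure is the 80,000 to 100,000 IOPS mentioned above, and everything else is straightforward multiplication.

```python
# The fan-out argument: one server can't drive a modern NVMe SSD, but many servers can.
SSD_IOPS = 1_000_000             # e.g., the 1-million-IOPS SSD Samsung announced
PER_APP_SERVER_IOPS = 100_000    # what a typical application actually pushes, per the show
DRIVES = 24

servers_to_saturate_one_drive = SSD_IOPS / PER_APP_SERVER_IOPS
print(f"servers needed to keep one SSD busy: ~{servers_to_saturate_one_drive:.0f}")

# Shared across the whole enclosure (before RAID and network overhead), the raw
# drive-side budget dwarfs what any single host could ever use locally.
print(f"raw drive IOPS in the box: {DRIVES * SSD_IOPS:,}")
```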
The point I'm trying to make is
the vendors are making these SSDs,
but nobody's buying them because they cannot use them.
So we get first grabs on those kinds of SSDs, and we're able to find supply for them,
even though other segments of the market are in short supply.
Ah, the secret stash theory. I got it.
Alrighty. So Ray, is there any other critical thing you need to know about E8, other than your general incredulity that it runs 400 times faster than anything else you've ever seen?
Really, really, really.
So, as far as your marketing, is it like a direct sale or a channel sale kind of thing?
Or how are you getting the market, Zavon?
Right now, our sales are mostly focused on the US.
We're seeing a lot of interest in a few verticals.
One is financials, one is retailers,
one is high-performance computing or large clusters.
We're mostly going direct at this stage.
These are still kind of bleeding-edge products
that require a bit of customer
hand-holding. But in 2017, as the market starts to mature and we ramp up, we are also starting
to sign channels and expand into EMEA.
Okay. And you're shipping now? Yeah, we GA'd the product in December and we have two customers in production
and as you say, incredulity is perhaps the most common reaction that we get because the numbers
are just staggering. It sounds too good to be true until you actually see it.
Right. I mean, we've demoed the product, and even then people are incredulous and think that we faked the numbers.
I saw somebody write an email that we served all the data from RAM, and that's how we got such high performance.
So the product is GA, real customers are using it, and we are happy to demo it on-prem or via WebEx to anybody
who cares to view it. No hidden tricks.
Yeah, so I'm getting back to DRAM cache. You use the DRAM for write cache. Do you do
read caching as well?
No, we couldn't see much of a reason to do it given the speed of the backend NVMe.
I'm hearing more and more people talk about
how data caching is gone,
or is no longer useful anymore.
I think that server-side caching is certainly gone
and products
that were kind of
popular a few years ago, like
PernixData, and there was another one.
Yeah, FlashSoft.
That market never developed the way we thought it was going to.
And also with the cost that we sell today,
it's not only that everything is going into Flash.
We think there's almost no point of deduping it either.
You can literally build huge databases straight on Flash
and get amazing performance.
This is true, but not everyone has all the budget they would like.
And so there are compromises to be made.
And there are cases, especially around VDI, where I think deduplication is inherently an advantage.
Oh yeah, certainly. We're not trying to get a foothold in that market. We're mostly looking
into data sets.
Yeah, you would be overkill for VDI.
It's like, look, and it boots in four seconds.
But if you think of it,
if you look at... I bought a laptop
at Costco a few months ago,
and it came with a huge NVMe drive.
So even
our desktops are starting to gain the benefit of that.
And we will be hard-pressed to serve VDI on slower devices without the customers getting upset about it.
Yeah, well, right.
That's the continuing how-can-you-keep-them-down-on-the-farm-after-they've-seen-Paree problem.
Yes, exactly.
I'm just glad I don't have to manage a huge VDI farm anymore.
So, gosh, you mentioned that you're porting the agent software to other systems. So,
as far as Linux is concerned, are there specific versions of Linux that you support?
We will support whatever the customer has. We've seen a myriad of versions out there, both CentOS and RHEL,
obviously, at various levels. SLES, Ubuntu, Debian, the Unbreakable Linux kernel. It's really
kind of crazy, but our dependency there is very small, so it's not a big deal for us to support.
We're looking now into other operating systems as well,
and we're kind of prioritizing it based on the customer requests that we get.
Okay.
As far as service is concerned, you offer like,
what kind of service offerings do you have behind this?
You mean support?
Yeah, support, right?
Support, I'm sorry, yeah.
So we give, by default, three-year warranty and support.
There's two support plans.
One is kind of next business day, and one is for our mission critical.
And you can extend the warranty and support to five years.
And not that I'm not confident, but you have some spares deals too?
Hardware for spares.
Well, we don't sell it.
We stock it as FRUs.
We stock FRUs at FSLs.
So a customer wouldn't necessarily be doing a NVMe swap out if it was a failure.
Well, at IBM, we never let the customer do that.
Yeah, I know.
It's arguable.
And as a customer, it made us really crazy when that disk drive is sitting there and the light's flashing, and it's a
hot-swappable part, and I have one on the shelf, but the service guy's not going to get here for
three hours, and he's going to yell at me if I change it myself.
Well, it's arguable whether or not you want to really be as hard about it as that.
But you've seen that whole article about a huge outage of 3PAR in Australia
or something like that.
There were some rumors that it was the customer's fault.
Oh, yeah.
I've been the guy who swapped the wrong drive, too.
But then it reflects badly on the product,
not on you so much.
So there are merits to not allowing you to do that.
I think one of the questions we often get asked
is if your entire product is software
why do you sell an appliance
and that is the reason
the customer wants a single throat to choke
he doesn't want to get ping ponged
between a server vendor, a network vendor,
an NVMe vendor,
and a software-defined storage vendor.
That just doesn't fly.
And at the performance level you guys are operating at,
the tolerances for things are very small.
Because performance is really the key ticket
into the account,
I mean, if he's happy with the performance of Pure Storage,
he wouldn't even be talking to us.
So performance is the entry ticket, and you must deliver on it,
and you must deliver on it 24-7, 365 days a year, without downtime.
It's a challenge.
We've seen that at IBM, certainly.
So without controlling the hardware vector down to the bits and bytes and firmware levels and registers on the PSU, it just doesn't work.
It's impossible to guarantee those levels of performance and reliability without it.
And you mentioned consistency of response time.
Do you have any statistics on what the standard deviation of your response time might be in,
let's say, a 5 million IOPS kind of configuration?
You mentioned, I think, 300 microseconds.
We've seen some variations between how the tools measure that.
I would have to look into that.
But we usually improve on what you get from the SSD.
Again, it's pretty much a behavior of the SSD that we use,
and we reflect that up to the customer.
But by averaging out on many SSDs,
we get something that is better than the single SSD.
So we would improve something like a 3% STD dev
towards a 1.5% STD dev or something like that.
But we've seen really strange reports
by different tools that do that.
So it's a bit hard to put an exact figure on it.
But that's kind of what, you know,
when we go into a POC, what we tell the customer is...
The tools aren't really designed to be dealing
with the kind of numbers you're
delivering to them.
God forbid Microsoft.
And the tools weren't designed to have
100 servers talking to the box at once.
That's also something that's hard
to measure. So usually our
POCs are two-stage. In the first stage
they just hook up four or eight
servers and run synthetic tests.
And in the second stage, they put it in production
or a production-like environment
and measure the real application.
And they can measure all the performance benchmarks
that they want and their standard deviation
or other measures of consistency of latency.
And then they can compare it to the old Flash array
that they used before or the local SSD
that they used before.
And invariably, they're going to see crazy numbers.
They're going to have to kind of, you know,
scratch their eyes and see if that's really the number
that they came up with.
So you mentioned 100 servers.
Is that a typical configuration for this type of...
I mean, you are talking 5 million IOPS.
The most we've seen is 96.
Somewhere between, you know, 40, 48, 72, 96.
That's the kind of numbers we see, round numbers like that.
Usually it's just how many servers they can fit in the rack or in a couple of racks.
That's usually where that number comes from.
And you mentioned that you support both 40 gig and 100 gig Ethernet, right?
Yeah, the 2U box comes with 8 ports of 100 gig,
which are auto-negotiable to 40 gig or 50 gig.
On the application server side,
we require 10 gig and above.
And is this an RDMA connection?
Yes, ideally, yes.
It wouldn't perform as well without it.
Okay.
RoCE, iWARP, both?
We're doing mostly RoCE.
We have not seen much demand for iWARP,
but we are strictly ibverbs compliant,
so it should support InfiniBand, RoCE, iWARP,
and OmniPath equally well.
But right now, all of our deployments are around RoCE.
Okay.
Yeah, I haven't seen any OmniPath in the wild yet.
God, it's the first time I've heard OmniPath mentioned in a long time.
I haven't seen any
in the wild lately, actually at all.
It's mostly
HPC space, I think. Yeah, really
HPC, yeah, absolutely.
So what's next on the
roadmap? What are we going to see?
More performance?
More scalability? Or
services to attract a broader audience?
I think you're going to see more versatility
that you can fit it for more use cases,
both in terms of getting both higher
and lower performance out of E8
as the case will require.
We will start building up more data services
as well. This is a common
theme, not to rely on
LVMs, but rather
have such organic features.
Right.
So the
manageability, how you configure
volumes and stuff like that, is that through
a web client?
Oh yeah, we have
GUI, which is web-based. We have
REST API. We have
CLI.
Isn't it nice that REST APIs are becoming
table stakes?
Yeah, they are becoming
table stakes. Oh, we also support
OpenStack provisioning.
Oh, so you have a Cinder driver?
Yeah.
Basically, our storage is a kind of shared pool
across all the drives,
and you can provision it into small LUNs
down to one gigabyte, or one huge LUN,
and then dish it out to the VMs or physical hosts,
depending on how you want to do it.
Creating a new LUN is as simple as just, you know, like in XIV,
right-click, create new LUN,
and then automatically it appears on that host.
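Since the show doesn't go into E8's actual management API, the endpoint paths and field names in the sketch below are hypothetical, invented only to show what "carve a LUN out of the shared pool and map it to a host" tends to look like over a REST interface.

```python
# Hypothetical example only: the endpoint paths and JSON fields are invented for
# illustration and are NOT E8's documented REST API.
import requests

BASE = "https://e8-mgmt.example.local/api/v1"    # placeholder management address
AUTH = ("admin", "password")                      # placeholder credentials

# Carve a 1 TiB LUN out of the shared RAID 6 pool...
lun = requests.post(f"{BASE}/volumes", auth=AUTH, verify=False,
                    json={"name": "oracle-data-01", "size_gib": 1024}).json()

# ...and map it to an application host running the host-side agent.
requests.post(f"{BASE}/volumes/{lun['id']}/mappings", auth=AUTH, verify=False,
              json={"host": "dbhost-17"})
```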
Yeah, you guys at XIV were early in having a user interface
that mere mortals could understand.
The other thing that XIV came out with was videos,
almost YouTube-like instruction sets and stuff like that, right?
I don't remember that, actually.
Yeah, that was an XIV thing.
Yeah, maybe it was an IBM thing.
I'm sorry.
I was busy with the hardcore escalations.
I didn't have time for the video.
Yeah, I got you.
I got you.
All right, well, we're running about to the end of the time
here. Howard, are there any other questions you'd like to ask?
No, I
think I got it.
And
it's nice to see that we've got a couple of
horses that can
move us into this new
faster world. God, it's like
all the different dimensions of performance, I would have to say.
Zivon, is there anything you'd like to say to our audience as a final word?
No, I think anybody who's got an interest in what we do is welcome to make contact.
We're doing POCs now, uh, stateside, and we're slowly ramping up our sales, and
we'll be happy to accommodate any kind of requests. We're seeing more and more
interest now.
Okay.
Uh, well, this has been great.
Zivon, thanks very much for being on our show today.
Thank you for having me.
And next month we'll talk to another startup storage technology person.
Any questions you want us to ask, please let us know. That's it for now. Bye, Howard. Bye, Ray. And until next time,
thanks again.