Grey Beards on Systems - 42: GreyBeards talk next gen, tier 0 flash storage with Zivan Ori, CEO & Co-founder E8 Storage.
Episode Date: March 15, 2017
In this episode, we talk with Zivan Ori (@ZivanOri), CEO and Co-founder of E8 Storage, a new storage startup out of Israel. E8 Storage provides a tier 0, next generation all flash array storage solution for HPC and high end environments that need extremely high IO performance, with high availability and modest data services.
Transcript
Hey everybody, Ray Lucchesi here with Howard Marks here.
Welcome to the next episode of the Greybeards on Storage monthly podcast,
the show where we get Greybeard storage system bloggers to talk with storage system vendors
to discuss upcoming products, technologies, and trends affecting the data center today.
This is our 42nd episode of Greybeards on Storage, which was recorded on March 13, 2017.
We have with us here today Zivan Ori, CEO and co-founder of E8 Storage.
So Zivan, why don't you tell us a little bit about yourself and your company?
Sure.
E8 Storage was founded two and a half years ago with the
premise of creating a shared NVMe appliance without compromising on the performance that
you could get from such an appliance compared to today's all-flash arrays, using strictly
off-the-shelf componentry and not going to the length of customizing flash modules, networks, or any kind
of hardware like many other products that we see
in the landscape today.
Well, one fewer than were on the landscape last week.
Yes.
You can hear the bells ringing for that one. Ah, we're talking about a famous, uh, major storage vendor
that just canceled one of their solutions.
Yeah. So I first talked to the folks at E8 and Zivan back when I was preparing for last year's Flash Memory Summit, where I was doing a presentation that I called The New Tier Zero, about this new class of direct memory access, very high-performance storage systems. And, of course, the big story there was DSSD and how everything was very proprietary.
Turns out the market didn't seem to like all that very proprietary.
EMC has canceled the DSSD project.
So it will be interesting to see how these things play out.
But I'm sure Zivan is not all that heartbroken about this occurrence, is he?
Well, you know, there's something to be said about a big company validating your technology, or at least your idea, or your market space.
But yeah, I can understand that.
So Zivan, would you call yourself tier zero storage? So just a quick note about DSSD. I think it's a bit ironic that, having started later, we have access to things like NVMe SSDs that are ubiquitous today and 100 gig Ethernet networks.
So that kind of reduces the need to develop custom flash modules like DSSD had to do, and to develop all sorts of proprietary high-bandwidth,
low-latency networks like DSSD had to do.
And I think that pretty much accelerated their demise
because we could get the same performance as them
using commodity hardware, essentially.
So it was very hard to justify the complexity and the cost.
Even the power consumption was three times ours.
So I think...
It's a story we hear over and over again in our business
that developing hardware takes so long
that somebody who can do it in software will come along
and just beat you out on costs later
because they can get as a merchant product
the technology you had to build.
Well, we've always thought that.
I've come from IBM's XIV storage product.
I managed XIV's development for five years.
I grew that team from 13 people when I joined to over 100 people.
And XIV's philosophy had been to stick to off-the-shelf componentry.
And you could see how all sorts of legacy arrays, not flash arrays at the time,
just storage arrays, became deprecated as more and more products like HP's 3PAR or Dell's
Compellent and IBM's XIV were simply using commodity components. And we always thought
that the same thing is going to happen with flash. So you could see just three years ago that the
leading products in the all-flash landscape were very much hardware-centric, like Violin Memory, Skyera, Fusion-io,
and these sorts of products have all but died out, to be replaced by software-centric products
like EMC's XtremIO or Pure Storage or NetApp SolidFire.
And the same thing is going to happen with what we call the Gen 2 all-flash arrays.
So products like DSSD, or other competitors building hardware solutions to extract the true performance of flash, will find themselves deprecated by software-centric solutions like ours.
And I think we're seeing it happen at an even faster pace than people anticipated.
So let's talk a little bit about the product.
What does it look like?
How do I connect?
What services does it provide?
Where would I use this very fast kind of array?
Okay, a lot of questions. So let's start with what is the product.
We've chosen to focus on the NVMe 2.5-inch
hot-swappable form factor
because of its ease of field replacement.
And we've chosen to focus on the 2U enclosure
because we thought that's the smallest enclosure that can provide high availability.
So our product is essentially a 2U height, rack-mounted, 24 NVMe drive,
no single point of failure enclosure.
It's the first of its kind in the industry.
It supports both dual-port NVMe SSDs
and single-port NVMe SSDs through an interposer.
And it has got two controllers,
which are really Intel motherboards,
nothing special about them.
Those motherboards are connected on one hand
to the NVMe drives,
and on the other hand,
to 40 gig slash 100 gig Ethernet ports
courtesy of Mellanox.
Sounds a lot like Storage Bridge Bay for the next generation.
Storage Bridge Bay didn't take off as a standard
mostly because it was always the same vendor
building the front and the back of the enclosure.
So nobody really cared if you're compliant with the standard or not.
So it looks like an SBB kind of enclosure.
The main benefits are that the controllers are hot-swappable,
and we have full HA and redundancy.
Both controllers can view all drives.
Okay, so there's a PCIe switch fabric between the controllers and the backplane that U.2 drives plug into?
Or they're just dual-ported, right?
Yes, we have support for dual-port NVMe drives.
I believe that we're the first product in the industry to support dual-port NVMe drives.
And we work closely with the SSD vendors to bring up their SSD drives because they also realize that we're the first product that they connect the drives
to. Right.
I remember
talking to Steve Sicola
when he was working on what became
X-IO's ISE
and he kept saying that
it turned out that was the first controller that ever used
T10 DIF
and the drive vendors
all wanted to test with it
because they never had a test bed before.
Yeah, so we are in a situation like that with dual-port NVMe drives.
Okay.
The PCIe fabric is extremely simple.
It's really just a PCIe switch, static topology,
connected through a passive midplane.
Yeah, you really just need it to act as
a lane expander because
you've got more lanes than you can get off the
motherboard. It's a lane expander.
It's also a port expander because the Intel
CPU complex itself doesn't support so many
PCI ports.
It is also pretty convenient
in terms of error isolation
and downlink contamination prevention. There are many kinds of benefits.
Yeah, but it's acting as a fan-out, not a full fabric, and that's just simpler.
Exactly. It's merely a static fan-out.
The topology never changes.
It is very, very solid, and we don't see many issues with it.
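For a rough sense of why a fan-out switch is needed at all, the lane arithmetic looks something like the sketch below; the per-drive and per-controller lane counts are illustrative assumptions, not E8's published spec.

```python
# Rough PCIe lane budget for a 24-drive NVMe enclosure (illustrative numbers only).
DRIVES = 24
LANES_PER_DRIVE = 4          # U.2 NVMe drives are typically PCIe x4
CPU_LANES_AVAILABLE = 48     # assumed usable lanes per controller after NICs/boot; varies by platform

lanes_needed = DRIVES * LANES_PER_DRIVE      # 96 lanes just for the drives
print(f"drive lanes needed: {lanes_needed}, CPU lanes available: {CPU_LANES_AVAILABLE}")

# A static PCIe switch fans the downstream drive lanes into a narrower uplink,
# and also presents the CPU with one port instead of 24 separate root ports.
oversubscription = lanes_needed / CPU_LANES_AVAILABLE
print(f"fan-out / oversubscription ratio: {oversubscription:.1f}x")
```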
We also have two power supplies, two battery units.
All the drives are swappable.
The controllers are swappable.
The PSUs are swappable.
We built this box to be rugged, high-duty cycle.
And that's really the mission of building fault-tolerant storage.
What does the battery unit power?
Do you have like a DRAM cache?
Why do we have a battery unit?
Yes.
Yeah, it's for the dirty DRAM
cache and the flushing
metadata and other data
structures. It only backs the motherboards
and not the drives.
So you're using standard DRAM
as the write cache and then
protecting the whole motherboard?
Yes.
We've always picked simple
and pragmatic solutions.
There are many fancy technologies out there that make everything a lot more complicated.
And we picked very simple things that we can stabilize and productize.
So is that...
The enclosure itself is not dissimilar to SAS SBB enclosures that are quite common today. All we asked our vendor that built the enclosure to do
is really replace the SAS fabric with an NVMe fabric,
and that's about it.
Other than that, it will look exactly like any other kind of SAS enclosure.
It will be hard to tell it apart, but it's a pure NVMe enclosure.
That's what I thought.
So you're using DRAM as the write buffer.
Do you have flash to flush that to
during power fail,
or enough battery to write
through? Yeah, we
flush it to
our boot devices. Okay.
Fine. I just wanted to make
sure we weren't going back to the old days when
the generator wouldn't start
and you'd have the big stopwatch going,
the battery on the DRAM cache is going to fail in 12 hours.
So the interfaces internally to the drives are all NVMe fabric sorts of things.
What does the interface between the host and the storage look like?
Okay, so we install a host-side software package, which is a small kernel driver and
mostly a user space process. And we use that to essentially offload data path activities that would
choke the controller. And through that, we're able to show how we can scale performance as you connect more and more servers to E8.
The more servers you connect, basically, the higher the performance you're going to get.
And you're going to consume some client CPU to manage some tasks?
We consume usually one core on the server.
In a large-scale deployment, that will probably saturate the E8 box. For demos,
we can consume a lot more cores and then use fewer servers to achieve the same level of performance.
So it's really horizontally scalable as far as the number of cores you dedicate to it.
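A back-of-the-envelope model of that host-side scaling claim might look like this; the per-core agent throughput is an assumed figure for illustration, while the 10 million IOPS ceiling is the number quoted later in the episode.

```python
# Toy model: aggregate IOPS grows with the number of application servers,
# each donating one core to the host-side agent, until the 2U box itself saturates.
PER_AGENT_CORE_IOPS = 150_000     # assumed per-core agent throughput (illustrative only)
BOX_LIMIT_IOPS = 10_000_000       # ~10M IOPS ceiling quoted later in the episode

def aggregate_iops(num_servers: int, cores_per_server: int = 1) -> int:
    offered = num_servers * cores_per_server * PER_AGENT_CORE_IOPS
    return min(offered, BOX_LIMIT_IOPS)

for n in (8, 24, 48, 72, 96):
    print(f"{n:3d} servers -> ~{aggregate_iops(n):,} IOPS")
```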
So you're saying when you mention servers, you're talking about host servers,
not the storage server, right? Yes, the application servers.
Right, right, right.
Okay, and how do you do data protection?
We implemented a form of RAID 6 on the NVMe drives,
and it is implemented in a combination of the software that we have running inside the box
and the software that we have running on the customer hosts.
Okay.
So the client driver and the software on the target box cooperate to do that kind of stuff.
Yeah, exactly.
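For readers who want the parity math behind that cooperation, here is a minimal RAID 6-style dual-parity sketch (P as XOR, Q as Reed-Solomon over GF(2^8)); how E8 actually splits this work between the agent and the controllers isn't public, so treat it as purely conceptual.

```python
# Minimal sketch of RAID 6-style dual parity (P = XOR, Q = Reed-Solomon over GF(2^8)).
# This only illustrates the parity arithmetic, not E8's implementation.

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) using the 0x11d polynomial (the one Linux md RAID 6 uses)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D
        b >>= 1
    return p

def raid6_parity(blocks: list[bytes]) -> tuple[bytes, bytes]:
    """Compute P and Q parity for one stripe of equal-length data blocks."""
    length = len(blocks[0])
    p = bytearray(length)
    q = bytearray(length)
    for i, block in enumerate(blocks):
        coeff = 1
        for _ in range(i):            # coefficient g^i with generator g = 2
            coeff = gf_mul(coeff, 2)
        for j, byte in enumerate(block):
            p[j] ^= byte              # P: simple XOR across the stripe
            q[j] ^= gf_mul(coeff, byte)  # Q: weighted sum over GF(2^8)
    return bytes(p), bytes(q)

data = [bytes([i] * 8) for i in range(4)]   # four tiny 8-byte "drive" blocks
P, Q = raid6_parity(data)
print(P.hex(), Q.hex())
```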
The stuff that runs on the agents is only related to that particular server.
It is kind of initiator-side processing.
There's never any awareness of the
servers between each other. They don't know of each other's existence. They only ever talk to
the box. So the network topology is strictly north-south, and that is what allows us to scale
horizontally very well. Some of the work is carried by the controller itself. Every feature that we design,
we often get asked, so what happens inside the box and what happens outside the box? It really
varies from feature to feature. Every feature that we design, we cast onto this unique architecture
and determine what should reside inside the enclosure and what can happen on the agents.
And since the agents are broadly distributed,
you've got much more resources there in aggregate.
Yes, but we don't want to disrupt the host,
so we never do maintenance operations on the agents.
We never do all sorts of control plane operations on the agent.
That will always happen inside the box.
Okay.
More than that, and that has been our primary design goal,
is that we don't really care if those customer servers get turned off or die for whatever reason.
One of the huge problems of hyperconverged solutions has been that the customer servers become, you know, part of the storage that you rely on.
Well, they become stateful.
They become stateful.
And one of the great things about server virtualization was that it made my server stateless.
And I could vMotion workloads around and upgrade firmware and add RAM.
And I didn't have to talk to the fakakta change control committee anymore.
And as soon as you start using them as storage, well, if I take that host down, now at the very
least I'm reducing performance and resiliency.
At worst, I might be, you know, really exposing things to bad situations.
Yes.
And that has been, I think, the Achilles' heel of hyperconverged solutions.
And you can go to what one might call
extreme scenarios where you kill the servers.
What happens if you kill one server?
The hyperconverged solution can somehow work around it.
What happens if you kill two servers?
You're pretty much toast
with any kind of hyperconverged solution.
So you might say that's not a very common scenario
or whatever, but let me give you a very simple
scenario. I'm upgrading my Linux.
It's patch
Tuesday, whatever. What happens
to those servers? They reboot.
And what happens to your storage?
You now face storage
outage because of a patch Tuesday.
So we've always thought
that... This is why I have to manage patches.
But that's a whole other story.
But that's usually not the storage admin's duty
to manage the application server patches.
So the way we designed E8 is that
even though we took the liberty of running some stuff
on the customer's host,
we never assumed that territory is safe or stable.
So you can turn off a customer server.
We never store volatile metadata on that server.
So nothing happens to your storage.
And you can turn off all customer servers,
and that happens during power outages.
Everything is stored safely on the E8 box.
Nothing happens.
We just clean up the orphaned IOs or orphaned handles
and everything is back to normal
as soon as you turn on those servers.
And that is extremely handy in many kinds of outages.
But even again, just in the Linux booting or Linux patching.
And what we see today in large data centers,
they buy such cheap servers
because they don't care that much about their reliability, because there are so many of them
that they just crash left and right. In one of our deployments, there are 72 servers connected to
E8. On average, only 66 of those servers are actually alive. And that's got nothing to do
with the E8. That's the nature of the beast.
Okay, I believe in fail in place,
but that's a little bit much.
It's not that bad.
It's not what?
But fail in place just as a philosophy,
again, is best suited to things that are stateless.
Exactly.
If you have 10,000 web servers
and 1,000 of them go offline, nobody cares.
So, Zivan, you mentioned scaling of the system,
but I think you were talking about scaling the host.
Is the E8 storage a cluster environment or a clustered storage system?
No.
Each dual controller is managed separately. And if you have more than one, you can stripe across them using an LVM kind of approach.
So it's a very loose kind of association.
Is that literally LVM or are you handling it in your client?
Right now it's literally LVM.
But as we mature the product, it's going to be a more kind of organic feature.
The boxes are loosely associated, what we call stacking, and it allows you to put as many as you want.
Today we get almost 180 terabytes in a single box, so it's not a huge problem.
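The LVM-style stacking described above amounts to striping logical addresses round-robin across independently managed boxes; here is a small sketch of that address mapping, with an arbitrary stripe size and box count chosen purely for illustration.

```python
# Sketch of LVM-style striping across independently managed E8 boxes ("stacking").
# The stripe size and box count are arbitrary illustration values, not E8 defaults.
STRIPE_SIZE = 1 << 20        # 1 MiB stripes (assumed)
NUM_BOXES = 3                # three stacked 2U enclosures (assumed)

def map_lba(byte_offset: int) -> tuple[int, int]:
    """Map a logical byte offset in the striped volume to (box index, offset within that box)."""
    stripe = byte_offset // STRIPE_SIZE
    box = stripe % NUM_BOXES
    # Offset on the target box: which of "its" stripes this is, plus the position inside it.
    box_offset = (stripe // NUM_BOXES) * STRIPE_SIZE + (byte_offset % STRIPE_SIZE)
    return box, box_offset

print(map_lba(0))                        # (0, 0)
print(map_lba(STRIPE_SIZE))              # (1, 0)
print(map_lba(3 * STRIPE_SIZE + 4096))   # (0, 1052672): second stripe on box 0, 4 KiB in
```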
Okay.
Do you support data compression,
deduplication, any of those data
reduction services, thin provisioning?
No,
not presently.
So it's 180 terabytes
after RAID protection kind of thing?
Yes.
I got you.
180 usable?
Oh, usable, okay.
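A quick check on how 24 NVMe drives land near 180 TB usable after RAID 6; the per-drive capacity below is an assumption, since the drive size wasn't stated on the show.

```python
# Back-of-envelope: 24 NVMe drives landing near 180 TB usable after RAID 6.
DRIVES = 24
DRIVE_TB = 8                   # assumed 8 TB NVMe SSDs (not stated on the show)
PARITY_DRIVES_WORTH = 2        # RAID 6 spends two drives' worth of space on P and Q

raw_tb = DRIVES * DRIVE_TB
usable_tb = (DRIVES - PARITY_DRIVES_WORTH) * DRIVE_TB
print(f"raw: {raw_tb} TB, usable after RAID 6: {usable_tb} TB")   # 192 TB raw, 176 TB usable
```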
Our RAID 6 is part of our V1.
It's already in production with two customers.
We also support RAID 5 and RAID 0,
but we've come from the IBM school of hard knocks,
so we built something that is extremely robust.
So not only can you lose or disconnect all the customer hosts,
you can knock out a controller, you can knock out two SSDs,
you can turn off the power, you can do all of those things concurrently
and nothing happens to your storage.
That's the strength and robustness of the E8 architecture.
My kind of storage product.
Yeah, I was thinking of something that Howard would love to get his hands on
in one form or another.
Yeah, I mean, I think building storage
is extremely, it's a nasty business.
You go to customers
and they don't care that you're a startup,
they don't care that you're a V1,
they compare you to an EMC VMAX.
And you have the slightest glitch
and they get extremely upset.
They have zero tolerance for failure.
And, you know, a lot of my team has come from IBM's XIV and Diligent acquisitions
and we've gone through a lot.
We've supported the largest tier one customers in the world at IBM.
And if there's one thing we learned, it's that they have really zero tolerance
for any kind of problems or outages.
Right.
So that's why we built the storage this way.
You know, this is the life we've chosen.
This is the life we've chosen.
We didn't choose this life.
It chose us.
Okay.
There you go.
So the client is for Linux, right?
Yes, but because we wrote most of it in user space,
it's highly portable,
and we have some projects on porting it to other operating systems.
Okay, so your plan to support Windows and or vSphere
is to port the client
rather than support a standard protocol like NVMe over Fabric?
That's a good question.
The problem with NVMe over Fabric is it doesn't really have primitives in place
for things that we do like RAID 6 and high availability and network multipathing.
You kind of need to add that anyway, which is why we're not sticking to NVMe over Fabrics right now.
We're kind of on the fence
to see where that standard goes and what kind
of adoption it takes.
Certainly you can play the trick
that we've been playing since
the SAN industry existed, which is
just emulate
an NVMe device and do the RAID 6
underneath. Or on top of
or something like that.
On the target side?
Well, I mean, that's what every SAN array does.
It says, look, I'm a SCSI drive.
But it's not. It's a RAID array.
But you can do that irrespective of any of your fabrics.
The problem is that all flash arrays today
are capped at a certain level of performance
because they do that kind of emulation
and they do all the features inside their dual controllers.
Right, all of those layers of abstraction
each take a little cost, but they add up.
Yeah, and I mean, what we can show today
is that probably one of our main competitors
is something like Pure Storage.
And from a hardware perspective,
it looks extremely similar to what we have.
But from a performance perspective,
we get something like 10 times their throughput
and one-fifth of their latency
using pretty much the same BOM structure
and same cost structure.
They're still using SAS SSDs for the bulk of their data.
The SAS SSDs are not their bottleneck, so if you open it up...
They're mostly CPU bound.
They're CPU bound. So if you replace all of their SAS SSDs with NVMe SSDs...
Arguably.
...it will not really change anything in their bandwidth or throughput, and will probably
knock down their latency by maybe 20 percent. That's the kind of industry estimate.
Whereas we, using the same hardware,
but a very different software architecture,
are able to get something like 10 or 20 times their bandwidth and throughput and something like one-fifth or even lower of their latency.
And it's using the same hardware.
It's essentially at cost parity.
What sort of numbers are we talking about with respect to 4K IOPS and response time
latency?
You're talking about average latency or minimum or maximum?
I guess I'd say average.
We usually measure that by average.
Sometimes it gets measured by a 95th percentile, but it's pretty similar.
We essentially reflect the underlying behavior of the SSD,
so it does vary from SSD to SSD that we qualify.
We will always choose the best SSD for our customers based on their workloads,
and we will offer several different SKUs of E8 based on performance to cost.
So some of the SSDs I've looked at, NVMe SSDs,
are on the order of 200 microsecond random read latency.
Is that the kind of numbers we're talking about here?
No.
NVMe SSDs are basically NAND latency, which is around 80 or 90 microsecond or maybe 100 microsecond.
We hike that up to about 120 microsecond because of the RAID 6 and the network.
We measure latency end-to-end from the
application on down. I think that with the SSDs we use today, we see about 120 microsecond latency
on 4K reads. As you increase the queue depth, the latency will start increasing as well. For something like
5 million IOPS, we're at about 300 microsecond latency. We saturate our network at about 10 million IOPS.
That's the highest we've recorded on our 2U box, is about 10 million IOPS. But then the latency
starts going up as well. But at about half of that, around 5 million IOPS, it's about 300 microseconds.
And at low queue depths, it's about 120 microseconds, which pretty much reflects what you would get from that kind of SSD if you used it locally.
Even though we hike up the latency a bit, because of our wide striping, we actually reduce
the latency variation. So the net effect is actually positive for customers. They're more
sensitive to consistency of latency rather than the absolute average latency. So because we spread it out on many SSDs and then you got many hosts that peak at different times,
our consistency of latency is actually better
than using local SSDs.
So that is a pretty cool side effect of the RAID striping.
But the numbers, so in terms of IOPS,
you can peak at 10 million
and anything below that we hit easily.
And latency, as I said. The bandwidth is about 40 gigabytes a second, maybe a bit higher.
Write bandwidth depends on block size, but at a 32K block size, we can get about 30 gigabytes a second.
And write IOPS, on small random 4K IOs, is between 1 million and 2 million.
There's different application factors that actually impact that. So one thing you have
to bear in mind when you test a local SSD, you put it in a server and you test it. That's it.
When you test E8, it really varies on the number of servers that you have, the kind of network that
you have, the kind of network topology, if it's one switch or two switches, cascaded switches.
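It's worth replaying the quoted figures through Little's Law (outstanding IOs = IOPS × latency) and a 4 KiB transfer size; the sketch below simply restates the numbers from the conversation rather than adding any new ones.

```python
# Sanity check of the quoted numbers using Little's Law and a 4 KiB transfer size.
def outstanding_ios(iops: float, latency_seconds: float) -> float:
    """Little's Law: average number of IOs in flight needed to sustain a given rate."""
    return iops * latency_seconds

def bandwidth_gb_per_s(iops: float, io_bytes: int = 4096) -> float:
    return iops * io_bytes / 1e9

print(outstanding_ios(5_000_000, 300e-6))    # ~1500 IOs in flight, spread across all hosts
print(outstanding_ios(10_000_000, 120e-6))   # ~1200 needed even at the unloaded 120 us latency
print(bandwidth_gb_per_s(10_000_000))        # ~41 GB/s of 4K reads, roughly where the network tops out
```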
Oh, Zivon, remember, you're preaching to the benchmarking choir here.
Yeah, well, you know, I was just going to say, you know, 10 million IOPS seems like stratospheric numbers here.
Even 5 million IOPS at 300 microseconds seems very good. This is a 2U
box. I mean, gosh, it's probably less than $20,000 for the hardware here. I don't know. I guess I
should ask, what's the sales price for something like this? We sell for about two to three bucks
a gig. And we try to be very competitive with the other all-flash arrays and still provide 10 times their performance.
So that's the way for a startup to make a dent in this market.
And for those of you with data sets that don't reduce very well, that's a bargain.
So Pure Storage is really the poster child of data reduction, and you can see that if in VDI they get 10 to 1, or even over that,
then in VSI it goes down to 4 to 1 or 5 to 1, and then in databases, be it structured or unstructured,
they publish something like 1.4 to 1. This is all off their website, right? So the opportunity for
dedupe keeps decreasing as you move from just virtual machines into databases and datasets.
But more than that, as the cost of media goes down, the motivation to do dedupe in the first place is eroded.
Dedupe consumes CPU and RAM, which are expensive.
Their cost is not going down.
And the media you're saving keeps getting cheaper.
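That economics argument can be made concrete with a little arithmetic; the E8 figure below is the $2-3/GB quoted earlier in the conversation, while the deduplicating-array sticker price is a placeholder assumption purely to show the calculation.

```python
# Effective cost per GB of stored data, after data reduction, versus a flat $/GB price.
E8_PRICE_PER_GB = 2.5            # midpoint of the $2-3/GB quoted on the show
DEDUP_ARRAY_PRICE_PER_GB = 6.0   # assumed sticker price for a deduping AFA (illustrative)

def effective_price(price_per_gb: float, reduction_ratio: float) -> float:
    """Price per GB of data actually stored, once the reduction ratio is applied."""
    return price_per_gb / reduction_ratio

# Reduction ratios are the ones cited from Pure's website in the conversation.
for workload, ratio in [("VDI", 10.0), ("VSI", 4.5), ("database", 1.4)]:
    dedup = effective_price(DEDUP_ARRAY_PRICE_PER_GB, ratio)
    print(f"{workload:9s}: deduping array ~${dedup:.2f}/GB vs flat ${E8_PRICE_PER_GB:.2f}/GB")
```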
We have a joke at E8: if one of my
guys says, I need to order 10 SSDs, I ask him, can you wait until Monday? Because, well, it will be cheaper.
It will be cheaper by then.
So, no, actually, SSD prices have been trending upward the past few
months. The foundries are running at capacity
and the transition to 3D is causing a temporary price bump.
It brings up a great point.
Long term, I'm with you, but short term...
It brings up a great point.
One of the segments of SSDs is kind of high performance,
high reliability SSDs like we use.
So there's two features that
the vendors keep pushing there that you would find very hard to use without something like E8.
One is capacity. As you grow the capacity of the SSD, if you use it in local storage,
that creates huge failure domains. But put those big SSDs in E8, and now you've got HA, now you've got RAID 6.
You can use bigger SSDs.
The other thing is high-performance SSDs.
So Samsung announced the 1 million IOPS SSD.
There's no application that can really drive 1 million IOPS, right?
We see applications at most maybe driving 80,000 or 100,000 IOPS on the server.
So by putting such SSDs in E8,
we fan out their performance to maybe 100 servers
so we can really leverage and benefit
from these high-performance, high-capacity SSDs.
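The fan-out arithmetic is simple but worth seeing; the per-application-server figure is the 80,000 to 100,000 IOPS mentioned above, and everything else is straightforward multiplication.

```python
# The fan-out argument: one server can't drive a modern NVMe SSD, but many servers can.
SSD_IOPS = 1_000_000             # e.g., the 1-million-IOPS SSD Samsung announced
PER_APP_SERVER_IOPS = 100_000    # what a typical application actually pushes, per the show
DRIVES = 24

servers_to_saturate_one_drive = SSD_IOPS / PER_APP_SERVER_IOPS
print(f"servers needed to keep one SSD busy: ~{servers_to_saturate_one_drive:.0f}")

# Shared across the whole enclosure (before RAID and network overhead), the raw
# drive-side budget dwarfs what any single host could ever use locally.
print(f"raw drive IOPS in the box: {DRIVES * SSD_IOPS:,}")
```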
The point I'm trying to make is
the vendors are making these SSDs,
but nobody's buying them because they cannot use them.
So we get first grabs on those kinds of SSDs, and we're able to find supply for them,
even though other segments of the market are in short supply.
Ah, the secret stash theory. I got it.
Alrighty. So Ray, is there any other critical thing you need to know about E8, other than your general incredulity that it runs 400 times faster than anything else you've ever seen?
Really, really, really.
So, as far as your marketing, is it like a direct sale or a channel sale kind of thing?
Or how are you getting the market, Zavon?
Right now, our sales are mostly focused on the US.
We're seeing a lot of interest in a few verticals.
One is financials, one is retailers,
one is high-performance computing or large clusters.
We're mostly going direct at this stage.
These are still kind of bleeding-edge products
that require a bit of customer
hand-holding. But in 2017, as the market starts to mature and we ramp up, we are also starting
to sign channels and expand into EMEA.
Okay. And you're shipping now? Yeah, we GA'd the product in December and we have two customers in production
and as you say, incredulity is perhaps the most common reaction that we get because the numbers
are just staggering. It sounds too good to be true until you actually see it.
Right. I mean, we've demoed the product, and even then people are incredulous and think that we faked the numbers.
I saw somebody write an email that we served all the data from RAM, and that's how we got such high performance.
So the product is GA, real customers are using it, and we are happy to demo it on-prem or via WebEx to anybody
who cares to view it. No hidden tricks.
Yeah, so I'm getting back to DRAM cache. You use the DRAM for write cache. Do you do
read caching as well?
No, we couldn't see much of a reason to do it given the speed of the backend NVMe.
I'm hearing more and more people talk about
how data caching is gone,
or is no longer useful anymore.
I think that server-side caching is certainly gone
and products
that were kind of
popular a few years ago, like
PernixData, and there was another one.
Yeah, FlashSoft.
That market never developed the way we thought it was going to.
And also with the cost that we sell today,
it's not only that everything is going into Flash.
We think there's almost no point of deduping it either.
You can literally build huge databases straight on Flash
and get amazing performance.
This is true, but not everyone has all the budget they would like.
And so there are compromises to be made.
And there are cases, especially around VDI, where I think deduplication is inherently an advantage.
Oh yeah, certainly. We're not trying to get a foothold in that market. We're mostly looking
into data sets.
Yeah, you would be overkill for VDI.
It's like, look, and it boots in four seconds.
But if you think of it,
if you look at... I bought a laptop
at Costco a few months ago,
and it came with a huge NVMe drive.
So even
our desktops are starting to gain the benefit of that.
And we will be hard-pressed to serve VDI on slower devices without the customers getting upset about it.
Yeah, well, right.
That's the continuing how-can-you-keep-them-down-on-the-farm-after-they've-seen-Paree problem.
Yes, exactly.
I'm just glad I don't have to manage a huge VDI farm anymore.
So, gosh, you mentioned that you're porting the agent software to other systems. So,
as far as Linux is concerned, are there specific versions of Linux that you support?
We will support whatever the customer has. We've seen a myriad of versions out there, both CentOS and RHEL,
obviously, at various levels. SLES, Ubuntu, Debian, the Unbreakable Linux kernel. It's really
kind of crazy, but our dependency there is very small, so it's not a big deal for us to support.
We're looking now into other operating systems as well,
and we're kind of prioritizing it based on the customer requests that we get.
Okay.
As far as service is concerned, you offer like,
what kind of service offerings do you have behind this?
You mean support?
Yeah, support, right?
Support, I'm sorry, yeah.
So we give, by default, three-year warranty and support.
There's two support plans.
One is kind of next business day, and one is for our mission critical.
And you can extend the warranty and support to five years.
And not that I'm not confident, but you have some spares deals too?
Hardware for spares.
Well, we don't sell it.
We stock it as FRUs.
We stock FRUs at FSLs.
So a customer wouldn't necessarily be doing a NVMe swap out if it was a failure.
Well, at IBM, we never let the customer do that.
Yeah, I know.
It's arguable.
And as a customer, it made us really crazy when that disk drive is sitting there and the light's flashing, and it's a
hot-swappable part, and I have one on the shelf, but the service guy's not going to get here for
three hours, and he's going to yell at me if I change it myself.
Well, it's arguable whether or not you want to really be as hard about it as that.
But you've seen that whole article about a huge outage of 3PAR in Australia
or something like that.
There were some rumors that it was the customer's fault.
Oh, yeah.
I've been the guy who swapped the wrong drive, too.
But then it reflects badly on the product,
not on you so much.
So there are merits to not allowing you to do that.
I think one of the questions we often get asked
is if your entire product is software
why do you sell an appliance
and that is the reason
the customer wants a single throat to choke
he doesn't want to get ping ponged
between a server vendor, a network vendor,
an NVMe vendor,
and a software-defined storage vendor.
That just doesn't fly.
And at the performance level you guys are operating at,
the tolerances for things are very small.
Because performance is really the key ticket
into the account,
I mean, if he's happy with the performance of Pure Storage,
he wouldn't even be talking to us.
So performance is the entry ticket, and you must deliver on it,
and you must deliver on it 24-7, 365 days a year, without downtime.
It's a challenge.
We've seen that at IBM, certainly.
So without controlling the hardware vector down to the bits and bytes and firmware levels and registers on the PSU, it just doesn't work.
It's impossible to guarantee those levels of performance and reliability without it.
And you mentioned consistency of response time.
Do you have any statistics on what the standard deviation of your response time might be in,
let's say, a 5 million IOPS kind of configuration?
You mentioned, I think, 300 microseconds.
We've seen some variations between how the tools measure that.
I would have to look into that.
But we usually improve on what you get from the SSD.
Again, it's pretty much a behavior of the SSD that we use,
and we reflect that up to the customer.
But by averaging out on many SSDs,
we get something that is better than the single SSD.
So we would improve something like a 3% STD dev
towards a 1.5% STD dev or something like that.
But we've seen really strange reports
by different tools that do that.
So it's a bit hard to put an exact figure on it.
But that's kind of what, you know,
when we go into a POC, what we tell the customer is...
The tools aren't really designed to be dealing
with the kind of numbers you're
delivering to them.
God forbid Microsoft.
And the tools weren't designed to have
100 servers talking to the box at once.
That's also something that's hard
to measure. So usually our
POCs are two-stage. In the first stage
they just hook up four or eight
servers and run synthetic tests.
And in the second stage, they put it in production
or a production-like environment
and measure the real application.
And they can measure all the performance benchmarks
that they want and their standard deviation
or other measures of consistency of latency.
And then they can compare it to the old Flash array
that they used before or the local SSD
that they used before.
And invariably, they're going to see crazy numbers.
They're going to have to kind of, you know,
scratch their eyes and see if that's really the number
that they came up with.
So you mentioned 100 servers.
Is that a typical configuration for this type of...
I mean, you are talking 5 million IOPS.
The most we've seen is 96.
Somewhere between, you know, 40, 48, 72, 96.
That's the kind of numbers we see, round numbers like that.
Usually it's just how many servers they can fit in the rack or in a couple of racks.
That's usually where that number comes from.
And you mentioned that you support both 40 gig and 100 gig Ethernet, right?
Yeah, the 2U box comes with 8 ports of 100 gig,
which are auto-negotiable to 40 gig or 50 gig.
On the application server side,
we require 10 gig and above.
And is this an RDMA connection?
Yes, ideally, yes.
It wouldn't perform as well without it.
Okay.
RoCE, iWARP, both?
We're doing mostly RoCE.
We have not seen much demand for iWARP,
but we are strictly ibverbs compliant,
so it should support InfiniBand, RoCE, iWARP,
and OmniPath equally well.
But right now, all of our deployments are around RoCE.
Okay.
Yeah, I haven't seen any OmniPath in the wild yet.
God, it's the first time I've heard OmniPath mentioned in a long time.
I haven't seen any
in the wild lately, actually at all.
It's mostly
HPC space, I think. Yeah, really
HPC, yeah, absolutely.
So what's next on the
roadmap? What are we going to see?
More performance?
More scalability? Or
services to attract a broader audience?
I think you're going to see more versatility
that you can fit it for more use cases,
both in terms of getting both higher
and lower performance out of E8
as the case will require.
We will start building up more data services
as well. This is a common
theme, not to rely on
LVMs, but rather
have such organic features.
Right.
So the
manageability, how you configure
volumes and stuff like that, is that through
a web client?
Oh yeah, we have
GUI, which is web-based. We have
REST API. We have
CLI.
Isn't it nice that REST APIs are becoming
table stakes?
Yeah, they are becoming
table stakes. Oh, we also support
OpenStack provisioning.
Oh, so you have a Cinder driver?
Yeah.
Basically, our storage is a kind of shared pool
across all the drives,
and you can provision it into small LUNs
down to one gigabyte, or one huge LUN,
and then dish it out to the VMs or physical hosts,
depending on how you want to do it.
Creating a new LUN is as simple as just, you know, like in XIV,
right-click, create new LUN,
and then automatically it appears on that host.
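Since the show doesn't go into E8's actual management API, the endpoint paths and field names in the sketch below are hypothetical, invented only to show what "carve a LUN out of the shared pool and map it to a host" tends to look like over a REST interface.

```python
# Hypothetical example only: the endpoint paths and JSON fields are invented for
# illustration and are NOT E8's documented REST API.
import requests

BASE = "https://e8-mgmt.example.local/api/v1"    # placeholder management address
AUTH = ("admin", "password")                      # placeholder credentials

# Carve a 1 TiB LUN out of the shared RAID 6 pool...
lun = requests.post(f"{BASE}/volumes", auth=AUTH, verify=False,
                    json={"name": "oracle-data-01", "size_gib": 1024}).json()

# ...and map it to an application host running the host-side agent.
requests.post(f"{BASE}/volumes/{lun['id']}/mappings", auth=AUTH, verify=False,
              json={"host": "dbhost-17"})
```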
Yeah, you guys at XIV were early in having a user interface
that mere mortals could understand.
The other thing that XIV came out with was videos,
almost YouTube-like instruction sets and stuff like that, right?
I don't remember that, actually.
Yeah, that was an XIV thing.
Yeah, maybe it was an IBM thing.
I'm sorry.
I was busy with the hardcore escalations.
I didn't have time for the video.
Yeah, I got you.
I got you.
All right, well, we're running about to the end of the time
here. Howard, are there any other questions you'd like to ask?
No, I
think I got it.
And
it's nice to see that we've got a couple of
horses that can
move us into this new
faster world. God, it's like
all the different dimensions of performance, I would have to say.
Zivon, is there anything you'd like to say to our audience as a final word?
No, I think anybody who's got an interest in what we do is welcome to make contact.
We're doing POCs now, uh, stateside, and we're slowly ramping up our sales, and
we'll be happy to accommodate any kind of requests. We're seeing more and more
interest now.
Okay.
Uh, well, this has been great.
Zivon, thanks very much for being on our show today.
Thank you for having me.
And next month we'll talk to another startup storage technology person.
Any questions you want us to ask, please let us know. That's it for now. Bye, Howard. Bye, Ray. And until next time,
thanks again.