Grey Beards on Systems - 43: GreyBeards talk Tier 0 again with Yaniv Romem CTO/Founder & Josh Goldenhar VP Products of Excelero
Episode Date: April 19, 2017. In this episode, we talk with another next gen, Tier 0 storage provider. This time our guests are Yaniv Romem, CTO/Founder, & Josh Goldenhar (@eeschwa), VP Products, from Excelero, another new storage startup out of Israel. Both Howard and I talked with Excelero at SFD12 (videos here) earlier last month in San Jose. I was very impressed …
Transcript
Hey everybody, Ray Lucchesi here with Howard Marks here.
Welcome to the next episode of the Greybeards on Storage monthly podcast, a show where we
get Greybeards storage and system bloggers to talk with storage and system vendors to
discuss upcoming products, technologies, and trends affecting the data
center today. This is our 43rd episode of Greybeards on Storage, which was recorded April
12th, 2017. We have with us here today Yaniv Romem, CTO, and Josh Goldenhar, VP of Products
of Excelero. So, Yaniv and Josh, why don't you tell us a little bit about yourself
and your company? Okay, I'll go first, I guess. My name is Yaniv Romem. I am CTO, as mentioned,
for Excelero. I'm also one of the co-founders. We founded the company back in 2014, and it's
been a really exciting rollercoaster ride so far. I'm Josh Goldenhar, the VP of products for Excelero. And as Yaniv said, it has been a
rollercoaster ride ever since late 2014. And it's where we set out to take an idea that Yaniv and
some of the co-founders had technically, which was to use these at the time, brand new, barely
released NVMe products, but use them over a network. And use them not just over a network, like, wow, I can access them,
but use them at extremely low latencies as if they were local storage.
Over a network?
Yeah.
At the time, since NVMe over Fabric didn't exist,
this was somewhat unheard of,
especially since I believe there was only one NVMe drive
actually released by anybody at that point, which was Intel.
That and the general commentary from the peanut gallery that the network's too slow, you can't possibly do that.
Yeah.
Which we're still hearing from some of our friends in the hyper-converged world.
We are, but the background of Yaniv and our other founders was very heavy in RDMA already.
So this was second nature to them. I think from the very beginning, there was no question. In fact,
I had to, in the very beginning, and Yaniv, I think, can attest to this, say, you know,
you guys, you can't just think InfiniBand. There's got to be an Ethernet offering. But they were
thinking very low latency and have familiarity with the networks that could
carry that kind of traffic from the beginning.
How fast is the network that you're using these days for the product?
So the network we're using today goes all the way up to 100 gig.
But we actually know how to hook up a machine to multiple ports of 100 gig.
So you can get bandwidths of around 400 gig from a standard commodity, almost commodity server.
Boxes such as those sold by various vendors today that are 2U boxes with 24 drives inside,
you can hook them up to 400 gig of networking nowadays, which gives you plenty of bandwidth.
400 gig of networking plus a few PCIe NVMe disks.
Well, you guys are limited by PCIe lanes, aren't you? Yes, absolutely. We,
on a regular basis, bump up against that limit. We are constantly finding that the bandwidth
limit inside the box or inside the PCIe complexes is what overall is limiting IO,
even from an IOPS standpoint. When you're pushing millions of IOPS, even 4K IOPS on a box,
you can actually bump up against the bandwidth limits of individual NICs very easily. And if
you put enough of these drives together in aggregate, you can hit the bandwidth limits
of even what people view as very, very wide bandwidth cards.
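(A rough back-of-the-envelope sketch of the bandwidth ceilings being described here. All of the figures below are assumed nominal numbers for PCIe Gen3, 100 GbE and typical NVMe drives, not anything quoted in the episode.)

    # Nominal, assumed figures -- not vendor specs.
    PCIE_GEN3_LANE_GBPS = 8 * 128 / 130 / 8   # ~0.98 GB/s usable per Gen3 lane (128b/130b encoding)
    NIC_100GBE_GBPS = 100 / 8                 # ~12.5 GB/s line rate per 100 GbE port
    NVME_DRIVE_GBPS = 3.0                     # assumed sequential read bandwidth per NVMe drive

    drives = 24        # e.g. a 2U, 24-drive server
    nic_ports = 4      # four 100 GbE ports ~ "400 gig of networking"

    drive_bw = drives * NVME_DRIVE_GBPS       # what the media could deliver
    nic_bw = nic_ports * NIC_100GBE_GBPS      # what the network can carry
    x16_slot = 16 * PCIE_GEN3_LANE_GBPS       # ceiling of a single x16 NIC slot

    print(f"aggregate drive bandwidth: {drive_bw:.0f} GB/s")
    print(f"aggregate NIC bandwidth:   {nic_bw:.0f} GB/s")
    print(f"single x16 PCIe slot cap:  {x16_slot:.1f} GB/s")
    # Whichever of these is smallest -- often the PCIe complex or an individual
    # NIC -- is what actually limits the box, which is the point made above.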
Yep. And that also sort of can be seen as a precursor to why working in a converged environment
is really critical when you want to get the utmost performance out of your system.
By using lots of smaller machines with a few drives inside and NICs and also running compute
on those machines, you can actually get an overall much higher or much larger amount of performance,
thanks again to the fact that you're no longer limited by the PCI lanes
and those kinds of configurations.
When you say converged, are you talking about running virtual hypervisors
on the storage itself?
So you could do that, but I'm being very careful in saying converged,
not hyperconverged, because our solution does work with hypervisor-based systems,
but it can really work also with bare metal systems.
And the real focus from our perspective is that by putting the storage devices
into machines that are also running compute, by allowing you to do that,
you can still disaggregate if you want to,
or you can do a hybrid model where some of the machines are dedicated storage boxes
and you also have storage devices within your compute nodes.
But by bringing storage devices into the compute nodes and using the PCI lanes there for networking
and for the storage devices themselves, you can get much better overall performance.
You can do that with bare metal environments,
and you can use it also in environments where you're running hypervisors.
Okay, guys, so let's talk about the product for a minute.
You guys are software-delivered storage, right?
Your product is software?
Yeah, it is.
We made a decision early on that we wanted to be hardware agnostic as much as possible,
that we are a software offering.
And this is really because we see the trends with the kind of customers we talk to,
and that is the largest customers on earth.
Google, Facebook, Baidu, Yandex, Amazon.
And what they all share is going to completely standards-based, and some people say commodity-based servers.
That is, they've hyper-optimized the hardware to wring all the cost out of it and make it as efficient as possible from a hardware standpoint.
And this, of course, has also spurred projects like the Open Compute project,
like Open19, which was featured at our launch, which is how do you get this hardware to be even less expensive? So enterprise, small business, the larger end of small business that is,
they like to jump on these trends as well. They want to go towards standard hardware. Now, they may buy their
hardware from Dell, HP, Lenovo, standard servers, but they still want to go ahead and standardize
on servers. In short, what we're seeing is backlash against these proprietary appliances.
Appliances and storage, the trend has become to offer a box,
paint it a different color with your own bezel, and say that this is something really special.
But underneath the covers, we know that many, if not almost all, competitive storage products out
there are based on standard components, if not standard servers. And so people really resent
paying the uplift two to three or four times as much money
for this standard hardware on top of software costs and then maintenance costs. They don't
like the thought that they're going to pay two to three times the street price for a disk drive
just because they're getting it from a vendor for the quote-unquote proprietary appliance.
So with this movement, we said we have to work on standard hardware.
And so we made that decision very early on to be standardized software,
which basically today runs on any x86-based platform.
We believe we can work on any 64-bit Linux-based platform.
From there, yeah, we offer just software
that gives you this kind of incredible unheard of performance.
I don't think I can find an NVMe SSD for my Raspberry Pi.
So I don't think you really have to worry about the ARM market.
Yet, yet, yet.
Yet, yeah, but there are folks coming up.
ARM, as you mentioned earlier about restraints on the PCIe bus, that tends to be the problem we see with ARM processors today.
Some of them are catching up in processing power and could be competitive to leaders out there, but they tend to have only 8 or 16 PCIe lanes.
So that is a problem.
The other thing that's kind of obscene here is the fact that you guys are running these gazillion IOPS per second on commodity hardware,
effectively low-end commodity hardware, other than the networking and perhaps the NVMe SSDs.
That's the other thing that's kind of odd here.
I'm not sure there are high-end and low-end once you start talking about Xeon servers.
High-end and low-end is kind of more which Xeon you picked than which
vendor you bought it from. You're bringing up a funny point. In all our testing and our demos,
we tend to use Xeon processors because we feel that's what the customers have
racked in their data centers because they're thinking about the compute side or that's what
you buy for a data center is dual socket systems. But we do work very well with single socket systems.
In fact, again, as pointed out earlier, we need the PCIe lanes.
Our product is unique in that on the target side,
that is what holds the NVMe drives, we don't use any CPU.
So one of our largest demos we had done in the past internally
was where we got 11.5 million random read 4K IOPS on a cluster of 10
systems. And these were Intel Core i7 processors. So they were desktop processors. It was a converged
test. So they had to be a little bit fast so they could actually run the synthetic benchmark.
But on the serving side, we weren't using any CPU at all. So I think optimally, we would love to have a mobile processor from Intel or an Atom.
If we could get an Atom processor, but one that had 48 or 64 PCIe lanes,
that would be the best balance for us as far as a target system.
Yeah, unfortunately, I think you're the only customer for that product.
Today, today, today, yeah.
Yep.
If you're looking for a million IOPS, I'm of the opinion that you can probably afford at least a one socket Xeon.
So tell me a little bit about what's going on here. Between the storage system and the, I'll call it the host, you're using RDMA across, you know, gigabit Ethernet or 40 gig or 100 gig Ethernet,
but you've got software that's running mainly on the host,
not much running on the storage system.
Is that a good read of this, Yaniv?
It is, absolutely.
So one of the things that is very different in our architecture,
and it has been that from day one,
is really to try and see how you can make this software-defined storage system relevant for very large data centers.
That's really sort of been, from the outset, what we've been trying to achieve.
And so if you want to be able to do that in a converged environment,
one of the things that you don't want to do is you don't want to affect resource planning.
You don't want to make something that takes a lot of CPU cycles on the target side,
and then you have a conflict between the application running on a node and the fact
that it's also serving as storage. So to go and avoid that, we've really set ourselves as a target
to try and minimize, if not zero out, the amount of data-path CPU commonly used for accessing the data.
And instead of implementing storage services on the target side, we do it on the client side.
So if I have an application that needs a lot of IO,
I can expect to utilize some of my CPU resources in order to implement that IO.
In fact, even though we're doing that currently, our I.O. stack is very efficient and it doesn't take a lot of overhead in order to implement storage services on top.
And by doing that, you can really work in that kind of converged environment and you can work
across a large network and not affect the way resource planning is done. So if you have a
scale-out application and you want to increase its size, and it needs more CPU processors
or it needs more memory, you can go and have it scale out. And you don't care whether
it's scaling out to machines that are also serving as storage or not, because the target side doesn't
take any CPU. With that in place, it did mean that we had to go and re-architect a lot of the ways,
or a lot of elements of how the storage itself is implemented.
Most, if not all, current storage services are implemented on the target side.
And we're going and implementing them on the client side.
And we're doing it on volumes that can be shared between different clients.
So you can use clustered file system, for instance,
if you want to, in order to share data between multiple nodes.
So in order to go and implement services that are scalable,
that are done from the client side,
we had to re-architect a lot of the stuff,
and that's where most of our intellectual property actually lies: in how you go and implement storage services from the client side.
So you mentioned shared volume.
So you can share a physical volume
that's residing on the target storage across multiple hosts?
Absolutely.
So our current offering, our 1.10 product, is the one that we released as part of our marketing launch in March.
It includes RAID 10 functionality, so you can have multiple drives that are hooked up or connected into a
single volume. Different replicas of the data will be kept on separate nodes in order to ensure
true high-level availability for the data. And then those volumes can be shared among multiple
different clients. So you could run a clustered file system on top of that, or a database that
wanted to have a shared storage layer underneath and have it running across multiple different nodes.
And in that sense, you can run a scaled application
even on top of a protected volume on practically whatever scale you want to.
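(A minimal sketch of the client-side RAID-10 placement idea described above, using a hypothetical inventory of target nodes and drives. It is not Excelero's actual algorithm; it only illustrates the stated constraint that the two copies of any stripe land on different target nodes.)

    from itertools import cycle

    # Hypothetical inventory: target node -> NVMe drives it holds.
    targets = {
        "node-a": ["a0", "a1"],
        "node-b": ["b0", "b1"],
        "node-c": ["c0", "c1"],
        "node-d": ["d0", "d1"],
    }

    def raid10_layout(targets, stripes):
        """Assign (primary, mirror) drive pairs so the two copies never share a node."""
        drives = [(node, d) for node, ds in targets.items() for d in ds]
        ring = cycle(drives)
        layout = []
        for _ in range(stripes):
            primary = next(ring)
            mirror = next(ring)
            while mirror[0] == primary[0]:   # skip drives living on the same node
                mirror = next(ring)
            layout.append((primary, mirror))
        return layout

    for i, (p, m) in enumerate(raid10_layout(targets, 4)):
        print(f"stripe {i}: primary={p[1]}@{p[0]}  mirror={m[1]}@{m[0]}")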
So you mentioned multiple targets.
Do you support high availability dual controller types of solutions here?
I'm just trying to understand how this all
hangs together here. Absolutely. So we can work with dual controllers, but using
those kinds of controllers sort of negates a lot of the benefits of what we're doing. What we're basically
saying is you can use your standard hardware. You don't have to go for any exotic hardware. You can use regular standard servers that aren't dual motherboards or dual
controllers in that sense. And we can still ensure that the data will be highly available
by ensuring that the copies of the data, the replicas within the RAID are stored out onto
or deployed onto different nodes. So in terms of data protection, it's like I'm running mirroring
in the logical volume manager of my guest,
and it's writing to two single controller arrays.
I'm getting resiliency because I'm writing to two of them,
not because each one is resilient, right?
Exactly.
And this is really the key to the system.
Yaniv mentioned a lot about when you do this,
when you put the intelligence in the client,
it makes things very scalable.
It lets you share.
But what it really gets rid of is the very common problem of a single bottleneck.
That is when you have a dual controller system,
regardless of the services it's doing,
you have all the IO going to that one system,
which means you really try to scale up the interfaces completely redundantly. You're using a lot of investment
in this one box. And at the end of the day, you get a noisy neighbor problem, even in the box.
That is, if you have a lot of clients hitting the box, the box is going to get to a limit of what
it can serve. And then that's going to affect IO from other clients because it's all centralized. And at some point, you're going to get to a fairly small,
in terms of what NVMe drives can do, bottleneck. That is, if you look at a single NVMe drive,
and this is evidenced by any of the newer, or even the older Intel drives, the newer Intel drives,
the Samsung drives, the HGST drives, which, by the way,
we work with all of these drives. When you look at these drives, a single drive can do as many
IOPS as an entire, for instance, pure storage array. But of course, you wouldn't use it like
that because the pure storage array is redundant and gives you services. But the IOPS level is
high enough. So what happens when you put a bunch of NVMe drives
inside an existing all-flash array today? What you're doing is you're really limiting
the performance of those drives. You're getting some services, but you are so bottlenecking those
drives. There's really no way to get around that if you have this centralized system.
So by distributing the intelligence for logical volumes, for data protection,
for multi-pathing in the client itself, what looks like a block device on the client,
this is actually a logical volume manager that's also doing data protection.
When you completely distribute that, you not only eliminate that bottleneck,
but you also eliminate the noisy neighbor problem. That is, if you have
one host, one client going crazy, consuming millions of IOPS if you wanted to, it's not
necessarily affecting other clients because it's not going into a centralized system that's a
bottleneck. So the idea is that we build a grid of multiple systems that are providing storage from their NVMe SSDs
and multiple systems that are consuming it via your software, right?
Yeah, optimally, you might even call it a non-volatile mesh.
So, no, I'm sorry for that.
NVMesh is the name of the product for anybody who's not catching that one. Optimally,
to get the highest levels of aggregate performance in either IOPS or bandwidth,
and to experience the least amount of problems with a noisy neighbor or contention on the network,
that is. Optimally, yes, you do want to do as Yaniv had said and put only a handful of NVMe
drives per host, and then spread those out over multiple
hosts because that way you're never going to tax the network or the NICs.
Optimally, you want to balance the NVMe drives and their capabilities to the bandwidth of
the host that they're installed in.
And in this way, you get very high performance.
Are there limits to the number of target storage systems, I'll call them, that
there are nodes on this network? Sure. So there's really two different types of limits that you can
look at. One is how many target nodes can you have that will contribute to a single volume?
And currently we're limiting that to 128 nodes or 128 targets. We actually have a large-scale deployment that is using that number for specific volumes.
And then the other one is what's the overall deployment of the whole SDS
or the whole software-defined storage system.
And there, we really architected the product to be as limitless as possible
so that you could really scale it up
to a full data center scale. That is really the target, so you'd be talking about,
potentially, tens of thousands of target devices within a single system. Gee, and for a second there
I thought you were going to say, and we used a one-byte node address so you can only have 255.
Wait a minute, wait a minute. Did you say 10,000 storage systems,
storage targets, potentially? And you're not even talking the number of clients here. All you said,
the limits from a client perspective is that you can have at most 128 for a single volume. But
I mean, you could have multiple single volumes, obviously. So 10,000 storage systems? Somehow,
this doesn't make any sense to me. Sure. So if you go, as Josh said,
to a setup where you really want to, you know, spread things out. And so you use all of your
nodes or all of your standard servers within your data center. And you just, you know, you've got a
NIC there anyway, so you might as well have a high speed NIC. And you can use it in a converged mode
in the sense that it serves your standard networking and it also does your storage.
And then you put in a drive on average on each one of these standard servers.
Then you could have each one of these nodes would be both a client and a target.
And in that way, you'd really be getting the utmost performance out of your system.
And you'd really be able to also spread your data out so that if you wanted to have high availability, you could ensure that. You could put data replicas on different rows within the same data center or within different
racks. You could really make sure that it was always available from that perspective.
Yeah, but stretching it from Manhattan to Jersey City would kind of
defeat the whole low latency story.
Yes, it would. That's true.
Yeah.
Okay.
So let's talk about the client software to storage target protocol. Before we do that,
at this point I usually ask the, okay, so those are the theoretical limits
and you used a 32-bit number for your node
address so you can have tens of thousands of nodes.
How big have you actually tested?
Although I think we got a hint to that
about one of your real customers
already hit the 128 nodes per volume limit.
That's correct.
As was mentioned in our launch,
and there are materials on our website,
NASA Ames is the largest as far as the widest volume goes.
NASA Ames out in Moffett Field using their cluster for visualization for some analytics
on files that come off a supercomputer.
They are a single volume spread across 128 nodes.
Each of those nodes only has a single drive.
So it's only a 256 terabyte virtual flash
drive. But that one virtual SSD is attached to, as Yaniv mentioned earlier, it's multi-attached.
It's attached to every one of those 128 nodes. So every single node in this compute cluster
sees the 256 terabyte device as if it's a local device.
And then we've, since the initial deployment, layered in partnership with SGI, now HPE,
we've layered the CXFS file system on top of that.
So it's a clustered file system.
And every node sees this shared file system,
but that file system performs on a seek to any file anywhere
in that file system as if it's a local NVMe drive. So they're still achieving about 140 gigabytes
per second of aggregate bandwidth from all the nodes. And that's limited by their network
architecture, by the way, not by the devices. And somewhere in the neighborhood of 30 million random read IOPS 4K at about a 200
microsecond average response time. They're kind of perfect template customer because they need
a very high bandwidth, both read and write load for certain parts of the computations.
And then when they're doing the analytics on this file to look for trends, to do some processing,
it's a random IO load. So they're both hitting this with random and throughput at different
parts of their compute cycle. And this actually mirrors what many large customers have to do in
the world of analytics, which is you use streaming to bring in very large data sets.
And then when you start examining the data and looking for certain relationships,
you may hit it with a random IO load.
Oh, God.
And each one with one drive.
Yeah.
Okay.
Usually it's Ray, but now you've blown my mind.
Yeah, it's blowing my mind too.
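(A quick arithmetic check of the NASA Ames figures just quoted. The per-drive capacity is an inferred assumption; the other numbers come straight from the conversation.)

    nodes = 128
    drives_per_node = 1
    drive_tb = 2.0                 # assumed: 128 x 2 TB ~ the 256 TB shared volume
    agg_bandwidth_gbs = 140.0      # GB/s, reported as network-limited
    agg_iops = 30_000_000          # 4K random read IOPS at ~200 us average response time

    print(f"volume size:        {nodes * drives_per_node * drive_tb:.0f} TB")
    print(f"per-node bandwidth: {agg_bandwidth_gbs / nodes * 1000:.0f} MB/s")
    print(f"per-node IOPS:      {agg_iops / nodes:,.0f}")
    # ~1.1 GB/s and ~234,000 IOPS per node is comfortably within what one NVMe
    # drive and a fast NIC can do, which is why the network, not the media,
    # is the limit in that deployment.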
The key, though, is not to... It is very tempting to get caught up in the numbers, but what we want people to make
sure they understand is that the numbers themselves really aren't the point.
You don't want to get caught up in these very large numbers. But the important thing to remember is that we are
allowing you to unleash the performance of the media you're buying. In other words, even if you
go to a new system, if one of our, I'll call them competitors out there, if a traditional all-flash
array goes to NVMe drives, and they go to 24 NVMe drives, and they use the middle-of-the-road drives
that, let's say, do 500,000 IOPS each. They go 24. That's 12 million IOPS. But that same box
maybe is going to give you 700,000 IOPS, maybe 800,000 IOPS max. So you've already paid for that
IOPS capability. You just can't access it. And that's the big differentiation
with our system. It's not the top end number. It's whatever you've paid for in your NVMe media,
we're going to allow you to unleash that. We're going to allow you to use it all.
You don't have to use it all, but you can use as much as you want. And you'll get that at very low
latencies. You'll get it at the kind of latencies that the media was made for.
So if you have 10 million IOPS available and you only use a million,
at least when you use a million, you're going to get extremely low latencies,
probably under 100 microsecond read response time.
And even with protection under 30 microsecond write response time.
And what this means is for storage planners
is you no longer have to worry about, am I going to have one client affect another? Am I going to
run out of horsepower? You're not going to, because you'll be able to extract with our software,
all the performance that you already paid for in the media.
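(The arithmetic behind that point, using the same round numbers from the conversation. Illustrative only, not a benchmark.)

    drives = 24
    iops_per_drive = 500_000       # the "middle-of-the-road" NVMe drive mentioned
    array_delivered = 800_000      # what a typical dual-controller array might serve

    raw_iops = drives * iops_per_drive
    print(f"raw media capability:  {raw_iops:,} IOPS")
    print(f"array actually serves: {array_delivered:,} IOPS")
    print(f"fraction of paid-for performance used: {array_delivered / raw_iops:.1%}")
    # => 12,000,000 raw vs ~800,000 delivered: roughly 7% of the IOPS you bought.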
Okay. So let's go, go deep here. So the technology, you're actually not using standard NVMe over
Fabric kinds of protocols, if I understand this correctly. Is that correct, Yaniv?
So that's partially correct. Our product works with two different protocols, and it works in
practically the same way with both of them. We provide our own flavor of NVMe over Fabrics,
something called RDDA,
which is Remote Direct Drive Access.
And that's what allows us to avoid using
any target-side CPU for the data path.
That protocol is inherently different
than NVMe over Fabrics for two reasons.
First of all, we devised it before
NVMe over Fabrics was defined.
But also, it has been built this way so that it can achieve that feat of not using any target-side CPU.
And when you're working in a converged environment, you typically want to use that, again, so that you don't require any target-side CPU.
And so that makes your resource planning very simple, and it avoids a noisy neighbor problem.
Our product also supports using the standard NVMe over Fabrics protocol for the data transfer.
And we've exhibited that with some vendors that are pre-release themselves that are coming out with NVMe over Fabrics hardware.
And we've shown that the product works seamlessly with them.
And then in that scenario, some target-side CPU is used. But those are in scenarios
where the vendor is coming out with a box that has a special piece of hardware, typically an ASIC or
something of the sort, to go and really bring down the hardware requirements or the cost of
providing that NVMe over Fabrics target connectivity. So we really do support both of these today within the product.
When you go and you implement things using NVMe over Fabrics
in order to generate shared functionality of volumes
connected to multiple clients with RAID functionality,
you still need to perform some kind of remote locking.
And for that, we still use some additional RDMA communication
on top of
the NVMe over Fabrics that's being used there. So even if you're going to use standard NVMe over
Fabrics, you still need to have some additional communication to ensure data consistency.
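(A conceptual sketch of the remote locking just described. The real mechanism is an RDMA atomic compare-and-swap against a word of metadata in target memory, with no target-side CPU involved; the Python below only simulates the compare-and-swap retry loop locally, not the actual RDMA verbs or wire protocol.)

    import threading
    import time

    class MetadataWord:
        """Stands in for a 64-bit word in target memory reachable by RDMA atomics."""
        def __init__(self):
            self.value = 0                     # 0 = unlocked
            self._guard = threading.Lock()     # models the NIC's atomicity guarantee

        def compare_and_swap(self, expected, new):
            with self._guard:
                old = self.value
                if old == expected:
                    self.value = new
                return old                     # an RDMA CAS returns the prior value

    def acquire(word, client_id, retries=100):
        """Spin until our CAS wins, i.e. we observed 'unlocked' and wrote our id."""
        for _ in range(retries):
            if word.compare_and_swap(0, client_id) == 0:
                return True
            time.sleep(0.0001)                 # back off and retry
        return False

    def release(word, client_id):
        word.compare_and_swap(client_id, 0)

    lock_word = MetadataWord()
    if acquire(lock_word, client_id=42):
        # ... the client would update shared volume metadata here ...
        release(lock_word, client_id=42)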
Right. So this is somewhat newer than what we saw at Storage Field Day, although you may have had it there. You may have just not discussed it at that point.
So we've got a Linux client
that runs your proprietary RDMA protocol
and presents a block interface.
And we've got a Linux target piece
that runs on a server that has the NVMe SSD
and delivers up that storage to the clients.
And they, of course, can both run on the same system.
We've talked about mirroring.
Are there any other services yet?
That depends on what your definition of services is.
So to be fair, and I think what most of your listeners will say,
is that services are things like thin provisioning,
compression, deduplication, snapshots, et cetera, what they've gotten used to in the all-flash
array space. We consider services, storage services, when we're talking at the low level,
even things like logical volumes, basic RAID protection. So we would argue that we do have
services in our client. Those services are that ability to do logical volumes at all, which are dynamic, resizable,
multi-path in an active-active fashion, different logical volume types.
That is the protection on them.
They can be concatenated, RAID 0, RAID 1, and RAID 10.
And so these things are all built in and are services.
But the important thing to understand is that is where the majority of,
and Yaniv said this earlier, the majority of IP investment has been in how in the world do you do
this? How do you get clients, especially clients that can share a logical volume, a logical volume
that's not being hosted and processed in a centrally managed solution? How can multiple
different clients attach the same logical volume with
all the intelligence being the client side? So we've got that worked out from here on out,
without getting too detailed, since we're talking to a public audience,
we will be adding other features. This has been a really good conversation. Thanks. But it's been, no, it's, we, of course,
now that we have this base technology established and we've done it the right
way. And if people are really curious and need some late night reading,
you know,
you can Google patents and Excelero and you can find patents that are filed
describing some of this distributed metadata.
And that's what's really behind this functionality that will lead to,
in very soon, upcoming releases, having different erasure coding levels, different data protection
schemes, up and through eventually having things like virtualized blocks that offer thin provisioning,
clones, or snapshots. So services that people are accustomed to will get in the product on the roadmap.
We're not saying here exactly when, but that is absolutely in progress.
There will be a trade-off, of course.
When you do more processing, there is a hit on latency.
So, we can't get around the physics of the problem there.
Yeah, I mean, anytime you take the data and examine it before you store it, that examining takes time.
Right.
But because we're doing it all in the client, it's 100% distributed. So, again, this is where we'll really shine and avoid that central system bottleneck because every time you add a client, that client is also adding, if you want to think about it, storage processing power.
Let's talk here about the distributed metadata, which has got to be some sort of a mechanism that provides for the locking of logical volume and control structure
updates and things of that nature. Presumably, it's not in the critical path for the data I.O.,
but it could be if you start doing things like different protection schemes, virtualized blocks,
that sort of thing. Is that where things start to become more complex, I guess?
Yeah, that's exactly true. So once you move to RAIDed volumes that are shared among multiple clients, to ensure data consistency, you start to build out some metadata. And we currently
control that metadata in a distributed fashion, as Josh mentioned, and we use RDMA Atomics in order
to make it really highly performant and, again, to avoid target-side CPU usage.
As we progress to more evolved RAIDs that are in development and as we progress to thin
provision or virtualized blocks, as Josh mentioned also, that is also in development, the metadata
structures do get more complex.
We'll be glad to have you back on Greybeards to discuss how you manage all of this with
data deduplication.
I didn't say data deduplication, did I?
I can assure you that our planning has gone all the way through snapshots and clonings
and data deduplication, and we know how to do all of that metadata management from the client side in an efficient way.
It's very easy to say it.
The proof is the hard part, proving it in the field.
But we'll do that as well.
And that is really where it does get complicated.
You need to have the right metadata structures
to make that efficient,
especially when you access it from the client side.
But that is where the complication comes in.
Not that we can get into deep, dark secrets about things to come,
but a light just went off in my head as we were talking about how RDMA and that distributed hash table could get very interesting.
Well, I mean, there are lots of nice things about the NVMe over Fabrics protocol
that make these sorts of things, you know, 4K-and-under block IO activity, almost embedded in the protocol, I guess is what I would call it.
Yeah.
I think for Excelero NVMesh, the NVMe over Fabrics standard protocol is really more interesting as a way to support non-Linux clients.
But even the RDMA stuff, I mean, ultimately you are talking NVMe to the SSD protocol.
So, I mean, that protocol provides almost embedded 4K data blocks without having to set up data transfers.
It's pretty interesting how it all works.
It's almost bizarre in my sense.
It's like taking a command, a SCSI command, and embedding, you know, 4K worth of data into the command packet itself rather than, you know, a data packet or something.
Well, why have a chatty protocol?
Yeah, I suppose, especially when you're talking microseconds count here.
Yeah.
Yeah.
And that's especially true when you start looking at newer kinds of non-volatile memory devices that are coming out.
The Intel Optane, for instance, where the basic latency for reads as well as for writes is around 10 microseconds.
And so every network round trip that you have to do really becomes critical.
And so if you embed the data within the request itself, you're saving a round trip.
It becomes relevant, especially when you go to those kinds of non-volatile memory.
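(An illustrative latency budget for that point, with assumed numbers, order of magnitude only.)

    media_us = 10.0        # e.g. a 3D XPoint-class read, per the Optane figure above
    round_trip_us = 1.0    # assumed network round trip on a fast RDMA fabric

    for round_trips in (1, 2, 3):
        total = media_us + round_trips * round_trip_us
        print(f"{round_trips} round trip(s): ~{total:.1f} us "
              f"({round_trips * round_trip_us / total:.0%} of the total spent on the network)")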
Yeah, that roundtrip is, you know, almost a microsecond in itself.
Jesus Christ.
Oh, my God. Okay, so we've got this system and it delivers astounding
performance. And you know, certainly NASA Ames is a lovely client. Uh, you guys were talking about
the big boys, the Facebooks and Baidus of the world at the beginning of the podcast.
So is your go-to-market strategy for the moment elephant hunting?
It is, but not that big an elephant, if it's fair to say. So while we have the utmost and deepest respect for these largest companies, the reality is
they're probably going to do their own thing.
They have hundreds or thousands of developers
working on these kind of things,
and they tune it to their exact hardware environment
and the way that they do provisioning and offer services.
And at the end of the day,
you could almost look at something like Facebook
and say that it's one application,
that Facebook is a single, large application
made up of many pieces, but it's
one very large application that can be hyper-optimized even to the hardware underneath.
Whereas I was talking to a finance customer early last week and a very typical kind of enterprise
financial services entity, and they said that they have over 8,000 applications in their environment.
And so that's who we want to solve for. We want to bring the power of what we can do and offer this
hyper-efficient, very high-performance storage that's very flexible. Being at the block layer,
you can use it as block, you can use it as file. You can put an object layer on top of it if you wanted to. We did block storage because it's ubiquitous underneath everything else. We'll target those kind of customers. Basically, if you take off those very top level, the highest level customers, Amazon, Microsoft, Azure, etc., and you take the next 200 or so, that's a lot of the folks that we target. So they are elephants.
They're not the biggest elephants in the world.
They're still very, very large, well-known customers.
And we announce some of those publicly as customers.
So we have GE Digital in their Predix Cloud.
PayPal is using our software for more kind of intrusion detection
and some network security issues there.
And we also have Hulu as a customer.
Okay.
So, I mean, the big web guys make sense as target customers for you.
When you start talking about finance,
outside of the Fidelities and the Goldmans of the world
who really have IT groups that think more like Web 2.0 companies than like brokerages have traditionally, isn't your penetration problem in corporate America going to be VMware support?
You'd think that.
But by the very same token that FSI customer I was talking to, one was running OpenStack, and we're perfectly happy to work with OpenStack.
We can work under the KVM hypervisor either within a virtual machine and offer faster than local flash performance over the network.
And that is because we're not incurring the wrath of the hypervisor IO overhead. So we can work within a KVM environment under OpenStack. And then many
of these same kind of customers that are very forward thinking are looking at containers,
in which case there is no virtualization. And we also can work inside a provisioning framework
with containers. We have an internal demo we've done for a very large telco
who's looking at this, where we showed containerized applications with persistent data
running on a container host, and you pull the plug on that host and the orchestration layer,
whether it's Docker Swarm or Mesos or Kubernetes, restarts that application container on a different
physical host and attaches that persistent volume.
And then that container goes ahead and picks up where it left off.
And it's getting local flash performance, but it's free from being bound to a physical host.
And that's really the key.
And you mentioned it exactly.
VMware, because we are a kernel module, it would be impossible for us to get into VMware.
They just simply don't expose the APIs.
However, when VMware supports the NVMe over Fabric client, then they can use us as a target.
And so that will free up that environment, and you could use us with VMware as well.
Okay, so where are you with the Docker volumes driver and Cinder driver and such?
So Cinder, we did not push it into the release for OpenStack,
but we do have a driver ready to go, a Cinder driver.
The reason we did not push that yet is, as we mentioned earlier,
we don't have snapshot functionality yet.
So we would go ahead and we would very happily perform a
snapshot by literally copying all the blocks, which might be a surprise to some people.
Yeah.
An unwelcome surprise.
Ray and I are old enough to remember business continuity volumes,
split mirror snapshots. So yeah, we understand.
Yep. So we've not pushed that up. That will likely not get
pushed up until we have that functionality or if a customer really wants it. But it is functional
today. So if someone was running OpenStack and understood that you should use this only for data
volumes and not for the root volumes where you're going to snap, you could use this there. The
plugin for the Docker environment, that I would call beta level.
And that is because it has to tie into the orchestration layer, either Kubernetes, Swarm, or Mesos.
And we haven't identified a clear leader or partner there yet, honestly.
So anybody listening, we are looking for partnership opportunities there.
This is another way of saying we're very customer-driven.
And the real honest answer from a practical standpoint to your question is the first customer who says,
I need you to support this kind of container in this environment, that's going to be the one that we support.
But we're ready to go. We have functionality now.
That's the nature of being a brand new newborn startup.
Your first seven, eight paying customers get pretty much what they want.
Plus, it's such a radical departure from the other storage world that we're familiar with.
It is, and we have to see who's going to win.
Containers are very interesting, but VMware is a very mature, rich, established environment, to say the least. So we're not crystal balling here, predicting the demise of VMware by any means. So it's a very important environment, well-established. It'll be around a long time.
Oh, come on. That would be news.
That would be news. It would be. Well, you know what happens to the doomsday people once they actually make the prediction and they say what the date is.
Once that date passes, we tend not to hear from them again.
I like to learn from the mistakes of other people in the past, not make them on my own.
There was that one reverend from South Carolina who was announcing the end of the world for the fourth time.
Yes.
All right, gents.
We're getting off the deep end here.
We're at the end of the show. Is there anything, uh, Howard, any final questions you have? Well, I mean, I'm really intrigued by
what Excelero's doing and by, you know, some of your other competitors in what I've dubbed
the new Tier 0, uh, this very low latency, very high performance world.
The real question I have is how big an impact you guys think will come from other vendors moving into NVMe,
not as a greenfield, but as a modification to their existing systems. I mean, yesterday, Pure announced that they were
coming out with NVMe flash modules,
and now the FlashArray//X is an all-NVMe system.
Now, they also are still delivering storage via Fibre Channel and iSCSI,
but they're promising NVMe over fabrics in the future.
How big is the slice for high performance storage when Pure delivers NVMe over fabrics and maybe
500 microsecond latency? So that's a valid question. And the answer to the first part
of that question is welcoming other folks. We welcome everyone in the industry to adopt NVMe.
We think it only pushes our agenda forward. NVMe by some is still looked at as kind of exotic, especially NVMe over Fabric.
So we welcome folks like Pure into the arena to go ahead and support that because as Yaniv
pointed out, we already support NVMe over Fabric as a standard.
Can use that as targets.
You can use us as a target.
We can use NVMe over Fabric targets.
We could integrate into ecosystems that support that. So at the end of the day,
they're still going to have in their architecture the same bottleneck limits that they have today.
Yes, maybe they'll go down to 500 microseconds of response time. But if you have a very large
database that has certain transactional limits that are serialized, an individual IOP is what matters, how fast you
can finish that. So you'll still need a solution like ours versus that that you can get from the
recent announcement by Pure. And today, as you kind of alluded to in your question,
it's a smaller market. But I was involved in some of the early all-flash arrays, and when it first came out, we were getting the very same questions.
People were saying, who needs an average one millisecond response time or 1,000 microseconds?
Who needs that when spinning drives were giving us 8,000 microsecond response times at best?
Why do you need consistent one millisecond response time?
And now you look at what storage is being deployed, all flash arrays are the fastest
growing sector of storage sales when you look at IDC or Gartner. So they've become the norm.
And we certainly feel and hope that this is going to be the same kind of pattern that's followed here.
There is the once you've seen faster, that's all you're willing to accept.
Yeah, there's that old joke in storage, right?
Who's ever complained about having too much capacity or too much performance?
The backup guy.
Yes, yes. The capacity, yeah. Performance, no.
All right.
Well, that was good. Yaniv and Josh,
anything you'd like to say as a final statement to our listening
audience?
I'd just like to thank everybody for listening.
Remember that we're
out there. Take a look at our site.
We think we're really doing things differently.
We think that's going to make a difference. And really, we're looking forward to where compute
and storage is going, rather than simply making tiny improvements on where it's been.
Yaniv is actually the one who, I believe, said something to this effect. And Yaniv, correct me if I'm
wrong, but I love the analogy, which is: imagine if people had taken candles and just worked on constantly improving candles,
improving candlelight, and making it a little brighter, a little bolder, maybe candles lasting
longer. Imagine where we'd be, but sometimes you have to make that leap, and that leap was to the
light bulb, to the electric light. We feel we're in the same kind of position.
Centralized storage models have been around forever.
All flash arrays did this a little differently, but yet they're still the centralized dual controller,
and it constantly makes small improvements, like moving from SAS or SATA now to NVMe media.
They're tiny, iterative improvements. But at some point,
to really go to the next level, you have to take that leap. And Excelero NVMesh is that leap.
So are you really calling Pure's FlashArray Henry Ford's faster horse?
It's dangerous to do so, but I think in a way I am. When I look at their announcements,
there was a different, uh, you know, apologies to you guys. There's a lot of folks out there,
but there was a different blogger who tweeted something last night saying they'd love to
see us go up against, uh, Pure with their new announcements. And I did some quick back-of-the-napkin
calculations on what we support today. And while Pure is talking about supporting
an 18.3 terabyte flash configuration in this new announcement, today, right now, you could build
out a system with us and you could go to 23.5 petabytes, which would give you 2.8 terabytes
per second of bandwidth. Yeah, to be fair, that 18 terabytes is a module, not an array.
So even, okay, in an array, we're still talking the max you can go to in an array.
I think you could scale us out to nearly three terabytes per second of bandwidth and nearly 600 million IOPS.
You could build this with standard off-the-shelf servers today.
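(Backing out the per-drive figures implied by those totals. The drive size below is an assumed value used only to make the division concrete; only the totals were quoted.)

    total_capacity_pb = 23.5
    total_bandwidth_tbs = 2.8
    total_iops = 600_000_000

    assumed_drive_tb = 7.68                               # hypothetical large NVMe drive
    drives = total_capacity_pb * 1000 / assumed_drive_tb  # ~3,000 drives

    print(f"drives needed at {assumed_drive_tb} TB each: ~{drives:,.0f}")
    print(f"implied per-drive bandwidth: ~{total_bandwidth_tbs * 1000 / drives:.2f} GB/s")
    print(f"implied per-drive IOPS:      ~{total_iops / drives:,.0f}")
    # Under 1 GB/s and ~200,000 IOPS per drive is ordinary NVMe territory,
    # which is the point: the totals just scale with the number of standard servers.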
Yeah, and there's this discussion here between scale-out and scale-up and things of that nature.
And the fact is you could deploy 10,000 Pure systems and potentially reach, you know, I don't know, terabytes per second.
Well, that would make Scott happy.
Yeah, that would make lots of people.
But the issue is, you know, these guys are doing a different thing. They really have come up with a new approach to storage, not unlike another customer, another client that's looking at, you know, re-architecting storage per se.
And it's an interesting approach, definitely.
Yaniv, did you have any final things you wanted to say?
I think I just want to emphasize one thing, and it goes back to Pure also. One of the
things that's been really important for us is to leverage standard hardware and to really try and
avoid using something which is very special. And that's why we've gone with NVMe drives,
which we thought when we started, but it's proved, it's really become apparent now that NVMe is something that's going to be
very widely adopted. And so we can use SSDs from any vendor, any NVMe SSD.
I think Pure are taking a little bit of a different approach there because while they're
providing an NVMe interface, they've gone and generated their own module and they've moved
away from standard NVMe drives. And so when another vendor goes and improves their NVMe drives,
if you're buying Pure, you can't leverage that.
And I think that's really sort of one large differentiation between us
and most of the storage appliance vendors is that we're not dictating any hardware.
We're letting you leverage whatever you can find out there.
And if you want to use Optane drives, you can go and use Optane drives.
And if you need a different mix, you can go and do that.
We're not really trying to tie you down to any specific hardware.
And I think that's a large key to what we're doing.
It's really being purely software only.
Okay.
Well, this has been great.
Yaniv and Josh, thanks very much for being on our show today.
Next month, we'll talk to another startup storage technology
person. Any questions you want us to ask, please let us know. That's it for now. Bye, Howard.
Bye, Ray. Until next time.