Grey Beards on Systems - 104: GreyBeards talk new cloud defined (shared) storage with Siamak Nazari, CEO Nebulon
Episode Date: July 7, 2020. Ray has known Siamak Nazari (@NebulonInc), CEO Nebulon, for three companies now but has rarely had a one (two) on one discussion with him. With Nebulon just emerging from stealth (a gutsy move during the pandemic), the GreyBeards felt it was a good time to get Siamak on the show to tell us what he's …
Transcript
Hey everybody, Ray Lucchesi here with Matt Leib.
Welcome to the next episode of the GreyBeards on Storage podcast,
the show where we get GreyBeard storage bloggers to talk with system vendors
and other experts to discuss upcoming products, technologies,
and trends affecting the data center today.
This GreyBeards on Storage episode was recorded on June 30th, 2020.
We have with us here today Siamak Nazari, CEO of Nebulon.
So, Siamak, why don't you tell us a little bit about yourself and your company's product?
Sure. I've been doing storage, it seems, like forever.
God, I think I've seen you at least in two companies, maybe three.
Exactly. And I've been doing file systems and block storage for a long time.
And as a part of my job at a previous employer, I got to see a lot of trends and talk to a lot of customers.
And a few key themes started to develop. One was customers really wanted to simplify their environment and have fewer moving parts.
Storage arrays, while great, are very expensive, and it was beginning to impact the bottom line.
And then the second piece that kept showing up was the problem of managing at scale: customers that
often had dozens, perhaps more than a hundred arrays, and just managing a hundred arrays in
a data center is quite problematic. And these guys were facing the pressure of their users,
essentially the consumers within the enterprise, coming in and saying, hey,
I can do this stuff much easier on the cloud.
Why is it that it takes you guys eight weeks to provision space for me?
Yeah, storage and stuff like that.
Yeah, yeah, yeah.
And essentially this at-scale management problem was not something
that they were prepared to handle. And when I started to think about this, the existing model of shared storage, where
you're connected to it by a fabric, has this inherent problem of not being able to
scale this management model.
And people have tried to build management models that try to solve this.
But at the end of the day,
the building blocks are just not the right building blocks.
And I wasn't interested in building another storage company.
Although you did.
Well, it's a very different type of company.
Absolutely, absolutely, yeah.
The difference is that if you think about the existing solutions today,
we have, you know, the storage arrays we've talked about, and we have software-defined storage or
hyperconverged, which tries to solve some of the issues with the expense of shared storage
and to simplify things by not having as many moving parts. But unfortunately,
it brings a lot of restrictions in terms of the performance,
in terms of the way you configure things and your choice of hypervisor.
And in some ways, it shoehorns you into a specific environment.
So what I really wanted to solve was
getting the flexibility of the arrays
together with the simplicity of software-defined storage and hyperconverged.
That's really what we were trying to think through and solve and build.
These kinds of environments, I mean, with hundreds of SANs, shared storage arrays, I mean, these things are massive, massive environments, right?
They are. And then one of the big problems they have is a given storage array is serving multiple applications that have different requirements.
And so you have this kind of careful balancing act of every time you do something on the array, you have to kind of talk to multiple consumers of the array saying, hey, I'm about to install a new firmware. And when is a service window that I can coordinate
with all these different users of this one array?
And then imagine you have to do this every week, every time.
And that's just the upkeep and firmware updates of these arrays.
It's akin to painting the Golden Gate Bridge, right?
If you've ever been to San Francisco, right?
Yeah, yeah.
It's a never-ending story, right?
They get to the far end and have to start all over again,
because by the time they get to the other end,
the paint is no longer good, right?
So firmware updates are the same, right?
You start updating the firmware to the latest, you know,
available from the vendor,
and by the time you're done updating your 100 arrays
because you do one or two a weekend,
there's a new firmware and you start from scratch.
And it's just a never-ending struggle to deal with this, right?
Oh, it's even faster now.
I think they're releasing storage software even more often than once a year.
Right.
And the poor customer says, hey, if I have some sort of issue,
what do I do?
Well, you've got to upgrade your array. And by the way, you are three releases behind,
which means that he can't even go directly there. You have to go through these intermediate releases
to get the patches, right? It's just a nightmare, right? And then, and add to that the lack of
visibility, right? You can't even see what's going on, on a hundred arrays at the same time, right?
It's much more difficult, right?
Yeah, absolutely.
And you're not even talking about changing the data or changing the storage, which is
a different story altogether.
Yeah, the lifecycle management is yet another pain point.
There's just an impedance mismatch, really, between the lifecycle of a server, which is
usually measured in 18-month, 24-month type cycles, and the storage array, which is usually measured in three-to-five-year
lifecycles, right? That's another kind of transition point that is difficult to manage
in the data center today. Yeah, that makes sense. I'm so curious about how you go about it, though.
Yeah. Tell us a little bit about how you solve these problems.
Yeah.
So the way you solve the problem is a couple of things.
So you start by re-imagining how the management works.
You think of management being in the cloud, right?
And a typical storage array, when you ship it,
it ships both with storage, you know,
IO path and also all the management artifacts to be able
to manage the array. And that's part of the bloat of the firmware that goes into one of those things.
Imagine if you moved all of the management into the cloud, and, you know, you only had the IO path
and critical pieces running inside the data center. And even then, you don't really run it
inside a storage array,
which has all sorts of, you know, you have to deal with multi-pathing and multi-tenancy issues.
What if you reimagine the array as a smaller physical device that is inside the server,
right? So imagine taking your storage array, miniaturizing it to the size of a PCIe card, and sticking
it into a server. So we've got the cloud end, and we've got the storage engine that is sitting inside the
server. And to be clear, this is not an accelerator; this is the entirety of the storage
engine, all the features that you expect from a storage array, like compression, deduplication, encryption, all running on the card.
And this device, which we call an SPU, which stands for services processing unit, kind of like a GPU in
terms of style and size and fit and finish and power, then presents the actual storage to the
host. And you don't have any sort of in-band management or any sort of management
inside the data center.
It's all sitting in the cloud.
So the experience turns into something like this.
What's at the other end of the SPU?
The other end of the SPU.
So the SPU is made up of essentially a full-function computer running our
storage stack.
It takes over the storage media inside the
server and turns around and presents what looks like a physical LUN to the host. And it also has
Ethernet ports. Some of the Ethernet ports are used for the SPUs to talk to each other and present
what looks like shared namespace and disaster recovery and mirroring and so on. And then there is a dedicated port to connect to the
cloud for the purpose of management.
But the storage is effectively still situated inside the servers, right? Is that what you're saying?
Exactly. But because the server's storage is connected to the Medusa card, to the SPU, we can turn around and expose that storage,
not only just to the local host, but also to the hosts that are sitting in your sharing domain.
So if you have a VMware cluster, it doesn't really matter that the capacity is sitting
in a different server. You can still go from that server to the local SPU,
from that SPU over the Ethernet path of the SPU to the next SPU and grab the data and present it to the host, right?
So essentially, we solve the shared storage problem
by having all the properties of shared storage baked into the card,
and the cards can communicate with each other
and present what looks like shared storage,
but also highly available shared storage, by doing erasure coding
within the card and then mirroring across the cards.
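To make those two protection layers a little more concrete, here is a rough Python sketch of the idea: erasure coding handles drive failures behind a single SPU, and a mirror on a second SPU handles the loss of a whole card or server. The class and field names are invented for illustration and are not Nebulon's actual software.

```python
# Illustrative sketch only: hypothetical names, not Nebulon's real software.
# Layer 1: erasure coding across the drives behind a single SPU (drive failures).
# Layer 2: mirroring of each volume to a second SPU in the pod (card/server failures).
from dataclasses import dataclass

@dataclass
class SPU:
    name: str
    drives: list             # SSDs attached to this SPU (may be empty)
    parity_drives: int = 2   # tolerates up to two local drive failures

@dataclass
class Volume:
    name: str
    owner: SPU               # SPU that serves the LUN to its local host
    mirror: SPU              # second SPU holding a synchronous copy

    def surviving_copies(self, failed_spus: set) -> int:
        return sum(1 for s in (self.owner, self.mirror)
                   if s.name not in failed_spus)

vol = Volume("vmware-datastore-01",
             owner=SPU("spu-a", drives=[f"ssd{i}" for i in range(8)]),
             mirror=SPU("spu-b", drives=[f"ssd{i}" for i in range(8)]))

# Losing spu-a's whole server still leaves a full copy behind spu-b.
assert vol.surviving_copies({"spu-a"}) == 1
```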
And, I'm sorry, Ray, what's the connectivity then between the cards?
Is that RoCE, NVMe, what's going on there?
It's 25 gig Ethernet.
There's two ports on each card and they're talking.
The Ethernet ports are RDMA capable.
We have chosen not to enable RDMA, partly because there is a big gap between the switches that are capable of RDMA and the switches that actually have RDMA turned on.
It turns out there's a lot of expense and configuration issues. And just because of
those deployment issues, we run the ports in kind of standard Ethernet TCP/IP 25 GigE mode.
So the protocol between, let's say, an SPU on one server and an SPU on another server is internal to Nebulon?
Exactly.
It's not iSCSI or anything like that, right?
It's not iSCSI.
It's a very – and a lot of it has to do with our security protocols.
It turns out that, you know, the story on iSCSI security is pretty weak.
We actually encrypt all the data that is exchanged between one card to the next card.
And the data essentially just gets encrypted and is just traversing the entire system in an encrypted form.
And then the other piece is that we actually verify each card with the other card,
ensuring that the certificates are, you know, we check the presence of certificates so that, you know, some other, you know,
entity inside the data center can't connect to the card and present itself as a card trying to steal the data.
So there's a lot that has gone into the security model. And so we are kind of using our own protocol, which means that we're not bound by some standard that doesn't really add anything to our protocol.
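The general shape of what Siamak describes, certificate-checked peers exchanging only encrypted data, can be sketched with Python's standard ssl module. This is purely an illustration of mutual authentication between cards, not Nebulon's actual wire protocol or PKI.

```python
# Generic illustration of certificate-verified, encrypted peer links using
# Python's standard ssl module -- not Nebulon's actual wire protocol or PKI.
import socket
import ssl

def connect_to_peer(host: str, port: int, ca_file: str,
                    cert_file: str, key_file: str) -> ssl.SSLSocket:
    """Open a mutually authenticated TLS connection to another card."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # prove our identity
    ctx.verify_mode = ssl.CERT_REQUIRED                        # demand the peer's
    raw = socket.create_connection((host, port))
    return ctx.wrap_socket(raw, server_hostname=host)

# A rogue box without a certificate signed by the shared CA fails the handshake,
# so it cannot pose as a card and pull mirrored data off the wire.
```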
And you mentioned high availability.
So there could be more than one SPU in a server?
We can have, okay, so the availability model is built using layers. So let's talk about a single
SPU. By the way, the SPU can run both with or without drives. If it's running
without drives, then the SPU is just talking to other SPUs to collect the capacity or perform the
IO. If it has drives, then we will actually use erasure coding
on all the drives that are attached to the SPU.
The SPU can consume any type of SSDs,
SAS, SATA, NVMe.
It doesn't really care.
So that's kind of the first level of availability
within the card itself.
And then one more piece.
If the host is rebooted or is down or having issues,
the card is in a completely different fault domain from the host, right?
So, in fact, if you have bare metal and you don't have an OS on the host,
the card would be up and running and it's able to talk to the other cards
and the other cards can use its capacity even if the host is not even configured or running, right?
So that's kind of a big, important piece of the design.
Wait, wait, wait, Siamak. So if the host is being rebooted,
let's say, the SPU is still technically active, and during that reboot it can still provide storage to other SPUs in the network? You got it.
In fact, the SPU has to be there
because we expect the OS to actually boot from the SPU.
So the SPU has to be there,
not just to service the host when it comes back
to provide its IO boot services,
but also other SPUs that need the capacity
inside that server. Absolutely.
It is very interesting. I am still completely at a loss as to how it's done.
Well, there is a bit of hardware magic taking
place; there are different root domains. So that's, you know, in a more detailed conversation
in front of a whiteboard, I'd love to describe more of it.
So the second piece of availability: we talked about the SPUs having these 25 gig ports
and being able to talk to each other. And so now they can actually mirror data. So for a given LUN,
we talked about the data being available even when the host is rebooting, but you can imagine a scenario where we lose complete power to the server.
And therefore the SPU just goes offline because there was no power.
In that case, we have mirrored the data.
We always mirror the data to some other SPU within the pod.
And in that case, we just do the failover and all the capacity is available. And in fact, when the SPU comes back, we re-silver the data or rehydrate the data from the SPU
that was carrying the services forward back to the SPU that was down for service or whatever
may have been the reason why the SPU was down.
So that's kind of the other level of availability.
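A toy model of that failover-and-resilver behavior, with invented names and placeholder logic rather than the real implementation, might look like this:

```python
# Hypothetical sketch of the failover-and-resilver flow just described;
# names and logic are invented for illustration, not Nebulon's implementation.

class MirroredVolume:
    def __init__(self, primary: str, mirror: str):
        self.primary, self.mirror = primary, mirror
        self.serving = primary          # which SPU answers host IO right now
        self.missed_writes = []         # writes an offline SPU will need later

    def on_spu_offline(self, spu: str) -> None:
        """Fail over to the surviving copy if the serving SPU loses power."""
        if spu == self.serving:
            self.serving = self.mirror if spu == self.primary else self.primary

    def on_spu_returned(self) -> None:
        """Re-silver: replay the writes the returning SPU missed, then resume."""
        while self.missed_writes:
            self.missed_writes.pop(0)   # stand-in for copying blocks back

vol = MirroredVolume("spu-a", "spu-b")
vol.on_spu_offline("spu-a")             # server with spu-a loses power
print(vol.serving)                      # spu-b keeps serving the LUN
vol.on_spu_returned()                   # spu-a comes back and catches up
```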
You talked about two SPUs in a server. It is possible to put two SPUs in a
server, but we think of it more as a performance consideration as opposed to an availability
consideration. So let's talk about Fibre Channel cards and how people deal with availability.
It's typically got two ports, right? So you have some sort of multi-pathing running at the host level
to deal with a port failure or cable failure.
Well, in our model, we have two ports,
but the host doesn't even know that those two ports exist.
And then we deal with the port failovers.
So it's kind of nice.
You don't have to actually configure multi-pathing, right?
You don't have to think about it.
It's just kind of built in to the card itself.
So in this model, you have two ports,
and we deal with all the issues that may result from having a port go down.
But Siamak, in a Fibre Channel card with dual ports, you've effectively got almost dual circuitry.
Yeah, you're powered by the same card and you're talking to the same, let's say, PCIe bus,
but those two ports are effectively
electronically as isolated as they can be on the same card.
They're not?
One would hope, but they really aren't
as isolated as you would think. Multipathing
really is designed to protect you against a Fibre Channel cable failure.
Most chip failures will result in the entire Fibre Channel card going down.
In fact, so there are few physical failures at the silicon level that will impact only
one port and not the other.
They're really designed to handle the cable failure or the switch going down for upgrade or whatever.
Whatever the path is to get to the actual capacity inside the shared storage array.
Yeah.
You and I come from, you know, a history of high-availability storage arrays, and there's
always been two or four or eight different controllers sitting there, you know, and they
could always, you know, migrate workload from one to the other in case there's a problem
and things of that nature.
In a single SPU environment, you know, I guess the host would be down in that case.
Exactly.
And if you had storage, then that storage would be mirrored to some other SPU storage,
so that wouldn't be a problem. It'd still be accessible.
You got it. That's exactly right. That's the exact thought process behind it.
Now, we can support two SPUs in a host, in which case it gives you
additional bandwidth and performance, but it can also mask the failure of the local
SPU. If a physical SPU actually does fail, you can still provide the same LUN view because
the data that was on this local SPU that just failed is mirrored
someplace else. So the remaining healthy SPU goes and fetches the data and you can continue to
operate. So this is kind of the extreme high availability need. I got you. That would be a
multi-controller scenario kind of thing. Exactly. And so you got that covered as well. And you mentioned that within the server, the SPU uses erasure coding to map the data across all the drives.
That's exactly right.
So it's like Reed-Solomon, it could be two failure mode types of erasure code, or it could be more or it could be less.
So within an SPU, we can tolerate up to two drive failures. A third one, obviously,
results in the data not being available to that SPU, in which case the SPU just goes and gets the data
from a brethren SPU that was mirroring the data, right? So in this model, you can really tolerate
up to five drives failing without actually having an outage or data unavailability, right?
Yes. In that environment, do you have to have similar configurations between the mirrored
SPUs? No. So I talked about the fact that we could even have SPUs that don't have enough
capacity. They can just go get the capacity from another SPU. So we have an algorithm that goes and looks at all the capacity available and creates a map of
how much capacity can be consumed in each SPU and creates kind of a mesh of connectivity of LUNs.
So we don't have one-to-one mirroring between SPUs, which has all sorts of performance implications
if an SPU goes down, because the surviving SPU ends up with all the load of the SPU that
just failed.
In this model, any given SPU is in a mirroring relationship with multiple SPUs.
So if it goes down, the load is evenly distributed across the SPUs.
This deals with the capacity unevenness as well.
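A toy version of that placement idea, spreading each SPU's mirror relationships across several partners so a failure's load is shared, could look like the following; the round-robin policy and names are assumptions, not Nebulon's algorithm.

```python
# Toy illustration of spreading mirror relationships across a pod so that one
# SPU's failure doesn't dump its whole load onto a single partner.
# The round-robin policy and names are assumptions, not Nebulon's algorithm.
from itertools import cycle

def assign_mirrors(spus: list, volumes_per_spu: int) -> dict:
    """For each SPU, pick a mirror target for each of its volumes."""
    placement = {}
    for owner in spus:
        partners = cycle([s for s in spus if s != owner])
        placement[owner] = [next(partners) for _ in range(volumes_per_spu)]
    return placement

pod = ["spu-a", "spu-b", "spu-c", "spu-d"]
plan = assign_mirrors(pod, volumes_per_spu=6)

# If spu-a fails, its six volumes fail over two each to spu-b, spu-c, and spu-d,
# rather than all landing on one surviving SPU.
print(plan["spu-a"])   # ['spu-b', 'spu-c', 'spu-d', 'spu-b', 'spu-c', 'spu-d']
```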
So effectively, it could be heterogeneous servers,
just as long as there's an SPU connected to each of them,
they could form a storage mesh.
Is that what you're calling it, rather than a cluster?
Yeah, we call it an nPod.
The problem with using the word cluster is that
now you're confusing it.
Is it a VMware cluster, a Microsoft cluster? The word
cluster is just
overused, right?
And it's just confusing.
And you mentioned all the data is encrypted
at the SPU
where it's written
and then throughout the network
it's maintained in an encrypted form?
Yeah, so
at ingest, so this is where it arrives.
The moment it arrives, we hash, compress, and encrypt the data.
And from then on, throughout its lifecycle, it stays in that encrypted or hash-compressed encrypted format.
So that format is preserved when it's written on drives
and as it's kind of traversing from one SPU to the other SPU
for mirroring or for disaster recovery use cases.
And so you hash to deduplicate the data?
So it's deduplicated, compressed, and encrypted.
Exactly. You got it.
So everything you expect from a modern storage array.
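A minimal sketch of that ingest order (hash for dedup, then compress, then encrypt, and keep the block in that form from then on), assuming stand-in libraries rather than whatever the SPU firmware actually uses:

```python
# Minimal sketch of the ingest path described above: hash the block for dedup,
# compress it, then encrypt it, and keep it in that form from then on.
# Library choices (zlib, Fernet) are stand-ins, not what the SPU firmware uses.
import hashlib
import zlib
from cryptography.fernet import Fernet   # pip install cryptography

key = Fernet.generate_key()
cipher = Fernet(key)
dedup_index = {}     # fingerprint -> stored (compressed + encrypted) block

def ingest(block: bytes) -> str:
    fingerprint = hashlib.sha256(block).hexdigest()
    if fingerprint not in dedup_index:           # only unique data is stored
        dedup_index[fingerprint] = cipher.encrypt(zlib.compress(block))
    return fingerprint                           # metadata keeps the hash reference

def read(fingerprint: str) -> bytes:
    return zlib.decompress(cipher.decrypt(dedup_index[fingerprint]))

ref = ingest(b"example 4K block" * 256)
assert read(ref) == b"example 4K block" * 256
```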
So snapshots is the other thing I would expect from a modern storage array.
You support snapshots?
Absolutely.
So the metadata becomes now a bit of an interesting thing, how that metadata is distributed.
So we struggled with how do we actually do this in the most robust, highly available
way. So, you know, we keep talking about sharing, but we can actually use this in an unshared
environment also. So you can imagine a modern application, kind of like MongoDB or Couchbase,
where the requirement is really not shared storage, just local storage.
And in one of the conversations I was having with one of our customers, they said, look,
I have a problem.
This is the enterprise IT guy describing the problem he's having.
He's saying, look, I got the guy who wants to do Hadoop or Cassandra or Spark or whatever kind of the modern thing
he wants to do today is, and he comes to me and I said, well, it's going to take me eight
weeks to do it.
And he says, well, okay.
And before I know it, he's bought a hundred servers with a thousand drives and he's running
his own new application, kind of shadow IT.
His drives just start failing and he comes and says, hey, can you help me with this?
And it's like, well, I didn't buy that.
I mean, I don't know how to replace drives in MongoDB.
And everyone is a little different anyway, right?
So there's this weird tension between the application guy who wants to get going fast and the enterprise IT guy who wants to sort of have visibility.
And I think our solution is perfect because, you know, you can create a pod to run MongoDB.
It doesn't require shared storage, but it requires that visibility.
So if you turn around and just present what looks like LUNs, the MongoDB guy is happy.
He didn't have to sort of buy a shared storage array.
He just bought the servers from his favorite server vendor, and he can get the capacity.
Then the enterprise IT guy is happy because he has visibility into what's going on when drives fail, and
he doesn't really care if it's MongoDB or Spark that is running.
Replacing a failed drive has the exact same behavior.
You go to the cloud, you notice something, you press a button, you go take the drive out and
replace it.
You don't even have to talk to the application guy.
So now, the reason I told this story is to take you back to the issue of metadata.
So we had to design each SPU to have its own metadata, right?
That has to do with compression, deduplication, encryption, all those things.
So it's fully self-contained because it has to be able to operate independent of all the
SPUs for the non-shared use case. In the shared use case, the only metadata you really need to have is,
okay, who is my mirror so I can send my data to?
They don't have to know about the details of how many drives they have.
Is it six drives or eight drives?
Is it SAS or is it NVMe?
I don't really need to do any of that stuff.
I just need to know where it is: where is the network endpoint I have to send the data to for
mirroring, or, if I'm not serving the data, which SPU I have to go talk to to get the data and
present it back to the host. So this gives you the ultimate availability and isolation of the
metadata. So there's some metadata that is cluster-wide.
That metadata is about who has what,
which LUN is served where,
but all the other kind of metadata
that has to do with the actual layout of the data
and compression and hashes,
all that is essentially encapsulated in an SPU
and it's independent of the other SPUs.
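One way to picture that split: the pod-wide metadata is little more than which SPU serves and mirrors each LUN, while layout, hashes, and compression details stay private to each SPU. The structures below are hypothetical.

```python
# Hypothetical sketch of the metadata split described above; not Nebulon's schema.
# Pod-wide metadata: just who serves and who mirrors each LUN.
pod_metadata = {
    "lun-42": {"served_by": "spu-a", "mirrored_on": "spu-c"},
    "lun-43": {"served_by": "spu-b", "mirrored_on": "spu-a"},
}

# Per-SPU private metadata: layout, hashes, compression details -- never shared.
spu_a_private = {
    "drive_map": {"ssd0": "slot-1", "ssd1": "slot-2"},
    "block_index": {},   # fingerprint -> (drive, offset, compressed_length)
}

def locate(lun: str) -> str:
    """All a peer needs pod-wide: where to send or fetch data for a LUN."""
    return pod_metadata[lun]["served_by"]

print(locate("lun-42"))   # spu-a
```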
Very good.
Siamak, maybe we can talk a bit about replication.
I know that it's using erasure coding, so there's a replication algorithm built in there.
Do you have any particular take on that, that you're doing differently?
So to be clear, we're not doing erasure coding across SPUs.
SPU is doing the erasure coding within it.
We chose that partly because doing erasure coding across SPUs, you know, has a pretty huge tax in terms of latency that the host will experience.
So we do just straight mirroring across SPUs.
And so within a pod, which approximately maps to a cluster,
like a VMware cluster or Microsoft Cluster Server or an Oracle RAC.
So that's kind of the mirroring that takes place there.
And then we are working on disaster recovery,
where you kind of mirror the
data of one pod in one data center to another pod in a different data center. And that can be done
either synchronously or asynchronously. And the good news is that all of these
mirroring protocols essentially have the same baseline code, which we had to build from the beginning, and that makes us comfortable
in terms of the reliability and performance of the solution.
Yeah.
So, you know, for like synchronous replication, things of that nature, you'd have to wait
until the data was actually at the replicated site before you, you know, authorize the IO to complete.
You got it.
And are you doing that for within the pod mirroring as well?
Yes.
So the data has to be in two places in order to satisfy the IO.
I gotcha.
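A stripped-down illustration of that acknowledgement rule, where the host's write completes only after both the local copy and the mirror copy are durable; the function names are placeholders, not a real API.

```python
# Stripped-down illustration of the acknowledgement rule: the host's write
# completes only once the data is durable on two SPUs. Placeholder names.
from concurrent.futures import ThreadPoolExecutor

def persist_local(block: bytes) -> bool:
    return True          # stand-in for writing to the local SPU's drives

def persist_mirror(block: bytes) -> bool:
    return True          # stand-in for sending the block to the mirror SPU

def write(block: bytes) -> bool:
    with ThreadPoolExecutor(max_workers=2) as pool:
        local = pool.submit(persist_local, block)
        remote = pool.submit(persist_mirror, block)
        # Only when *both* copies are durable is the IO acknowledged.
        return local.result() and remote.result()

assert write(b"payload")
```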
So to be clear, it's a function of the application.
So this is kind of the beauty of making the model app-centric. In our model,
you kind of say, you don't start with make a pod, then install an application. You actually say,
I'm going to run VMware. That's where you start. Or I'm going to run MongoDB. I want to run VMware
for a database or VMware in a development environment. And so we have a series of
templates that describes what the configuration should be.
And it's all embedded into the template itself.
In fact, in this model, you don't ever deal with a worldwide name or LUN masking or exporting. You just say, make me a VMware cluster from these servers.
And in fact, the definition of the template even includes whether a boot LUN is created and where the
content of the boot LUN should come from. So the experience is: the customer buys the servers,
they rack them, power them up, connect the Ethernet ports, go to the cloud, and say, make me a VMware
cluster. And we just go and create the boot LUNs, grab the content of the VMware boot
LUNs from inside the data center, lay them out, create all the data LUNs
based on the template, and set up whether they're mirrored or not.
Let's say the customer says,
it's a VMware development environment, I don't necessarily need mirroring,
I don't need that kind of high availability.
Whereas in the VMware production environment,
where we do mirroring and sharing. Or the guy says it's MongoDB,
which means no mirroring and no sharing, or Kubernetes, where there's no sharing but there is mirroring.
So all those configuration kind of details are hidden behind the template.
You press a button and the volumes get created, they get populated, exported, and you didn't
have to know anything about
worldwide names.
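A guess at what such a template might capture, based on the options mentioned in the conversation (boot LUN, LUN sizes, sharing, mirroring); the field names below are invented for illustration, not Nebulon's actual schema.

```python
# Invented illustration of an application template of the kind described;
# the field names are guesses, not Nebulon's actual schema.
vmware_production = {
    "application": "VMware vSphere (production)",
    "boot_lun": {"create": True, "image_source": "site-esxi-image"},
    "data_luns": {"count": 4, "size_tib": 4, "shared": True, "mirrored": True},
}

mongodb = {
    "application": "MongoDB",
    "boot_lun": {"create": True, "image_source": "site-linux-image"},
    # Mongo replicates at the application layer, so no sharing or mirroring.
    "data_luns": {"count": 1, "size_tib": 8, "shared": False, "mirrored": False},
}

def provision(servers: list, template: dict) -> None:
    """Roughly what 'make me a VMware cluster from these servers' boils down to."""
    for server in servers:
        luns = template["data_luns"]
        print(f"{server}: create {luns['count']} x {luns['size_tib']} TiB LUNs "
              f"for {template['application']}")

provision(["host-01", "host-02", "host-03"], vmware_production)
```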
Mirroring is an option within the pod?
Yes, it is an option.
I guess I didn't realize that.
Are encryption, compression, deduplication also options?
No, just mirroring.
And the reason mirroring is an option
has to do with the fact that Mongo doesn't need it, right?
And in fact, if you ever run MongoDB on something like one of these
hyperconverged, software-defined storage solutions, which force mirroring,
Mongo does three copies, you do two copies,
and before you know it, you have six copies of the data, right?
So that's what you mean by saying application-centric storage,
because you're effectively configuring the pod, the nPod,
via application templates. Is that how this would work? That's exactly right. So at GA,
we will have a certain set of application templates we've created for pretty popular
type applications. But the customers can take our templates and modify it or create
their own template, right?
And so the thinking is that somebody in the enterprise IT says, okay, well, VMware in
our environment, we want it to have four terabyte LUNs, we want it to be mirrored, we want it
to be this way.
So they've modified the existing templates and the application guy just says, okay, I'm
going to use this template at these 10 servers that just came into the data center, make them a VMware,
and off you go.
And from then on, the enterprise IT guy is not involved in the conversation.
He just sort of set the standard for the organization, and the application guy just uses it, right?
Now, getting back to it, I'm trying to understand how the SPU presents its storage as a LUN. Is it
iSCSI? Is it a virtual volume?
It is not iSCSI. Otherwise you have to deal
with IP addresses and so on. Remember, we are inside the server. We're on the PCIe bus of the
server. Therefore, over the PCIe bus, we are presenting what the host will see as a SAS LUN.
Now, we picked SAS as opposed to NVMe in our initial implementation because NVMe as
a shared interface is not all that well supported by VMware and Oracle RAC and so on.
So we chose something that is industry standard.
Every single OS has drivers for a SAS controller.
That's kind of what we chose to actually do
for our initial release.
And then once NVMe becomes popular,
remember we are on the PCIe bus,
we can just turn around and present
what looks like an NVMe target to the host.
That's extremely interesting.
How are you going to market with this solution?
So the solution really is SPUs and a cloud management control plane.
Is that what the solution represents?
You got it right.
And so it's interesting.
You talk about go to market.
I think that's one of the things: when you talk to customers, they just don't want yet another vendor in their data center.
They've got enough of them as it is. So our model, really, the best way to think about it is kind
of like a RAID card motion. Today, when you buy servers, every single server you buy has some sort
of a storage controller in it. It could be an HBA
card, it could be a RAID card.
Who are you buying those from?
The cards are built by Broadcom.
They're OEM
from these vendors and
they're provided by Dell,
Supermicro, HPE,
Lenovo.
They all have it in their configuration
matrix, I guess, right?
And that's exactly the model for us.
In fact, if you refer back to our press release, we are going to go to market with HPE and Supermicro.
We have a third one we are talking to, and we expect to be on board for GA.
So the thinking is, essentially, you buy it directly from Supermicro
or HPE, and you just call them up and say, hey, instead of the standard
SAS controller you used to put in, put in one of
these Medusa cards, and off you go. Then you don't need to buy a Fibre Channel card or shared
storage. Normally when I buy a SAS card or something, there's a standard cost for that.
And I pay $190 or $250, whatever the cost.
I don't know what the cost is, sorry.
But I pay that once and I get the card and I've got it.
So how does that work for the SPU?
The SPU is a much more, I'll call it, intelligent device
than a SAS RAID card.
Sure, sure, sure.
So you think of it, you know, you have standard, you know, graphics card,
and then you have an NVIDIA card, right?
You kind of think of them in that lane, right?
Yeah, there's a dumb, you know, VGA built into the motherboard of most systems,
but a lot of people opt to buy NVIDIA cards because they do a lot more stuff.
So that's how you think about it, right?
So in that model, you buy the card from the OEM, they set the pricing, but there's also
a subscription in the cloud for the use of the cloud and all the analytics and API driven and
automated software interfaces that you get in the cloud that is sold through the OEM also, right?
So the entire solution is purchased through the OEM.
And it's pre-mixed, pre-measured
based on the parameters you give the OEM?
Well, there's a bit of a negotiation,
but frankly, they will set the price on the hardware.
Right, right.
And there's no capacity charge here
because the capacity is actually
whatever ships with the servers.
Exactly.
The OEM and the customer decide,
hey, I need 20 terabytes per server.
They get 20 terabytes per server.
It just depends on what their needs are.
And we don't charge for that capacity.
That capacity,
the customer just pays for the raw capacity
they buy from the server vendor.
It's like software-defined storage at the next level.
I'm trying to figure...
The only problem is it's got hardware, right?
You can't do this without the SPU hardware in there.
But you can't do software-defined storage without hardware either.
I mean, every single software-defined storage has a RAID controller that they depend on,
right?
Exactly.
Wow.
It's kind of interesting.
If a company were to place something like this within their data center, though,
does this require you to change over what you already have to support it?
Or, for example, would a device like this allow you to connect externally over 25 gig Ethernet to, say, that NetApp that's sitting over there unused?
Would you be able to manage that in some way through these?
I think they could coexist in an environment.
Right.
So we are choosing not to try and manage external storage, partly because then you always end up becoming a lowest common denominator.
There's just enough variations on these devices.
And frankly, one of the big problems always is that you then have to talk to the Brocade switch or the Cisco switch to set up the zones and the worldwide names.
There's just so much kind of these storage artifacts to try and deal with. In our model,
all that stuff sort of disappears. In fact, you know, we don't want to bring that complexity back.
We just want to take all that away, right? In other models I've seen, they say, well, we can integrate quite nicely
with X, Y, or Z, but it's going to be hobbled in a certain way. Right. And that's the problem.
The hobbling, you know, yes, it can integrate, but it gets hobbled, right? But, you know,
in our model, you are essentially running it exactly as it was intended, right?
Which is, you know, the application guy
owning the entire server and not having to talk to anybody else about it.
Right.
So, I mean, this thing could actually be, you know, let's say you could,
I'm not sure you can order an SPU by itself without the server and all that
stuff, but you could,
you could almost plug an SPU into a server that has storage and you immediately have a shared storage environment.
If you have another SPU card, you could plug it into another server and all of a sudden it's shared.
It could be extremely trialable from that perspective, right?
It can be, although we are not going to.
So on the question of how do you get an SPU, whether you buy it and put it in your server or you buy it pre-built with a server,
all of the hardware motion is through the OEMs.
We are leaving that conversation to the OEM and the customer to decide what's the best way for them to get their hands on it.
But initially we think pre-built with the server is the way to go. Yeah, yeah.
Often the customers, if you think about large enterprises,
they are much more comfortable having a pre-built, pre-tested out-of-the-factory config coming in. They just plug in, power on,
and off they go, right? Well, that makes sense.
Nobody wants to buy parts and pieces.
They just want a single line item SKU.
Exactly. Yep.
Have you qualified particular storage devices?
I was going to ask whether disk is supported,
but I'm thinking it's not.
Storage devices like third-party?
Well, you know, each vendor has, you know,
a fairly elaborate list of storage devices
that they support in their
servers, not all of which you have to support from your perspective, but I'm just wondering
if there is a limit like that?
By storage devices, you mean drives that connect?
Yes.
Okay.
Yeah.
Yes.
So we are supporting SSDs only, no spinning media.
Okay.
And the answer is yes, we are working with the OEMs and there is a standard set of drives and capacities that they tend to be the most popular that customers buy.
And then in conjunction with the OEMs, you can order what's supported.
So it's a hardware compatibility list, yes?
It is, but it is mostly because the stuff on the back end is
not that dependent on the type of drives you attach.
But we are trying to sort of limit the exposure of the customer by having a certain
set of drives that the OEM is comfortable with and that they're getting in volume, connected initially to the card.
But the backend interface of the card to the drives is really what a RAID card connected
to the drive would be like.
So it's all well-tested.
We are using industry standard components.
We're not designing our own ASIC.
So we are pretty comfortable with being able to expand that very rapidly.
You didn't mention storage class memory at all.
Is there support for storage class memory?
Yeah, storage class memory comes in two flavors.
There is a flavor that is a form of a DIMM
that sits on the server.
So we don't even have access to it.
It's just in a different PCIe domain.
So that's usually used as a caching solution more than anything else.
And then there is the storage class memory that sits on the NVMe bus.
Right.
And so we are able to consume NVMe,
although, you know,
that's probably not the first use case that comes to mind.
And there's a third use case for
that storage class memory to be used on the SPU for caching of our own metadata and improving the performance.
So we are looking at adding that as an option later on.
Okay, so it's primarily a metadata caching and data caching solution rather than a pure data storage solution there.
Exactly. I mean, the fact is that today, the thing that people pay attention to most is, you know,
how efficient the compression and deduplication is,
because the dollar per gigabyte is still quite important, and they're willing to pay the cost that comes with doing the
compression and encryption.
And frankly, most applications, when they move
from spinning media to SSDs, even with the compression and deduplication tax, it's still
plenty fast for a large majority of applications. There are some niche applications, obviously,
that do need that 30-microsecond latency, but very few applications can really take advantage of that productively.
I was going to ask if you have an onboard cache on the SPU.
I assume there's something like that.
Yeah, we have a 32 gigabyte cache on the SPU, yes.
And you've got non-volatile memory there as well for write buffers?
It's non-volatile memory, exactly.
Oh, the whole thing. Okay, that's good. That's good. Huh, you know, I don't think I have any
other questions. Matt, do you have any last questions for Siamak? I really don't. Siamak,
is there anything you'd like to say to our listening audience before we close out? One
thing I should probably ask is, it's not GA yet. It will be GA in the future. Is that true?
Exactly. It will be GA in the third quarter of this year,
and we are making great progress. Looking forward to meeting the needs of the customers
with really kind of a completely new take on how to solve this problem in data center.
It was a great pleasure talking to you guys. Really good questions.
Yeah, okay. Well, this has been great.
Thank you very much, Siamak, for being on our show today.
Thank you so much.
Next time, we'll talk to another system storage technology person.
Any questions you want us to ask, please let us know.
And if you enjoy our podcast, tell your friends about it,
and please review us on iTunes and Google Play and Spotify,
as this will help us get the word out.
That's it for now.
Bye, Matt.
Bye, Ray.
Bye, Siamak.
Bye.
Until next time.
Good day.