Grey Beards on Systems - 83: GreyBeards talk NVMeoF/TCP with Muli Ben-Yehuda, Co-founder & CTO and Kam Eshghi, VP Strategy & Bus. Dev., Lightbits Labs

Episode Date: June 12, 2019

This is the first time we’ve talked with Muli Ben-Yehuda (@Muliby), Co-founder & CTO and Kam Eshghi (@KamEshghi), VP of Strategy & Business Development, Lightbits Labs. Keith and I first saw them at... Dell Tech World 2019, in Vegas as they are a Dell Ventures funded organization. The company has 70 (mostly engineering) employees and …

Transcript
Starting point is 00:00:00 Hey everybody, Ray Lucchesi here with Keith Townsend. Welcome to the next episode of the Greybeards on Storage podcast, a show where we get Greybeards Storage bloggers to talk with system vendors to discuss upcoming products, technologies and trends affecting the data center today. This Greybeards on Storage episode was recorded on June 6, 2019. We have with us here today Kam Eshghi, VP of Strategy and Business Development, and Muli Ben-Yehuda, co-founder and CTO of Lightbits Labs. So, Kam and Muli, why don't you tell us a little bit about yourself and your company?
Starting point is 00:00:45 Thank you, Ray. This is Muli. Why don't I get started? So I will leave it to Kam to tell you all about LightBits, which is a very exciting subject that we love to talk about. As for me, briefly, I'm one of the founders of Lightbits Labs. I'm the CTO. Before that, I spent many years at IBM Research. In addition to IBM Research, I am a longtime Linux kernel contributor.
Starting point is 00:01:30 So if you're using the Linux kernel, you're using some very small bits that I contributed over the years. Linux kernel, yeah. Yeah. Sometime in the mid-2000s, I was getting a little bit bored with operating systems. Thankfully, at the time, hypervisors were all the rage. So if you're using Xen or KVM, you're also using some small bits that I contributed to both of them. My background, as you can probably understand, is in operating systems and more generally building high performance systems, combining, you know, software, hardware, economics, and game theory, and taking all of these things and building really high-performance systems that delight their users. And that's really what we're trying to do at LightBits Labs. But I'll let Cam introduce himself and the company. Sure. Thanks, Muli. Hi, everyone. This is Cam Eshke.
Starting point is 00:02:44 I lead the business team for LightBits and prior to LightBits I was a Dell EMC. I was at a startup called DSSD and we got acquired by EMC a few years ago and I've been in data center, storage, networking, compute for many years. Worked at big companies like EMC, Intel, HB and also also startups such as DSSD and CrossLayer Networks. So at LightBits, I lead the business team, and I'm based in San Jose, California. Actually, LightBits is an Israeli-based startup. We have offices in Silicon Valley and in New York City. Let me tell you a little bit about company background.
Starting point is 00:03:22 We've been around for three years. We have about 70 employees now and very technology oriented. You know, most of the team is in engineering. We've so far raised $50 million in funding, just closed our Series B in December of last year. And investors include strategic investments from Cisco Investments, Dell Capital, Micron, as well as a group of angel investors and VCs. So, Ray, would you like me to give an overview of what we do? Yeah, please do. It's kind of, I've read through your website, but yeah, why don't you go ahead and talk
Starting point is 00:03:56 about the company and what your products or solutions do. Yeah. Sure. So basically what we build is a software defined solution for disaggregation in cloud infrastructure. So if you look at cloud infrastructure, whether it's private cloud or cloud service providers and public cloud, performance sensitive workloads today or in the past have been running in more of a converged infrastructure model, meaning that you have nodes that have compute and storage and networking. As you need more storage or you need more CPU horsepower, you keep adding that node.
Starting point is 00:04:28 That's how you scale. And that model works well if you have smaller deployments and you need a lot of performance. You have a direct attached connection between the SSD and the CPU, and it's just sort of a simple model to build out. Problem is that in cloud infrastructure, you want to be able to scale to much larger scales where you're talking about thousands or tens of thousands or hundreds of thousands of racks. And when you get to that scale with this model of direct attached storage, you're going to end up with stranded capacity and underutilization of your infrastructure. So what all the hyperscalers have been doing is they're migrating to a model of disaggregation,
Starting point is 00:05:07 meaning separating storage and compute so that you can now share a pool of storage, a pool of SSDs across a bunch of compute nodes. And that way you can increase your utilization of your infrastructure, not have wasted SSD capacity, and also be able to scale your storage and compute independently so that if an application if a workload needs more storage no problem you can create a cluster with the right ratio of storage to compute for each particular particular application and this by
Starting point is 00:05:36 the way the separation of storage and compute also helps improve operational efficiencies and gives you more flexibility and maintaining of upgrading storage and compute independently. So this transition from direct attached storage to disaggregation is already underway with the big hyperscalers and now all the other column tier 2 cloud service providers and enterprise private cloud deployments are also making that transition for the same reasons because they want an architecture that's easy to scale, is efficient, is simple to use. And that's basically what we that's the space that we play in.
Starting point is 00:06:14 Now what we do, which is unique, is that we can support this disaggregation with end to end NVMe over standard TCP IP. Now, NVMe is a high-performance control interface for SSDs, and it was initially started as for direct-attached SSDs, but then with NVMe fabrics, it was extended over different transports and initially designed for RDMA fabrics. But what LightBits did, we came in a few years ago and said, you know what, RDMA fabrics make a lot ofBits did, we came in a few years ago and said, you know what? RDMA fabrics make a lot of sense for rack scale small deployments, but if you want to connect
Starting point is 00:06:50 any compute node to any storage node in a large data center, RDMA is just not going to work at that scale. Because the networking requirements are so specialized? Exactly. Networking is much more complex. If you want to run RDMA, you have to get close to lossless networking. And you need to have RDMA nicks on every single client. And in many cases, there are interoperability issues between different RDMA technologies. So it is very difficult to get RDMA to work at that scale. What if we did this over TCP IP where you don't need a special NIC on every single client? You don't need to do anything special with the networking. Can you still get good performance and latency? So we built a solution around that, and that's exactly what we do. So we can disaggregate storage and compute
Starting point is 00:07:41 over a standard TCP IP network and still get performance that is equivalent to direct attached storage. Because that combination doesn't exist. So Cam, I'm not as a deep storage expert as Ray here, but I am a systems guy. So on surface, the first thing that I have to ask is that you when you have a PCI bus and the overhead of say that blah, blah, blah in a single system, that performance of the local system is going to be blazingly fast, especially as you talk about if you have many octane-based memory, et cetera. So I guess the first question is, how is it possible to not get performance degradation over something that adds latency to the overall system? Sure. Well, let me give you a short answer, and I'm going to pass it on to Moli to get into a little more detail. If you look at the latency,
Starting point is 00:08:45 the latency is the contribution from the media itself, meaning the flash or Optane, whatever the storage media is, is the biggest contributor to the latency. So, for example, if you have, in our case, we can do read-only across NVMe over TCP with, you know, storage and compute, you know, separated from each other over the network. With read-only latencies on average around 130 to 140 microseconds, about 100 microseconds of that is the media itself. So there are ways you can improve the consistency of the latency by better managing the media, which we do. And then there are ways to improve the networking latency by having a more
Starting point is 00:09:36 optimized NVMe over TCP stack that can avoid packet drops and have better flow control, which we also do. So we optimize both the networking side and the management of the SSDs to get better latency out of the SSDs themselves. And the net result is, when you put it all together, we're adding, like I said, about 30 microseconds to what you get as compared to direct attached storage. So yes, there is an adder when you look at the average latency. But from an application perspective, that 30 microseconds is negligible.
Starting point is 00:10:13 So it's indistinguishable. This disaggregated model is indistinguishable from direct attached storage. I was reading through your website today, and I was very interested to see that you actually support alcohol data services, thin provisioning, compression, quality of service. Why data striping? I mean, these sorts of things are very unusual for an NVMe over Fabric solution today, and that's with RDMA, special purpose networking, and all that other stuff. Doing that over TCP would be even more of a latency hit. So how does this all work?
Starting point is 00:10:57 So how does it all work is my department. But before we get into that, initially, so we've been involved with NVMe and NVMe over Fabric since its inception. Okay. And this is before Lightbits was founded. But if you look at who our team members are, you know, we built the first NVMe controllers. We built the first NVMe over Fabric over RDMA or Rocky systems. And when Lightbeats started, we took a good long look at how people were trying to do NVMe over Fabric over RDMA and the deployment challenges that they were running into that Cam mentioned. And there were several aspects to how people were trying to do NVMe over Fabry and not succeeding. And one of these aspects is the network aspect. So, you know, we invented
Starting point is 00:11:55 NVMe over TCP and brought it to NVMe.org and standardized it. But then another aspect is people were trying to build basically to extend the network to connect from the CPU to the drives instead of over PCI Express over a network. But the fundamental model was very, very similar. So it's still the same raw physical drive, except now we are trying to access it not over PCI Express, but over an RDMA network. And we'd like to think that's actually the wrong model for widespread storage disaggregation. What you actually want is data services and the ability to take a lot of disparate NVMe drives that are found in different servers and tie them all into one big storage pool. And once you do that, you add that abstraction layer, which we added in LightOS, our software offering.
Starting point is 00:12:56 And once you do that and accelerate it in our hardware acceleration cards, which we're calling Lightfield, then it opens the possibility of implementing hardware accelerated data services. Now, you mentioned a few of the data services that we support. We don't support every possible data service. So this is not an all-flash array with every possible data service. We're supporting very specific data services that make sense for storage disaggregation at enterprise private clouds and big cloud providers. With regards to the performance question, I just want to say a few things. First of all, performance, as you know, when it comes to storage systems, that actually has many different aspects. We can talk about IOPS, we can talk about throughput, we can talk about average latencies. We can talk about IOPS. We can talk about
Starting point is 00:13:45 throughput. We can talk about average latencies. We can talk about tail latencies, two nines, four nines, five nines, and so on. We can talk about CPU utilization. We can talk about memory bandwidth utilization. We can talk about PCI Express utilization and so on, many different aspects. But I just want to, because people, as Cam mentioned, we can actually provide over a TCP IP network performance that is equivalent. And I even go as far as saying in some cases better than what direct attached drives can do. And once you think about this, how is it possible that you have a drive and you drive it over PCI Express? Now you take the same drive fundamentally, put it over a TCPAP network and actually get better performance.
Starting point is 00:14:34 You have a cache? Okay, I will answer that. I will answer that. I won't leave you hanging. The answer is quite simple. We haven't changed the laws of physics. A longer network is longer. A more complex network requires more processing. That's a given. What we have done is change the fundamentals of DIO. When you drive a local drive, you're driving it in a certain way. And you here is the application that goes to the operating system, to the through the file system, VFS in Linux file system, block layer and so on. And then at the end of the day, something hits the drive. That something
Starting point is 00:15:19 has a significant, you know, the pattern of IOs, the way that the IOs reach the drives, they have a lot of effect on what the drives can do. NVMe drives are really good at some things and really horrible at other things. When you add that abstraction layer that I talked about as part of LightOS, that abstraction layer not only receives all of the IO requests from the different clients, it actually molds them into something else that the drives can handle much better. So the short answer to how can you do better than direct attach is the network takes and the LightOS global FTL, the layer that manages the drives and the hardware acceleration and all of that,
Starting point is 00:16:08 brings it back and then some. Does that make sense to you? Because it's a little bit hard to explain without a wide range. Yeah, but you're also adding data services to the solution. Wide striping can help reduce the busyness of an SSD. You're doing, quote unquote, wire speed inline compression.
Starting point is 00:16:29 That's going to help the data transfer off the media. So, yeah, I mean, there are things you can do. I mean, obviously, you can have a memory cache as well in the storage server. Yeah, you need to be a little bit more. Yeah, I understand how this all could work, and you could gain back the 30 microseconds and maybe even increase it beyond that, but there's still some discussion left to be had here. mix between what customers are doing in cloud service infrastructure providers because one of the things that they love to do is abstract away the underlying hardware design as much as possible to services and and i think most of them look at data services as another application that rides on top of their infrastructure.
Starting point is 00:17:25 So do you guys run into customers where you have to, and I would imagine have to champion the ideal of using some of these data services that help to mitigate some of the impact of network latency versus using, you know, build their own data service type thing. So how many, I guess, you know, real numbers, how many service providers are actually adopting your data services and in turn the performance gains versus saying, you know what, we're fine with creating our own layer of data services. I would say that, first of all, you know, what is to kind of explain a typical use case would be, for example, we have a customer right now, which is a Fortune 500 cloud service provider.
Starting point is 00:18:12 They have been using direct attached storage with SATA SSDs. They got to a point where they realized their storage is growing faster than their compute. So they wanted a way just to be able to increase their storage capacity without having to add more of these nodes that they currently had. And they wanted to move to NVMe SSDs. And previously, their server designs had, or even currently has, SATA-based SSDs. So they said, well, instead of changing all of our compute nodes and adding nodes that may have extra things
Starting point is 00:18:46 that we don't need, all we want is storage, why don't we drop in storage servers which has a pool of NVMe SSDs and be able to share that capacity across the entire data center. So in every rack, their vision is that in every rack there's going to be a storage server with 24 SSDs in it and it is servicing east-west traffic to any compute node in the data center, not just rack scale. Now in that model, first of all, they wanted to do it over TCP IP. They looked at iSCSI, they realized the performance just doesn't meet their requirements, so they started looking at us with NVMe over TCP.
Starting point is 00:19:22 Second, they wanted to use off-the-shelf hardware. They didn't want to go buy an expensive appliance, which is custom designed and is tied to the specific software. They wanted complete flexibility to separate the storage from the hardware. And we worked with Dell, our partner, and we're shipping through Dell to this customer. And finally, they want to be able to turn data services on and off. They do want to deploy erasure coding in almost every case to protect against SSD failure. But compression, in some cases they do want to use compression, in some cases they don't.
Starting point is 00:19:58 For example, if the data is encrypted, compression doesn't help you, so they just turn off compression. And that's the flexibility we give them they've done a whole bunch of benchmarks with us as well as other customers and are able to you know we've demonstrated to them that the performance the the latency particularly which we which they care about most on average is only slightly higher than direct attached. And the tail latency, the 99.99% latency is in most cases lower than direct attached. And as Mully said, that's because of how he managed the SSDs. It has nothing to do with the networking side of it, but it's about how do you avoid, for example, read and write conflicts. Because if you know, I'm sure you know that SSDs
Starting point is 00:20:47 take much longer time to write than they do read. You know, that's just a characteristic of flash. And so if you can avoid a read waiting for a write, you can get better latencies. If you can avoid all these latency spikes that you can get in individual SSDs, kind of smooth it out across all the SSDs, you can get better tail latency. So what we do for them is we give them more consistent latency than what they were getting before, on average latency that's only slightly higher than direct attached,
Starting point is 00:21:16 and we give them a way to scale their storage completely independently and reuse all their existing compute infrastructure and still move to NVMe performance. You mentioned again on your website that two things of interest. One was a persistent write buffer. And how does that exist in a standard server configuration?
Starting point is 00:21:44 And the other one was the ultra-low, ultra-low, I want to say ultra-low write latency. Maybe somebody can explain that. Yeah, those two actually go together. Yeah. So if you look at standard servers today, they are starting to come with newer, you know, persistent memory type technologies. pass on a server or you could have Optane drives or you could have Samsung Xenand as ZSSD drives these these types of or you could just have an old-fashioned NVD for for battery backed DRAM basically so given at least one of these technologies at least in in current versions of light os
Starting point is 00:22:49 versions that are in production right now given a at least one of these technologies on the server then when an io comes in we just store it in the persistent memory buffer of whatever type. And then we can acknowledge it immediately. And the software, Lido S software, makes sure that the data is durable and protected. And if a power failure happens, then everything is recovered and so on. So the ultra low latency is really because we're using persistent memory in one of the best ways that it can be used. The write buffer is what happens to the data after it gets written to that persistent memory. We start, as I mentioned, molding the data, adapting it, doing data services on it if those are configured, as Cam mentioned. So it really all works together. doing data services on it if those are configured, as Cam mentioned.
Starting point is 00:23:46 So it really all works together. So it's effectively a fast write operation from that perspective, and sometime later you destage it. And if you have to put a ratio coding around it, you put a ratio coding around it, you might even do a log structure file solution or something like that. Yeah, you can do a lot of things. You can replicate it to other machines to protect against server failures or rack failures.
Starting point is 00:24:12 You can compress it. So you support replication? The version of LightOS that is now in development and that will be coming out later in 2019 does support cluster-wide er eraser coding or replication, depends on how you want to look at it, so that it will provide protection against server failure.
Starting point is 00:24:34 Server, server, yeah, yeah. Well, that's very interesting. The other thing was the wire line speed compression. As far as I understand, you're not talking about the hardware accelerator at this point, are you? There is a... We have two offerings, either software only, where all of the features are available
Starting point is 00:24:58 in a software only deployment, or we have the option to use an acceleration card that we call Lightfield. And Lightfield works with LightOS to accelerate the software and accelerate those data services, such as compression and erasure coding and some of our what we call global FTL functions, which is the layer of software that's managing the SSDs. Now, if you have a server, a storage server that has plenty of CPU cores, like for example, with AMD EPYC, we see lots of designs that have more than enough cores, you can do everything in
Starting point is 00:25:32 software and still get wire speed performance. If you have a CPU with fewer cores, let's say instead of 32 or 28 for Intel, maybe it's 14. Then we would recommend using the card, and the card is a PCIe add-in card that essentially will improve, boost the performance when you're using those data services and still allow you to get to a wire speed. So the answer is yes, if the CPU doesn't have enough cores.
Starting point is 00:26:02 I would say about 50% of our customers don't use the acceleration card, don't need to. So, Muli, you're hitting a topic that kind of has me scratching my head a little bit from an optimization perspective. Wire speed and networking kind of are two things that don't easily translate. You know, when I'm looking at a 10 gigabit network path or a 40 gigabit network path, much of the inefficiency, even before I hit the network, is kind of in the network stack on the OS itself. So I'll never get, you know, from a file, if I'm just using a in-memory file transfer from in-memory in one system to in-memory in another system, I can almost never get that wire speed for just due to the overhead of the OS. Are you guys doing anything special in that OS since, you know, you're on the Linux kernel development team?
Starting point is 00:27:02 Are you guys doing anything special in the OS? They have a client, right? You guys do have a client software. So those are two very good questions. Let me address the second one first, because it's actually easier. No, we do not. In fact, we use standard NVMe TCP for everything on the client. And by virtue of being standard,
Starting point is 00:27:31 you know, NVMe TCP is now included in Linux. It's included in Linux kernel 5.0 and later, and it is making its way into all of the different Linux distributions. The LightBits LightOS solution is, I believe, unique in the market in that it's requiring absolutely nothing except standard NVMe TCP drivers on the clients, which every client will have. And I mentioned Linux. Don't worry, people who like other operating systems and hypervisors, they're all going to have them. It's just a question of when.
Starting point is 00:28:14 With regards to the first question of wire speed, so I have to admit to using wire speed a bit loosely from a software perspective. As far as I'm concerned, if we connect a server to a 100 gigabit Ethernet network, not 10 gigabit, not 40 gigabit, 100 gigabit Ethernet network, not 10 gigabit, not 40 gigabit, a 100 gigabit Ethernet network, and we connect clients to it, and those clients, you know, are saturating the wire. They're sending as much as can be sent. And the storage server running LightOS and with our Lightfield adapter
Starting point is 00:28:41 gets all of those millions of IOPS and does not cause any slowdown to the clients. Remember, they're saturating the 100 gigabit ethernet network. It does not cause any slowdown, any increasing latency and compresses everything on the fly, which it does, then I believe it's fair to say that we're doing compression at wire speed. But this is not the raw wire speed because, of course, there is software and hardware involved on both sides. It's just that if we take the same clients, same storage server, same NVMe drives, and we do the same read and write operations at the same rates with and without compression, you will see no difference in latency
Starting point is 00:29:33 or in any other important metric. Okay? So this is really wire speed compression. Okay, that's an important clarification. The wire speed is whatever we define in software as wire speed. So if my true throughput on a hundred gigabit connection is, I'm going to throw a crazy number out there, 85 gigabits per second. You guys are not going to add any additional latency to that transaction. Exactly. That's what we mean by wire speed.
Starting point is 00:30:11 And since you asked about are we doing anything special in the operating system? So yes and no. LightOS is based on Linux in the sense that, you know, it looks like Linux. It behaves like Linux, from a deployment operational model, everything that you expect to see in a Linux system, you will find there. In fact, LightOS is installed as just a bunch of software packages that are installed on your favorite Linux distribution. From a data path perspective, it's a completely different data path.
Starting point is 00:30:48 Linux in, and you know, this actually ties back into a lot of the virtualization work that I've been doing over the last 15 to 20 years. Linux actually has some very nice facilities for basically getting out of the way. I mean, it's a great operating system, probably the greatest operating system the world has ever seen, but it's a general purpose operating system. It runs on everything, as you know, Linux. And once you try to do something,
Starting point is 00:31:16 as we're doing at LightBits, that is more special purpose, you know, building a system for storage serving, then you can do things that in Linux you probably wouldn't do because they would hurt some other use case. But if you don't care about that use case, all you are doing is storage serving, you absolutely can do them. And that's pretty much what LightOS does. It takes all the data path out of Linux with Linux's cooperation and does it in a much more streamlined, efficient, hardware-accelerated way.
Starting point is 00:31:47 And that applies to everything from the network to the drives and even across the boundaries of the server, all the way to the clients in some specific cases around, for example, network flow control and so on. Yeah, that's a very important clarification. When I look at the NFV space, that's the exact same thing that's happening. They're stripping out the general function, general capabilities that allow Linux to run across millions of devices to this specific use case. And that helps me kind of connect the dots on how you guys are able to guarantee or improve latency,
Starting point is 00:32:31 even in scenarios where you're not 100 percent sure what's happening in a network, because as these service providers are building huge networks with big collision domains and they're putting all these storage units inside of these dedicated networks, you have to start breaking it down. And when you start introducing routing, et cetera, into it, then these unknowns really do start to impact the end-to-end performance. Yeah. And we've based this on TCPAP, which is an amazing collection of protocols. I mean, they power the internet. Obviously, they work at scale, but are also incredibly complex.
Starting point is 00:33:13 And because we were very clear, adamant on the fact that we're not going to require anything on the client side, we had the interesting challenge that our team of PhDs rose to the occasion and handled of how do you provide consistent latency, reduced jitter while using standard TCPAP. And we have a number of, you know, patents and so on in this area. But that was a really nice problem to solve for our customers, basically. Because as you mentioned, these networks, they're huge. They have all sorts of different characteristics. And you have to work within the confines of existing standards for how you do transmission control and so on.
Starting point is 00:34:06 And we've done that in LightOS. Let me turn a little bit to the business side of things. You mentioned that Dell was a partner. Are you going to market through your partner community? And how are these things priced? I mean, you've got the software solution as well as the hardware card and that sort of thing. Sure.
Starting point is 00:34:28 Is it priced on a per terabyte basis or per server? Okay, I can answer that. So first of all, we are vendor agnostic. So if a customer comes to us and says, I want you to go work with Dell, HB, Supermicro, Quanta, or use my own server that I developed myself, then we can do that. This is a bring your own hardware kind of model. Now, having said that, if a customer comes to us and says, I want you to ship through a server OEM, we already have that relationship through Dell,
Starting point is 00:35:03 or we can sell through Dell to customers. And what that means is that Dell takes a today it's a Dell 740 XD and they take our software and our acceleration card, they integrate it all and then they ship it to the customer. So that model already exists. And we're open to creating that model with other server OEMs, depending on what the customer selects. Now, if the customer is buying software from us directly, then we have a node-based software license model. So depending on the number of nodes, the customer pays an annual subscription that includes software license and support and maintenance. So when you say nodes, are you talking about client nodes or
Starting point is 00:35:43 storage server nodes? Storage nodes. So one of the important, and I want to make sure to sort of emphasize what Mully was saying, we don't touch the clients. We don't have any proprietary software that's running on the clients. Our solution is entirely on the target side. And so that's one of the beauties of Lightbit's approach. You know, you look at some of our competitors in this space, and they go to customers and say,
Starting point is 00:36:09 you got to run my software on every single node. Not in the case of Lightbit's. So our software is entirely on the target side in the storage server. And we tell our customers, you can use vanilla TCP IP to connect your existing clients to our node. All you need on the client side is an NVMe over TCP drive. Yeah. So the question I was going to ask the question about, is this a cluster storage system?
Starting point is 00:36:34 I mean, can you have, you know, a dozen nodes with one or two SSDs each in them and be able to support, you know, a light OS storage cluster? Or is it... I'm just trying to figure out. So your websites seem to show client sort of operations, but I guess what they're really showing was multiple storage servers. So apologies for the website not being clear.
Starting point is 00:36:58 And we're going to hear a little bit into the roadmap and so on. So every forward-looking statement is subject to change, et cetera, et cetera. Having said that, LightOS 1.x, which is now in production in multiple data centers delivering web-scale traffic, each server stands on its own with the assumption that these are cloud-native applications using the disaggregated storage and the applications are taking care of the data replication and so on. So a customer may have multiple LightOS servers that are not aware of each LightOS 1.0, 1.1
Starting point is 00:37:41 servers that are not aware of each other because the application is taking care to write copies of the same data to each one of them. And they may even be in different data centers, by the way. LightOS 2.0, which is now in development and will be out later in 2019, is a clustered solution that builds on all the goodies in Lido S 1.X, everything that we've talked about, and adds the ability to cluster these servers together. And again, this is done using standard NVMe TCP, no client-side drivers needed.
Starting point is 00:38:18 Basically, your standard NVMe TCP client-side driver will know how to work with the right server in the cluster and how to move to a different server if that server that you originally worked with failed, how to move to a different replica. And, of course, slideOS behind the scenes will take care of all of the data replication and so on. So that's where the confusion came from. Okay, so back to the business side. So you mentioned that it was a price per node selection.
Starting point is 00:38:50 So I could have a node with 24 NVMe SSDs in it and it would cost me the same amount as a node with one SSD. I mean, not that you even support one SSD maybe, but... The node pricing will depend on the size of the node. So if you have a node which has, not that, not how many SSDs have been populated. So the question is, what is the maximum number of SSDs that your node can support? So if you have a storage server that let's say only, and we've had this, we've had a customer that has storage servers that can only support four SSDs it's a micro storage server that gets a certain price point and then
Starting point is 00:39:29 if it's a storage server like OCP does 32 SSDs or the Dell server I mentioned 740 XD has 24 SSDs that is a higher price point now if you buy a 24 SSD storage server and you only populate four or eight and you you know you get the same price that you would if you populate four or eight and you get the same price that you would if you populated all 24 because you have the ability to scale up without having to pay extra in the future. One of the things that we learned is that capacity-based is not interesting for our customers because they want to be able to have some certainty in the pricing. And if they change their SSDs to higher capacity SSDs or they decide
Starting point is 00:40:05 to scale up a particular node, they don't want to have to pay more for that at that point. So that's why it's a node-based model. And yes, the pricing changes depending on the number of SSDs that a node can support. So two related questions on the business side. One is a little technical and the other one is strictly on the support side. And the technical question leads into the support. From a client services perspective, when we think of clients and we throw the name client out there, typically we're talking about in units, big Oracle rack installations, etc. You guys
Starting point is 00:40:43 are tailoring to service providers. So client means something different in this world. It'd be great if you can kind of define what type of clients are consuming these targets. And then two, from a support perspective, when, you know, you think of a global service provider and the footprint needed and the, and many times they're application servers, maybe they're running NoSQL database, like Cassandra, MongoDB, or, you know, relational database, or they could be running analytic workloads, maybe for AI, maybe something else. So any server or client, when I say client, it's a machine that is issuing IO and accessing the storage server. I'm sorry, from a customer perspective, they're providing this as like a pass service or are these
Starting point is 00:42:02 infrastructure as a service solutions that they'll end up providing to their end customers? It could be software as a service providers. It could be infrastructure as a service providers. If it's infrastructure as a service, then many times the entire rack is essentially being rented either in a multi-tenant fashion or not, to some end customer where the end customer comes in and installs its applications on the client nodes and then uses the storage server for storage. But the infrastructure is sitting with the infrastructure as a service provider, and the customer determines what that application is, running on that same infrastructure with the IaaS vendor. To add to Cam's point, first of all, when we talk about clients and storage servers, the more precise storage terminology would be initiator and target.
Starting point is 00:43:03 So when we talk about clients, we're talking about initiators. Second thing is these clients, as Cam mentioned, they may be running applications, MongoDB, Cassandra, any type of basically application that needs high-performance storage, or they might be running virtual machines. You know, they may be servers with hypervisors running virtual machines, or they may be, for example, members of a Kubernetes cluster running containers with LightOS providing persistent storage to these Kubernetes containers, or they may even be running serverless functions
Starting point is 00:43:47 with LightOS, again, providing persistent storage to that entire system. So really, when we talk about clients, it's initiators, standard servers running whatever the client wants to run that needs high-performance storage. So yeah, back to Keith's question about the service levels for, you know, whether the customer buys through a partner versus direct from you guys. Right. That was your question, right, Keith? Yes. The eventual question.
Starting point is 00:44:16 Yeah. So when a customer, like for example, the customers that are buying through Dell today, for any support required for our software stack, they come directly to us. And the hardware is supported by Dell. So it would be no different than a customer buying a Dell server and then running a particular software stack that they purchase on top of it. For the software, they go to the software vendor to get support. And if there's a hardware issue, they go to Dell. We're following the same model.
Starting point is 00:44:45 Even if we're shipping through Dell, that's the model that direct support is provided for the software itself. The reason we wanna ship through Dell is there's a couple of reasons for that. First of all, we wanna make it very easy for customers that wanna use the acceleration card. So when we ship through Dell,
Starting point is 00:45:04 Dell performs the integration. They open up the box, they So when we ship through Dell, Dell performs the integration. They open up the box, they put in the card, and they ship to the customers. The customer doesn't have to do anything as far as integration of the card. And also, frankly, it's always nice to sell through large partners like Dell, where we can address the broader market. And they bring us leads, and we have collaboration on the go-to-market side. Hell of a reach. Absolutely. Absolutely.
Starting point is 00:45:29 But the support for LiteOS and the LiteBit stack is coming directly from LiteBits to the end customer. Okay. Well, listen, this has been great. Keith, any last questions from Moley and Calm? No, there's a lot to digest. I really appreciate the clarifications from the website. Yeah, yeah. Com and Moly, anything you'd like to say to our listening audience?
Starting point is 00:45:52 I want to thank you both, Ray and Keith, for the opportunity. You know, just to summarize, it's sort of a last point here. You hear a lot about NVMe over Fab fabrics, and there's many companies that have solutions in the market. We'd love to see a strong ecosystem around NVMe over TCP because we think it makes a lot of sense as a way to disaggregate and eventually really become the next generation of sort of SANS replacing iSCSI. Now, what we bring to the table is a software-defined approach, which is running on standard hardware. It includes not just NVMe over TCP, which is something we pioneered, but also a better way to manage the SSDs to give you better latencies, better endurance
Starting point is 00:46:41 of the SSDs, and also data services, and do that without having to touch the network or the clients. So that in, you know, in 15 seconds, that's kind of the way we differentiate. And if anybody's interested, they can go to our website, lightbitslabs.com to learn more. All right. Well, this has been great. Thank you very much, Kam and Moli, for being on our show today. Thank you very much. Next time, we will talk to another system storage technology person. Any questions you want us to ask, please let us know. And if you enjoy our podcast, tell your friends about it, and please review us on iTunes and Google Play as this will help get the word out. That's it for now. Bye, Keith. Bye, Ray.
Starting point is 00:47:19 Bye, Common Moley. Bye. Bye-bye. Thank you. Until next time. Thank you. Thanks. Bye-bye.
