Podcast Archive - StorageReview.com - Podcast #115: Is Computational Storage Just a Fancy Name for Storage?
Episode Date: January 31, 2023. Brian sat down with ScaleFlux’s JB Baker for an interesting discussion about computational storage.
Transcript
Hi everyone, Brian Beeler with Storage Review. Thanks for tuning into the
podcast. I've got JB Baker with us today from Scale Flux and we're going to talk
a little bit more about this category of flash SSDs called computational storage.
And JB is already gagging in the back of his mouth a little bit and I only bring
it up because that's how a lot of the world knows these drives, but
we're going to get into that, talk about the category a little bit and much, much more.
JB, thanks for joining us today.
Hey, thanks for having me.
I appreciate the opportunity here, Brian.
All right.
So let's just start with why do you hate the term computational storage?
Before you answer that question, what is computational storage?
And then let's get into your drive a
little more. Yeah. So, you know, ScaleFlux was founded back in 2014. And the mission at the time
was going after computational storage. And the whole concept of computational storage
is around improving the data center, the data infrastructure efficiency
by pushing some of the compute tasks down closer to the storage, closer to the data,
and doing work on data closer to where it lives. So that frees up those valuable CPU cycles, your x86-type processors or now the ARM processors,
freeing up those cycles to run your applications
as opposed to doing heavy management tasks
that can be very burdensome and add latency to your application,
cause more data movement,
and just generally slow things down and reduce your efficiency.
So broadly, computational storage was any, and tell me if this is too broad, but any
SSD that had some other hardware-enabled engine, maybe even hardware is too specific, in the
drive, though, to do
some other task to ideally offload cycles from your CPU. Yeah, and actually the way that the
industry came about talking about computational storage was not just down at the drive level,
but also through the Storage Networking Industry Association. ScaleFlux was one of the kind of initial partners or founders
in the computational storage working group in SNIA.
And as the group sat down four-ish years ago,
and in that formation, it was, well, there's computational storage drives.
There are computational storage processors.
There are computational storage arrays.
All of these are different ways in which you can move these functions away from the CPU and either put them at an array level, or put them in a processor that might sit between the CPU and the drives, or push functions all the way down to the drives. And that huge scope led to some of the confusion around how people understand
computational storage, and the variety of ways in which all of us vendors who were working in
it talked about it. And so, you know, you asked earlier, why do I hate the term
computational storage? I don't hate the term, but it is a term that people get confused about and can
feel like, oh, it sounds complex. It sounds like something futuristic, way out there.
Great benefits if I can slash how much data I have to move across my network by 90% or if I can boost my application performance 2x, 4x, or I can get more data into my drives, but complex.
And so that's where I have the little bit of shock around just the term itself. Well, the more you describe it, actually, the worse I think it gets, because if you're
going to include the entire ecosystem, which includes probably DPUs to some extent, depending
on the flavor, GPU-accelerated RAID storage, like the GRAID card, the Pliops solution, and
many others that are solving these problems.
I mean, we've seen computational storage companies start and fail already.
I mean, this has been some period of time.
But yeah, I think for the point of this conversation, we'll focus on the SSD itself and what you
guys are doing there and what that category looks like.
But for me as an industry analyst person, it's really hard to keep track.
And I'm very close to the industry in terms of what you guys are doing versus what competitors are doing
versus what some of the silicon guys are trying to do to get their accelerators added into these other spots.
And it's difficult because it creates a lot of confusion, I think. And I mean, you talked about
SNIA and the working groups are really good at a lot of things. But sometimes,
marketing communications is not always one of them, unfortunately.
And there's so many constituents in there. I'm sure even in the working group you're
part of, there's probably at least a couple dozen, well, maybe not a couple dozen, maybe a dozen
players. And each one of you has different core tenets to your products, right?
Yeah. I've actually kind of lost track of how many different companies are participants in
the computational storage TWG. I think it's definitely over 60.
Oh, okay, maybe, okay, so a couple dozen.
And some of them have, you know,
some of them don't have necessarily
computational storage solutions,
but they are travelers and care about what goes on there.
Great, so you've got 48 people involved
that don't have a product, but
have a selfish motivation one way or the other to influence the outcome. All right.
I don't want to get into the working groups cause we'll spend an hour on that and still just be
pissed off at the end. But your drives. Yeah. Yeah. I mean, so we launched our third generation of product in 2022. We call it the CSD 3000 and the NSD 3000.
The first two generations were based on FPGAs.
We were using Xilinx FPGAs and that allowed us to do that rapid prototyping and put out some computational storage functions and start
selling some drives.
And as we went through the first generation of product, we learned that, hey, it's got
to be simple to use.
You can't force somebody to do application integration in order to take advantage of
the function that you put in the drive.
The primary function in that one was a compression offload engine.
But it was kind of complex to use where you had to push your data down to the drive.
It would do the compression offload.
And then you send the data back up to the host.
And then if you wanted to write it to the drive, well, that was a separate function.
So it was kind of a, we combined two separate devices into a single form factor,
but complex to use. In the second generation, we retained the compression function and made it
transparent. So now there's no replacing of libraries in the host, there's no application integration, you plug the drive
in and it boots and you get that transparent compression.
But there was still a little bit of complexity in the Gen 2 in that we had a Scaleflux driver
instead of just the inbox NVMe driver.
And that's one of the big things going to the third gen is now we continue to
make it easier to use and as a direct substitute for another NVMe drive where we transition to
ScaleFlux Custom ASIC. Everything is on chip on the drive and we're using the NVMe drivers.
So now there's not even any software to install. You plug in our drive,
uses the inbox NVMe driver, the compression just happens automatically. And it's all transparent
and easy to use. So that was our learning as we worked with customers: simplicity, simplicity,
simplicity, because those overburdened people in IT ops and hardware infrastructure,
they just don't have the time to deal with that complexity and things.
And then on the application side, they don't want to do application customizations.
It's never worked.
For some unique drive? Yeah, it's never
worked. I mean, that goes back a decade to when Fusion-io was disrupting the enterprise flash
space and they had these IO drives that were fantastic and they were really great. And they
always said, if we could just get the database guys to tune their code for flash instead of hard
drives, think about all the, this is a story that's as old as time, right?
There'll be some certain niche applications.
There'll be some in-memory databases that'll tune for something new like PMEM whenever that rolls around or CXL or whatever the new high-speed hotness is, right?
But for what you guys want to do for enterprise mass adoption, it's got to be easy.
It's got to just work.
And it's got to be compatible with enterprise stuff like VMware, which I know you guys have as well.
So the drive for anyone watching on YouTube, it's on my desk here.
It's an SSD, right?
And from the outside, besides being maybe a little heavier than other drives that we have in the lab,
I mean, it just looks like an NVMe SSD.
And I guess that's to your point, how you want to be characterized.
And to a certain extent, why the computational storage halo still technically applies,
but maybe not, you know, you don't want to be seen as that first.
You want to be seen as an SSD first, which makes sense.
That can also handle some of the compression and data reduction in the drive while saving the cycles on the CPU side.
And that's always been a challenge, whether it's a storage array or software-defined anything. You might even remember Permabit was running around with a Fibre Channel appliance that would do
compression, because it's expensive, not just cost-wise but in CPU cycles. Even still today, I
mean, we've seen solutions that'll have a 20-30 percent hit when you enable that. So if you don't have hardware acceleration
native to the array or the server or whatever,
in whatever form,
you're going to eat a substantial performance hit.
And when you spend the big money on those CPUs
with the big cores and clocks and everything,
that's just an unfortunate way to allocate your resources.
Yeah, I mean, there's a couple aspects of that.
The compression algorithm, it's a fixed function algorithm,
but it's something that,
and each one you use has a different burden on the CPU,
but I'll just kind of bookend it,
with LZ4, a lightweight compressor.
So you're sacrificing how much you compress the data in order to get more throughput out of the software through the CPUs.
But even that, we look at and we've seen that it can take several high-end Intel Xeon cores to achieve the throughput to fill one NVMe SSD. So now let's say, you know, if you
have a 48 core system, and you got one drive in there, okay, maybe you can dedicate the four cores
to compressing data at line rate to be able to fill that drive. But now you put a second drive in
and a third drive and a fourth drive.
You know, pretty quickly you run out of cores. It doesn't scale very well. And if you wanted to
actually maximize how much you compress the data, then you'd be using like a GZIP. And well, that
takes hundreds of cores versus doing that compression. And now you're also talking tens of or hundreds of watts at that point to do that compression
versus putting it into a hardware engine that can deliver six gigabytes per second of compression
throughput per drive at a cost of, you know, around a watt.
So, you know, do you want to spend hundreds of watts and tens of cores and hundreds of
thousands of dollars on cores to do compression? Probably not.
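To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch in Python. Every figure in it (per-core compression throughput, per-core wattage, the rate needed to fill a drive) is an illustrative assumption, not a measured or vendor-published number.

```python
# Back-of-the-envelope core-count math for software compression vs. an
# in-drive engine. Every figure here is an illustrative assumption.

LZ4_GBPS_PER_CORE = 1.5      # assumed software LZ4 throughput per CPU core
GZIP_GBPS_PER_CORE = 0.1     # assumed software gzip throughput per CPU core
DRIVE_WRITE_GBPS = 6.0       # write rate needed to keep one NVMe SSD busy
WATTS_PER_CORE = 5.0         # assumed active power per server core
DRIVE_ENGINE_WATTS = 1.0     # in-drive compression engine, roughly a watt

def cores_needed(gbps_per_core: float, drives: int) -> float:
    """CPU cores required to compress at line rate for a given drive count."""
    return drives * DRIVE_WRITE_GBPS / gbps_per_core

for drives in (1, 4, 8):
    lz4 = cores_needed(LZ4_GBPS_PER_CORE, drives)
    gz = cores_needed(GZIP_GBPS_PER_CORE, drives)
    print(f"{drives} drive(s): ~{lz4:.0f} cores / ~{lz4 * WATTS_PER_CORE:.0f} W with LZ4, "
          f"~{gz:.0f} cores with gzip, vs ~{drives * DRIVE_ENGINE_WATTS:.0f} W in-drive")
```

The point is only the scaling: the software compression cost grows with every drive you add, while the in-drive engine stays at roughly a watt per drive.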
No, you'd rather do it somewhere else if you can, right? I mean, that's your big value prop. And so when you think about
compression, I guess the way it translates out the other side is this is a, I don't know,
I think this one's a 7.68, an 8-terabyte-class drive.
On average, if you look at what the big array guys quote for enterprise workloads like a Pure or Dell,
I think they're like a 4-to-1 guarantee normally on their data reduction rates.
So maybe that's about what you see in the enterprise.
But I'm curious, what do you see?
And when you're going to market, do you talk to your customers like, hey, this is an 8-terabyte drive,
but based on your workload or workloads similar to yours, we think it's more like a 26-terabyte drive or a 38-terabyte drive.
Is that how you communicate your value?
Yeah, there's certainly two critical aspects of the value.
One is that extended capacity. Very
soon we'll be supporting up to 24 terabytes of logical capacity, so you can take that
eight terabyte up to 3x its capacity. We'll support up to 4x capacity on the four
terabyte, so, you know, being able to store 16 terabytes of data on there. And, you know, you may have noticed
me kind of smirk when I hear the compression guarantees or the data reduction guarantees.
All of those guarantees, as I've read the asterisks and the footnotes on them,
they all say it depends on your workload, and they assume that you're not pre-encrypting the data,
that you're not pre-compressing the data. So there's a lot of caveats around those.
There are, yeah.
And so, you know, the compression that you achieve is definitely going to be
impacted by the data that you run and what workload you're running. Now, as we've gone out and done testing
with customers and they report back what compression ratios they're achieving, we are
typically seeing that the compression ratio is greater than two to one for any of the database
applications. We do a lot of work with Aerospike, MySQL, RocksDB, Postgres. We've done work with customers on Kafka
and other applications, but typically they're seeing,
the reports that we typically hear are 2.5 to one, up to five to one, with some cases like Redis on Flash,
it's been nine or 10 to one.
But if you send me encrypted data,
well, I'm not going to compress.
Or if you send me data that's been pre-compressed with LZ4, I'm only going to see about an extra 20% compression.
But that has a massive improvement on the latency consistency and the mixed workload performance that you would see versus using an ordinary drive.
Yes, I want to go there next. So I think you start out with the,
the big pitch around the data compression or data reduction and the,
the capacity benefit, cause that's the easiest one,
I think to kind of get people to wrap their heads around. But yeah,
you just started to go there. There's a performance benefit too.
So talk about that a little bit in terms of
Gen 4 headroom and what you guys can do there. Yeah, so you know, and you guys have done testing with the drives and you see that when the data is compressible, which is, as you said, that's kind
of the typical enterprise case that we run into, it allows the drive to have significantly better write
performance up front.
And that, there's a virtuous cycle that happens in the drive of those hot writes require less
data to actually be written to the NAND, and then when we do that, now there's more free
space on the NAND.
So as the drive starts to fill up or as an ordinary drive would start to fill up, our drive still stays relatively empty.
And you've got higher effective over-provisioning on the back end that reduces the amount of cold writes you have to do.
And that helps with achieving your reads and consistent read latency when you've got
a mixed read/write workload. I know I tend to auger down into the details.
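To make that virtuous cycle concrete, here is a minimal sketch assuming an 8 TB drive, roughly 7% baseline over-provisioning, and a 2:1 compression ratio; all three figures are illustrative assumptions, not numbers from this discussion.

```python
# Minimal sketch of the effective over-provisioning effect with
# compressible data. Capacities, baseline OP, and the 2:1 ratio are
# assumptions for illustration only.

PHYSICAL_TB = 8.0        # usable capacity of the drive (illustrative)
FACTORY_OP = 0.07        # assumed ~7% baseline over-provisioning
COMPRESSION_RATIO = 2.0  # assumed ratio achieved by the workload

def effective_op(logical_written_tb: float) -> float:
    """Spare NAND relative to valid data once compression is applied."""
    nand_used = logical_written_tb / COMPRESSION_RATIO
    spare = PHYSICAL_TB * (1 + FACTORY_OP) - nand_used
    return spare / nand_used

# A logically "full" 8 TB drive at 2:1 still has over half its NAND free:
print(f"effective over-provisioning: {effective_op(8.0):.0%}")
```

More spare NAND per unit of valid data means less garbage-collection traffic competing with host I/O, which is where the steadier read latency under mixed workloads comes from.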
No, that's good. And you know, we, we saw that in the single drive testing,
so it's, it's clearly there.
How does that scale then and then maybe talk a little bit about how your
customers have scaled. So
I'm sure that in many of your POCs, it's like, here, take a couple of drives,
do some stuff with them so you can kind of see this and feel this. And that's probably the
starting point for a relationship with the customer. But where do you see them going? Are they going into software defined storage solutions?
Are they going just as addressable drives
in VMware or something else?
Like what kind of use cases are you seeing?
Yeah, it really is a broad array.
I'd say that the leading use case
is to put us in a compute node.
So if you've got a database compute node,
that's been the primary use case, and they'll have anywhere from
one to four drives in there, as a general rule.
We do have customers that are putting it into
the shared storage that is being addressed by multiple compute nodes.
There, now you've got 8, 16, or more drives
in that array. So it really does kind of go
everywhere. And there's nothing stopping you,
I suppose, technically from being qualified and included in a
typical two-node storage SAN, right? There's nothing
fancy there, or is that more challenging?
Well, the only gap for a storage array at this point is the dual port. So I don't offer
dual port today. That is in the works for the second half of this year.
Although even the array guys, honestly, a lot of them are disassociating their software from the array so that they can go
cloud-defined, server-defined, you know, and go that route, because of the economics there.
You know, if you're a Dell or HPE or somebody like that, or Cisco, that has a vast server platform,
to continue to engineer the hardware side,
I'm not sure that's going to continue a whole lot longer.
Yeah, I mean, we're definitely more focused there
in the server side, as opposed to the storage,
the high end storage array, that kind of falls into that,
going back to the SNIA definitions in a way
that those are already computational
storage arrays. They have dedicated hardware and software solutions to do not just compression
but data dedupe and compaction and snapshotting and all this other functionality, and there
would be some refactoring of software to leverage in-drive compression for some of those arrays, for the traditional ones.
New players, where they're trying to use DPUs as the primary processor for an array, well, yeah, now the compression in-drive becomes pretty important.
Because they don't have these other dedicated
hardware pieces to do that, those functions. Well, right. The big one, Fungible, who we know
pretty well and they recently were acquired, they could do some pretty creative things. And some of
the other guys that are DPU based, Vast is doing that. There are
others that are looking at some of these technologies that can layer,
I suppose, and make for an interesting
conversation there. Yeah, and it always comes back to
well, how do you want to allocate
your compute cycles and where do you want to put them
to make your system the most efficient, the most performant overall? So how is going to the ASIC?
I mean, I understand that's the progression, right? As everyone starts with an FPGA and the
holy grail is to hopefully have enough money to make it to an ASIC because at that point,
you've got exactly what you want but still
tunable, right, if you need to a certain extent. So what has that done for you guys in terms of
the growth of the company or applications you can support or what are the other impacts?
Sure. I mean, I think the biggest thing with going to an ASIC is that it allows the product to have a much richer feature set.
With an FPGA, we were still in a U.2 form factor. So that restricted us to
a relatively smaller FPGA to stay within the power envelope and the space envelope of a U.2.
And so that, in the prior generations, that prevented us from doing things like adding
encryption into the drive.
And then as you try to ramp up to PCIe Gen 4, Gen 5 speeds, there's just not enough gates in an FPGA that will fit within the power envelope of U.2,
or as we move into the E1.S and enabling that, you just can't get an FPGA that's small enough
physically with enough gates to do something interesting.
Well, that's interesting. I was going to let you off the hook on form factors, but since you brought it up, Gen 5 obviously with Intel's official launch this week, although
Sapphire has been out there for many weeks, and Genoa not so long ago, that opens up some
interesting things for Gen 5, but now we're seeing all the servers, well, we're seeing two things.
The hyperscale servers are going E1.S primarily, although they may still use some E3 as well.
All the big enterprise guys are going E3.S, though, on the server side. And most of those designs are
7 mil. I mean, that's less than half of the thickness of the current U.2 or U.3 drive.
Can you fit into a 7 mil form factor? And how does that, I mean, is that a harder challenge
for an ASIC versus a standard SSD controller? I'm curious about that.
Well, so, you know, just to be clear, our ASIC is similar in size overall to a standard SSD
controller. Okay. And as we move into our Gen 5, then we'll do a process
shrink, and we will shrink, you know, the package as well. Our current
chip, the 3000 series chip, we actually can fit into the E1.S form
factor. We have not delivered that to the market yet.
We focused on, you know, as a startup,
I got to try to keep my number of SKUs limited until I've got significant
pull.
So we focused on the highest volume SKU or form factor right now,
the U.2.
Well, yeah, I mean, slot-wise, even OCP will concede
that despite all the EDSFF excitement, U.2,
at least through this year,
and I would guess probably kind of deep into next year,
is still the predominant form factor.
But E1.S really opens up a lot of hyperscaler conversations, and we don't really need to go there today, but I guess the cloud guys can also benefit from this, maybe even more so with full control over their own stack, should they choose to embrace a product like this, right?
Right, right. Yeah, there is a lot of potential there in portions of their environment.
Yeah. In other portions of their environment,
data comes encrypted and compressed before they see it,
and you can't do a lot there.
So the big question, I think, then,
is if we're thinking about this as an SSD,
as a 7.68 terabyte or a 4 terabyte class or whatever it is,
I see the benefit to the capacity argument, so I'm getting more,
theoretically, capacity. Good performance profile. You don't have to talk hard numbers,
but generally speaking, if I were to compare this cost-wise to other 7.68s in the industry,
I assume you're a little more. I have no idea, so I'm just a little curious as to what the pricing looks
like.
Yeah. I mean, it's, I don't want to talk any absolute pricing.
No, no, I understood.
If you buy more, it'll cost less.
Right.
But yeah, I mean, we're aiming the
performance of the drive,
even when you're not using the extended capacity, you know,
we're seeing that we're at the upper end.
We're aiming at that performance segment, not the lower end, I don't know, what do you
want to call it, entry or low end data center segment.
We've aimed at this as a premium performance drive.
So we are aiming to be price competitive with the other drives up in that swim lane, as we call it. And then
when you want to use the extended capacity, then you're going to buy the CSD. If you're not going
to use extended capacity, we have the NSD SKU, which doesn't offer the capacity expansion.
But the CSD now, yeah, I'm going to charge a 25, 30% premium over those other drives because the
reason you're going to buy it is because you're going to take it and use it as twice its capacity.
So on a raw capacity basis, sure, it's a little more expensive, but it actually is costing you
30, 40% less than buying, say, a 16 terabyte drive versus our 8 terabyte drive.
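The rough math behind that claim, with a placeholder price per terabyte and an assumed 30% premium; neither number is a real quote.

```python
# Rough version of the cost argument: an 8 TB drive carrying a premium but
# run at 2x effective capacity vs. a plain 16 TB drive. Prices are
# placeholders, not quotes.

PRICE_PER_TB = 100.0   # hypothetical $/TB for a standard NVMe data center SSD
PREMIUM = 0.30         # assumed premium on the compressing CSD

std_16tb = 16 * PRICE_PER_TB
csd_8tb_at_2x = 8 * PRICE_PER_TB * (1 + PREMIUM)

savings = 1 - csd_8tb_at_2x / std_16tb
print(f"same 16 TB effective capacity for ~{savings:.0%} less")
```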
Or arguably more, because you're just talking about the hard cost of the drive,
but now you've got fewer slots, conceivably, less power in aggregate,
and all of these things, fewer rack U.
I mean, this is a big push in the U.S. for sure, but in Europe,
I mean, where energy expenses are going through the roof, if you can
take your footprint, even a small data center, down a couple of rack U and down many watts, it's an
enormous savings. Yeah. And then, I mean, we're doing a lot of testing and work in different
system level environments to truly highlight those benefits
in terms of not just raw capacity efficiency,
but that system level efficiency,
reducing the overall power that it takes you
to achieve a certain workload
and deliver a certain amount of work
back to your application owners.
Yeah.
So let's
think about this a little bit more too, from getting started with your drive.
If you sample one out to a prospect or a customer,
they want to play around with it.
How do you help them visualize or come up with a test plan
to see, like, I don't know,
just drag a folder of files over that was,
there was a hundred gig or two hundred terabytes or whatever it is, and now, you
know, I can look at the drive utilization. How do you get them to see
that, to understand the benefits there? Yeah, I mean, it varies a little bit,
but typically users will start off with something simple like FIO or as you guys do with VDBench.
And there are settings and we have scripts that we can provide out for FIO for people to see different levels of compression and what that does to the performance profile. And then it's just a query into the SMART attributes in the drive that says, what was the host terabytes written?
And what's the NAND terabytes written to show you how much capacity was saved?
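A hedged sketch of that check using nvme-cli: the host-written figure comes from the standard NVMe SMART log's data_units_written field, while the NAND-written counter is vendor-specific, so the value below is a placeholder you would pull from the vendor's own log page or tooling.

```python
# Sketch of the SMART comparison described above, using nvme-cli's JSON
# output. "data_units_written" is a standard NVMe SMART field (units of
# 512,000 bytes); the NAND-bytes-written counter is vendor-specific, so
# the value below is a placeholder.

import json
import subprocess

def host_tb_written(dev: str) -> float:
    out = subprocess.run(["nvme", "smart-log", dev, "-o", "json"],
                         capture_output=True, text=True, check=True)
    units = json.loads(out.stdout)["data_units_written"]
    return units * 512_000 / 1e12

host_tb = host_tb_written("/dev/nvme0n1")
nand_tb = 7.5  # placeholder: vendor-specific "NAND terabytes written" counter
print(f"host: {host_tb:.2f} TB, NAND: {nand_tb:.2f} TB, "
      f"effective compression ~{host_tb / nand_tb:.2f}:1")
```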
Yeah, you probably need a little widget for vROps or something for the VMware people to have just that little pie chart that says, you know,
you've done this much and this much has actually
been used on the drive.
That's a good idea. We'll do that. What we have done, by the way,
is integrations,
plugins for Nagios and Prometheus,
which are two pretty common server management tools.
So that what they can see on the pane there
is the remaining physical space
instead of just the remaining logical space.
Because every other drive out there,
logical equals physical.
So we've done that integration to enable people
to make it easier for them to manage that extra capacity.
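The reason that matters: with compression the logical and physical gauges drift apart, and physical space is the one that can actually run out. A tiny illustration with made-up numbers (none of them come from Nagios, Prometheus, or the drive):

```python
# Illustrative only: remaining logical vs. remaining physical space on a
# drive exposing 2x logical capacity. All numbers are made up.

LOGICAL_TB = 16.0    # advertised (extended) capacity
PHYSICAL_TB = 8.0    # raw capacity behind it
logical_used_tb = 10.0      # what the filesystem thinks it has written
compression_ratio = 1.5     # what the workload actually achieved
physical_used_tb = logical_used_tb / compression_ratio

print(f"remaining logical:  {LOGICAL_TB - logical_used_tb:.1f} TB")
print(f"remaining physical: {PHYSICAL_TB - physical_used_tb:.1f} TB  <- the one to watch")
```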
Well, that's pretty cool.
I mean, we've been working with you guys for a number of years
through the generations of product, and this current one is pretty fun.
So what is the process if someone listening to us today or watching the pod,
if they want to check out the company ScaleFlux,
learn more about the drives, what do they do?
You can just, we have www.scaleflux.com.
And right from there, you can hit the contact us button
and request a POC.
Or if you want to just go directly, just type in,
send an email to info at scaleflux.com.
And we're monitoring all of that actively
and we will get a response back to you quickly.
Yeah, and I would encourage anyone
in the categories that we've discussed today
that can take advantage of something like this
to check it out because these guys really are
one of the most credible players in
computational storage even though we've discounted that term a little bit. So
many of the other products out there either the companies have gone under or
they're so niche that it's really hard to understand how to use them in a
standard environment. For this to be able to just drop into a virtualized
environment, VMware off and running, great.
If it's a software-defined or a server-based storage situation,
great there too.
So it's certainly worth checking out and understanding
how this kind of technology can impact those workloads.
Very well said. I appreciate it.
Good. Thanks for doing this, JB.
Look forward to seeing you again soon, buddy.
All right, thanks, Brian.