Grey Beards on Systems - 72: GreyBeards talk Computational Storage with Scott Shadley, VP Marketing NGD Systems

Episode Date: September 25, 2018

For this episode the GreyBeards talked with another old friend, Scott Shadley, VP Marketing, NGD Systems. As we discussed on our FMS18 wrap-up show with Jim Handy, computational storage had sort of a coming-out party at the show. NGD Systems started in 2013 and has been working towards a solution that goes general …

Transcript
Starting point is 00:00:00 Hey everybody, Ray Lucchesi here with Howard Marks here. Welcome to the next episode of the Greybeards on Storage podcast, a show where you get Greybeard storage bloggers to talk with storage and system vendors to discuss upcoming products, technologies and trends affecting the data center today. This Greybeards on Storage episode was recorded on September 18, 2018. We have with us here today Scott Shadley, VP Marketing of NGD Systems. So Scott, why don't you tell us a little bit about yourself, NGD Systems, and what is this thing called computational storage, which we heard about so much at Flash Memory Summit?
Starting point is 00:00:43 My pleasure. Thanks, Ray, for the introduction, and Howard, a pleasure to be with you on the line as well. I have to admit, I'm a little disappointed I won't ever be able to grow the beard quite the same way. So being on Greybeards on Storage is a great opportunity for me. Just give it time, Scott. It's just a question of age, I think. Like the V-beard guys, we accept those that are knit and glued on as well. Okay, there we go. I can win by that method. So my name is Scott Shadley.
Starting point is 00:01:07 I'm a VP of Marketing here at NGD Systems. NGD Systems is a startup that started in 2013. We came from a long history of working together at companies like STC and WD and some other places like myself from Micron. The idea behind what NGD is up to today is to build what's called the computational storage SSD. And that computational storage product means that we're doing something a little different with an NVMe traditional SSD product. And we're adding compute resources to that
Starting point is 00:01:37 via some very unique IP that was invented and designed by the team here at NGD Systems. So our base platform is an off-the-shelf NVMe SSD. We provide ultra-capacity, low power, and the ability to offload compute into each of the individual storage devices at scale and in parallel. Oh, my God. So, you know, we used to do this back in the 90s with sort and some backup stuff where we'd put compute kinds of capabilities out in this big, massive
Starting point is 00:02:06 storage iron system where we'd actually speed up backups and our sorts. And we actually had one or two vendors use it. But it was always a challenge to get the, I'll say it, the application guys to use the sorts of services that those were. Are you guys getting some application buy-in and these sorts of things? We are. And to your point, it does create some unique challenges. So that when you go to look at doing something along the lines of computational storage, the first thing you've got to do is follow the golden rule in storage of
Starting point is 00:02:33 keep it simple, stupid, or the KISS principle. So we designed it in such a way that it allows the user to have ultimate flexibility in what they port to it. And it's the applications are not locked to a specific implementation or a specific product that you can use on it. We do get better results based on which type of product you put in it. So for example, if you're going to do a write intensive application, I can't speed flash up any faster than anybody else can, except for the vendors when they new tech nodes, but from a read centric or analytic type workload. So you talk big data analytics, data warehouse analysis, anything around databases. And then we've also found from a customer base, a significant number of AI and machine
Starting point is 00:03:15 learning style workloads that they've brought to us that work well for this, because we're basically creating a mining capability within the storage device itself. So everything prior to now, including the products that you're talking about, required some kind of storage device talking to a compute resource, and there's still a bus between them. And the bus between them is what's really started to come up as a problem for a lot of people. And that's what we took a different approach at and said, well, in-memory is great.
Starting point is 00:03:43 You need it. GPU acceleration is great. You need it. There are absolutely workloads where I've stored a ton of data, terabytes of data. You can think of autonomous cars. You can think of sensors. You can think of an airplane. And I want to analyze that data and I don't want to have to move it to analyze it. And as we look at now the edge IoT space with all those sensors and things, we're collecting all this data at edge points and moving that over the bandwidth, whether it be direct connect, 5G, whatever,
Starting point is 00:04:10 there's a significant time lag and an amount of bandwidth you're consuming simply to move a zero and a one from point A to point B. Yeah, and sometimes a thousand little processors is just better than one big processor. Precisely, yeah. So we didn't go into this thinking we're going to compete with an Intel or an AMD or a GPU. We're not trying to put enough horsepower per drive that it can run a server.
Starting point is 00:04:32 But we are putting enough horsepower per drive that you can manage the data on that individual drive at whatever capacity point it's at and ensure successful analytics come back to the host processor. But it's not just horsepower. I mean, you've got to have memory. You've got to have networking to some extent. Obviously, you have networking because you're a storage device, I guess. And you have to have storage. Yeah, no problem. So there's a whole bunch of patents that exist for NGD systems.
Starting point is 00:05:04 We're sitting at 20 issued and 10 pending. Oh, I'm impressed. And 15 of those are actually on SSD side of the house. And the other five plus 10 pending are on the in situ or computational storage side. So because the team here has done enough SSD controllers across the plethora of people that work here, we've done 11 different SSD controllers amongst the plethora of people that work here. We've done 11 different SSD controllers amongst the expertise of the team. We focused on grounds up. So the ASIC itself runs as a standard NBME SSD. So we're choosing NBME as the protocol because that's where everybody's
Starting point is 00:05:36 headed. And then we built an FTL that allows us to break the one-to-one barrier that people have faced over time. So there's a history in the SSD space, one megabyte of DRAM to one terabyte of flash. And that limits your capability in a footprint within an even form factor to get density. So what we do is we actually have a one-to-four ratio, one gigabyte of DRAM to four terabytes of flash. So by doing that, we allow it to go to higher densities per form factor. So we issued a press release, for example, back in May about the 16 terabyte, two and a half inch NVMe SSD. Wait, wait, wait. So you're doing in-situ processing with less memory? Correct. So we actually use memory that's on the drive.
Starting point is 00:06:21 We use the quote unquote spare memory now that we don't need for the FTL as the memory for the CPUs that are inside the SSD. So you're actually sharing the computational resources and the memory resources between the server and the computational storage. Is that how I read that? So inside the drive, it's a holistic, take a two and a half inch drive, for example, an 8039 U.2. We have two parallel processing units with inside the individual single ASIC. One is your traditional SSD management, FTL, wear leveling, garbage collection, firmware. All that good stuff. Yeah.
Starting point is 00:06:58 The second data path inside that same ASIC is the four application processors. And then there's DRAM shared inside the drive between the two. We don't take any host DRAM over. We don't use host OS. We don't require kernel changes. We don't do anything at the host layer that would cause this drive to be problematic to use, if you will. It is a drop-in NVMe drive that you can turn on these really cool capabilities of adding application processors. How do you run applications without changing the OS at the host server level and stuff like that? My jaw is dropping if you can hear it.
Starting point is 00:07:32 I don't understand how this works. So we have an SDK that we've built that allows a user to manage the drives at the host level, but it's an application. It's not a kernel change. So every drive appears to the host as an NVMe target. And we use that same NVMe protocol through our SDK and we tunnel to the drives over a TCP IP connection. So we actually don't require any unique connection point and we
Starting point is 00:07:58 can share that NVMe bus with the standard read-write path. So we're not impeding the reads or writes of just the data while we can compute on that same data. Okay, so you've got some set of compute cores in the SSD. Are those like standard ARM cores running Linux that I can code to? Or is it more complicated than that? No, back to our keep it simple, stupid principle of things. We have four A53 application coprocessors,
Starting point is 00:08:24 and we run an instance of Ubuntu live inside every drive. And so it's literally a microservices in each product. Oh, okay. So you've got a full Linux solution. So I've got a little Lambda-like serverless environment within the SSD. Exactly. And when I write data, the application that I've uploaded into those cores examines it and sends a message back that says that Joey, no fingers just entered the casino and we should stop him. Basically. Yeah. And the theory behind it is where what classified is my,
Starting point is 00:09:00 my CTO loves to classify as near real time. So everything that we do when it comes to running the application processors on the data, it's done once the data is in the flash media or what other persistent memory may be coming available because we have that flexibility built into the controller. So the data is written to the drive, then you can do the analysis on it and send the response back,
Starting point is 00:09:21 which is historically what you do for any stored data. And that's why we're not competing with or trying to replace the concepts of in-memory, because there are a lot of applications, there are a lot of needs in the market where in-memory and persistent memory buses or GPUs are still needed. But for the 85% of the rest of the workloads that are managing stored data, we can be applicable. So just about any storage-oriented data activity can be migrated out to the NGD drive? Yep, exactly.
Starting point is 00:09:51 It's a platform because it's running the Linux and because those application processors are able to be made available to the end user. You can port an application as is and you'll see some advantage just simply because you're not creating an IO traffic. You can then also recode the application, which we're doing with a couple of strategic partners, where they're actually rewriting their code to take advantage of the direct connection.
Starting point is 00:10:13 And we're seeing even more incremental improvements in the application performance. So if they didn't rewrite, then they would just do normal Unix level I-O, and it would be mapped within this NGD drive to the direct connection. Is that how it would work? Yep. And it's scaling across multiple drives per platform. Right. So I've got some incoming data stream that is writing data spread across multiple drives.
Starting point is 00:10:40 And then each one of them does some analysis and tells me what happened in that data. Correct. So if you are taking a very large data set across 10, 12, 24 drives, each one of the drives will run concurrently the exact same application and process on the data set on that given drive and then give the responses back to the host as a single set of data. So you're still running, quote unquote, independent analysis per drive, but because you've got four cores in every drive and they're running in parallel, you'd still get a significant improvement in response. Yeah, this is extremely interesting. So, and during this processing, you could still be doing normal storage IO kinds of stuff, reading and writing to the host if
Starting point is 00:11:20 necessary, right? Yeah. In fact, you have to be able to continue to do standard data in and out to make it work effectively. So the usual question comes up then is what about contention? And the response is, well, we only operate on stored data. So if you've stored information and we've run the process in the app on that data, you'll get the response from the most recently stored data. If you're changing the data after the analysis is done, kind of like inline memory update, right, you're going to have the same exact contention you'd normally have. The difference is you haven't pulled the data all the way to memory to see that you've made a mistake.
Starting point is 00:11:53 So have you got some examples of what this analytical application is? Because, I mean, facial recognition is the first one that comes to mind, but there's reasons I want to use a GPU for that. Exactly. So we've done, kind of as I mean, facial recognition is the first one that comes to mind, but there's reasons I want to use a GPU for that. Exactly. So we've done, kind of as I mentioned, we've done a bunch of generic application work, but we've seen because of talking to customers that these AI and machine learning type of things come in very handy. So if you look at a neural network, you can do either a weightless neural network or even go as far as a convolutional neural network. We can actually do the training and inference inside the drive as the data is being written. So as you're writing a file, instead of having to pull everything back into memory and keep an
Starting point is 00:12:33 index in the memory footprint on the CPU side of the system, we keep the index local in the memory inside each drive so we can update the index as the files are being written and turn right around and inference on them instantaneously without any IO bandwidth between the host and the user, other than simply the files being written and then the response to the inference is being written back out. There's no extra IO exchange. And there are actually cases, and we're very happy to admit that if you do a single drive for certain types of networking applications like that, and the database is small enough where you can actually fit it into a memory footprint that's on the server, it may be faster just to run it in host application-based use model. But if you're striping data across more than one drive, or your database is larger than a couple of terabytes in
Starting point is 00:13:18 footprint, we can see it scale linearly as you add drives and as the data set grows. So there's no change in the response time from the drives running the application versus the host has a longer and longer delay as you're constantly flushing and reloading the memory footprint with the size of the database to do the analysis. And so you've run like standard databases, Redis, or some of the other databases on the machine, and you've seen some speed ups and stuff? We have. We don't see as much significance on some of those database platforms unless, again, it gets to a certain database size. The smaller the database, the less value I can add because I'm not solving some of your IO traffic bandwidth because you can move it quickly enough into the host memory. So it has to be of some substance.
Starting point is 00:14:08 Yeah. Everything's easy when you fit completely in host memory. Exactly. So if you've got 128 gigabytes of host memory and you've only got 120 gigabytes of a file, or even you can change that to a T, 128 terabytes of DRAM, until you bridge that delta where I've got more storage than memory, you won't see a significant improvement. But as soon as you go over that gap, we can show huge amounts of benefits. Or even if we don't show a 10, 20, 30x improvement, we've still freed the host to go do other tasks while we execute on its behalf. So there's a power efficiency play as well. So how many terabytes of storage are on these devices, Scott? So today we do a two and a half inch drive at 16 terabytes with the current man available. And we have an M.2 at eight
Starting point is 00:14:54 terabytes and then the fancy new EDSFF stuff, the one you short, the little one, we can do 16 terabytes today. That M.2 sounds interesting. Yeah, 8 terabytes on an M.2 box? Yep. Device? Well, imagine a box you designed for that for parallelism with 64 of those. Or more. I don't know.
Starting point is 00:15:16 You're starting to talk HPC kinds of stuff here. Are you in the HPC market? We've been talking to a few of the HPC guys. They're a little slower to jump on the fun bandwagon that we're creating. We've definitely got a lot more focus on the easier to go after hyperscale guys, but the HPC guys have showed interest. We've actually got several of our three-letter acronym guys back in DC that have been approaching us and talking to us about opportunities there. Yes. Well, my neighbors up the mountain in New Mexico will come around pretty soon. Yeah, sooner or later. They'll figure it out. It's that atomic mentality, I think,
Starting point is 00:15:50 or something like that. For a second, you were going to say the much more accessible enterprise market. Yeah. Is that an oxymoron? Yeah, pretty much. I guess. So another good example for you
Starting point is 00:16:00 since we're talking about it is, so look at genome sequencing and biotechnology. So there's an application out there that as you scale host CPUs, you get an incremental performance in protein sequencing analysis that these platforms can do. And one that we've looked at is called BLAST, and it's a government focused one. So if you've got 32 cores running your protein sequencing on a database of 150 terabytes, you get basically from one core to 60 cores, you can improve about 60x. So it's a 1x per core you improve. But if you look at it from a storage perspective, I've given you, you've got a 60X improvement by the host cores, but I've turned around and given you a 100% improvement on that performance just simply
Starting point is 00:16:50 by activating the cores that you already have in the storage you require. So you're getting 100% better performance at no additional cost to the storage. Right. And that's an application that subdivides and hands out little pieces of work all the time anyway. Exactly. And they're usually in random sizes and all that kind of good stuff that makes it more complex for people to kind of play with, if you will. So this does bring up the cost question. I mean, so 16 terabyte drives, is that a standard level cost? Are you charging a premium for this computational storage device?
Starting point is 00:17:26 So we're rolling it out as two unique products because as we've been talking to customers over the last couple of years delivering these prototypes, we have some customers like, you know what? I just need the big one. So I've sold 32 terabyte add-in cards to some customers. I've sold 16 terabyte U.2s and they're fat, dumb, and happy with having that kind of storage footprint per drive. So we're selling that as a cost competitive to the market NVMe SSD. Then we turn around and look at this. Okay, now we have this computation capability. There is a software layer. There's a support expectation when in time you offer software. So there is a pennies per gigabyte adder to each drive for the capability to do the computational resources. And are you shipping all the capabilities and all the drives and just turning it on
Starting point is 00:18:11 via a software license? Or is it really two distinct hardware devices? We are able to, with the capacity only version, because we have the capability to play around with the DRAM modeling of it, we can offer it at a lower cost without having the capability of in situ turned on because we wouldn't put as much DRAM in the drive, therefore saving the user cost. But generally when you hit eight terabytes or bigger in a drive, the DRAM offset isn't enough to really cause the customer any pain points. So then they would just buy it as the in situ and we turn it on later. And we'd only charge them for it once they turn it on. Yeah, an SSD that big is so expensive,
Starting point is 00:18:48 the little bit of DRAM differential isn't that big. Correct. Okay. And that's another kind of step point into the kind of the value prop of NGD systems is because I'm doing 8 terabytes or I'm doing 16 terabytes, people naturally go, well, you can't be as cheap as the NAND vendor. But from a margin perspective play, if you want to really get down to the nuts and bolts of it, because my BOM is less costly for the rest of the BOM, it offsets
Starting point is 00:19:14 some of that. So my cost to be competitive with the big guys is not as problematic as it is for some other people in the market. And you guys are using like 3D TLC kinds of NAND, is that, or are you moving beyond that? So we're compliments of not being one of the big guys. We're flash agnostic. And I've got ties into my former employer, of course. And then we've got the guys over in Japan, a couple of guys in Korea. We work with all of them is the benefit of it.
Starting point is 00:19:40 So we've actually got a business model where a customer can request. And based on their request and their ability to help negotiate, we can adjust price. Yeah, when you're selling to the Amazons and Baidus of the world, they can twist Samsung and my crumbs aren't for you. Exactly. Better than you can. Yeah, yeah. So by being able to offer customers that option, we can therefore add even more value to the marketplace.
Starting point is 00:20:02 Because then if one guy runs short on parts and we've already pre-qualified the product across multiple vendors and our performance doesn't change because we overcome that with our firmware mapping, then we can provide a customer three or four different versions of the exact same drive, no performance change, no reliability differential, and no, you know, significant bomb that they haven't already tested. We keep a supply chain very happy. Gee, when you were at your former employer,
Starting point is 00:20:27 you kept telling us how theirs was special. Now you're making all the flash the same. You know, sometimes let's not go there. So this has got an Embutu complete operating system sitting on this, on these cores sitting out there on the drive, and you've got the host system as well. I mean, so the STC must be somewhat interesting to be able to kind of control all this computation going on across. Is it typical, I guess the question is, is it typical to have a significant application that runs in all the drives in a particular server?
Starting point is 00:21:03 I mean, or would it be different applications running on different drives? Or how does that play out? You could, in theory, do different applications per drive, but most of the customers we have, it's all direct attached at this point. We have been working with some fabric-based solutions, which we could get into. But as a direct attach, you're usually striping data across multiple drives. So it challenges the user in today's current environment to do application per drive. Now, if we reset that and we talk to these guys about building it from grounds up, and they literally keep data sets per application per drive,
Starting point is 00:21:38 we could very easily with our platform run individual applications on each individual drive. That's not a gate from our perspective, other than it's the current architectural problem on the host side. That seems just too complex to me. The way I'm looking at this, it appears I'm going to run a microservice in the drive and send data and have the microservice analyze it and send results back. Exactly. Speaking of microservices, does it support container operations sitting out there on a drive?
Starting point is 00:22:10 Well, you'd almost think I'd prompt you for that one. So, yeah, we actually have, because it's a Ubuntu core Linux and it's a full-fledged operating system, we can drop containers natively into every drive. So a good example is we went out to the Docker store and we SSH'd from the host directly to the drive, dropped the container onto the drive itself and executed the application in place, no modifications whatsoever. In the case of that app we did there,
Starting point is 00:22:38 it was a license plate recognition tool that's available in the Docker store. And we ran it independently on several different drives simultaneously. The host sat idle, zero to 5% utilization on the host. But yet each drive was sending back license plate recognition data to the server system saying, here's the answer to your question, if you will. I can think of a lot of toll services that might be interested in something like this. My God, this is impressive. So, you know, normally you would need a container engine or something like that to run these things, but it can run native on Ubuntu without the Docker
Starting point is 00:23:13 container engine. So the Ubuntu instance that we have installed on the system has that Docker container engine. Yeah. So you are correct. You do have to have that. But now with Docker, of course, supporting Kubernetes as well, we kind of get that a little bit of a default of both of the best of both worlds. So have you got built-in event trigger stuff so that my microservice gets notified when new data has been written, or is that something I have to code? Today, it's not in the current version of the solution, but it is something definitely from a roadmap item that we know we'd like to add because it's been asked for. Okay. And I just want to clarify one thing. You talked about
Starting point is 00:23:54 customer striping data across multiple of your SSDs. We're not talking about RAID type striping where there's just arbitrary chunks. We're talking about writing files or objects or something else that's complete to be analyzed to each one, right? Very good clarification. Yeah, we're not designed today to, just like any other drive, as soon as you put it behind a true RAID engine, you virtualized out the drives so the individual targets aren't there anymore. So it's very difficult for us to operate in a system where it's in a rated environment. The beauty is a lot of customers have started realizing that with Flash and with NVMe, you really don't require RAID. The erasure coding capabilities or replication are so much more efficient and useful pretty interesting amounts of computational resources, especially if it runs containers, man. I think I can run my whole world here.
Starting point is 00:24:52 Yeah, we are very excited about the opportunities of that kind of platform enablement has started to show off and enable with customers. But there's no GPU or anything like that sitting out there on that drive. No. And there's, there's not an, there's not really a plan to, so we actually did some, uh, uh, work with one of our partners on, uh, it was, it wasn't facial recognition, but image similarity searching. So thinking Bing or Google, when you Google a word, you want to go find something and you're invoking TensorFlow or other tools. And we mapped it between running it in a host system with GPUs attached mapped it between running it in a host system with GPUs attached to it
Starting point is 00:25:26 and running it in a system with our drives attached to it. And the drives being just enough drives to hold the database. So we weren't trying to overload it by saying we're going to add extra cores by adding an extra drive. Wait, you didn't cheat? We didn't cheat.
Starting point is 00:25:39 Scott, I'm disappointed in you. Where's that marketing side of you, Scott? I mean, it's the engineering side coming in. But so we did 12 drives, fully loaded. They're 8 terabytes a piece, so it's a 96-terabyte database. And we ran it with GPUs. We ran it with our drives. Now, from a performance perspective,
Starting point is 00:26:00 we were 10 times slower than running it by by pulling it into the gpu resources and all that kind of good stuff from a power and energy efficiency perspective we were 300 times better so we saved the customer on a power budget and we delivered the responses within a you know order of magnitude error if you will so we've talked to one customer he went to an ai conference in brazil a couple weeks ago. And the thing they're running into today is everybody's throwing GPUs galore at everything. And we're finally hitting to the point where the true TCO. So if I get 100x improvement with 72 GPUs, or I get a 10x improvement with just some storage devices,
Starting point is 00:26:40 the TCO says go with the storage devices. So that really is starting to come around to, to the, I'm sure what the GPU guys would say is a terrible problem to have, that they're going to start selling fewer of them just because people can afford to power them up. Well, they're starting to sell fewer of them because people are giving up on Bitcoin mining.
Starting point is 00:26:56 And that's a good thing. That too. So there's, there's, there's a couple of, you know, AI FPGA kinds of things out there. TensorFlow you brought up.
Starting point is 00:27:07 They've got TPU for IoT devices and stuff like that that they're coming out with. It would be interesting to see if you couldn't incorporate something like that in here. But I imagine it's a big step. Now, maybe somebody might be interested in doing that, mind you. The thing we have to keep in consideration when we look at adding the compute to storage is the power envelopes of the storage devices because our backplanes are only so strong. So of course, NVMe in a two and a half inch slot
Starting point is 00:27:34 allows you 25 watts, but when you run 24 drives at 25 watts, you either don't have enough power supply or you've got a cooling problem. So what we did, our focus was to say, let's add enough compute core. So the core we chose was not a high performance core, but a low power, moderate performance core. So that when we have a 16 terabyte drive doing native data IO, and we turn on computational resources, our total power budget for that drive is at 12
Starting point is 00:28:02 watts. So we're giving you performance and power savings over some of the other products in the market, because not everybody has the capability to do that. And if you go to the M.2s, like Howard mentioned earlier, 64 of them in a system, there's a problem today with a lot of the open compute platforms that developed around the M.2 to cool them, because you get one in the front and one in the back, and the one in the back can't get cool. But if they're all running at six watts and offering compute offloads so your CPUs aren't heating up the rest of the box, you get a win-win.
Starting point is 00:28:31 Yeah, and the alternative would be a U.3 device with a TPU and one gigabyte of flash to hit that power envelope. Or something, yeah. And we do have others in the plate. So there are definitely, we're not the sole source, if you will, of computational storage devices.
Starting point is 00:28:50 And there's actually things coming up like the SNEA Storage Developer Conference next week is having a special session to introduce a way to standardize the acknowledgement of compute capable devices, if you will, in one of their birds of a feather session during that event next week. And there's multiple companies invested in that. So this is something that's starting to get a little more traction. And we acknowledge that there are
Starting point is 00:29:13 workloads I'm going to do very well at. There's workloads that other people that are developing a different type of product are also doing very well at. But we're not competing with one another, if you will. We have different workloads or different end cases for our customers. That's interesting. And so you think there might be some standards activity to try to provide some standard way of accessing the computational resources out past the NVMe bus? I just think it's too early. I think we have to let... What do you mean it's too early? These guys are shipping product. They've got 60 M2. They've got all sorts of stuff here. Yeah, and Scott would be really happy if some standards body took the methods that they're using and called them a standard. But me, I'd rather have a year or two of five or six more vendors going, and here's another way to do it before we start writing a standard.
Starting point is 00:30:05 I was going to say there is actually what's called a provisional working group within SNEA to investigate what would be classified as something that would require standard. And we're not looking at it as the transfer protocols, but more of how do you identify that you have that capability and how do you manage that? Because we're looking at it from a NIC perspective to a storage device perspective to whatever else. And so to your point, Howard, I totally agree. I don't want it locked down today or tomorrow, even if it's my protocol, because that doesn't solve the problems for the better public. But it does put us on track to develop a way of making these devices more flexible in the market. Yeah, I think CDMI is kind of the example of how not to do it. Yep, exactly.
Starting point is 00:30:48 Wow, that's a different discussion. Do you think CDMI was there because some one cloud vendor became dominant in that solution? Well, it was a standard before enough people were using object storage to decide what the things you wanted in a standard were. Exactly, yeah. But Scott, you mentioned extending this over fabrics. And we're big fans of NVMe over fabrics.
Starting point is 00:31:18 Yes. How does that communication work in a fabric environment? It's just another TCP connection or another NVMeQ? Yeah, in multiple dimensions, yeah. Effectively, what we're working with now is down that path, yeah. Because by default, our drives already have a TCP IP connection to the host system through our development protocol that we did, we can extend that over a fabric-based implementation. Now, there are so many different ways today to use that OF protocol that I'm not a drop-in and everybody can use it.
Starting point is 00:31:52 So we are starting the development of how to work in a more broad spectrum view of it, but there's nothing today that we see preventing the capability of these becoming a truly fabric-friendly environment. No, it would be harder with NVMe over Fiber Channel, but that, that, those customers and your customers aren't a hundred percent over. Yeah. Yeah. And we have a couple of nice, so being a SoCal based company, it's kind of a unique opportunity because partnering with folks in the Bay area, there's plenty of them there, but they also
Starting point is 00:32:20 have a lot of other people talking to them down here in the SoCal region that we're in in the OC. There are some vendors down here that are a little easier to work with. They drive across town and we can talk to them and they're not in the same level of 70,000 people trying to talk to them. So we've been able to strategically partner a little bit better that way on the fabric-based stuff. Well, you know what you could do if you wanted to be behind a storage system? You could do the deduplication logic. You could do the compression logic.
Starting point is 00:32:47 You could do some, you know, I'll call it reliability checking and stuff like that. All that could be done at the drive level rather than having to take the storage processor resources to go do that stuff. you'd build a solid fire like system where each SSD is responsible for some piece of the hash space and have all the D dupe in the SSD beyond that. Yeah. And compression could be done there and post-process, you know, it's, it's a, so we, we actually have a CDN customer, the content delivery being a big thing right now, right? Each one of the drives can store and compress using their compression algorithm because they're porting their compression algorithm into the drive. So we don't have a
Starting point is 00:33:30 compression engine native to the drive, but we can run an application doing compression. And then on the flip side, we can also run the key management software. Yeah. And we can run the key management software for encryption. So the big problem with encrypted drives today is every key and every exchange of information from an encryption point of view goes back to a single host that manages the key and the keys being sent back and forth over the IO bus. If I run the encryption algorithm on each and every drive, it's many instances of it, but the keys are always local. The keys are never shared and yet I can still encrypt and decrypt on the fly. Right. And the keys only have to be transferred at power up. Exactly. We have a customer right now who's trying to figure out exactly how many HD
Starting point is 00:34:12 streams from how many different cameras they can send to our drive before it craps, which is taking quite a few right now. So yeah, no, but you can pump 4k streams in and take SD stream, you know, take SD files out. Exactly. Yep. Because I'm, again, I'm not, I'm not doing anything unique on my drive side, which other SSD products have tried to do is have a native compression algorithm on the drive. I'm running the application. They would normally run at the host level in situ on the drive. So that's a very different way to look at it, but also a much better advantage to the system because there's a lot of things that you can't recompress compressed data if it's sent to you in a compressed stream. But if I'm compressing it natively, I can still buy you that space.
Starting point is 00:34:56 God, the advantage of having an Ubuntu system running, sitting out there on the drive, being able to drop containers at will, being able to drop applications almost at will, having a TCP IP interface to the server, this is pretty damn impressive. Yes. This is the marketing guy with a double E degree. You know we don't talk to those marketing guys with marketing degrees. I love it. I had to throw that out there just for fun. So from a market perspective, you're obviously not going after enterprise customers.
Starting point is 00:35:31 You're going after hyperscalers. And you mentioned HPC as well and some three-letter acronyms. Yep. So one of the most public hyperscale guys that we've been engaged with is our friends at Microsoft Research Group have done a research project called Flashsoft. They've been working on this program for just about five years now and have tried 19 different ways to solve their scalability problem of this image search that they're running. And for the first time in five years, they've been able to achieve that via our product technology and they presented it alongside us at the Flash Memory Summit during our keynote.
Starting point is 00:36:08 So that's one hyperscale guy. We do have a lot of the enterprise guys looking, but of course, from a call and delivery perspective, we all know how long some of those things can take. So there's engagement there, but there's not a to be in the market soon play there. And enterprise suppliers, not enterprises as customers. You're saying enterprise guys, but you mean enterprise storage guys kind of thing, right? Yeah. Correct.
Starting point is 00:36:31 We started talking to some of them, but then, of course, you've got to have a delivery path. And our partnering in that front is still in the early stages as we roll out this product. So it's a matter of making sure we get the vendor they like to work well with us as well. So it's not something I can order from CDW. Not today. Speaking of CDW, is it available worldwide or is it still U.S. only or how does that play out? So right now we've got direct paths to customers in U.S. and Europe. We've kind of staved off the Asia front just in the near term as we, you know, from a
Starting point is 00:37:06 scale perspective, we are planning a go-to-market rollout early next year, which will start to include the APAC area. A team of 35 right now, it makes it hard to go too global. So. Yeah, yeah, yeah, yeah. This is already your second generation solution or you did like a beta solution before, and this is really your your GA version is that did I get that right? So fall of fall of 16 we did the gen 1 prototypes fall of 17 early 18 we launched the second gen which were both FPGA based solutions the product we'll be shipping by the end of this year will be the full-fledged production ASIC. Yeah the FPGAs are a problem in terms of power envelope as well, aren't they?
Starting point is 00:37:48 Yeah, they are. And actually, we run out of gates even in the most powerful FPGAs today to do both the FTL and the compute in a single drive. So therefore, I can give you a slower drive, but give you the compute resources. And then, of course, people are like, well, if it's not faster, why am I buying it kind of thing? Well, you got to look at the bigger picture.
Starting point is 00:38:05 Yeah, I mean, it makes sense as a development kit, but only if it's in vision of buying something faster later. Yep. And because it, the, the ASICs based off of the FPGA implementation, those customers that have already bought one can upgrade without much effort. We offer an upgrade path for those customers too. Well, this has been great. Scott, anything you'd like to say to our listening audience?
Starting point is 00:38:28 I would say thanks for letting me come on and chat about it. We definitely have a lot of information on the website available for anybody who'd like to follow up. Feel free to hit me up on Twitter or via email, which you'll have on the website when this goes live. So excited to see what more fun we can have with the future of storage. Thank you very much, Scott, for being on our show today.
Starting point is 00:38:48 Next time, we'll talk to another storage system technology person. Any questions you want us to ask, please let us know. And if you enjoy our podcast, tell your friends about it. And please review us on iTunes as this will help get the word out. That's it for now. Bye, Howard.
Starting point is 00:39:00 Bye, Ray. Bye, Scott. Bye, guys. Been a pleasure. Until next time.
