Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 07x07: Accelerating Storage Infrastructure using GPUs with Graid Technology

Episode Date: July 15, 2024

Modern AI infrastructure has exposed the importance of reliability and predictability of storage in addition to performance. This episode of Utilizing Tech, presented by Solidigm, features Kelley Osburn of Graid Technology discussing the challenges of maximizing performance and resiliency of storage for AI with Jeniece Wnorowski and Stephen Foskett. AI servers are optimized for machine learning processing, and Graid Technology SupremeRAID offloads RAID processing to GPUs, similar to the way these massively parallel processors offload ML processing. SupremeRAID also has a peer-to-peer DMA feature that sends data directly to the processor rather than forcing it all through a single controller or channel. RAID software is needed at many spots in the data pipeline, from ingestion and preparation to processing and consolidation, and each stage requires performance and availability. Many applications beyond AI processing require maximum performance and capacity without impacting the host CPU, including military, medical research and diagnostics, and financial workloads.

Hosts:
Stephen Foskett, Organizer of Tech Field Day: https://www.linkedin.com/in/sfoskett/
Jeniece Wnorowski, Datacenter Product Marketing Manager at Solidigm: https://www.linkedin.com/in/jeniecewnorowski/

Guest:
Kelley Osburn, Senior Director at Graid Technology: https://www.linkedin.com/in/kelleyosburn/

Follow Utilizing Tech
Website: https://www.UtilizingTech.com/
X/Twitter: https://www.twitter.com/UtilizingTech

Tech Field Day
Website: https://www.TechFieldDay.com
LinkedIn: https://www.LinkedIn.com/company/Tech-Field-Day
X/Twitter: https://www.Twitter.com/TechFieldDay

Tags: #UtilizingTech, #Sponsored, #AIDataInfrastructure, #AI, @SFoskett, @TechFieldDay, @UtilizingTech, @Solidigm

Transcript
Starting point is 00:00:00 Modern AI infrastructure has exposed the importance of reliability and predictability of storage in addition to performance. This episode of Utilizing Tech, presented by Solidigm, features Kelley Osburn of Graid Technology discussing the challenges of maximizing performance and resiliency of storage for AI. Welcome to Utilizing Tech, the podcast about emerging technology from Tech Field Day, part of the Futurum Group. This season is presented by Solidigm and focuses on the question of AI data infrastructure. I'm your host, Stephen Foskett, organizer of the Tech Field Day event series. And joining me today from Solidigm is my co-host, Jeniece Wnorowski. Thank you for joining us on the show. Oh, thank you, Stephen. It's so nice to be back.
Starting point is 00:00:43 So Jeniece, we have been talking quite a lot on this season about the varying requirements for data underneath AI. And one of the things that keeps coming up is, I guess, the practical question of how these AI servers are built and what they look like and what the infrastructure around them looks like. And it seems that a lot of the focus of the industry, especially when it comes to AI, has been on optimizing the processing aspects and the data movement aspects without as much consideration for storage. And we're here to set that record straight, right? That's why we're doing this.
Starting point is 00:01:23 That's right. That's exactly right. So it's all about the infrastructure. Infrastructure matters, but the underpinnings of that infrastructure, particularly storage, right, as we're dealing with loads and loads of data, storage is really being put back on the map in a big way. And we're excited to talk to various guests
Starting point is 00:01:40 about our large capacity, high density storage, and how does all of this become a game changer for those AI workloads that everyone's dealing with? Yeah, I think that's the key there. As you say, I mean, storage has become sort of a critical path, a critical point. You have to have high performance storage, certainly, but you also have to have reliability and storage features. And one of the questions about reliability, too,
Starting point is 00:02:06 is if there was some kind of failure somewhere in the data path, how would that affect everything? Well, the answer is pretty badly. You want to make sure that everything you're building, all the way from the CPUs to the GPUs to the network to the storage layer, everything has to be redundant, has to be reliable, and has to be predictable. So that's what we're going to talk about today. We have a guest on, Kelley Osburn, representing Graid Technology. Kelley, welcome to the show. Thanks. As you mentioned, I'm Kelley Osburn. I'm Senior Director of OEM and Channel Business
Starting point is 00:02:44 Development here at Graid Technology. And we've been in business for about three years based out of California. And I appreciate the opportunity to chat with you guys. So tell us a little bit, I guess, to start off, where in the AI data infrastructure stack does Graid Technology fit? So Graid Technology and our product SupremeRAID is actually a GPU-accelerated RAID stack. So we actually dedicate a GPU to do all of the infrastructure processing for RAID calculations for parity. And we also feature a peer-to-peer DMA technology that allows movement of data from the drives to the applications directly across PCIe, eliminating bottlenecks that we traditionally see with older hardware RAID technologies
Starting point is 00:03:34 and the scaling problems of CPU utilization with traditional software RAID technologies. So, Kelley, we've worked a lot together over the last year here, been to a couple of shows together, and we're certainly very excited about your SupremeRAID product. And I know our teams have been doing some work there as well. Can we just kind of jump right into it and talk a little bit about how you guys have been taking a look at that high density storage, the 61.44 terabyte drives specifically? Can you talk a little bit about the value of QLC SSDs in conjunction with SupremeRAID and kind of what you're seeing? Sure, and I think it's not just about capacity, it's about performance. When you look at dense server environments, we're talking about servers that have 12, 16, even 24 or 32 NVMe SSDs. And now with that kind of a capacity point, how do you deliver the potential performance of those drives? So if I said I'm going to do a RAID 0 stripe across all those drives and then do reads, I'm going to get very high read performance.
Starting point is 00:04:44 Unfortunately, I'm not going to have any data protection. So as you layer in data protection for an environment like that, the traditional methodologies will cause bottlenecks and you'll never be able to get to that performance that you would have with RAID 0. What we have been trying to do here at Graid Technology with SupremeRAID is build a stack that can deliver very close to RAID 0, or theoretical, performance on read and write of those big drives you have, and yet provide all of the protection you need for that data that you're storing on those. So let's talk a little bit about how you're doing that. I guess some people listening might
Starting point is 00:05:23 be scratching their heads thinking, wait, what do GPUs have to do with storage? And I think that's the clever thing here, right? In that just like GPUs and their massively parallel architectures can be used to accelerate the kind of matrix math used for machine learning, that same kind of hardware can be used to accelerate the calculations that are used in storage data protection, right? Why don't you talk us through a little bit about how GPUs are useful for storage? Absolutely. So if you take a look at the RAID technologies that we've always had, it's nothing really new in terms of what we call Reed-Solomon algorithms. The issue is generating parity information using those kinds of algorithms will take a heavy toll on your CPU. So if your CPU is very heavily utilized because you have a
Starting point is 00:06:19 large number of drives that are really, really fast, it doesn't leave much room for your application to run. So if we can take that and put it onto a GPU, in our case, we work with NVIDIA GPUs, we've taken and written our SupremeRAID driver as a CUDA-based application. So we are taking advantage of those really, really dense CUDA cores on those GPUs to do that mathematical calculation much more quickly than you could on a CPU. Very similar to graphics rendering and things. GPUs are much better at that than a general-purpose CPU.
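To make the parity offload concrete, here is a minimal sketch of the stripe math a RAID engine has to run, with NumPy's vectorized byte operations standing in for the dense CUDA cores described above. This is an illustration rather than Graid's actual driver, and the drive count and chunk size are hypothetical; note that real RAID 6 uses Reed-Solomon coding over a Galois field, while the plain XOR shown here covers only the single-parity RAID 5 case.

```python
import numpy as np

NUM_DATA_DRIVES = 8    # drives contributing data chunks to one stripe (illustrative)
CHUNK_BYTES = 1 << 20  # 1 MiB chunk per drive (illustrative)

# Simulated stripe: one row of bytes per drive.
rng = np.random.default_rng(0)
stripe = rng.integers(0, 256, size=(NUM_DATA_DRIVES, CHUNK_BYTES), dtype=np.uint8)

# Parity generation: one bulk XOR reduction across all chunks. On a GPU,
# thousands of threads would each own a small slice of these bytes.
parity = np.bitwise_xor.reduce(stripe, axis=0)

# Rebuild after losing drive 3: XOR the surviving chunks with the parity.
survivors = np.delete(stripe, 3, axis=0)
rebuilt = np.bitwise_xor.reduce(survivors, axis=0) ^ parity
assert np.array_equal(rebuilt, stripe[3])
```

This kind of wide, regular arithmetic is exactly what a GPU is built for, which is why parity generation can be moved off the host CPU without stealing cycles from the application.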
Starting point is 00:07:21 lanes. If you feed that into one card, you're funneling it down just like you would a toll booth with a superhighway. So the other part of our technology is to use a peer-to-peer DMA feature that allows us to act as a traffic cop. So in addition to offloading from the CPU, we can tell the data on the drives or we can tell the drives, excuse me, to directly send the data across PCIe, kind of like a bypass road around that toll booth. So that is the other side of our technology that solves kind of a hardware rate bottleneck. Yeah. And a lot of this too, no motherboard, you know, relay out required. The SSDs actually just plug right into the, you know, PCIe interconnect and then everything kind of just works together. Is that unique or is that similar to others in your space? We really don't have a competitor in our space. All the traditional hardware-grade controllers would require them to run cables from your drive straight to a card.
Starting point is 00:08:33 As you mentioned, we want to see the drives plugged right into the motherboard. And so we see them across that PCIe root complex, and we can communicate and control the drives without forcing the data through our card, which eliminates that bottleneck. And what's interesting to me is because of your simplistic design and nature, and you guys, you know, work with a lot of really interesting customers, we'd love to just talk a little bit about the work you've been doing with Cheetah Raid as an example. I know at the NAB show, we had a very
Starting point is 00:09:06 interesting demo there. Do you want to just talk to us a little bit about the uniqueness and what we showed there on the show floor? Sure. And I think being part of a solution is a much more interesting thing to discuss than mathematical calculations on a GPU. I think you kind of understand what we're doing. How do you apply that in the real world? And in the broadcast world, you're dealing with really, really large video files in many cases, which are perfect for the Soladyne 61 terabyte drives. The type of thing that you see in that market is if you're doing digital recording at a studio or on a film site, for example, that data will fill up those drives. And at the end of the day, you want to edit those. You want to edit that data.
Starting point is 00:09:51 So you've got to get them over to a post-processing house or a graphics, you've got hundreds of terabytes and transmit it to the cloud and then let your video editing shop go access those files. It becomes very expensive and time consuming. So what Cheetah does is they build a server that can take four of your drives and put it into a removable cartridge. And then they have three of those in the in the server so they can pull those out and then you could ship those somewhere put new cartridges in and then start filming again the next day and then back at your post-production house you could plug those in and then access all of that data we're providing the rate protection and high performance read and write access to make sure you can get that digital video down fast. And at the same time, in an editing environment, if you're going to have
Starting point is 00:10:50 five or six Windows, for example, based editing stations, you need very, very fast access. And many times, four or five people are going to want to edit that file at the same time, and they might be working on different parts of that file. And so the final piece of our solution was a company called Tuxera. And Tuxera is a company that makes Fusion FileShare. And this is basically a SMB protocol file sharing technology that allows those multiple video editors to access the same file at the same time, but also with extremely high performance. And so this really solves a lot of problem when you have on waiting for data to scroll forward and things like that. You can move back and forth very, very quickly. You can have multiple people attacking that data at the same time instead
Starting point is 00:11:45 of having to do it sequentially. So it saves time. And so that was a very, very cool, very cool solution that we were showing. And it was fun to be part of that. It was amazing. Yeah. Thank you for being a part of that, Kelly. And just downstream, all of the kind of AI workloads that are being run within the studios. There's software out there now that actually works with Texera where they're able to take that video footage in real time and then start editing and looking for sequential B-roll that supports that video. So it's kind of taking a look at the entire pipeline of how this stuff actually becomes a movie or a video, you know, within five
Starting point is 00:12:26 minutes, which could ultimately be five days. So the work that we've done here, I think, has really helped these guys, which is really exciting. I just want to ask Kelly, can you tell us a little bit more about some of your other partners and customers? I know we have like NetApp as an example. We've done some work there with those guys, and I believe you're a part of that solution. But can you talk a little bit more about some of the mission-critical workloads you work with and how high-density storage could potentially benefit some of those workloads? Sure. So in the machine learning and AI space, we're seeing more and more customers purchasing very, very expensive GPUs. I was at GTC. I got to see you out there. It was a great show. Got to see Jensen walking around with his leather jacket on. But we're seeing servers that have eight H100s and now the
Starting point is 00:13:19 Blackwells are coming. These are servers that are going to be $400,000 or $500,000 machines by the time they're finished being built. And they're very, very data hungry. And so I'm sure you've talked about it before, but one of the biggest issues is how do I keep those very expensive devices that I've purchased fully utilized? Why buy eight of them if I'm only going to have them 50% utilized because my data is too slow in getting to the GPUs? I might as well just do four. So I know very, very fast read performance from very large machine learning and training data sets is a big part of what we're capable of doing with a large number of drives. And so that's an area that we're involved in. We also partner with several companies that build parallel file systems. And the primary use for those are
Starting point is 00:14:14 high-performance compute environments. So Think, very large supercomputer environments. So we work with ThinkPark and they represent a product called BGFS and we partner with them. In fact, we were recently at the ISC conference in Hamburg, Germany, the processing and storage of data when doing ML training, there's a whole broad pipeline, a data pipeline here, all the way, as you talked about a minute ago, from ingesting and preparing data in the field through the actual practical inferencing of data with applications. And I think it's easy to get tied up thinking about the importance of storage right there when you're building your ML models. But there's a million other things that need to be addressed in order to have an effective AI solution. And it seems to me that solutions like this, like yours, are part of a broader perspective of AI data infrastructure. Is that how you see it as well? Yeah, absolutely.
Starting point is 00:15:38 You know, AI is not an event, it's a workflow. And you have to do ingest first, obviously, and that could be multiple sources of data coming in all at the same time. How do you write that really fast? That's a very write-intensive operation. And you still want that data to be protected as well in case there's corruption and other things. And then you move to kind of a tagging model and a training model, which generally is much more read intensive because I'm now running back through that data with multiple GPUs in parallel to try to build my trained data sets. And then you move into inference and inference really becomes a whole nother animal, much more read intensive once again, but not quite as GPU intensive.
Starting point is 00:16:26 And so each stage of that, and they can be different depending on which frameworks you're using. So I think that, you know, the experience that we see is that high speed access to data, both read and write, smooths out those workflows and makes them, you know, much more efficient. I'm curious, Kelly, can you tell us a little bit about like just an overall TCO model here in terms of utilizing, you know, QLCSSDs with Supreme Raid? Is there a cost benefit to any of your customers as they're kind of looking at their overall infrastructure? I believe so. When we take a look at those drives or Gen 4 drives today, and they're generating 7 gigabytes a second capable of read performance.
Starting point is 00:17:21 So if I took four of those drives, that's 28 gigabytes a second of theoretical read performance. So if I took four of those drives, that's 28 gigabytes a second of theoretical read performance. If I plug those into a traditional hardware RAID controller, I'm basically maxed out on that RAID controller. So if I had eight drives, if I plugged all eight into that one controller, I'm only going to get 50% of the throughput because that PCIe Gen 4 by 16 slot's only capable of doing 32. So I'm right at 28, 32 overhead of that controller. So the only way I'm going to get better performance is to buy another controller and another controller and another controller. So if I have 16 drives, I need four controllers. If I have 24 drives, I need six controllers.
Starting point is 00:18:04 Now I'm worried about how many PCIe slots do I have? Where do I put my GPUs if I've got all these RAID controllers, right? So the ROI for us is we can handle up to 32 drives with one slot. We have an HA feature. If you want to have two of our devices, two NVIDIA cards with our software, we do offer that HA failover. So maximum two slots. So that actually saves you on PCIe slot and it saves you on cost
Starting point is 00:18:33 from having to purchase all those physical controllers. The flip side of that is if you said, well, I could just do 24 drives and run software RAID. If you want to have any room left on that CPU to run any applications like your AI, you know, PyTorch or something like that, you're going to have to have a lot more expensive CPUs to be able to handle the overhead of the infrastructure. So by offloading that to the GPU, you could theoretically buy a cheaper CPU potentially, and that can also save
Starting point is 00:19:03 you money and still give you the performance you're looking for. Yeah, I know you mentioned it's, you know, our QLC drives have, you know, seven gigabytes per second with reads, which is great, right? But in some of these environments where you have some, you know, write intensive activity as well, you still have that 3.4 gigabytes per second, which is amazing for QLC. So, you know, in a nutshell, QLC can really handle some of these higher performant, you know, workloads. And a lot of folks out there, there's debate around whether or not QLC can hold up to it, right? But it's actually not only a benefit of performance, but cost, and it's seamless every time, which is really great.
Starting point is 00:19:41 Yeah, the point of the, you know, CPU resources is an interesting one because, of course, there are a lot of software-raided solutions that use CPU to do the same processing that you're doing on GPU. And many of those use accelerator instructions built into the CPU. Are those still, I guess, using up resources that could otherwise be useful for machine learning processing? They can be. Some of the instructions that we know about with those are instructions that aren't going to be around much longer. And they may be Intel-only features. So we see, you know, that kind of a problem where if they decide not to keep those vector instructions, you could be hampered by that. The other question is, where do you run your application? Is it in user space or kernel space? We are, we employ a kernel
Starting point is 00:20:40 driver as well as the CUDA driver. So there is a piece that runs on your CPU that works in concert with the CUDA-based driver that we put on the GPU. Some companies out there will advertise that they get the best performance by running in user space, but that has vulnerabilities along with it. Well, and I think that these are all considerations that people would need to address in their individual environments as well, right? That they would need to look at the capabilities of the servers that they're deploying in various spots along their data pipeline and decide whether it would be appropriate to use something that relies on CPU instructions versus a GPU or even a RAID card at various levels. Could you see a place for all the different storage solutions
Starting point is 00:21:29 in the same data pipeline? Very possibly. The traditional hardware controllers that have been around for a long time, they feature batteries and caching, battery-backed cache, for example. If you're talking about storing large amounts of data over time, there are still places where hard drives are being used, and that's an area that we
Starting point is 00:21:51 don't particularly work with. So that traditional hardware rate controller that has caching and other things is a perfect fit for that kind of environment. If you're talking about a lightweight server with two to four drives in it, software is probably adequate. So where the grade solution really fits is when we're talking about much higher densities of NVMe in a single machine. So you know, we kind of understand where our fit is. And, and that's our market is to focus on these machines that have more than four NVMe drives where you have to start figuring out how do I deliver the maximum performance from those drives to the application.
Starting point is 00:22:31 As drives get bigger and faster, is there a greater processing requirement to do the RAID calculations or is there more nuance to that? It's a processing in terms of which RAID level you want to use. So you have RAID 5 and 6, which are more typically, we're seeing RAID 6 once you have more than four drives because people want a higher level of protection. And I think the biggest issue is when you are talking about a large number of drives that are so fast now with NVMe that have so much data, it just consumes your CPU if it's software RAID. We already know the hardware RAID controller is not going to handle that at all. If you just dedicate software RAID, you'll never get to the performance that you want because you're going to be consuming that CPU more and more, and then you have to spend more on greater or more powerful CPUs to try to keep up with that processing. In the testing,
Starting point is 00:23:31 we've done even software RAID at RAID 10, which is how a lot of people will try to deploy software RAID to minimize that CPU involvement because it eliminates the mathematical calculations. The problem is you sacrifice half your capacity to do RAID 10 because you're mirroring and striping. And also your write performance goes down because you have to make two writes to get the mirror before you're acknowledged. We offer basically what we call very close to RAID 0 performance with RAID 5 or RAID 6 usable capacity, and once again, free that CPU up. Yeah, Kelly, so let's go back to use cases for a minute. Just want to hear a little bit more about what other type of partners
Starting point is 00:24:15 and who really needs this type of performance. Can you give some specific examples? Sure. So we're working with the military on lots of edge deployments where high performance computing is still mandatory, but they want to be very compact, very low power, very low heat, and they need to eke as much performance as they can to capture the data that they're capturing in those types of environments. We have research hospitals, one in particular that most people know of in Memphis that actually has a new type of microscopy. So this is a
Starting point is 00:24:51 microscope that is an atomic microscope, and it generates a huge amount of data. And it has to be written really, really quickly. And if you think about it, the faster I can write that data, the more quickly they can move on to another patient study. And so if you think about it, the faster I can write that data, the more quickly they can move on to another patient study. And so if you can help more people with that same device by not having to wait for the data to write down to disk, that's valuable. Similar area in the medical device world, we're working with companies that build CAT scan, CT scan, MRI type systems, and similar situation. A huge amount of data has created very quickly in a very short amount of time. And today they have to wait a long time for that to get written. If they could double that
Starting point is 00:25:38 performance, they can handle twice as many patients in a day. That helps more people. It helps the clinic pay that equipment off. That's very expensive, much more quickly because they're seeing more efficiency. Database, high performance database. We work with accounts that are working with Oracle and Postgres, Redis, other high performance in-memory compute type databases. Splunk servers. I have a large credit card company that's using us in that kind of environment. High performance compute supercomputers with parallel file systems like BGFS. So we could go on and on and on where this makes sense. We have gamers call us all the time,
Starting point is 00:26:20 but we're too expensive for the gamers. Well, that's great. Thank you so much for this. It's interesting to consider these aspects because again, as I said at the top, I feel like so often people focus only on the signature data center full of GPUs and they don't realize that there's just a lot more to the question of AI data infrastructure than that. And even that has requirements for high performance and, as I said, high-rate, reliable, and highly predictable storage that comes from RAID. So thank you so much for joining us, Kelly. Before we go, where can people learn more about GRADE technology and Supreme RAID, and where can they connect with you? Obviously, we have a LinkedIn page and a website, greattech.com. And upcoming shows that you might want to come see us in August will be at FMS, which is formerly Flash Memory Summit.
Starting point is 00:27:17 It's now the Future of Memory and Storage. That'll be in Santa Clara. And then again in November, this Super Compute 24, SC24 show, which is in Atlanta this year. So I'll be personally at both shows along with other folks from my company, and we'd love to hear from you. And Janice, I imagine that folks will see Solidigm at some of those shows too, right? Solidigm will be there, almost at all of those shows and then some. And yeah, thank you again for hosting us today, Stephen. Well, thank you very much for joining us. It's nice to see you.
Starting point is 00:27:51 And thank you everyone for listening to this episode of Utilizing Tech, our special AI data infrastructure series presented by Soladyne. You can find this podcast in your favorite podcast application. Just look for Utilizing Tech. And please do consider giving us a rating or a review. This podcast is brought to you by Tech Field Day, home to IT experts from across the enterprise, now part of the Futurum group, as well as, as I said, Solidigm. For show notes and more episodes, head over to our dedicated website, UtilizingTech.com,
Starting point is 00:28:21 or find us on XTwitter and Mastodon at utilizing tech. Thanks for listening and we will see you next week.
