Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 07x07: Accelerating Storage Infrastructure using GPUs with Graid Technology
Episode Date: July 15, 2024
Modern AI infrastructure has exposed the importance of reliability and predictability of storage in addition to performance. This episode of Utilizing Tech, presented by Solidigm, features Kelley Osburn of Graid Technology discussing the challenges of maximizing performance and resiliency of storage for AI with Jeniece Wnorowski and Stephen Foskett. AI servers are optimized for machine learning processing, and Graid Technology SupremeRAID offloads processing to GPUs similarly to the way these massively-parallel processors offload ML processing. They also have a peer-to-peer DMA feature to direct data straight to the processor rather than forcing all data to pass through a single processor or channel. There is a need for RAID software at many spots in the data pipeline, from ingestion and preparation to processing and consolidation, and each requires performance and availability. Many applications require maximum performance and capacity without impacting the host CPU, including military, medical research and diagnostics, and financial, in addition to AI processing.
Hosts: Stephen Foskett, Organizer of Tech Field Day: https://www.linkedin.com/in/sfoskett/ Jeniece Wnorowski, Datacenter Product Marketing Manager at Solidigm: https://www.linkedin.com/in/jeniecewnorowski/
Guest: Kelley Osburn, Senior Director at Graid Technology: https://www.linkedin.com/in/kelleyosburn/
Follow Utilizing Tech Website: https://www.UtilizingTech.com/ X/Twitter: https://www.twitter.com/UtilizingTech
Tech Field Day Website: https://www.TechFieldDay.com LinkedIn: https://www.LinkedIn.com/company/Tech-Field-Day X/Twitter: https://www.Twitter.com/TechFieldDay
Tags: #UtilizingTech, #Sponsored, #AIDataInfrastructure, #AI, @SFoskett, @TechFieldDay, @UtilizingTech, @Solidigm
Transcript
Modern AI infrastructure has exposed the importance of reliability and predictability of storage in addition to performance.
This episode of Utilizing Tech, presented by Solidigm, features Kelley Osburn of Graid Technology
discussing the challenges of maximizing performance and resiliency of storage for AI.
Welcome to Utilizing Tech, the podcast about emerging technology from Tech Field Day, part of the Futurum Group.
This season is presented by Solidigm and focuses on the question of AI data infrastructure.
I'm your host, Stephen Foskett, organizer of the Tech Field Day event series. And joining me today
from Solidigm is my co-host, Jeniece Wnorowski. Thank you for joining us on the show.
Oh, thank you, Stephen. It's so nice to be back.
So Jeniece, we have been talking quite a lot on this season about the varying requirements for data underneath AI.
And one of the things that keeps coming up is, I guess, the practical question of how these AI servers are built and what they look like and what the infrastructure around them looks like. And it seems that a lot of the focus of the industry,
especially when it comes to AI,
has been on optimizing the processing aspects
and the data movement aspects
without as much consideration for storage.
And we're here to set that record straight, right?
That's why we're doing this.
That's right. That's exactly right.
So it's all about the infrastructure.
Infrastructure matters,
but the underpinnings of that infrastructure,
particularly storage, right,
as we're dealing with loads and loads of data,
storage is really being put back on the map in a big way.
And we're excited to talk to various guests
about our large-capacity, high-density storage,
and how all of this becomes a
game changer for those AI workloads that everyone's dealing with.
Yeah, I think that's the key there.
As you say, I mean, storage has become sort of a critical path, a critical point.
You have to have high performance storage, certainly, but you also have to have reliability
and storage features.
And one of the questions about reliability, too,
is if there was some kind of failure somewhere in the data path,
how would that affect everything?
Well, the answer is pretty badly.
You want to make sure that everything you're building,
all the way from the CPUs to the GPUs to the network to the storage layer,
everything has to be redundant, has to be reliable, and has to be predictable. So that's what we're going to talk about today.
We have a guest on, Kelley Osburn, representing Graid Technology. Kelley, welcome to the show.
Thanks. As you mentioned, I'm Kelley Osburn. I'm Senior Director of OEM and Channel Business
Development here at
Graid Technology. And we've been in business for about three years, based out of California.
And I appreciate the opportunity to chat with you guys. So tell us a little bit, I guess,
to start off, where in the AI data infrastructure stack does Graid Technology fit? So Graid Technology and our product SupremeRAID is actually a
GPU-accelerated RAID stack. So we actually dedicate a GPU to do all of the infrastructure
processing for RAID calculations for parity. And we also feature a peer-to-peer DMA technology
that allows movement of data from the drives to the applications directly across PCIe,
eliminating bottlenecks that we traditionally see with older hardware RAID technologies
and the scaling problems of CPU utilization with traditional software RAID technologies.
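To make the parity idea concrete, here is a minimal Python sketch of single-parity (RAID 5-style) striping. It is an illustration only, assuming simple XOR parity; SupremeRAID itself implements Reed-Solomon erasure coding inside a CUDA driver on the GPU, which this toy example does not attempt to reproduce.

```python
import numpy as np

def make_parity(chunks):
    """XOR all data chunks in a stripe to produce the parity chunk."""
    parity = np.zeros_like(chunks[0])
    for chunk in chunks:
        parity ^= chunk
    return parity

def rebuild_chunk(survivors, parity):
    """Recover one failed chunk by XOR-ing the survivors with parity."""
    rebuilt = parity.copy()
    for chunk in survivors:
        rebuilt ^= chunk
    return rebuilt

# A 4-drive stripe: 3 data chunks plus 1 parity chunk, 1 KiB each.
rng = np.random.default_rng(0)
data = [rng.integers(0, 256, 1024, dtype=np.uint8) for _ in range(3)]
parity = make_parity(data)

# Simulate losing drive 1 and rebuilding it from the survivors.
recovered = rebuild_chunk([data[0], data[2]], parity)
assert np.array_equal(recovered, data[1])
```

The point of the episode's discussion is that this XOR (and, for dual parity, Galois-field) arithmetic is embarrassingly parallel, which is exactly the shape of work GPUs are built for.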
So, Kelley, we've worked a lot together over the last year here, been to a couple of shows together, and certainly very excited about your SupremeRAID product.
And I know our teams have been doing some work there as well.
Can we just kind of jump right into it and talk a little bit about how you guys have been taking a look at that high-density storage, the 61.44-terabyte drives specifically? Can you talk a little bit about
the value of QLC SSDs in conjunction with SupremeRAID and kind of what you're seeing?
Sure, and I think it's not just about capacity, it's about performance. And when you look at dense server environments, we're talking about servers that have 12, 16, even 24 or 32 NVMe SSDs.
And now with that kind of a capacity point, how do you deliver the potential performance of those drives?
So if I said I'm going to do a RAID 0 stripe across all those drives and then do reads, I'm going to get very high read performance.
Unfortunately, I'm not going to have any data protection.
So as you layer in data protection for an environment like that, the traditional methodologies
will cause bottlenecks and you'll never be able to get to that performance that you would
have with RAID 0.
What we have been trying to do here at Graid Technology with SupremeRAID is build a stack
that can deliver very close to RAID 0 or theoretical performance on read and write of those big drives
you have, and yet provide all of the protection you need for that data that you're storing on those.
So let's talk a little bit about how you're doing that. I guess some people listening might
be scratching their heads thinking, wait, what do GPUs have to do with storage? And I think that's the clever thing
here, right? In that just like GPUs and their massive parallel architectures can be used to
accelerate the kind of matrix math used for machine learning, that same kind of hardware can be used to accelerate the
calculations that are used in storage data protection, right? Why don't you talk us
through a little bit about how GPUs are useful for storage? Absolutely. So if you take a look
at the RAID technologies that we've always had, it's nothing really new in terms of what we call
Reed-Solomon algorithms. The issue is generating parity information using those kinds of algorithms
will take a heavy toll on your CPU. So if your CPU is very heavily utilized because you have a
large number of drives that are really, really fast, it doesn't leave much room for your application to run.
So if we can take that and put it onto a GPU, in our case, we work with NVIDIA GPUs,
we've taken and written our SupremeRAID driver as a CUDA-based application. So we are taking advantage of those really, really dense CUDA cores on those GPUs to do that mathematical calculation much more quickly
than you could on a CPU. Very similar to graphic rendering and things. GPUs are much better at that
than a general purpose CPU. If you tried to do that on a traditional hardware RAID controller,
the flip side is that, in that environment, you create this problem of a multi-lane
highway feeding down into a single card. And so the problem there is if you've got, say, 10 or 20
of these big Solidigm drives, each taking four PCIe lanes, if you had 20 of those, that's 80
lanes. If you feed that into one card, you're funneling it down just like you would a toll booth with a superhighway.
So the other part of our technology is to use a peer-to-peer DMA feature that allows us to act as a traffic cop.
So in addition to offloading from the CPU, we can tell the data on the drives or we can tell the drives, excuse me, to directly send the data across PCIe, kind of like a bypass road around that toll booth.
So that is the other side of our technology that solves kind of a hardware RAID bottleneck.
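As a rough illustration of that funneling, the sketch below runs the lane-and-bandwidth arithmetic Kelley describes, using nominal figures (roughly 2 GB/s per PCIe Gen 4 lane, x4 drives reading around 7 GB/s); exact numbers vary by platform.

```python
# A rough sketch of the "toll booth" arithmetic, using nominal numbers:
# PCIe Gen 4 moves roughly 2 GB/s per lane, so an x16 card tops out
# near 32 GB/s, while each x4 NVMe drive can read about 7 GB/s.
GEN4_GBPS_PER_LANE = 2.0   # approximate usable bandwidth per Gen 4 lane
LANES_PER_DRIVE = 4        # a typical NVMe SSD uses an x4 link
DRIVE_READ_GBPS = 7.0      # sequential read of a fast Gen 4 drive

def funnel(num_drives, controller_lanes=16):
    """Compare aggregate drive bandwidth to a single controller slot."""
    total_lanes = num_drives * LANES_PER_DRIVE
    drive_gbps = num_drives * DRIVE_READ_GBPS
    card_gbps = controller_lanes * GEN4_GBPS_PER_LANE
    return total_lanes, drive_gbps, card_gbps

lanes, drives_bw, card_bw = funnel(20)
print(f"20 drives = {lanes} lanes and ~{drives_bw:.0f} GB/s of reads, "
      f"funneled into one x16 card capped near {card_bw:.0f} GB/s")
```

Peer-to-peer DMA sidesteps exactly this cap by letting the drives move data across the PCIe root complex directly, rather than through the controller card.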
Yeah. And a lot of this, too, needs no motherboard re-layout, you know.
The SSDs actually just plug right into the, you know, PCIe interconnect and then everything kind of just works together.
Is that unique or is that similar to others in your space? We really don't have a competitor in our space.
All the traditional hardware RAID controllers would require you to run cables from your drives straight to a card.
As you mentioned, we want to see the drives plugged right into the motherboard.
And so we see them across that PCIe root complex,
and we can communicate and control the drives without
forcing the data through our card, which eliminates that bottleneck.
And what's interesting to me is, because of your simple design and nature, and you
guys, you know, work with a lot of really interesting customers, we'd love to just talk
a little bit about the work you've been doing with Cheetah RAID as an example.
I know at the NAB show, we had a very
interesting demo there. Do you want to just talk to us a little bit about the uniqueness and what
we showed there on the show floor? Sure. And I think being part of a solution is a much more
interesting thing to discuss than mathematical calculations on a GPU. I think you kind of
understand what we're doing. How do you apply that in the real world? And in the broadcast world, you're dealing with really,
really large video files in many cases, which are perfect for the Solidigm 61-terabyte drives.
The type of thing that you see in that market is if you're doing digital recording at a studio or
on a film site, for example, that data will fill up those drives.
And at the end of the day, you want to edit those. You want to edit that data.
So you've got to get them over to a post-processing house or a graphics shop. You've got hundreds of terabytes to transmit to the cloud
and then let your video editing shop go access those files. It becomes very expensive and time
consuming. So what Cheetah does is they build a server that can take four of your drives and put
it into a removable cartridge. And then they have three of those in the server, so they can pull those
out, and then you could ship those somewhere, put new cartridges in, and then start filming again
the next day. And then back at your post-production house, you could plug those in and then access all
of that data. We're providing the RAID protection and high-performance read and write access to make sure you can get that
digital video down fast. And at the same time, in an editing environment, if you're going to have
five or six Windows-based editing stations, for example, you need very, very fast access. And
many times, four or five people are going to want to edit that file at the same time, and they might be working on different parts of that file. And so the final piece of our solution was a company called Tuxera. And
Tuxera is a company that makes Fusion File Share. And this is basically an SMB-protocol file sharing
technology that allows those multiple video editors to access the same file at
the same time, but also with extremely high performance. And so this really solves a lot
of problems when you're waiting for data to scroll forward and things like that. You can move
back and forth very, very quickly. You can have multiple people attacking that data at the same
time instead
of having to do it sequentially. So it saves time. And so that was a very, very cool,
very cool solution that we were showing. And it was fun to be part of that.
It was amazing. Yeah. Thank you for being a part of that, Kelley. And just downstream,
all of the kind of AI workloads that are being run within the studios. There's software out
there now that actually works with Tuxera where they're able to take that video footage in real
time and then start editing and looking for sequential B-roll that supports that video.
So it's kind of taking a look at the entire pipeline of how this stuff actually becomes
a movie or a video, you know, within five
minutes, when it could otherwise take five days. So the work that we've done here, I think, has really
helped these guys, which is really exciting. I just want to ask, Kelley, can you tell us a little
bit more about some of your other partners and customers? I know we have like NetApp as an
example. We've done some work there with those guys, and I believe you're a part of that solution. But can you talk a little bit more about some of the mission-critical workloads
you work with and how high-density storage could potentially benefit some of those workloads?
Sure. So in the machine learning and AI space, we're seeing more and more customers purchasing very, very expensive GPUs.
I was at GTC. I got to see you out there. It was a great show. Got to see Jensen walking around
with his leather jacket on. But we're seeing servers that have eight H100s and now the
Blackwells are coming. These are servers that are going to be $400,000 or $500,000 machines by the
time they're finished being built. And they're very, very data hungry. And so I'm sure you've
talked about it before, but one of the biggest issues is how do I keep those very expensive
devices that I've purchased fully utilized? Why buy eight of them if I'm only going to have them 50% utilized because my data is too
slow in getting to the GPUs? I might as well just do four. So I know very, very fast read performance
from very large machine learning and training data sets is a big part of what we're capable of doing
with a large number of drives. And so that's an area that we're involved in. We also partner with
several companies that build parallel file systems. And the primary use for those is
high-performance compute environments. So think very large supercomputer environments. So we work
with ThinkParQ, and they represent a product called BeeGFS, and we partner with them.
In fact, we were recently at the ISC conference in Hamburg, Germany.
When it comes to the processing and storage of data when doing ML training, there's a whole broad pipeline, a data pipeline here, all the way, as you talked about a minute ago, from ingesting and preparing data in the field through the actual practical inferencing of data with applications.
And I think it's easy to get tied up thinking about the importance of storage right there
when you're building your ML models.
But there's a million other things that need to be addressed in order to have an effective
AI solution. And it seems to me that solutions like this, like yours, are part of a
broader perspective of AI data infrastructure. Is that how you see it as well? Yeah, absolutely.
You know, AI is not an event, it's a workflow. And you have to do ingest first, obviously,
and that could be multiple sources of data coming in all at the same time.
How do you write that really fast?
That's a very write-intensive operation.
And you still want that data to be protected as well
in case there's corruption and other things.
And then you move to kind of a tagging model and a training model, which generally is much more read intensive because I'm now running back through that data with multiple GPUs in parallel to try to build my trained data sets.
And then you move into inference and inference really becomes a whole nother animal, much more read intensive once again, but not quite as GPU intensive.
And so each stage of that, and they can be different depending on which frameworks you're
using. So I think that, you know, the experience that we see is that high speed access to data,
both read and write, smooths out those workflows and makes them, you know, much more efficient.
I'm curious, Kelly, can you tell us a little bit about like just an overall
TCO model here in terms of utilizing, you know, QLC SSDs with SupremeRAID? Is there a cost benefit
to any of your customers as they're kind of looking at their overall infrastructure?
I believe so.
When we take a look at those drives, or Gen 4 drives today, they're capable of generating 7 gigabytes a second of read performance.
So if I took four of those drives, that's 28 gigabytes a second of theoretical
read performance. If I plug those into a traditional hardware RAID controller,
I'm basically maxed out on that RAID controller. So if I had eight drives, if I plugged all eight
into that one controller, I'm only going to get 50% of the throughput because that PCIe Gen 4 by 16 slot's only capable of doing 32.
So at 28, I'm right up against that 32, given the overhead of that controller.
So the only way I'm going to get better performance is to buy another controller and another controller and another controller.
So if I have 16 drives, I need four controllers.
If I have 24 drives, I need six controllers.
Now I'm worried about how many PCIe slots do I have?
Where do I put my GPUs if I've got all these RAID controllers, right?
So the ROI for us is we can handle up to 32 drives with one slot.
We have an HA feature.
If you want to have two of our devices, two NVIDIA cards with our software, we do offer that HA failover.
So maximum two slots.
So that actually saves you on PCIe slot
and it saves you on cost
from having to purchase all those physical controllers.
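A quick sketch of that slot math, following the figures in the conversation (about four fast Gen 4 drives saturating one x16 hardware RAID card, and one GPU slot, or two with the HA failover option, regardless of drive count):

```python
import math

# Following Kelley's arithmetic: four ~7 GB/s Gen 4 drives (~28 GB/s)
# are about all one x16 hardware RAID card (~32 GB/s) can pass through,
# so hardware RAID slots scale with drive count, while the GPU approach
# holds at one slot (or two with the HA failover option).
DRIVES_PER_HW_CONTROLLER = 4

def slot_count(num_drives, ha=False):
    hw_slots = math.ceil(num_drives / DRIVES_PER_HW_CONTROLLER)
    gpu_slots = 2 if ha else 1
    return hw_slots, gpu_slots

for n in (16, 24, 32):
    hw, gpu = slot_count(n, ha=True)
    print(f"{n} drives: {hw} hardware RAID slots vs {gpu} GPU slots with HA")
```

Running this reproduces the numbers in the conversation: 16 drives need four controllers and 24 need six, versus at most two slots for the GPU approach.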
The flip side of that is if you said,
well, I could just do 24 drives and run software RAID.
If you want to have any room left on that CPU
to run any
applications like your AI, you know, PyTorch or something like that, you're going to have to have
a lot more expensive CPUs to be able to handle the overhead of the infrastructure. So by offloading
that to the GPU, you could potentially buy a cheaper CPU, and that can also save
you money and still give you the performance
you're looking for. Yeah, you know, you mentioned our QLC drives have
seven gigabytes per second with reads, which is great, right? But in some of these environments
where you have some, you know, write-intensive activity as well, you still have that 3.4
gigabytes per second, which is amazing for QLC. So, you know, in a nutshell, QLC can really handle some
of these higher-performance workloads. And among a lot of folks out there, there's debate
around whether or not QLC can hold up to it, right? But it's actually a benefit not only in
performance but in cost, and it's seamless every time, which is really great.
Yeah, the point of the, you know, CPU resources is an interesting one because, of course, there are a lot of software RAID solutions that use the CPU to do the same processing that you're doing on GPU.
And many of those use accelerator instructions built into the CPU.
Are those still, I guess, using up resources that could otherwise be useful for
machine learning processing? They can be. Some of the instructions that we know about with those
are instructions that aren't going to be around much longer. And they may be Intel-only features.
So we see, you know, that kind of a problem where if they decide not to
keep those vector instructions, you could be hampered by that. The other question is,
where do you run your application? Is it in user space or kernel space? We employ a kernel
driver as well as the CUDA driver. So there is a piece that runs on your CPU that
works in concert with the CUDA-based driver that we put on the GPU. Some companies out there will
advertise that they get the best performance by running in user space, but that has vulnerabilities
along with it. Well, and I think that these are all considerations that people would need to address in their
individual environments as well, right?
That they would need to look at the capabilities of the servers that they're deploying in various
spots along their data pipeline and decide whether it would be appropriate to use something
that relies on CPU instructions versus a GPU or even a RAID card at various levels. Could you see a place for all the different storage solutions
in the same data pipeline?
Very possibly.
The traditional hardware controllers
that have been around for a long time,
they feature batteries and caching,
battery-backed cache, for example.
If you're talking about storing large amounts of
data over time, there are still places where hard drives are being used, and that's an area that we
don't particularly work with. So that traditional hardware RAID controller that has caching and
other things is a perfect fit for that kind of environment. If you're talking about a lightweight server with two to four drives in it, software is probably adequate.
So where the Graid solution really fits is when we're
talking about much higher densities of NVMe in a single
machine. So, you know, we kind of understand where our fit is.
And that's our market: to focus on these machines that
have more than four NVMe drives where you have to start
figuring out how do I deliver the maximum performance from those drives to the application.
As drives get bigger and faster, is there a greater processing requirement to do
the RAID calculations or is there more nuance to that?
It's about processing in terms of which RAID level you want to use.
So you have RAID 5 and 6, which are more typical; we're seeing RAID 6 once you have more
than four drives because people want a higher level of protection. And I think the biggest
issue is when you are talking about a large number of drives that are so fast now with NVMe that have so much data, it just consumes your CPU if it's software RAID.
We already know the hardware RAID controller is not going to handle that at all.
If you just dedicate software RAID, you'll never get to the performance that you want because you're going to be consuming that CPU more and more, and then you have to spend more on greater or more powerful CPUs to try to keep up with that processing. In our testing,
we've even done software RAID at RAID 10, which is how a lot of people will try to deploy software
RAID to minimize that CPU involvement because it eliminates the mathematical calculations.
The problem is you sacrifice half
your capacity to do RAID 10 because you're mirroring and striping. And also your write
performance goes down because you have to make two writes to get the mirror before you're
acknowledged. We offer basically what we call very close to RAID 0 performance with RAID 5 or RAID 6 usable capacity, and once again, free that CPU up.
Yeah, Kelley, so let's go back to use cases for a minute.
Just want to hear a little bit more about what other type of partners
and who really needs this type of performance.
Can you give some specific examples?
Sure.
So we're working with the military on lots of edge deployments where high performance
computing is still mandatory, but they want to be very compact, very low power, very low heat,
and they need to eke as much performance as they can to capture the data that they're capturing
in those types of environments. We have research hospitals, one in particular that
most people know of in Memphis that actually has a new type of microscopy. So this is a
microscope that is an atomic microscope, and it generates a huge amount of data. And it has to be
written really, really quickly. And if you think about it, the faster I can write that data,
the more quickly they can move on to another patient study.
And so if you can help more people with that same device by not having to wait
for the data to write down to disk, that's valuable. Similar area in the medical device
world, we're working with companies that build CAT scan, CT scan, MRI type systems,
and similar situation. A huge amount of data is created very quickly in a very short amount of
time. And today they have to wait a long time for that to get written. If they could double that
performance, they can handle twice as many patients in a day. That helps more people.
It helps the clinic pay off that
equipment, which is very expensive, much more quickly, because they're seeing more efficiency.
Database, high performance database. We work with accounts that are working with Oracle and
Postgres, Redis, other high performance in-memory compute type databases. Splunk servers. I have a
large credit card company that's using us in that kind of
environment. High-performance compute supercomputers with parallel file systems like BeeGFS.
So we could go on and on and on where this makes sense. We have gamers call us all the time,
but we're too expensive for the gamers. Well, that's great. Thank you so much for this.
It's interesting to consider these aspects because again, as I said at the top,
I feel like so often people focus only on the signature data center full of GPUs and they don't realize that there's just a lot more to the question of AI data infrastructure than that. And even that has requirements for
high performance and, as I said, redundant, reliable, and highly predictable storage that
comes from RAID. So thank you so much for joining us, Kelley. Before we go, where can people learn
more about Graid Technology and SupremeRAID, and where can they connect with you? Obviously, we have a LinkedIn page and a website, graidtech.com.
And upcoming shows that you might want to come see us in August will be at FMS, which
is formerly Flash Memory Summit.
It's now the Future of Memory and Storage.
That'll be in Santa Clara.
And then again in November, the Supercomputing 24, SC24, show, which is in Atlanta
this year. So I'll be personally at both shows along with other folks from my company, and we'd
love to hear from you. And Jeniece, I imagine that folks will see Solidigm at some of those shows too,
right? Solidigm will be at almost all of those shows and then some. And yeah, thank you again for hosting us today, Stephen.
Well, thank you very much for joining us.
It's nice to see you.
And thank you everyone for listening to this episode of Utilizing Tech,
our special AI data infrastructure series presented by Solidigm.
You can find this podcast in your favorite podcast application.
Just look for Utilizing Tech.
And please do consider giving us a rating or a review.
This podcast is brought to you by Tech Field Day, home to IT experts from across the enterprise,
now part of the Futurum Group, as well as, as I said, Solidigm.
For show notes and more episodes, head over to our dedicated website, UtilizingTech.com,
or find us on X/Twitter and Mastodon at UtilizingTech.
Thanks for listening and we will see you next week.