In The Arena by TechArena - Revolutionizing RAID: Graid's Supreme Solution for Modern Data Challenges
Episode Date: September 3, 2024
Allyson Klein and Jeniece Wnorowski chat with Kelley Osburn of Graid about SupremeRAID™ and its role in tackling high-performance storage challenges in data-driven environments....
Transcript
Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators
and our host, Allyson Klein.
Now let's step into the arena.
Welcome to the Tech Arena Data Insights series. It's time for another podcast
about data insights, and that means Jeniece Wnorowski of Solidigm is back in the studio with me.
Welcome back to the program, Jeniece.
Yay, thank you for having me back, Allyson. It's a pleasure.
So Jeniece, how have you been spending your time since the last podcast, and what have you been
picking up on Data Insights?
Oh my gosh, spending so much time attending various shows.
We just landed at Flash Memory Summit, which was a really good time meeting with a lot of our partners and customers.
And I am super excited to introduce Kelley Osburn. Kelley Osburn is the senior director at Graid and frankly, just one of my favorite people to talk to. So
we're going to chat with you today. Welcome to the show, Kelley. How's it going?
Thanks. Everything's going great. I went to FMS as well and got to hang out with Jeniece and the
Solidigm folks. Very cool show. And I made it home without catching any kind of bug.
Oh, that's nice. That's wonderful. Let's just get started then. Graid is obviously known as a leader in storage solutions, and I've been reading up on your solutions.
The SupremeRAID products have captured the attention of many enterprises.
Can you provide some background on SupremeRAID and how Graid came to deliver it to the market?
Yep. I think it's pretty obvious that the advent of flash storage has taken the market by storm and it's growing extremely rapidly. Compounding that, we can see as many as 24 SSDs in a single 1U
server platform.
And then when you take a look at what Solidigm is doing with these massive drives that hold
up to 61 terabytes today, and we know that they're going to be going even higher next
year, being able to protect those and being able to protect them with RAID in a timely
fashion becomes paramount.
That's a huge amount of data when you stuff 24 drives into a server like that, and you have to make sure that the data is protected, especially in these large machine learning workflows and large databases, etc.
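To put rough numbers on that, here is a minimal sketch of the capacity arithmetic. The drive count and drive size come from the conversation; the RAID 5 and RAID 6 parity overheads are the standard one-drive and two-drive figures, used here as an assumption rather than as a description of any particular product's layout.

```python
# Back-of-the-envelope capacity for a 24-drive server with 61 TB SSDs.
# Drive count and size are taken from the conversation; the parity
# overheads are the textbook RAID 5 / RAID 6 values (an assumption).

DRIVES = 24
DRIVE_TB = 61


def usable_tb(drives: int, drive_tb: int, parity_drives: int) -> int:
    """Usable capacity of one parity group, ignoring filesystem overhead."""
    if parity_drives >= drives:
        raise ValueError("parity drives must be fewer than total drives")
    return (drives - parity_drives) * drive_tb


raw = DRIVES * DRIVE_TB
raid5 = usable_tb(DRIVES, DRIVE_TB, parity_drives=1)
raid6 = usable_tb(DRIVES, DRIVE_TB, parity_drives=2)

print(f"raw:    {raw} TB")
print(f"RAID 5: {raid5} TB usable")
print(f"RAID 6: {raid6} TB usable")
```

Under these assumptions a single 1U server holds well over a petabyte of raw flash, which is why rebuild time and protection overhead matter at this density.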
We identified a problem in the market where traditional hardware RAID solutions simply cannot keep up with the performance
of these drives. And software RAID, which is the alternative to a hardware RAID solution,
also has problems with scalability because it consumes a very large percentage of your CPU
to handle the data protection tasks. So what we came up with was an idea to write our own
software RAID stack that we can deploy on an NVIDIA GPU.
So we are NVIDIA Inception partners, and we have written our software to be a CUDA-based driver that runs on an NVIDIA card.
And this does two things.
It takes all of the RAID processing off of the CPU. And we also created a patented peer-to-peer DMA technology that allows us to communicate with the
drives and get out of the data path so that the data can flow back and forth to the applications
directly from the drives, thus delivering maximum performance. So that's in a nutshell what we're
doing.
Super interesting stuff, Kelley, and thank you for the background just on SupremeRAID,
how it's utilizing the high-density storage, how it's utilizing the GPU in
an innovative manner. Can you share with us a little bit around what type of end customers
might be using this type of technology? And I won't put you on the spot. You don't have to name names.
The customers that really need this kind of technology are the ones that are buying
very dense servers with a large amount of storage, and they've been struggling to generate or get the performance that was promised to them from, let's say, 20 SSDs. And so what they're looking for is
really low latency, high response. So we see fintech environments where it's high frequency
trading, high performance databases like Redis and Oracle and Couchbase and things like that.
We're also involved in large super compute environments
and we partner with parallel file system companies
like BeeGFS to be able to provide
very high performance data protection in the storage node
that can then be expanded across a number of nodes
into a larger file set into the petabytes of capacity.
Also things like machine learning,
we have customers who are using us
for really high performance data collection, things like Splunk servers at a major credit card company. And also in the military, we're involved in classified data acquisition technologies that need really high performance and a small footprint because they're in a mobile platform. And so weight becomes a critical issue in something like an aircraft.
Kelley, you just touched upon so many interesting and innovative use cases, but I have to ask you about the elephant in the room, which is AI. And obviously, AI is placing a tremendous
stress on the data pipeline and serving up data at every stage between training and inference.
How do you see RAID solutions being tapped to fuel these workloads?
The first part of that is the data sets that we're
seeing for these models are very large. So they span multiple disks, obviously, and you need to
make sure that you protect that data so that if a drive fails, you can replace that drive and not
lose your data and be able to rebuild quickly
because you want to make sure that in a rebuild situation,
if there's a problem,
you don't go into a degraded state on the host.
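As a rough illustration of what a rebuild involves, here is a minimal sketch of single-parity (RAID 5 style) reconstruction using XOR. This is the textbook technique, not Graid's implementation, which the conversation does not describe; the `xor_blocks` helper and the three tiny "drives" are hypothetical names and data made up for the example.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, column by column."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread across three hypothetical data drives.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing drive 1, then rebuild its block from the survivors plus parity.
failed = 1
survivors = [blk for i, blk in enumerate(data) if i != failed]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt == data[failed])
```

Every stripe on the failed drive has to be recomputed this way from all the surviving drives, so the work grows with drive capacity. That is the reason rebuild speed becomes a concern with very large SSDs.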
The second part of it is,
how do we deliver the maximum read performance
for these types of environments?
Things like really fast IO
that can reduce the time spent reading input data.
The model training times can be improved when you do this.
So I like to say that AI and machine learning isn't an event.
It's actually a workflow.
And really high-speed IO can smooth out these workflows.
And the most important part of this, I think, is making sure that the extremely expensive
GPU assets and accelerators that these customers and companies are buying can get full utilization.
If they're sitting there waiting for data, then you're not getting your money's worth out of these very expensive assets that you purchased.
And so delivering data very, very timely is how we help in these workflows.
So delivering that data very timely, I agree, is key, right?
But tell us a little bit more, Kelley,
about how your customers or new prospective customers
should look at your technology.
How do they deploy it?
How do they look at it differently?
Why should they use this great product?
We actually very much simplify
the way you can deploy storage in a server.
Instead of having a physical
card that has all the drives directly attached to it, which creates a cabling mess, we actually
work with servers that have the drives directly attached to the motherboard. And when we put the
NVIDIA card into that server and load our driver on it, it now looks like a RAID controller to the
host operating system instead of a graphics card.
We can then communicate with those drives across that PCIe root complex, if you will, and act like a traffic cop.
And so the way it gets deployed gives you a huge amount of flexibility in the data protection levels you want, how many virtual disks you can create.
We support things like NVMe over Fabrics to provide access to data
outside of the server. And the customers would then present those virtual disks to their
applications instead of the physical disks. And then we create that huge amount of performance
to feed those expensive GPUs.
Now, I know that in all of this, you also need some fantastic storage media.
And I know that you work with Solidigm.
Can you talk a little bit about the collaboration between Solidigm and Graid and how that plays into the solutions that you've been talking about today?
Absolutely.
The Solidigm relationship that we have has been very strong.
We've done extensive testing in their engineering labs as well as ours. And we actually had an opportunity to put together a solution with Solidigm for the large NAB show.
So in the media and entertainment industry, we partnered together to create an environment that involved several other companies whereby we could have removable disk cartridges in a server full of the Solidigm 61 terabyte drives, which are ideal for
recording large amounts of digital video. The removable cartridges give you the ability to
move those large files to another location for post-processing and computer graphics and things
like that. And then we had an additional partner that provided a really high speed file access
layer called Tuxera. And between the four of us, we created a pretty interesting solution.
The server manufacturer was called Cheetah RAID.
And we actually just won a Best in Show award at the FMS show two weeks ago in Santa Clara.
So we were very, very pleased with that solution and are now working with a number of different media and entertainment organizations who are interested in this solution that we built.
That's so cool.
It is really cool. It was shocking to get that
award alongside the Graid team and really appreciate all the interesting work you guys
did there, Kelley. And speaking of that specific solution, right, with media and entertainment,
you know, as you're going out and talking with other customers, what are you seeing in terms of changes in storage requirements?
And how does that shape where you're taking your solutions into the future?
So it's the combination of capacity and performance.
And sometimes customers need one or the other, or sometimes they need both.
These Solidigm drives are capable of delivering the performance, but also the huge amount
of capacity. And so some existing opportunities that I'm involved in where these drives are
being heavily considered are for map bar servers, where over time they're going to be collecting a
large amount of data, and the access they need to that data for analytics has to be very fast, but
they need that deep storage and have it really be available
whenever they need it. And so these drives are capable of delivering that. The other side of it
is where we see companies who want really high IO per second. And in that kind of an environment,
you might go with a larger number of smaller drives because it's not a capacity play,
it's a straight-out performance play. But if you need a significant
capacity with maximum performance, we have found that these Solidigm QLC drives deliver.
Now, Kelley, when you take a look out in the future, obviously we're heading into the end of
2024, we're heading into the 2025 timeframe. What do you see on the horizon for innovation? And is there anything that's
queued up that Graid plans to take advantage of with your ecosystem?
I think the biggest thing
that we're starting to see is samples of PCIe Gen 6. So Gen 4 is pretty prominent. Gen 5 is
really coming out. We're starting to see Gen 5 drives. They're extremely fast, which exposes bottlenecks
of traditional RAID technologies even more. The faster they get, the worse the performance is for
customers who try to deploy them with these old technologies. So that really plays into our hands.
We're also starting to work with some companies who want to use our software embedded in systems
that you might not see, so backup
appliances and other things where we're just a software layer providing that high-speed
performance inside of a solution like that.
And with the PCIe Gen 6, the performance is only going to continue to increase.
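A quick sketch of why each PCIe generation raises the stakes. The per-lane signaling rates are the published raw figures for each generation; the four lanes per drive is an assumption (a typical NVMe SSD link width), and the results are raw link rates that ignore encoding and protocol overhead, so real drive throughput will be lower.

```python
# Raw per-lane signaling rates in GT/s from the PCIe specifications.
GT_PER_LANE = {"Gen4": 16, "Gen5": 32, "Gen6": 64}

LANES_PER_DRIVE = 4   # assumption: typical x4 NVMe SSD link
DRIVES = 24           # drive count mentioned earlier in the conversation


def raw_gbytes_per_s(gen: str, lanes: int) -> float:
    """Raw one-direction link rate in GB/s, ignoring encoding and protocol overhead."""
    return GT_PER_LANE[gen] * lanes / 8


for gen in GT_PER_LANE:
    per_drive = raw_gbytes_per_s(gen, LANES_PER_DRIVE)
    total = per_drive * DRIVES
    print(f"{gen}: {per_drive:.0f} GB/s per drive, {total:.0f} GB/s across {DRIVES} drives")
```

The raw link rate doubles with each generation, so anything that sits in the data path in front of the drives has to absorb twice the traffic to avoid becoming the bottleneck.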
And because of the new form factors like E3 short, E3 long, E1, et cetera, the EDSFF
form factors, more and more servers are going to come
out with larger and larger numbers of drives. And so part of our solution is increasing our
ability to support that number of drives and still deliver the performance that our customers are
expecting.
Yeah. I mean, software RAID is where it's at, Kelley, and Graid is delivering in such
a big way. And we thank you so much for just touching on all of our questions here.
But I know others might have more.
So where can folks go to find more information about the solutions that we discussed today
and engage with you and your team?
Yep.
So we have a website, of course, graidtech.com.
Once again, we are Graid Technology, Inc.
Our product is called SupremeRAID.
You can also find us on LinkedIn and Twitter or X. And then obviously you can reach out to us and we can have a salesperson or a technical support type customer engineer communicate with you to answer any questions.
Thank you so much, Kelley, for being on the show today. It was a lovely interview and I learned a
ton about Graid and about where we are going with the data pipeline. So thank you. And Jeniece,
yet another interview is in the books for us. It's always a pleasure to do these Data Insights
podcasts.
Same here. Thank you so much, Allyson.
Thanks for joining the Tech Arena. Subscribe and engage at our website,
thetecharena.net. All content is copyright by the Tech Arena.