In The Arena by TechArena - Solidigm on Building Future-Ready AI Storage

Episode Date: November 5, 2025

Solidigm’s Ace Stryker joins Allyson Klein and Jeniece Wnorowski on Data Insights to explore how partnerships and innovation are reshaping storage for the AI era....

Transcript
Starting point is 00:00:00 Welcome to Tech Arena, featuring authentic discussions between tech's leading innovators and our host, Allyson Klein. Now, let's step into the arena. Welcome to the arena. My name's Allyson Klein, and today is another Data Insights episode, which means Jeniece Wnorowski from Solidigm is back with me. Jeniece, how are you doing today? Listen, it's great to be back. I'm doing great. Thank you. So, Jeniece, I know that we've got a really exciting guest, and we are actually turning the tables a little bit with Data Insights and going deeper into storage media. Who do we have with us today? Yes, we do have a very exciting guest. We actually have Ace Stryker with us,
Starting point is 00:00:46 and Ace is the product marketing director of AI infrastructure for Solidigm. Welcome to the podcast, Ace. Thank you so much, Jeniece and Allyson. It's great to be with you. So Ace, just to kick things off, can you tell us about your role at Solidigm and what your focus is on AI and strategy? Sure thing. Yeah. I have been at Solidigm since day one. The company is approaching four years old this December. And for the last couple of years, my job has been to eat, sleep, and breathe AI data: what's going on in the quickly moving world of AI, how are the use cases and the opportunities evolving, and what's the role of storage in that. As you can imagine, it's a big deal for a company that makes SSDs. AI is the data driver of the 2020s and probably will end up being that of the first half of the
Starting point is 00:01:44 21st century. Between the giant data sets that are needed to train foundation models these days and build in, you know, increasingly complex capabilities, and the exploding amount of data on the other end of the AI pipeline, in inference, where you have things that we can get into, like the key value cache (KV cache) or retrieval augmented generation, there's lots of emerging technologies on the inference side that also generate and consume a ton of data
Starting point is 00:02:12 in the course of providing more value to AI users. And so my job is to pay close attention to that, to understand new models, the sort of ecosystem landscape, potential partners for Solidigm, and to understand our customers and our use cases better, so that we can make sure that we are investing our time and energy into making products that are going to help solve problems today and tomorrow.
Starting point is 00:02:39 Ace, I was so excited to talk to you today because the Data Insights program is often focused on other companies talking about how they're utilizing data platforms so they can deliver unique capabilities. But we're going to turn that around and talk about what all these AI workloads are doing that makes them so different and uniquely demanding for storage across training and inference. And how does Solidigm optimize for all of the core capabilities that they're looking for, like high throughput, concurrency, and consistent quality of service? Yeah, AI is such a diverse field, right? Almost anything, any task that you can
Starting point is 00:03:19 imagine assigning to a human to get done, there's somebody working on trying to accomplish that with AI. And of course, we're here in the early sledding, in the first, call it three to four years now, of AI really being a predominant thing in our lives. And we're dealing with a lot of the use cases that are low-hanging fruit at this point, a lot of fairly simple and straightforward tasks that we are finding ways to leverage AI to solve with less cost, less time, less human input. But now with the rise of agentic AI, we're able to move into a world of increasingly complex problems and assign those to AI, and the quality of the solutions we're getting
Starting point is 00:03:59 are better and better. We're not quite there yet to where you can turn over any problem in your life and AI will solve it, but month over month, even week over week, there's new models coming out, there's new tools, there's new solution stacks with pieces plugged in a new way that solve a problem that, for the duration of human history up to this year, had to be done manually by humans. And so, as you can imagine, with that diversity of applications comes a whole bunch of different requirements on the hardware side. And from a storage perspective, we're really concerned about how the storage in an AI cluster
Starting point is 00:04:35 interacts with the memory to deliver optimized outcomes. Storage does not do the job on its own. To really understand how data flows through the AI pipeline, you've got to understand the frequent interactions between the storage layer and things like host DRAM and high bandwidth memory on the GPU; trying to view this from a storage-only perspective is going to sort of limit your ability to understand what's going on. And so we look at how we work with memory to do things like make sure the GPUs are fed
Starting point is 00:05:06 all the time and keep churning at high utilization, and make sure that things like the history of your interaction with an AI model, in the form of what's called the KV cache, the model's short-term memory, is efficiently stored and recalled in the course of a conversation without having to expend a lot of time and energy and money on unnecessary tokens.
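For readers who haven't run into the term, here is a minimal, purely illustrative NumPy sketch of what a KV cache does conceptually: each new token's attention reuses the keys and values already computed for earlier tokens instead of recomputing them. The random projections are stand-ins; real inference stacks keep these tensors in HBM or DRAM and, increasingly, spill them to SSD.

```python
import numpy as np

d = 64                            # head dimension (illustrative)
rng = np.random.default_rng(0)
k_cache, v_cache = [], []         # the "short-term memory": keys/values for every token seen so far

def decode_step(x):
    """Attend from one new token vector x over all cached positions, reusing prior work."""
    k_cache.append(rng.standard_normal(d))   # stand-in for the new token's key projection
    v_cache.append(rng.standard_normal(d))   # stand-in for the new token's value projection
    K, V = np.stack(k_cache), np.stack(v_cache)
    scores = K @ x / np.sqrt(d)              # one dot product per cached token, nothing recomputed
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                       # attention output for the new token

for _ in range(10):                          # ten decode steps; the cache grows by one entry per token
    decode_step(rng.standard_normal(d))
print("cached tokens:", len(k_cache))
```

The cache grows with the length of the conversation, which is exactly why where it lives, and how cheaply it can be stored and recalled, matters so much.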
Starting point is 00:05:40 And so there's a density piece of this that we worry about a lot, and there's a performance piece of this that we worry about a lot. But at the end of the day, all of that stuff, if I had to encapsulate it into one word, the word that we hear most often from our customers and from our partners is efficiency, right? That is the name of the game. How do we do more with less? We know how much power these AI clusters consume. It's astronomical, right? We know how much data they need. We know how big some of these models are. So it's all about how we can get more efficient, and storage plays an important part in that story, and probably an underappreciated part, in a world where a lot of attention is focused on the GPU for
Starting point is 00:06:09 a lot of good reasons, right? It's super expensive. It's very power hungry, right? We're all very worried about GPU specs, but a GPU that's not fed efficiently by the storage pipeline is wasted money, space, and energy. So with all that, Ace, can you share specifics about customers or benchmarks we're seeing in the process that really helped to kind of accelerate machine learning and/or deep learning workflows? Sure. Yeah, we have a lot of great customer stories. I'll take the opportunity here to plug our website, solidigm.com. If you go to the Insights page, there's a bunch of articles about what we're doing with customers to solve real-world problems. There's cool stuff on there about work with
Starting point is 00:06:39 DUG on a system called Nomad. It's a mobile data center that you can deploy out in the Sahara Desert if you want to solve problems at the edge. We have a story with InnoNet that's about collecting automotive data inside of moving vehicles. Maybe a good example here,
Starting point is 00:07:08 if folks aren't familiar with some of our work, would be PKKO. We've done some work with them in the healthcare space on medical imaging, using AI to understand things like CAT scans and X-rays and to deliver diagnoses more quickly, more efficiently. So it goes back to: you can build an AI solution stack with hardware and software and aim it at almost any problem in the world, right? And the diversity of our customers and use cases covers a lot of ground for that reason. In terms of benchmarks, it's an interesting question, because benchmarks are what we use to really get as close as we can to apples-to-apples comparisons and to say, hey, if you just turn this one dial, for example, if you swap one
Starting point is 00:07:51 SSD for another, this is the difference you will see in some kind of measurable outcome. And so the leading benchmark today in the AI storage space is called MLPerf. It's published by a group called MLCommons, and they have a storage-specific test that lets you plug in either a single drive or an array of drives inside of a server and run it through some AI workloads to understand how many GPUs you can support at high utilization with this storage subsystem. I encourage folks to check that out if they're interested. MLPerf just published results from version 2.0 of their storage test,
Starting point is 00:08:26 and Solidigm drives were well represented in there, in systems from a few different kinds of ecosystem members. But yeah, that's the one that we're certainly watching, and we've even got folks on the working group who are helping develop the next versions of those tests to make sure that they align with real-world use cases and stuff. But that's a very useful tool to understand just how important storage performance is to making sure that those expensive GPUs you bought are being properly used and heavily used, as you want them to be.
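To make that utilization point concrete, here is a rough back-of-envelope sketch of the kind of question a storage benchmark like MLPerf Storage is getting at: how many GPUs a given storage subsystem can keep busy. Every figure below is an illustrative assumption, not a benchmark result or a Solidigm specification.

```python
# Back-of-envelope: how many GPUs can this storage subsystem keep fed?
# All figures are illustrative assumptions, not measured results.

per_gpu_read_gbps = 4.0     # assumed sustained read demand per GPU for a training workload, GB/s
ssd_read_gbps = 12.0        # assumed sustained read throughput per SSD, GB/s
num_ssds = 24               # SSDs in the storage server
network_gbps = 200.0        # assumed usable network bandwidth to the GPU nodes, GB/s

storage_gbps = ssd_read_gbps * num_ssds
effective_gbps = min(storage_gbps, network_gbps)      # whichever layer is the bottleneck wins

supported_gpus = int(effective_gbps // per_gpu_read_gbps)
print(f"This subsystem can feed roughly {supported_gpus} GPUs at full utilization")
```

The real benchmark measures this empirically against actual training and checkpointing access patterns rather than a single averaged rate, but the bottleneck logic is the same.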
Starting point is 00:08:49 I've been having a lot of conversations with practitioners of late about what infrastructure requirements are shaping up to be moving forward. And one of the topics that keeps coming up is data management at scale. How does Solidigm tackle managing and moving data with speed and efficiency? Well, when you're talking about these large-scale datasets, it's another answer that necessarily includes other parts of the system, right?
Starting point is 00:09:26 We can't just say storage does that on its own and solves those problems. When we're talking about moving and managing large amounts of data, the software matters very much, and the networking matters very much as well, right? So when you look at the way these AI clusters are architected, typically you have a bunch of GPU servers with a certain amount of storage inside those boxes, although it's not a ton. A GPU server from NVIDIA might have eight or 10 slots for SSDs in there. You can fill that up and you can use up that space pretty quickly. And so in most cases, for these larger deployments,
Starting point is 00:09:59 you're also communicating across a network to dedicated storage servers that might be full of 24 or 32 SSDs. And you may have a whole bunch of those stacked on top of each other, right? So, of course, it's important to put the right SSDs in the box and make sure those are performant enough, and make sure they're high density so you can get some of those efficiencies of scale in terms of storing more data in fewer boxes. But if you're choking that with a slow network connection or poor orchestration of the software, you're going to get burned as well. That's why we're
Starting point is 00:10:31 very excited looking at emerging networking technologies. There's a lot of reason for optimism, whether it's Ethernet or some of these other kind of proprietary approaches. We are approaching a world in the next few years where the bandwidth over the network can, and will in some cases, exceed the throughput of storage devices. In other words, the network ceases to be the bottleneck, and then it's back on the storage devices, right? And then you can really unleash great performance by putting faster SSDs in those boxes than we typically do today. It comes down to partnerships. This is something that Solidigm spends a lot of time and energy on, and kind of a point of pride for us is identifying those movers and shakers in the ecosystem,
Starting point is 00:11:14 whether it's folks making GPUs, or folks building industry-leading software like WEKA, or working with the CSPs or neoclouds that kind of put all this stuff together. It's all done in partnership with them to really dial in the optimal combination of hardware and software, to make sure that as that amount of data continues to grow and grow, and the demands from the workload in terms of the speed of moving the data only get higher, we're able to keep up and deliver results that'll make AI users happy and ultimately create value for businesses at the bottom line. As emerging paradigms like in-memory processing and computational storage evolve, what's your vision for where intelligent storage is
Starting point is 00:11:56 headed next? That's a good question. Because there's so much investment and attention on doing AI bigger and better and more efficiently, there's a lot of mad scientists in labs around the globe that are working on some pretty cool stuff, right? And so if we take that view of the AI cluster as having storage in a couple of places, let's look at the GPU servers first. That's where we talk about direct-attached storage, those SSDs that are plugged directly into the GPU servers. So there's a lot of chatter now about high bandwidth flash, which is kind of an emerging
Starting point is 00:12:31 application for NAND that hasn't existed, or has only existed in a very limited form, in the past. But you can expect to hear more about that in the future, and how we can really unleash the performance of the NAND media and get it out from behind the PCIe interface. And then even within conventional SSDs, we'll continue to see that PCIe interface evolve, right? Most of the high-performance stuff today is all PCIe Gen 5, but we've got Gen 6 and Gen 7 coming behind it quickly. That's going to increase bandwidth by quite a bit. It doubles every generation.
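For a rough sense of that doubling, here is a small illustrative calculation of approximate per-direction bandwidth for a typical x4 SSD link by PCIe generation. These are ballpark raw-rate figures that ignore encoding and protocol overhead, not product specifications.

```python
# Approximate per-direction bandwidth for an x4 SSD link, by PCIe generation.
# Ballpark math only: raw transfer rate divided by 8 bits/byte, ignoring protocol overhead.

transfer_rate_gt_s = {"Gen 4": 16, "Gen 5": 32, "Gen 6": 64, "Gen 7": 128}
lanes = 4

for gen, gt_s in transfer_rate_gt_s.items():
    approx_gb_s = gt_s / 8 * lanes
    print(f"PCIe {gen} x{lanes}: ~{approx_gb_s:.0f} GB/s per direction")
```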
Starting point is 00:13:08 And so the GPUs will be much happier in terms of their ability to pull data when they need it. But that's also going to generate a tremendous amount of heat when you talk about these next-gen SSDs, and so thermal management for storage becomes a bigger deal. If you look at AI servers today, we're pretty worried about thermal management of the GPU and the CPU, and a lot of those use cold plate cooling because fans are no longer sufficient to keep them cool enough to prevent them from throttling.
Starting point is 00:13:33 We have not had to worry about that for storage in the past. We absolutely do have to worry about it going forward in Gen 5 and beyond. And so things like cooling storage with efficient cold plate mechanisms, or even immersion cooling, where you dump the whole thing in a tank of mineral oil or whatever the liquid is, that's going to be a bigger and bigger challenge and opportunity for storage vendors to innovate and solve some of those problems. Then, when you look at the other side of the AI cluster, which is the network-attached stuff, those dedicated storage servers that sit in racks maybe next to the GPU servers,
Starting point is 00:14:09 there's really cool stuff coming along, obviously, in terms of higher and higher densities per drive, right? So Solidigm has led for the last year with 122 terabytes in a single SSD about the size of a deck of cards. That will continue to grow and grow. We've announced plans for a 256-terabyte drive. And you can imagine it's not too far in the future before you're going to see Solidigm and others aiming at a petabyte in a single device, which was unfathomable even five years ago. That's just a wild amount of data. But in the world of AI, we're seeing the amount of data only go in one direction, and go very, very quickly higher and higher. So that's a couple of areas I'd keep an eye on. There are wild cards out there as well. You mentioned computational storage. What can folks do with that? How much work can you take off of the CPU or GPU by leveraging some of the compute
Starting point is 00:15:03 inside the SSD? There are some interesting experiments, and in limited forms there are some products available on the market, to do things like compression and decompression of data using SSD compute. Storage class memory was a big deal a few years ago and has been less so since, but a lot of people are now saying, hey, this might really be a good fit for AI workloads, and so we may see that start to come back. CXL is something that feels like it's been talked about forever and has never really landed in a way that has shaken up the market in a big way, but it continues to be talked about, and perhaps there are new applications for that in the AI world as well. So those are a few of the things that I would say keep an eye on, and we might see some
Starting point is 00:15:44 exciting innovations coming down the pike in the next few years. What you've been talking about is just an acceleration of innovation on a number of fronts, from interfaces to media to new classes of devices. If you were having a one-on-one with an IT architect looking at the next-generation storage layer for high-performance AI, what would you tell them in terms of guidance on how to plan for that? I would say, particularly as we're dealing with more data as time goes on, it pays to really get familiar with how data moves between different memory tiers and storage within an AI system.
Starting point is 00:16:24 And there may be significant, kind of untapped, opportunities for efficiencies there. So as an example, we recently published a white paper. You can check it out on our website; we have an explainer video with it. We worked with this great company called Metrum AI to try to answer the question: what happens if you move significant amounts of AI data out of memory and offload it onto SSDs in ways that people don't typically do? How does that affect performance?
Starting point is 00:16:51 How does that affect memory utilization? And we have some pretty interesting results. So we used this example use case where you have a video of a traffic intersection: there's cars, there's pedestrians, there's cyclists. And we feed it into this analysis pipeline that generates embeddings and creates a RAG database and then ultimately outputs a safety report that says, hey, this is what happened in the video, and these are the changes that could be made to this intersection to keep people safe. And then we ran that keeping all the data in memory, and we ran it moving as much as we could, using kind of industry-standard approaches,
Starting point is 00:17:25 outside of memory and onto SSD. And that ended up being a couple of things, mostly in terms of the data that we moved. We moved a lot of that RAG data that we generate in the course of analyzing the video. Typically that's in memory, and it can get quite big. We moved that to SSD, and then also some of the model weights themselves. You don't need to keep the whole model in memory all the time, right? You're typically only accessing certain layers at any one time. And so the ones that aren't active, can you put them on disk or SSD rather than keep them in memory?
Starting point is 00:17:56 And you can. We demonstrated that, and we wrote about how we accomplished it. And what we saw was, yeah, you can absolutely use less memory. That makes sense intuitively, right? If you're moving stuff out of memory onto SSD, you don't need as much DRAM. And we saw something like a 57% reduction in the amount of DRAM that was used on a 100-million-vector data set that we benchmarked.
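As a purely illustrative sketch of the general idea of keeping RAG embeddings on SSD rather than in DRAM (this is not the Metrum AI and Solidigm implementation from the white paper; the sizes, file name, and the random "shortlist" standing in for an ANN index are all made up), a memory-mapped array only pages in the vectors a query actually touches:

```python
import numpy as np

# Illustrative sizes; the workload discussed above used on the order of 100M vectors.
num_vectors, dim = 100_000, 384

# Build a vector store backed by a file on the SSD instead of by DRAM.
store = np.memmap("embeddings.f32", dtype=np.float32, mode="w+", shape=(num_vectors, dim))
store[:] = np.random.rand(num_vectors, dim).astype(np.float32)  # stand-in for real embeddings
store.flush()
del store

# At query time, reopen read-only: only the pages actually touched are read from the SSD.
vectors = np.memmap("embeddings.f32", dtype=np.float32, mode="r", shape=(num_vectors, dim))
query = np.random.rand(dim).astype(np.float32)

shortlist = np.random.choice(num_vectors, size=1024, replace=False)  # stand-in for an ANN index's candidates
scores = vectors[shortlist] @ query            # score only the shortlisted vectors
top5 = shortlist[np.argsort(scores)[-5:][::-1]]
print("top-5 vector ids:", top5)
```

Production systems use a disk-native ANN index rather than a flat memory map, but the resident-memory saving comes from the same principle: the bulk of the vectors stay on the SSD until they are needed.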
Starting point is 00:18:30 And then the other part was the model weights, and moving those from memory to SSD. That was really exciting, because what it allowed us to do was to run more complex models on GPU hardware that you typically just cannot run them on, period. Our demo included running the Llama 3.3 70-billion-parameter model, I believe, on an NVIDIA L40S. And that's a lot of alphabet soup there, but long story short, that is a combination of GPU and model that you typically cannot use in the real world. There is not enough memory on that GPU to fit that model. But we showed that by moving some of the weights into storage, you could.
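One generic, off-the-shelf way to get a similar effect is to let the model loader spill layers that don't fit in GPU or CPU memory into an offload folder on an NVMe SSD. This is only a hedged sketch of that pattern using the Hugging Face transformers and accelerate libraries; it is not the specific pipeline from the white paper, and the model identifier and paths are assumptions.

```python
# Sketch: spill model layers that don't fit in GPU/CPU memory onto an SSD-backed offload folder.
# Not the method used in the Solidigm / Metrum AI white paper; model ID and paths are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.3-70B-Instruct"   # assumed identifier; gated model, requires access

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",                 # place what fits on the GPU, spill the rest downward
    offload_folder="/nvme/offload",    # inactive layers live on the SSD instead of DRAM/HBM
)

prompt = "Summarize the safety issues observed at this intersection:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Layer-by-layer offload like this trades some generation speed for being able to fit the model at all, which is the trade-off behind the edge and legacy-hardware scenarios described next.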
Starting point is 00:18:58 And so you can imagine the implications of that in terms of maybe edge environments, where you have less power and you need to use GPUs that might have more severe memory constraints. You can imagine how that might apply to legacy hardware that an enterprise might already have that they want to repurpose for new AI use cases. What we're showing is new possibilities unlocked where, yes, you can run these modern, complex models on hardware that you didn't think you could before, and that involves this kind of SSD offload approach. And so there's a lot more work to be done there. Our white paper was just the tip of the iceberg, and we're getting ready to publish some more. So stay tuned on that closer
Starting point is 00:19:34 to the end of the year. In terms of the advice I would give someone about architecting the next-generation storage layer for scalable AI, I would say pay close attention to where data resides, and whether that's the optimal place: whether the way that we've always done it is the way we should continue to do it, or whether leveraging high-performance SSDs can gain you some efficiencies. And, oh, I didn't mention this, but we didn't lose performance at all when we did the SSD offload. That was the other kind of key finding.
Starting point is 00:20:06 In fact, we gained performance because the indexing algorithm that was used in the SSD offload approach was so efficient, our queries per second actually went up by like 50% versus running those queries with the data in memory. And so that's something I think a lot of folks
Starting point is 00:20:20 don't realize, and that would be a piece of advice I'd give to anyone kind of looking at how do we optimize and do this even better in the future. So, Ace, that was a lot of good information. Definitely some great work there with Metrum AI, so thank you for that deep dive. And I agree.
Starting point is 00:20:35 I encourage anybody who hasn't seen that paper and some of that data on our website to go check it out. Any other AI communities or collaborations or research that you've been working on to help shape future-ready storage solutions that you want to talk about? Yeah. I mean, almost nothing that Solidigm ever does is done alone in a vacuum. If you look at our key values on our website, and I know nobody ever really looks at that stuff, but if you go to About Us and you look at our key values, you'll see our corporate logo,
Starting point is 00:21:06 which is sort of this interlocking S thing. And it explains that what that actually means is partnership. It has to do with two partners coming together and fitting into each other. And that wasn't an accident. That was chosen as the company logo because it is so core to who we are and the way we approach these problems. So anyone you can imagine across the ecosystem: whether it's NVIDIA and their GPUs inside the servers, and how we work together on thermal solutions for compute and storage (we've written about that recently); whether it's the CSPs or the neoclouds that are making the hardware available for enterprises to turn on their AI and try to go solve those problems and extract the value to improve their bottom line;
Starting point is 00:21:50 whether it's the software geniuses who are working on orchestration, putting all the hardware pieces together and getting the most out of them, we're talking to those folks every day. And we are constantly growing in our understanding of what problems they face, what opportunities they see, and how we can contribute from a storage perspective. And so it's a great point of pride for us that almost any time you see Solidigm on stage somewhere, you'll see us with a partner. And that's because it's really embedded in our DNA. That's how we approach these things and solve these problems, together. When I think about what you've shared in this episode, I know that our listeners are going to want to engage further. So first of all,
Starting point is 00:22:35 thank you for spending time with us. I always learn something from you. And I'm so glad. My pleasure. Thank you. But where can folks engage with you to continue the dialogue? And then where can they go to find out about the solutions that you talked about, whether it's those 122-terabyte drives or other technology that Solidigm is delivering to the market? I'd say start with the website. We update that all the time. We just recently launched a whole new section of the website dedicated to edge AI problems and solutions and use cases. We're constantly posting new articles, the kind that we talked about earlier, talking about customer engagements and how we're solving problems together. You can check out our LinkedIn and our YouTube. We've got
Starting point is 00:23:16 new content on there all the time as well. And if you're going to be at any of the big conferences coming up in the fourth quarter of 2025, things like OCP or Supercomputing, we will absolutely be there as well, with bells on. We'll be sponsoring and have booths, so please come by and say hi. We'd love to talk to you more. Awesome. Thank you so much for being on the show. And Jeniece, yet another fantastic insights episode. Thanks so much for your collaboration. My pleasure. Thank you, Allyson. Thanks for having us here. Thanks for joining Tech Arena.
Starting point is 00:23:50 Subscribe and engage at our website, techarena.ai. All content is copyright by Tech Arena.
