In The Arena by TechArena - Storage Innovation in the AI Era with Solidigm’s Roger Corell and Tahmid Rahman

Episode Date: November 28, 2023

TechArena host Allyson Klein chats with Solidigm's Roger Corell and Tahmid Rahman at the OCP Summit about their company's heritage in the storage arena and how their SSD portfolio delivers the performance and efficiency required for the AI era.

Transcript
Starting point is 00:00:00 Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators and our host, Allyson Klein. Now, let's step into the arena. Welcome to the Tech Arena. My name is Allyson Klein. We're coming to you this week from OCP Summit in San Jose, California, and I'm so delighted to be joined by Tahmid Rahman and Roger Corell from Solidigm. Welcome, guys. Thank you. We're happy to be here. Thanks. Thank you, Allyson. So, Tahmid, why don't you just get started with an introduction of your role at Solidigm and your role in storage? Yes, Allyson, I manage a team of product marketing engineers who are in charge of maintaining the near-term roadmap and basically getting alignment from customers, looking into industry trends, and incorporating them into our products and our offerings.
Starting point is 00:01:03 At the end of the day, we are basically the trusted SSD advocates for our customers. Fantastic. And Roger? Yeah, sure. So I am the Director of Solutions Marketing. And basically what that means, Allyson, is developing our solutions positioning and kind of the key set of messages that support it as we bring our products to market. Now, Solidigm is one of the newest names in the storage arena, but you have a very long heritage. Can you explain where you came from? Indeed, sure. Good question. Let me start at the beginning, I guess. Our day one stand-up was December 30th of 2021. So we're just about coming up on our two-year anniversary. We're a combination of SK hynix's storage business and Intel's former NAND SSD business, selling the data center portfolio under the Solidigm brand.
Starting point is 00:02:11 And combined, we bring a little bit over 50 years of innovation to the market. So if I could go into that a little bit. Yeah, sure. So it was kind of interesting when I was developing what I call this histogram of combined innovation across the companies. Our lineage back to Intel basically brings us almost to the start of NAND, where we had a NOR flash prototype in 1988. So that's kind of the beginning. Then again, through our Intel heritage, we've been involved in helping create or co-create PCIe specification, the NVMe specification. We were actually, we're here at OCP and we are actually one of the, I believe, one of the first members of OCP all the way back in 2011 as a co-founder. Now, fast forward to today, and some of the innovations that we're most proud of
Starting point is 00:03:08 have to do with kind of our density leadership, enabled by our hyper-dense portfolio supported by QLC technology. We were first to market with QLC technology in 2018, and we have what we believe is the industry's strongest QLC portfolio currently in the market. That's fantastic. Now, Solidigm has been making waves with new product introductions of late.
Starting point is 00:03:32 We've had an SLC NAND product hit the market. You've also introduced QLC NAND. We'll get to that in a second. OCP Summit has been all about the AI era. And when you think about the AI era, it starts with data. Tamid, do you want to talk a little bit about how you view the specific challenges of AI from a storage perspective and how that's influenced SolidIne's engineering roadmap? Absolutely.
Starting point is 00:04:01 I think all of us are here learning the new requirements around AI. The market is really, really growing, right? One of the key drivers here is the data growth that is needed. Think about the AI models. They're increasing very fast. In some studies, it says 10x in two years. So I think with the growth of the AI model, there's also a need to grow on the density. And we are also seeing that a lot of these workloads are read heavy, which actually goes back to your reference to QLC, right? Do we have the right density of drive
Starting point is 00:04:41 in the portfolio? Does it have the right performance to do those data analogies that AI requires, be it inference or training? So as time goes on, you'll see these AI applications kind of spanning from the core to the edge, where at the edge, there'll be a lot of real-time inferences going on. And on the core, there'll be training servers where you, again, need a lot of density and a lot of real-time inferences going on. And on the core, there will be training servers
Starting point is 00:05:05 where you, again, need a lot of density and a lot of performance. I think we have both of them in our portfolio between all the way up from the SLC and all the way down to the QLC swim lanes. Roger, do you want to add something here? Yeah, thanks, Tamit. I guess I would just kind of emphasize the edge.
Starting point is 00:05:26 Not only I totally agree with Tamit that we're seeing inference at the edge, but we're also beginning to see kind of some lightweight reinforcement kind of learning or training at the edge, which further increases the need for fast, performant, high-density storage at the edge. And also, when you start talking about edge locality constraints, and I don't know, maybe we'll get into that a little bit more, that's where we think, again, a performant, kind of hyper-dense portfolio really pays off with those edge challenges. Now, you talked about the edge, and obviously the edge is something that is growing
Starting point is 00:06:05 every day. There are different use cases driving the edge, but one of the things is data gravity and not wanting to go to the cost or time expense of moving data up into a data center. What does that mean in terms of the workload composition at the edge? And what does that mean in terms of the types of storage that might be changing in those edge environments? So when we look at storage at the edge, we look at what we call locality constraints or locality challenges. You know, factors like size and weight. Edge nodes, for lack of a better term, continue to proliferate for reasons of faster responsiveness and lowering the cost of moving data. There are some studies that say moving data is four times more costly than storing data. So we see density as really kind of the key to addressing these size, weight, limited power availability, serviceability, and kind of operational efficiency challenges. In terms of workloads, we see a lot of these workloads, as Tahmid was referring to, as kind of read-dominant.
Starting point is 00:07:20 But certainly there needs to be some amount of write capability there as well, you know, to take the data in from the endpoint and infer from that. But, yeah, so we think it's all about locality challenges and how hyper-dense storage is really, we believe, best suited to address those challenges. The other aspect of the edge that is also becoming really, really critical is, you know, multi-tenancy at the edge. As you deploy a really large density of drive, you want to monetize it. There are multiple ways of doing that. You can use namespaces. You can manage them in a different way.
Starting point is 00:08:00 You can have a lot of control knobs on the device. And both NBME and ocp kind of gives you that framework and specifications to design around it so i think we're ready for that at the core implementation level now we have to just take it to the edge the other thing i want to point out is the security at the edge that also becomes really critical because now that you don't have direct access to it, you can send a service person to replace a drive. So remote accessibility, having a test station at the remote edge would be another critical
Starting point is 00:08:35 area of innovation. And I think in this case also, it will be playing a good role in getting all those security features implemented in their specification. Now, we talked about SLC. We talked about QLC. For those of our virtual audience that don't spend their lives imbued in NAND technologies, can you break apart those acronyms and talk about how those various products play in different roles for data centers? SLC stands for single-level cell.
Starting point is 00:09:06 And think about like one transistor that stores your data bit one on zero, just at one bit, right? And then as you scale from TLC to QLC, three bits per cell and four bits per cell. So if you just compare a TLC to a QLC SSD, you're looking at around 33% density increase, right? and four bits per cell. So if you just compare a TLC to a QLC SSD, you're looking at around 33% density increase, right?
Starting point is 00:09:32 And on top of that, here in Solidigm, we also have very, very dense design in NAND. So we can actually have up to 28% better density compared to the alternative TLC in the market, right? So taking everything into account, if you're really looking for density and the right performance, maybe the QLC SSD is a good fit for you. If you have a lot of caching going on and your ride endurance is much higher, right, the duty cycle of ride is much higher, then you will probably look at something like a TLC, one driver's per day, three driver's per day, or all the way up to an SLC drive, which is a 50 driver's per day. So I think we got you
Starting point is 00:10:12 covered for all these applications, you know, coming from AI analytics, CDN, all the way to OLTP database, and all the way up on high frequency trading. When you kind of look at that in aggregate from a portfolio perspective, so our view of the market is most of your listeners, everyone, all of your listeners, I'm sure, are aware is it's an increasingly bifurcated or segmented market. There is not a one size fits all. If you just look back a few years ago, you had a limited number of capacity points and you had a U.2 form factor and an M.2 form factor and maybe three endurance swim lanes. Now we have endurance swim
Starting point is 00:10:53 lanes from 0.5 drive rates per day to 50 drive rates per day. We have the entire EDSFF portfolio. We have U.2. We have capacity points from 3.84 to 61.44 terabytes, an industry-leading storage density on a per-drive capacity basis. So we believe that this portfolio enables our customers to really kind of find the right fit drive. Now that brings up a really important question. When we look at data centers today, the value proposition of SSD technology is just undeniable. But we still see some apathy with spinning disks continuing to run in data centers. What do you think is the driver for holding on to this antiquated technology? And what do you think is going to be the tipping point? Is it going to be performance?
Starting point is 00:11:48 Is it going to be latency? Is it going to be sustainability? I'll take a crack at it, I guess, just to kind of lead with furthering your point. Our estimates, as well as I would, I guess, say a consensus of industry analysts, is about 90% of core data center storage is still on HDDs. Just kind of blows your mind, right?
Starting point is 00:12:12 It's crazy. To your point, Allison, like why? Now, when you get further away, when you get to kind of a mid-tier edge infrastructure, when you get to kind of an edge endpoint server certainly you're going to see that hdd component come down but still like you're saying the core kind of clings to clings to hdbs in spite of you know tco modeling that strongly suggest maybe the other path would be a better solution. We're not saying there will not be a place for HDDs. We just believe that HDDs should get, I guess, moved a bit further down into the stack,
Starting point is 00:12:55 down into much colder archival storage. We believe the industry is beginning to, and certainly our hyperscalers, I think, is a testament to this beginning. And kind of storage innovators, some top tier, you know, tier one and two CSPs are beginning to see the TCO benefits. But I think it's going to be these modern workloads, to your point, that are demanding more data, demanding that they access more data at speed and do it in as efficient a way as possible. So I think these modern workloads could begin kind of accelerating that HDD displacement opportunity. Yeah, I agree with you, Roger.
Starting point is 00:13:41 The other thing is the performance density, right? So performance density to keep up with the data growth. If you need to maintain that, I think SSD is the solution. And when you put the whole story together, you see a tremendous amount of benefit. We just did an analysis on our P5336 drive, the 61 terabyte, on a 100 petabyte object scale deployment. If you replace those HDDs with the 61.44 terabyte P5336, you can see immediately a 47% TCO benefit. And that dollars-per-gigabyte discussion does not, you know, translate that story, right? Right.
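As a rough sanity check on the consolidation being described, here's a sketch of just the raw drive-count math for a 100 PB deployment (Python; the 20 TB HDD capacity is an illustrative assumption, and this ignores replication, overprovisioning, and the power and rack factors that a real TCO model like the quoted 47% figure would include):

```python
import math

DEPLOYMENT_TB = 100_000   # the 100 PB object-scale example from the episode
SSD_TB = 61.44            # the P5336 capacity point mentioned above
HDD_TB = 20.0             # illustrative high-capacity HDD (assumption)

# Raw device counts, before any redundancy or overprovisioning.
ssd_count = math.ceil(DEPLOYMENT_TB / SSD_TB)
hdd_count = math.ceil(DEPLOYMENT_TB / HDD_TB)

print(f"SSDs needed: {ssd_count}")  # roughly a third of the device count
print(f"HDDs needed: {hdd_count}")  # each one drawing power and taking rack space
```

Even this naive count hints at why the full-system comparison, rather than dollars per gigabyte alone, is where the argument lands.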
Starting point is 00:14:31 So I think the discussion and narrative needs to go to a full system level, you know, cost reduction. And not only that, I'm looking at this OCP, you know, giant poster here. It talks about sustainability. That's another talent of sustainability that needs to be, you know, giant poster here, it talks about sustainability. That's another talent of sustainability that needs to be, you know, pressed. And with that TCO benefit, you see the power reduction, rack consolidation, and the mere fact that SSDs have crossed the density point of baseline high-density HDD is also a testament to the fact that, yeah, we are there with the density and the performance and the full system level benefit.
Starting point is 00:15:10 Yeah, that's really great. Not to mention the cooling aspects of spinning disks versus an SSD configuration. as these alternative because you know air-based cooling is a lot less efficient than as we're seeing on the show floor liquid cooling immersion cooling I mean some of the sustainability benefits of those methodologies are just eye-popping versus traditional air-based cooling airflow cooling and using hyper dense storage in an EDSF form factor allows you to get even more out of those alternative methods because you can pack more storage in the same space. Talk me through, you know, you've talked about SLC, TLC, QLC. You've talked about your different capacity ranges. Talk to me about how you would work with a customer in terms of a deployment from the data center to the edge to really figure out what are the right
Starting point is 00:16:12 capacities and performance vectors that they need to be thinking about in terms of the tiers of their storage. And then how have your partners responded with, you know, unique systems that are designed for these environments and the workloads that run within them? Yeah, of course. I mean, as we were talking about it, the density range that our roadmap offers is really diverse. At the very beginning, we have a 200 GB SATA drive.
Starting point is 00:16:40 So, yeah, SATA is not going away. It still has its last legs and it's still running. So we have those SATA drives. So if you are a system integrator and doesn't want to move from legacy SATA drive, we have those solutions. But as you build up from there, there's different versions of fault factors that we're enabling. So if you're on an M.2 or on legacy system, at the edge, we would recommend you to go to an E1.S. Why? Because the connector is more robust. It is designed for the next interface.
Starting point is 00:17:13 It is also much more thermally efficient than an M.2 design. And last but not least, it is hot pluggable. So all these reasons are there to bring this EDSFS to the forefront. So talk about E1.S or E3.S or E1.L. If you are a capacity-optimized solution, you're deploying that, then probably an E1.L would be a good entry point for an innovator. E3.S would be a good replacement for a U.2 feature. Workload-wise, I think think as I was mentioning, the one size fits all doctrine of SSDs is over. One drive writes per day will not serve
Starting point is 00:17:57 on the market. Especially it will be an overkill if you put a one drive per day on a server that is mostly doing reads. So from that point of view, we are trying to differentiate our roadmap with essential endurance, value endurance, standard endurance, which is the one driver per day, and build from there the medium endurance and very high endurance. So based on where the workload is, you can actually choose these drives. Some of these drives use the same controller. So there is ease of qualification and ease of, you know, use of same farmer base. Right.
Starting point is 00:18:32 Right. So I think that's where SolidM brings the value that you choose which one is best for you. And we are there, as I said, your trusted story advocate to guide you through that process. When I think about OCP and I think about the future of storage, there's been a tremendous amount of work on specifications and then in the sustainability field. Also, how do we repurpose storage for second life? And what do we do for data sanitation? And in those efforts, what can you tell me about SolidIme's engagement in OCP? And where do you see the most value from this organization in terms of shaping the future
Starting point is 00:19:14 of storage? And not just for the hyperscalers, but for Broadmark? I'll take it from the technical point of view, and then I'll let Roger comment on it. So the data sanitization is definitely a feature that we have been supporting for a long time. And I think repurposing, reusing the drive is something that we think is the right thing to do because we pack a lot of endurance in our drives. Think about it. Our top line 61 terabyte has 210 petabytes written capability from an address point of view. If you compare it with an edge application like an ADAS, where you have the development vehicle running around in the city and collecting data,
Starting point is 00:19:56 at best, they will be probably getting 25 petabytes in five years. So still, it's not working, right? So how do you repurpose those drives? We give them the, you know, different wear out indicators so that the user can actually monitor the state of the drive and we can repurpose it as needed, right? And they can also do securities remotely and then get the data back or clean the data
Starting point is 00:20:23 so there is no remnant of old data, be it healthcare related or stuff like that. Right. Wonderful. I guess just to build a little bit on what Tamid said and maybe map it back to something I said a few minutes ago, we are super engaged with, we're really leaning into the OCP community. As I mentioned, we have our roots as a co-founder back in 2011. And just to kind of build on that a little bit, certainly OCP is extremely interested in
Starting point is 00:20:56 adopting the EDSFF form factor. We continue to build out, not only were we inventors of it, given our lineage back at Intel, but we believe that we've got the broadest EDSFF portfolio in the markets. We are working with, you know, we see the direction of where cooling needs to go in terms of alternative cooling methods. And we're collaborating with industry leaders on how our drives can kind of help these alternative cooling methods get even further optimized. And then you look at storage density and what that means to hyperscaler, leading with QLC, leading with, you know, first to market with 61.44 terabytes.
Starting point is 00:21:38 And then the sustainability benefits, as you mentioned earlier, and, you know, certainly in comparison to what we talked about a minute ago, the, you know, kind of the existing, you know, the 90% install base and all of the sustainability benefits that QLC can bring to the hyperscalers. So we're really leaning into multiple kind of tenants or initiatives within OCP. That's fantastic. Thank you guys so much for your time at OCP. I know it's a busy week for you guys.
Starting point is 00:22:13 One final question for you. I'm sure that those who are listening online want to learn all about Solidigm SSDs, the technology, and how to partner with you. Where would we send them for more information and to engage your teams? So go to solidigm.com, and we have all the information there. But if you want to reach out to us, both Roger and myself, we're available through LinkedIn. Just search us out. And then tomorrow, I just did a session today in the Explorer Hall,
Starting point is 00:22:43 so it will be probably recorded and available on YouTube. Tomorrow, there's a sustainability track. We are the only SSD vendor in the sustainability track. So I would like you guys to attend that. It's around, I don't want to give you the wrong information. It's in the afternoon, 2 o'clock-ish. So yeah, just go ahead and attend that Solid-N session. And we will be talking about thermal as well. Well, guys, it's always a pleasure. And this was no different. Thank you so much.
Starting point is 00:23:11 Thank you. Thank you. Thanks for joining The Tech Arena. Subscribe and engage at our website, thetecharena.net. All content is copyright by The Tech Arena.
