Podcast Archive - StorageReview.com - Podcast #126: A Myriad of Storage Topics with Quantum

Episode Date: January 12, 2024

Our podcast has Brian introducing Jordan Winkelman from Quantum Corporation.  If you remember, we… The post Podcast #126: A Myriad of Storage Topics with Quantum appeared first on StorageReview.com.

Starting point is 00:00:00 Hey everyone, Brian Beeler here with the StorageReview Podcast. We've got another friend of the podcast on today. He's a newer friend. Jordan and I met over, I guess it was coffee, or not quite over coffee, in the coffee break room of the Quantum office out in the Denver, Colorado area, which is where many good things happen. We got sidetracked and spent 45 minutes talking about AI and GPUs and security and surveillance and all things storage and AI and maybe even space related.
Starting point is 00:00:34 I'll have to remember what else we covered on that. But I'd like to welcome in my new friend, Jordan Winkelman, CTO of Quantum. Jordan, thanks for coming in. Thank you, Brian. So you're on the road today, but as I said, you and I met in Interactive when I was out looking at Myriad a couple weeks back, and then I ran into you at the SC event. But most good things happen near the coffee machine or the bar. Am I incorrect in that statement? No, I think you're correct. I drink about a gallon of coffee every single day.
Starting point is 00:01:09 And yeah, you get a little more descriptive when you've had a few drinks in you. Descriptive, yes. And if you can keep on the coffee and less hiding in the restroom dealing with it, then all the better. So the first question I got from our audience when I told them what was going on with this pod is they said, what is Quantum? And I thought maybe the easier question is what isn't Quantum, but I'm going to give you both questions and let you choose which one you want to answer. I'll answer a little bit of both. Okay. It kind of comes, you know, at the same time. So the high level is Quantum is a 40-year-old, 43-year-old storage company. We started in hard drives and DLT backup tape. You may remember the Quantum
Starting point is 00:01:53 Fireball or the Quantum Bigfoot hard drives that you might have had in your early personal computers. In roughly 2000, we actually sold the hard drive business to Maxtor, who then later resold that business to Seagate. And you might look at it as half of the hard drives in the world today have Quantum's intellectual property in them still. Around the same time, we acquired ADIC, the makers of the Scalar robotic tape libraries, the very large and LTO at the time, as well as our StorNext file system,
Starting point is 00:02:28 which is heavily used in the media and entertainment world for content creation, pretty much all the movies and TV shows you watch. But it's also used heavily in large scale archives because it has an integrated HSM that helps protect data for those same media and entertainment organizations, but also this country's and many others' intelligence world. Since then, we went on a bit of an acquisitions binge in the last five years.
Starting point is 00:02:56 We also created variable-block data deduplication. We use that with our DXi disk-based backup platform. We have the ActiveScale object storage platform, as well as the ActiveScale Cold Storage extension to erasure-coded tape, a true tape-based object store. We acquired the Pivot3 hyper-converged storage company, which has a significant focus on video surveillance, but also allows us to install GPUs in it. And it's the basis for our CatDV AI platforms to provide on-premise AI for media and entertainment workloads.
Starting point is 00:03:34 Obviously we're large in Scalar tape. We are the number one vendor of LTO tape to the hyperscale tape world, what you might consider deep or cold storage in the public cloud. And we are in seven of the top seven hyperscalers globally. I'm probably forgetting something,
Starting point is 00:03:52 but it's a pretty broad portfolio that can support the needs of almost any storage customer. Well, I was out there for Myriad, which is your newest product, right? So what's the pitch on Myriad and how that fits in with all of these other components? Well, Myriad is a next generation all flash file and object storage platform that presents as a NAS today. It'll present as S3 in the very near future.
Starting point is 00:04:16 And a little further down the line, we'll have a parallel file system client for it with GPU direct storage. So today it's all NVMe and RDMA internally, all orchestrated by Kubernetes, a true cloud native platform that can stand itself up in roughly 30 minutes just by plugging things in and letting it go. So a uniquely fresh storage platform that's really built for the future. All right, so we'll come back to that
Starting point is 00:04:43 because we've done some work on that as you're well aware and published around that recently. And I want to talk more about Myriad, but let's do something fun. Let's talk about the hard drives. So you started out with what Quantum was and actually it comes up every time that we talk about, you know, we're doing something with Quantum. We did a piece last year around the air gap security and tape libraries, which was really cool, and all sorts of other things. But people still, well, of a certain age, people still don't forget about the hard drives. I think one thing the industry, at least the younger guys these days, have forgotten is that at a point in time, there were dozens of hard drive vendors I mean this is
Starting point is 00:05:25 probably absolutely before your time as well, but you know the history, right? Absolutely. And, you know, I may look young, but unfortunately I'm not that young. My first hard drive was a 40 megabyte Quantum Fireball in my Mac IIsi back in the day. But you're absolutely correct. You had vendors like Maxtor, Toshiba, Seagate, Western Digital, HGST, IBM, Samsung. I'm sure I'm missing others in the mix and others that just didn't survive throughout the years. Right, I mean, it's hard.
Starting point is 00:06:01 If you think of all the places that you'd want to create a startup in, go raise a couple hundred million from Silicon Valley to go after hard drives can't be very high on that list, right? It's a pretty commodity market today, that's for sure. And I talked to the Seagate guys about HAMR recently, and the time to innovate, that's like two decades. So it takes a long time to go from idea to engineering to deployment. But the other thing is tape. I've got an LTO-8 cartridge from you guys sitting on my desk here that brings back a lot of memories for a lot of venerable IT admins. Tape was still pretty big at SC23 this year. So tape, despite it being perceived as a legacy media,
Starting point is 00:06:51 I don't think we've been in a stronger environment for tape. What's your take on that? Well, I'm of the belief that tape rebounded from a low point a few years back because of the major changes from being traditionally used for backup and recovery to more of a long-term archive and preservation medium today. What most people may not know is that the public cloud is as much as 50% tape-based cold storage
Starting point is 00:07:20 from the overall storage perspective. Whether the data is being moved from a customer's on-premise data center or a co-located data center, it's just going back into the tape once it gets into the cloud. The densities of tape, the low power and cooling requirements of tape
Starting point is 00:07:36 make it a very optimal format for that long-term preservation because it's as much as 80% lower cost than a hard drive-based object store and dramatically lower cost than the public cloud where you're paying infinitely forever. And then you might have those variable retrieve costs. So it's really a matter of where that tape lives today as opposed to whether that tape still exists or not. Do you think tape would be more sexy or appealing from a brand perspective
Starting point is 00:08:11 if the hyperscalers were less into obfuscating how they're using tape than whether or not they are? I mean, I know this is a perpetual debate with certain cloud providers for their low cost tiers that many have assumed have been on tape because how else could you provide whatever cents per gig for an eternity? There's no other media that would make any sense there in many cases, but the hyperscalers still don't really want to talk about that publicly
Starting point is 00:08:42 for whatever reason. Do you have a read on that? I don't know that it's relevant anymore. Okay. We sell a lot of tape with our object store. We don't call it tape, we call it cold storage. What matters is: does it meet the cost metric, does it meet the durability, and does it have the lifespan of what the customer wants to do to preserve that data? That's really the important part. So when you talk about object on tape, I mean, most people think about fast and erasure coding, it's spreading out bits over multiple drives or SSDs. With tape, you can't possibly erasure code across dozens of tapes, right?
Starting point is 00:09:38 I mean, how does that work? Absolutely wrong. That's exactly how the cloud works today. Whether it's your household names or not, whether the customer brings their own software, and in this case, I mean the hyperscalers or we bring the software, you are erasure coding across multiple libraries for availability, and you may be erasure coding across multiple geolocations, again, for greater durability or availability. Even allowing for entire site failures to go offline without impacting the availability of that data, and the customer can still meet their service level agreement to their own customers. What sort of data is this? Would this be, I know you guys have a big foothold in media and entertainment amongst others, would this be, we've made a movie, we've sent it out,
Starting point is 00:10:34 it's streaming and published and gone to the theaters, and now we need to archive all of this content because in 25 years we're going to remaster it and we'll wanna pull it back or is it traditional backup kind of use cases or what are some of the other use cases that you're seeing for Object with Tape? It's a very wide variety of use cases. It could be long-term backups where you might have your retention copies
Starting point is 00:11:01 for months or years or even decades, based on what type of business you're in, whether it's financials, whether it's life sciences, whether it's medical imaging. In the cases of medical imaging, the data may be required to be stored for upwards of 100 years past the life of the patient. So your regulatory issues in those cases, right? Absolutely. But also NSF-granted research projects where the data has to be maintained forever and be readily available to anybody if it's been funded by the federal government
Starting point is 00:11:35 here in the United States. So autonomous cars, pretty much anything you can think of that needs to be stored for long periods of time. Things that can be remonetized, obviously, customers really like if they can take those, I'll use media and entertainment assets. Think about how many times Star Wars has been redeployed to different tape media or different DVD, VHS, Blu-ray, whichever formats.
Starting point is 00:12:04 And when the next generation of 8K comes out, guess what? They're going to go back to those archives again, and they're going to do up-resing and restoring of all that media, and we'll be able to see the individual pores on Harrison Ford's face at that point. I'm not sure you're really selling me on the 8K version of Star Wars, but okay, I buy it. I mean, there's obviously a financial incentive to keep these assets to be able to do that. And even with, I guess, some of the scans, if you're getting, you know, an MRI or some sort of, you know, low-level scan like that, the technology that exists today to interpret those scans maybe is better tomorrow or with AI or with whatever else.
Starting point is 00:12:47 And maybe I want to go back and get those scans or have a deep archive of my own personal stuff to be able to track over time. I'm sure there's a thousand different reasons why having more of that data could be good for research or even on my own personal body or in a collection of bodies in aggregate anonymized, right? Well, to be honest with you, I can think of a great use case going back to 8K and Star Wars.
Starting point is 00:13:12 I want to go watch it on the MSG Sphere. Come on. On the outside of it or the inside? Well, both, obviously. You know, think about the resolution requirements when you're that far away from the actual screen. The higher the resolution in those environments, the more it's going to look like reality when you're sitting hundreds of feet away from, you know, the actual display at those extreme resolutions. So as more and more of those spheres get built around the world, which I understand is happening, I think that, you know, the amount of data that we're capturing for these, you know, imagery use cases,
Starting point is 00:13:55 is just going to explode, just like it has moving from standard definition to HD to 4K today. It's really a matter of where you're going to be consuming that data as opposed to watching it at home. So what does that do then, do you think? Because you talk about the Sphere, and that just took five or six short years to bring to fruition. I'm sure the next one will be a little quicker. But when we look at the AMC we've got down the road here, they've got a little data center set up there behind glass, and it's kind of neat to see, you know, the different stuff in there, but it's pretty much a download once and then play to a couple screens dozens or hundreds
Starting point is 00:14:32 or thousands of times to to monetize that that video asset but as those get bigger and bigger and and double or triple in file size each time to be able to produce those videos in the sphere or at a movie theater, you've got to play there too. How do you view that local storage requirement for media and entertainment? Are you sure that there's local storage at the AMC? Or do they have a really, really big pipe that's streaming those pieces of media live
Starting point is 00:15:07 from a large data center somewhere in the world that has a very, very secure security model to make sure people aren't stealing those extremely high-res assets? The way you ask the question makes me think I'm not sure that you might have an answer there. So tell me about that. How has that transmission become reliable enough that that's
Starting point is 00:15:26 a viable way to distribute media? It came about because it has to. In some cases, I can't say the name of the media companies, but they all have multiple data centers. We'll say one will be in Hollywood and one might be, I'll pick a world-class data center like Switch SuperNAP. Switch allows for extreme bandwidth failover between these two data centers. And so if there is a network outage or a data center outage, these companies are still able to stream that data live, whether it's to a movie studio for watching dailies, or whether it's a theater, or whether it's your favorite TV show that has episodes to get out onto the air before it's been broadcast. And those companies can get their advertising revenue, which, you know, is the most important thing to broadcast. How does this change then?
Starting point is 00:16:37 Because you guys have perspective here, too, and you're starting to talk about how some of it is the capture of content. You're talking about movies, where it's collected over time and shared and edited and that sort of thing. But you guys are involved in live content as well. Sporting events: I'm sure you're in the back of dozens if not hundreds of ESPN trucks, and CBS and Fox and all these guys. With those requirements and 4K, 8K coming for especially major sporting events, Premier League soccer, F1, Super Bowl, stuff like that, what do you have to do? What does Quantum do to make that even possible? We are actually very widely deployed in broadcast and film studios. I'll use sports as an example for the live events. I'll use a direct customer example because they're a public reference: Ultimate Fighting Championship. If you think about an event, they'll start the camera.
Starting point is 00:17:40 It'll run for four hours. There might be 16 or 20 different camera angles that they just ingest that video at 4K into a storage device. The content has metadata accessed and tagged to it in live fashion, whether in an automated manner or in a manual manner at the site of the event in that case. And that media is then broadcast all around the world, whether live on pay-per-view or in a post-process manner. UFC has a system called Fight Pass, I call it, Netflix for beating people up. And so the customers may watch in different portions of the world that have different unique requirements around what can be displayed on TV or on the web. For example, some countries can't show blood. So what happens in that case is the metadata will be used to figure out which pieces of video are relevant for which markets. And then, you know, if you have a particularly bloody
Starting point is 00:18:36 section for that market, you might cut to Joe Rogan for an extended period of time. We are also widely used through, I'd say, about half of the major stadiums throughout the United States today, whether NBA, NFL, you know, again, UFC, we cover the whole gamut, in conjunction with our StorNext file system, to do that metadata tagging, transcoding of video, and making it easy to cut and display, you know, live content up to the scoreboard, which happens frequently. Some of those stadiums also record every event that happens in them, like concerts or, you know, pretty much any type of political event, whatever they might have at that particular stadium. And so they'll make use of this live event media for monetization, for whatever the use case. But we are heavily used also for things like instant replay. We support the leagues as well as UFC. So most people don't know, but if you're watching TV,
Starting point is 00:19:45 if you're going to the theaters, if you're watching sports, there's an extremely high likelihood that the data is going through a Quantum product. That's interesting. Yeah. I know you and I chatted about the UFC bit. And what's interesting there too, I think, is that, I mean, we all know the big fights, UFC 243 or whatever, that are the $70 pay-per-view, you know, high action, high visibility showcases, but there are, I don't know, dozens of smaller support fights around that
Starting point is 00:20:16 that occur in between those events, right, around the world, that also have these challenges where they want to capture all this content and maybe they're not all live streamed, but you'd like to get clips, to your point, to your subscribers, but also to media outlets to promote these young fighters, to promote whatever else is going on. So there's a lot more to it. Talk about some of the sophistication with the cameras, because I know this isn't directly your responsibility, but obviously you interact with the camera feeds as they come across the wire to your storage and your other products. But you must have thoughts on what the trends are there in terms of the investment in the video capture devices at these events and stadiums and so on? Well, the cameras are getting more amazing, in higher resolutions with higher frame rates that allow you to take that
Starting point is 00:21:10 content and do more with it at a later date. We don't have 8K TVs that are widely deployed today, but you may still capture an event in 8K so that you can up-res that content and get a close-up without actually having to have an optical zoom. 4K initially, with, I'll pick on RED cameras, wasn't really used to get that higher resolution to go on TV. It was used to allow for a single shot to get both the wide and the close-up. So having more pixels at a higher frame rate provides better quality of the image and less motion blur, but also being able to re-utilize the same frame of video for more use cases with pan and scan and, you know,
Starting point is 00:21:54 not having to re-set up the shot for every use case, for, you know, the close-up or the wide shot again. And then AI analytics are starting to get into certain cameras. I'll pick on the video surveillance world, where they're looking for all sorts of, we'll say, license plate recognition or gunshot detection or just looking for that shady individual in the parking lot. The cameras are actually able to inform the security personnel of things that are actually happening in real time. Well, and from a public safety perspective, that kind of information is pretty critical. And yeah, I mean, obviously that would make a big difference. When you look at the cameras too, that we're talking about for sporting events, I mean, there's a certain nostalgia that
Starting point is 00:22:44 your guys must have for, like, the '70s, when the NFL had two cameras in the stadium. They had the wide shot and then they had maybe one up on the line. Now you've got, I don't know how many are in a typical stadium, but between all the different angles, your blimp shot, your helicopter, your pylon cameras, now cameras in the yard markers to get when that guy dives with the ball, so that we see the grains of plastic grass in between the ball to see where that is, or was the foot out. I mean, the view into the millimeters of difference now, which to your point go into a lot of these things
Starting point is 00:23:25 like replaying clips and the excitement around the sports, the video capture's really changed the game. So as these cameras have come online and now there's a dozen or more, maybe dozens in a stadium, what's incumbent on you from an infrastructure standpoint to support that? Well, obviously you need enough performance to support the live ingest of all of that data. There may be decks that are in solid state recording for what we call the melt or the mezzanine that gets displayed out to the pay-per-view or recorded live to broadcast. But looking at the perspective of, you know, a 4K camera, you may have cameras that are ingesting as low as 50 megabits per second and you may have cameras that are ingesting at over 600 megabytes per second. Our platforms allow for unlimited scale of performance. We design them based on the needs of the stadium, not only today,
Starting point is 00:24:24 but also providing capabilities to scale in the future when the next generation cameras come out. It's one of the unique things about Quantum's StorNext file system, that it was designed for this very purpose of ingesting video. In fact, it was designed for NASA for satellite ingest. If a satellite flies over and you miss a bit, it's never coming back. You're never going to be able to retrieve that data. And so we provide that same functionality and capability for pretty much everybody who's doing anything live broadcast or content creation.
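As a rough illustration of the ingest numbers quoted above, the Python sketch below adds up per-camera rates and estimates the capacity a four-hour event would consume. The camera mix (16 cameras at 50 megabits per second plus 4 cameras at 600 megabytes per second) and the function names are assumptions made up for this example, not figures or tooling from Quantum or any customer.

```python
def megabits_to_megabytes(megabits):
    """Convert megabits to megabytes (8 bits per byte)."""
    return megabits / 8


def aggregate_rate_mb_s(camera_rates_mb_s):
    """Sum per-camera ingest rates, each given in megabytes per second."""
    return sum(camera_rates_mb_s)


def capacity_tb(rate_mb_s, hours):
    """Capacity in decimal terabytes for a sustained rate over a duration."""
    return rate_mb_s * hours * 3600 / 1_000_000


# Hypothetical camera mix: 16 low-bitrate feeds and 4 high-bitrate feeds.
example_rates = [megabits_to_megabytes(50)] * 16 + [600.0] * 4

total = aggregate_rate_mb_s(example_rates)
print(f"Aggregate ingest: {total:.0f} MB/s")
print(f"Four-hour event: {capacity_tb(total, 4):.1f} TB")
```

The point of the arithmetic is that a handful of high-bitrate cameras dominates the total, so sizing has to follow the fastest feeds rather than the camera count.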
Starting point is 00:24:55 Just a little harder when you have lots of concurrent cameras. And how much of that now relies on the networking piece of the infrastructure? So I know you work with a bunch of different players there too, but is everything you're doing in this M&E space 100 gig, 200 gig? Do you benefit from these super fast 400, 800 gig NVIDIA, you know, Mellanox switches and such? So talk a little bit about networking and that impact in terms of these M&E workflows. So for decades past, we had, and still have, a technology called SDI, or Serial Digital Interface, that has a fiber optic cable from a camera that goes into a device called a router that gets
Starting point is 00:25:38 bumped into some kind of storage device, like even a tape deck, you know, back in the day. You would have a DigiBeta or XDCAM or a variety of other formats. We're moving to more IP-based technologies, a technology called SMPTE 2110, which is entirely IP-based and runs over networks. You may have compressed media, you may have uncompressed media coming across, but these data rates for, say, uncompressed media, which typically goes over SDI, are four or two gigabytes per second for 4K or upwards of eight to 12 gigabytes per second for 8K. Having the flexibility of one or many of the 100 gigabit or 400 gigabit Ethernet links into these systems can allow for, you know, actually meeting the requirements of the live ingest, but also being able to turn it around and push it up to the scoreboard very quickly so that the content of a live game is more relevant and
Starting point is 00:26:36 you know, the people in the stadium can see that instant replay that much faster. I'm sure you're making the Fibre Channel world quite sad. There's no room for Fibre Channel in the M&E world? Oh, Fibre Channel is still extremely well deployed in the media and entertainment world. Okay, good. So one of the things about IP technologies is they've grown a lot over the years, but Ethernet incurs a little higher latency
Starting point is 00:27:06 unless you're using RDMA protocols. Ethernet data coming in over IP has to be processed and acknowledged and checksummed by the actual CPU, which is going to have a potentially significant performance implication, especially at higher data rates. That's mitigated by RDMA, but you are still going through Ethernet switches, and some of it may be copper, some of it may be optical. But when you're talking about Fibre Channel, you inherently get RDMA because of the ASICs on the HBAs themselves doing all of that workload for the acknowledgements and checksumming. So the data may go direct from Fibre Channel out to Ethernet for whatever that purpose is to be edited by the user.
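Going back to the uncompressed data rates mentioned for SDI and SMPTE 2110, here is a back-of-the-envelope Python sketch of how such rates can be estimated and how many Ethernet links a set of streams might need. It counts active picture only and assumes 60 frames per second, 10-bit samples, and 4:2:2 sampling (two samples per pixel on average); the 80 percent usable-link fraction and the function names are also assumptions for illustration. Higher bit depths, 4:4:4 sampling, or higher frame rates push the numbers up, so the output is not meant to reproduce the figures quoted in the conversation.

```python
import math


def uncompressed_gbit_s(width, height, fps, bits_per_sample=10, samples_per_pixel=2):
    """Active-picture data rate in gigabits per second.

    samples_per_pixel=2 corresponds to 4:2:2 chroma subsampling (assumed).
    """
    return width * height * fps * bits_per_sample * samples_per_pixel / 1e9


def links_needed(stream_gbit_s, streams, link_gbit_s, usable_fraction=0.8):
    """Links required for a number of streams, leaving headroom (assumed 20%)."""
    return math.ceil(stream_gbit_s * streams / (link_gbit_s * usable_fraction))


uhd_4k = uncompressed_gbit_s(3840, 2160, 60)
uhd_8k = uncompressed_gbit_s(7680, 4320, 60)

print(f"4K60 10-bit 4:2:2: {uhd_4k:.2f} Gb/s ({uhd_4k / 8:.2f} GB/s)")
print(f"8K60 10-bit 4:2:2: {uhd_8k:.2f} Gb/s ({uhd_8k / 8:.2f} GB/s)")
print("100GbE links for 16 such 4K streams:", links_needed(uhd_4k, 16, 100))
```

Under these assumptions a single 8K stream is four times a 4K stream, which is why link counts climb quickly as resolution goes up.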
Starting point is 00:27:49 We may see more of the Fibre Channel on the backend of the infrastructure, as opposed to going out to the editorial clients or to broadcast. But there is some benefit to, you know, I don't want to call it the obscurity because it is well deployed, but a dedicated storage network. Whether you're using Fibre Channel or iSCSI or iSER or RoCE, it doesn't really matter. You probably are going to have a dedicated storage network so that
Starting point is 00:28:17 you're not impacting the capabilities of the storage for your traditional end user networking. So it's really a protocol question. It's still going to run over the same fiber optic infrastructure. I think sometimes it's based on the comfortability of the engineer, who may have been working in video for decades at that point in time. So I still sell a lot of Fibre Channel, is what I'm saying, and all of my tape is either SAS or Fibre Channel. And so I don't see Fibre Channel going away anytime soon. No, I suspect not. And I ask it kind of half-heartedly because I feel a little sad for Fibre Channel, because all the cool kids are on to AI and other things, and Fibre Channel is kind of like, yo, you know, we're still here, but they don't get
Starting point is 00:29:01 brought into any of the fun. I suppose they're doing the blocking and tackling, as you say, in these storage SANs and are still the primary go-to for enterprise storage. I mean, half the stuff over my shoulder would be, you know, Fibre Channel connected, but there's still room, right? They still tend to increase the capabilities. 64 gig Fibre Channel is here.
Starting point is 00:29:30 It's not super widely deployed yet. But for certain of my customers in the uncompressed video space, it's still the king. There's capabilities of true load balancing with Fibre Channel, so you can get greater aggregate performance across multiple links than you're able to achieve with various Ethernet protocols today still. And so you're still going to find those ultra low latency, ultra high bandwidth requirements. Customers in my space are not necessarily moving away from Fibre Channel in media and entertainment. Now, in less real-time workloads, IP is definitely taking over. Right. Well, there's a cost advantage there too. And then overall, how do you compare the throughput? Because you talk about 64 gig fiber is kind of where we're at now or where anyone
Starting point is 00:30:17 making an investment now is refreshing their fabric and their NICs or HBAs with 64 gig, but it's not necessarily a one-to-one line, 64 gig fiber versus 100 gig Ethernet. How do you think about that or characterize where those benefits lie for a customer? Well, I think things are changing a little bit now that we have Gen 4 and Gen 5 PCIe. A couple years ago, where things were predominantly Generation 3 PCIe, a 16-lane PCIe bus was 128 gigabits. Gen 4, 256. Gen 5, 512. You're finally able to get more bandwidth to those high performance Ethernet ports than
Starting point is 00:31:07 you were previously, even a couple years ago, where you're now starting to outstrip on the bandwidth side of things with Ethernet. I have a storage array called the F2100 that can read up to 55 gigabytes per second with 100 gig Ethernet, based on using eight 100-gigabit Ethernet ports. Well, I don't have enough PCIe slots in that system to drive more than 55 gigabytes per second on Fibre Channel. So there's trade-offs based on what the customer's network is. But I think that there's more ability to scale in the future with Ethernet because of these, you know, newer generations of PCIe. Yeah. And I mean, 5 seems fresh and new to be what storage's impact on AI going forward.
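The PCIe figures quoted above (128, 256, and 512 gigabits for a 16-lane slot) line up with the raw per-lane transfer rates of 8, 16, and 32 GT/s for Gen 3, Gen 4, and Gen 5. The Python sketch below reproduces that arithmetic and applies the 128b/130b line encoding to get a slightly lower usable figure per direction. It ignores packet and protocol overhead, and the function names and the "ports per slot" estimate are assumptions for illustration only.

```python
# Per-lane transfer rates in gigatransfers per second, by PCIe generation.
TRANSFER_RATE_GT_S = {3: 8.0, 4: 16.0, 5: 32.0}

# Gen 3 and later use 128b/130b encoding: 128 payload bits per 130 on the wire.
ENCODING_EFFICIENCY = 128 / 130


def raw_gbit_s(generation, lanes=16):
    """Raw line rate per direction in gigabits per second."""
    return TRANSFER_RATE_GT_S[generation] * lanes


def usable_gbyte_s(generation, lanes=16):
    """Approximate per-direction bandwidth in gigabytes per second after encoding."""
    return raw_gbit_s(generation, lanes) * ENCODING_EFFICIENCY / 8


def full_rate_ports(generation, port_gbit_s=100, lanes=16):
    """How many Ethernet ports of a given speed one slot could feed at line rate."""
    usable_gbit_s = raw_gbit_s(generation, lanes) * ENCODING_EFFICIENCY
    return int(usable_gbit_s // port_gbit_s)


for gen in (3, 4, 5):
    print(
        f"Gen {gen} x16: {raw_gbit_s(gen):.0f} Gb/s raw, "
        f"~{usable_gbyte_s(gen):.2f} GB/s usable, "
        f"{full_rate_ports(gen)} x 100GbE ports at line rate"
    )
```

Read this way, a Gen 3 x16 slot can keep only one 100 gig port busy, while a Gen 5 slot has room for several, which is the scaling argument being made for Ethernet.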
Starting point is 00:32:09 Because when we look at any of these, we were just at AMD's event this week where they put up the GA on the Instinct MI300. If you look at any of these 8-way or even 4-way H100 systems from NVIDIA, Intel's got some offerings as well, none of these GPU servers are storage heavy. You're talking a little bit about PCIe slots, but fundamentally there's like a lane congestion or a lane challenge, right? There's only so many devices that you can jam into one thing. In fact, just for fun, we took a Bergamo server, attached a JBOD to it. And we noticed that once we filled up all the bays and started filling up the bays on the JBOD with NVMe drives, that we actually oversubscribed the lanes and started to lose things like the USB ports on this server. So that's us just being a little bit silly and overzealous, but it is a
Starting point is 00:33:08 legit problem. And so part of that has been the sacrifice of how many storage drives are supported in these systems. But then the question is, if I'm going to put a million dollars into this GPU system, it sure as hell better be doing something 24-7 or close to it. If that's true, then how do I fuel this thing with whether it's GPU direct storage or otherwise? And so, I mean, you must have thoughts on that as well. Absolutely. So I'll use our new Myriad platform as an example here. Myriad is, as I mentioned,
Starting point is 00:33:42 today it's a high performance NAS. You can have hundreds or thousands of concurrent connections with extreme performance because it's all NVMe. We can scale out the number of nodes, and our performance and metadata performance scale linearly along with the nodes and your capacity. Some of the platforms get more efficient from a storage perspective as they get larger,
Starting point is 00:34:02 just like Myriad. And systems are being designed to perform differently today than they were in the past. You might have had high-performance systems for structured data, which would be dramatically different than high-performance systems for unstructured data where they access the storage arrays
Starting point is 00:34:20 dramatically differently. AI workloads are a combination of both. Training workloads are typically built up of millions or billions of small image files or some form of tagging and analytics, as opposed to large video files. You don't actually feed a large video file to an AI. So you need high-performance storage at large scale, whether it's the tier 1 NVMe or the tier 2 HDD or the tier 3 tape media for that longer-term ability to re-access that data. But you need the interconnects. You need those high-performance Ethernet interconnects. Ethernet is the network
Starting point is 00:34:59 platform or the network protocol of choice for uh. You brought up InfiniBand. Well, InfiniBand is still used heavily in AI, but it's really the interconnect between the GPUs themselves as opposed to the network or the storage. And so we're redesigning storage for the future to be able to take on any workload. And that's what Myriad is really about, is being able to take whatever
Starting point is 00:35:25 your workload today is and be able to handle it, because it was designed for all of these different use cases, not just one. Well, I think you hit a good point there. And I think one thing that, I mean, look, AI means a hundred different things to a hundred different people if you sit there and talk to them about it. I'm sure you saw that at SC23, is that you've got a wide chasm of practitioners that are doing AI on a workstation versus doing training and developing of models versus some of the inferencing workloads. It could be at the edge or elsewhere versus some of the heavy duty training in these big GPU boxes. But my sense of where we want to go with this as an industry is I don't think we want a silo stack just for AI. And that duplicating storage just to put faster things next to the
Starting point is 00:36:20 GPUs doesn't make sense. So what we really need is the primary storage arrays that we have now, or that we're, that we're investing in now, maybe is the better way to say it, are capable enough to handle my traditional SQL server workloads or whatever else is in the business and whatever the, the, the AI ops team needs to go create new business value out of these assets. So that's, I think that's the big challenge and to have a system that's capable of doing that. Is that how you see it too?
Starting point is 00:36:55 Yes and no. I still see a breakdown between structured data and unstructured data. The types of storage devices that deal with the two platforms are dramatically different. You're going to have different kinds of controllers. You may have lower bandwidth, but tens of millions of IOPS. Unstructured data doesn't really work that way. The benefit of some of these more modern platforms is the integrated deduplication and data reduction technologies so that when you're buying these high performance flash systems you're not again as you said duplicating that data unnecessarily and having multiple different storage platforms and also having these extremely
Starting point is 00:37:37 high performance single namespace environments allows for multi-protocol access, whether over CIFS or NFS or S3 or GPUDirect, all under the same single namespace. So providing that capability to access it at high performance from whatever the client type is, is critical so that you can minimize that expensive NVMe cost. And then we still offer, as I said, those middle tiers and those colder tiers of storage to help minimize that long-term cost.
Starting point is 00:38:09 But again, that's going to be about the preservation as opposed to what you're working on today. You still need to find the appropriate place for that data to live based on its availability to generate money or what your durability and preservation requirements are. So in your role as CTO, then how are you advising organizations on how to make these infrastructure investments? Because I think the show pony is the GPU server because, I mean, it's so expensive for one, but that's where the business sees the value in terms of deriving intelligence from the data at hand. But the AI guys aren't necessarily storage and infrastructure and networking guys. They might be model guys and training. I mean, it's totally different skill sets.
Starting point is 00:38:58 How do you bring some semblance of organization to these investments? Well, that is one of the big challenges. There is an expertise across multiple different skill sets as you brought up to be able to provide this capability. So you're gonna have data scientists who are the consumers. They're gonna be the stakeholders of, you know, why you're buying this in the first place. They may or may not be the scientists
Starting point is 00:39:25 who are writing their own software and applications for these AI workloads. You may have developers in CI/CD pipelines who are assisting those data scientists, but there may be a misconception about what's more expensive, the storage or the GPUs. And you have to be able to feed those GPUs: eight H100s in a single DGX box. Well, how much bandwidth can those H100s consume from that storage? The storage environments can be
Starting point is 00:39:55 physically larger than these GPU environments, and when you have a rack of NVMe, you can be using the equivalent amount of power that those GPUs are actually using. I think there's also a bit of a misconception in the world about which uses less power, hard drives or SSDs. It's not as obvious as you might think. An SSD generates a whole lot more heat than a hard drive does. And so the power and cooling requirements of your data centers grow along with those for those GPUs as well. So they're both equally critical. You can't run those GPUs without the storage
Starting point is 00:40:33 to keep them fed. No, I mean, absolutely. I think that it's fundamental and that organizations that go out and just buy GPUs, whether they're add-in cards or these expensive socketed systems or whatever, are going to be wildly disappointed if they think that just throwing money at GPUs is going to solve anything or create a business value, right? I mean, it's well more than that. I didn't think
Starting point is 00:40:56 we were going to go here, but I am curious now that you brought up some of these heat and thermal dynamics and power consumption. I know that Quantum doesn't sell liquid cooling, but as CTO, you must be thinking about that and looking at what your customers are doing. With liquid getting more common for these GPU servers, what's Quantum's take generally on liquid in the data center or what are you guys seeing? And do you have any thoughts on that
Starting point is 00:41:24 in terms of guiding enterprises on what that investment may look like in the future? Typically we're not involved in the, you know, construction of the data centers, but those customers that are working in large scale HPC environments are accustomed to working with, you know, liquid cooled data centers today because of the unique properties of a supercomputer or an AI cluster. As long as the liquid is appropriately routed through whatever mechanisms they're using to draw the heat out of the servers, to us, it's kind of irrelevant. From a cooling perspective, we actually focus more on
Starting point is 00:42:06 maintaining the stability of tape media, to be honest with you. We have a prototype system that we call Curator that has the ability of maintaining solid state cooling and humidity control so that the tape media, which can actually be more impacted by humidity today than temperature, stays in the appropriate realms and inside the appropriate tolerances. Think about these large scale tape consumers like the hyperscalers, do you think they actually want to put the tape in the data center? Or do you think they want to put it in an environmentally controlled box that sits outside the data center? A greenhouse is what you're suggesting, a nice humid greenhouse? I actually don't know what greenhouse is, but we sell a big box that you can keep your stuff cool outside. No, it was like a literal greenhouse, you know, with plants and stuff in it.
Starting point is 00:43:09 A nice oasis for your tapes and data to exist in outside of the data center. No, that's interesting. It's another consideration, too, in this overall story. So, you know, cooling and power is definitely one thing. And as you say, getting more things out of the data center, that makes a lot of sense. That's an angle I hadn't actually considered. I really am not a huge fan of the 2024 predictions thing.
Starting point is 00:43:41 And I'm not asking you for predictions, but as you're talking to your customers going into this coming year, what are some of the other concerns they may have that may not be obvious to the rest of the enterprise world? What are you hearing there that maybe we should be thinking more of that we're not? I can't think of anything specific offhand outside of just the nature of the data growth.
Starting point is 00:44:08 The sensors that we have today are just that much more minute, with the ability to gather that much more data. Think about genomic sequencers today, they generate orders of magnitude more data than they did previously. And similar to a satellite, if you don't get that bit of data in the genomic sequencer, you lose it. And so there's just the data growth in general. We don't expect it to slow down; we expect it to grow exponentially moving forward. So it's just a challenge of how do you ingest all that data in the appropriate time.
Starting point is 00:44:46 I do see a trend of, I don't want to call it cloud repatriation necessarily, cause there is a significant cost with that, but we are seeing consumers and customers realizing that continuing to put all of their data that may not make revenue or generate a profit in the cloud may be cost prohibitive. So I recommend to customers that they put the data in the cloud that does generate money
Starting point is 00:45:11 because it is making money to support itself and to support the business. And that data that doesn't generate revenue but has intrinsic value, store on-premise in a low-cost storage medium. And in some cases, some of my larger enterprise IT customers, they have the product as opposed to the corporate IT data sets. And the product may go into the public cloud, again, for that, you know, high performance elasticity of the cloud. But that IT data that has to be held onto
Starting point is 00:45:45 for compliance may get redirected to an on-premise tape-based object store. Let's say car companies generate exabytes of data that you probably don't think about. And it has to go somewhere and be saved for- It has to go somewhere. A very, very long time. Yep.
Starting point is 00:46:06 Yeah, I mean, we started out by talking about tape, and coming back to it here, why isn't there enough hard drive capacity with intelligent compression and dedupe to make up for this void? Why is tape still the best answer? And actually, while you're answering that, why didn't Blu-ray make it as a long-term archive solution? I know it still exists, but not the way that Meta and others or Facebook at the time had tried and had moderate success with.
Starting point is 00:46:47 I'm going to step back to CDs briefly and talk about how we had this belief that CDRs were going to work forever and preserve data forever. Pressed CDs, if they were held in the appropriate environmental conditions, could be. Unfortunately, CDRs, due to the nature of how the technology works, and people might think that the laser came from the bottom side and went through the polycarbonate layer, but in reality, the laser went through the top side of the aluminum layer on that CDR, and that place that I myself might have taken the CD out of the player and flipped it upside down, thinking that I didn't want to scratch the plastic. Well, guess what? I scratched the top. I did more damage to the medium than I thought I was doing, and that's why it stopped reading in my
Starting point is 00:47:34 CD player. Also, we took all these Sharpies or pens and wrote on the back of these things so that we knew what they were. Or worse, and I did this myself, I used to go buy those labels and I'd slap the label with an adhesive on the back of that aluminum and it would eat away at the aluminum, and what might have been a five-year lifespan now is dramatically reduced. And so we all learned over time that maybe you want the inkjet printer to print your label so that it's protecting the media, as opposed to potentially damaging it. The same thing applies for more modern optical technologies like Blu-ray. But I was talking to a vendor who wanted to move away from Blu-ray today, this morning, in fact.
Starting point is 00:48:17 They've got 1.5 million Blu-rays at 25 gigabytes a pop. So it's a very stable platform, but it's storing one and a half million Blu-rays at 25 gigabytes each. I haven't done the math, but that's actually not that much data. Well, I guess at 25 gig, it sounds like decent size. Well, the other way, a million sounds like a hell of a lot of disks
Starting point is 00:48:41 to try to figure out how to move into something else. But wasn't the dream, I mean, you would know better than me probably, I think it was an OCP dream that Facebook had, that they'd get a hundred years out of these Blu-rays in this deep, deep, deep cold archive? But is that not the reality for current optical media technology? So this 1.5 million won't last forever, but how long do you think they really get out of those? Honestly, I don't know. I'm not going to put out a number without having my own either anecdotal or known values on that. But I will say that we are seeing new technologies coming to market that might be optical, might be ceramic, might be DNA storage.
Starting point is 00:49:26 There's a lot of different technologies. We've seen Microsoft talking about Project Silica again and how that medium, because of the way they're storing the data in voxels, I think was the term they used, can theoretically last forever. Tape media will degrade, DNA will degrade, but at least with DNA you make so many copies of the data that you have so much of it; you have petabytes in the tip of a pen. The problem with some of these media is that every time you read them you destroy them. For example, every time you read a DNA sequence for archival purpose, not only is it extremely slow, but you destroy that strand of DNA. So you might have dozens or hundreds of the exact same strand of DNA in that piece of media, whatever you want to call it, so that you can actually read it multiple times. There's a lot of challenges based on whatever the media is.
Starting point is 00:50:25 That's an extreme erasure coding algorithm for DNA then to make sure that you've got enough resilience built into the media, right? We're definitely still a ways away from DNA-based storage. Quantum has invested significantly in the DNA Storage Alliance, and we have a couple key members of our team on that standards organization. I'm not really sure what to call them at this point. Well, you must, I mean, so that's a good point. I mean, you guys must invest in a lot of these with people, if not financial backing, because, I mean, you probably have thoughts, but may not really care what gets adopted at the end of the day. You want to be able to support the new, whatever it is. And the earlier you're engaged on these things, the better connected you are,
Starting point is 00:51:16 and able to ingest your, get your own thoughts and opinions into these standards boards, right? And get something productive out the other side. The way to look at it is everybody in these consortiums is bringing their own tidbit of knowledge and technology. So maybe the knowledge is the encoding to those medias for that error correction. Maybe it's the automation of moving that media out of whatever the containerization is to put it into whatever the drive quote unquote format is. Maybe it's working on instead of having a dedicated piece of media and a dedicated drive, you know, selling the idea of the media is the drive again,
Starting point is 00:52:03 but with much slower speeds and much greater durability. We all have our investment of human resources in the research and development space that we are all contributing, whether it's to LTO or whether it's to DNA storage or whether it's to optical mediums. You'd be surprised how many places Quantum is involved in those automation or encoding areas of the world. It's interesting. And yeah, I haven't thought about, you know, you see these releases that come out every now and then with new media and it seems so fanciful. It'll be a guy with a roll of film and is like, here's one roll and it can hold all this, but this is the only one.
Starting point is 00:52:49 And now we have to figure out how to commercialize it. I mean, we talked about 20 plus years for HAMR technology. I mean, these things are clearly, not only are they not overnight, they're decades worth of investment, right? Absolutely. I mean, think about how long it takes between individual generations of LTO media. You know, four, five, six years of investment of
Starting point is 00:53:13 making the track densities that much closer making the read heads be able to look through the media different angles to try to get you, the ability to write inside more layers of that media. Think about how Blu-rays work across multiple layers to get that increased density. There's lots of ways in which that has to be taken into account. So with tape, I mean, we talked a lot about your object on top of tape. What else should people be excited about? Is it density? Is there a cost advantage?
Starting point is 00:53:54 When I posted some videos from SC about tape, there were a couple solutions there that I think are really interesting. And we always get great engagement even on when I was out working with your robots with the clear lid just watching that guy go back and forth and grab tapes and put them in the drives and back into storage. People love that but it doesn't get sort of the shiny sparkle that other technologies in the data center do. Is there anything coming that we should be excited about? Or is it just a progression that, you know, where tape's going to be somewhat maligned rightly or wrongly, maybe wrongly, from a creativity and shininess perspective?
Starting point is 00:54:43 The latest and greatest fun thing will always get the most interest and you get the eyes on it. I actually think that object storage on tape is one of the shiniest, newest things. So maybe I'm jaded, but I find it very interesting. You sell it.
Starting point is 00:55:00 The technology challenges. Well, people think tape is slow, but they're very wrong. The individual tape drive, LTO-9, each drive can read or write at 400 megabytes per second, and that's faster than many RAID technologies today. So the challenges are great, especially when you pack a lot of tape media and a lot of tape drives into a very small physical footprint. You get that performance density, you get that capacity density. And a lot of what we're looking at how to do today is how do we pack more drives and more media into the same physical footprint? I'll call it an arms race between Quantum and its competitors in the market about who can make the most dense library that's the most cost effective
Starting point is 00:55:44 for our customers. In the hyperscale space, it's really about physical floor real estate. And that's how many drives I can get into a rack or is it tapes and drives or tapes? What's the ratio that you've got to fight for there? You give up, there's trade-offs. So in the hyperscale, the customer may determine that they just want to trickle the data in and a couple drives are all that's necessary, and what we'll do in that case is we'll pack more tape slots into those drive bays. Again, it's really about packing in as much media to get the greatest density on the floor to support how much limited power, cooling, and physical space those data centers have. Some customers want more tape drives because it's really about the high-performance ingest,
Starting point is 00:56:37 but once they fill up that library, it might move a lot of those tape drives into another library. That way, that very expensive tape drive investment can be reutilized over the course of time. That's a fairly common thing that the hyperscalers are doing. Some of our competitors focus more on tape slots. Some of them focus on, you know, instead of having a vertical rack like our hyperscaler models, they might have more enterprise tape libraries that, you know, may give them the best serviceability, take up odd spaces in the data centers.
Starting point is 00:57:12 We tried to pick the optimal way of getting as many tapes and as many drives as we can inside the same physical footprint. So there are trade-offs. Is there a need, you know, because I've seen your i Scalar rack, the tall ones too, not just the little units. Remind me though, because we're talking about multiple drives in these units, and of course, the hundreds or more tapes. Is there an opportunity for multiple robots within these racks? Or is it still just a one robot? Is that the max efficiency?
Starting point is 00:57:48 Well, I'll talk about my ActiveScale Cold Storage technology and what we call RAIL, or redundant array of independent libraries. The reason I bring this up is that in the world of the cloud, you're designing for failure. You're going to have drives fail. You're going to have tapes fail. You're going to have robots fail. But if you're erasure coding across multiple libraries, then who cares if the robot fails? Right. So build the consumer off the shelf product, you know, and expect that something is going
Starting point is 00:58:19 to fail, but provide a simple way to repair when that robot fails. So we have something we call the service module where the robot will go home, you hit a button and twist a lock, you pull it out, you swap in a new one, and five minutes later that robot's back online. So in hyperscale data centers, you're just planning for that failure. That's just the reality of it. Some customers do prefer enterprise libraries that take up a bigger physical footprint and maybe scale wide as opposed to vertically. And you may have a limit on the number of robots you have in that individual library, but you have multiple
Starting point is 00:58:57 robots. I will say that different hyperscalers have different models. Some use the vertical and some use the enterprise. Is there a movement? I don't even know. Does OCP have a tape movement? Is there a working group for LTO or tape generally within OCP? Are you aware of that?
Starting point is 00:59:16 Not so much in OCP, but we do have the LTO consortium, which is about defining the media and the actual tape format itself. There's nothing specific to OCP. In reality, when we show up at a hyperscaler, everything's racked already. In the box, we have a shipping crate. It rolls out, it bolts to the floor, slam in the media, and you walk away inside five minutes.
Starting point is 00:59:41 As long as they're able to pack the media and they may or may not care about OCP for the tape platform like they do for the server and disk-based or flash-based storage platform. Right. All right, well, we've covered a tremendous amount of ground here. I actually thought we'd get more into Myriad
Starting point is 01:00:00 and it's my fault because I kept you off on these other topics that I thought were wildly interesting. But we do have a paper on Myriad, so I'll link to that in the description and people can check that out. I will say one thing that I think is really cool there that we didn't talk about, but I just want to tease because I think it's so neat, is the way you guys use the switches there for deployment, for load balancing. There's a lot of really cool, like, just little nuggets. Myriad's full of these little nuggets that, you know, I'm probably diminishing the value by calling them little nuggets,
Starting point is 01:00:34 but there's so many little pieces that are cool that when added up together, Myriad's super cool. So we'll link to that and definitely check that out to learn more. And Quantum has got a bunch of other things as we've discussed today. So check out their website, we'll link to that as well. And Jordan, thanks for doing this. Really appreciate your perspective as always.
Starting point is 01:00:55 Brian, thank you for the time. And anybody out there who's interested in talking about the only real Kubernetes orchestrated storage platform that's fully containerized from the ground up using microservices for everything, including deployment, come talk to us. We're really excited about it. It's finally available in the market and we're selling them.
Starting point is 01:01:15 So we'd love to talk in detail about what we believe is the next generation storage platform. There, you said it. All right, thank you all.
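
The Blu-ray archive mentioned in the conversation (1.5 million discs at 25 gigabytes each) was left uncomputed on air. A rough sketch of that arithmetic follows, assuming decimal units. The 18 TB native LTO-9 cartridge capacity and the ten-drive example are assumptions added for illustration and were not stated in the episode; the 400 megabytes per second per-drive rate is the figure quoted in the discussion.

```python
import math

# Figures quoted in the conversation
BLURAY_DISCS = 1_500_000      # number of discs in the vendor's archive
BLURAY_GB_PER_DISC = 25       # gigabytes per disc
LTO9_MB_PER_S = 400           # per-drive read/write rate

# Assumptions for illustration only (not from the episode)
LTO9_NATIVE_TB = 18           # assumed native capacity per LTO-9 cartridge
EXAMPLE_DRIVES = 10           # hypothetical number of drives writing in parallel


def archive_petabytes(discs: int, gb_per_disc: int) -> float:
    """Total archive size in decimal petabytes (1 PB = 1,000,000 GB)."""
    return discs * gb_per_disc / 1_000_000


def cartridges_needed(total_pb: float, tb_per_cartridge: int) -> int:
    """Whole cartridges required to hold the archive, ignoring compression."""
    return math.ceil(total_pb * 1000 / tb_per_cartridge)


def days_to_write(total_pb: float, drives: int, mb_per_s: int) -> float:
    """Days of continuous streaming at the full per-drive rate on every drive."""
    total_mb = total_pb * 1e9             # 1 PB = 1e9 MB
    seconds = total_mb / (drives * mb_per_s)
    return seconds / 86_400


if __name__ == "__main__":
    size_pb = archive_petabytes(BLURAY_DISCS, BLURAY_GB_PER_DISC)
    print(f"Archive size: {size_pb} PB")
    print(f"LTO-9 cartridges: {cartridges_needed(size_pb, LTO9_NATIVE_TB)}")
    print(f"Days with {EXAMPLE_DRIVES} drives: "
          f"{days_to_write(size_pb, EXAMPLE_DRIVES, LTO9_MB_PER_S):.1f}")
```

Under these assumptions the archive comes to 37.5 petabytes, which would fit on 2,084 cartridges and take roughly 108.5 days of uninterrupted streaming across ten drives. Real migrations would be slower, since reading the source discs and handling the media are not modeled here.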
