Podcast #126: A Myriad of Storage Topics with Quantum - StorageReview.com
Episode Date: January 12, 2024
In this episode, Brian introduces Jordan Winkelman from Quantum Corporation.
Transcript
Hey everyone, Brian Beeler here with the StorageReview Podcast.
We've got another friend of the podcast on today.
He's a newer friend.
Jordan and I met over,
I guess it was coffee or not quite over coffee,
in the coffee break room of the Quantum office out in the Denver, Colorado area, where many good things happen.
We got sidetracked and spent 45 minutes talking about AI and GPUs and security and surveillance and all things storage and AI and maybe even space related.
I'll have to remember what else we covered on that.
But I'd like to welcome in my new friend, Jordan Winkelman, CTO of Quantum.
Jordan, thanks for coming in.
Thank you, Brian.
So you're on the road today, but as I said, you and I met when I was out looking at Myriad a couple of weeks back, and then I ran into you at the SC event. But most good things happen near the coffee machine or the bar. Am I incorrect in that statement?
No, I think you're correct. I drink about a gallon of coffee
every single day.
And yeah, you get a little more descriptive when you've had a few drinks in you.
Descriptive, yes. And if you can keep to the coffee, with less hiding in the restroom dealing with it, then all the better. So the first question I got from our audience when I told them what was going on with this pod is, what is Quantum? And I thought maybe the easier question is what isn't Quantum,
but I'm going to give you both questions and let you choose which one you want to answer.
I'll answer a little bit of both. Okay. They kind of come, you know, at the same time.
So the high level is Quantum is a 43-year-old storage company. We started in hard drives and DLT backup tape. You may remember the Quantum Fireball or the Quantum Bigfoot hard drives that you might have had in your early personal computers. In roughly 2000, we actually sold the hard drive business to Maxtor, who then later resold that business to Seagate.
And you might look at it as half of the hard drives in the world today have Quantum's intellectual
property in them still.
Around the same time, we acquired ADIC, the makers of the Scalar robotic tape libraries,
the very large libraries, and LTO at the time,
as well as our StorNext file system,
which is heavily used in the media and entertainment world
for content creation, pretty much all the movies
and TV shows you watch.
But it's also used heavily in large scale archives
because it has an integrated HSM that helps protect data
for those same media and entertainment organizations,
but also this country's and many others' intelligence world.
Since then, we went on a bit of an acquisitions binge in the last five years.
We also created variable-block data deduplication, which we use in our DXi disk-based backup platform. We have the ActiveScale object storage platform,
as well as the ActiveScale cold storage extension to erasure-coded tape, a true tape-based object store.
We acquired the Pivot3 hyper-converged storage company, which has a significant focus on video surveillance,
but also allows us to install GPUs in it.
And it's the basis for our CatDV AI platforms
to provide on-premise AI for media
and entertainment workloads.
Obviously we're large in scalar tape.
We are the number one vendor of LTO tape
to the hyperscale tape world,
what you might consider deep or cold storage in the public cloud.
And we are in seven of the top seven hyperscalers globally.
I'm probably forgetting something,
but it's a pretty broad portfolio
that can support the needs of almost any storage customer.
Well, I was out there for Myriad,
which is your newest product, right?
So what's the pitch on Myriad
and how that fits in with all of these other components?
Well, Myriad is a next generation all flash file and object storage platform that presents as a NAS today.
It'll present as S3 in the very near future.
And a little further down the line, we'll have a parallel file system client for it with GPUDirect Storage. Today it's all NVMe and RDMA internally, all orchestrated by Kubernetes, a true cloud-native platform that can stand itself up in roughly 30 minutes just by plugging things in and letting it go. So it's a uniquely fresh storage platform that's really built for the future.
All right, so we'll come back to that
because we've done some work on that as you're well aware and published around that recently.
And I want to talk more about Myriad, but let's do something fun. Let's talk about the hard drives.
So you started out with what Quantum was and actually it comes up every time that we talk
about, you know, we're doing something with Quantum. We did a piece last year around the
air gap security and tape libraries, which was really cool, and all sorts of other things. But
people still, well, of a certain age, people still don't forget about the hard drives. I think one
thing the industry, at least the younger guys these days, have forgotten is that at a point
in time, there were dozens of hard drive vendors. I mean, this is probably before your time as well, but you know the history, right?
Absolutely. And, you know, I may look young, but unfortunately I'm not that young. My first hard drive was a 40-megabyte Quantum Fireball in my Mac IIsi back in the day. But you're absolutely correct.
You had vendors like Maxtor, Toshiba, Seagate,
Western Digital, HGST, IBM, Samsung.
I'm sure I'm missing others in the mix
and others that just didn't survive throughout the years.
Right, I mean, it's hard.
If you think of all the places
that you'd wanna create a startup in, go raise a couple hundred million from Silicon Valley, to go after hard drives can't be very high on that list, right?
It's a pretty commodity market today, that's for sure.
I talked to the Seagate guys about HAMR recently, and the time to innovate there is like two decades. So it takes a long time to go
from idea to engineering to deployment. But the other thing is tape. I've got an LTO 8 cartridge
from you guys sitting on my desk here that brings back a lot of memories for a lot of
venerable IT admins. Tape was still pretty big at SC23 this year.
So despite tape being perceived as a legacy medium,
I don't think we've been in a stronger environment for tape.
What's your take on that?
Well, I'm of the belief that Tape rebounded from a low point a few years back
because of the major changes from being traditionally used
for backup and recovery to more of a long-term archive
and preservation medium today.
What most people may not know is that the public cloud
is as much as 50% tape-based cold storage
from the overall storage perspective.
Whether the data is being moved
from a customer's on-premise data center
or a co-located data center,
it's just going back onto tape
once it gets into the cloud.
The densities of tape,
the low power and cooling requirements of tape
make it a very optimal format
for that long-term preservation
because it's as much as 80% lower cost
than a hard drive based object store and dramatically
lower cost than the public cloud where you're paying infinitely forever.
And then you might have those variable retrieval costs.
So it's really a matter of where that tape lives today, as opposed to whether that tape still exists or not.
Do you think tape would be more sexy or appealing from a brand perspective if the hyperscalers were less into obfuscating how they're using tape, or whether they're using it at all?
I mean, I know this is a perpetual debate with certain cloud providers for their low cost tiers
that many have assumed have been on tape
because how else could you provide
whatever cents per gig for an eternity?
There's no other media that would make any sense there
in many cases, but the hyperscalers
still don't really want to talk about that publicly
for whatever reason.
Do you have a read on that?
I don't know that it's relevant anymore.
Okay.
We sell a lot of tape with our object store. We don't call it tape, we call it cold storage. What matters is, does it meet the cost metric, does it meet the durability, and does it have the lifespan of what the customer wants in order to preserve that data?
That's really the important part.
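To put rough numbers behind that cost-metric point, here's a minimal back-of-the-envelope sketch comparing a pay-forever cloud cold tier against an on-premise tape purchase. Every figure in it is a hypothetical placeholder, not Quantum or cloud-provider pricing:

```python
# Hypothetical cost model: pay-forever cloud cold tier vs. on-prem tape.
# Every number below is an illustrative assumption, not vendor pricing.

def cloud_cost(tb, years, price_gb_month=0.004, retrieval_gb=0.02, retrieved_frac_yr=0.1):
    """Recurring capacity fee plus occasional retrieval charges."""
    gb = tb * 1000
    storage = gb * price_gb_month * 12 * years
    retrieval = gb * retrieved_frac_yr * retrieval_gb * years
    return storage + retrieval

def tape_cost(tb, years, media_per_tb=10.0, fixed_infra=50_000, refresh_years=10):
    """One-time media plus library cost, repurchased once per media refresh."""
    refreshes = -(-years // refresh_years)  # ceiling division
    return (tb * media_per_tb + fixed_infra) * refreshes

for years in (3, 10, 25):
    print(f"{years:>2} yrs: cloud ${cloud_cost(5000, years):>12,.0f}"
          f"  tape ${tape_cost(5000, years):>10,.0f}")
```

At these placeholder rates the recurring cloud fee dominates over long retention windows, which is the dynamic Jordan describes.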
So when you talk about object on tape, I mean, most people think about flash and erasure coding, spreading out bits over multiple drives or SSDs.
With tape, you can't possibly erasure code across dozens of tapes, right?
I mean, how does that work?
Absolutely wrong.
That's exactly how the cloud works today. Whether it's your household names or not, whether the customer brings their own software, and in this case I mean the hyperscalers, or we bring the software, you are erasure coding across multiple libraries for availability. And you may be erasure coding across multiple geolocations, again, for greater durability or availability.
Even allowing for entire site failures to go offline without impacting the availability
of that data, and the customer can still meet their service level agreement to their own
customers.
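For readers who want to see what "erasure coding across multiple libraries" means mechanically, here's a minimal sketch. The shard counts and site layout are invented for illustration, not ActiveScale's actual parameters:

```python
# Minimal sketch: erasure-coded placement across sites/libraries so a whole
# site can fail without losing data. k data + m parity shards; any k of the
# k+m shards can rebuild the object. All parameters are hypothetical.
from itertools import cycle

K, M = 4, 2                              # survive the loss of any 2 shards
SITES = ["site-a", "site-b", "site-c"]   # one or more tape libraries per site

def place_shards(object_id: str):
    """Round-robin k+m shards across sites; with len(SITES) >= (K+M)/M,
    no site holds more than M shards, so one site failure is survivable."""
    return [{"object": object_id, "shard": i, "site": s}
            for i, s in zip(range(K + M), cycle(SITES))]

placement = place_shards("movie-master-0042")
for site in SITES:
    lost = sum(1 for p in placement if p["site"] == site)
    print(f"{site} failure loses {lost} shards; survivable: {lost <= M}")
```

The design choice is exactly the one described above: spread shards so that no single library, or even no single site, holds more shards than the code can afford to lose.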
What sort of data is this? Would this be, I know you guys have a big foothold in
media and entertainment amongst others, would this be, we've made a movie, we've sent it out,
it's streaming and published and gone to the theaters, and now we need to archive all of this
content because in 25 years we're going to remaster it and we'll wanna pull it back
or is it traditional backup kind of use cases
or what are some of the other use cases
that you're seeing for Object with Tape?
It's a very wide variety of use cases.
It could be long-term backups
where you might have your retention copies
for months or years or even decades
based on what type of business you're in, whether it's financials, whether it's life sciences, whether it's medical imaging. In the case of medical imaging, the data may be required to be stored for upwards of 100 years past the life of the patient.
So there are regulatory issues in those cases, right?
Absolutely. But also NSF-granted research projects
where the data has to be maintained forever
and be readily available to anybody
if it's been funded by the federal government
here in the United States.
So autonomous cars,
pretty much anything you can think of
that needs to be stored for long periods of time.
Things that can be remonetized, obviously, customers really like if they can take those,
I'll use media and entertainment assets.
Think about how many times Star Wars has been redeployed to different tape media
or different DVD, VHS, Blu-ray, whichever formats.
And when the next generation of 8K comes out, guess what?
They're going to go back to those archives again,
and they're going to do up-resing and restoring of all that media,
and we'll be able to see the individual pores on Harrison Ford's face at that point.
I'm not sure you're really selling me on the 8K version of Star Wars, but okay, I buy it. I mean, there's obviously a financial incentive to keep these assets, to be able to do that. And even with, I guess, some of the scans, if you're getting an MRI or some sort of low-level scan like that, the technology that exists today to interpret those scans maybe is better tomorrow, or with AI, or with whatever else.
And maybe I want to go back and get those scans
or have a deep archive of my own personal stuff
to be able to track over time.
I'm sure there's a thousand different reasons
why having more of that data could be good for research
or even on my own personal body
or in a collection of bodies in aggregate anonymized, right?
Well, to be honest with you, I can think of a great use case going back to 8K and Star Wars.
I want to go watch it on the MSG Sphere. Come on.
On the outside of it or the inside?
Well, both, obviously. You know, think about the resolution requirements when you're that far away from
the actual screen. The higher the resolution in those environments, the more it's going to look
like reality when you're sitting hundreds of feet away from, you know, the actual display at those extreme resolutions. So as more and more of those spheres get built around the world, which I understand is happening, I think the amount of data that we're capturing for these imagery use cases is just going to explode, just like it has moving from standard definition to HD to 4K today.
It's really a matter of where you're going to be consuming that data
as opposed to
watching it at home. So what does that do then, do you think? Because you talk about the sphere,
and that just took five or six short years to bring to fruition. I'm sure the next one will
be a little quicker. But when we look at the AMC we've got down the road here, they've got a little
data center set up there behind glass, and it's kind of neat to see the different stuff in there. But it's pretty much a download once and then play to a couple screens, dozens or hundreds or thousands of times, to monetize that video asset. But as those get bigger and bigger, and double or triple in file size each time, to be able to produce those videos in the Sphere
or at a movie theater, you've got to play there too.
How do you view that local storage requirement
for media and entertainment?
Are you sure that there's local storage at the AMC?
Or do they have a really, really big pipe
that's streaming those pieces of media live
from a large data center somewhere in the world
that has a very, very secure security model
to make sure people aren't stealing
those extremely high-res assets?
The way you ask the question makes me think you might have an answer there.
So tell me about that.
How has that transmission become reliable enough that that's a viable way to distribute media?
It came about because it has to. In some cases I can't say the name of the media companies, but they all have multiple data centers. We'll say one will be in Hollywood, and one might be, I'll pick a world-class data center, like Switch SuperNAP. Switch allows for extreme bandwidth failover between these two data centers. And so if there is a network outage or a data center outage, these companies are still able to stream that data live, whether it's to a movie studio for watching dailies, whether it's a theater, or whether it's your favorite TV show, that episode's got to get out onto the air to be broadcast.
And those companies can get their advertising revenue, which, you know, is the most important thing to broadcasters.
How does this change then?
Because you guys have perspective here, too, and you're starting to talk about some of it, the capture of content.
You're talking about movies where it's collected over time and shared and edited and that sort of thing. But you guys are involved in live content as well, sporting events. I'm sure you're in the back of dozens if not hundreds of ESPN trucks, and CBS and Fox and all these guys. With those requirements, and 4K and 8K coming for especially major sporting events, Premier League soccer, F1, the Super Bowl, stuff like that, what do you have to do? What does Quantum do to make that even possible?
We are actually very widely deployed in broadcast and film studios.
I'll use sports as an example for the live events.
I'll use a direct customer example because they're a public reference ultimate fighting championship.
If you think about an event, they'll start the camera.
It'll run for four hours.
There might be 16 or 20 different camera angles that they just ingest that video at 4K into a storage device. The content has metadata accessed and tagged to it in live fashion,
whether in an automated manner or in a manual manner at the site of the event in that case.
And that media is then broadcast all around the world, whether live on pay-per-view or in a post-process manner.
UFC has a system called Fight Pass. I call it Netflix for beating people up.
And so the customers may watch in different portions of the world that have different unique requirements around what can be displayed on TV or on the web.
For example, some countries can't show blood. So what happens in that case is the metadata will be used to figure out which pieces
of video are relevant for which markets. And then, you know, if you have a particularly bloody
section for that market, you might cut to Joe Rogan for an extended period of time.
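As a toy illustration of that metadata-driven workflow, imagine each ingest segment carrying tags and each market carrying a set of disallowed tags. The tag names and rules below are invented for the example, not UFC's or Quantum's actual schema:

```python
# Toy sketch of metadata-driven market filtering for live video segments.
# Tags and market rules are invented for illustration.

segments = [
    {"t": "00:41:02", "camera": 7, "tags": ["action"]},
    {"t": "00:41:09", "camera": 3, "tags": ["action", "blood"]},
    {"t": "00:41:15", "camera": 12, "tags": ["commentary"]},  # e.g. cut to the desk
]

market_rules = {"us": set(), "market-x": {"blood"}}  # tags that can't air per market

def playlist(market: str):
    """Drop segments carrying disallowed tags; downstream automation would
    substitute an allowed angle (commentary, crowd, etc.) for the gap."""
    banned = market_rules[market]
    return [s for s in segments if not banned & set(s["tags"])]

print(len(playlist("us")), "segments for US,", len(playlist("market-x")), "for market-x")
```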
We are also widely used through, I'd say, about half of the major stadiums throughout the United States today, whether NBA, NFL, you know, again, UFC, we cover the whole gamut, in conjunction with our StorNext file system, to do that metadata tagging and transcoding of video, and making it easy to cut and display live content up to the scoreboard, which happens frequently. Some of those stadiums also record every event that happens in them, like concerts or pretty much any type of political event, whatever they might have at that particular stadium. And so they'll make use of this live event media for monetization, for whatever the use case. But we are heavily used also for things like instant replay. We support the leagues as well as UFC.
So most people don't know, but if you're watching TV, if you're going to the theaters, if you're watching sports, there's an extremely high likelihood that the data is going through a Quantum product.
That's interesting. Yeah. I know you and I chatted about the UFC bit.
And what's interesting there too, I think is that, I mean,
we all know the big fights, UFC 243 or whatever, that are the $70 pay-per-view, high-action, high-visibility showcases. But there are, I don't know, dozens of smaller support fights around that, that occur in between those events, right, around the world, that also have these challenges where they want to capture all this content. And maybe they're not all live streamed, but you'd like to get clips, to your point, to your subscribers, but also to media outlets to promote these young fighters, to promote whatever else is going on. So there's a lot more to it. Talk about some of the sophistication with the cameras,
because I know this isn't directly your responsibility, but obviously you interact with the camera feeds
as it comes across the wire to your storage and your other products.
But you must have thoughts on what the trends are there in terms of the investment in the video capture devices at these events and stadiums and so on? Well, the cameras are getting more amazing
in higher resolutions with higher frame rates that allow you to take that content and do more with it at a later date. We don't have 8K TVs that are widely deployed today, but you may still capture an event in 8K so that you can up-res that content and get a close-up without actually having to have an optical zoom. 4K initially, I'll pick on RED cameras, wasn't really used to get that higher resolution to go on TV; it was used to allow for a single shot to get both the wide and the close-up. So having more pixels at a higher frame rate provides better quality of the image, less motion blur, but also being able to re-utilize the same frame of video for more use cases with pan and scan, and not having to re-set up the shot for every use case, for the close-up or the wide shot again. And then AI analytics are starting to get into certain cameras.
I'll pick on the video surveillance world where they're looking for all sorts of, we'll say,
license plate recognition or gunshot detection or just looking for that shady individual in the
parking lot. The cameras are actually able to inform the security personnel of things that are actually happening
in real time. Well, and from a public safety perspective, that kind of information is pretty
critical. And yeah, I mean, obviously that would make a big difference. When you look at the cameras too, that we're talking about for sporting events, there's a certain nostalgia that your guys must have for, like, the 70s, when the NFL had two cameras in a stadium. They had the wide shot, and then they had maybe one up on the line. Now you've got, I don't know how many are in a typical stadium, but between all the different angles, your blimp shot, your helicopter, your pylon cameras, now cameras in the yard markers, to get, when that guy dives with the ball, that we see the grains of plastic grass in between the ball to see where that is, or was the foot out. I mean, the view into the millimeters of difference now, which, to your point, go into a lot of these things
like replaying clips and the excitement around the sports, the video captures really change
the game. So as these cameras have come online and now there's a dozen or more, maybe dozens
in a stadium, what's incumbent on you from an infrastructure standpoint to support that?
Well, obviously you need enough performance to support the live ingest of all of that data.
There may be decks that are in solid state, recording for what we call the melt, or the mezzanine that gets displayed out to the pay-per-view or recorded live to broadcast. But looking at it from the perspective of, you know, a 4K camera, you may have cameras that are ingesting as low as 50 megabits per second, and you may have cameras that are ingesting at over 600 megabytes per second. Our platforms allow for unlimited scale of performance. We design them based on the needs of the stadium, not only today, but also providing capabilities to scale in the future when the next-generation cameras come out. It's one of the unique things about Quantum's StorNext file system: it was designed for this very purpose of ingesting video. In fact, it was designed for NASA for satellite ingest. If a satellite flies over and you miss a bit, it's never coming back.
You're never going to be able to retrieve that data.
And so we provide that same functionality and capability for pretty much
everybody who's doing anything live broadcast or content creation.
Just a little harder when you have lots of concurrent cameras.
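To put a rough number on what a multi-camera live event like that generates, here's a quick sizing sketch using the per-camera rates Jordan mentions; the camera mix itself is a hypothetical example, not an actual production configuration:

```python
# Quick ingest sizing sketch using the rates mentioned in the conversation.
# The camera mix is a hypothetical example, not a real event configuration.

cameras = [600] * 4 + [200] * 8 + [50 / 8] * 8   # MB/s: 4 hero cams, 8 mid, 8 at 50 Mb/s
event_hours = 4

aggregate_mb_s = sum(cameras)
total_tb = aggregate_mb_s * event_hours * 3600 / 1e6

print(f"aggregate ingest: {aggregate_mb_s:,.0f} MB/s")
print(f"data for a {event_hours}h event: {total_tb:.1f} TB")
```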
And how much of that now relies on the networking piece of the infrastructure?
So I know you work with a bunch of different
players there too, but is everything you're doing in this M&E space, is this 100 gig,
200 gig, do you benefit from these super fast 400 or 800 gig NVIDIA, you know, Mellanox switches
and such? So talk a little bit about networking and that impact in terms of these M&E workflows.
So for decades past, we had, and still have, a technology called SDI, or serial digital interface, that has a fiber optic cable from a camera that goes into a device called a router, that gets bumped into some kind of storage device, like even a deck of tape. You know, back in the day you would have a DigiBeta or XDCAM or a variety of other formats. We're moving to more IP-based technologies, a technology called SMPTE 2110, which is entirely IP-based and runs over networks. You may have compressed media, you may have uncompressed media coming across. But these data rates, for, say, uncompressed media, which typically goes over SDI, are two to four gigabytes per second for 4K, or upwards of eight to twelve gigabytes per second for 8K. Having the flexibility of one or many 100 gigabit or 400 gigabit Ethernet links into these systems can allow for actually meeting the requirements of the live ingest, but also being able to turn it around and push it up to the scoreboard very quickly, so that the content of a live game is more relevant and the people in the stadium can see that instant replay that much faster.
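For a sense of where those uncompressed numbers come from, the data rate falls out of resolution, bit depth, sampling, and frame rate. The parameters below (4:4:4 sampling, 12-bit, 60 fps) are assumptions for illustration; real SMPTE ST 2110 flows vary:

```python
# Back-of-the-envelope uncompressed video rates. Parameters (4:4:4 sampling,
# 12-bit, 60 fps) are illustrative; real flows vary with chroma subsampling,
# bit depth, and frame rate.

def uncompressed_gb_per_s(width, height, fps, bits=12, samples_per_pixel=3):
    """Full 4:4:4 sampling carries three samples per pixel; 4:2:2 carries ~2."""
    return width * height * samples_per_pixel * bits * fps / 8 / 1e9

for name, w, h in [("4K", 3840, 2160), ("8K", 7680, 4320)]:
    rate = uncompressed_gb_per_s(w, h, 60)
    print(f"{name}/60: ~{rate:.1f} GB/s (~{rate * 8:.0f} Gb/s)")
```

At those rates a single 100GbE link (roughly 12.5 GB/s) carries one uncompressed 8K/60 flow with a little headroom, while 400GbE leaves room for several, which is why those link speeds matter for live ingest.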
I'm sure you're making the Fibre Channel world quite sad.
There's no room for Fibre Channel in the M&E world?
Oh, Fibre Channel is still extremely well deployed in the media and entertainment world.
Okay, good.
So one of the things about IP technologies is that they've grown a lot over the years, but Ethernet incurs a little higher latency. Unless you're using RDMA protocols, Ethernet data coming in over IP has to be processed and acknowledged and checksummed by the actual CPU, which is going to have a potentially significant performance implication, especially at higher data rates. That's mitigated by RDMA, but you are still going through Ethernet switches, and some of it may be copper, some of it may be optical. But when you're talking about Fibre Channel, you inherently get RDMA because of the ASICs on the HBAs themselves doing all of that work for the acknowledgements and checksumming. So the data may go direct from Fibre Channel out to Ethernet, for whatever that purpose is, to be edited by the user. We may see more of the Fibre Channel on the backend of the infrastructure, as opposed to going out to the editorial clients or to broadcast. But there is some benefit to, you know, I don't want to call it the obscurity, because it is well deployed, but a dedicated storage network. Whether you're using Fibre Channel or iSCSI or iSER or RoCE, it doesn't really matter. You probably are going to have a dedicated storage network so that you're not impacting the capabilities of the storage with your traditional end-user networking. So it's really a protocol question; it's still going to run over the same fiber optic infrastructure. I think sometimes it's based on the comfort level of the engineer, who may have been working in video for decades at that point in time. So I still sell a lot of Fibre Channel is what I'm saying, and all of my tape is either SAS or Fibre Channel, and so I don't see Fibre Channel going away anytime soon.
No, I suspect not. And I ask it kind of half-heartedly
because I feel a little sad for Fibre Channel, because all the cool kids are on to AI and other things, and Fibre Channel is kind of like, you know, we're still here, but they don't get brought into any of the fun. I suppose they're doing the blocking and tackling, as you say, in these storage SANs and are still the primary go-to for enterprise storage. I mean, half the stuff over my shoulder would be, you know, Fibre Channel connected. But there is still room, right?
They still tend to increase the capabilities. 64-gig Fibre Channel is here. It's not super widely deployed yet, but for certain of my customers in the uncompressed video space, it's still the king. There are capabilities of true load balancing with Fibre Channel, where you can get greater aggregate performance across multiple links than you're able to achieve with various Ethernet protocols today. And so where you find those ultra-low-latency, ultra-high-bandwidth requirements, customers in my space are not necessarily moving away from Fibre Channel in media and entertainment. Now, in less real-time workloads, IP is definitely taking over.
Right. Well, there's a cost advantage there too. And then overall, how do you compare the throughput? Because you talk about 64-gig Fibre Channel as kind of where we're at now, or where anyone making an investment now is refreshing their fabric and their NICs or HBAs with 64 gig. But it's not necessarily a one-to-one line, 64-gig Fibre Channel versus 100-gig Ethernet. How do you think about that or characterize where those
benefits lie for a customer?
Well, I think things are changing a little bit now that we have Gen 4 and Gen 5 PCIe. A couple years ago, when things were predominantly Gen 3 PCIe, a 16-lane PCIe bus was 128 gigabits.
Gen 4, 256.
Gen 5, 512.
You're finally able to get more bandwidth to those high performance Ethernet ports than
you were previously, even a couple years ago, where you're now starting to outstrip on the
bandwidth side of things with Ethernet.
I have a storage array called the F2100 that can read at up to 55 gigabytes per second over 100-gig Ethernet, using eight 100-gigabit Ethernet ports. Well, I don't have enough PCIe slots in that system to drive more than 55 gigabytes per second on Fibre Channel. So there are trade-offs based on what the customer's network is, but I think there's more ability to scale in the future with Ethernet because of these, you know, newer generations of PCIe.
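The PCIe arithmetic behind that is easy to reproduce; the per-lane rates below are the published raw figures, and delivered throughput lands a bit lower after encoding and protocol overhead:

```python
# Raw x16 slot bandwidth by PCIe generation. Per-lane GT/s are the published
# PCI-SIG figures; delivered throughput is lower after encoding/overhead.

PER_LANE_GBPS = {"Gen3": 8, "Gen4": 16, "Gen5": 32}

for gen, lane_gbps in PER_LANE_GBPS.items():
    x16 = lane_gbps * 16
    print(f"PCIe {gen} x16: {x16} Gb/s (~{x16 // 8} GB/s)"
          f" -> room for {x16 // 100} saturated 100GbE ports")
```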
Yeah. And I mean, Gen 5 still seems fresh and new. So let's talk about what storage's impact on AI is going forward.
Because when we look at any of these, we were just at AMD's event this week where they put up the GA on the Instinct MI300.
If you look at any of these 8-way or even 4-way H100 systems from NVIDIA, Intel's got some offerings as well, none of these GPU servers are storage heavy. And you're talking a little bit about PCIe slots, but fundamentally there's like a lane congestion or a lane challenge, right? There's only so many devices that you can jam into one thing. In fact, just for fun, we took a Bergamo server,
attached a JBOD to it. And we noticed that once we filled up all the bays and started
filling up the bays on the JBOD with NVMe drives, that we actually oversubscribed the
lanes and started to lose things like the USB ports on this server. So that's us just being a little bit silly and overzealous, but it is a
legit problem. And so part of that has been the sacrifice of how many storage drives are supported
in these systems. But then the question is, if I'm going to put a million dollars into this GPU
system, it sure as hell better be doing something 24-7 or close to it. If that's true, then how do I fuel this thing
whether it's with GPUDirect Storage or otherwise?
And so, I mean, you must have thoughts on that as well.
Absolutely.
So I'll use our new Myriad platform as an example here.
Myriad is, as I mentioned,
today it's a high performance NAS.
You can have hundreds or thousands of concurrent connections
with extreme performance because it's all NVMe.
We can scale out the number of nodes, and our performance and metadata performance scale linearly along with the nodes and your capacity.
Some of the platforms get more efficient
from a storage perspective as they get larger,
just like Myriad.
And systems are being designed to perform differently today
than they were in the past.
You might have had high-performance systems
for structured data,
which would be dramatically different
than high-performance systems for unstructured data
where they access the storage arrays
dramatically differently.
AI workloads are a combination of both.
Training workloads are
typically built up of millions or billions of small image files or some form of tagging and
analytics, as opposed to large video files. You don't actually feed a large video file to an AI.
So you need high-performance storage at large scale, whether it's the tier 1 NVMe or the tier 2 HDD or the tier 3 tape media, for that longer-term ability to re-access that data. But you need the interconnects, those high-performance Ethernet interconnects. Ethernet is the network platform, or the network protocol, of choice for AI. You brought up InfiniBand.
Well, InfiniBand is still used heavily in AI,
but it's really the interconnect between the GPUs themselves
as opposed to the network or the storage.
And so we're redesigning storage for the future
to be able to take on any workload.
And that's what Myriad is really about,
is being able to take whatever your workload is today and be able to handle it, because it was designed for all of these different use cases, not just one.
Well, I think you hit a good point there. I mean, look, AI means a hundred different things to a hundred different people if you sit there and talk to them about it. I'm sure you saw that at SC23: you've got a wide chasm of
versus some of the inferencing workloads. It could be at the edge or elsewhere versus some
of the heavy duty training in these big GPU boxes. But my sense of where we want to go with this as an industry is I don't think we want a
silo stack just for AI. And that duplicating storage just to put faster things next to the
GPUs doesn't make sense. So what we really need is for the primary storage arrays that we have now, or that we're investing in now, maybe is the better way to say it, to be capable enough to handle my traditional SQL Server workloads or whatever else is in the business, and whatever the AI ops team needs to go create new business value out of these assets.
So I think that's the big challenge, to have a system that's capable of doing that.
Is that how you see it too?
Yes and no.
I still see a breakdown between structured data and unstructured data.
The types of storage devices that deal with the two platforms are dramatically
different. You're going to have different kinds of controllers. You may have lower bandwidth, but
tens of millions of IOPS. Unstructured data doesn't really work that way. The benefit of
some of these more modern platforms is the integrated deduplication and data reduction technologies, so that when you're buying these high-performance flash systems you're not, again, as you said, duplicating that data unnecessarily and having multiple different storage platforms. And also, having these extremely high-performance single-namespace environments allows for multi-protocol access, whether over CIFS or NFS or S3 or GPUDirect,
all under the same single namespace.
So providing that capability to access it
at high performance from whatever the client type is,
is critical so that you can minimize
that expensive NVMe cost.
And then we still offer, as I said,
those middle tiers and those colder tiers of storage to help minimize that long-term cost.
But again, that's going to be about the preservation as opposed to what you're working on today.
You still need to find the appropriate place for that data to live based on its availability to generate money or what your durability and preservation requirements are.
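Since variable-block deduplication has come up a couple of times now, here's a minimal sketch of the core idea behind it, content-defined chunking; the window, mask, and chunk sizes are illustrative, not what DXi or any Quantum product actually uses:

```python
# Minimal sketch of variable-block (content-defined) chunking, the idea behind
# variable-block deduplication: boundaries follow content, so identical data
# dedupes even when it shifts position. Parameters are illustrative.
import hashlib
import random

def chunks(data: bytes, window=16, mask=0x3FF, min_size=256):
    """Cut when a fingerprint of the last `window` bytes matches a pattern;
    depending only on a local window makes boundaries self-synchronizing."""
    start = 0
    for i in range(len(data)):
        if i - start >= min_size:
            fp = int.from_bytes(hashlib.sha256(data[i - window:i]).digest()[:4], "big")
            if fp & mask == 0:
                yield data[start:i]
                start = i
    if start < len(data):
        yield data[start:]

random.seed(0)
blob = bytes(random.randrange(256) for _ in range(20_000))
data = blob * 4                                   # repeated content should dedupe
store = {hashlib.sha256(c).hexdigest(): c for c in chunks(data)}
unique = sum(len(c) for c in store.values())
print(f"raw {len(data)} B -> unique {unique} B ({len(data) / unique:.1f}x reduction)")
```

Because the cut points depend only on nearby bytes, an insert early in a file doesn't shift every downstream chunk, which is what makes variable-block dedup so much more effective than fixed-block schemes on edited data.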
So in your role as CTO, then how are you advising organizations on how to make these
infrastructure investments? Because I think the show pony is the GPU server because, I mean,
it's so expensive for one, but that's where the business sees the value in terms of deriving intelligence from the data at hand.
But the AI guys aren't necessarily storage and infrastructure and networking guys.
They might be model guys and training.
I mean, it's totally different skill sets.
How do you bring some semblance of organization to these investments?
Well, that is one of the big challenges.
It requires expertise across multiple different skill sets, as you brought up, to be able to provide this capability.
So you're gonna have data scientists who are the consumers.
They're gonna be the stakeholders of, you know,
why you're buying this in the first place.
They may or may not be the scientists
who are writing their own software and applications
for these AI workloads.
You may have developers in CI/CD pipelines who are assisting those data scientists. But there may be a misconception about what's more expensive, the storage or the GPUs. And you have to be able to feed those GPUs. Eight H100s in a single DGX box, well, how much bandwidth can those H100s consume from that storage? The storage environments can be physically larger than these GPU environments, and when you have a rack of NVMe, you can be using the equivalent amount of power that those GPUs are actually using.
I think there's also a bit of a misconception in the world about which uses less power,
hard drives or SSDs. It's not as obvious as you might think. An SSD generates a whole lot more
heat than a hard drive does. And so the power and cooling requirements of your data centers grow for the storage along with those GPUs as well.
So they're both equally critical.
You can't run those GPUs without the storage
to keep them fed.
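As a rough illustration of that power comparison, here's a sketch where every wattage and device count is a hypothetical placeholder rather than a measured figure for any specific product:

```python
# Rough power comparison sketch; all wattages and counts are hypothetical
# placeholders, not measured figures for any specific product.

NVME_SSD_W = 20        # a busy enterprise NVMe SSD can draw on this order
SSDS_PER_SERVER = 24
SERVERS_PER_RACK = 16
GPU_W = 700            # a high-end training GPU
GPUS_PER_NODE = 8

rack_nvme_kw = NVME_SSD_W * SSDS_PER_SERVER * SERVERS_PER_RACK / 1000
gpu_node_kw = GPU_W * GPUS_PER_NODE / 1000

print(f"rack of NVMe (drives only): ~{rack_nvme_kw:.1f} kW")
print(f"one 8-GPU node (GPUs only): ~{gpu_node_kw:.1f} kW")
```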
No, I mean, absolutely.
I think that it's fundamental
and that organizations that go out and just buy GPUs, whether they're add-in cards or these expensive socketed systems or whatever, are going to be wildly disappointed if they think that just throwing money at GPUs is going to solve anything or create business value, right? I mean, it's well more than that. I didn't think
we were going to go here, but I am curious now that you brought up some of these heat and thermal
dynamics and power consumption. I know that Quantum doesn't sell liquid cooling,
but as CTO, you must be thinking about that
and looking at what your customers are doing.
With liquid getting more common for these GPU servers,
what's Quantum's take generally on liquid in the data center
or what are you guys seeing?
And do you have any thoughts on that
in terms of guiding enterprises on,
on what that investment may look like in the future?
Typically we're not involved in the, you know,
construction of the data centers,
but those customers that are working in large scale HPC environments are
accustomed to working with, you know,
liquid cooled data centers today because of the unique properties of a supercomputer or an AI cluster.
As long as the liquid is appropriately routed through whatever mechanisms they're using to draw the heat out of the servers, to us it's kind of irrelevant. From a cooling perspective, we actually focus more on maintaining the stability of tape media, to be honest with you. We have a prototype system that we call Curator that has the ability of maintaining solid-state cooling and humidity control, so that the tape media, which can actually be more impacted by humidity today than temperature, stays inside the appropriate tolerances. If you think about these large-scale tape consumers like the hyperscalers, do you think they actually want
to put the tape in the data center? Or do you think they want to put it in an environmentally
controlled box that sits outside the data center? A greenhouse is what you're suggesting,
a nice humid greenhouse? I actually don't know what a greenhouse is, but we sell a big box that you can keep your stuff cool in outside.
No, it was like a literal greenhouse, you know, with plants and stuff in it.
A nice oasis for your tapes and data to exist in outside of the data center.
No, that's interesting.
It's another consideration, too, in this overall story.
So, you know, cooling and power is definitely one thing.
And as you say, getting more things out of the data center,
that makes a lot of sense.
That's an angle I hadn't actually considered.
I really am not a huge fan of the 2024 predictions thing.
And I'm not asking you for predictions,
but as you're talking to your customers
going into this coming year,
what are some of the other concerns they may have
that may not be obvious to the rest of the enterprise world?
What are you hearing there
that maybe we should be thinking more of that we're not?
I can't think of anything specific offhand outside of just the nature of the data growth.
The sensors that we have today are just that much more minute, with the ability to gather that much more data.
Think about genomic sequencers today, they generate orders of magnitude more data than
they did previously.
And similar to a satellite, if you don't get that bit of data out of the genomic sequencer, you lose it. And so there's just the data growth in general. We don't expect it to slow down; we expect it to grow exponentially moving forward. So it's just a challenge of how you ingest all that data in the appropriate time.
I do see a trend of,
I don't want to call it cloud repatriation necessarily,
because there is a significant cost with that,
but we are seeing consumers and customers realizing
that continuing to put all of their data
that may not make revenue or generate a profit in the cloud
may be cost prohibitive.
So I recommend to customers that they put the data in the cloud that does generate money
because it is making money to support itself and to support the business.
And that data that doesn't generate revenue but has intrinsic value,
store on-premise in a low-cost storage medium.
And in some cases, some of my larger enterprise IT customers, they
have the product as opposed to the corporate IT data sets. And
the product may go into the public cloud, again, for that,
you know, high performance elasticity of the cloud. But
that IT data that has to be held onto
for compliance may get redirected
to an on-premise tape-based object store.
Let's say car companies generate exabytes of data
that you probably don't think about.
And it has to go somewhere and be saved for-
It has to go somewhere.
A very, very long time.
Yep.
Yeah, I mean, we started out by talking about tape, and coming back to it here, why isn't there enough hard drive capacity with intelligent compression and dedupe to make up for this void?
Why is tape still the best answer?
And actually, while you're answering that,
why didn't Blu-ray make it as a long-term archive solution?
I know it still exists, but not the way that Meta
and others or Facebook at the time had tried
and had moderate success with.
I'm going to step back to CDs briefly and talk about how we had this belief that CD-Rs were going to work forever and preserve data forever. Pressed CDs, if they were held in the appropriate environmental conditions, could be. Unfortunately, with CD-Rs, due to the nature of how the technology works, people might think the fragile side is the bottom, the polycarbonate layer the laser reads through, but in reality the recording layer sits just under the top surface of that CD-R. And so I myself might have taken the CD out of the player and flipped it upside down, thinking that I didn't want to scratch the plastic. Well, guess what? I scratched the top.
I did more damage to the medium than I thought I was doing, and that's why it stopped reading in my
CD player. Also, we took all these Sharpies or pens and wrote on the back of these things so
that we knew what they were. Or worse, and I did this myself, I used to go buy those labels, and I'd slap the label with an adhesive on the back of that aluminum, and it would eat away at the aluminum. What might have been a five-year lifespan now is dramatically reduced. And so we all learned over time that maybe you want the inkjet printer to print your label, so that it's protecting the media as opposed to potentially damaging it.
The same thing applies for more modern optical technologies like Blu-ray.
But I was talking to a vendor who wanted to move away from Blu-ray today, this morning, in fact.
They've got 1.5 million Blu-rays at 25 gigabytes a pop. So it's a very stable platform, but that's one and a half million discs at 25 gigabytes each.
I haven't done the math,
but that's actually not that much data.
Well, I guess at 25 gig, it sounds like decent size.
Looked at the other way, a million and a half sounds like a hell of a lot of discs to try to figure out how to move into something else.
But wasn't the dream, I mean, you would know better than me probably, I think it was an OCP dream that Facebook had, that they'd get a hundred years out of these Blu-rays in this deep, deep, deep cold archive. But is that not the reality for current optical media technology? So these 1.5 million discs won't last forever, but how long do you think they really get
out of those? Honestly, I don't know. I'm not going to put out a number without having my own
either anecdotal or known values on that. But I will say that we are seeing new technologies
coming to market. It might be optical, it might be ceramic, it might be DNA storage; there's a lot of different technologies. We've seen Microsoft talking about Project Silica again, and how that medium, because of the way they're storing the data in voxels, I think was the term they used, can theoretically last forever. Tape media will degrade, DNA will degrade, but at least with DNA you make so many copies of the data, you have so much of it, you have petabytes in the tip of a pen. The problem with some of these media is that every time you read them, you destroy them. For example, every time you read a DNA sequence for archival purposes, not only is it extremely slow, but you destroy that strand of DNA. So you might have dozens or hundreds of the exact same strand of DNA in that piece of media, whatever you want to call it, so that you can actually read it multiple times.
There's a lot of challenges based on whatever the media is.
That's an extreme erasure coding algorithm for DNA then to make sure that you've got
enough resilience built into the media, right?
We're definitely still a ways away from DNA-based storage. Quantum has invested significantly in the DNA Storage Alliance, and we have a couple of key
members of our team on that standards organization. I'm not really sure what to call them at this
point. Well, you must, I mean, so that's a good point. I mean, you guys must invest in a lot of
these with people, if not financial backing, because, I mean, you probably have thoughts, but may not really care what gets adopted at the end of the day.
You want to be able to support the new, whatever it is.
And the earlier you're engaged on these things, the better connected you are,
and able to get your own thoughts and opinions into these standards boards, right?
And get something productive out the other side.
The way to look at it is everybody in these consortiums is bringing their own tidbit of knowledge and technology.
So maybe the knowledge is the encoding to those medias for that error correction. Maybe it's the automation of moving that media out of whatever the
containerization is to put it into whatever the drive quote unquote format is.
Maybe it's working on instead of having a dedicated piece of media and a dedicated
drive, you know, selling the idea of the media is the drive again,
but with much slower speeds and much
greater durability. We all have our investment of human resources in the research and development
space that we are all contributing, whether it's to LTO or whether it's to DNA storage or whether it's to optical mediums. You'd be surprised how many places Quantum is involved in those automation or encoding areas of the world.
It's interesting. And yeah, I haven't thought about, you know, you see these releases that come out every now and then with new media
and it seems so fanciful. It'll be a guy with a roll of film who's like,
here's one roll and it can hold all this,
but this is the only one.
And now we have to figure out how to commercialize it.
I mean, we talked about 20-plus years for HAMR technology. I mean, these things are clearly not overnight; they're decades' worth of investment, right?
Absolutely.
I mean, think about how long it takes between
individual generations of LTO media. You know, four or five, six years of investment of making the track densities that much closer, making the read heads able to look through the media at different angles, to try to get the ability to write inside more
density. There's lots of ways in which that has to be taken into account. So with tape, I mean,
we talked a lot about your object on top of tape.
What else should people be excited about?
Is it density?
Is there a cost advantage?
When I posted some videos from SC about tape, there were a couple solutions there that I
think are really interesting.
And we always get great engagement, even when I was out working with your robots with the clear lid, just watching that guy go back and forth and grab tapes and put them in the drives and back into storage. People love that,
but it doesn't get sort of the shiny sparkle that other technologies in the data center do.
Is there anything coming that we should be excited about?
Or is it just a progression that, you know, where tape's going to be somewhat maligned
rightly or wrongly, maybe wrongly, from a creativity and shininess perspective?
The latest and greatest fun thing
will always get the most interest
and you get the eyes on it.
I actually think that object storage on tape
is one of the shiniest, newest things.
So maybe I'm jaded,
but I find it very interesting.
You sell it.
The technology challenges? Well, people think tape is slow, but they're very wrong. The individual LTO-9 tape drive, each drive can read or write at 400 megabytes per second, and that's faster than many RAID technologies today. So the challenges are great, especially when you pack a lot of tape media and a lot of tape drives into a very small physical footprint. You get that performance density, you get that capacity density. And a lot of
what we're looking at how to do today is how do we pack more drives and more media into
the same physical footprint? I'll call it an arms race between Quantum and its competitors
in the market about who can make the most dense library that's the most cost effective
for our customers. In the hyperscale space, it's really about physical floor real estate.
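For scale, the arithmetic on those drives and slots is straightforward; the drive and slot counts below are illustrative, while the 18 TB and 400 MB/s figures are LTO-9's published native numbers:

```python
# Aggregate tape throughput/density arithmetic. Drive and slot counts are
# illustrative; LTO-9 native figures are 18 TB capacity, ~400 MB/s per drive.

DRIVE_MB_S = 400
LTO9_TB = 18

drives, slots = 24, 800          # hypothetical rack-scale library config

aggregate_gb_s = drives * DRIVE_MB_S / 1000
rack_capacity_pb = slots * LTO9_TB / 1000
hours_to_fill_one_tape = LTO9_TB * 1e6 / DRIVE_MB_S / 3600

print(f"{drives} drives: ~{aggregate_gb_s:.1f} GB/s aggregate")
print(f"{slots} slots: ~{rack_capacity_pb:.1f} PB native per rack")
print(f"one LTO-9 cartridge takes ~{hours_to_fill_one_tape:.1f} h to fill at full speed")
```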
And is that how many drives I can get into a rack, or is it tapes and drives, or tapes? What's the ratio that you've got to fight for there?
There are trade-offs. In the hyperscale space, the customer may determine that they just want to trickle the data in, and a couple drives are all that's necessary. And what we'll do in that case is we'll pack more tape slots into those drive bays. Again, it's really about packing in as much media to get the greatest density on the
floor to support how much limited power, cooling, and physical space those data centers have.
Some customers want more tape drives because it's really about the high-performance ingest,
but once they fill up that library, they might move a lot of those tape drives into another library.
That way, that very expensive tape drive investment
can be reutilized over the course of time.
That's a fairly common thing that the hyperscalers are doing.
Some of our competitors focus more on tape slots.
Some of them focus on, you know, instead of having a vertical rack like our hyperscaler models, they might have more enterprise tape libraries that, you know, may give them the best serviceability, or take up odd spaces in the data centers. We tried to pick the optimal way of getting as many tapes and as many drives as we can inside the same physical footprint. So there are trade-offs.
Is there a need, you know, because I've seen your Scalar racks, the tall ones too, not just the little units.
Remind me though, because we're talking about multiple drives in these units,
and of course, the hundreds or more tapes.
Is there an opportunity for multiple robots within these racks?
Or is it still just one robot?
Is that the max efficiency?
Well, I'll talk about my ActiveScale cold storage technology and what we call RAIL, or redundant array of independent libraries. The reason I bring this up is that in the world of the cloud, you're designing for failure. You're going to have drives fail.
You're going to have robots fail.
But if you're erasure coding across multiple libraries, then who cares if the robot fails?
Right.
So you build with consumer off-the-shelf product, you know, and expect that something is going to fail, but provide a simple way to repair when that robot fails. So we have something we call the service module, where the robot will go home,
you hit a button and twist a lock, you pull it out, you swap in a new one,
and five minutes later that robot's back online.
So in hyperscale data centers, you're just planning for that failure.
That's just the reality of it.
Some customers do prefer enterprise libraries that
take up a bigger physical footprint and maybe scale wide as opposed to vertically. And you
may have a limit on the number of robots you have in that individual library, but you have multiple
robots. I will say that different hyperscalers have different models. Some use the vertical and
some use the enterprise.
Is there a movement?
I don't even know.
Does OCP have a tape movement?
Is there a working group for LTO or tape generally
within OCP?
Are you aware of that?
Not so much in OCP, but we do have the LTO consortium,
which is about defining the media
and the actual tape format itself.
There's nothing specific to OCP.
In reality, when we show up at a hyperscaler, everything's racked already.
We have a shipping crate; it rolls out, it bolts to the floor, you slam in the media, and you walk away inside five minutes.
As long as they're able to pack the media, they may or may not care about OCP for the tape platform like they do for the server and disk-based or flash-based storage platforms.
Right.
All right, well, we've covered a tremendous amount
of ground here.
I actually thought we'd get more into Myriad
and it's my fault because I kept you off on these other topics that I thought were wildly interesting. But we do have a paper on Myriad, so I'll link to that in the description and people can check that out. I will say one thing that I think is really cool there that we didn't talk about, but I just want to tease because I think it's so neat, is the way you guys use the switches there for deployment and for load balancing. There's a lot of really cool, like, just little nuggets.
Myriad's full of these little nuggets that, you know,
I'm probably diminishing the value by calling them little nuggets,
but there's so many little pieces that are cool that when added up together,
Myriad's super cool.
So we'll link to that and definitely check that out to learn more.
And Quantum has got a bunch of other things
as we've discussed today.
So check out their website, we'll link to that as well.
And Jordan, thanks for doing this.
Really appreciate your perspective as always.
Brian, thank you for the time.
And anybody out there who's interested in talking
about the only real Kubernetes orchestrated storage platform
that's fully containerized from the ground up
using microservices for everything, including deployment,
come talk to us.
We're really excited about it.
It's finally available in the market and we're selling them.
So we'd love to talk in detail about what we believe
is the next generation storage platform.
There, you said it.
All right, thank you all.