In The Arena by TechArena - Universal Storage Disruption with Vast Data
Episode Date: November 17, 2022
Allyson chats with Jeff Denworth, co-founder and CMO at Vast Data, about their innovative Universal Storage platforms that have propelled them to acute interest from data center customers and earned them a Gartner Magic Quadrant placement. What's different about Vast Data solutions, and why does this storage work so well for HPC?
Tags: storage, HPC, vastronauts, SC22, universalstorage
Transcript
Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators
and our host, Allyson Klein.
Now, let's step into the arena.

Welcome to the Tech Arena. My name is Allyson Klein, and we are here at Supercomputing, bringing you stories about innovation in the scientific realm. I'm so pleased to be joined
by Jeff Denworth, co-founder and CMO of Vast Data. Welcome to the program, Jeff.

Thank you, Allyson. Awesome to be on here.

So Jeff, why don't you just start by talking about Vast Data, introduce it to the audience,
and tell me why you decided, after many years in the storage arena, to found a new company
focused on universal storage.

We founded the company because we saw a variety of trade-offs with respect to how
people managed and processed their data. And then we realized that for the first time,
a number of those trade-offs could be broken, such that customers could start to interact with
and work with their data in entirely new and different ways than what they were able to previously. And so, as you mentioned,
we created this product that we call Universal Storage, which is kind of the generic marketing
name. It's kind of like, you know, a really good toothbrush that is great at cleaning teeth and
stuff and doesn't really tell you much. And the reason for this is that we realized that if you can build a system
that is designed to give customers both just like awesome, scalable performance, but at the same
time, not incur the same kind of classic high performance storage tax that customers have always
had to pay, well, then what it does is it defies the classic categorization of storage, right?
If you think about the last 30 years,
customers have been buying all sorts of different types of storage systems
within their data centers for different use cases and workloads.
You have backup systems, you have file systems,
you have high-performance systems, and you've got everything in between.
And we realized that if you could break these trade-offs,
well, then you could truly buy one product
that was a universal tier of flash that could be applied to all of your data.
This product is essentially something that defies classic categorization.
And I'm a big believer that you can't just have like a cool product when you come into market and expect customers to buy it.
There also has to be a market disruption that kind of drives the need for IT buyers and IT consumers to consider new
solutions. And here, we kind of timed it at the right time where people were just starting to
talk about machine learning, just starting to talk about deep learning. And the realization,
as they started to digest what these workloads meant within their data center, not only just
for new data that they were acquiring, but also for the legacy data that they wanted to open up access
to, is that you needed a new system or a new data platform that ultimately could remove the
constraints from these new workloads being able to go and train and infer on their data. And so
what happened is the market kind of opened up to us and said that we need something new
because the pyramid of data storage that people have been managing for the last 20 to 30 years, well, the dynamics have been turned upside down where these customers now needed the fastest access to the largest amounts of data.
And that's not how things have been done up until now.

You've been getting a lot of attention in the media of late, including a callout by
Gartner as part of their Magic Quadrant. Storage companies historically have not been
treated with this kind of celebrity status in the industry. Tell me why so many people are
paying attention to this new approach to storage. You touched a little bit on machine
learning and AI, and I can understand that it's unlocking some new opportunity, but give me some
more detail about why customers are responding so well.

So I think first and foremost, it's a story
that's too good to be true. If I come into a customer environment and say, you can just have
flash for all your data and it scales and you never have to worry about doing anything ever again. It's like, okay, I've heard
these types of stories from vendors up and down, and there's always a big asterisk. So what's the
asterisk? And we basically say there's no catch. And so as a new incumbent storage vendor, I should
say as a new kind of emerging storage vendor, the next thing that customers ask
is, well, does it work? Because, you know, most storage products are pretty crappy, no offense.
But, you know, if you think about companies that just get into the space, they have to prove
themselves. They have to build out their capability, they have to build out their technology.
And then you go into the HPC space and you go to a customer and say, hey, I've got this product
that you can tie something like 10 or 20,000 compute nodes to, and it should probably just work, right? These are things that a lot of storage companies aspire to,
but never actually end up sticking. And so we realized in the earliest days that we had to
QA the hell out of a product architecture that is just simpler to develop on. And I think what
we've done is we've kind of brought the notional time horizon to build an enterprise storage system
down from what conventional wisdom always said, that it takes about 10 years. We brought that down to
five years. And at the same time, if you can actually deliver on this promise of flash for
the cost of disk, customers just start buying and buying and buying. And so we just announced last
week that we've now got three customers that have each committed, since we started selling just a few
years ago, over a hundred million dollars to our product. And that's
like the same amount of money that whole storage companies get over their first couple of years.
And we've gotten them from single customers. And so the bookend to our story is that it's not just
a cool technology at the right time, but it's a company that has broken out of the classic
kind of like, let's call it freshman class of new storage players. And we're now being considered
among the top tier infrastructure providers in very short time because we're selling like crazy.
And I think that's what Gartner called out. We had the highest ranking of any company that's ever entered into
the file and object storage Magic Quadrant. And that's simply a byproduct of the market share
that we've captured over the last couple of years.

So where are you seeing market traction?
Is it with the hyperscalers? Is it enterprise? Is it HPC? Where's the market taking off?

If you think about the hyperscalers, the first thing to consider is that
none of them really have a good file system. They've all gone and outsourced it: a lot of the large
cloud service providers either resell open source software like Lustre
or they're reselling something like NetApp, but nobody's really built their own first-party file system in a great way.
And a lot of times, it takes them years to roll a new technology
through their service catalog for reasons that, you know,
they have the weight of thousands of customers sitting on them
that they have to make really smart decisions that are very thoughtful
as they kind of move through different technology generations and
concepts. The enterprise customers that we work with, and in particular HPC, and, you know,
I very much go to lengths to not characterize universal storage as an HPC product. But in the
early days, we realized that it was a kick-ass product for HPC customers. My background is in
parallel file systems, so I've got a long history in the space.
And we kind of recognized early on there was a real connotation around parallel file systems
within the market. They were thought of as being fragile and complex, and you kind of have to
hire a PhD to go and manage infrastructure for your PhDs. And we basically said, let's take the computer science out of this for the customer.
In doing so, we found an enterprise customer base that was really receptive to the idea
because we're not selling parallel file systems.
It's just a parallel NAS.
We've solved for some of the most fundamental scaling challenges of classic scale-out NAS systems.
And we basically say to customers,
well, if you can solve the storage side bottleneck, it turns out that you don't have the same client
side bottlenecks that you thought you did, right? And everybody in the HPC space loves to pick on
NFS as being not scalable. And now we're showing the world that you can run your hardest codes
at the highest levels of scale with NFS by explaining to them,
and it takes a long time, but explaining to them that the server was always the problem,
not the client. And if you can solve the server side problem, you can do anything.

When you think about high performance computing systems and, you know, supercomputing,
one of the biggest conversations at the show is going to be the intersection between AI and HPC
and underlying infrastructure changes around composable infrastructure to serve
the coming needs of an exascale era. When you think of those opportunities, what do you get
excited about as the person who is providing the data storage for some of the world's largest challenges?

There's a gentleman who, actually, I'm not sure if he still works at HPE, but he previously did,
and he came in through the SGI acquisition. He was the CTO of SGI, named Eng Lim Goh.
And Dr. Goh made a slide forever ago, and I use it in almost every one of my presentations, which basically shows the dichotomy between HPC I/O, classic simulation-based I/O, and AI I/O.
And in the HPC kind of classic era, you had a little bit of data in, in the form of input directories, and then a ton of data out in the form of simulation data and checkpoints and things like this. Well, in the era of AI,
that dynamic completely gets turned upside down, where now you have a ton of data that needs to
go into training these models and a very small amount of data that comes out. And so what we
realized is that a lot of organizations have built infrastructure for that first class of workload
and haven't thought at all
about the read problem and the random read problem that is pervasive with AI. And I have this axiom
that says pretty much every HPC center is in the process of trying to evolve to or expand into
also being an AI center of excellence or competence.
But not every AI customer wants to become an HPC customer, if you know what I mean.
And so what's happening is those organizations that have strong backgrounds in HPC, well, they know about GPUs.
They know about programming languages that are used for these types of AI accelerators.
They know about RDMA networks and they know about scalable storage. They're well suited to deploy AI infrastructure
if they kind of move to a different application area. And here they all need all-flash for their
infrastructure. And they're realizing that none of the parallel file systems and none of the
scale-out NAS platforms were ever designed to make scalable flash affordable.
And that's the kind of saving grace that VAST has here.
But on the flip side, we made a conscious decision to build a NAS as opposed to a parallel file system.
Because parallel file systems are really tricky.
You know, I've got something like 15 years of experience with Lustre.
I was on the original Lustre team. And, you know, the NAS market sold
circles around parallel file systems, even though from a scale and a performance perspective,
parallel file systems have always been better. It's just they've always been more difficult to
deploy. And the customers are telling you, if you look at the market share capture, that if
you don't need that performance, and you can get suitable levels of capability from a NAS, customers will always choose the NAS. And so we basically looked at it
and said, how can we unlock the performance from NAS such that you can use it for everything? And
that's why we think we're really well poised. That's why NVIDIA, for example, is an investor
in VAST. That's why we are proud to say we just won the HPCwire Editors' Choice Award,
because we're basically just democratizing this easy system for any class of scale,
and readying every organization for this movement that's about to hit us around artificial
intelligence.

Congratulations on the award. That's wonderful news. You've just described
incredible capability that VAST is delivering to the HPC
arena. As you look forward into 2023 and the current demands that the scientific community has
to solve some of the world's biggest problems, what are the key breakthroughs that you're
expecting from the industry as a whole to further high performance compute
platforms? And what is VAST's role within that?

You're starting to see evidence of some
really spectacular science that becomes possible, mostly on the back of people figuring out how to
incorporate neural networks, machine learning, deep learning into the classic applications that
they've been deploying. You know, I think about some areas where I saw a presentation from NOAA
at the Hyperion user group, and they basically said, almost all of our codes are moving towards
machine learning so that we can essentially get to much better model
accuracy. You know, we work with oil and gas companies, and a lot of them are now stopping
exploration, but they're not stopping computing, and they're not stopping accelerating their
computing investments. They've realized that AI can help them with everything from getting more
efficiency out of the reservoirs that they've already discovered to finding new
applications for alternative energy that ultimately can allow them to diversify their business.
But probably my favorite story was there was a code that was released by Google earlier this
year called AlphaFold2. And basically, you know, up until now, you had all these customized
processors that were designed for protein folding and simulation.
And what Google showed the world is, you know, if you don't need to be absolutely accurate on this determination as you start to look for ways that proteins fold and how they can bind with certain biological structures,
well, you can just use a GPU and you can infer on that and get like 99% accuracy.
And then if you find something that looks good, then you go actually calculate it properly.
And this has been a huge scientific breakthrough, almost a grand challenge problem that's been
solved, where now you've got universities and research labs around the world that are
all doing protein folding a
hundred times faster than they used to, which will lead to just so much faster drug discovery.
And I think that is an example of what's going to start happening at a more and more frequent
pace as you start to realize that, you know, the models that are now being trained are moving into
the trillions of parameters.
And you're just going to have this cascading amount of innovation that comes from it, at a pace we've never seen as a society.
And so honestly, I don't know what's going to come over the next year or so.
But what I do know is that the market is relentless in pushing the envelope in ways that
we never saw before.

Jeff, thank you so much for being on the program.
You've shared some incredible thoughts about high-performance computing,
the innovation of storage, and VAST's role.
I appreciate you being on.
One final question for you.
Where can folks engage with the VAST team and continue the dialogue with you?

Well, we're at SC22 this week in Dallas.
So if you're curious, just stop by our
big old booth in the middle of the trade show floor. And you can definitely have a conversation
with some of our engineers there. If not, you can find us at vastdata.com and we can take it from
there.

Fantastic. Thank you so much for being on the show today.

Thank you.

Thanks for joining the Tech Arena.
Subscribe and engage at our website, thetecharena.net. All content is copyright by the Tech
Arena.