In The Arena by TechArena - Universal Storage Disruption with Vast Data

Starting point is 00:00:00 Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators and our host, Alison Klein. Now, let's step into the arena. Welcome to the Tech Arena. My name is Alison Klein, and we are here at Supercomputing bringing you stories about innovation in the scientific realm. I'm so pleased to be joined by Jeff Denworth, co-founder and CMO of Vast Data. Welcome to the program, Jeff. Thank you, Alison. Awesome to be on here. So Jeff, why don't you just start by talking about Vast Data, introduce it to the audience, and tell me why you decided, after many years in the storage arena, to found a new company

Starting point is 00:00:58 focused on universal storage. We founded the company because we saw a variety of trade-offs with respect to how people managed and processed their data. And then we realized that for the first time, a number of those trade-offs could be broken, such that customers could start to interact with and work with their data in entirely new and different ways than what they were able to previously. And so, as you mentioned, we created this product that we call Universal Storage, which is kind of the generic marketing name. It's kind of like, you know, a really good toothbrush that is great at cleaning teeth and stuff and doesn't really tell you much. And the reason for this is that we realized that if you can build a system that is designed to give customers just, like, awesome, scalable performance, but at the same

Starting point is 00:01:52 time, not incur the same kind of classic high performance storage tax that customers have always had to pay, well, then what it does is it defies the classic categorization of storage, right? If you think about the last 30 years, customers have been buying all sorts of different types of storage systems within their data centers for different use cases and workloads. You have backup systems, you have file systems, you have high-performance systems, and you've got everything in between. And we realized that if you could break these trade-offs,

Starting point is 00:02:20 well, then you could truly buy one product that was a universal tier of flash that could be applied to all of your data. This product is essentially something that defies classic categorization. And I'm a big believer that you can't just have, like, a cool product when you come into market and expect customers to buy it. There also has to be a market disruption that kind of drives the need for IT buyers and IT consumers to consider new solutions. And here, we kind of timed it at the right time where people were just starting to talk about machine learning, just starting to talk about deep learning. And the realization, as they started to digest what these workloads meant within their data center, not only just

Starting point is 00:02:59 for new data that they were acquiring, but also for the legacy data that they wanted to open up access to, is that you needed a new system or a new data platform that ultimately could remove the constraints from these new workloads being able to go and train and infer on their data. And so what happened is the market kind of opened up to us and said that we need something new because the pyramid of data storage that people have been managing for the last 20 to 30 years, well, the dynamics have been turned upside down where these customers now needed the fastest access to the largest amounts of data. And that's not how things have been done up until now. You've been getting a lot of attention in the media of late, including a callout by Gartner as part of their Magic Quadrant. Storage companies historically have not been

Starting point is 00:03:52 treated with this kind of celebrity status in the industry. Tell me why so many people are paying attention to this new approach to storage. You touched a little bit on machine learning and AI, and I can understand that it's unlocking some new opportunity, but give me some more detail about why customers are responding so well. So I think first and foremost, it's a story that's too good to be true. If I come into a customer environment and say, you can just have flash for all your data and it scales and you never have to worry about doing anything ever again, it's like, okay, I've heard these types of stories from vendors up and down, and there's always a big asterisk. So what's the asterisk? And we basically say there's no catch. And so as a new incumbent storage vendor, I should

Starting point is 00:04:40 say as a new kind of emerging storage vendor, the next thing that customers ask is, well, does it work? Because, you know, most storage products are pretty crappy, no offense. But, you know, if you think about companies that just get into the space, they have to prove themselves. They have to build out their capability, they have to build out their technology. And then you go into the HPC space and you go to a customer and say, hey, I've got this product that you can tie something like 10 or 20,000 compute nodes to, and it should probably just work, right? These are things that a lot of storage companies aspire to, but never actually end up sticking. And so we realized in the earliest days that we had to QA the hell out of a product architecture that is just simpler to develop on. And I think what

Starting point is 00:05:19 we've done is we've kind of brought the notional time horizon to build an enterprise storage system down from what conventional wisdom always said, that it takes about 10 years. We brought that down to five years. And at the same time, if you can actually deliver on this promise of flash for the cost of disk, customers just start buying and buying and buying. And so we just announced last week that we've now got three customers that have each committed over a hundred million dollars to our product since we started selling just a few years ago. And that's like the same amount of money that whole storage companies get over their first couple of years.

Starting point is 00:05:52 And we've gotten them from single customers. And so the bookend to our story is that it's not just a cool technology at the right time, but it's a company that has broken out of the classic kind of, let's call it, freshman class of new storage players. And we're now being considered among the top-tier infrastructure providers in a very short time because we're selling like crazy. And I think that's what Gartner called out. We had the highest ranking of any company that's ever entered into the file and object storage Magic Quadrant. And that's simply a byproduct of the market share that we've captured over the last couple of years. So where are you seeing market traction? Is it with the hyperscalers? Is it enterprise? Is it HPC? Where's the market taking off?

Starting point is 00:06:46 If you think about the hyperscalers, the first thing to consider is that none of them really has a good file system. They've all gone and outsourced it: a lot of the large cloud service providers either resell open source software like Lustre or they're reselling something like NetApp, but nobody's really built their own first-party file system in a great way. And a lot of times, it takes them years to roll a new technology through their service catalog, because, you know, they have the weight of thousands of customers sitting on them, so they have to make really smart decisions that are very thoughtful

Starting point is 00:07:21 as they kind of move through different technology generations and concepts. The enterprise customers that we work with, and in particular HPC, and, you know, I very much go to lengths to not characterize universal storage as an HPC product. But in the early days, we realized that it was a kick-ass product for HPC customers. My background is in parallel file systems, so I've got a long history in the space. And we kind of recognized early on there was a real connotation around parallel file systems within the market. They were thought of as being fragile and complex, and you kind of have to hire a PhD to go and manage infrastructure for your PhDs. And we basically said, let's take the computer science out of this for the customer.

Starting point is 00:08:07 In doing so, we found an enterprise customer base that was really receptive to the idea because we're not selling parallel file systems. It's just a parallel NAS. We've solved for some of the most fundamental scaling challenges of classic scale-out NAS systems. And we basically say to customers, well, if you can solve the storage-side bottleneck, it turns out that you don't have the same client-side bottlenecks that you thought you did, right? And everybody in the HPC space loves to pick on NFS as being not scalable. And now we're showing the world that you can run your hardest codes

Starting point is 00:08:41 at the highest levels of scale with NFS by explaining to them, and it takes a long time, but explaining to them that the server was always the problem, not the client. And if you can solve the server-side problem, you can do anything. When you think about high performance computing systems and, you know, supercomputing, one of the biggest conversations at the show is going to be the intersection between AI and HPC and underlying infrastructure changes around composable infrastructure to serve the coming needs of an exascale era. When you think of those opportunities, what do you get excited about as the person who is providing the data storage for some of the world's largest challenges?

Starting point is 00:09:26 There's a gentleman who, actually, I'm not sure if he still works at HPE, but he previously did, and he came in through the SGI acquisition. He was the CTO of SGI, named Eng Lim Goh. And Dr. Goh made a slide forever ago, and I use it in almost every one of my presentations, which basically shows the dichotomy between HPC I/O, classic simulation-based I/O, and AI I/O. And in the HPC kind of classic era, you had a little bit of data in, in the form of input directories, and then a ton of data out in the form of simulation data and checkpoints and things like this. Well, in the era of AI, that dynamic completely gets turned upside down, where now you have a ton of data that needs to go into training these models and a very small amount of data that comes out. And so what we realize is that a lot of organizations have built infrastructure for that first class of workload and haven't thought at all

Starting point is 00:10:25 about the read problem and the random read problem that is pervasive with AI. And I have this axiom that says pretty much every HPC center is in the process of trying to evolve into or expand into also being an AI center of excellence or competence. But not every AI customer wants to become an HPC customer, if you know what I mean. And so what's happening is those organizations that have strong backgrounds in HPC, well, they know about GPUs. They know about programming languages that are used for these types of AI accelerators. They know about RDMA networks and they know about scalable storage. They're well suited to deploy AI infrastructure if they kind of move to a different application area. And here they all need all-flash for their

Starting point is 00:11:16 infrastructure. And they're realizing that none of the parallel file systems and none of the scale-out NAS platforms were ever designed to make scalable flash affordable. And that's the kind of saving grace that VAST has here. But on the flip side, we made a conscious decision to build a NAS as opposed to a parallel file system. Because parallel file systems are really tricky. You know, I've got something like 15 years of experience with Lustre. I was in the original Lustre team. And, you know, the NAS market sold circles around parallel file systems, even though from a scale and a performance perspective,

Starting point is 00:11:53 parallel file systems have always been better. It's just they've always been more difficult to deploy. And the customers are telling you, if you look at the market share capture, that if you don't need that performance and you can get suitable levels of capability from a NAS, customers will always choose the NAS. And so we basically looked at it and said, how can we unlock the performance from NAS such that you can use it for everything? And that's why we think we're really well poised. That's why NVIDIA, for example, is an investor in VAST. That's why we are proud to say we just won the HPCwire Editors' Choice Award, because we're basically just democratizing this easy system for any class of scale, and readying every organization for this movement that's about to hit us around artificial

Starting point is 00:12:36 intelligence. Congratulations on the award. That's wonderful news. You've just described incredible capability that VAST is delivering to the HPC arena. As you look forward into 2023 and the current demands that the scientific community has to solve some of the world's biggest problems, what are the key breakthroughs that you're expecting from the industry as a whole to further high performance compute platforms? And what is VAST's role within that? You're starting to see evidence of some really spectacular science that becomes possible, mostly on the back of people figuring out how to incorporate neural networks, machine learning, deep learning into the classic applications that

Starting point is 00:13:26 they've been deploying. You know, I think about some areas where I saw a presentation from NOAA at the Hyperion user group, and they basically said, almost all of our codes are moving towards machine learning so that we can essentially get to much better model accuracy. You know, we work with oil and gas companies, and a lot of them are now stopping exploration, but they're not stopping computing, and they're not stopping accelerating their computing investments. They've realized that AI can help them with everything from getting more efficiency out of the reservoirs that they've already discovered to finding new applications for alternative energy that ultimately can allow them to diversify their business.

Starting point is 00:14:11 But probably my favorite story was there was a code that was released by Google earlier this year called AlphaFold2. And basically, you know, up until now, you had all these customized processors that were designed for protein folding and simulation. And what Google showed the world is, you know, if you don't need to be absolutely accurate on this determination as you start to look for ways that proteins fold and how they can bind with certain biological structures, well, you can just use a GPU and you can infer on that and get like 99% accuracy. And then if you find something that looks good, then you go actually calculate it properly. And this has been a huge scientific breakthrough, almost a grand challenge problem that's been solved, where now you've got universities and research labs around the world that are

Starting point is 00:15:03 all doing protein folding a hundred times faster than they used to, which will lead to just so much faster drug discovery. And I think that is an example of what's going to start happening at a more and more frequent pace as you start to realize that, you know, the models that are now being trained are moving into the trillions of parameters. And you're just going to have this cascading amount of innovation that comes from it, at a pace that we've never seen as a society. And so honestly, I don't know what's going to come over the next year or so. But what I do know is that the market is relentless in pushing the envelope in ways that

Starting point is 00:15:42 we never saw before. Jeff, thank you so much for being on the program. You've shared some incredible thoughts about high-performance computing, the innovation of storage, and VAST's role. I appreciate you being on. One final question for you. Where can folks engage with the VAST team and continue the dialogue with you? Well, we're at SC22 this week in Dallas.

Starting point is 00:16:04 So if you're curious, just stop by our big old booth in the middle of the trade show floor. And you can definitely have a conversation with some of our engineers there. If not, you can find us on vastdata.com and we can take it from there. Fantastic. Thank you so much for being on the show today. Thank you. Thanks for joining the Tech Arena. Subscribe and engage at our website, thetecharena.net. All content is copyright by the Tech Arena.
