In The Arena by TechArena - Riding the AI Data Pipeline with VAST Data, a Data Insights Series podcast with Solidigm
Episode Date: April 29, 2024. TechArena host Allyson Klein is joined by Solidigm's Jeniece Wnorowski as they continue to explore the rapid data innovation fueling today's computing. In today's episode, they chat with VAST Data's Global VP of Systems Engineering, Subramanian Kartik, as he describes how his team has delivered a breakthrough data platform for the AI era.
Transcript
Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators
and our host, Allyson Klein. Now, let's step into the arena.
Welcome to Tech Arena. My name is Allyson Klein, and this is a Data Insights
podcast. And that means I have my co-host with me, Jeniece Wnorowski. Welcome to the program,
Jeniece. How are you doing? Well, thank you, Allyson. I'm doing great. I'm so excited to
be here today. Jeniece, you have been traveling all over the world, and I know that you've been
talking to a lot of folks about data and the data pipeline. We are in for a fantastic episode today.
Tell us who is coming on the program to talk to us. Yeah, I have been traveling a lot, mainly
to events, and I've just been blown away by the work that our special
guest has been doing with his team, and that is VAST Data. So today, joining us is
Kartik Subramanian, who is the Global Vice President of Systems Engineering for Vast.
And welcome to the show, Kartik. Well, thank you, Jeniece. Much appreciated and fantastic to be
with you on another podcast over here. This is great. I remember the last recording we did,
I thoroughly enjoyed it. So looking forward to our conversation today. Excellent. So Kartik,
I am so excited to talk to you about what you've been doing with the VAST team, but why don't we just
get started? VAST has been on the program before, but you've been getting incredible traction in the
market for driving the AI data pipeline to new scale and performance. Can you give us a sense
of what is shaping this market and how is VAST making progress?
Yeah, so this market is explosive, as you know.
Since the introduction of ChatGPT, there's been this, shall I say,
irrational exuberance in the market around anything connected with generative AI.
And yes, it absolutely clearly has enormous promise.
It's still in the first innings, in my opinion.
So a lot of the activity we are seeing is really partly in the enterprise and partly in the cloud as well.
There's a new breed of cloud service providers who are specialized
in running these kind of workloads
for generative AI, which usually people think of as large language models, but there are other
kinds of generative AI workloads as well. And the large language models require a large number of
GPUs with very sophisticated networking and storage connected to them. And even at the
hyperscalers, there's a shortage of these things. So this market has exploded.
We are heavy participants in that explosion.
And we are essentially feeding the frenzy, throwing gasoline on the fire over there.
On the flip side is enterprise.
Enterprise is very tentative right now.
They are still trying to define what they want to do,
trying to understand
what sort of impact this will have on their
business. Are they going to make money?
Are they going to save money?
Or are they going to stay out of jail?
Hopefully a combination of all these three.
And so while their steps
are tentative, we think that in the
next year, two years or so, this wave has just begun.
We've only scratched the surface.
The expansion and explosion is on its way at this point.
Got it.
So with that expansion and explosion, can you tell us, Kartik, a little bit about how Vast does this differently from other storage models?
Sure.
For starters, we are not a storage company, as you guys know.
We're a data platform company.
So, of course, we're known for our highly scalable, highly performant, online at all times, strongly secure platform, which we really love.
And we expose that data through file and object protocols.
But we've gone quite a bit beyond that and said,
why should data only be looked at through one lens?
Why not, you know, structured data as a table?
And so we've introduced functions around the database in our system. This allows
for data lake technologies, typically things like Trino
and Spark, et cetera, to work natively off of us.
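The "data as a table" idea can be illustrated with a minimal sketch. This uses Python's built-in SQLite purely as a stand-in for the SQL engines Kartik mentions (Trino, Spark); the table and values are invented for illustration.

```python
import sqlite3

# Toy stand-in for the "data as a table" idea: instead of parsing files
# over a file/object protocol, consumers issue SQL against the same data.
# SQLite is used only for illustration; engines like Trino or Spark
# would run similar SQL directly against the platform's tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (sensor TEXT, reading REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", 1.5), ("a", 2.5), ("b", 4.0)],
)

rows = conn.execute(
    "SELECT sensor, AVG(reading) FROM events GROUP BY sensor ORDER BY sensor"
).fetchall()
print(rows)  # [('a', 2.0), ('b', 4.0)]
```

The point is the access pattern: the same data remains reachable over file and object protocols, while analytics tools query it as structured tables.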
We're continuing to move forward, introducing concepts like a global namespace.
So any data is visible anywhere to
any GPU farm.
In fact, I just got off a call with a customer showing them how to take a large, long-running training job
and move it from one data center to another data center
without losing that data within minutes.
And this is a stunt which is very, very difficult to pull off.
So probably the most important thing is we look at data as a pipeline rather than
something just static.
It's a constantly flowing stream of data.
And there are different modalities to be able to analyze the data.
Some of them are with GPUs and some of them are with more traditional technologies like
CPUs.
We are the only ones who can do the entire pipeline, core to cloud to edge, as well as
through all the different types of data.
This is what's been the heart of our success in this market.
So we're now proudly the standard for a large number of the largest of the cloud service providers in the tier two space.
And we are making a run at the hyperscalers as well.
Now, Kartik, there's been a lot of parallels drawn between traditional high performance computing platforms and AI training clusters.
While you were talking, I could kind of piece together parts of the answer to this question.
But for the audience, how do these systems work similarly?
And then how do you see the differences between what enterprises and large cloud players are doing with their AI training and what traditional HPC technical computing clusters would typically perform?
So both share similarities.
Both are forms of high-performance computing, obviously.
More traditional HPC environments still rely on a large number of compute elements
which are distributed: tens to thousands of nodes, 10,000 nodes, or even more.
And they cooperatively work to solve certain types of problems.
That market has been around for over 20 years and is pretty mature.
And the primary workloads they ran were what we often call HPC simulations.
Large amounts of data are ingested, crunched cooperatively between many, many CPUs, and then that produces
an output, which is useful.
So oil and gas, energy, computational fluid dynamics, all these are very common workloads there.
Those codes were optimized mainly for large datasets and large-scale sequential reads
and sequential writes.
Large-block sequential reads and sequential writes are what really dominate there.
The other form of accelerated computing, which we are in the era that we're in right now,
uses other coprocessors such as GPUs and FPGAs and other things like that.
And there the workloads are very, very different.
They're very read-intensive, yes, but random-read-intensive. So solid state is a necessary component of the media that needs to underlie that,
because spinning-disk technologies are not able to provide the I/O performance that's needed for stuff like this.
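A rough back-of-the-envelope comparison shows why random-read-intensive workloads push toward solid state. The IOPS figures below are order-of-magnitude assumptions for illustration, not vendor specifications.

```python
# Order-of-magnitude illustration of why random-read AI workloads
# mandate solid-state media. IOPS figures are assumptions, not specs.
def random_read_mb_s(iops: float, io_size_kb: float) -> float:
    """Effective throughput when every read is a small random I/O."""
    return iops * io_size_kb / 1024

hdd = random_read_mb_s(iops=200, io_size_kb=4)         # seek-bound spinning disk
nvme = random_read_mb_s(iops=1_000_000, io_size_kb=4)  # NVMe solid state
print(round(hdd, 2), round(nvme, 2))
```

Under these assumptions the spindle delivers under 1 MB/s of random reads while the SSD delivers thousands, which is the gap Kartik is pointing at.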
Their scaling and availability characteristics are also somewhat different as well.
These systems have to be highly shared and very highly available.
They cannot take an outage for anything.
We are, by the way, extremely active in high-performance computing as well.
Some of the largest HPC clusters in the world run on us.
But those usually run relatively homogeneous sets of workloads.
In AI, by contrast, these are strongly multi-tenant environments,
and have to be secure environments, classified environments, et cetera.
So they have a different level of requirements
compared to what you would see in an HPC world.
But in a nutshell, yeah, AI workloads need heavy random-read, heavy-I/O characteristics, especially for things like checkpointing.
And therefore, that mandates large-scale all-flash systems.
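To see why checkpointing pushes toward high-bandwidth all-flash storage, a rough sizing sketch helps. The 14-bytes-per-parameter figure and the 50 GB/s write rate are illustrative assumptions, not VAST or Solidigm numbers.

```python
# Back-of-the-envelope checkpoint sizing. The bytes-per-parameter and
# write-bandwidth figures are illustrative assumptions, not measurements.

def checkpoint_gb(params_billions: float, bytes_per_param: float = 14.0) -> float:
    """Approximate checkpoint size in GB: weights plus optimizer state.

    14 bytes/param is a rough figure for fp16 weights plus fp32 Adam
    state; real jobs vary with precision and sharding.
    """
    return params_billions * bytes_per_param  # 1e9 params * bytes ~ GB

def stall_seconds(size_gb: float, write_gb_per_s: float) -> float:
    """Time the GPUs may sit idle if a checkpoint is written synchronously."""
    return size_gb / write_gb_per_s

size = checkpoint_gb(70)          # a hypothetical 70B-parameter model
print(round(size))                # 980 GB
print(stall_seconds(size, 50.0))  # 19.6 s at an assumed 50 GB/s
```

Every second of stall multiplies across the whole GPU cluster, which is why checkpoint write bandwidth, not just capacity, drives the storage design.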
So Kartik, with that sophistication, right,
these workloads are pretty complex.
Where in your mind are customers in terms of sophistication and implementing your systems?
Yeah, as I mentioned, right now the people who are doing the most active work here are the actual model builders themselves. So all of us have heard of models like GPT-3 and GPT-4. These require enormous
amounts of data as well as training to get built. Or
Llama from Meta, or Mistral, for example. And these
are all people who are on
the leading edge of research here.
Clearly, there are many private sector people, too, who are doing a lot of work here.
And, you know, large autonomous driving companies, absolutely,
and drug discovery companies are doing this as well.
Traditional brick-and-mortar enterprise is more tentative.
Like I mentioned earlier, they're still identifying use cases.
So they tend to start with pre-trained models, which they would get from Hugging Face or
something like that, and then expose that to their internal data through things like
retrieval-augmented generation, or RAG, and then be able to do inference with something like that.
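The RAG pattern described here can be sketched minimally. Real deployments use learned embeddings and a vector store; the bag-of-words similarity, documents, and function names below are invented purely to keep the sketch self-contained.

```python
from collections import Counter
from math import sqrt

# Toy retrieval-augmented generation (RAG) sketch. Production systems
# use learned embeddings and a vector database; bag-of-words cosine
# similarity stands in here so the example runs on its own.

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Internal enterprise documents the pre-trained model never saw.
documents = [
    "quarterly revenue grew in the storage division",
    "the cafeteria menu changes on mondays",
]

def retrieve(query: str) -> str:
    q = embed(query)
    return max(documents, key=lambda d: cosine(q, embed(d)))

def build_prompt(query: str) -> str:
    # The retrieved internal document grounds the pre-trained model's answer.
    return f"Context: {retrieve(query)}\nQuestion: {query}"

print(build_prompt("how did revenue grow"))
```

The model itself is untouched; only the prompt changes, which is why enterprises can start from a Hugging Face checkpoint and still answer questions over their own data.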
This is going to morph because of a regulatory climate
that's changing extremely rapidly.
The European Union has already passed the AI Act
that is mandating that certain business sectors
and certain types of data, you have to preserve data for a long time.
You need reproducibility many months after the fact. So you need to know what data went into training and what the
outputs are, so they can refute anything anyone alleges about them. The U.S. is moving as well. We've
all recently seen the new bill proposed by Adam Schiff in the House, which would require that everybody declare
any copyrighted information
that they may have used for training.
This means now it's no longer just a GPU game.
It's a governance game.
And we're going to have to have compliance archives
and controls in place to be able to work with this.
We think that over time,
people will be training their own models,
probably not huge ones, but smaller ones.
We may see more specialized AI models
come to dominate over highly general models like ChatGPT
as things go on.
So despite the fact that it's tentative,
we're seeing spending pick up
and interest pick up quite a lot in the enterprise. The cloud guys, of course, are just going berserk at this point.
They're buying GPUs like they're going out of style, literally tens to hundreds of thousands
at a time. Now, I've been following VAST on the Tech Arena for the last couple of years. And in
fact, you guys are one of my first guests on this platform. You made some really exciting announcements around collaboration with NVIDIA and Supermicro lately.
Can you help unpack those and talk a little bit about how these new collaborations with the industry leaders in AI are helping deliver new capability to your customers?
Absolutely.
So I've had the privilege of working with NVIDIA now for over four years.
All the initial testing we did with GPUDirect Storage for high-performance RDMA networks,
as well as BasePOD and SuperPOD certification, were things that I was deeply involved in all the
way through.
One of the interesting things about VAST that a lot of people don't realize is
even though we do storage and we're a full data platform company,
we're not a hardware company.
We're a completely software company.
So the hardware stack under us can be very varied.
And we've been fortunate.
We've been partnering with Solidigm for so long.
You guys are anchor suppliers for us for the dense NAND that we need
to make our systems affordable and high-performing.
And there are other form factors as well, which we are exploring.
So the Supermicro partnership that we announced, or re-announced, at GTC is one of those things which is, we believe, super important.
Prior to this, the shelves that actually held our dense NAND, which were made by our contract manufacturers, tended to be somewhat specialized and bespoke, in the sense that even though they were built out of widely available industry components, they required special assembly and care.
With Supermicro, what we looked at was to say, can we use a totally generic,
off-the-shelf, industry-standard server instead to be the foundation for VAST? And this is really what we did over the last few months or so.
So we take a server with 12 drive slots,
have some storage class memory and some dense NAND,
and voila, now you've got a building block for VAST.
We did another thing which is very interesting.
As you know, we are very containerized in our architecture.
Both our front-end nodes, which handle protocols,
as well as the back-end nodes, which handle the media,
are all essentially Docker containers.
So we decided to co-locate them on the same server
that we have the storage shelves on.
So we essentially eliminated a whole layer
of server architecture in this mix.
And that allows us to have a very highly hyper-converged setup,
which has extremely good scale properties.
We think this is a fantastic offer for people in the cloud space.
It's built for scale.
It's built for high performance.
It's built for ease.
Probably most importantly, it's built also for a small form factor
and for low power,
which are increasingly critical in this space.
Got it. Kartik, you mentioned having worked with Solidigm for a while now.
And obviously, a big portion of the foundation of your architecture is, you know, the data and the media.
But can you tell us a little bit more about what type of drives you're using with Solidigm and how those help you?
Yeah, we go to Solidigm because you guys make solid, I guess, dense NAND systems.
So we started out with the U.2 form factor QLC technology,
which you had introduced,
because one of the key design elements in our platform
was the goal to forever kill disk drives
and go to completely solid-state media.
But we knew that the most whiz-bang technology in the world would not be worth it if it cost
three times or four times as much.
So we had to normalize the cost curve.
So going with DenseLand was a major step forward for us.
What really changed the game was we figured out, along with Intel and you guys, how to
create a flash translation layer which would allow us to extend the endurance for these drives to well beyond what you would normally expect.
We were able to extend the endurance to beyond 10 years.
That suddenly catapults Dense NAND into the arena of being viable for enterprise workloads.
This was a huge, huge move for us.
This let us bend the cost curve significantly,
along with some of the other software features we have,
such as large-scale reduction of data.
A combination of the two now makes us not just fast,
scalable, and always online in operations and performance,
but also affordable, which is a key element of what we do here.
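The endurance math behind that claim can be sketched. The capacity, program/erase cycle count, workload, and write-amplification figures below are illustrative assumptions, not Solidigm specifications; the point is that a flash translation layer that turns scattered writes into large sequential stripes lowers write amplification and stretches drive lifetime.

```python
# Illustrative QLC endurance arithmetic; none of these are Solidigm specs.

def drive_lifetime_years(capacity_tb: float, rated_pe_cycles: int,
                         host_writes_tb_per_day: float,
                         write_amplification: float) -> float:
    """Years until the media's rated program/erase cycles are exhausted."""
    total_media_writes_tb = capacity_tb * rated_pe_cycles
    media_writes_per_day = host_writes_tb_per_day * write_amplification
    return total_media_writes_tb / media_writes_per_day / 365

# Same hypothetical QLC drive and workload; only write amplification
# differs, e.g. scattered small overwrites vs. large sequential stripes
# laid down by a smarter flash translation layer.
naive = drive_lifetime_years(15.36, 1500, 10.0, 4.0)
shaped = drive_lifetime_years(15.36, 1500, 10.0, 1.1)
print(round(naive, 1), round(shaped, 1))
```

Lowering write amplification multiplies the usable lifetime directly, which is how software can take QLC from a few years of endurance toward the decade-plus range described above.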
We have continued that partnership with Solidigm.
We moved on from U.2; now we're using newer form factors, and we're eagerly awaiting other things that we're
going to be doing together. There's lots and lots of demand for even more density, larger and larger drives.
You know, we started with 15-terabyte drives. Now we are about to introduce 60-terabyte drives.
They're in heavy demand, though.
I got to tell you, everybody's buying them up like they're going out of style, which is good for you guys.
So this is excellent.
So, yeah, that's what we're working towards.
Awesome.
Yeah, we really appreciate the collaboration.
I can speak on behalf of our team over the years.
And we are, like you said, also really excited for the future.
And we started out talking about how, you know, I've been traveling all over, and I'm seeing VAST everywhere.
Your booths are always packed with people interested in your technology.
But can you tell us, for this audience here, where can folks go to learn more about your solutions?
Fantastic.
First place to start is to go to our website, vastdata.com.
You'll find there a lot of very interesting material
on what kind of industry sectors,
what kind of solutions we offer,
ranging all the way from high-performance computing
to life sciences
to media and entertainment, and of course the ubiquitous AI, which is almost completely horizontal.
Along with some solutions people are often surprised to find us in, like the backup and
recovery space, where we act as a target, an all-flash target, for backup systems. One might ask
why, but that's because our restore speeds are blindingly fast.
And in this day of ransomware, full environment recovery seems to be as much a concern as, in fact more of a concern than, a single file recovery or a single
directory recovery. So those are all the things you can learn about us from that perspective.
You can also learn about us from a data platform perspective.
What's the buzz all about when we say we can expose data as a table?
What kind of problems can we solve with that?
And how do we plug in into
and refactor Hadoop environments
or other kinds of data lake environments
like Spark and Impala and Hive
or the tools that are used over there?
All of that stuff is also there.
For a deeper architectural understanding of what VAST is and how it operates,
we have a fantastic white paper.
Easy to find: vastdata.com/whitepaper.
Once you go there, it's a long but easy read.
And it'll give you a full detailed exposition of what makes us really good.
And do not forget to look up all the customer testimonials over here.
We have some marquee customers in every one of these sectors.
Many of them have recorded great videos, and, you know,
the solution briefs and white papers associated there,
all of that stuff is public.
Next step, of course: contact someone from VAST. You know, if your
appetite is now whetted, trust me, we're just waiting to engage with you, and we'll be able to
provide you one-on-one assistance in anything you like: far deeper dives, drill-downs,
design workshops, all of that stuff as we go around the world.
Well, Kartik, thank you so much for taking time out of your day to talk with Jeniece and me and share your vision for the data pipeline. It was so cool. I've been following VAST and the incredible
solutions that you've been delivering to market. So it's a real pleasure having you on the program.
Thanks for being here. As always, I'll catch you next time.
Thanks for joining the Tech Arena.
Subscribe and engage at our website, thetecharena.net. All content is copyright of the Tech Arena. Thank you.