In The Arena by TechArena - MLCommons: Setting the Standard for AI Performance
Episode Date: November 6, 2025
From storage to automotive, MLPerf is evolving with industry needs. Hear David Kanter explain how community-driven benchmarking is enabling reliable and scalable AI deployment.
Transcript
Welcome to Tech Arena, featuring authentic discussions between tech's leading innovators and our host, Allyson Klein.
Now, let's step into the arena.
Welcome to the arena. My name is Allyson Klein. We are coming to you from the AI Infra Conference in Santa Clara, California.
And it's another Data Insights episode, which means I'm with Jeniece Wnorowski. Hey, Jeniece, how's it going?
Hey, Allyson, it's going great.
We have had such a busy week talking to everyone across the value chain, silicon suppliers,
folks who are building infrastructure systems, running data centers, and the practitioners
that are bringing AI to life inside of organizations.
So exciting.
We're turning the lens a little bit with this interview.
I'm excited.
Why don't you introduce the topic and our guest?
Yeah, I'm actually real excited about this topic because it's a hot one.
When you think about benchmarking with AI, it's like, where do you go?
So today we actually have David Kanter, the founder of MLCommons, as well as MLPerf.
So welcome to the program.
Thank you so much.
It's a pleasure to be here.
I was so excited to have you included in our AI Infra suite, if you want to think about it that way.
And I know you've been on Tech Arena before and we've engaged for a very long time together.
But why don't you just go ahead and introduce your role, your background in the industry because it's fascinating.
And what is going on with MLCommons?
because there's a lot.
Yeah, I guess for context.
My background is semiconductors, things that compute for a long time.
And then I had the pleasure of wandering into one of the first MLPerf meetings.
And it was an all-volunteer project.
We all came together, focused around the importance of getting honest, reproducible, representative benchmarks.
And so I started out as the group secretary.
Then when we decided to create MLCommons to house it, I booted it up
as the first executive director, grew it to 20 people, and now I get to just focus on MLPerf,
as AI performance has become like a broader, more multifaceted challenge, really. So our mission
is to make AI better for everyone. We want it to be faster and more capable. We want it to be more
energy efficient. We want it to also be safer, more accurate. And so what are the things that
we can do there? And MLPerf is really the marquee project when it comes to establishing and helping
guide the industry as well as measure faster and more energy-efficient AI performance.
You know, having been in storage myself a long time, I've always noticed, especially now,
storage isn't the thing that bubbles up when you're looking at AI, particularly when it
comes to benchmarking. So can you tell us why storage is now, more than ever, becoming so central
as organizations scale their AI environments?
Yeah.
So first of all, I think of AI performance
as a full system problem,
everything from the NAND flash, hard drives,
transistors that we use to build the systems out of,
all the way up to the software,
the data as well that we're using.
And I think to your point,
storage has been overlooked in many cases.
And, you know, at least the story of MLPerf Storage
began with a couple of folks talking
at one of our community meetings, and someone had mentioned that they knew of a very large
cluster for compute that had been stood up by a customer, and they couldn't get more than 30%
utilization because their storage couldn't keep up. That was the aha moment. And I'm like,
this is a problem, right? These are companies that have outstanding teams of engineers,
and they're running into these road bumps, but they're ahead of the rest of the industry. They're
ahead of the enterprise. They're ahead of traditional customers. So to me, this was a call to like, okay,
we should make a benchmark to fill this gap.
And some of our other advisors from Nutanix, some of the other folks, we came together
and said, okay, this is a gap we can help to fill to help people understand.
Oh, does storage play a role in AI?
And how can you make sure that we're focusing in the right way?
Now, you've been around compute for a very long time.
And obviously, you've been at MLCommons since its inception.
So you know how these benchmarks are evolving.
How do you see the response from the storage industry as you've taken this on?
And where do you think that goes in terms of evolution of the way people are thinking about the solutions in the space?
So I think one of the things that was honestly very thrilling to me, earlier this year we had, I think, our third round of MLPerf Storage results, and we had 26 submissions.
That's amazing.
Yeah, that was a record for us.
It was the largest number of submitters we had up until that point.
And it was just topped by the MLPerf Inference round yesterday, which got 27 submitters.
Okay, there you go.
So huge popularity.
And I think one of the things that's been very interesting is just the rate of change in AI is so rapid.
And we've seen that in our benchmarks, too.
We started out focusing on what does data ingestion for training look like.
We added, what does checkpointing look like?
Because when you're doing at-scale training, failures happen, and your ability to recover
swiftly gets those thousands of accelerators back on track. How quickly did you write that checkpoint
out? How quickly did you read it back in? That's time on the clock. And we're even seeing more
evolution, where we're seeing storage starting to play a role in inference. And so just as AI becomes
more and more challenging, I think we, you know, oftentimes say compute and storage and
networking are sort of the three legs of the stool. We're seeing every leg of the stool getting pressure.
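To make that checkpoint concern concrete, here is a minimal, hypothetical sketch (in Python) of the measurement being described: time how long a checkpoint of a given size takes to write out durably and read back in. The path and sizes are illustrative assumptions, and the real MLPerf Storage benchmark is far more involved than a simple loop like this.

import os
import time

# Hypothetical sketch: time a checkpoint write and read-back.
CHECKPOINT_PATH = "/mnt/shared/checkpoint.bin"  # assumed path on shared storage
CHECKPOINT_BYTES = 8 * 1024**3                  # assume one 8 GiB checkpoint shard
CHUNK = os.urandom(64 * 1024**2)                # 64 MiB buffer written repeatedly

start = time.perf_counter()
with open(CHECKPOINT_PATH, "wb") as f:
    for _ in range(CHECKPOINT_BYTES // len(CHUNK)):
        f.write(CHUNK)
    f.flush()
    os.fsync(f.fileno())                        # ensure data is durable before stopping the clock
write_s = time.perf_counter() - start

start = time.perf_counter()
with open(CHECKPOINT_PATH, "rb") as f:
    while f.read(len(CHUNK)):
        pass
read_s = time.perf_counter() - start

gib = CHECKPOINT_BYTES / 1024**3
print(f"write: {gib / write_s:.2f} GiB/s, read: {gib / read_s:.2f} GiB/s")

In practice the read pass would also need to bypass the operating system's page cache (for example with O_DIRECT or by dropping caches) for the number to reflect what the storage system itself can deliver.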
That's true. So with that, from your perspective, what are some of the challenges? We often
hear a lot about these challenges in large-scale AI training, you know, when it comes to resiliency and
system reliability. But how does storage play into those challenges? You know, when you think about
reliability, storage, that's where you want your data to reside. That's where you want your
checkpoints to reside. And so storage is an integral element of your reliability story.
And when you think about just, again, the full system, everyone does training slightly differently.
And so for some people, they're just pulling raw data off of storage.
Sometimes they're pulling data off and doing a lot of transformations.
And so there might be a compute-intensive aspect to the storage before you get to the main training.
One of the things we did was we surveyed the field before we built benchmarks.
And there's just massive variety, right?
But one of the ways to think about it is this era
of AI. If you look back to 2012 with AlexNet and ImageNet, the core observation is that if
you add more data and more compute, quality, or intelligence if you want to put it that way,
starts going up. And so we need more and more data. Well, that data has to live somewhere, right? So
that's just one of the many ways that storage comes into play. Now, you mentioned the 26 vendors that
participated in your last benchmark. What's interesting when you look at solutions is that people are taking
such vast approaches. Some folks are looking at local NVMe, some are looking at software-defined
solutions. How do you view the diversity in the marketplace? And what does that say about
innovation? The diversity is actually truly astounding. And while I would say I know a bit about
storage, as you pointed out, you were in storage, like one of the principal tenets
of MLPerf is we're not prescriptive. There is no right answer. For some people, they want
file storage. For some people, they want object storage, and those are all valid solutions.
Realizing the consequences of that and just how many different solutions are out there was
really quite eye-opening to me. Many people build their infrastructure differently for different purposes.
You might start out as I did saying, hey, maybe all the self-driving car companies, maybe their
systems are kind of similar. Turns out not the case at all. No, and it's like they're all so different.
And so I think that just speaks to both the beauty and the challenge, which is if you want to build a good benchmark, you have to be very un-opinionated, very inclusive.
But it does let everyone show off what they want to do, but those degrees of freedom can also make it a bit challenging.
I have a follow-up question.
I mean, you've got that diversity.
You've just talked about how some people want this, some people want that.
Then should they look at those storage benchmarks not as,
who got the highest performance,
which is the way I typically, historically think about benchmarks, but as,
where is the solution set that I'm looking for
and what can I learn about them from their performance?
Yeah, I think actually the latter view is really the more informative.
Certainly higher scores are better,
but one of the things that I think is very interesting about ML Perf
is it's not denominated necessarily in who
gets the best bandwidth, because ultimately, at the end of the day, the core idea of MLPerf
Storage is how do we look at the storage for an overall system without needing the
accelerators? Right.
How can we look at the storage you might need for 10,000 accelerators without having to buy
10,000 accelerators?
But the name of the game here is we're going to go train the network.
How do we keep those 10,000 accelerators sufficiently busy?
So how do you right-size the storage?
And I think that is what we set out to do.
I think that's what we accomplished.
And, you know, like all benchmarks, like all projects,
I think there's a bunch of things we did really right.
And I think the popularity of the benchmark speaks to that.
And there are always things that we're going to improve on, as well as adding in more.
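As a rough illustration of that right-sizing idea, and not the actual MLPerf Storage methodology, a back-of-the-envelope sketch in Python might look like the following. Every figure here is an assumption invented for the example.

# Hypothetical sketch: estimate the read bandwidth needed to keep a fleet of
# accelerators fed during training, without actually owning the accelerators.
NUM_ACCELERATORS = 10_000
SAMPLES_PER_SEC_PER_ACCEL = 500      # assumed ingest rate of one emulated accelerator
BYTES_PER_SAMPLE = 150 * 1024        # assumed average preprocessed sample size
TARGET_UTILIZATION = 0.90            # keep accelerators at least 90% busy

required_bw = (NUM_ACCELERATORS
               * SAMPLES_PER_SEC_PER_ACCEL
               * BYTES_PER_SAMPLE
               * TARGET_UTILIZATION)
print(f"Required sustained read bandwidth: {required_bw / 1e9:.0f} GB/s")
# With these made-up numbers: 10,000 * 500 * 153,600 B * 0.9 ≈ 691 GB/s

The point of the arithmetic is simply that the required bandwidth scales linearly with the accelerator count, which is why emulating the accelerators' demand, rather than buying 10,000 of them, is so useful for right-sizing storage.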
Nice.
Collaboration does seem to be a big part of MLCommons' DNA.
How do partnerships work across organizations, and how does that ultimately shape the broader AI community?
Yeah, I would say that in general, I think one of the biggest strengths of MLCommons is being able to bring everyone together under one roof, where you have very divergent products, very divergent product strategies, architectures, and teams, and try to synthesize out of that something that is fair, reproducible, and representative.
Right, on the benchmark. And it's a vehicle for learning for everyone as well. The benchmarks, I think, have
helped vendors understand strengths and weaknesses and exposed some of the challenges that they're
going to have. To me, one of the most beautiful stories was someone who came back to us from a
different MLPerf benchmark, and they were just thrilled at some of the problems that they found.
And so for them, they said, whatever the score on the benchmark is doesn't matter. But what matters
is that we learned a ton that we are going to translate into improvements for our customers. And that, to
them, was what was important. Because ultimately, it's about how do we work together to make
AI better? How do we bring those capabilities to more people, more areas? I live in San Francisco.
We have Waymos all over. Right. And I can tell you, as a bicyclist, they feel much safer.
And, you know, that's just one instance of the magic of AI, right? There are other things,
you know, medical technology; you hear about better detection of tumors. How can we make that more common
and enrich society in that way?
And we have to do it all together.
You took me back to when I was in the industry.
We would get MLPerf results in,
and we would immediately go into a conference room
and dissect where the gaps are
and what we needed to do next.
It's very visceral and it's very real.
And I'm sure that companies from across the industry
are living the same experience
every time you publish results.
What do you see as the biggest areas of improvement?
And I'm not asking you about
any particular company, because I know that you don't go there. But when you look at the storage
results in particular from the last cycle, how do you see the industry saying, hey, these are
the things that we're learning, these are the areas that are north stars in terms of reaching
for what customers need? I would say, first of all, keeping pace with the advances in
computing and the evolution of workloads. And this is one of those more subtle things,
which is if you're buying storage today for an accelerator today, that's fine.
But the accelerators or processors two years from now will process the same amount of data much more quickly.
And so if your storage is right-sized today, it may actually be a little bit slow for the future.
And so there's always this, how do we keep pace with that while staying accurate and true to our roots?
How do we keep up with all the different ways that AI is stressing storage?
And a very, very simple example is, I think today it's now very widely established.
Everyone, when you go to deploy AI for inference, people are looking at KV caching,
at disaggregated inference.
And there's a role that storage can play there.
That wasn't something that people were looking at at all when we started MLPerf Storage.
And even a year ago, I think that was a little bit less common.
And so just keeping up with all of that, keeping 30-odd companies all together,
all moving forward.
It's very challenging, but it is also incredibly rewarding to bring everyone together and build
something like that. It's fantastic.
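To give a sense of why KV caching and disaggregated inference pull storage into the picture, here is a hedged, illustrative estimate of the key-value cache footprint of a single long-context request. The model dimensions are assumptions, not figures from MLPerf or any particular model.

# Hypothetical sketch: rough KV-cache size for one long-context request.
NUM_LAYERS = 80
NUM_KV_HEADS = 8          # assumes grouped-query attention
HEAD_DIM = 128
BYTES_PER_ELEM = 2        # fp16 / bf16
CONTEXT_TOKENS = 128_000  # a long-context request

# Two tensors (K and V) are cached per layer, per token.
kv_bytes = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM * CONTEXT_TOKENS
print(f"KV cache for one request: {kv_bytes / 1024**3:.1f} GiB")
# ≈ 39 GiB with these assumptions

At tens of gigabytes per long session, parking the caches of idle or paused requests on fast storage, rather than holding everything in accelerator memory, starts to look attractive, which is the kind of role for storage in inference that David alludes to.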
So looking ahead with that, where do you see what advancements or directions in AI excite
you most?
And how do you see ML Commons contributing to that future?
Yeah.
So this has actually been a breakout year for us.
Just in the last couple of months, I jotted an email down to the team, and I was
describing it as the summer of MLPerf.
We talked about MLPerf Storage.
We also have MLPerf Automotive, which came out very recently,
and MLPerf Client.
And part of this is, as we're seeing AI being adopted in more and more places,
we have to come in and help fill those gaps.
And so storage was sort of us working with some storage folks to spot some coming challenges.
MLPerf Automotive was in response to the automotive folks saying,
okay, we are going to be using more AI, making more intelligent vehicles.
We need to get our hands around this.
And so I just look at this and I say, okay, how do we make our benchmarks better?
Because this is just a new field.
Like some of the most established benchmarking organizations that we look up to are 30 years old and they've been honing their craft.
We're seven years old.
We're still in grade school.
The babies of benchmarking.
Yes, that's right.
And, you know, so there's a lot to learn.
And then just keeping pace with things is both
a little stressful and a bit of a challenge, but bringing everyone together, I think we can do it.
Well, David, I love having you on the show.
We've spent time in airports together and at industry conference receptions.
And every time we have a conversation, I'm like, oh, my God, I walked away with so much knowledge.
And I appreciate you coming on Tech Arena.
One final thing for you: obviously, MLPerf is so foundational.
And everybody that is listening to the show wants to find out more.
I can guarantee it.
So where do you go to find out the latest from ML Commons and engage with you in particular?
Yeah.
You can find me online.
I am @TheKanter on Twitter.
I'm on LinkedIn.
MLCommons.org is the place to go.
To your point, MLPerf is one of our exciting projects.
There are a lot of other things around data.
How do we standardize making data accessible to AI?
Yeah, that's huge.
You know, super exciting.
How do we make AI more reliable and responsive to what humans want?
And so there's a ton of projects
we have at MLCommons.
And I always say to anyone, if any of these resonate with you, please show up.
Like, we are a community of volunteers.
It came together with a bunch of folks who just saw a problem and said, we all want to solve it together.
I started out as a volunteer and look where I ended up.
Yep.
Thank you so much for being here.
And, Jeniece, that wraps another edition of Data Insights.
So thank you, Jeniece.
Thank you, David, for being on the show.
Thank you so much.
And thank you all for your time and attention.
Thank you, Allyson.
Thank you, Dave.
Thanks for joining Tech Arena.
Subscribe and engage at our website,
techarena.ai.
All content is copyright by TechArena.
