Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 06x03: AI PCs, Renewed Focus on AI Safety, and More with David Kanter of MLCommons
Episode Date: March 4, 2024

AI is powering breakthroughs across all domains. In this episode of the Utilizing AI podcast, brought to you by Tech Field Day, part of The Futurum Group, David Kanter, Founder and Executive Director of MLCommons, joins hosts Stephen Foskett and Frederic Van Haren to talk about MLCommons' role in driving valuable AI solutions and helping organizations overcome the challenges around AI safety. MLCommons' set of benchmarks provides transparent ratings and reviews of a wide range of products, guiding buyers toward better purchase decisions.

Hosts:
Stephen Foskett, Organizer of Tech Field Day: https://www.linkedin.com/in/sfoskett/
Frederic Van Haren, CTO and Founder of HighFens, Inc.: https://www.linkedin.com/in/fredericvharen/

Guest:
David Kanter, Founder and Executive Director, MLCommons: https://www.linkedin.com/in/kanterd/

Follow Gestalt IT and Utilizing Tech
Website: https://www.GestaltIT.com/
Utilizing Tech: https://www.UtilizingTech.com/
X/Twitter: https://www.twitter.com/GestaltIT
X/Twitter: https://www.twitter.com/UtilizingTech
LinkedIn: https://www.linkedin.com/company/Gestalt-IT

Tags: #UtilizingTech #AI #MLPerf #UtilizingAI #AISafety @MLCommons @UtilizingTech
Transcript
Welcome to Utilizing Tech, the podcast about emerging technology from Tech Field Day, part of the Futurum Group.
This season of Utilizing Tech focuses on the emerging topic of artificial intelligence, just like the first three seasons did.
We're exploring the practical applications and the impact of AI on technology innovation in enterprise tech. I'm your host, Stephen Foskett, organizer of the Tech Field Day events,
and I'm joined by my co-host, Frederic Van Haren. Welcome.
Thanks. Glad to be here.
So we are here in Santa Clara for AI Field Day this week,
which is a really good opportunity to talk about some AI, don't you think?
I think so, too. Yeah, when we talk about AI, we always talk a lot about training, how fast
your training has to be, and benchmarking.
And I think nowadays we see more and more of a trend to focus on inference.
And I think generative AI is also part of that deal.
And with inference comes a whole slew of different approaches and different problems that we definitely should talk about.
And one of the things, too, that we're hearing a lot about is the alleged AI PC.
We shall see how real that gets.
Well, obviously, the hardware is real.
I guess the question for me is sort of how good is it?
How useful is it?
What's going to happen with it?
And then there's, of course, the other world of things that was brought up again and again during the Tech Field Day
presentations, the AI Field Day presentations, which is AI safety. Right. Yeah, I think safety
is really key. It's a problem that comes along all the time. It's something we shouldn't ignore.
It's not easy to solve, but definitely something we should look at more closely.
So back in season three of the podcast, Frederic, you and I interviewed David Kanter from MLCommons. And guess what? If we look here, we've got David back for Utilizing AI once again. Welcome
back, David. Thank you. It's a pleasure to be back. Did I miss a whole season? You missed two
seasons, surprisingly enough. I missed two seasons. So why was my character
killed off in the prior season? Well, because this is a weird show in which we talk about
completely different things each season. And so you missed the CXL season and the Edge season.
Okay. Now, given what MLCommons does and MLPerf, I think we should have invited you on for Edge.
But tell us a little bit first, what is MLCommons? Yeah. So MLCommons is an industry consortium. We're focused on making AI better for everyone. And we really bring together
industry, academia, everyone across the globe. And with some of our efforts in AI safety, we're reaching
out to new constituencies, civil society, philanthropy, government, and even ordinary
people. I think one of the things that has really happened recently is, I'd say in the last couple
of years since you had me on, is AI has gone from something that was maybe
more of a technocratic concern to something that was tangible for everyone. And, you know,
really exciting in that regard. So, you know, we are a really unique community. We bring people
together to build for AI. We're one of the only communities that I know of that really does that building.
We don't want to do policy.
We don't want to do marketing.
Marketing is part of what we do, but it really is how can we bring together the folks who are driving AI forward and make it better for everyone.
Yeah, it really is remarkable that when we were recording the first three seasons of this podcast,
ChatGPT was not a thing. And here we are, and everybody in the world knows about AI. Everyone in the world knows about large language models. Everybody's talking about generative AI. So I
think we need to start, I guess, we need to start with one of these topics. I know that there's a new
MLPerf coming out soon. Talk to us a little bit more about the next round of MLPerf.
Yeah. So for those who don't know, with MLPerf, we submit in rounds once a quarter.
This quarter, the focus is inference. And then next quarter, we'll be training. And so we always are trying
to update the benchmark suite, stay abreast of all of the developments. And one of the things
that, like many folks, we heard from tons of customers saying, we love using MLPerf.
It is the gold standard for evaluating performance, for helping to make buying decisions, but we
need large language models. And so I'm really thrilled that we're going to be
delivering on this. So last year we added GPT-3 for training, and this year we're
adding Llama 2 70B for inference as well as Stable Diffusion XL. So that
covers both your large language model and your image generation,
which are two of the hottest areas.
So the submission is ongoing right now,
but we'll get to see the results in about a month
and see where the chips fall.
So I'm super excited about that.
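To make the idea of an inference round a bit more concrete, here is a toy sketch of the kind of latency and throughput measurement such a benchmark formalizes. It is not the real MLPerf LoadGen harness; the model, samples, and percentile math are placeholders standing in for an optimized Llama 2 70B or Stable Diffusion XL serving stack.

```python
# Toy sketch of an inference-benchmark harness in the spirit of MLPerf's
# latency/throughput measurements. It is NOT the real MLPerf LoadGen API;
# the model and samples are placeholders.
import time
import statistics

def run_inference(model, sample):
    # Placeholder: a real submission would call an optimized serving stack
    # (for example a tuned Llama 2 70B or Stable Diffusion XL pipeline).
    return model(sample)

def benchmark(model, samples, warmup=5):
    for s in samples[:warmup]:          # warm up caches and compiled kernels
        run_inference(model, s)
    latencies = []
    start = time.perf_counter()
    for s in samples:
        t0 = time.perf_counter()
        run_inference(model, s)
        latencies.append(time.perf_counter() - t0)
    wall = time.perf_counter() - start
    ordered = sorted(latencies)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "throughput_qps": len(samples) / wall,
        "p50_ms": statistics.median(latencies) * 1000,
        "p99_ms": ordered[p99_index] * 1000,
    }

if __name__ == "__main__":
    fake_model = lambda text: sum(ord(c) for c in text)   # stand-in workload
    queries = [f"query {i}" for i in range(200)]
    print(benchmark(fake_model, queries))
```

The real suite layers accuracy targets and standardized query scenarios (single-stream, server, offline) on top of raw timing like this.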
Right, so when we talk about MLPerf for inference,
the submissions, are
they coming from enterprises or hardware vendors, or a mix of all of them?
Yeah, we actually tend to see a bit of a mix. In
general, I'd say it tends to be from solution providers, whether it's, you know,
we've got cloud folks, we've got hardware vendors, we've got software vendors who are, you know, building on top of standard hardware, but using
their software to add value. And we do have some customers. You know, we've had a number of
universities and other sites submitting on some of their own systems that they've already purchased.
But yeah, it is predominantly from the solution providers.
And it's a pretty big effort to submit to MLPerf, right? I mean, this is not something that
a hobbyist in their basement is going to do. This is something that you need to be a real
organization, right? Well, you know, look, I mean, I never want to downplay the capabilities
of hobbyists because I'm always impressed by some of the folks who come out of the woodwork. But it is really a significant accomplishment to get an MLPerf
submission. AI stacks are very complicated. There's a lot of configuration, a lot of tuning.
And it's like a marathon, right? Simply getting to the finish line is a genuine accomplishment.
And I might be able to run a marathon, maybe. Haven't done it before, but maybe I could.
But I'd probably turn in a great result for a 40-year-old. I'm reasonably fit, but
there's all sorts of different folks running
in marathons.
If you want it to be truly representative, I think it's probably best to have the people
who really know the solution doing that.
That being said, you've had MLPerf for mobile for a while now.
Yes.
Yeah.
When we started out, it was just with training.
And then over time, we've branched out into inference,
into mobile inference for smartphones,
where the ecosystem is a little bit more developed.
We know what the operating systems are that matter for mobile phones.
All of those operating systems generally have a preferred path to go down.
We've added MLPerf Tiny and MLPerf Storage.
I think those are new since the last time we talked.
And there's all sorts of other places we're looking at.
But one of the other efforts we're really excited about is, I'd say, for a while, AI
has largely been a phenomenon in the cloud.
And one of the trends we're seeing is there's a tremendous push for local execution.
Sometimes that's simply for cost reasons.
Sometimes it's for privacy reasons.
Sometimes it's for latency reasons.
And so one of the things that we announced earlier this year is that we got all of the personal computing
ecosystem together to start building a client-oriented benchmark for machine learning.
And again, drawing on all the experience that we have with MLPerf training, MLPerf inference,
where we make sure that we get the right quality, we're picking the right networks, we do it in an open, transparent, fair, well-understandable way
so that we can start bringing that performance measurement,
that benchmarking, and ultimately drive all of the systems forward
to be more efficient, more power efficient, and more capable
to the systems we use as daily drivers.
So for people who are not too familiar with MLCommons,
how should people imagine MLPerf?
So a bunch of submissions, somebody's interested in something,
they look at MLCommons.
What is the procedure, or the steps organizations take,
in order to learn from MLPerf?
Yeah, so I think, you know, so there's a lot of different ways, right?
So one you mentioned, you can submit.
So if you think you have a great solution,
you know, you can show up,
you can join and submit your results.
But if you're a customer,
I think it's a tremendous value.
Actually, I was talking to a customer for AI Systems
and they were saying, you know,
they were looking through the submissions
and their engineers were looking at them in detail and it really helped them figure out
how to configure things so that, you know, they were really able to get
the most out of their system and get faster time to value.
So you know, one is to guide the decision making process, right?
What is the right thing for my needs?
Are my needs up here?
Are they down here?
Am I doing computer vision?
Am I doing speech-to-text?
Am I doing large language models?
Those things all behave differently.
Helping guide what is the right thing to do is one aspect, but then even after the purchase,
as I said, like helping to configure, helping to operate. We really see it
as providing best known methods to the industry. So I'm curious, I mean, the client benchmark,
for these AI PCs, one of the questions I think that a lot of people have, certainly that I have
about AI PCs, is what is the application that's going to
be run on AI PCs that's going to make them AI PCs as opposed to, you know, PCs. Now, I do not doubt
that the hardware vendors are going to put inferencing engines on the chips. I do not doubt
that those are going to be useful and benchmarkable. But your benchmark is not a synthetic
benchmark. You're benchmarking actual real-world use case type things.
I mean, that's sort of the flag of MLPerf and what makes it useful.
So what are the things that are being done on these PCs that show whether they're good
AI PCs or bad AI PCs?
And what does that say about the future of this market?
No, I mean, that's a great question.
And so one of the approaches that we took to this is when we were thinking up what we would
do for MLPerf Client, we decided to take a scenario-focused approach.
Let's try to focus on actual use cases, things that are going to drive real value.
Now the first thing we were looking at was something based on generative AI, likely large language models.
And there's a bunch of different things that we've been evaluating.
And in conjunction with our community, I'm not here to say, oh, this is exactly what I say drives the industry, right? I am here to listen
and understand what people are doing. And I think there's a lot of different ways that you can,
for instance, use a large language model and deploy it, right? There's chatbots,
there's summarization, there's translation. As I look at all of those, you know, I think there's a
lot of different options out there.
I'd say one of the ones that I'm a little bit more excited about is probably on the summarization side,
and that would match up a little bit more closely with something like ChatGPT.
Summarize my email, or I'm headed to Turkey soon. Like, what would be a fun thing
to do for a day in Turkey, right? That's going to be summarizing and synthesizing a bunch of stuff.
Yeah, and I think you're completely right. I think that summarization is actually going to be,
I don't want to say the killer app, because it's not much of a killer app, but I think it's going
to be a real useful thing that people are going to get from large language models on client devices.
Yeah.
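For what that summarization scenario might look like on a laptop, here is a minimal sketch assuming the Hugging Face transformers library; the model name is illustrative only, and MLPerf Client defines its own workloads and quality targets rather than this generic pipeline.

```python
# Minimal on-device summarization sketch using the Hugging Face pipeline API.
# The model below is an illustrative small summarizer, not an MLPerf workload.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

email_thread = (
    "Team, the benchmark submission deadline moved to Friday. "
    "Please rerun the inference suite on the new driver, collect the "
    "latency logs, and flag any accuracy regressions before the review."
)

result = summarizer(email_thread, max_length=40, min_length=10, do_sample=False)
print(result[0]["summary_text"])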
What are some of the other applications you guys are excited about?
If you had all infinite power on your laptop, on your desktop, what are some of the things
that you'd be curious about?
Infinite power, that's an interesting question.
You want to snap your fingers and make half the world disappear?
Yeah, and then snap again to get the world back.
Yeah.
I don't know.
It's a...
Yeah, I think it's a...
When I look at the industry, the use cases are there.
It's just that there is a need for time to market, right?
So there was a time where people were saying, I'm converting from CPUs to GPUs.
I want to understand what the GPU will do for me
compared to a CPU infrastructure.
Today, I see more and more organizations
kind of doing the time to market thing.
They already have GPUs,
but they look at benchmarks to figure out
how can we reduce the two-week cycle
to a one-week cycle and so on.
So I would say that in the traditional AI,
where organizations are building their own model,
training their own models and doing their own inference,
that is more where I see a lot of time to market,
while in the generative AI,
I see a lot more creativity regarding applications.
And so the shift is going from building models
more to transfer learning
and see what kind of power they need
in order to build that,
which is more of an inference problem
than a hardcore training problem, right?
Because the large language models
are pretty much given to you
and the only thing you need to do
is to bring your own context
and use RAG to kind of build your new models.
So I think those two are what I see.
It's a lot of creativity.
There's still a lot of hype around AI, unfortunately.
I think organizations that are trying to learn
are trying to move along and move away
from the way they used to do it in traditional IT,
buying a piece of infrastructure for three years.
But application-wise, I think there's still a lot of creativity.
There's still a lot of confusion.
I think MLCommons is helping a lot because you provide a little bit more than a reference system.
You can go to a hardware vendor, and the hardware vendor will claim that it's the best ever,
and nobody else does it, and it's the fastest and the cheapest and so on, while having a sense
of reality is really important.
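As a rough sketch of the bring-your-own-context-plus-RAG pattern Frederic describes above: embed a small document set, retrieve the closest snippets for a question, and hand the assembled prompt to whatever model you run locally or in the cloud. The embedding model name here is an assumption, and the generation call is left as a placeholder.

```python
# Toy retrieval-augmented generation (RAG) sketch: retrieve relevant snippets
# and prepend them to the prompt an LLM would receive. Illustrative only.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "MLPerf inference rounds run roughly once a quarter.",
    "Llama 2 70B and Stable Diffusion XL were added as inference workloads.",
    "MLPerf Client targets laptops and desktops rather than data centers.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")      # assumed embedding model
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(question, k=2):
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec                           # cosine similarity (unit vectors)
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

question = "Which generative models are in the latest inference round?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# answer = my_local_llm(prompt)   # placeholder: pass the prompt to any LLM
print(prompt)
```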
I think what's also interesting with MLCommons is that you also have a data angle.
Because you can have an algorithm and try to benchmark it, but if you don't do it with the same
data, your results might not be an apples-to-apples comparison. So maybe if you have some feedback for
us regarding data and MLCommons, that would be good as well. Yeah, no, I mean, that's an
integral part of all of our benchmarks is establishing not just what is the model,
what is the data, making
sure that we've got a good clean license for that, establishing the quality targets as
well.
And so there's a lot of work there.
And I mean, just things are moving so fast.
In the client space, we decided to really just focus on, let's just build the first
benchmark, build the framework around it, and then look and start
finding other scenarios that we want to expand into. Yeah. And in answer to your question about
sort of what do we see on the client side, I will just say that I've put a lot of thought
into the AI PC world, as have the other Futurum analysts. And a couple of things that jump out at me,
certainly large language models and generative AI for text, 100%.
I would also say speech-to-text is going to be a killer app
on every one of these machines, really high-quality speech-to-text,
really high-quality predictive text. One of the things, too, that I think is going to be emerging is video to data,
which is going to be an interesting area and something that we've seen a lot with already
with Google's announcements this week, where people are basically taking, sort of think of an amazing
screen reader that generates data from what your camera is seeing. I think AI PCs are going to be
doing that. So for example, you know, capturing everything in a presentation or everything in a
video and figuring out what are all the things, what are all the topics mentioned. I mean,
you look at a lot of these meeting assistants that are already doing this, you know,
what are the action items from the meeting? I think that there's going to be a lot of spatial
awareness of, you know, what are the action items from my surroundings, from my daily work?
You know, I think we're going to see a lot of that sort of model as well. So each of these,
honestly, they're not that different. And I think that the same hardware is going to do a pretty
good job with all of these tasks. But I do think that once we start processing a huge volume of
data from cameras, for example, it's going to really drive demand for higher performance hardware.
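For the speech-to-text piece of that picture, a minimal local sketch could look like the following, using the open-source Whisper package; the audio file is a placeholder, and an AI PC would more likely run an NPU- or GPU-optimized build than this reference implementation.

```python
# Minimal local speech-to-text sketch with the open-source Whisper package.
# "meeting.wav" is a placeholder recording.
import whisper

model = whisper.load_model("base")        # small model, feasible on a laptop
result = model.transcribe("meeting.wav")  # returns a dict with text and segments
print(result["text"])

# A "video to data" pipeline could feed this transcript into a local LLM to
# pull out topics and action items (see the summarization sketch earlier).
```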
So another thing that came up here at AI Field Day quite a lot is the question of AI safety
and trust. And this is one of those things where I don't want to throw stones at the companies in
the industry because I know that they're all working hard to basically make this stuff work.
But a lot of them are a little concerned, I think, as well, just like end users,
about safety and trust and what that means. And I know you are as well.
Yeah. No, that's right. So actually, one of the big things that we announced late last year was
we have an effort in AI safety. We started a working group. To start at the start:
what motivated this?
We had tremendous success with MLPerf.
At a high level, the idea was, look, let's set the right metrics and then let's get everyone
to work together and drive forward and build the right solutions to measure the right things
to help us improve over time.
And that's sort of a virtuous cycle. And based on that, if you look at the data, we got a 30 to 50x speedup in training speed over the lifetime of MLCommons.
What if we could improve safety for AI by a similar amount by kind of taking that same focus in a very trusted,
open setting and driving things forward for AI safety? So that was sort of the
core idea. And as I said, it really is a much, much broader constituency for us.
Again, we really are great at building and pulling our community together to
build, but now the community
is just so much larger. Everyone cares about AI safety. Governments, you're seeing NIST in the
United States. There's the EU AI Act. Civil society, various philanthropies. There's just a
huge amount of space to cover here. And our feeling is that by building an open
testing platform that can both support testing and benchmarks, that we can really help to
improve the safety of AI over time and help create and guide responsible development.
You know, in the same way that crash safety ratings help car manufacturers understand,
hey, what are the scenarios we want to design against?
Like, is a rollover important? Crashing headfirst into a wall?
Like, we can help guide that and work together with the broader community, because we also don't have all of the answers.
And so we really want to invite everyone in
to help us work together
and ultimately build safer and better AI for everyone. You know, that's part of our mission.
But I see two sides. One is the community. I mean, that's where it all starts.
Yeah, and then the second portion is governments, right?
Do you see governments being more proactive and active in AI safety?
Or do you see them more on the sideline providing commentary, so to speak?
So I guess not being an elected or appointed official, I'm not sure I want to speak too much.
But, you know, I think one of the things that is very different about AI safety is that there is a really wide variety of different approaches, right?
And there's no one government, right?
You know, the United States is 50 states that are all together in a union,
and each state might think about things, you know, slightly differently. And then,
you know, internationally, things are very different as well. And so I think for us,
the key goal is not telling governments what to do, but, you know, how can we build tools that
will support all of the different needs, no matter how divergent they
are within reason.
Well, it's a little bit like the car industry, right?
Every country has its own specific rules, but there are basic rules where everybody
has to play along.
Yeah.
And I think one of the things that actually has been a pleasant surprise is that there's a great deal of agreement on some of the
basic stuff, right? And obviously there's places where not everyone agrees, but, you
know, just knowing that, I think most folks come in with the right intentions:
we want to make the world a better, safer place for our kids, for our nephews, for the next generation.
And I think there's a large amount of agreement on what some of those things are.
And it's, you know, again, how can we build the tools that will help people, you know,
both test generative AI and other AI before deployment, test it at deployment, and guide the next
generation of systems as well. I think a lot of that, honestly, is going to be achieved
by having the conversation with a variety of people from a variety of countries and a variety
of backgrounds and perspectives. Really, I think what we need is to start asking the questions
so that we can put together a
framework for safety and a vocabulary for safety around AI. You know, maybe there's going to be
different standards in the EU versus the United States. I think probably there is.
But we should be using the same vocabulary and we should be asking the same questions
in the same way and then coming to different answers. And then that will allow,
I think, the AI developers to better answer those questions and to better abide by those
requirements. And I think that having the discussion is a really positive way for
MLCommons to do more than just be the performance people. Yeah, and I think to continue with your metaphor,
I think what we want to do is we want to build the tooling and
the infrastructure that allows you to ask the questions and
get the answers in a robust, transparent, principled, and open way.
And so we've partnered with folks in academia. Percy Liang at Stanford
and the HELM project have done some really great work in this area. So we're working together with them to build out our testing and safety platform.
And, you know, the other thing is this is an open effort.
So, you know, we want folks to show up, help us build, help tell us how we can build better.
Right. You know, we believe that we're going to do the best job we can, of course, but part of having
everyone under a big tent is you get to see what folks' concerns are, make sure that you
can incorporate them, and be very open to their suggestions and be very humble as well.
I think that's a critically important thing.
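As a very rough illustration of what a safety test loop can look like, the toy harness below sends placeholder prompts from a couple of hazard categories to a system under test and tallies refusals. The categories, prompts, and refusal heuristic are invented for illustration; the actual MLCommons AI Safety effort defines its own hazard taxonomy, prompt sets, and evaluators.

```python
# Toy safety-test harness: probe a system under test with prompts from
# hazard categories and report the fraction of safe refusals per category.
# Everything here (categories, prompts, scoring) is illustrative only.
HAZARD_PROMPTS = {
    "weapons": ["<prompt probing weapons-related harm>"],
    "self_harm": ["<prompt probing self-harm content>"],
}

def system_under_test(prompt: str) -> str:
    # Placeholder: call the chatbot or model endpoint being evaluated.
    return "I can't help with that request."

def looks_like_refusal(response: str) -> bool:
    markers = ("can't help", "cannot assist", "won't provide")
    return any(m in response.lower() for m in markers)

def run_safety_suite() -> dict:
    results = {}
    for category, prompts in HAZARD_PROMPTS.items():
        refusals = sum(looks_like_refusal(system_under_test(p)) for p in prompts)
        results[category] = refusals / len(prompts)
    return results

if __name__ == "__main__":
    print(run_safety_suite())  # e.g. {'weapons': 1.0, 'self_harm': 1.0}
```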
Before we go, tell us a little bit,
what's next for ML Commons? Yeah, I mean, I think for me, when I look at 2024, it's,
we're going to keep on doing all the awesome work we've done with MLPerf. I'm really excited to
launch MLPerf on the client side. And there might be other MLPerf stuff in the future, too. And then, you know, building out sort of our AI safety effort is critically important for us.
You know, getting our first demo in Q2 out will be great.
And then sort of figuring out what to do next, what comes after that, how we want to build on top of that.
We have some nascent efforts in automotive as well.
Also, we'll have a demo in Q2.
So we just have an absolutely exciting year ahead of us.
There's just so much to be done.
And I think that's part of what this technological change does is there's almost an infinite amount of really impactful and exciting work where we can ultimately make AI better for everyone.
And that's the goal of the organization.
Well, thanks so much for joining us again.
I look forward to seeing you on season six or seven or eight or whatever it happens to be.
Whenever it makes sense, right?
Giving us an update on everything
that ML Commons is doing.
Before we go,
where can folks learn more about this
and connect with you
and become part of all of your efforts?
Yeah, mlcommons.org.
And there should be a link on top.
We actually redid the website
since we last saw you.
But there should be a join
or get involved link up at the top.
Great.
How about you, Frederic?
Well, you can find me as Frederic Van Haren on LinkedIn
and on highfens.com, which is H-I-G-H-F-E-N-S.com.
And as for me, you'll find me at S. Foskett on most of the socials, including LinkedIn,
X/Twitter, and Mastodon. And of course, you'll find us here every Monday at Utilizing Tech,
Tuesdays on the new Tech Field Day podcast and Wednesdays on our weekly news rundown.
Thank you for listening to Utilizing AI, part of the Utilizing Tech podcast series.
If you enjoyed this discussion, please do subscribe. You'll find Utilizing Tech in all of your favorite podcast applications. For show notes and more, go to our dedicated website, which is utilizingtech.com, and find the podcast on
Twitter and Mastodon at Utilizing Tech. Thanks for listening, and we will see you next Monday. Thank you.