Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 06x03: AI PCs, Renewed Focus on AI Safety, and More with David Kanter of MLCommons

Episode Date: March 4, 2024

AI is powering breakthroughs across all domains. In this episode of the Utilizing AI podcast, brought to you by Tech Field Day, part of The Futurum Group, David Kanter, Founder and Executive Director of MLCommons, joins hosts Stephen Foskett and Frederic Van Haren to talk about MLCommons’ role in driving valuable AI solutions and helping organizations overcome the challenges around AI safety. MLCommons’ set of benchmarks provides transparent ratings and reviews of a wide range of products, guiding buyers towards better purchase decisions.

Hosts:
Stephen Foskett, Organizer of Tech Field Day: https://www.linkedin.com/in/sfoskett/
Frederic Van Haren, CTO and Founder of HighFens, Inc.: https://www.linkedin.com/in/fredericvharen/

Guest:
David Kanter, Founder and Executive Director, MLCommons: https://www.linkedin.com/in/kanterd/

Follow Gestalt IT and Utilizing Tech
Website: https://www.GestaltIT.com/
Utilizing Tech: https://www.UtilizingTech.com/
X/Twitter: https://www.twitter.com/GestaltIT
X/Twitter: https://www.twitter.com/UtilizingTech
LinkedIn: https://www.linkedin.com/company/Gestalt-IT

Tags: #UtilizingTech #AI #MLPerf #UtilizingAI #AISafety @MLCommons @UtilizingTech

Transcript
Starting point is 00:00:00 Welcome to Utilizing Tech, the podcast about emerging technology from Tech Field Day, part of the Futurum Group. This season of Utilizing Tech focuses on the emerging topic of artificial intelligence, just like the first three seasons did. We're exploring the practical applications and the impact of AI on technology innovation in enterprise tech. I'm your host, Stephen Foskett, organizer of the Tech Field Day events, and I'm joined by my co-host, Frederic Van Haren. Welcome. Thanks. Glad to be here. So we are here in Santa Clara for AI Field Day this week, which is a really good opportunity to talk about some AI, don't you think? I think so, too. Yeah, we always, when we talk about AI, we always talk a lot about training and how fast
Starting point is 00:00:48 your training has to be and benchmarking. And I think nowadays we see more and more of a trend to focus on inference. And I think generative AI is also part of that deal. And with inference comes a whole slew of different approaches and different problems that we definitely should talk about. And one of the things, too, that we're hearing a lot about is the alleged AI PC. We shall see how real that gets. Well, obviously, the hardware is real. I guess the question for me is sort of how good is it?
Starting point is 00:01:21 How useful is it? What's going to happen with it? And then there's, of course, the other world of things that were brought up again and again during the Tech Field Day presentations, the AI Field Day presentations, which is AI safety. Right. Yeah, I think safety is really key. It's a problem that comes along all the time. It's something we shouldn't ignore. It's not easy to solve, but definitely something we should look at more closely. So back in season three of the podcast, Frederic, you and I interviewed David Kanter from MLCommons. And guess what? If we look here, we've got David back for Utilizing AI once again. Welcome back, David. Thank you. It's a pleasure to be back. Did I miss a whole season? You missed two
Starting point is 00:02:04 seasons, surprisingly enough. I missed two seasons. So why was my character killed off in the prior season? Well, because this is a weird show in which we talk about completely different things each season. And so you missed the CXL season and the Edge season. Okay. Now, given what ML Commons does and ML Perf, I think we should have invited you on for Edge. But tell us a little bit first, what is ML Commons? Yeah. So ML Commons is an industry consortium. We're focused on making AI better for everyone. And we really bring together industry, academia, everyone across the globe. And with some of our efforts in AI safety, we're reaching out to new constituencies, civil society, philanthropy, government, and even ordinary people. I think one of the things that has really happened recently is, I'd say in the last couple
Starting point is 00:03:00 of years since you had me on, is AI has gone from something that was maybe more of a technocratic concern to something that was tangible for everyone. And, you know, really exciting in that regard. So, you know, we are a really unique community. We bring people together to build for AI. We're one of the only communities that I know of that really does that building. We don't want to do policy. We don't want to do marketing. Marketing is part of what we do, but it really is how can we bring together the folks who are driving AI forward and make it better for everyone. Yeah, it really is remarkable that when we were recording the first three seasons of this podcast,
Starting point is 00:03:54 ChatGPT was not a thing. And here we are, and everybody in the world knows about AI. Everyone in the world knows about large language models. Everybody's talking about generative AI. So I think we need to start, I guess, we need to start with one of these topics. I know that there's a new MLPerf coming out soon. Talk to us a little bit more about the next round of MLPerf. Yeah. So for those who don't know, with MLPerf, we submit in rounds once a quarter. This quarter, the focus is inference. And then next quarter, we'll be training. And so we always are trying to update the benchmark suite, stay abreast of all of the developments. And one of the things that, like many folks, we heard from tons of customers saying, we love using MLPerf. It is the gold standard for evaluating performance, for helping to make buying decisions, but we
Starting point is 00:04:45 need large language models. And so I'm really thrilled that we're going to be delivering on this. So last year we added GPT-3 for training, and this year we're adding Llama 2 70 billion for inference as well as Stable Diffusion XL. So that covers both your large language model and your image generation, which are two of the hottest areas. So the submission is ongoing right now, but we'll get to see the results in about a month and see where the chips fall. So I'm super excited about that.
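To make the numbers concrete, here is a minimal, purely illustrative sketch of the kind of thing an inference benchmark reports for a system under test: throughput and tail latency over a fixed set of queries. The stubbed run_inference() function and the metric names are assumptions for illustration only, not the MLPerf harness or its rules.

```python
# Illustrative only: measure throughput and tail latency for a stubbed
# inference function. A real MLPerf submission uses the official load
# generator, a pinned model and dataset, and an accuracy target.
import statistics
import time

def run_inference(prompt: str) -> str:
    # Stand-in for a real serving stack (e.g. a Llama 2 70B endpoint).
    time.sleep(0.01)
    return "response to: " + prompt

def benchmark(prompts: list[str]) -> dict[str, float]:
    latencies = []
    start = time.perf_counter()
    for prompt in prompts:
        t0 = time.perf_counter()
        run_inference(prompt)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        "throughput_qps": len(prompts) / elapsed,
        "p50_latency_s": statistics.median(latencies),
        "p99_latency_s": latencies[int(0.99 * len(latencies)) - 1],
    }

if __name__ == "__main__":
    print(benchmark([f"query {i}" for i in range(200)]))
```

What makes published results comparable across systems is that the model, the dataset, and the accuracy target are all fixed by the benchmark rules; only the hardware and software stack underneath changes.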
Starting point is 00:05:18 Right, so when we talk about MLPerf for inference, the submissions, is that coming from enterprises or hardware vendors or kind of a mix of all of them? Yeah, we actually tend to see a bit of a mix. In general, I'd say it tends to be from solution providers, whether it's, you know, we've got cloud folks, we've got hardware vendors, we've got software vendors who are, you know, building on top of standard hardware, but using their software to add value. And we do have some customers. You know, we've had a number of
Starting point is 00:05:54 universities and other sites submitting on some of their own systems that they've already purchased. But yeah, it is predominantly from the solution providers. And it's a pretty big effort to submit to MLPerf, right? I mean, this is not something that a hobbyist in their basement is going to do. This is something that you need to be a real organization, right? Well, you know, look, I mean, I never want to downplay the capabilities of hobbyists because I'm always impressed by some of the folks who come out of the woodwork. But it is really a significant accomplishment to get an MLPerf submission. AI stacks are very complicated. There's a lot of configuration, a lot of tuning. And it's like a marathon, right? Simply getting to the finish line is a genuine accomplishment.
Starting point is 00:06:48 And I might be able to run a marathon, maybe. Haven't done it before, but maybe I could. But I probably wouldn't turn in a great result for a 40-year-old. I'm reasonably fit, but there's all sorts of different folks running in marathons. If you want it to be truly representative, I think it's probably best to have the people who really know the solution doing that. That being said, you've had MLPerf for mobile for a while now. Yes.
Starting point is 00:07:19 Yeah. When we started out, it was just with training. And then over time, we've branched out into inference, into mobile inference for smartphones, where the ecosystem is a little bit more developed. We know what the operating systems are that matter for mobile phones. All of those operating systems generally have a preferred path to go down. We've added MLPerf Tiny and MLPerf Storage.
Starting point is 00:07:48 I think that's something new since the last time we've talked. And there's all sorts of other places we're looking about. But one of the other efforts we're really excited about is, I'd say, for a while, AI has largely been a phenomenon in the cloud. And one of the trends we're seeing is there's a tremendous push for local execution. Sometimes that's simply for cost reasons. Sometimes it's for privacy reasons. Sometimes it's for latency reasons.
Starting point is 00:08:17 And again, drawing on all the experience that we have with MLPerf training and MLPerf inference, where we make sure that we get the right quality and we're picking the right networks, we do it in an open, transparent, fair, understandable way so that we can start bringing that performance measurement, that benchmarking, to the systems we use as daily drivers, and ultimately drive all of those systems forward to be more efficient, more power efficient, and more capable.
Starting point is 00:09:05 So for people that are not too familiar with ML Commons, so how should people imagine ML Perf? So a bunch of submissions, somebody's interested in something, they look at ML Commons. What's the kind of the procedure or the steps organizations take in order to learn from ML Perf? Yeah, so I think, you know, so there's a lot of different ways, right? So one you mentioned, you can submit.
Starting point is 00:09:27 So if you think you have a great solution, you know, you can show up, you can join and submit your results. But if you're a customer, I think it's a tremendous value. Actually, I was talking to a customer for AI Systems and they were saying, you know, they were looking through the submissions
Starting point is 00:09:44 and their engineers were looking at them in detail, and it really helped them figure out how to configure things so that, you know, they were really able to get the most out of their system and get faster time to value. So, you know, one is to guide the decision-making process, right? What is the right thing for my needs? Are my needs up here? Are they down here? Am I doing computer vision?
Starting point is 00:10:10 Am I doing speech-to-text? Am I doing large language models? Those things all behave differently. Helping guide what is the right thing to do is one aspect, but then even after the purchase, as I said, like helping to configure, helping to operate. We really see it as providing best known methods to the industry. So I'm curious, I mean, the client benchmark, for these AI PCs, one of the questions I think that a lot of people have, certainly that I have about AI PCs, is what is the application that's going to
Starting point is 00:10:45 be run on AI PCs that's going to make them AI PCs as opposed to, you know, PCs. Now, I do not doubt that the hardware vendors are going to put inferencing engines on the chips. I do not doubt that those are going to be useful and benchmarkable. But your benchmark is not a synthetic benchmark. You're benchmarking actual real-world use case type things. I mean, that's sort of the flag of MLPerf and what makes it useful. So what are the things that are being done on these PCs that show whether they're good AI PCs or bad AI PCs? And what does that say about the future of this market?
Starting point is 00:11:21 No, I mean, that's a great question. And so one of the approaches that we took to this is when we were thinking up what we would do for MLPerfClient, we decided to take a scenario-focused approach. Let's try to focus on actual use cases, things that are going to drive real value. Now the first thing we were looking at was something based on generative AI, likely large language models. And there's a bunch of different things that we've been evaluating. And in conjunction with our community, I'm not here to say, oh, this is exactly what I say drives the industry, right? I am here to listen and understand what people are doing. And I think there's a lot of different ways that you can,
Starting point is 00:12:13 for instance, use a large language model and deploy it, right? There's chatbots, there's summarization, there's translation. As I look at all of those, you know, I think there's a lot of different options out there. I'd say one of the ones that I'm a little bit more excited about is probably on the summarization side, and that would match up a little bit more closely with something like ChatGPT. Summarize my email, or I'm headed to Turkey soon. Like, what would be a fun thing to do for a day in Turkey, right? That's going to be summarizing and synthesizing a bunch of stuff.
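As a rough sketch of that local summarization scenario, here is what it might look like with the Hugging Face transformers summarization pipeline. The specific model and the sample text are illustrative assumptions, not what MLPerf Client will necessarily specify.

```python
# Hedged sketch of on-device summarization, the client use case discussed above.
# Assumes the `transformers` package; the model choice is illustrative.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

email_thread = (
    "Team, the benchmark submission window closes Friday. "
    "Results are reviewed next week and published about a month later. "
    "Please upload your logs to the shared drive before Thursday's sync."
)

# Runs locally once the model has been downloaded.
print(summarizer(email_thread, max_length=40, min_length=10)[0]["summary_text"])
```

A client benchmark built around this kind of task would then measure how quickly and efficiently the local hardware produces an acceptable summary.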
Starting point is 00:12:59 I don't want to say the killer app, because it's not much of a killer app, but I think it's going to be a real useful thing that people are going to get from large language models on client devices. Yeah. What are some of the other applications you guys are excited about? If you had all infinite power on your laptop, on your desktop, what are some of the things that you'd be curious about? Infinite power, that's an interesting question. You want to snap your fingers and make half the world disappear?
Starting point is 00:13:26 Yeah, and then snap again to get the world back. Yeah. I don't know. It's a... Yeah, I think it's a... When I look at the industry, it's the use cases are there. It's just that there is a need for time to market, right? So there was a time where people were saying, I'm converting from CPUs to GPUs.
Starting point is 00:13:46 I want to understand what the GPU will do for me compared to a CPU infrastructure. Today, I see more and more organizations kind of doing the time to market thing. They already have GPUs, but they look at benchmarks to figure out how can we reduce the two-week cycle to a one-week cycle and so on.
Starting point is 00:14:04 So I would say that in the traditional AI, where organizations are building their own models, training their own models and doing their own inference, that is more where I see a lot of time to market, while in the generative AI, I see a lot more creativity regarding applications. And so the shift is going from building models more to transfer learning
Starting point is 00:14:28 and see what kind of power they need in order to build that, which is more of an inference problem than a hardcore training problem, right? Because the large language models are pretty much given to you and the only thing you need to do is to bring your own context
Starting point is 00:14:45 and use RAG to kind of build your new models. So I think those two are what I see. It's a lot of creativity. There's still a lot of hype around AI, unfortunately. I think organizations that are trying to learn are trying to move along and move away from the way they used to do it in traditional IT, buying a piece of infrastructure for three years.
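To make that bring-your-own-context idea concrete, here is a minimal sketch of the retrieval-augmented generation (RAG) pattern: retrieve the most relevant documents for a query, then hand them to a pretrained LLM as context instead of training a new model. The toy word-overlap scoring and the final generation step are placeholders, not any particular vendor's API.

```python
# Minimal RAG sketch: keyword-overlap retrieval plus prompt assembly.
# The scoring function is a toy stand-in for a real embedding model,
# and the assembled prompt would be sent to a hosted or local LLM.
def score(query: str, doc: str) -> int:
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "MLPerf inference results are published roughly once a quarter.",
    "The cafeteria menu rotates weekly.",
    "Llama 2 70B and Stable Diffusion XL were added to the inference suite.",
]

print(build_prompt("Which models were added to the inference suite?", docs))
```

The point is the shift Frederic describes: the heavy training has already been done, and the work that remains is mostly inference plus assembling the right context.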
Starting point is 00:15:09 But application-wise, I think there's still a lot of creativity. There's still a lot of confusion. I think MLCommons is helping a lot because you provide a little bit more than a reference system. You can go to a hardware vendor, and the hardware vendor will claim that it's the best ever, that nobody else does it, the fastest and the cheapest, and so on, while having a sense of reality is really important. I think what's also interesting with MLCommons is that you also have a data angle. And I think because you can have an algorithm and try to benchmark an algorithm, but if you don't try and do it with the same
Starting point is 00:15:45 data, your results might not be an apples-to-apples comparison. So maybe if you have some feedback for us regarding data and MLCommons, that would be good as well. Yeah, no, I mean, an integral part of all of our benchmarks is establishing not just what is the model but what is the data, making sure that we've got a good clean license for that, and establishing the quality targets as well. And so there's a lot of work there. And I mean, just things are moving so fast.
Starting point is 00:16:17 In the client space, we decided to really just focus on, let's just build the first benchmark, build the framework around it, and then look and start finding other scenarios that we want to expand into. Yeah. And in answer to your question about sort of what do we see on the client side, I will just say that I've put a lot of thought into the AI PC world, as have the other Futurum analysts. And a couple of things that jump out at me, certainly large language models and generative AI for text, 100%. I would also say speech-to-text is going to be a killer app on every one of these machines, really high-quality speech-to-text,
Starting point is 00:17:03 really high-quality predictive text. One of the things, too, that I think is going to be emerging is video to data, which is going to be an interesting area and something that we've already seen a lot of with Google's announcements this week. Think of an amazing screen reader that generates data from what your camera is seeing. I think AI PCs are going to be doing that. So for example, you know, capturing everything in a presentation or everything in a video and figuring out what are all the things, what are all the topics mentioned. I mean, you look at what a lot of these meeting assistants are already doing, you know, what are the action items from the meeting? I think that there's going to be a lot of spatial
Starting point is 00:17:55 awareness of, you know, what are the action items from my surroundings, from my daily work? You know, I think we're going to see a lot of that sort of model as well. So each of these, honestly, they're not that different. And I think that the same hardware is going to do a pretty good job with all of these tasks. But I do think that once we start processing a huge volume of data from cameras, for example, it's going to really drive demand for higher performance hardware. So another thing that came up here at AI Field Day quite a lot is the question of AI safety and trust. And these are one of those things where I don't want to throw stones at the companies in the industry because I know that they're all working hard to basically make this stuff work.
Starting point is 00:18:44 But a lot of them are a little concerned, I think, as well, just like end users, about safety and trust and what that means. And I know you are as well. Yeah. No, that's right. So actually, one of the big things that we announced late last year was we have an effort in AI safety. We started a working group. To start at the start, what motivated this? We had tremendous success with MLPerf. At a high level, the idea was, look, let's set the right metrics and then let's get everyone to work together and drive forward and build the right solutions to measure the right things
Starting point is 00:19:22 to help us improve over time. And that's sort of a virtuous cycle. And based on that, if you look at the data, we got a 30 to 50x speedup over the lifetime of MLCommons in training speed. What if we could improve safety for AI by a similar amount, by kind of taking that same focus in a very trusted, open setting, and drive things forward for AI safety? So that was sort of the core idea. And as I said, it really is a much, much broader constituency for us. Again, we really are great at building and pulling our community together to build, but now the community is just so much larger. Everyone cares about AI safety. Governments, you're seeing NIST in the
Starting point is 00:20:12 United States. There's the EU AI Act. Civil society, various philanthropies. There's just a huge amount of space to cover here. And our feeling is that by building an open testing platform that can support both testing and benchmarks, we can really help to improve the safety of AI over time and help create and guide responsible development. You know, in the same way that crash safety ratings help car manufacturers understand, hey, what are the scenarios we want to design against? Like, is a rollover important? Crashing headfirst into a wall? Like, we can help guide that and work together with the broader community, because we also don't have all of the answers.
Starting point is 00:21:07 And so we really want to invite everyone in to help us work together and ultimately build for safer and better AI for everyone. You know, that's part of our mission. But I see two sides. One is the community. I mean, that's where it all starts. Yeah. And then the second portion is governments, right? Do you see governments being more proactive and active in AI safety? Or do you see them more on the sidelines providing commentary, so to speak? So I guess, not being an elected or appointed official, I'm not sure I want to speak too much.
Starting point is 00:21:45 But, you know, I think one of the things that is very different about AI safety is that there is a really wide variety of different approaches, right? And there's no one government, right? You know, the United States is 50 states that are all together in a union, and each state might think about things subtly, you know, slightly differently. And then, but, you know, internationally, you know, things are very different as well. And so I think for us, the key goal is not telling governments what to do, but, you know, how can we build tools that will support all of the different needs, no matter how divergent they are within reason.
Starting point is 00:22:27 Well, it's a little bit like the car industry, right? Every country has its own specific rules, but there are basic rules where everybody has to play along. Yeah. And I think one of the things that actually has been a pleasant surprise is I think there's a great deal of agreement on some of the basic stuff, right? And obviously there's places where not everyone agrees, but, you know, just knowing that, I think most folks come in with the right intentions. We want to make the world a better, safer place for our kids, for our nephews, for the next generation.
Starting point is 00:23:07 And I think there's a large amount of agreement on what some of those things are. And it's, you know, again, how can we build the tools that will help people, you know, both test generative AI and other AI before deployment, test it at deployment, and guide the next generation of systems as well. I think a lot of that, honestly, is going to be achieved by having the conversation with a variety of people from a variety of countries and a variety of backgrounds and perspectives. Really, I think what we need is to start asking the questions so that we can put together a framework for safety and a vocabulary for safety around AI. You know, maybe there's going to be
Starting point is 00:23:51 different standards in the EU versus the United States. I think probably there is. But we should be using the same vocabulary and we should be asking the same questions in the same way and then coming to different answers. And then that will allow, I think, the AI developers to better answer those questions and to better abide by those requirements. And I think that having the discussion is a really positive way for ML Commons to do more than just be the performance people. Yeah, and I think to continue with your metaphor, I think what we want to do is we want to build the tooling and the infrastructure that allows you to ask the questions and
Starting point is 00:24:32 get the answers in a robust, transparent, principled, and open way. And so we've partnered with folks in academia, like Percy Liang at Stanford, whose HELM project has done some really great work in this area. So we're working together with them to build out our testing and safety platform. And, you know, the other thing is this is an open effort. So, you know, we want folks to show up, help us build, and help tell us how we can build better. Right. You know, we believe that we're going to do the best job we can, of course, but part of having everyone under a big tent is you get to see what folks' concerns are, make sure that you can incorporate them, and be very open to their suggestions and be very humble as well.
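For a sense of what an open safety testing platform might look like mechanically, here is a purely hypothetical sketch: run a suite of categorized prompts against a system under test and report the rate of unsafe responses per hazard category. The categories, prompts, and the is_unsafe() evaluator are placeholders invented for illustration, not MLCommons' actual test suite or scoring methodology.

```python
# Hypothetical safety-test harness sketch. The categories, prompts, and
# evaluator below are stand-ins for illustration only.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class SafetyPrompt:
    category: str
    text: str

PROMPTS = [
    SafetyPrompt("hazard-category-a", "..."),  # real suites curate these carefully
    SafetyPrompt("hazard-category-b", "..."),
    SafetyPrompt("benign-control", "Summarize today's weather report."),
]

def system_under_test(prompt: str) -> str:
    # Stand-in for the model or application being evaluated.
    return "I can't help with that."

def is_unsafe(response: str) -> bool:
    # Stand-in for a trained evaluator or human review.
    return "can't help" not in response

def run_suite() -> dict[str, float]:
    flags = defaultdict(list)
    for p in PROMPTS:
        flags[p.category].append(is_unsafe(system_under_test(p.text)))
    return {cat: sum(results) / len(results) for cat, results in flags.items()}

print(run_suite())  # fraction of unsafe responses per category
```

Like the crash-test analogy above, the value comes less from any single score than from agreeing openly on which scenarios to test and how to report them.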
Starting point is 00:25:19 I think that's a critically important thing. Before we go, tell us a little bit, what's next for ML Commons? Yeah, I mean, I think for me, when I look at 2024, it's, we're going to keep on doing all the awesome work we've done with MLPerf. I'm really excited to launch MLPerf on the client side. And there might be other MLPerf stuff in the future, too. And then, you know, building out sort of our AI safety effort is critically important for us. You know, getting our first demo in Q2 out will be great. And then sort of figuring out what to do next, what comes after that, how we want to build on top of that. We have some nascent efforts in automotive as well.
Starting point is 00:26:07 Also, we'll have a demo in Q2. So we just have an absolutely exciting year ahead of us. There's just so much to be done. And I think that's part of what this technological change does is there's almost an infinite amount of really impactful and exciting work where we can ultimately make AI better for everyone. And that's the goal of the organization. Well, thanks so much for joining us again. I look forward to seeing you on season six or seven or eight or whatever it happens to be. Whenever it makes sense, right?
Starting point is 00:26:45 Giving us an update on everything that ML Commons is doing. Before we go, where can folks learn more about this and connect with you and become part of all of your efforts? Yeah, mlcommons.org. And there should be a link on top.
Starting point is 00:26:57 We actually redid the website since we last saw you. But there should be a join or get involved link up at the top. Great. How about you, Frederic? Well, you can find me as Frederic Van Haren on LinkedIn and at highfens.com, which is H-I-G-H-F-E-N-S.com.
Starting point is 00:27:21 And as for me, you'll find me as @SFoskett on most of the socials, including LinkedIn, X/Twitter, and Mastodon. And of course, you'll find us here every Monday at Utilizing Tech, Tuesdays on the new Tech Field Day podcast, and Wednesdays on our weekly news rundown. Thank you for listening to Utilizing AI, part of the Utilizing Tech podcast series. If you enjoyed this discussion, please do subscribe. You'll find Utilizing Tech in all of your favorite podcast applications as well as on Group. For show notes and more, go to our dedicated website, which is utilizingtech.com, and find the podcast on Twitter and Mastodon at @UtilizingTech. Thanks for listening, and we will see you next Monday. Thank you.
