a16z Podcast - Hugging Face's Clem Delangue on Open Source AI and the LLM Bubble | MTS Live

Starting point is 00:00:00 The idea of like restricting a technology like AI based on risks is just like, for example, you would say, okay, some people can punch other people. So let's tie down everybody's hands, right? Because it's too dangerous. Some people can punch. Right. But you really don't want to do that because your hands are so useful. The way you want to control it is untie everyone and then regulate or fight the bad actors. So, for example, if hacking that creates cybersecurity risks, it's illegal, right? So you have to fight it, but not by preventing everyone from getting these capabilities. Otherwise, you slow down progress, you create massive gaps in terms of controls, in terms of capabilities, and you create actually additional risks. This episode originally aired on MTS. Open source software built much of the modern internet.

Starting point is 00:00:58 Linux, Apache, Kubernetes, and even the transformer architecture behind ChatGPT, all spread because researchers and developers could study, modify, and improve them in public. But AI is increasingly moving in the opposite direction, with the most powerful models distributed behind closed APIs controlled by a small number of companies. At the same time, China has emerged as one of the biggest contributors to open source AI, while debates around safety, regulation, and access are becoming more politically charged. And now those same tensions are extending into robotics, where AI is beginning to move off the screen and into the physical world. Theo Jaffe and Sophia Puccini speak with Clem Delangue, CEO at Hugging Face.

Starting point is 00:01:46 We are live here on MTS with Clement Delangue, who is the CEO of Hugging Face, which has been really an incredible resource for anyone who's interested in large language models and especially open weight large language models. I've been a Hugging Face user for a while now. So it's great to have you here. Clem, thanks so much for coming on MTS. Yeah, of course. Thanks for having me.

Starting point is 00:02:10 Absolutely. Okay, so you are a big proponent of open source. First of all, how do you predict, and you believe that open source is like a very important, you know, thing for innovation and competition. So can you compare and contrast sort of like the open source environments

Starting point is 00:02:28 in the US and China to start? Yeah. So I mean, historically, the US was super, super strong with open source. That's kind of like what led to the current AI revolution. Right? Like the T in ChatGPT is actually coming from Transformer, which was open source

Starting point is 00:02:44 from Google. Unfortunately, for the past few years, this trend has changed and things tended to kind of like close down in the US and kind of like frontier labs more kind of like sharing their models behind like closed source APIs. China saw the complete opposite movement. They're the strongest open source contributors today. If you ask most startups, most academia in the US that are using open source, they're usually using

Starting point is 00:03:22 Chinese open source models, right? You've probably heard of DeepSeek, of Qwen, of Kimi. There are kind of like a bunch of companies and organizations in China contributing massively to the field of open source. Great. So you recently said we're in an LLM bubble. What makes you think that? Well, I was asked if we were in an AI bubble, and I said we're probably not,

Starting point is 00:03:51 kind of like in AI as kind of like a general field bubble, but I feel like if there's one specific domain of AI where there's so much investment that there's maybe a risk of overinvesting, it's large language models distributed behind APIs, right? Like you see the building of crazy data centers for it. And obviously you see a lot of revenue growth, but with kind of like uncertain margins

Starting point is 00:04:23 and uncertain kind of like long-term sustainability and moat for it. So if there's a bubble, it's probably an LLM bubble, but we'll see what happens in the next few months. Well, you're a big proponent of open source, you know, as we all know. But do you think that labs should ever restrict releasing their models in an open source way for safety reasons? Like yeah, in 2022-23,

Starting point is 00:04:50 It was way too early for that. The models at the time were toys. But now we have stuff like Claude Mythos, which supposedly can really assist people with cyber attacks. We have models that are increasing pretty dramatically in bio capability, which could be even scarier. So do you think companies should still be releasing their models as open source? So the interesting thing is that we've had these conversations and this kind of like talking point for a while in AI when we were early at Hugging Face, I think, six, seven years ago. At the time, it was GPT-2,

Starting point is 00:05:25 and there was already, like, a lot of people saying that it was too dangerous to release in open source at the time, right? It was six, seven years ago when basically it was nothing more than just an auto-complete. I think we've seen progressively that these were quite overblown, and I think they're also overblown today, right? And the proof point is that, you know, Mythos, I think when it was announced, was it like three weeks ago, a month ago? It was crazy dangerous and now it's starting to be deployed kind of like everywhere, right? I think they just gave access to the first international organization, it's in South Korea, I think, yesterday or something like that.

Starting point is 00:06:10 And probably in a few weeks or in a few months, everyone is going to be using Mythos and not kind of like destroy the world as a result. So I think with the current models, it's safe to release behind APIs. It's safe to release in open source. And it's actually the safest way because it gives everyone kind of like the capabilities to not only build the systems, but also build the protection systems, right?

Starting point is 00:06:47 So if we talk, for example, for cybersecurity. The biggest risk is that a few players have capabilities that other people don't have, right? And so the attackers could have capabilities that the defenders don't have. Whereas kind of like if you make it more open, actually it's usually easier for the defenders to react

Starting point is 00:07:09 and kind of like make the whole system safer. So that's kind of like what we see with each release, where there are always kind of like overblown concerns before. And then progressively we all just adapt. And the benefits kind of like outweigh the risks. Yeah. It feels like we'll still be dealing with this problem in like 50 years where somebody releases like some sort of like open source robotics,

Starting point is 00:07:40 you know, robot or program or something. And then everyone is like, no, you shouldn't have done that. It's so risky. And then we'll just adapt again. It's kind of like the story of technology. You know, like, I mean, the idea of like restricting a technology like AI based on risks,

Starting point is 00:07:57 it's just like, for example, you would say, okay, some people can punch other people. So let's tie down everybody's hands, right? Because it's too dangerous. Some people can punch. Right. But in reality you don't want to do that

Starting point is 00:08:11 because your hands are so useful. They're creating so many good things in the world. You need your hands. The way you want to control it is untie everyone, give the freedom to everyone, and then regulate or fight the bad actors, right? So, for example, if, you know, hacking that creates cybersecurity risks, I mean, it's illegal, right?

Starting point is 00:08:34 So you have to make it illegal. You have to fight it, but not by preventing everyone from getting these capabilities. Because otherwise you slow down progress, you create massive gaps in terms of controls, in terms of capabilities, and you create actually additional risks. Well, right now on the topic of regulation, President Trump is in China where he will be meeting with Xi Jinping over the next couple days. And they're going to be discussing, among other things, AI regulation and international AI agreements. So what do you hope to get out of this in terms of open source? Yeah, I mean, I'm excited to see conversations about open source AI. Probably there's going to be some conversations about distillation,

Starting point is 00:09:29 about collaborations between two countries. I hope, you know, both countries will be able to agree on fostering more transparency, more openness to kind of like help more people access this technology. I'm glad that Jensen hopped onto the plane and joined these conversations because I think he has a lot of the right perspectives on this topic to kind of like basically create more collaboration between countries and kind of like shared progress. Yeah, I'm curious about your robotics push. So you guys launched LeRobot in '24.

Starting point is 00:10:17 And you've talked about how robotics is the next frontier unlocked by AI and all of this stuff. How do you sort of see this playing out and what is the role of open source? Yes, I have two little robots behind me, two Reachy Mini. We've shipped almost 10,000 of them all over the world. So it's probably one of the most widely distributed robots of the year at this point. I think what's really cool with robotics

Starting point is 00:10:48 is that it enables kind of like very new use cases and better use cases for AI. So for example, for the Reachy Mini, you have an app store. Anyone can build apps. So there's been over 300 apps that have been created for it already. And when you see it in action, for example, with kids, empowering kids to interact with AI in a different way than, you know,

Starting point is 00:11:13 looking at a laptop or looking at the phone, you realize that it's very empowering. When you see kind of like the Reachy Mini on a kitchen table, looking around and helping you cook, you realize that it's enabling, empowering, creating new use cases that are just not possible just with a laptop and a phone, right? That's why OpenAI and Sam Altman, for example,

Starting point is 00:11:39 have talked a lot about their excitement about bringing new devices to market. There's an important China-US component there because it's very likely that the Chinese are going to dominate robotics, or at least they're already dominating. And so on this topic too, it's really important that we build more in the US on this topic.

Starting point is 00:12:10 and we obviously have a lot of strength for it with the strength of the startup ecosystem in the US, the strengths of the frontier models. I hope to see a lot more in the coming months on the topic. Hugging Face has been compared to GitHub a lot, you know, the GitHub of AI. But why wasn't GitHub the GitHub of AI?

Starting point is 00:12:31 It seems like they've kind of fumbled a lot of things in the AI realm. So why do you think Hugging Face became sort of the go-to place for model developers to deploy models and not GitHub? Yeah, I mean, I don't blame them. They have a lot on their plates, right? Like, I think with the coding assistant, they've got like dealing with their own set of issues.

Starting point is 00:12:56 The reality is that hosting and sharing AI artifacts is quite different than hosting code. So even if people have been calling us the GitHub of AI, I think it's two very different things. For example, for us, the volume of files of data that we're dealing with is much, much larger than what GitHub is doing. For example, just last week, we added two petabytes of data to the platform. Just as a matter of comparison, it's the equivalent of 500,000 two-hour movies that have been uploaded to Hugging Face just last week.

Starting point is 00:13:41 So you have a lot of like structural differences. And we managed to build kind of like our infrastructure capabilities in a way that makes it just like better for people that are building in AI to use Hugging Face to host their models, their data sets, both publicly but also privately. We have a lot of private usage now. So that's kind of like some of the reasons why we managed to do it whereas GitHub focused on

Starting point is 00:14:12 other things. Totally. Well, that's pretty cool. We love Hugging Face. And we really appreciate your early support of MTS and our drops. So it was great to have you on today.

Starting point is 00:14:24 Clem, thanks so much for coming on MTS. Thanks for listening to this episode of the a16z Podcast. If you like this episode, be sure to like, comment, subscribe, leave us a rating or review and share it with your friends and family.

Starting point is 00:14:40 For more episodes, go to YouTube, Apple Podcasts, and Spotify. Follow us on X at a16z and subscribe to our Substack at a16z.com. Thanks again for listening, and I'll see you in the next episode. This information is for educational purposes only and is not a recommendation to buy, hold, or sell any investment or financial product. This podcast has been produced by a third party and may include paid promotional advertisements, other company references, and individuals unaffiliated with a16z.

Starting point is 00:15:09 Such advertisements, companies, and individuals are not endorsed by AH Capital Management LLC, a16z, or any of its affiliates. Information is from sources deemed reliable on the date of publication, but a16z does not guarantee its accuracy.
