No Priors: Artificial Intelligence | Technology | Startups - Erik Bernhardsson on Creating Tools That Make AI Feel Effortless

Episode Date: January 9, 2025

Today on No Priors, Elad chats with Erik Bernhardsson, founder and CEO of Modal Labs, a platform simplifying ML workflows by providing a serverless infrastructure designed to streamline deployment, scaling, and development for AI engineers. Erik talks about his early work on Spotify's ML algorithms, what Modal offers today, and his vision for building an end-to-end solution for AI engineers. They dive into GPU trends, cloud vs on-premise setups, and when to train custom models vs use off-the-shelf solutions. Erik also shares his thoughts on the evolving role of AI in fields like coding, physics, and music.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @Bernhardsson

Show Notes:
0:00 Introduction
0:22 Erik's early interest in ML infra
1:22 Founding Modal Labs
4:17 State of GPU use today and what's to come
7:14 Modal's end-to-end vision
9:00 Differentiating amongst competition
10:20 Cloud vs on-premise
12:35 Popular AI models
13:20 Gaps in AI infrastructure
14:55 Insights on vector databases
16:48 Training models vs off-the-shelf models
17:47 AI's impact on coding and physics
22:14 AI's impact on music

Transcript
Starting point is 00:00:00 Today I'm chatting with Erik Bernhardsson, founder and CEO of Modal. Modal developed a serverless cloud platform tailored for AI, machine learning, and data applications. And before that, Erik worked at Better.com and Spotify, where he led Spotify's machine learning efforts and built the recommender system. Well, Erik, thanks so much for joining me today on No Priors. Yeah, thanks. It's great to be here. So if I remember correctly, you worked at Spotify and helped build out their ML team and recommender
Starting point is 00:00:27 system, and then were also at Better.com. What inspired you to start Modal and what problem were you hoping to solve? Yeah, so it started at Spotify a long time ago, 2008, and I spent seven years there. And yeah, I built a music recommendation system. And back then, there was like nothing really in terms of data infrastructure. Hadoop was like the most modern thing. And so I spent a lot of time building a lot of infrastructure. In particular, I built a workflow scheduler called Luigi that basically no one uses today. I built a vector database called Annoy that, you know, for a brief period people used, but no one really uses today. So I spent a lot of time building a lot of that stuff.
Starting point is 00:01:02 And then later at Better, I was the CTO and thinking a lot about like developer productivity and stuff. And then during the pandemic, I took some time off and started hacking on stuff. And I realized I always wanted to build basically better infrastructure for these types of things, like data, AI, machine learning. So I pretty quickly realized like this is what I wanted to do. And that was sort of the genesis of Modal. That's cool. How did that approach evolve, and what are the main areas that the company focuses on today?
Starting point is 00:01:27 So I started looking into, first of all, just like, what are the challenges with data, AI, machine learning infrastructure, and I started thinking about it from like a developer productivity point of view. What's a tool I want to have? And I realized a big sort of challenge is like working with the cloud is arguably kind of annoying. Like, as much as I love the cloud for the power that it gives me, and I've used the cloud since, you know, way back, 2009 or so, it's actually pretty frustrating to work with. And so in my head, I had this idea of like, what if you could make cloud development feel almost as good as local development, right? Like it had these like fast feedback loops. And so I started to think about like, how do we build that, and realized pretty quickly like, well, actually we can't really use
Starting point is 00:02:06 Docker and Kubernetes. We're going to have to throw that out. And we're probably going to have to build our own file system, which we did pretty early on, and build our own scheduler and build our own container runtime. And so that was basically the first two years of Modal. It was just laying all that foundational infrastructure layer in place. Yeah. And then in terms of the things that you offer today for your customers, what are the main services or products? Yeah, so we're infrastructure as a service, which means like on one side, we run a very big compute pool, like thousands of GPUs and CPUs, and we make it very easy to get, you know,
Starting point is 00:02:39 if you need 100 GPUs, we can typically get you that within seconds. So it's sort of one big multi-tenant pool, which means capacity planning is something we kind of take on, you know, it's something we solve for customers. They don't really need to think about reservations. We always provide a lot of on-demand GPUs. On the other side, there's a Python SDK that makes it very easy to build applications. So the idea is like you write code, basically like functions in Python, and then we take those functions, turn them into serverless functions in the cloud.
Starting point is 00:03:08 We handle all the containerization and all the infrastructure stuff. So you don't have to think about all the sort of Kubernetes and Docker stuff. And the real killer app, as it turns out, like, we started this company pre-Gen AI, but as it turns out, the main thing that really started driving all the traction was when Stable Diffusion came out. And a bunch of people came to us and were like, hey, actually, this looks kind of cool. Like, you have GPU access.
Starting point is 00:03:27 It's very easy to use, you don't have to think about, you know, spinning up machines and provisioning them. So that was our first sort of killer app, just doing Gen AI in a serverless way with a focus on diffusion models. Like now we actually have a lot more different modalities.
Starting point is 00:03:40 Like a lot of users are still, like, text-to-image, but we also see a lot of audio and music. So one example of a customer, I think is super cool, building really amazing stuff, is Suno, which does AI-generated music. So they run all their inference on Modal, at very large scale. There's a lot of customers like that,
Starting point is 00:03:58 sort of dealing with, like, you know, building cool Gen AI models. In particular, I would say, in the modalities of, like, audio, video, image and music, stuff like that. That's cool. And I think Suno's using a, like, transformer backbone now for stuff, right, versus a diffusion model-based thing? I think it's a combination of both.
Starting point is 00:04:14 I'm not sure. Yeah, yeah, I think they talk about it publicly. That's the only reason I mention it. You wrote in October this post, I think it was called The Future of AI Needs More Flexible GPU Capacity, and in general, what I've heard in the industry is that a lot of the ways that people use GPUs are reasonably wasteful.
Starting point is 00:04:31 And so I'm a little bit curious about your view on flexibility around GPUs, how much is actually used versus wasted, how much optimization is left, even just with existing types of GPUs that people are using today. Yeah, GPUs are expensive, right? And I think there's sort of kind of a paradox:
Starting point is 00:04:47 it means that, for a lot of the cloud capacity, you know, the only way to get it is to sign long-term commitments, which I think for a lot of startups is really not the right model for how things should be. Like, I think the amazing thing about the cloud was always, to me, that you have on-demand access to, like, however many CPUs you need. But for GPUs, the main way to get access, over the last few years, due to the
Starting point is 00:05:08 scarcity, has been to sign long-term contracts. And I think fundamentally, that's just not how startups should do it, right? Like, you know, and I get it, there have been sort of supply-demand issues. But just, you know, looking at the CPU market, the fact that you have, like, instant access to thousands of CPUs if you need it. Like, my vision has always been there should be the same thing for GPUs. And that means, you know, especially as we shift more to inference. I think for training it's been sort of less of an issue,
Starting point is 00:05:35 because, like, you can sort of just make use of the training resources you need. But for inference especially, you don't even know how much you need, right? Like, in advance, it's very volatile. And so a big challenge that we solve for a lot of customers is we're fully usage-based. So when you run things on Modal, we charge you only for the time the containers actually run, and that removes a massive hassle for customers: traditionally, doing the capacity planning and thinking about how many GPUs, and then having the issue that either you're over-provisioned
Starting point is 00:06:03 and you're paying for a lot of idle capacity, or you're under-provisioned, and then, you know, when you run into a capacity shortage, you have degradation in service. And so whereas with Modal, we can handle these very bursty, very unpredictable workloads really well, because we basically take all these user workloads
Starting point is 00:06:19 and just run a big pool of thousands of GPUs across many different customers. Yeah. One of the things that always struck me about training is, to your point, you kind of spin up a giant cluster. You run it on a huge supercomputer, right? And then you run it for months, in some cases. And then your output is a file.
Starting point is 00:06:36 And that's literally what you've generated, you know? It's kind of insane if you think about it. And that file in some sense is a representation of the entire internet or some corpus of human knowledge or whatever. And then to your point with inference, you need a bit more flexibility in terms of spinning things up and down, or alternatively, if you're doing shorter training runs or certain aspects of post-training, you may need more flexible capacity to deal with. Totally. And that's something we're really interested in right now. Like traditionally,
Starting point is 00:06:59 most of Modal has always been inference. Like, that's been our main use case. But we're really interested also in training. So in particular, probably focused more on these shorter, very bursty, sort of experimental training runs, not the very big training runs, because I think that's a very different market. So that's like a very interesting thing. How do you think about meeting people's end-to-end needs? So I know that there's a lot of other things that people do. There's, you know, a lot of people are using RAG to basically augment what they're doing, or, you know, there's a variety of different things that people are now doing at time of inference in terms of using compute to, you know, take different approaches there.
Starting point is 00:07:35 I'm a little bit curious how you think about the end-to-end stack of things that could be provided as infrastructure, and where Modal focuses or wants to focus. Yeah, totally. I mean, our goal has always been to build a platform and cover the end-to-end use case. It just turned out that inference was, we were well positioned to focus on that as our first killer app. But my end goal has always been to make engineers more productive and focus on what I think of as the high-code side of ML. Like, our target audience tends to be more like sort of traditional ML engineers, like people building their own models. But there's many different aspects of that. There's the data pre-processing. Then there's the training, and then there's the inference.
Starting point is 00:08:14 And there are actually probably even more things, right? Like, you know, having feedback loops where you gather data, and, like, you know, online ranking models and all these things. And so my goal for Modal has always been to cover all of that stuff. And so it's interesting. We see a lot of customers now, we don't have a training product, but a lot of customers use Modal for batch pre-processing. So they use Modal to, you know, maybe they're training video models and maybe they have
Starting point is 00:08:36 like petabytes of video. So then they use Modal, actually maybe with GPUs even, to do feature extraction. And then they train it elsewhere, and then they come back to Modal for the inference. So for us to do training makes a lot of sense. And in general, I think it makes a lot of sense to sort of build a platform where you can handle the entire sort of machine learning lifecycle end to end
Starting point is 00:08:55 and many other things related to that. Also the data pipelines and nightly batch jobs and all these things. Yeah. I mean, what you describe is a pretty broad platform-based approach. I think there's a handful of companies who are sort of in your general space or market. How do you feel that Modal differentiates from them? I think, first of all, we're cloud native. Like, we're just, like, cloud maximalists.
Starting point is 00:09:17 Like, we went all in and said, like, basically, we're going to build a multi-tenant platform that runs everyone's compute in the same pool. And the benefits of that are tremendous because, like, we can just do capacity management much better. And that's one of the ways we can offer, like, instantaneous access to hundreds of GPUs if you need it. Like, you can do these, like, very bursty things.
Starting point is 00:09:33 And we just give you lots of GPUs, right? I think the other benefit, or the other sort of differentiation, is that we're very general purpose. We focus on sort of what I mentioned, like, high code. Like, we run custom code in our containers, in our infrastructure, which is a harder problem. Like, containerization and running user code in a safe way is a hard problem. And then dealing with container cold start, and like I mentioned, we had to build our own scheduler.
Starting point is 00:09:56 We had to build our own container runtime and our own file system to boot containers very quickly. And so, unlike many other vendors, who are only focused on, say, inference or maybe only LLMs, our approach has always been to build a very general purpose platform. And, you know, in the long run, I hope that manifestation will become more clear, because I think there's many other products we can build on top of this, now that the compute layer is sort of becoming more and more mature. When I talk to large enterprises about how they're thinking about adoption of AI, many of them already have their data on Azure or GCP or AWS. They're running their application on it. They've bought credits in the marketplace. They want to keep their spend resident. They've already gone through security reviews. They've kind of done a lot. And they worry about things like latency
Starting point is 00:10:44 or pings out to other third-party services, versus just running on their own existing cloud provider, or the hyperscaler that they work with, or set of hyperscalers, you know, many of them actually work across multiple. How do you think about that in the context of Modal, in terms of your own compute versus hyperscalers versus, you know, the ability to run anywhere? Yeah, totally.
Starting point is 00:11:00 And of course, there's also a sort of security compliance aspect of this. Like, I think, you know, it is a, you know, challenge. I look back at when the cloud came. And I remember back in like 2008, 2009, when the cloud came, my first reaction was like, why would anyone put their compute on someone else's computer and run it? And I think, you know, to me, that was just insane. Like, why would anyone do that? But over the next couple of years, I was like, actually, it kind of makes a lot of sense. And I think now even among enterprise
Starting point is 00:11:28 companies, there's a sort of recognition that, like, yeah, actually, probably our compute is more safe with the big hyperscalers. And in a similar vein, I remember talking to Snowflake back in, say, 2012 or something like that. And they had a sort of similar approach where they basically said, like, we're going to run databases in the cloud, and it's not going to be in your environment, you know, or maybe in your environment, but, like, we're infrastructure as a service. And I thought that was nuts. And then obviously, like, Snowflake now is a very large, you know, publicly traded company. I think that, like, infrastructure as a service makes a lot of sense. And so I think there is a little bit of resistance to adopting this, like,
Starting point is 00:12:01 multi-tenant model. But I think, you know, when you look at, like, security and adoption of cloud, I think we have a lot of tailwinds blowing in our direction. I think security is moving away from sort of a network layer into an application layer. I think bandwidth costs are coming down. I think there's a lot of tricks you can do to minimize bandwidth transfer costs. You can store data in, like, R2, for instance, which has zero egress fees. It's something that I think we're realistically going to have to push on a lot. But I think there's so many benefits of this multi-tenant model in terms of capacity management
Starting point is 00:12:33 that to me it is very clearly like a big part of the future of AI is running a big pool of compute and slicing it very dynamically. You mentioned earlier that one of the things that really caused early adoption of Modal was Stable Diffusion and sort of these open source models around image gen. Are there any open source projects or models that you're seeing be very popular in recent days or in the last couple of months that have really started taking off? That's a good question. I think if anything, it's actually been a little bit of a shift towards more like proprietary models. But like proprietary open-source models, I guess, so like Flux, I think, most
Starting point is 00:13:14 recently, has been, you know, a model that's getting a lot of attention. I'm personally very interested in, like, audio. I think it's, like, very under-explored. I think there's a lot of opportunity for open-source models in that space. But I don't think we've seen anything really cool yet. What else do you think is missing in the world today in terms of AI infrastructure or infrastructure as a service? I mean, I'm very biased, but I think Modal is what's missing. Like, basically a way for engineers to take code and run it. And look, I'm very bullish
Starting point is 00:13:38 on, like, you know, code, and, like, people wanting to write code and building stuff themselves. I think outside of sort of the LLM space, which is like a very kind of different world, in my opinion, I think there's always going to be a lot of applications where people want to train their own models, they want to run their own models, or at least, like, run other models but have, like, very custom workflows. And I just don't think there's been a great way to do that. It's, like, pretty painful to do that. And so I think that's pretty exciting. I think on the storage side, there's some other really exciting stuff. Like, we haven't really touched storage at Modal. We focus very much
Starting point is 00:14:08 on compute. So I'm personally very interested in sort of vector databases. Like, how's that going to evolve? I don't think anyone really knows. I'm pretty interested in, like, you know, more efficient storage of training data. I'm also really interested in, I guess, another thing I'm very fascinated by right now, which is training workloads. In order to train large models efficiently, you've had to really spend a lot of money and time setting up the networking. So one of the things I'm really excited about is, what if you don't, you know, what if we can make training less bandwidth hungry? Because I think that would actually change a lot of the infrastructure around training, where you can now kind of tie together a lot of GPUs in
Starting point is 00:14:47 different data centers, and not have to, you know, have these very large data centers with, like, you know, InfiniBand and stuff. So that's another sort of infrastructure thing I'm looking forward to seeing more development on. How important... so there's sometimes been a little bit of debate around vector DBs, and you mentioned that you actually built one when you were at Spotify. I think Spotify today hit $100 billion in market cap. I think it's one of the first European technology companies to get there, which is pretty cool. So a lot of folks I know may use one of the existing vector DBs, or in some cases are just using Postgres with pgvector, right? How do you think about the need for vector databases as sort of standalone pieces of infrastructure, versus just, you know, adopting Postgres, versus doing something else?
Starting point is 00:15:33 Yeah, I feel like everyone's debating that. I don't know necessarily. Like, I think there's a case to be made that, you know, you can just stick everything into a relational database and you're fine. To me, the bigger question is, like, in the long run, you know, if you think about it, what's an AI-native data storage solution? Like, I don't even know if it necessarily has the same form factors and the same interface as a database.
Starting point is 00:15:56 So that's actually a bigger question that I'm more excited about. It's like, I think people look at, like, vector databases and, like, you know, whether it's relational or not, and sort of shoehorn it into this, like, you know, sort of old school model of, like, you put data in, you get data back. But I don't know, I think there's, like, a lot of room to sort of rethink that in the age of AI and have very different, like, you know,
Starting point is 00:16:15 interaction models with that data. I know that sounds a little fluffy. Yeah, it's super interesting. Could you say more on that? I mean, like, one thing I think a lot about is, like, maybe the database itself could be, like, the embedding engine, right? Like, instead of, like, you put a vector in and, you know, you search by that vector,
Starting point is 00:16:28 I think, you know, the more AI-native storage solution would be you put text in, you put, you know, video in, you put images in, and then you can search by that. Like, to me, that would be a more sort of AI-native storage solution. So that's one line of thought that I've had. Like, maybe we're just so early to this that, like, I think it's going to take five, ten years for it to really shake out.
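One way to read the "database as the embedding engine" idea above: the store accepts raw text (or audio, or images) and is queried in the same modality, with embedding happening inside the store rather than in the application. A toy sketch, using a bag-of-words counter as a stand-in for a real embedding model; the `TextStore` class and its methods are hypothetical names, not any product's API:

```python
import math
from collections import Counter

# Toy "embedding-inside-the-store" sketch: insert raw text, query with
# raw text. The bag-of-words "embedding" is a stand-in for a real model.

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class TextStore:
    def __init__(self):
        self.docs = []

    def insert(self, text):
        # Embedding happens at insert time, inside the store.
        self.docs.append((text, embed(text)))

    def search(self, query, k=1):
        # The caller never sees a vector, only raw text in and out.
        qv = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(qv, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = TextStore()
store.insert("gpu clusters for model training")
store.insert("suno generates ai music")
print(store.search("ai generated music"))  # → ['suno generates ai music']
```

The contrast with a conventional vector database is that the vector never crosses the API boundary: the interface is modality-native, which is the interaction-model shift being described.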
Starting point is 00:16:52 Yeah, that's really cool. I guess one other thing that you mentioned was more people seem to be training their own models, at least in a lot of the areas that Modal works with. Do you think there's any heuristic that people should follow in terms of when to train their own model versus use something off the shelf? I think eventually, for any company where model quality really matters, unless you kind of train your own model in the end, I feel like it's going to be hard to sort of defend the fact that, you know, you have a better solution. Because otherwise, what's your moat? Like, if you don't have your own model, you need to find a moat somewhere else in the stack. And that might be possible to find;
Starting point is 00:17:28 it might be somewhere else for a lot of companies. But I think at least if you have your own model and that model clearly is better than anyone else's, then that sort of inherently is a moat in itself. I think it's more clear outside of the LLM space, when people are building audio, video, image models. I think if that is your core focus, it's very clear to me.
Starting point is 00:17:49 You kind of have to train your own models in that case. Yeah. If I remember correctly, you were an IOI gold medalist. Yeah, that's right. Obviously, you think a lot about code and coding, and how do you think that changes with AI over time? Or do you have any contrarian predictions on what happens there? I don't know if this is contrarian, but I actually think that, like, you know, this is just one out of many improvements in developer productivity.
Starting point is 00:18:14 And, you know, you look back at, like, you know, whatever, like, compilers were originally, you know, a tool that made developers more productive, and then, like, higher level programming languages and databases and cloud and all these things. So, like, I actually don't know if, like, AI is, like, you know, different than any of those changes in hindsight. And by the way, like, every time that's happened, you know, it turns out, like, there's so much latent demand for software that actually, like, the number of software engineers goes up. So, like, I feel like you look back at, like, you know, the last, like, 40 years of software development. Like, every decade, engineers get, like, 10 times more productive due to better frameworks or better, you know, tooling or whatever. And it turns out, actually, that just unlocked more latent demand for software engineers.
Starting point is 00:18:54 So I'm very bullish on software engineers. I think it would take a lot to sort of destroy that demand. I think people look at a lot of AI as, like, a kind of fixed-sum thing. But in my opinion, it's like, no, it's just going to unlock more latent demand for more things. So I'm very bullish on software engineering. And then I guess the other field that you touched a long time ago was, I think you won a Swedish physics competition in high school. And I'm curious if you've followed any of the physics-based AI models or some of the simulation. Like, that's an area that strikes me as very interesting.
Starting point is 00:19:24 And the way you think about the models for it is different. Yeah. I did win the Swedish high school physics competition. I was a total mathlete nerd in, you know, my teenage years. Okay. Yeah, I think it's a really fascinating area right now. Like, it's one of those areas that seems like there's some real reinvention needed and not as many people working on it.
Starting point is 00:19:46 So it's one of the areas I'm kind of excited about, just in terms of there's lots and lots of different applications that you can start to come up with relative to it. Yeah. I mean, physics, in my opinion, it's like, you know, the golden era of physics was, like, the 20s and 30s and 40s. I kind of feel like the field hasn't really evolved much since. So I don't know, maybe, I would love for you to be right that there's, like, a resurgence of, you know, new physics-based models. Yeah, I don't know if it would necessarily help in the short run with basic research. I think it just helps with simulation.
Starting point is 00:20:11 It kind of feels like physics as a field really kind of doubled down on sort of the Ed Witten path of physics, and maybe got a little bit lost there or something. I'm not sure. Are you talking about more, like, materials, like doing more, like, compute-based? It's kind of like ANSYS or other companies where, you know, you simulate an airplane wing. You simulate load bearing in a... Oh, I see. It's like HPC. That's always existed, right?
Starting point is 00:20:33 Like, especially in, like, you know, oil and gas and stuff like that. But it's a lot of kind of small, bespoke, kind of fine-tuned or hand-tuned models for specific things versus, you know... I mean, meteorology is, like, something I actually think, like, deep learning should change, right? Like, it sort of makes a lot of sense. Like, you know, deep learning should be very good at, like, you know, predicting, you know, turbulence and things like that. Like, because turbulence is actually very hard to solve with traditional physics models, right? And so deep learning should, in theory, I kind of feel like, make a lot of sense. Yeah, I think there's been a couple of papers on that out of Nvidia.
Starting point is 00:21:06 And then I think Google has a team that's worked on it. And so there's a couple different sort of weather simulation teams that have started to publish some pretty interesting stuff, it seems. Yeah. Yeah. I mean, I would also point to an adjacent area, like biotech, where, you know, computational methods have been enormously successful, right? Like, if you look at protein folding in particular, but also other things like sequence alignment and things like that.
Starting point is 00:21:28 And that's actually a field where we start to see a lot more usage of Modal as well. I feel like there's kind of a resurgence of computational biology. It's a really exciting thing. Oh, that's really neat. Yeah. Are there specific use cases you see people engage with most across your customer base relative to the sciences? There's a lot. I'm not a bio person.
Starting point is 00:21:49 So this is kind of superficial, kind of looking at our customers. But, like, one thing I've seen a lot is actually medical imaging. Because my understanding is, like, with modern methods, you can run, like, millions of, you know, experiments and do automated electron microscope imaging of them, in a very automated way. So we've actually seen quite a lot of customers, like, use Modal for then processing and doing computer vision on those images, which is kind of cool. It's really cool. Is there any area that you're most excited about from a human impact perspective for some of these models? You know, with my background at Spotify, like, I think Suno is, to me, a very exciting thing. I think it's still, like, very early, sort of, AI-generated music.
Starting point is 00:22:29 You can still hear that it's, like, not right. It's sort of uncanny valley a little bit. But, like, Suno's, like, every generation of their model is, like, getting better and better. And first of all, like, music in itself tends to be, like, sort of always, like, one of the first areas where you see real impact of new technologies, whether, you know, Spotify or, like, iTunes or piracy or, like, all these things, or gramophones going back, right? So I always think music is, like, an exciting area for that, in that sense. Like, it always shows, like, the opportunity of new technologies. And I also think, like, Suno is, like, fundamentally something you couldn't have done before Gen AI.
Starting point is 00:23:02 So that to me is, like, really exciting. It's, like, sort of really pushing the frontier, enabling a completely new product. Like, there's no way Suno could have existed five years ago. That's cool. Well, I think we covered a lot today. Thanks so much for joining me. Yeah, thanks a lot. It was great. Find us on Twitter at NoPriorsPod.
Starting point is 00:23:20 Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
