No Priors: Artificial Intelligence | Technology | Startups - Music consumers are becoming the creators with Suno CEO Mikey Shulman
Episode Date: May 16, 2024

Mikey Shulman, the CEO and co-founder of Suno, can see a future where the Venn diagram of music creators and consumers becomes one big circle. The AI music generation tool trying to democratize music ...has been making waves in the AI community ever since they came out of stealth mode last year. Suno users can make a song complete with lyrics, just by entering a text prompt, for example, “koto boom bap lofi intricate beats.” You can hear it in action as Mikey, Sarah, and Elad create a song live in this episode. In this episode, Elad, Sarah, and Mikey talk about how the Suno team took their experience making a transcription tool and applied it to music generation, how the Suno team evaluates aesthetics and taste because there is no standardized test you can give an AI model for music, and why Mikey doesn’t think AI-generated music will affect people’s consumption of human-made music.

Listen to the full songs played and created in this episode: Whispers of Sakura Stone, Statistical Paradise, Statistical Paradise 2

Sign up for new podcasts every week. Email feedback to show@no-priors.com
Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @MikeyShulman

Show Notes:
(0:00) Mikey’s background
(3:48) Bark and music generation
(5:33) Architecture for music generation AI
(6:57) Assessing music quality
(8:20) Mikey’s music background as an asset
(10:02) Challenges in generative music AI
(11:30) Business model
(14:38) Surprising use cases of Suno
(18:43) Creating a song on Suno live
(21:44) Ratio of creators to consumers
(25:00) The digitization of music
(27:20) Mikey’s favorite song on Suno
(29:35) Suno is hiring
Transcript
Discussion (0)
Hi, listeners, and welcome to No Priors.
Today we're talking to Mikey Shulman, the co-founder and CEO of Suno, an AI music generation tool,
trying to democratize music making.
Users can make a song complete with lyrics just by entering a text prompt.
For example, I was playing with it this morning, and you guys all get to hear "koto boom bap lofi intricate beats."
Great. Nature weave tails in each gentle race.
Interior pedals, fall time slows its pace.
Every feeding cherry boom, a hint of grace.
Okay. So I'm feeling really excited about quality here for a company that is just under two years old, but is making waves in the AI music industry.
Since you came out of stealth mode late last year, Mikey?
That's right.
All right. Well, we're excited to talk to you about AI music models and how it's been going since launch.
Thanks so much for doing this. Welcome.
Thank you. I'm super excited to be here.
Okay, maybe just start us off with a little bit of background.
You're a kid who loved music, playing in bands.
How do you go from that to, you know, a Harvard physics PhD, to building, you know,
a couple AI companies?
Yeah, I guess a bit of a circuitous route.
I've been playing music for a really long time since I started playing piano when I was four.
I played in a lot of bands in high school and college growing up.
And the dirty secret is I'm not that good.
And so the smart move, I suppose, for me, was to pursue the thing that I was relatively better at, which was physics.
I went to college and then to grad school and did a PhD in physics.
Studied quantum computing.
Maybe for your next podcast, I can tell you about why you shouldn't go into quantum computing.
What did you think you were going to do?
Like, did you think you were going to be like a theoretical physicist or like an academic?
Oh, goodness.
Two things.
Like, I've never had a master plan, so I don't think I thought what I was going to do or not going to do.
But I am certainly not great at physics.
You know, I think I had a reasonably successful PhD, not because I'm good at physics.
The quantum mechanics that I studied was worked out in, like, the 50s.
There was a lot of very tricky low-temperature microwave engineering that turns out to be really important for actually doing this stuff.
I got lucky that I was relatively good at that compared to all the other physicists.
So, you know, kind of something on the boundary between two disciplines.
I enjoyed every second of that.
I would do it all again, even knowing what I would be when I grew up or when I grew out of that.
Still very close with my PhD advisor.
I still live walking distance from my old lab.
You know, it's kind of a fun place to just walk around Cambridge, Massachusetts.
But, yeah, quantum computing is cool, not what I wanted to
do with my life. I found a company called Kensho by accident, not founded, found. They were local
and I met them and probably 10 people at the time and I met all 10 and I really, really liked
them. And I said, let's go do this. And I was hired as a software engineer. And I think I got
really, really lucky in terms of timing about a month after I joined, the machine learning opportunities
came along. And in 2014, a guy with a PhD in physics is what passes for a machine learning
engineer. And so I took full advantage of that opportunity, learned a ton, got to build a team,
got to build some fun products. We were acquired by S&P Global in 2018 and got to pursue a lot
of fun stuff after that acquisition as well. So I guess I found my way into AI somewhat by accident,
but I really like it. It's a lot of fun. So you guys actually started with,
this open source model, Bark. Can you talk about what the idea was at the very beginning
and how you ended up in music generation? We were doing all text at Kensho, and we did our first
audio project after we were acquired by S&P Global, which was learning to transcribe earnings calls.
So I'm sure both of you have read an earnings call transcript. Exceedingly likely it was done by
S&P Global. It used to be done completely manually, very painful. And we could lend a lot of speed and
scale by bringing automation to that. And we fell in love with doing audio AI. Like,
we happened to be musicians, but it kind of took this very honestly non-sexy project of earnings
call transcription to show us how much we loved it. And we also realized that certainly compared
to images and text, audio was really, really far behind. And this was in 2020. And I think that's
maybe even more true now if you just look at everything that's happened in images and text in the last
years. Like I said, we never had a master plan. We made Bark as an open source project, and
even before we released Bark, we knew we wouldn't be focusing on speech. I think, if I'm honest,
a lot of people told us, go build a speech company. It is more straightforward. You'll build
a great B2B product, and people will love it. And we couldn't help ourselves. We just love
music too much. And so we decided to build a music company. Why did you know you weren't
going to focus on speech? Speech is super interesting, but the inherent creativity that we were
so drawn to. It was like not really present in speech. Speech just needs to be right. Just like
read me this New York Times article. And if it's a tiny bit non-expressive or a tiny bit robotic,
that'll still get the job done. And the real creativity was happening in a totally different
part of audio, which is music, which all I care about is how it makes me feel.
That's really cool. And then what's the approach that you've taken? Because I guess, of the two main
architectures that people have used for different forms of audio models, a lot of them are
traditionally based on diffusion models. I know there's been more work on the Transformer side.
And then there's obviously a few other types of architectures.
Is there anything you can tell us about sort of the technical approaches you've taken or how you think about it?
And one of the reasons I ask is obviously for a lot of the transformer models,
people just look at scaling laws and how things will sort of adapt with scale.
And I'm a little bit curious how that applies to music and how you think about that future relative to models and approaches.
We don't make it a secret that these are just transformers.
This comes somewhat from our backgrounds doing text before, but also transformers
scale nicely.
A lot of work ends up being done for you
by the open source text community,
which is always really nice.
We can really be choosy with where we innovate,
and where we end up innovating a lot
is how do you tokenize audio?
You know, audio does not do us the favor
of being nicely discretized.
It's sampled very, very quickly,
approximately 50,000 samples per second.
It's a continuous signal.
And so you have to use a set of heuristics
or models in order to turn
that into a manageable set of tokens. And that's where we expend, I think, a lot of our
kind of innovation cycles is really understanding that. As you said, the thing that matters is how it
makes you feel. And so, like, how did you measure quality in your own models? Like, what do you
know about how to train something that creates great generations? Is it just all like Mikey as
human eval? It's definitely not all Mikey as human eval. But, you know, one thing we say here is,
is that aesthetics matter, and I think that is a recognition that I think in all branches of
AI we become slaves to our metrics, and you say I did this accuracy on this benchmark and this
benchmark, and in the real world, sometimes it doesn't necessarily matter. And these benchmarks
are extra terrible in audio just because the field is so new. And so aesthetics matter is like a way
of saying that you have to use your ears in order to evaluate things. You can look at
things like what your final loss is or something like that. But ultimately, it's definitely
more tedious to evaluate than you want it to be. I think the good news is everybody here really
loves music. And so evaluating your models, which means listening to a lot of things and getting
people to listen to a lot of things and doing a lot of A-B tests, turns out to be fun. But I think
we have a long way to go in this journey on how we're actually going to evaluate these things.
And I think we learn a lot about human beings and human emotions while we learn to evaluate these things.
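The A/B testing Mikey describes maps to a very simple aggregation step. As a toy sketch, and emphatically not Suno's actual pipeline (the version names and vote format here are made up for illustration), blind pairwise listener votes can be turned into per-pair win rates like this:

```python
# Toy sketch: aggregating blind pairwise listening tests into win rates.
# Model names ("v1", "v2") and the vote format are hypothetical.
from collections import defaultdict

def win_rates(votes):
    """votes: list of (model_a, model_b, winner) from blind A/B trials.
    Returns {(x, y): fraction of trials between x and y won by x},
    with each pair keyed in sorted order so (a, b) and (b, a) merge."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for a, b, winner in votes:
        pair = tuple(sorted((a, b)))
        totals[pair] += 1
        if winner == pair[0]:
            wins[pair] += 1
    return {pair: wins[pair] / totals[pair] for pair in totals}

votes = [
    ("v1", "v2", "v2"),
    ("v2", "v1", "v2"),
    ("v1", "v2", "v1"),
    ("v2", "v1", "v2"),
]
print(win_rates(votes))  # {('v1', 'v2'): 0.25}
```

In practice you would also want enough votes per pair for the rate to mean something, for example by attaching a binomial confidence interval before declaring one checkpoint better than another.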
Yeah, it's interesting because as an analog, I know that in the early days of mid journey,
one of the ways it really stood out is people just felt that there is better taste exhibited,
you know, it's better aesthetics versus, hey, there's a much better eval function that they're optimizing against,
although obviously there were things that were doing there as well.
And so it feels similar here where that sort of taste component really matters,
particularly early on.
Are there other ways that your music background has impacted the development of
Suno or really helped sort of facilitate some of the things that you're doing?
There's this cliche about it being really important to look at your results
and look at your data in machine learning and in AI.
And if that is pleasurable, it is not nearly as tedious.
And that's not just for me, that's kind of everybody here.
And that ends up mattering a lot.
I've learned a lot about music actually since starting.
this company and just the exposure to different genres that I never knew existed and exposure
to hybrids of genres that have yet to be created by people has been like really, really
eye-opening. But it's funny because you ask like, okay, you know, maybe the stuff that I know
about music, we actually try very hard not to put too much implicit bias in the model.
The model shouldn't know about music theory. You don't tell GPT this is a noun and this is
a verb. GPT figures it out. If I tell my model, there are only 12 tones. My model will only
know how to output 12 tones. If I tell my model, there's 50 different instruments. I will never get
that unique sound. And so we've really tried very hard not to do anything like that. And honestly,
I don't think this is so smart of us. This is something that we've stolen from the text world
of there's something beautiful about next-token prediction that ends up being very, very powerful.
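To put rough numbers on the tokenization problem Mikey describes: raw audio is a continuous signal sampled at roughly 48 to 50 kHz, so even a crude scalar quantizer turns one second of sound into tens of thousands of tokens. The sketch below is illustrative only. It uses classic mu-law companding from telephony, which is not Suno's tokenizer; a real system would presumably use a learned neural codec precisely to get a much shorter token sequence for next-token prediction:

```python
# Illustrative only: discretizing continuous audio into tokens via
# mu-law companding (a telephony trick, not a modern learned codec).
import math

SAMPLE_RATE = 48_000   # raw audio: ~50k continuous samples per second
VOCAB = 256            # token vocabulary size after quantization

def mu_law_token(x, mu=255):
    """Map one audio sample in [-1, 1] to a discrete token in [0, VOCAB)."""
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)  # still in [-1, 1]
    return min(VOCAB - 1, int((y + 1) / 2 * VOCAB))

# One second of a 440 Hz sine wave becomes 48,000 tokens, far beyond a
# comfortable transformer context for just one second of music.
signal = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
tokens = [mu_law_token(x) for x in signal]
print(len(tokens), min(tokens), max(tokens))
```

The point of the example is the count: 48,000 tokens per second is why the tokenizer, not the transformer itself, is where an audio model's innovation budget goes.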
Mikey, what's hard in AI music? Like, I know less about
what this frontier looks like. Where do you want to push in terms of things that are really
hard for the model to get right? Like, you know, in visual models or video like human hands,
object permanence, like there's lots of things that are more intuitive to me there. Yeah,
that's a really good question. I confess, I've not really thought about that too much. There are the
easy things or the easy to describe things like, you know, did you get the stereo right? Did you get
the bit rate high enough, et cetera? Again, I think the reason
music is so special is because it makes you feel a certain way. And like to the extent that
any of this is difficult, it is because you are really targeting human emotions in some way.
And that's not terribly well understood by anyone. And it is also super, super, super, super
diverse and super culturally dependent and super age dependent or demographic dependent.
So, you know, I think what we're doing is so far from objective truth. And it's very easy for
people who spend all their days in text LLMs to be thinking about things like,
this is how well I did on the LSAT.
You know, I can pass the bar with this size model, like the law bar.
And none of that exists for us.
It's really just like I made a song and it made me feel a certain way.
And it may have been grainy audio that made me feel a certain way.
It may have been a long song, a short song.
I think there's a lot more unanswerable questions in this domain.
And then one of the things that you all did quite early is,
I believe you have like a free tier
so people can make up to 10 songs a day
and then you have a subscription-based approach.
How do you think about your users over time
in terms of consumer versus prosumer versus business users?
And is it too early to tell?
Is there a specific area
that you're most focused on?
Like, how do you think about all that stuff?
Yeah, that's a great question.
I would say, you know,
we are trying to change how the entire globe
interacts with music and to open new experiences for people.
And so what that means is that this is a consumer
product. This is not sprinkling AI into Ableton or Logic or Pro Tools. This isn't for the person
already staying up all night as a hobbyist trying to produce music. This is for everybody. This is for
like my mom. And, you know, I think the business side of things, it may not be conventional
wisdom to say start charging immediately for your product. But it's actually really important as we are
trying to create something that is a set of behaviors that does not exist, to be able to
understand what actually makes people want to part with their hard-earned dollars.
If I'm being honest, people ask about the business model of generative AI a ton, and I think
everybody's doing kind of something that looks like SaaS pricing, and it's kind of done very
crudely, and we are certainly no exception to that. But I don't know if this is right in the long
term. And it strikes me as probably just a vestige of it is the same types of people who were
building SaaS companies five years ago and the same investors who were investing in SaaS companies
five years ago who are building it and investing in it this time around. And so it kind of feels like
a bit of a vestige. No offense to you guys. You guys are both great investors. But like this feels
like something that's not totally worked out yet. Yeah. It's interesting because like I remember
talking to some people who were very active in the 90s as the web browser was really coming to the fore
front, and they were trying to figure out the right business model for web pages. And a lot of
the emphasis was actually, should we do micropayments? So every time you read a New York Times article,
you pay a fraction of a cent instead of ads-based models, right? And of course, the world ended up
collapsing on that side to ad-based models. But nobody that I've talked to from that era
actually thinks that was necessarily the right answer. They just think it was the easiest thing to do
in the short run. And so I think there's a really interesting question here, to your point, in
terms of, you know, subscriptions, there's ads, there's other sorts of paid placement. There's a
variety of things you could do over time. There's micro transactions. And so there's reselling things
in a marketplace and letting people take a cut of subscribers, you know, almost like a next-gen
Spotify or something. So it's super interesting to wonder how all this evolves and where you take
it. So it's really cool that you're thinking deeply about it right now. Yeah, it's actually funny to hear
you say that because I remember back in the 90s, my older brother was a beta tester for AOL.
And I actually remember some of these things happening.
And I remember actually watching him beta test these things.
Yeah, that's cool.
Are there any ways that people have started to use a product that were very unexpected for you?
Or surprising use cases or applications or other things people have done with it?
I think so much has been really fulfilling and cool to see and definitely surprising.
And, you know, one thing I'm constantly reminding everyone is that we are eliciting a set of behaviors that are not
common and that are not regular for people to do. And so it's not going to be surprising when we
see stuff that comes out. It's maybe not surprising that people love to feel creative and they love
to feel ownership over what they produce and they love to share it with others. If you want to be
a little bit more reductive about it, they love to feel famous. But I think it's not the same way
that famous people are famous. It's a little bit different. And so
we've seen that people will spend a lot of time in front of their computers enjoying making songs.
This is really cool, and it is different from, I think, the way music is done now.
Music is done now, sometimes painfully, but only in service of the final product.
And I think when you open this up to people, sure, you definitely care about the final product,
about what the song sounds like on the other end, but you also really care about the journey,
and that people will really enjoy making music, regardless of the final product.
And I can tell you, you know, personally, the most fun I have ever had doing
music is playing music with friends, jam sessions, even when you're not recording. And I think
there's something that's like very, um, very akin to that, that we are able to open up with
some of these technologies. It's such like a magical experience. And I, um, I feel like everybody
should, should feel some of that like joy of creation with other people. Maybe you already see
it in the product, but are you imagining that you get that collaboration
joy from, like, you know, the creation's joy of working with yourself, feeling like you are more
skilled, you're collaborating with AI, with Suno? Or are people jamming? Do you see, like, mixtape-like
sharing behaviors today you can talk about? We see all of that, which is super cool. Like a video game,
music is fun by yourself and maybe more fun in multiplayer mode. And so we see people enjoying this
by themselves, but we see people basically hacking multiplayer mode into this in lots of fun
ways where you can have people co-writing lyrics together, trading off words, trading
off verses, I'll write the verse, you write the chorus, or I'll write the lyrics and you
pick all the styles and I'll make a song and then I'll send it to you and you'll make a song
back. And so it's not surprising. I think humans really evolved to resonate strongly with music
and want to do music together. Every culture basically has music. And so it really shouldn't be
surprising that we see all of this. But it is really fulfilling from our perspective because it really
brings people together. It makes people smile. I don't pretend like we're curing cancer at
Suno, but it is really cool to make a lot of people smile. One of the things that you and I talked
about previously was in creation platforms, you have like a very skewed ratio in general. And that
varies by, you know, what the platform is, of like creators and people who are listening, absorbing,
viewing, whatever, right? There, of course, are a lot of people who make music today. But you listen
to the creations of a relatively few number of people, right?
How much do you think that changes with something like Suno?
I think a lot.
I will say, you know, I'm speculating here.
It's still super, super early.
But I think of us opening a few important avenues.
The first is, I guess, all of the sort of smaller niche micro-sharing that is possible
where we can make songs that the three of us are going to listen to.
because it is capturing a moment that the three of us had, the same way we might take a selfie.
And that is sharing dynamics that just are completely absent in music right now.
Let's do it.
Let's do it.
Sorry to interrupt you.
Okay.
I need some genres.
What should we make a song about?
My favorite genre, but I don't know that it's supported yet, is phonk, P-H-O-N-K.
But it may be too obscure.
Okay, that'd be very exciting.
No, I think we can do some.
But let's do some hybrid also, like, I don't know, a song.
Reggae song?
How about some like, yeah, or like Hawaiian R&B?
Ooh, Hawaiian.
You want to choose like an instrument to add in there?
Yeah.
You said Koto before.
Koto?
Or sitar.
Something random.
Sitar.
Sitar sounds cool.
Yeah.
I have heard a lot of really good sitar trap on Suno.
Yeah.
Goes really well together.
That's my second favorite.
About?
Okay, a song about priors in statistics.
We have no priors here.
Let's see how we do.
Just learning from the world.
Ground up.
I've learned a lot.
I've learned a lot of new genres since starting this.
What's your favorite new genre, by the way,
that came out of that?
Gosh, there's some recency bias here, but sitar trap is freaking fantastic.
Yes.
That sounds good.
That sounds good.
Tryers in the game
It's all about
Probability
Got my Hawaiian shirt
Feeling real fly you see
Safe old day to fly you see
Try the other one
Like waves on the shore
I'm got a deep in
In probabilities galore
But since I'm strong
Spreading with the fog
Meats as I am a modest data
Try the other one
We should just change this to the intro
For No Priors going forward
I'm trying to fly
In a statistical paradise
With beats of that
price in the game
So about probability
I'm a fly
To see a bit of
Throwing like waves on the shore
I'm diving deep
In probabilities galore
The Stata strongs
Lending with the phone beats
I like that
It's fun. We're going to have to, like, get an image where I don't even own a Hawaiian shirt, but we're going to have to get an image where we're all, like, wearing the Hawaiian shirts. Spin these. The Palmer Luckey. Yeah, yeah, fine. I'll just get Palmer to do it. I'm maybe the only person who does this with Suno, but every time, like, I create a song, I imagine what the artist would look like that creates the song. I'll, like, visualize it, where I'm like, it's the big dude with the Hawaiian shirt and he's got the sitar with him.
I love that. I will tell you, you know, one like very cool and unexpected thing we saw is we shipped a very simple feature of you can edit your song title. Maybe you fat fingered it or something. And as soon as we did that, people started to put their names in their song titles when they hit our trending page. And people, people like to feel good about their creations and you should know. In hindsight, it's obvious. And people will hack your product and tell you what they want out of it. Just one thing back to your point before, though, Sarah, I think,
we talk a lot about, you know, how asymmetric the creation versus consumption is on different
platforms, and TikTok is famously very creation heavy, although still most of TikTok is consumption.
And I think these set of technologies have the ability to skew that much, much farther,
because the creation process is so enjoyable. But I actually think if we do this right in the future,
these are not going to be the terms that we use to describe what we're doing. We're not
going to say I'm creating or I'm consuming. These things will bleed into one another. We'll have
a lot of lean-in consumption. We'll have a lot of lean-out creation. And I think we will eventually
decline to draw the line of how many people are creating, how many people are consuming. And we'll
just say, like, people are enjoying all of this music stuff. That's a really interesting vision of
the future. I guess that has pretty deep implications as well in terms of how you think
about music, the music industry, how it permeates society.
Do you have a view in terms of what all this looks like five years from now?
If we are correct that there are just modes of experience around music that people don't have access to,
that we can get a billion people much more engaged with music than they are now,
that just in terms of the number of dollars or the amount of time people are spending doing music,
both of those are going to go up dramatically.
That I feel quite confident about.
The exact nature of how this looks, I think, is up for some more debate.
So this is just an opinion.
I don't, because music is so human and so much emotional connection involved in it,
I don't really see people losing connection with their favorite artists at all.
In fact, if you labor around music and you understand the process, you feel a much deeper connection
with the artists that you love.
Another thing I think is likely to happen, if we look at like the last wave of technologies to enter music,
let's say the DAW, this really accelerates how quickly music can change and how quickly
culture can change. You know, music is really just the reflection of culture. And I think
the way that happened is the DAW really let a lot of people start making music who could never
make music. You could do this from your dorm room if you had a good pair of headphones and you had
a good ear and you were willing to put in the work to learn the tool. And I think if we can give
this to so many more people, yes, a lot more people will create. A lot more people will become
tastemakers, but the rate at which culture changes, the rate at which the styles of music
change, the rate at which new styles of music are uncovered, is likely to go up a lot. And I think
even if you were just going to only ever listen to music, which some people will, that will get
so much more interesting. Things are going to change so much more quickly. You will not have people
really, I think, cribbing off of one another in the same way. So I'm really excited about that.
just because not every listener will, like, mix in a DAW, a digital audio workstation like Ableton or something, right?
Like it's, you know, you can generate music, put it on a timeline and create sound as, as Mikey was saying, in your dorm room, in your apartment cheaply.
That was pretty revolutionary when it turned out you didn't need, you know, a $500,000 SSL mixer and a staff of 10 people to cut an album.
That was really revolutionary, and people made tremendous contributions to, like, our collective
culture when that happened. Like, there were 15-year-olds who got discovered that way, and
that was extremely rare before that. I actually think it's really, like, an untold story. I'm not the
right person, but somebody with, like, really rich musical history understanding should, like,
explain what happened with the digitization of music. We're like, ah, I have, like, an infinite set of, like, every
snare drum sound in the world I can think of. Just the ability to be completely unconstrained
on, as you said, something that's much cheaper than traditional tooling, where you don't need
to know how to play any instrument. And now I think some of what Suno is doing is making
the assembly of that, like, another magnitude easier. I think that's right. There's one other thing
that I'm really excited about getting unlocked, which is that if you look at the last 10 years of
music, a lot of the changes are, let's say, sonic: it's like interesting sounds and maybe
slightly less so evolving how interesting songs are. And it's a function of the technology that
got unlocked, like a lot of digitization of things. And I'm actually really excited for the opposite.
Like, AI is certainly able to produce interesting sounds that we've never heard before.
But putting these tools in people's hands, we can unlock song structures and chord changes
and borrow different styles
and mix them with other styles
and make stuff that is not only sonically new
but kind of melodically new.
And I think that has the ability
to really keep people listening to stuff
and on my most optimistic days,
I'll say, un-TikTokify music,
like get us listening to stuff
for more than 30 seconds at a time.
Maybe I'm a little bit naive and optimistic,
but I think it's very possible.
Yeah.
Okay, before we wrap,
Like, I played a song at the beginning.
We made a song.
You got to play one that's your favorite.
That's a creation.
Oh, that's a, let me find it.
I'm tempted to play a song that's at the top of our showcase.
And it's by an artist called Oliver McCann.
It's got a lot of plays.
It's a really interesting song.
It is certainly the public's favorite.
So I can play it now.
Oh, my love, my friend, you know, it's been a while without thinking of you, but the thought makes me smile.
I'm so tough wanting.
But what am I to do?
I need some space to breathe.
So give me some room.
Oh, my love.
It's unbelievable.
The amazing thing about this, by the way,
which, you know, just for a listener's sake,
is the vocals are completely machine-created.
The music is completely machine-created.
The lyrics are machine created.
And so this truly is a synthetic song,
which I think is pretty amazing.
Yeah, it certainly is easy to lose sight of that fact
when you do this day in and day out,
but it is incredible.
I'll say one step further,
the machine doesn't know that there is even a concept of voice.
Like, it's just all sound,
and somehow it's able to produce the sounds
that we have been evolved and acculturated to
resonate with. And so all of that makes me think I have the coolest job in the world.
Not bad for a quantum physicist, a failed one, I guess. Exactly. Mikey, how big is Suno? It's
obviously very popular now. You're growing the team. What are you looking for? Yeah, we are always,
we are always on the hunt for the best people, people who love technology, people who deeply love
music, people who are excited about bringing more music to the world. We're hiring primarily in
Cambridge, Massachusetts, or New York. Come drop us a line: careers at suno.com.
Great. Well, thank you so much for joining us today. I think we covered a lot of great things.
I had a great time. Thanks so much for having me.
Find us on Twitter at NoPriorsPod. Subscribe to our YouTube channel. If you want to see our faces,
follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode
every week. And sign up for emails or find transcripts for every episode at no dash priors.com.
Thank you.