Big Technology Podcast - Spotify's Plan For AI Generated Music, Podcasts, and Recommendations — With Gustav Söderström

Episode Date: November 13, 2024

Gustav Söderström is Spotify's co-president, chief technology officer, and chief product officer. He joins Big Technology Podcast, as we debut video episodes on Spotify, for a discussion of Spotify's approach to AI-generated content, algorithmic recommendations, and more. Tune in for a deep conversation covering whether Spotify wants AI-generated music and podcasts on its platform, how it can lean on AI recommendations to enhance discovery while sustaining human choice, and its long-term AI vision. Stay tuned for the second half where we discuss Spotify's plans for podcasts, audiobooks, and other new formats. --- Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice. For weekly updates on the show, sign up for the pod newsletter on LinkedIn: https://www.linkedin.com/newsletters/6901970121829801984/ Want a discount for Big Technology on Substack? Here’s 40% off for the first year: https://tinyurl.com/bigtechnology Questions? Feedback? Write to: bigtechnologypodcast@gmail.com

Transcript
Starting point is 00:00:00 Spotify's chief product officer, chief technology officer, and co-president joins us for a deep conversation about how AI is changing the music industry, podcasts, and audiobooks, from recommendations to synthetic content. That's coming up right after this. Welcome to Big Technology Podcast, a show for cool-headed, nuanced conversation of the tech world and beyond. We have a great show for you today because we're sitting here in Four World Trade Center, Spotify's New York City headquarters, with the company's
Starting point is 00:00:30 co-president, chief product officer, and chief technology officer. Yes, all that in one. Gustav Söderström is here. Gustav, great to see you. Welcome to Big Technology. Thank you for having me, Alex. It's a pleasure to be here. Great to be here.
Starting point is 00:00:43 I mean, we're in a beautiful studio in your office. I've been looking around. I just can't believe how amazing the studio is. And also, it's cool for me to be sitting here with you because I'm using your app every day. And Spotify is the place where I touch some of the most, I wouldn't even call them possessions because I'm subscribed to it, but one of the most beloved experiences that I have, which is music. And so many of us use Spotify all the time, but we hear from you guys rarely, so I do appreciate the opportunity to speak with you.
Starting point is 00:01:10 Me too. I appreciate that. I'm very glad to hear that, and I'd love to share as much as I can about how Spotify actually works. It's sort of a passion of mine to try to explain things and how they work, so I actually love these podcasts. In some ways, an app will determine how people experience a format, but in some ways, a moment in time will determine how an app has to deal with the content within it. And Spotify's going through both of those, and both of those regard artificial intelligence. I don't know if you've heard of Suno. In fact, I'm sure you've heard of Suno. It's one of our favorite things to use on Big Technology Podcast.
Starting point is 00:01:48 Ranjan and I, we do this show on Friday. We built a theme song with Suno and played it and it was a good time. I'm curious from your perspective running product at Spotify, how do you feel about AI music, AI-generated music? Because the songs, they're not amazing, but they're good, there have been some big hits. Do you view this as an opportunity, a threat? Do you want it on your platform? So the way I think about it, I'm a technologist, so obviously I'm very excited about the technology itself and I love AI. I think it's a super impressive product.
Starting point is 00:02:31 It works amazingly well. And philosophically, it's very interesting that something we thought was impossible just a few years ago, that a machine could sound like something a human did, can be creative. Legitimately incredible. You prompt it and out comes a great-sounding song. It is incredible.
Starting point is 00:02:50 So I think that technology is amazing. Now, my interest is to think of these technologies as tools. So if you think about music, it's going through a journey of more capable tools. If you go way back, if you were a musical genius, like a Bach or someone, you literally needed access to an orchestra to be able to realize that genius. Even if you could play multiple instruments yourself, you couldn't play them at the same time. So you actually needed, like, an orchestra. And then we got recorded music and you could record one instrument at a time,
Starting point is 00:03:22 so you got more and more independent. And then somewhere around the 80s, the synthesizer came along and meant that you didn't have to be able to play all the instruments yourself. You could sort of quote-unquote fake the drums using the synthesizer and the guitar and so forth. So I think there's been this progression of more powerful tools that enabled more and more creativity. And then somewhere in the 90s, the DAW, the digital audio workstation, came along. And being a Swede, we're proud of this, someone like Avicii came along. And what is interesting with Avicii is, he was not very proficient at any one instrument or as a singer. So in a previous world, he would not have been considered a very creative person because he couldn't realize that.
Starting point is 00:04:05 With access to this tool, the digital audio workstation, it turns out he was one of the most creative people we had, that we are very, very proud of. So for him, the digital audio workstation was, as Steve Jobs would say, a bicycle for the mind. It meant that he could get more productive and he could express his genius. And the big question with this next round of tools is the same. Is it amplifying creativity or is it replacing people? And I think it's amplifying creativity. It is giving more and more people the access to be creative. You need even less motor skills on a piano or something.
Starting point is 00:04:42 You need less technical skills than a digital audio workstation. So I think of them as tools. And I think there's this interesting question on what is AI music? I think people say AI music and they mean something that was prompted with, like, not too much of a prompt and not too much work, like 100% AI. But the truth is that much of the music being made today is a combination. I think many of the big artists are using AI for parts of their songs, parts of the track, with the drums, etc. So I think there's actually a scale between zero AI and 100% AI. And I think we're on this progression where it's actually going to be very difficult to say what is an AI song.
Starting point is 00:05:22 Does it have to be 100%, 90%, 70%, 50%? But the real question is, do you welcome this stuff on your platform? Let's say somebody does prompt 100% AI. Spotify could fill up with songs that are AI-prompted. It's very easy to create these songs and then upload them to the Internet. How do you feel about those? Do you want them? So there are two questions there. One is what is Spotify about?
Starting point is 00:05:45 We're a tool for creators. And if creators want to use AI to enhance their music, as long as we follow the legislation and copyright laws, we want them to be able to monetize their music and pay out. So for us, we are trying to support creators. And the music catalog has grown tremendously since we started from tens of millions of tracks to hundreds of millions of tracks, and I think it's going to keep expanding.
Starting point is 00:06:10 But what I think is important for us to figure out, that I think is our job and the rest of the music industry's, is, if you go back to the years of piracy, there was this technology called peer-to-peer and file sharing. You worked on that early on. Exactly. We actually incorporated that technology into Spotify, but before Spotify, the technology sort of preceded the business model.
Starting point is 00:06:33 It was great for consumers. They could now get all of this music for free, but it didn't work for creators. And I think we're in the same period of time now where the technology has preceded the business model. So I think the technology is great. I do think we need to find a way for the creators who participated in this to be reimbursed. So that's something that we are thinking about and the rest of the industry is thinking about. If we can find a business model, I think we could unlock a tremendous amount.
Starting point is 00:07:04 So there's a separate question, which is then, these models, the way they were trained, will that be considered legal or not, which is a legal question that is being decided over some time period, for example, in the US. These companies are now being sued. So I think that question will be decided by legislation, but let's assume that there is one of these models, whether it has to be retrained on other data or not. Is that an interesting tool for us if it was trained legally? Yes, if creators can participate in it.
Starting point is 00:07:33 So first of all, it's good to hear that you're already thinking about issues of compensating creators, musicians, because, you know, I write text in addition to podcasting, and I know that models have trained on my text. And presumably, I'm not going to see a dime on that. It's a little different, right, with music. But yeah, if you can channel different musicians, there should be, I think, some remuneration. But I'm going to just ask one last time on this point, then we're going to move on. So Meta, for instance, they have AI generators. The feeds have, I won't say filled, but there's lots of AI-generated images. They're engaging. Meta seems to be okay with this. It doesn't ban it. And now some of the top content on a Meta platform is Shrimp Jesus, which sort of
Starting point is 00:08:19 combines, like, two of people's great loves, which is God, Jesus, and seafood. And I've seen that. Yeah, it's massive. These types of images are massive on Meta. So from a Spotify perspective, if these songs generated by AI music generators become engaging, and let's say they follow the rules, is it, like, good for Spotify? Well, I think like this. If creators are using these technologies,
Starting point is 00:08:45 they are creating music in a legal way that we reimburse, and people listen to them and they are successful, we should let people listen to them. I think what is different, though, is I don't think it is our job to generate that music instead of the creators, right? That's a key difference. We are a platform for creators.
Starting point is 00:09:01 And then we can have a discussion on which tools they are allowed to use. Like, they could use the audio workstation, but not an LLM. Maybe that's not actually, we shouldn't decide that for them. But there is a question, should we generate all the music ourselves? And that's where we're saying, no. We're not going to generate that music. And other platforms maybe will, because it's cheap content, right? So that's the key difference: we decided what we want to be in this world,
Starting point is 00:09:22 and it's a platform for creators. Then there's a question of which tools they are allowed to have, which is partially a legal question, and partially up to the creators, I think. Okay. So there's a potential world where one of these tools seems to have violated copyright, and you might ban creators from uploading music that has used that tool. We already have detection systems for if it's a derivative work of something that already exists, so we have systems to take these down.
Starting point is 00:09:51 If you're creating something completely new, that isn't a derivative of anything, there isn't a copyright infringement, then the labels tell us. So that's the other question on, like, what are these models trained on? And we're not creating this model. So we're watching what happens there and we're going to follow the law. But I think from a high level, this should be a very exciting tool for creators, for musicians, for authors, for podcasters. I think if you look at something like NotebookLM, for example, it's actually created by a journalist and a writer as a tool. So I think my bet is that these are bicycles for the mind, but sort of bicycles for the mind on steroids.
Starting point is 00:10:32 Right. And when those shifts happen, there is always tension between the people who didn't use these tools, who feel like this is a little bit like cheating, and the people who are saying, like, no, I want to be creative too. And it's always a difficult transition period. It's just the story of technology. And by the way, we're going to get to NotebookLM in a bit. So I definitely want to hear your perspective on that.
Starting point is 00:10:53 But let me ask this one. So first of all, what you're describing is just sort of like, this is what happens in tech companies. You think you have something figured out, and then next thing you know, new innovation, you have to account for it. It's kind of what makes it exciting that it happens. And you already have addressed where this is going, which is, do we get to a place where, remember, you started talking about this saying we never could have anticipated that this is possible, and now it feels like magic. Prompt and you get a song out. And I called them great earlier. They're not great, but they're good enough. And this is literally the first generation of
Starting point is 00:11:31 this stuff. It's going to get better. And as you think deeper about it, do we go to a place where you can start to prompt music that is going to be better than any song that you might listen to that has been created for certain moods? For instance, let's say you're in, like, an introspective mood or in a loving mood or in an angry mood, and you're just able to prompt and create that song that perfectly touches the heart at that moment. And I started off talking about how this format is beloved, music is beloved, it touches the heart. And if AI can do that, does that become the future of music? So you've already said you don't want to play in it.
Starting point is 00:12:11 But is that something that you can discount from coming in? So I think two things. Music is used for many different things, right? And so you have, for example, music that you're using to study. I think it's a good example. The extreme version of that is people listen to white noise. So, like, would white noise be generated? It's actually already artificially generated.
Starting point is 00:12:32 It's one of the top podcast formats on Spotify. So there is a scale here, and I think you're right, for certain things, maybe you could create better white noise, maybe you could create better, you know, always varying ambient music for your studying. Maybe for gaming, maybe that music should automatically adjust to what's happening on the screen. So I think we're going to see lots of AI-generated music for those use cases. But there's another use case which I think is very important. A lot of people use music to build their identity, especially when you're a teenager. You go to a concert, you buy the
Starting point is 00:13:05 jacket from that concert. Why did you buy that jacket? It's like a pin. You're identifying with this band. You're building your own identity through this band. I don't think that will work with AI-generated music because there is no one behind it. So I think some music, and I'm sure this is happening already, I'm sure many publishers are generating music for coffee tables and so forth. That will probably happen. But I do think the human need for having someone to believe in, an actual artist that you care about, I don't think Taylor Swift will be replaced by an AI, not because the music couldn't sound similar, but because the whole point is Taylor Swift and belonging to something. So I think it's not a binary answer. Like is this going to
Starting point is 00:13:49 happen or not? No, it's not going to not happen. I think both will probably happen. You know, two years ago, I might have fully agreed with you that there's always going to be that need for the story and the human connection. And now I'm not so sure. Because... Because I do think that this stuff can be good enough. It's already proven that it's already exceeded some of our greatest expectations. And I think we would like to think that we want that connection with the human. But all right, let's go right into NotebookLM.
Starting point is 00:14:22 But I think one thing to say that I think is interesting is, what tends to happen in these worlds is that the thing that is scarce gets even more valuable. So one bet would be that true human connection gets more valuable than ever when a lot of what you talk to in the future may be LLMs. That would be my bet. I'm hoping that's the case because part of the business that I'm running is predicated on the idea that connecting to a human who can sort of dissect and break stuff down is valuable. So I'm hoping that is the case. But I'm also not as sure as I used to be. And I think it's wise to not be sure of anything right now, given the pace of progress. And I think that brings us right
Starting point is 00:15:02 into NotebookLM, which I was planning to leave for later, but you set it up perfectly. And it's this Google product that you can put notes in and then it will actually generate this podcast with two co-hosts that sound, like, ridiculously human. Yeah, they don't. They don't sound like robots. And in fact, people have sort of fed them scripts where they realize that they're actually not real people and they're AIs, and they just have this kind of breakdown, and it's insanely entertaining. But the bottom line is, and they're not quite where they need to be, they're still a little hokey, I think, and just kind of, like, if you listen for a minute you're blown away, if you listen for five minutes you start to cringe. But they also do a good enough job of breaking things
Starting point is 00:15:47 down where they can pass. And I've started to see them right now showing up in the second half of episodes, where people are like, we're going to do the episode and in the second half we're going to give you the AI to listen to. But what happens if they end up being the first half? And Spotify's made a big move into podcasts. What do you think about the rise of these AI podcast hosts? So I think NotebookLM is very impressive, and, you know, you could predict, given the evolution of voice quality of these things and understanding of a language model, that this would happen. So I'm not at all surprised, in a sense, that you can generate audio that is engaging to listen to, talk audio. But what I think was a great innovation of NotebookLM was that
Starting point is 00:16:32 people generated monologues, and what humans really respond to are dialogues. And in retrospect, it's pretty obvious, like, almost all podcasts are dialogues. Like, if I sat here for one hour, it's not that interesting. So I think the big hack was to go through a piece of material and present it as a dialogue and prompt it the right way. There was also obviously the internal Gemini model at Google that is probably very good, and the voice models got better. But I actually think what they found was product-market fit for the actual audio format.
Starting point is 00:17:02 And it turned out to be the podcast format, quite literally. It's pretty crazy. I mean, somebody on Threads tagged me and was like, the male voice sounds like you. And I listened and I was like, not just the same tone, but also the cadence and the type of questions. Like, does that mean that I'm just like the blend of all different? I like this, like, you know, kind of the unremarkable middle of this?
Starting point is 00:17:25 Or did they copy my voice? I'm hoping it's the second one. It'll be interesting to see if people either get tired of hearing the same two people talk about everything or the opposite. They get used to the same two people and would prefer to hear the same to build trust. I don't know. I think humans are very quick and prone to sort of anthropomorphize,
Starting point is 00:17:48 So you feel like you know these people because you heard them talk about so many things now. So I think it's very interesting. It's hard to predict where we'll go. As a platform, we view it the same way. Of course, people are uploading these podcasts to Spotify as well. And I don't know off the top of my head if anyone has super high engagement,
Starting point is 00:18:08 but certainly people are listening to them. So it's the same question. Does this turn into a tool for creative people who can write stories but don't want to have the podcast around it, or just have no one interviewing them, so they just do an interview around their own material? I think you're going to run into the same problem, where if you just ask it to talk about something, it's not going to be very good. You need good source material. So it's the same question. Is this a tool for creative people to get even more productive and creative, or is it the replacement of creative people? My bet is it's another tool. It's pretty interesting because it sort of broadens out the long tail.
Starting point is 00:18:40 And for those not familiar with the industry jargon, it's basically just that, like, a lot of listening is concentrated in a small number of shows. And then there's this great long tail, right? Like, if you think about a bar chart as it just sweeps out, and there's lots of, you know, seldom-listened-to shows. Yeah. And the thing about these podcast generators, NotebookLM in particular, is you can take it and create a podcast for something that's so niche that you would never have a show. Similar with AI code, right? You can start coding things. I think you spoke about this in your interview with Tomer Cohen on Building One, another LinkedIn Podcast Network show, where now you'll code
Starting point is 00:19:25 things that you would never code before because you can do it. And it might go the same way with podcasts where you can, for instance, before I was heading down to Menlo Park to interview Andrew Bosworth, I just dumped in all my source material and it read me, I created a podcast about, like, his current statements. There were, like, seven interviews that he and Zuck did before I showed up there, and I was able to get the summary. That podcast never would have actually made sense to produce, but for me it made sense, and maybe that's where this goes. Yeah, I love that framing. One useful framing, I think, of these techniques is a financial framing: the cost of something goes to zero. Like, the cost of writing code goes to zero, cost of doing a podcast goes to zero, cost of prediction goes to
Starting point is 00:20:09 zero, what happens? And usually what happens is the alternatives to that good, they get challenged, but the complements to that good, you know, you have the famous, like, what if the price of coffee goes to zero? Then the tea is going to be replaced, but sugar, as a complement, is going to explode. So I like that way of thinking about it. And I think what's going to happen is exactly what you're saying. We're going to have enormous amounts of content around niches where it didn't make sense to produce a podcast. So one way to think about it is just, like, the cost went to zero. So I do think that the catalog is going to explode.
Starting point is 00:20:48 And then what does that mean? Well, it probably means that the recommendation problem becomes even more important because now it's even harder to keep track of everything that is uploaded. I also think that if you have this, like, vast sea of the perfect sort of discussion around any topic, the recommendation problem becomes more valuable to solve the bigger the catalog is. But I also think you're going to see the same thing as we see in music. The superstars will actually also get bigger. This is what I find fascinating. People say, like, is Netflix winning or YouTube? Well, the truth is both. The tail is getting bigger, but the shows are
Starting point is 00:21:22 getting bigger. And they're saying, are the indies winning or Taylor Swift? Well, both. Indies are winning, but Taylor Swift is bigger than ever. I tend to see, like, both of these things happening at the same time, which is why I'm hesitant to say, like, that is going to happen, right, but not this. Yep. Okay, let's talk about AI recommendation. It's a big part of Spotify. And we're going to just start at the end for this conversation because your vision eventually is, so right now, like, we'll go into Spotify, there'll be some algorithmic recommendation, there'll be some stuff that we listen to. Your vision, if I have it right, is eventually you want Spotify to be sort of this ambient friend for us that knows the context of the situations we're in, maybe AR. We were just talking
Starting point is 00:22:04 about Orion glasses before we started recording, but maybe they know the context of where we are and can chime in and give us, you know, an example of the type of music that we might want to listen to. Is that right? Why would you be pursuing that? Well, I do think of, so when we started Spotify,
Starting point is 00:22:23 I was not part of founding Spotify. I joined in 2008, late 2008, 2009. Spotify was founded in 2006. It was pretty early on. And it's interesting that this was before machine learning became a thing, and so Spotify was quite focused on social features. For purposes of recommendation, we needed social features because that's how most people discover music, through a friend. So we wanted it to connect to people. And then AI came along, or what was
Starting point is 00:22:50 called machine learning back then, and we realized that through all the playlisting data we had, which is basically, one way to think about the playlisting data is almost as labeling. For the user, they are creating a set for themselves. For Spotify, they were saying, like, these tracks go well together. These tracks go well together. So we got a lot of labeled data, basically. And we said internally,
Starting point is 00:23:12 now some people have a musical friend that happens to know their taste and so forth, but most people don't. So now we can build this friend for everyone. That was the AI. But the interesting thing is that thing of, like, building a friend for everyone that can give music recommendations, like Discover Weekly, it was always an analogy. People did not think of Discover Weekly as a friend; they thought of it as a set, as a service and so forth.
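The playlists-as-labels idea described above can be sketched as a toy co-occurrence recommender. This is only an illustration under assumed data: the track IDs, the build_cooccurrence and recommend helpers, and the scoring rule are all hypothetical and are not a description of Spotify's actual system.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(playlists):
    """Count how often each pair of tracks appears in the same playlist."""
    cooc = defaultdict(Counter)
    for playlist in playlists:
        # Deduplicate so a track repeated within one playlist counts once.
        for a, b in combinations(sorted(set(playlist)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def recommend(cooc, listened, k=3):
    """Rank unheard tracks by total co-occurrence with the listened tracks."""
    scores = Counter()
    for track in listened:
        for other, count in cooc.get(track, {}).items():
            if other not in listened:
                scores[other] += count
    # Sort by score descending, then by track ID for a deterministic order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [track for track, _ in ranked][:k]


# Hypothetical playlists: each list is one user's "these tracks go well together" label.
playlists = [
    ["song_a", "song_b", "song_c"],
    ["song_a", "song_b", "song_d"],
    ["song_b", "song_d"],
    ["song_x", "song_y"],
]
cooc = build_cooccurrence(playlists)
print(recommend(cooc, {"song_a"}))
```

In this sketch, a listener who has only played song_a is pointed first to song_b, the track that most often shares a playlist with it. Real systems would need far more than raw counts (popularity correction, learned embeddings, freshness), which this example deliberately leaves out.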
Starting point is 00:23:36 I think what's happening now with AI is that the analogy is actually becoming reality. And so you can see us moving a little bit in that direction. You have the AI DJ that starts to give Spotify a voice that talks to you. And I think what is going to happen with these LLMs is, at least for some brands, you will start having literal relationships with them. And I would love it if it is the case that you think of Spotify as actually a friend, not an analogy anymore, but reality. This is a person that, this is a thing that knows me well.
Starting point is 00:24:06 This is a musical intelligence, a podcast intelligence, a book intelligence. And actually, like, hearing it, you know, tell me about new things and suggest things I'm interested in. So I think that is where we're moving. I think other brands are moving there as well. I think if you look at someone like Duolingo, they've actually only communicated through four characters all along. When you get a push notification, it's not from Duolingo, it's from Lily or Zari or someone. They really, they give me a hard time if I'm away for a couple hours. And that was also kind of an analogy, but now with AI, you can actually talk to these characters.
Starting point is 00:24:41 So I think this is a journey many companies are on. And it's interesting to play that out. It means a part of what was called branding before is like, what personality do you want your company to have? Not as an analogy, but literally, what personality should Spotify have? I think that's a fascinating time to work in tech, and it's something we're thinking a lot about. And I think that you might be underrating how much people view Discover Weekly as a friend. Now, for folks who don't use Spotify, Discover Weekly will basically take into account your listening and your preferences and give you a playlist of, what, 30 songs on a Monday morning.
Starting point is 00:25:15 And they're just new songs for you to discover. And people will be like, Discover Weekly really got me this week, or Discover Weekly is inflicting some pain on me this week. What happened? I thought we had a close relationship, and now you don't know me at all. And you also have, so you have this AI DJ. You can find it in the app. It's okay, I think. I'm curious, the feedback I've heard is people were excited about it initially and have gradually moved away from it. So now I'm sitting in front of, you know, the person running product at Spotify. What is actually happening with this AI DJ? Is the experience there and are people using it? Yeah. So in the numbers, they're not moving away from it. It's actually very successful. So my friends are just pretty snobby music listeners. Well, for the people that use it, it's actually their biggest set. It's bigger than their Discover Weekly usage. So it's quite a binary experience. I think for people who don't know what they want to listen to and just want to put something on,
Starting point is 00:26:12 it's working very, very well. What I would say, though, is when we launched the AI DJ, the big innovation there was that we managed to basically digitize the voice of a real person to make it sound very believable. But the things that it said around the music were, like, to some extent, heuristics and kind of repetitive after a while. So what we've done since then is we've invested quite a lot in, this is quite recent and is rolling out, LLMs that actually tell interesting stories about the music.
Starting point is 00:26:43 And we see very strong effects of this on the retention of the application. So whereas the thing used to say, here is this and this song from this and that, I think you'll like it, now it can say things like, this artist was just in Copenhagen or has played here and here last week. You're starting to get interesting stories. It's starting to feel more personal. The other thing that I think is missing that I hope we can do someday is, it can talk to you and you can talk back by skipping. But obviously in the age of, like, talking to machines, you would like to be able to just talk to it and say, like, no, this was not very good.
Starting point is 00:27:19 My Discover Weekly this week was not what I wanted, and give actual feedback. And that is technically very possible now with these LLMs. So that's what I'm hoping will happen. This should not be a one-way relationship, which Spotify has been for technical reasons. It should turn into a two-way relationship. Okay, I have questions about that coming up.
Starting point is 00:27:38 And to introduce that segment, I want to talk to you a little bit about how much we should allow the algorithms to dictate what our music experience and podcast experience is going to be versus how much should be dictated by us. How much agency should we have over our own choices?
Starting point is 00:27:59 Kyle Chayka, New Yorker reporter, recently wrote about how he's leaving Spotify. I'm just going to put the argument out there and hear what you think. And I'll just read it straight from this story. He goes, through Spotify, I can browse many decades of published music,
Starting point is 00:28:13 more or less instantly. I can freely sample the work of new musicians. It has become aggravatingly difficult to find what I want to listen to. With a recent product update, he says, it became clearer than ever what the app has been pushing me to do, listen to what it suggests, not choose my music on my own. What do you think about that argument? Well, I think this is an individual feedback, but I think generally you have very different types of users. So I'm going to get this person back on
Starting point is 00:28:44 Spotify, 100%. I think there's an interesting tradeoff here that is real. So people want less friction. They want to spend less time searching. You want to make things as easy as possible, right? But there is this end of the line where you sit there and you just receive. You're kind of force-fed and you don't give any signal back, maybe a few clicks and so forth. And that's something that we want to avoid. I think this is where the industry is going. It's going more towards distraction content and sort of just sitting and receiving. And it's a little bit of a dystopian end of the line there. So what is interesting with Spotify, which we are re-emphasizing,
Starting point is 00:29:24 is that it was actually a platform where you invested quite a lot in your own playlisting, right? And there is a trade-off here. You could have a vision that we should be so good at machine learning that you should never playlist again. That would be the goal, because then you've done the user a great service, supposedly. But then you also receive no signal and the user does no investment.
Starting point is 00:29:46 So we're actually re-emphasizing playlisting quite a lot, your own investment. And over the years, we've gone more towards machine learning and algorithms because it works. People listen more and they appreciate the service more. But we need to cater to everyone, including this reporter. So the Spotify user base is divided into many different kinds of people. You have the track listeners who only listen to playlists. You have the hardcore album listeners, who say, I just want to listen to an album the way the creator thought about it. I don't want the songs in between.
Starting point is 00:30:21 You have the artist radio listeners, who only listen to one type of artist. And it's actually a big challenge to build a service that serves everyone when people are very different. So we try our best to make sure that the sort of music aficionados who want their library to be album, album, album can have their service. But then you have the other people who are like, I just want my Daily Mix to play in my ear, I just want to collect tracks. They also need to be successful. So we're trying to build and cater for both. You can never please everyone 100%, but we're trying to be statistical about it to make sure that it is vastly better for the majority of people. But our goal is to cater to everyone. And I do think there's a real point around going
Starting point is 00:31:12 to zero user investment seems good in the short term, but I don't think it's good in the long term because you actually lose signal from that user. And at the end, I think they feel less participatory in the experience. Even if the engagement looks high, if you've done no feedback, I don't know how much you feel this is actually your service. Definitely. And look, I'll confirm that Spotify does listen to user feedback. I sent a tweet out a couple of years ago talking about how like sometimes I'm baffled
Starting point is 00:31:39 by the Spotify product decisions. And I mean, maybe it was because I was a reporter, but someone from your team reached out. And I talked about how I wanted to see recently played. Like oftentimes I'll be listening to something, and then I'll go away from it, and I can't find it in the app. And then a couple months later, there's a recently played button in the app.
Starting point is 00:31:57 There are some great updates coming for you as well on that topic, because this is a big user need. Maybe it takes a little bit longer than we want. But obviously, our goal is to listen to user feedback. But sometimes we get really, completely opposing user feedback. That's the tricky thing. Who do you listen to the most, the people who want this desperately or the people who hate this desperately? And there's a lot of both types of feedback. So product development at this scale is sort of a statistical experience. But you still have to have a bit of an opinion. If you only treat
Starting point is 00:32:27 it as statistics, the application is going to be very weird at the end of the day, right? So you have to combine some sort of vision and conviction, but you still have to be very data driven. I think an interesting example of user investment and AI that we launched recently is something called AI Playlisting. So this is, I think, a good example of like the first time you can talk to Spotify. So the AI DJ talks to you and it's getting better, but it doesn't listen. It listens to clicks maybe. But with AI Playlisting, we built this experience where you can prompt what is an LLM with
Starting point is 00:33:04 what kind of playlist you want. So we have an LLM, and LLMs have a set of world knowledge about music, but then we have the music catalogue and we have your listening history. So this is an LLM that understands your particular taste. And you can ask it for a playlist with, you know, big drops and EDM for driving fast at night or something. And then it will try to do that. And then you can say like, no, a bit more upbeat, or not that artist, and so forth. And this, I think, is a good mix of using AI, but not to force-feed you stuff. It's actually a very high signal.
Starting point is 00:33:36 You are literally telling us what you want. And then when we say, here it is, you say, that one yes, no, no, yes. And then you can re-prompt. So it's back to, I think it should be a two-way conversation. And I think the first wave of machine learning allowed us to do the one-way push.
Starting point is 00:33:51 The next wave, generative AI, allows us to actually listen to you, even in clear text. So communicating with Spotify just through skip buttons, it's a pretty narrow signal. So it's kind of hard for us to understand, like, when you skip it, was it because you hated it or because you liked it,
Starting point is 00:34:06 but it was too many times? Now you can actually say, like, I really don't like this, like, remove it. So I was DMing with Kyle last night, like, hey, I'm going to meet with Gustav. What should I ask him? And one of the things he said is, should Spotify users be able to tweak the recommendations?
Starting point is 00:34:21 And your answer here is a resounding yes. Absolutely, absolutely. We are working on these things, both the obvious things where you can say like, I didn't like this particular thing. But I think the free text element is very interesting. If you could talk to it, it would probably learn much more, but you would probably also get more trust. Definitely.
Starting point is 00:34:39 Let me ask you one broader question about this. Because I won't stick on Kyle's stuff for the entire conversation, but I thought it was really interesting. And he wrote a book called Filterworld. He's been on the show. I'll link it in the show notes. The main argument is that our world mediated by algorithms has become too bland. And effectively, that algorithms have flattened out, you know,
Starting point is 00:35:03 what used to be a more vibrant experience with things like music. Do you see that at all? I think this is a really interesting argument. There are two ways I want to address that. One is for Spotify specifically. We've seen the feedback that people feel like it's great for the kind of stuff I already listen to, but I feel like I'm in a bubble. I'm getting more of the same.
Starting point is 00:35:26 I'm not getting new stuff. This is sort of a Spotify specific challenge because most of the time your phone is in the pocket and you're listening. And when you're listening, you're listening to a session. Let's say you're listening to indie folk or something. Then it's quite easy for us to say, here's another indie folk song. And you're going to say, oh, that's a good recommendation. But if we start playing Metallica there, you're going to be like, what is this?
Starting point is 00:35:48 So most of the recommendation sort of inventory we have is kind of constrained naturally to what you're already listening to, because we can't put in very random things. You would say this is a bad recommendation. So this is a challenge for us when we want to show you something completely new. The favorite example is, I love reggaeton, but you wouldn't have seen that from my listening history. How do we solve that problem? So we started investing about two years ago in other types of foreground recommendation. Sort of like the feeds that you see on social media, but you can literally say like, okay, I'm bored. I want to go wide. Then you can go into
Starting point is 00:36:26 these foreground feeds of music where you can swipe through many tracks and they're very efficient. The hit rate is going to be low because now we're in a territory where the whole point is we don't know that you like this. So our hit rate is going to be low. Then I think you need a very efficient UI to evaluate lots of content, right? Because the hit rate may be one in 20. You're not going to listen to 20 songs. That's over an hour of music. You need to go quick. So we try to solve that problem for when Alex is bored and he wants to branch out. As soon as we see that signal, we didn't have tools for that before. So we built that. So that's part of the answer. Spotify being an audio service made it a bit harder to go explore. So now we have these
Starting point is 00:37:05 foreground feeds. We have music videos, not in the US yet, but in much of the rest of the world, we have music videos that are very helpful when you're evaluating new music. But the more philosophical part of this answer is, did the algorithms sort of flatten out? Because they are to some extent trying to find statistical patterns and averages. And I think if you look at recommendation technology, I don't think this is widely known yet, but these deep learning-based systems, they had flattened out in terms of if you added more user data or more parameters,
Starting point is 00:37:35 they did not get better the way the LLMs do. There were no scaling laws. It's just like, it is what it is, and you could move it 0.2%. There's something that has happened there recently, which is called generative recommendations, where you actually use a sort of large language model instead of these old deep learning models.
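A minimal sketch of the generative-recommendation idea, under stated assumptions: each user action is treated as a token, and a model predicts the next action from the sequence so far. The bigram counter below is only a toy stand-in for the large model mentioned in the conversation, and the action names and sessions are hypothetical; nothing here describes Spotify's actual system.

```python
from collections import Counter, defaultdict

# Hypothetical action tokens; a real system would use a far richer vocabulary.
sessions = [
    ["play:indie_folk", "save:indie_folk", "play:indie_folk", "skip:metal"],
    ["play:indie_folk", "save:indie_folk", "play:reggaeton"],
    ["play:reggaeton", "save:reggaeton", "play:reggaeton"],
]


def train_bigram(sequences):
    """Count how often each action token follows each other action token."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, last_action):
    """Return the most frequent follow-up action, or None if the action is unseen."""
    followers = counts.get(last_action)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


model = train_bigram(sessions)
print(predict_next(model, "play:indie_folk"))
```

The same framing carries over to larger sequence models: the training data is still ordered action tokens per user, only the predictor changes.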
Starting point is 00:37:53 And you basically think of user actions as a language. So you have a sequence for a user. They click this, they listen to that, click this, they listen to that. And then if you turn that into tokens, just as you can turn a language into tokens, then just as you can try to predict the missing word in a sentence, you can try to predict the missing action in a sequence. And it turns out that these generative recommendations do scale with more user data and more parameters, just like the LLMs. So this is a long-winded way of saying, I think he's right that the recommendations did flatten out. It's also true that people are changing recommendation
Starting point is 00:38:29 stacks, and it now is unclear why they couldn't continuously get better. So I'm hoping that the recommendations do get more intelligent, because now it's not just a statistical average. They can look at your specific user history going years back, and they could potentially understand that it's actually, you know, Christmas again, and last year at Christmas you did this. I'm hoping it gets more intelligent. And one last question about recommendations, or maybe I have two, but one important one that comes from Ranjan Roy, who's on the Friday show with us. He would like there to be a parent mode on Spotify where if you have kids, you can be like, I'm on child mode, and then recommend kid music, and then parent mode, you know, don't blur my recommendations.
Starting point is 00:39:13 What do you think about that? So we have a bunch of different solutions for this. Obviously, there's a family plan, so hopefully your kid can have their own account and then it doesn't. That costs more. The recommendation, exactly. What are you going to do for your three-year-old? Exactly. The other thing is you can create a playlist for your kid,
Starting point is 00:39:31 and then if you click the settings, you can say, do not include them in my recommendations. And then it actually doesn't destroy your recommendations at all. So there are those solutions. We're also trying to understand that all of this is kids' music. So while this is part of your taste profile, we should not play this in your other sets, because this is probably something you're doing
Starting point is 00:39:53 for sort of a use case. So you probably want a kid's music playlist in there, but you don't want that music to affect your other sets. There's an algorithmic component. There's a subscription plan component, and then it's back to more user control. You can actually already say that this playlist should not be considered my taste. So we're going to build more of those controls.
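The "do not include this playlist in my taste" control described above can be pictured with a small sketch. This is an illustration under assumptions, not Spotify's implementation: the `Play` record, the genre labels, and the `taste_profile` function are all hypothetical, and the profile is reduced to simple genre shares.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Play:
    track: str
    genre: str
    playlist: str  # hypothetical: the playlist this play came from


def taste_profile(plays, excluded_playlists):
    """Genre shares computed only from plays outside the excluded playlists."""
    kept = [p for p in plays if p.playlist not in excluded_playlists]
    counts = Counter(p.genre for p in kept)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {genre: n / total for genre, n in counts.items()}


plays = [
    Play("Song A", "indie_folk", "My Mix"),
    Play("Song B", "indie_folk", "My Mix"),
    Play("Song C", "kids", "Kid Songs"),
    Play("Song D", "kids", "Kid Songs"),
    Play("Song E", "reggaeton", "My Mix"),
    Play("Song F", "kids", "Kid Songs"),
]

# With the kids playlist excluded, kids music no longer shapes the profile.
profile = taste_profile(plays, excluded_playlists={"Kid Songs"})
print(profile)
```

Without the exclusion, half of this hypothetical listener's profile would be kids music; with it, the plays still happen but carry no weight in the profile.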
Starting point is 00:40:15 Okay. Ranjan will be happy to hear that. Yeah. Okay, really last question about recommendations. Then we're going to go into podcasts and some other formats. I don't know if you have seen this YouTuber. His name is Fantano. He did this thing about the Shaboozey song being the song of the summer,
Starting point is 00:40:33 explaining why. And he made an observation there that was interesting to me, talking about how we used to hear music on the radio often. And the music that was played there was music that would often be played when we're with other people, with friends, having a good time. And it led to more dance songs, rock anthems and stuff like this. And today we're mostly accessing music via streaming platforms. And he says those are much more individualized recommendations, which has kind of shifted the way that music is made and even the hits in music.
Starting point is 00:41:12 What do you think about that argument? So there is a philosophical question there, which has been researched a few times, which is, do you have an innate taste in your brain, and our job is to search for that and find it? Or does what we play actually affect what you like? And there are all these experiments in colleges where, you know, you play different songs to different groups and then you see what they like. And it seems like it's a bit of both. You have some sort of innate taste, but you're also affected by what you hear. So to this argument, the radio can change your taste. So I think there's truth to that argument. What I think is interesting about music listening is that
Starting point is 00:41:51 when we survey users and we ask them, what percentage of your listening is with others? It's a huge percentage, a double-digit percentage. So music is actually a very social activity still. And in some cases, we see this. We have this feature called Jam that is taking off like a rocket for us. It's doing very well.
Starting point is 00:42:13 And Jam is essentially, we can detect when two phones are close to each other. It's just like, hey, do you want to join Alex's Jam? And now we have a joint queue. So at a party, the way you party right now with Spotify is you don't go and like interrupt. You just bring up your phone, you join the queue, and then you can queue things up, right? And so we have a lot of joint listening, and people are listening, like I said, I don't want to say the exact percentage, but it's a double-digit percentage of listening happening in groups.
Starting point is 00:42:40 It just looks like individual listening to us. So I think it's actually happening more than maybe people think. It's not 100% individual listening, but because we don't see them as group listening, we're still treating them as individual listeners. So now that we're getting more data on what is good group music, that becomes a different category. So I think the radio use case is happening. You're hearing songs at parties and with others and when you're riding in the car and so forth. It just looks to these services as lonely listening, but it's actually quite social. Right. Okay, let's take a quick break and come back to talk about
Starting point is 00:43:18 podcasts, audiobooks, and see how many random questions I can get to before our time is out. We'll be back right after this. Hey, everyone, let me tell you about The Hustle Daily Show, a podcast filled with business, tech news, and original stories to keep you in the loop on what's trending. More than 2 million professionals read The Hustle's daily email for its irreverent and informative takes on business and tech news. Now, they have a daily podcast called The Hustle Daily Show, where their team of writers break down the biggest business headlines in 15 minutes or less, and
Starting point is 00:43:48 explain why you should care about them. So, search for The Hustle Daily Show in your favorite podcast app, like the one you're using right now. We're back here on Big Technology Podcast with Gustav Söderström. He's the chief product officer, chief technology officer, and co-president of Spotify. So Spotify is investing heavily in podcasts. This has been going on for a long time, at first largely through an originals strategy and now less so. Also, audiobooks. You can find my book, Always Day One, on Spotify if you're a premium listener, which I'm happy about because more people can listen to the book. What has gone into the decision to just bring all these formats together in one app?
Starting point is 00:44:30 And, I mean, are they good businesses for you, podcasts and audiobooks? Yes, if we start with the first one, how do we come to this decision? What happened is that we saw internally actually at Spotify, a lot of our developers, sort of hacking Spotify into or hacking podcasts using RSS into the Spotify experience. And we saw it again and again at Hack Weeks. And first we thought like maybe it's a niche random need. We saw it again and again. And so then we just, it's like user feedback, user research.
Starting point is 00:45:04 You know, Spotify is still like many thousands of employees. So it's not a very representative sample of society, but it is some sample of society. So if you see the same user need many times, you should take it seriously. So we started looking at that. And then we looked at podcasts, which we saw had a lot of potential and were growing, but we didn't think anyone was doing something very interesting with them. So we decided to then just approach it because we saw the user need internally. We saw the market growing, we sized it, and then we saw that there was no one really investing in it.
Starting point is 00:45:33 Apple hadn't invested in it, and they had like 98% of the market. So that's how we came to it. And then the question is... Yeah, that Apple podcast app needs work. Okay, but sorry, go ahead. But we were grateful for that. So then the question is why in the same application, why not as a separate application? And that's, there are two views of that.
Starting point is 00:45:56 One is it's a strategic decision. The biggest barrier to something new right now, unfortunately isn't necessarily the quality of the application, it's the user acquisition cost. Distribution is everything. Distribution is still everything. And actually, at the beginning of the iPhone era, there was a lot of organic distribution. people went to the app store every day. It's like no one goes there anymore.
Starting point is 00:46:19 So you almost have to pay for every new user. So user acquisition cost is probably the biggest inhibitor to most business plans. So if we built a separate app, we would have to reacquire our own users again, and that would make it very expensive. And we have seen all of these big, big companies, the American tech companies,
Starting point is 00:46:35 launching app after app, and basically nothing worked. Then we look at China, which has a different strategy, the super apps, where they double down on their own distribution. So you can think of it like podcasts pre-installed. So that was the strategic angle for why this made sense. But I actually have a user angle on this where I think it is the better experience.
Starting point is 00:46:56 So I think in 2024, the user should not adapt the software to the content. I think in 2024, the software should adapt to the content. So if you play a piece of music, there should be skip buttons. If you play a podcast, it's not rocket science to change the skip buttons to a 15-second scrub. And if you play an audiobook, then change into chapters. Like, come on, it's 2024. Why do you have to switch apps for that?
Starting point is 00:47:21 Right. Right. So we actually both believe that it was strategically the best for us because then we could double down on our own distribution. But we also think this long term is the right user experience. It is the easiest for the user. Now we have these beautiful connections between the audiobook and the author being interviewed in a podcast on the same thing where it's seamless. Instead of like, now you should switch the app and go somewhere else. So that's the reason that we do it in the same application.
Starting point is 00:47:45 And talk a little bit about discoverability, because that's the biggest issue for podcasts. I mean, as a company that's an expert in recommendations, which we've spent like most of this show talking about, that should be something that you get done pretty well. But for instance, if I'm listening to the tech shows, and I'm not listening to Big Technology Podcast, I probably want to see that there's a show called Big Technology Podcast out there. And from what I've heard, both from product people and from podcast producers, discoverability has been the biggest issue, probably because there's a huge investment that goes into listening to even that first five minutes of a show. I mean, that's like two minutes longer than your average song, to try out a new show. And I mean, I actually changed my show so that we could do our really, you know, information-rich intro, which you just experienced,
Starting point is 00:48:37 and then take a break and come back in. Because if people are going to try it out, I want them to know what they're getting, versus the typical long-winded, well, here we are today, and yeah, it's been beautiful. So I'm just curious what you think about this discoverability thing. So I think you're completely right. Short form formats are easier because the discovery is the consumption. So like a TikTok on TikTok, it's not like there's a recommendation for this TikTok. When you watched it, you consumed it. Music is almost the same. It's three minutes. It's not quite, you know, but it's almost like if you discover it, you also consume it. Podcasts are different. You kind of need a trailer, you know, because
Starting point is 00:49:15 it could be an hour of investment. And books are actually even harder. It could be 15 hours of investment. So I think a lot of the challenge is to create a good representation, a good short form representation, of this long form content to understand if you should invest your time. And so this is something that we are investing quite a lot in. The podcast world didn't have that for the longest time, right?
Starting point is 00:49:40 And I think this is also part of the reason why, if you look at the old sort of Apple podcast world, It's a few shows that have a lot of followers sort of forever, but it's really hard to break in for a new show. I think this is changing now with these short form previews that are happening on TikTok, on YouTube, on Spotify, where you can quickly go through and understand what a show is about. I think video is actually helping.
Starting point is 00:50:04 It's the same in music. We see that music video is very important in the discovery moment. And actually, a new release with a music video, in an A/B test, does much better than a new release without a music video in terms of downstream. And I think it's the same for podcasts. If you're quickly saying, like, I'm interested in technology podcasts, it helps a lot to have video for those podcasts.
Starting point is 00:50:28 And so this is why we built these foreground feeds, where you can go through a lot of material within your interest with lower friction. So we're investing quite a lot in sort of the quote-unquote preview problem. And it's the same for books. To get a good understanding of a book quickly is hard. You can use LLMs for that to try to summarize them. You can use the author's own summary. So this is something we're investing quite a lot in.
Starting point is 00:50:53 Okay, so you have been introducing video for podcasts. I know this one is going to be on video. I'm hoping to do a lot more video podcasts through Spotify. Are you going to do a short form video feed, TikTok-like? Well, we already have that for the intro. So as a creator, you can upload your video podcast. You can also choose, this is the short form representation I want in sort of Discover feeds,
Starting point is 00:51:15 right? So Spotify has Discover feeds for music, for podcasts, and for books. But it's important to know, they look like TikTok, but on TikTok or Instagram, the item is the consumption itself, right? And they are measuring it on how long you stay in the feed. It looks the same, but we're doing the complete opposite: how often you leave the feed, how often you save it. So we're trying to get to save for later. So we're ranking them on how many things you save, not how long you stay, which drives a very different recommendation, right? So we're trying to get people to save your episode into the library to listen to the full thing. So that's the optimization. We actually
Starting point is 00:51:52 don't want you to stay in that feed. We want you to quickly get through, save a bunch of stuff, so your library is full of interesting podcasts. Okay, we have a couple minutes left. I don't want to leave without asking this question, and this will be the last one I have, although sometimes I say that and end up asking a few more. But let me just ask you this one, and hopefully we'll be able to get out of here after this one. TikTok, it's such a culture setter, and there are moments, I think, where dances and songs will go viral on TikTok and quickly become the number one song in the world. So just talk a little bit before we leave about the influence of TikTok on driving culture, driving listening on Spotify. What's the
Starting point is 00:52:31 magnitude of it? What do you see on your end? I mean, TikTok is huge, and Instagram Reels is huge. A lot of culture happens on these platforms. So we've chosen as a strategy to invest in these platforms. On TikTok, actually, you can now save the track straight to Spotify. So for us, this is a huge discovery funnel. We also have editorial playlists called like TikTok viral hits and so forth, to try to capture what is happening on those platforms. So one way to think about it is, for us, this is like top of funnel. Things happen there. If we're well integrated, we capture the downstream listening from that. So we're trying to integrate into all of the big social platforms because a lot of culture happens there.
Starting point is 00:53:17 But we also want culture to be able to happen on Spotify. So this is why we have our own editorial playlists. Specifically in music, we do drive a lot of music culture. So both when it comes to being able to save from these platforms, but also being able to share to these platforms and be able to talk about Spotify music on these platforms, we've invested a huge amount of engineering in making sure that it's very easy to message a Spotify link in WhatsApp or in Messenger or in something like this. We want all the conversations about music to be Spotify links going back and forth.
Starting point is 00:53:56 Yeah, it's fascinating. It could be a little disconcerting sometimes to hear like a full version of a song that you've heard on TikTok a bunch. Like that Apple song, I didn't realize there was a beginning or an end to it. I just thought it was that Apple dance part of it. But anyway, it says a lot about me, I guess. So we're here at the end of the show. How often do people get to the end of podcasts on Spotify? They do get, I have a number.
Starting point is 00:54:19 I don't know if I can share it, but you can see a curve like this: it starts at 100, it goes down. It differs between creators, but you have fall-off in the beginning. But then after a certain point, most people just stick to the end. Then there's a really big drop-off at like, you know, 90-something percent. It's usually end music or something. And does the end music hurt with discoverability? Because is Spotify saying, okay, only, you know,
Starting point is 00:54:40 we had 60-something percent up until like the last minute, but then they go down to 30 before they complete? No, we shouldn't end abruptly, or should we? No, we control for that. So we understand that this is the end credits and people move on. So we can take our time coming in for a smooth... Yeah, you can have a, you know, good exit song. It's fine. Not that many people will listen through, I can tell you that. Yes. But it's not going to hurt your recommendation score. Well, for those who've listened up until this point, we thank you for sticking through. Gustav, great to speak with you. Thank you for answering all these questions. Such a pleasure being on the podcast. Really appreciate it.
Starting point is 00:55:15 Great having you. All right, let's hit that exit music. Everybody, thank you so much for listening to this episode of Big Technology Podcast. Great being here at Four World Trade, Spotify's headquarters, and getting a chance to speak with Gustav. We're coming in for a nice, slow, and lovely landing as you get on with the rest of your day. And we'll see you next time on Big Technology Podcast.
