This Week in Startups - The Future of Sound: Udio’s Vision for AI-Generated Music | E2016
Episode Date: September 27, 2024

This Week in Startups is brought to you by…
.Tech Domains. Don’t miss our “Jam with JCal” contest! To apply and get more details go to https://Jamwithjcal.tech brought to you by .tech domains.
LinkedIn Ads. To redeem a $100 LinkedIn ad credit and launch your first campaign, go to https://www.linkedin.com/thisweekinstartups
Brave. If you’re building AI and search-based applications, train your models with the Brave Search API. Get started for free at https://brave.com/jason

* Today’s show: Udio’s David Ding joins Alex to discuss the inception of Udio (1:32), advancements in AI music creation (8:47), and the evolution of Udio’s AI model (10:32). Plus, David demos Udio’s capabilities live (32:47)!

* Timestamps:
(0:00) Udio’s David Ding joins Alex
(1:32) David's journey and the inception of Udio
(4:26) AI music generation and user control over music elements
(7:52) .Tech Domains - Apply for the Jam Session with JCal contest today at https://jamwithjcal.tech
(8:47) Advancements in AI music creation and data annotation
(10:32) Evolution of Udio's AI models and early versions
(14:59) Udio's target audience and the future of AI in music
(20:06) Udio's potential DAW integration and music production terms
(21:52) LinkedIn Ads - Get a $100 LinkedIn ad credit at https://www.linkedin.com/thisweekinstartups
(23:17) Udio's funding and business model
(27:11) Financial discipline and GPU cost efficiency at Udio
(31:28) Brave Search API - Get started for free at https://www.brave.com/jason
(32:47) GPU-based compute challenges and a live Udio demo
(48:03) Udio's user interface, engagement, and community insights
(49:57) Udio's growth, virality, and competitive stance
(53:27) Udio's model quality and expansion roadmap

* Subscribe to the TWiST500 newsletter: https://ticker.thisweekinstartups.com
Check out the TWIST500: https://twist500.com
Subscribe to This Week in Startups on Apple: https://rb.gy/v19fcp

* Check out Udio: https://www.udio.com

* Follow David:
X: https://x.com/daviddingai
LinkedIn: https://www.linkedin.com/in/david-fengning-ding-053b1282

* Follow Alex:
X: https://x.com/alex
LinkedIn: https://www.linkedin.com/in/alexwilhelm

* Thank you to our partners:
(7:52) .Tech Domains - Apply for the Jam Session with JCal contest today at https://jamwithjcal.tech
(21:52) LinkedIn Ads - Get a $100 LinkedIn ad credit at https://www.linkedin.com/thisweekinstartups
(31:28) Brave Search API - Get started for free at https://www.brave.com/jason

* Great TWIST interviews: Will Guidara, Eoghan McCabe, Steve Huffman, Brian Chesky, Bob Moesta, Aaron Levie, Sophia Amoruso, Reid Hoffman, Frank Slootman, Billy McFarland

* Check out Jason’s suite of newsletters: https://substack.com/@calacanis

* Follow TWiST:
Twitter: https://twitter.com/TWiStartups
YouTube: https://www.youtube.com/thisweekin
Instagram: https://www.instagram.com/thisweekinstartups
TikTok: https://www.tiktok.com/@thisweekinstartups
Substack: https://twistartups.substack.com

* Subscribe to the Founder University Podcast: https://www.youtube.com/@founderuniversity1916
Transcript
Hey, everybody. Welcome back to TWiST. This is Alex, and we have a special interview for you today.
I am an enormous fan of music. You may not know it, but I grew up playing classical and jazz trumpet throughout my youth.
And music has remained an absolute huge passion of mine throughout, really, my entire life.
So when the AI revolution of the last couple years came to the world of music, I was incredibly curious.
Two companies have really caught our eye here on This Week in Startups: Udio and, of course, its competitor, Suno.
Today we have David Ding, the co-founder and CEO of Udio, on the show to tell us what it's for,
who's paying for it, and where AI-based music creation is going.
This Week in Startups is brought to you by .tech domains.
Don't miss our Jam with JCal contest.
To apply and get more details, go to jamwithjcal.tech.
Brought to you by .tech domains.
LinkedIn ads.
To redeem a $100 LinkedIn ad credit and launch your first campaign,
go to LinkedIn.com slash this week in startups.
And Brave.
If you're building AI and search-based applications,
train your models with the Brave Search API.
Get started for free at brave.com slash Jason.
We're going to talk about AI and music.
David, hi. How are you?
And welcome to the show.
Hi. Hello. I'm really excited to be here.
So I want to start with some background stuff
because I know you were at DeepMind for a while
and Udio is a relatively young company.
I think it was founded in 2023.
So just give us, what was the moment in time in which you said,
I have to leave where I am and go found this company because why?
Yeah, sure.
So, yeah, as you said, Udio was founded last year, November of 2023.
And before that, I was a researcher at DeepMind.
And throughout my entire childhood, I've always been interested in two things primarily.
So one is technology, and the other is music.
So as a kid, I always wanted to build computers that can simulate the way a human brain works.
You know, like maybe you can wire the neurons together and then try to model the brain.
And then it turns out that when we went to college, like, this thing was starting to pick up traction.
My first year of college, I took a machine learning course so that I can like participate in this field.
And my other passion growing up was music.
So I played classical piano and played it for at least like 10 years, like growing up before going to college.
And I always thought it would be really, really cool if a computer technology could compose and make music.
And so then fast forward to when I was working at DeepMind, generative modeling really took off.
You see technologies like ChatGPT or DALL-E or Midjourney emerge
that really revolutionize the way that computers can make art.
And so at that point in time, I was like,
hmm, what happens if we apply the same technology that I've been learning how to build
and apply to music to have a machine that can help people create music,
ideate, and create songs?
And so this is why we left DeepMind to create a company that produces a product
to help artists and songwriters
turn their ideas into reality.
So I want to go back to the point about LLMs to image generation to music generation
because, I mean, my day job is writing.
So that's kind of what I know the best.
And so to me, the idea of a large language model taking in a lot of data and then
helping kind of do next word prediction, admittedly, it's more complicated than that.
But I can really understand it.
I kind of get how we can use LLMs to do image generation.
But when we expand the work done to music, to me,
I feel like I'm missing a link in how the technology actually functions.
So without spilling any secret sauce, if you will,
how do the AI models that I best understand end up creating tunes?
Because it just seems to be like a real stretch of what was possible,
but clearly it works.
Yeah, so similarly to how these large models learn how to produce images and text,
our models learn how to produce music by listening to lots of examples of music.
So the model listens to music and tries to synthesize the common elements across music.
So like elements of music theory, like which chords follow which other chords or how rhythm interacts with the overall structure of the song.
As well as other elements: what does it mean to be country music versus rock music,
or how does a guitar string vibrate
and how does the sound of a piano echo and reverberate around the room?
And finally, how does this all interact with the recording technology?
How do you take this sound and turn it into stereo?
And so this model, because it's trained on the final output music,
it learns how to do everything.
So from like the very fundamental music theory level,
all the way to how the sound is really
recorded by the microphone.
Okay, so prepping for our chat today, I was playing around with Udio.
By the way, I am now your most recent paying customer.
Shout-out.
And I decided to throw a curveball at your software.
And I said, okay, look, I wanted to do a progressive metal song that sounds a little bit like
Periphery, a band that I love.
But I'm like, look, let's do it in six-eight time.
Now, you're a classically trained pianist.
I'm a classically trained trumpet player.
You and I know that when it comes to time signatures in the world of music,
6-8 is not very complicated, right?
We're not doing like 11-8 or something crazy.
You count in six and you're fine.
So this is pretty simple.
And it kind of did it, but not perfectly.
And I know this technology is still improving,
so I'm not trying to be negative.
But is this going in a direction in which I could tell a service like Udio,
Like, I want to do a song, first half in 6-8,
second half in 7-8,
and I want to do a chord change from C to C major.
and then, like, how specific can we get?
And then is that underpinned based on a very granular understanding of how music was put together,
or does the software better understand, like, broader chunks of it versus, like, down to the individual note level?
Yeah, so this is definitely a direction that we do want to support, giving users and musicians more ways of controlling the model,
like time signature, key, tempo, BPM, or instrumentation,
or even like dynamic levels, like start quiet, start swelling, and then like, and then die down again.
So there's something that we definitely want to support.
Time signature is something that we do not support at the moment because in our music, you know, when we're training our models,
we did not teach it the concept of a time signature when we were annotating the data.
But key signature, the key of the song, is something that we do support.
And this is something that we did not support when we launched the model,
version 1, back in April, but it's something that we added in July,
because we recognize that users want to be able to control the key.
And so then we annotated our dataset to contain, oh, this is B major, this is C-sharp minor.
So that now when you go to Udio and you specify a key, A minor, it will produce a song in that key.
Well, A minor is now the most famous key in, I think, all of music, thanks to Mr. Kendrick Lamar.
If you don't get that reference,
congrats for being offline for the last three months of music history.
Wow, this jam with J-Cal contest has been a blast.
So far, I've had the opportunity to meet with four great founders from companies like Corpod,
Ulama, Uptrans AI, and the ROMAP, all because they all use dot-tech domains.
And we have room for one more.
Do you want to come on the pod and tell me what you're building?
Well, you only need two things to enter.
You got to be a founder with under $2 million in funding,
and you've got to have one of those awesome .tech
domains. So head to jamwithjcal.tech and tell me what you're building. And if you win,
I will invite you onto This Week in Startups and you'll get to share your vision with me and the
world. I'm working with .tech domains because killer startups use them. You know, 1x.tech, rabbit.tech,
so many others. And guess what? We use it too. That's right. .tech powers our Founder Friday
program. So tell me about your awesome .tech domain and startup. Apply for the Jam with JCal
contest today at jamwithjcal.tech.
We're picking the final winner soon.
Okay, so it sounds like what I did there was I asked Udio to do something that it doesn't do quite yet,
which is probably why it got a little bit funky.
But you said something interesting there, which is data annotation.
And that I think is the thing that I was missing because it sounds like you guys
have humans label the data and, like, help the model understand, like, this is a rock drumbeat in 4-4.
So does that create like a flag that then the software or the model can kind of go back to
and like point to and understand?
Yes, so by annotating the data in a training data set,
you teach the model how to associate certain descriptive words with musical elements.
So then it sees 3-4, the time signature, and it hears a song that's in 3-4.
And it knows, oh, 3-4, it means like you have like three beats.
And then like the first beat is emphasized.
When a user then asks the model to create 3-4 music, it can then like,
take its understanding of 3-4 and apply it to the composition of the song.
It's kind of like when a human learns,
if you never teach a human, oh, this is 3-4,
you can't ask the human, hey, create me 3-4 music.
Even if it can produce 3-4 music,
it just doesn't know what the words 3-4 actually mean.
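The annotation idea described here can be illustrated with a small sketch. This is not Udio's pipeline: the `TrackAnnotation` fields, the caption format, and the `label_is_learnable` helper are all hypothetical, chosen only to show why a concept that was never annotated (such as a time signature) gives the model no text to associate with the sound.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackAnnotation:
    """Hypothetical labels attached to one training track."""
    genre: str
    key: Optional[str] = None
    time_signature: Optional[str] = None


def build_caption(annotation: TrackAnnotation) -> str:
    """Join whichever labels exist into the text paired with the audio."""
    parts = [annotation.genre]
    if annotation.key is not None:
        parts.append(f"key: {annotation.key}")
    if annotation.time_signature is not None:
        parts.append(f"time signature: {annotation.time_signature}")
    return ", ".join(parts)


def label_is_learnable(label: str, captions: list[str]) -> bool:
    """A label can only be associated with sound if some caption mentions it."""
    return any(label in caption for caption in captions)


# A toy data set annotated with keys but not with time signatures.
dataset = [
    TrackAnnotation(genre="rock", key="A minor"),
    TrackAnnotation(genre="jazz", key="B major"),
]
captions = [build_caption(a) for a in dataset]

print(captions)
print(label_is_learnable("A minor", captions))  # key was annotated
print(label_is_learnable("3/4", captions))      # time signature was not
```

In this toy setup, adding `time_signature="3/4"` to the annotations is the only change needed for the label to show up in the captions, which mirrors the point that new controls start with new annotations.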
So it sounds like the data annotation then provides almost like a connective layer
between music and the user's request
and kind of helps natural language input
translate to something the computer can
understand as a command prompt, essentially.
Yes, exactly.
And we aim to improve our model
by giving it more annotations to understand more elements of music
so that the model can produce these elements upon command.
Okay, so I want to go back in time, though,
because I've been playing with Udio since, and this is a true story,
one of my friends started sending us funny songs he made for us,
in the group chat.
And they were,
they were whimsical things like,
Alex doesn't want to go to work tomorrow and like stuff like that.
And I was like,
okay,
where is this coming from?
And it was from you guys.
And so I got a kind of early look at the software and I've made different songs and
I've gotten to play with the new model some.
But back in the beginning days,
when you were first getting like the point one version of this out,
one,
how good or bad was it?
And how,
how easy was it to get from like proof of concept,
if you will,
to something you were confident that people might want to actually use.
Yeah, so funny that you mentioned, like the first version of our model,
like the baby, the very, very baby version,
when we were still debugging our overall code base and training structure.
We spent a couple weeks trying to figure out why our model couldn't produce any lyrics.
You provide lyrics, and the model just refuses to sing the lyrics.
And then we spent a while looking at the model,
like analyzing different loss curves.
And then eventually we found the reason to be quite simple
is that when we were feeding the data set to the model,
there was some kind of bug that caused the lyrics to not appear.
And so the model never saw the lyrics,
and so therefore it couldn't possibly know how to turn the lyrics into a song.
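A bug like this one is the kind of thing a cheap input check can surface before training starts. The sketch below is an assumption-laden illustration, not the team's actual tooling: the `lyrics` field name and the dictionary-shaped examples are hypothetical.

```python
from typing import Optional


def missing_lyrics_fraction(examples: list[dict]) -> float:
    """Return the share of examples whose lyrics are absent, None, or blank."""
    if not examples:
        return 0.0
    missing = 0
    for example in examples:
        lyrics: Optional[str] = example.get("lyrics")
        if lyrics is None or not lyrics.strip():
            missing += 1
    return missing / len(examples)


# Hypothetical batch as it might look right before being fed to the model.
batch = [
    {"audio_id": "track-001", "lyrics": "Alex doesn't want to go to work tomorrow"},
    {"audio_id": "track-002", "lyrics": ""},
    {"audio_id": "track-003", "lyrics": None},
    {"audio_id": "track-004", "lyrics": "la la la"},
]

fraction = missing_lyrics_fraction(batch)
print(f"{fraction:.0%} of examples have no lyrics")
```

If a check like this reported that every example was missing lyrics, the problem would point at the data loader rather than at the model or its loss curves.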
So essentially, it couldn't run the engine of lyrics
because there were no words going in.
Exactly, yeah.
And so this really goes to show how the process is quite dependent.
You have to pay attention to detail, and it's all about the input data.
And so we fixed the bug, and then after we fixed the bug, the model actually just kind of took off.
Every week we saw improvements.
The first week, it probably knew the broad genres like rock versus jazz.
As model training progressed, it started learning more and more
specific keywords like energetic, what's hard rock, what is like smooth jazz, and also the
sound quality improved, starting from something that sounds like very noisy to something that's
much more refined and more like what you get from a studio.
Yeah, no, the actual fidelity is pretty good in my experience.
And one thing, as a fan of heavier music in general, there are certain heavy metal
subgenres that depend a lot on orchestral additions that are mostly programmed.
And so I'm familiar with like the current state of the art for studio music with added digital elements, if you will.
And we're not that far off from this just with Udio's own creation software.
So that's very exciting.
But it sounds like going from the point of inception, the initial "it works," to going to market, to the 1.5 release in July was pretty quick.
And that chart's going up in terms of model quality and fidelity and so forth.
Do you think that trajectory continues for a long time,
or were there early wins, David,
that let you improve faster than you might be able to now and in the future?
So obviously there is a point where you start from zero and you get something.
And so that's the biggest delta.
And as you observed, our audio quality is actually pretty good,
although there are still areas for improvement which we are working towards.
but the big focus going forward
is additional controls for users
so giving people
more ways of controlling the music
like maybe you want to provide
like a guide
like you had this
melodic line already
and you want the model to follow this melodic line
and add musical elements to it
or maybe you have this like musical style
but you don't really know how to describe it using words
so how would you take this musical style
synthesize it and feed it
as an example for the model to follow.
And so we want to enable these additional controls
because we recognize that music creation,
the creator wants to have a lot of control over the music
because that is their own creation, right?
And so that's the area that we really want to focus on going forward.
Okay, I want to do some demos in a little bit
to show people what we're talking about
because you and I've used this, of course, a lot and they might not have.
But one thing that I was thinking about is who this is for,
because I am an enormous music fan.
So to me, music is part of my day from kind of when I get up to when I go to bed.
I'm either listening to audiobooks or music, right?
And so to me, it's very personal, very important, and I know music theory, and I love it,
and it's key to me, not everyone's like that.
And so people have different music tastes, different consumption habits.
And so I don't know, is Udio aimed at folks that want to create stuff for their own consumption?
Is it more of a rough draft machine for artists as they explore new ideas?
Is it a way to generate Muzak for elevators?
So I guess kind of like, who do you think this is for now and in the future?
So we think that Udio is for people who love music, people like yourself.
And also like artists and songwriters who obviously love music as well.
We just want to create a tool to make music creation a lot
easier than before, kind of like other tools that came before in the past.
Like for like DAWs, sampling, drum machines, these are all innovations that turned
something that was a little bit harder before, and with the aid of new technology,
just making this creation process easier so that more people can participate in the creation
process, and that existing artists can leverage this to try out ideas at a faster pace
and come out with music that incorporates these elements
in creative ways that maybe even the creator of the technology
never had in mind.
I think one good example for this is Auto-Tune.
When Auto-Tune came out, a lot of people,
they had qualms about using it.
It's like, oh, it's like cheapening the experience.
That's a polite way of saying it.
Yes, but sorry, keep going.
Yeah, like people were like, oh, this thing is like cheapening the experience.
Like it allows people who can't sing to sing, and that's a bad thing.
But then, like, you know, like, what really happened was, like, you know, it, like, really transformed the industry.
Like, people were using it.
And then people found ways of using it very, very creatively, like, you know, like bumping it up beyond, like, the spectrum and embracing the Auto-Tune sound as a musical style, right?
And so we think that with these technologies, it makes music creation easier, and people will find, like, creative ways of using it.
Okay.
So it sounds like for someone like me, a big music fan,
I could use it to create fun things for myself to listen to.
If I was a musician, I can use it to expand ideas and give me new ideas.
But this doesn't replace, you know, I don't know, my spouse's Spotify account at some point in time.
This is more like distinct acts of creation in the future versus passive consumption.
Exactly.
So, I mean, you might, you say that you play a trumpet, right?
Like your trumpet doesn't replace listening to music,
like great trumpet players of the past on Spotify, right?
Because you enjoy listening to music that other people create,
but you also want the joy of creating music yourself.
Yeah, no, I think that's right.
And what I like about the idea behind taking modern AI techniques
and applying them to music is it just allows a lot more people to do stuff.
You know, David, like five years ago,
people talked a lot about low code and no code.
And there was this big chat about the democratization of software development.
And that's kind of worked out.
But I love the idea of more power to more people.
And this to me seems to fit into that.
Now, on the critical side, though, some musicians are worried that they're going to be replaced whole cloth or diminished in some way.
I want to run my theory past you, which is that I don't think that's going to happen because the musicians that I love to listen to have their own very specific, sometimes experimental style that probably couldn't be replicated by.
even very intelligent models.
So to me, this exists, if you will,
side by side with kind of how music is made today.
I'm curious if that's your view as well.
Yeah, so we, so that's definitely my view as well.
We, we, so I believe that people will continue making music the way they've always made music.
And this is simply another tool in a toolkit that they can choose to use,
or they don't have to use it.
But then it's just, um, something additional, right?
just because the electric guitar got invented,
it doesn't mean that the acoustic guitar got completely replaced, right?
It just means that there's yet another instrument that you can add onto your band.
Yeah, actually, I remember in my high school jazz band,
we had a song, I think it was an old Buddy Rich tune,
and it had a little bit of guitar by itself,
and our guitar player played electric,
and one time he forgot to turn his guitar up,
so we got to that part of the song, and he played,
and no sound came out, and our director was like,
well, you know, he was a trumpet player,
and he was making fun of the electric guitar
for needing, you know, help
essentially, and I was like, I don't know, that seems a little
bit old-fashioned, but this probably fits
somewhere in there.
I do want to ask a quick question, though, about where
Udio will come up, because you mentioned
DAWs or digital audio workstations earlier,
very much now a well-known entity in the musical world.
Does Udio ever become
part of one of those, a plugin, an API that I can call?
Does it leave the website and end up somewhere else?
Quite possibly. We think a lot of our power users, they use Udio to come up with ideas,
and then they download the individual stems, which is a feature that we allow,
so people can download stems and then load the stems up in their DAW for further post-processing.
Okay, so essentially they take the rough draft, bring the stems over, and then you can do more.
Okay, all right, that's pretty cool. Is it hard to do individual stems? Because that implies that
the model is making
a collection of tracks
that are then mixed together.
Is that how it's always been, or is that a new
change to how the underlying
model works?
The underlying model always produces
the fully mixed track.
But then recently
with version 1.5,
we added the ability for users
to download the individual stems,
which are separated post hoc
from the mixture.
Oh, post hoc.
Interesting. Yeah.
Oh, okay.
So you create something that's mixed and then you isolate.
I would have thought this other way around, but that's why we ask questions.
Yeah.
Okay.
So before we talk about money, stems are the individual tracks inside of a song for, for example, bass or guitar, piano or whatever.
I just want to make sure that everyone listening understands stems.
David, is that how you would define them as well?
Yep.
Okay, cool.
So if you don't know what stems are, now you do.
Okay, there are more than 50,000 venture-backed startups in the
United States alone. This means marketing has to be perfectly targeted. You got a lot of competition
out there. Or you're just going to fade into the background and your money will go with it. All your
ad spend will be for naught. You got to make sure you target the right prospects. So how are you going to do
that? Especially in a business to business context. Well, the answer is obviously LinkedIn ads,
where you can precisely reach the professionals who are likely to find your ad relevant. Just think about
it. Wouldn't it be great to target your ads by the job title, or the industry,
the location that that company is in, you know, and maybe even a very specific company. Maybe you've got
a list of 20 lighthouse customers that you want to bear hug, that you want them to know about
your product or services, LinkedIn ads is going to help you do that by building a relationship
and driving results. LinkedIn is the environment where people are receptive to business. They're not
there for food or politics or entertainment or music. They're there to do business. A billion
members. A hundred thirty million of them are decision makers and ten million of them are C-level executives.
So start converting your B2B audience into high-quality leads today. Get $100 from your boy, J-Cal,
LinkedIn.com slash this week in startups to claim that credit. Again, LinkedIn.com slash
this week in startups, no spaces, no dashes, terms and conditions. Why? Because it's giving you a
hundee.
Okay, so Udio raised $10 million. That was earlier this year. Andreessen Horowitz was
in there, and a number of artists, including the producer Tay Keith, and I love to see venture capital
funds, glad you guys raised some money. But my thought is this. I currently pay you $10 a month
to use something like 1,200 song creation credits. I look at that and I know how much AI costs to run.
People talk a lot about that. I feel like I'm burning through your bank account. So is it as
expensive as I imagine it is to run the model to create music because it sounds very
compute intensive.
Yeah, obviously it's a balance for us.
We want to make sure the price is set at a point where we allow people who are curious
about the technology to try it out while being able to make this process sustainable.
So we chose a price in a way to basically allow for this, like to be able to sustain
usage while not being very expensive.
And going down the road, we definitely want to optimize our models,
make them more efficient so that they can run at a cheaper cost
because we want to maintain this commitment to users,
but we also want to run a sustainable business.
Yeah, so we've seen this with just one big example out there,
OpenAI's GPT family of models.
When GPT-4o came out, it was one cost, and then it's come down, I think,
and we've seen that pretty frequently.
Does that mean that you guys are able to extract a lot of efficiency
from the underlying model
and that this should get much cheaper to run over time,
or is there less low-hanging fruit
because it's doing music,
which is just, to me, seems harder than doing text?
We think that there is a lot of room for improvement for sure.
I wouldn't comment on whether or not it's on the same scale as OpenAI.
OpenAI obviously has entire teams of incredibly talented engineers working on this,
and we are a much smaller company.
But we do believe that there are similar levels of efficiency gains to be had.
Okay, so essentially, yes, Udio is a smaller company.
I think OpenAI has over 1,700 people now,
but it will work along a similar-ish curve.
Okay, that's actually a really good question for me to ask.
How big is the company today?
What's your current staff size?
So we currently have about 17 people.
So we've grown quite a bit.
How many people did you have before?
When we launched the company,
when we launched our model back in April,
we only had eight people.
Eight.
Oh, God.
Yeah.
And just because this is a startup show,
let's do some basics.
Remote, hybrid, or in office?
Mostly in office,
but some working remote.
Okay.
And just thinking about staffing for the rest of the year,
are you going to keep hiring as aggressively as you have?
Or will that slow down now that you've more than doubled in size?
We'll probably stay a little bit more steady.
Okay.
So I know you guys raised from Andreessen,
I mean, a venture firm that everyone watching this show knows.
And you guys raised 10 milly.
Is that enough money, David?
Because some of your competitors have raised more.
And we are in the era right now of companies that use AI raising,
lots of money, let's say.
So I'm just kind of curious why
10 million was the number
and also, you know,
how soon are you going to be back on the show
telling me about your shiny new round?
Yeah, so when we started,
we raised 10 million because we want to be disciplined
in how we spend the money.
We believe that like there's some amount of truth
in the idea that scarcity produces innovation.
And so we try to be super efficient
in the way that we use
our capital to develop our models.
So tell me about that
because, you know,
developing a model, I mean, people talk about
how models are eventually going to cost like a billion dollars to put together,
but that's for a very general purpose model and so forth.
So for you guys,
how do you ensure that your capital expenditures
on model creation and improvements are cash efficient?
Yeah, so one thing that we do is to try to secure the cheapest
compute power that's available
like the chip that's cheapest
in terms of floating point operations
per second
versus dollars
and so we ended up
choosing Google Cloud's TPUs
which we identified as offering significant
savings over other like chips
like NVIDIA GPUs
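The selection metric described here, floating-point operations per second per dollar, can be sketched as a simple ranking. Everything in the example is a placeholder: the chip names and the throughput and price figures are invented for illustration and do not describe any real accelerator or any figure from the interview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """A compute option described by peak throughput and hourly rental price."""
    name: str
    peak_tflops: float       # peak throughput in teraFLOP/s (made-up figure)
    dollars_per_hour: float  # hourly price (made-up figure)

    @property
    def tflops_per_dollar_hour(self) -> float:
        """Throughput bought per dollar of hourly spend; higher is better."""
        return self.peak_tflops / self.dollars_per_hour


def cheapest_compute(options: list[Accelerator]) -> Accelerator:
    """Pick the option with the most throughput per dollar."""
    return max(options, key=lambda a: a.tflops_per_dollar_hour)


# Hypothetical options with invented numbers.
options = [
    Accelerator(name="chip_a", peak_tflops=100.0, dollars_per_hour=4.0),
    Accelerator(name="chip_b", peak_tflops=90.0, dollars_per_hour=3.0),
]

best = cheapest_compute(options)
print(best.name, best.tflops_per_dollar_hour)
```

Note that the chip with the lower peak throughput wins here because it is proportionally cheaper. A real comparison would also need achieved utilization rather than peak numbers, which is where efficient training code matters.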
Google's startup cloud program
is one of the occasional sponsors of this show
So I just want to point out that no, no one asked them to say that.
That was off the cuff, but there you go.
So we're not being biased.
Just to put it in perspective for me, though, because I don't get to go to those negotiations.
How much cheaper was GCP for Udio compared to competing providers?
Was it a lot cheaper or was it more of a marginal differential?
Yeah, I'm not sure if I should comment on specific numbers, but it is.
Oh, you should.
David, you definitely should.
You should drop all the numbers you can right now.
Yeah, but it's definitely quite a bit.
cheaper. And so that's
one factor. And the other factor
is that we have quite a
few really talented modeling
research scientists
among our co-founders.
And because they have a lot of experience
training these really big
generative models, they know how to
make maximum use of
the available hardware, how to
create really efficient programs
and how to design architectures
that can train efficiently.
So if you're doing that work, though, because that's nitty-gritty stuff, if you're doing all that already, why not buy your own H100s or equivalent and just run your own mini data center?
It seems to me like if compute's going to be such a core element of what makes the digital brain that you use, why not own the neurons themselves?
I guess for us, we as a startup, we didn't really want to deal with the logistics of running our
own data center. And we thought it
would be simpler to go with a cloud option.
Do you
see the company in, let's say five years
from now, just looking down the road
so far out that I know we're making kind of almost like a joke here,
but like, do you think that you'll still be
on a major public cloud provider
in five years? Or do you eventually
off-ramp when you have
more money and staff and so forth and do
your own data crunching?
Yeah, it's hard to say. Like, the cost of
computation on the cloud
has been going down.
I think there's like some kind of new law, like Huang's Law or something,
that supplanted Moore's Law about the cost of GPUs over the years.
The cost of cloud compute could go down very significantly,
and it's very hard to predict five years down the line,
whether or not it will be more economical to buy your own chips
or to lease them from the cloud.
Okay, so Huang's law, by the way,
this was, of course, a reference to Jensen from NVIDIA, I presume?
Yeah.
Yeah, okay.
So if you know NVIDIA, you know the guy. Huang's Law is, and I'm Wikipedia-ing this live, so this is not very lettered of me,
but it's the general idea that, as Moore's Law predicted that the number of transistors would double about every two years,
Huang's Law is that GPUs will more than double their performance every two years.
So it's essentially an acceleration or a faster version of Moore's Law for GPUs.
That speaks very well for you guys, because that means that your gross margins should improve over time, just naturally as chip companies make better chips.
That's kind of cool.
That's a tailwind for you as a CEO.
Yeah, it's definitely something that we're very excited about, like, cheaper compute, making even more powerful technologies possible.
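As a rough illustration of the tailwind described here, the sketch below assumes, purely for illustration, that GPU performance per dollar doubles exactly every two years. That is a simplification of Huang's Law as paraphrased above (which is about performance, not price), and the function name and the doubling period are assumptions, not figures from the conversation.

```python
def relative_cost(years: float, doubling_period_years: float = 2.0) -> float:
    """Cost of a fixed workload relative to today.

    Assumes performance per dollar doubles every `doubling_period_years`;
    this is an illustrative assumption, not a measured trend.
    """
    return 0.5 ** (years / doubling_period_years)


if __name__ == "__main__":
    for years in (0, 2, 4, 5):
        print(f"after {years} years: {relative_cost(years):.2f}x today's cost")
```

Under that assumption, a workload that costs a given amount today would cost about a quarter as much in four years, which is why it is hard to predict whether buying or leasing chips wins over a five-year horizon.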
All right.
Are you building the next great AI product?
Well, if you're doing that, you know how expensive all these APIs can be for model training data, obviously, and training AI is very expensive.
That's a fact. We all know that. So you have to try Brave's new Search API. Yes, I'm talking about Brave, the privacy browser that I use every day and on my mobile phone. Brave's browser has 65 million users. And that drives a lot of data into the Brave search engine, which is the only global-scale independent search index outside of big tech. And that index is available to anyone with the Brave Search API. So you're going to be able to use the Brave Search API to power your chatbot or train models, inform
answers to real-time queries, and serve images, web results, even rich text snippets.
The Brave Search API features an easy-to-use intuitive data structure, so you're going to be able
to get things done quickly, and its data is populated by real human interaction, not web crawlers.
That's critical.
And it's all done at a fraction of the cost of the major players, free for up to 2,000 queries per
month, so you can try it on, play with it, really sort of brainstorm, and then plans
start at as little as $3.
So here you go.
If you're building next-gen AI apps or chatbots,
you've got to try the Brave Search API.
Get started today at brave.com slash Jason.
On the public cloud front,
I'm going to not ask about an individual provider
because I don't want you to get in trouble,
but I have heard that there is a capacity crunch out there,
that there's not enough total GPU-based compute
for people that want it.
Has Udio hit any issues getting the amount of
compute capacity that it needs at any point?
Yeah, I mean, it's always a balance where eventually we'll be able to get the compute,
but definitely at times it just takes a while for different cloud providers
to be able to find the chips that are available.
Okay.
Now, I want to go from there to a demo so everyone can see the product that we're talking about
from a compute perspective.
So, David, we drew straws before, and you're going to drive
because you told me that you have some new stuff to show off.
So let's pull up Udio.
If you're watching this on YouTube,
you can see what we're doing live.
If you're watching,
listening to this on Spotify or Apple Podcasts,
we will narrate as best we can,
but we are going to do a little bit of testing around here to show off
what we can pull off.
So David,
talk to me.
What are you showing me?
Yeah, so I'm showing you the Udio create page.
It's a dedicated,
you can think of it like a creation studio,
where you have a list of your recent creations
on the right-hand side.
And then on the left-hand side,
you have a place where you can specify
the type of music you want to create,
as well as any lyrics that you want to have.
So maybe we can start very simple.
Let's start with just creating, I don't know, like rock music.
So here we can type rock.
And then for simplicity's sake,
I'll just create a song about New York.
And so this is a feature that we launched just yesterday, actually,
where you can ask the language model to write lyrics for you
before you submit the song, and you can even give it suggestions
on what to do.
Like for example, let's say we want to make it a little bit shorter.
Nice.
Okay, so you can essentially tell it to get more verbose or less verbose.
You can do other things as well.
I don't know, like make sure to mention New York.
Does it keep the last prompt in mind when you give it another instruction?
So is it still thinking keep this short as you add the make sure to mention New York in the update box?
Oh, yes.
So like we actually have a prompt history that shows all the prompts that have accumulated so far.
So we aim for this to improve upon our previous lyrics-writing experience
by giving people the aid of AI to help them come up with ideas
when they might have writer's block.
They don't really know what to do,
for example, like me at this current moment.
And so now that I have like the genre and like lyrics,
I can hit create.
And that will queue up this creation.
Yeah.
And while that goes, I just,
I mean, thinking about this,
because whenever I sit down to use Udio or a similar product,
I tend to think not in genre terms,
but in terms of bands that I love and kind of how they approach the world.
And I'm kind of curious, when you're using Udio,
do you tend to stick more towards, like, rock,
or do you get hyper-specific, like,
make me a rock song with a touch of, I don't know,
Tom Petty or something like that,
because you can pull in different influences.
Yeah, so I would say that I usually just stick with genre information,
but for users who have a specific artist in
mind, we provide functionality for a user to type in the name of the artist, and we look up the style for the artist. So we don't actually put the name of the artist anywhere in the prompt, because we don't want to create something that sounds exactly like that artist. But then, for example, when you type in, like, Led Zeppelin, it will replace Led Zeppelin with the list of genres that they are commonly associated with. So, you know, maybe hard rock, maybe male
vocalist, 70s, just things like that.
Okay, so if I put in, I mean, this is again a niche genre, but, like,
Periphery, it's going to think progressive metal, guitar forward, male vocals,
so it'll essentially atomize an artist's name.
So essentially, then artists become shorthands for genre and style.
That's correct. And we try to make sure that, like, the generated outputs
are definitely influenced by the style of that artist,
But it's not that exact style, because that's one thing that we really do want to avoid.
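The substitution described here, swapping an artist's name for style descriptors before the prompt reaches the model, could be sketched roughly as follows. The lookup table, the tag lists, and the function name are hypothetical; the conversation does not describe how Udio actually stores or matches artist styles.

```python
import re

# Hypothetical lookup table. The tags echo the examples mentioned in the
# conversation; the real mapping is not described.
ARTIST_STYLES = {
    "led zeppelin": ["hard rock", "male vocalist", "70s"],
    "periphery": ["progressive metal", "guitar forward", "male vocals"],
}


def expand_artists(prompt: str) -> str:
    """Replace any known artist name in the prompt with its style tags,
    so the artist's name itself never reaches the model."""
    for artist, tags in ARTIST_STYLES.items():
        pattern = re.compile(r"\b" + re.escape(artist) + r"\b", re.IGNORECASE)
        prompt = pattern.sub(", ".join(tags), prompt)
    return prompt


if __name__ == "__main__":
    print(expand_artists("a rock song like Led Zeppelin"))
```

In this sketch the artist name acts purely as shorthand for genre and style, which matches the intent described: influenced by the style, but without naming the artist in the prompt.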
And we are dancing around the lawsuit here, and I'm trying to deliberately ask questions you can't answer.
But yeah, thank you for answering that. Can we play this? Let's hear it.
Yeah, sure. So this is the first example that came out.
It's better than something that I could write. So shout out to that.
And then, for example, you can then like add additional descriptors to this.
So maybe not just rock,
maybe you want to make it A minor.
And then you hit create again, and that
then queues it up.
And so you,
I'm curious about this,
because there is a little bit of time that goes through
from when you click create to when Udio gives you the song,
which, by the way, to me, is no big deal.
But it does seem to be variable.
David, so what makes it a longer or shorter calculation process
on the UI side?
Yeah, so on the UI side, so we submit the request over to the server,
and the server reads the prompt, like "rock, A minor,"
and tries to figure out what to do with it before it sends it to the model.
So we have some processing that goes on.
We also run checks for every song that gets submitted.
We take the lyrics and we do a copyright check to make sure that the lyrics are not copyrighted lyrics,
and this is something that's probably a little bit overly strict right now.
There are a lot of public domain songs, or things
that are not really copyrighted, that get flagged,
but we want to err on the side of caution rather than not flagging something that's actually copyrighted.
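The server-side flow described here (read the prompt, run a lyrics check, then hand off to the model) might look something like the sketch below. Everything in it is an assumption made for illustration: the placeholder reference set, the line-level matching rule, and the names `lyrics_flagged` and `handle` are not Udio's actual implementation, which is not described in the conversation.

```python
from dataclasses import dataclass

# Hypothetical reference set of protected lyric lines (placeholders only).
PROTECTED_LINES = {
    "placeholder protected line one",
    "placeholder protected line two",
}


def normalize(line: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so trivial edits still match."""
    kept = "".join(ch for ch in line.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def lyrics_flagged(lyrics: str) -> bool:
    """Deliberately strict: flag the request if any single line matches."""
    return any(normalize(line) in PROTECTED_LINES for line in lyrics.splitlines())


@dataclass
class Request:
    prompt: str
    lyrics: str


def handle(request: Request) -> str:
    """Run the checks before the request would be sent on to the music model."""
    if lyrics_flagged(request.lyrics):
        return "rejected"
    tags = [t.strip() for t in request.prompt.split(",") if t.strip()]
    return "queued: " + " | ".join(tags)


if __name__ == "__main__":
    print(handle(Request("rock, A minor", "A song about New York")))
```

A single-line match rule like this one errs on the side of caution in the way described: it will sometimes flag lyrics that are not actually protected, in exchange for rarely letting protected lyrics through.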
Okay, now we have this new song, same idea, but now in A minor.
Let's listen to the new song, Chasing the Pulse.
Let's see. I'm not sure if you have perfect pitch or not,
but, like, it's a little bit hard for me to tell.
It's definitely a minor key.
David, I'm not going to lie.
I do not have perfect pitch.
Indeed, if you had ever heard me sing a bedtime song,
you would think to yourself,
that guy plays music? Because it doesn't sound like it.
So I can't tell if it's A minor or not.
It did sound minor.
But what hit me, though, is when the,
in the chorus or maybe it was the bridge,
when the harmony voices came in,
it wasn't on the first note.
They came in a little bit later,
which feels very stylistic.
and therefore, and I mean this in the best possible sense, like human.
It felt like something that would be a music editorial decision that a human can make, to say,
hey, we're going to have the lead singer and then the harmony come in on a delay.
I don't know.
I'm always a little bit torn with this between going, this is the coolest thing I've ever seen.
And, oh God, are humans going to lose?
Just because, you know, I've sat in the guts of a symphony as we took Beethoven's Fifth down to the studs and rebuilt it.
I don't want a future in which we lose that,
but at the same time,
I'm going to click these buttons 100 times
because it's a lot of fun.
So maybe I'm part of my own problem, I suppose.
Yeah, well, I think that people who play in bands,
they don't necessarily, they won't stop
just because there's this additional source of music.
One of our co-founders actually is involved in a band,
and he regularly toured the UK
to perform with his band.
And so I think, and he
continuously enjoys this, right?
Because it's just fun for humans to create music.
And we just want to, like, give people,
more people the opportunity.
Like, he can create music with his band.
But previously, he couldn't create music in his bedroom,
or, like, lying down, like, on his couch,
and then, oh, I want a song.
And, like, previously, what would you do?
Like, you can't even do anything.
And so now, this is possible.
Yeah.
And just because I'm going to be an enormous brat because I can.
Court, one of our fine producers here at Twist, has given me a prompt he would like us to try.
So, David, if you're up for it, in Zoom chat, there is a prompt entitled,
a jazzy, neo-noir, offbeat rap song about dinosaurs, which is evidence that Cort is
Gen X, but we'll leave that aside for now.
But can we give, can we give that a try?
Yeah, jazzy, neo-noir,
offbeat
rap song
about dinosaurs
let's see what we get
the person who requested this song
for everyone who's listening to this later on
was in a punk band once
so there you go
this is what a punk fan
is going to put into
the Udio generation process
all right
while this is waiting David
one question I had written down
just because you know
I love this sort of thing
what's the craziest song
that you guys have seen people come up with
because everything that I've done
thus far has been pretty standard
but I'm curious, has anyone like really
blown your head off?
Well, one of the examples of a song that
took off pretty unexpectedly
is a song called BBL Drizzy.
It's a song that one of our users
created. He himself
is not a musician, but he is a
comedian, actually. And so
he wrote like the funniest lyrics
and he used Udio to turn this set of lyrics into a song,
and it ended up being sampled by Metro Boomin,
who created a beat
as part of, like, the entire Drake and Kendrick feud,
and challenged people to create, like, verses on top of this.
And the funny thing is, like, Drake himself actually rapped on top of it.
It's pretty amazing watching it from the sidelines,
to see your tool be genuinely
immersed in pop culture.
We'll come back to that in a minute,
but I want to play everyone this song.
So here is the first sample clip
of jazzy neo-noir off-beat rap song about dinosaurs.
Hit it, David.
Bowman through the night for your start roar.
Lost in a city lights,
Zeno feet at the floor.
Brontosaurus groove.
T-Rex on a roll.
Pterodactyl fly, rhythm digging in my soul.
Bones of the past.
We dance like where ageless history.
We breathe in jazzy night.
Boundless stages.
Actos in the alley.
shadows keep time
Jurassic jazz notes
And the moons climb
Dinosaurs in the urban globe
Rhythms are time
Let the ancient show
Underneath the story flow
Where the city
And wild collide
We go
I mean, I'm not gonna lie
That's not bad
That's not bad
Brontosaurus groove
T-rex on a roll
Pterodactyl fly
Rhythm Digging in my soul
That is
Actually probably better
Than some stuff
That I listen to on Spotify currently
Yeah, our lyrics are very peculiar.
It's an artifact of the language model.
No, no, no, no.
I meant all that.
That was not sarcasm.
I mean, I never thought I'd see
Brontosaurus, T-Rex, and Pterodactyl,
all within one rhyming couplet, essentially.
Yeah, no.
Again, I don't know if you can answer this,
but is the model that writes the lyrics
the same model as what does the music
generation or are those two different models that then are brought together for this final
product that we just heard?
We use different models.
So the model that creates the music is a proprietary model that we trained because there's
nothing like that elsewhere.
And the model that writes the lyrics, we just use GPT actually.
Oh, simple enough.
Yeah.
Well, I mean, it works pretty well.
Okay.
I want to talk about 1.5 a little bit and then we'll wrap on virality.
So 1.5 came out back in July.
That brought key control, improved, I think it was global language,
and then also audio quality.
So how has the reaction to 1.5 been?
And then what's next from Udio in the feature context?
Yeah, I think people were excited by the changes.
Like for better global languages,
a lot of our Chinese-speaking users
remarked how the model suddenly became a lot better
at producing lyrics that have
Chinese in them. Our
key control is definitely
a feature that people have wanted
for a very long time. And people
love the ability of specifying
the key and then modulating within the song.
So you can specify
a key for the first section. And when you extend
this section, you can then specify a different key and you can
kind of like, you know, specify your own
harmonic progression throughout the course of the song.
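One way to picture the per-section key control described here is as a list of sections, each carrying its own key, that grows as the song is extended. The `Section` and `Song` types, the `extend` method, and the section lengths below are hypothetical illustrations, not Udio's data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    key: str       # e.g. "A minor"
    seconds: int   # section length; the values used below are illustrative


@dataclass
class Song:
    sections: list[Section] = field(default_factory=list)

    def extend(self, key: str, seconds: int) -> None:
        """Append a new section, optionally in a different key (a modulation)."""
        self.sections.append(Section(key, seconds))

    def key_progression(self) -> list[str]:
        """The sequence of keys, collapsing consecutive sections that share a key."""
        keys: list[str] = []
        for section in self.sections:
            if not keys or keys[-1] != section.key:
                keys.append(section.key)
        return keys


if __name__ == "__main__":
    song = Song()
    song.extend("A minor", 30)
    song.extend("A minor", 30)
    song.extend("C major", 30)
    print(song.key_progression())
```

Specifying a key for the first section and a different one on each extension yields the user's own harmonic progression over the course of the song.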
How long until that's like super visual?
Like, I can imagine myself, like, let's say a song is three minutes, and I have, like, a line, and I'm like, this chunk should be in A minor and be uptempo, X, Y, Z, and then I want six measures of this, that.
Like, does this become a visual tool versus just something that I prompt, at least in my experience with words?
Yeah, so this is something that going forward, we do want to make more visual over time.
So we recognize there are some deficiencies in our current interface that make it a little bit harder than necessary.
And so we want to do like
user research to figure out how best
to craft the interface
in a way that's intuitive for musicians.
Okay. So because musicians are already super
familiar with editing software and so forth.
So that kind of interface would be
second nature to them.
Now, I want to talk about virality
because if we go back to when you guys
announced your fundraise,
I think Bloomberg reported,
and I have it somewhere in my notes here,
that you were seeing something like
10 songs created every
minute or something like that. I forget the exact
pace. But how
has the company been doing in usage terms
in the last couple of months and how much
bigger is it compared to that April, June
timeframe? Yeah, so it was actually like
10 songs every second, not
minute.
But people are
still, like, super
engaged with the entire process. We have
a very dedicated
group of power users
who, you know, go on Discord
all the time, and they
share the songs that they have created.
There's actually a bit of a collaborative flow as well
where people work on lyrics and songs together.
And you see this because in the final output,
they will credit each other.
They will say, oh, this song was created with the help of this other user.
And it's really fun to see people working on music in this way.
It's kind of like how people would jam together in the past, right?
And, well, people still jam together.
But now we have another way for people to jam together.
And I think this is also part of what music is about,
like bringing people together with a common passion.
I agree with that entirely.
I was just thinking that, you know,
not everyone now has to go to a jam room,
which means that they don't have to get hearing loss.
Like I did growing up in my ska band,
which rest in peace did not make it big
and turn us all into multi-millionaires.
I'm sad to say.
Now, on the virality point, you mentioned a Discord,
you mentioned power users.
I learned about you guys from a friend,
but I'm kind of curious,
is the product here inherently viral
because, you know,
I was sent a song about me and my friends,
I like to use it,
and then I've been playing with this,
and sending it to people.
And so I'm just kind of curious
if that limits your sales and marketing costs
because people are almost taking your product
to their own networks organically.
Yeah, I mean,
most of our growth,
almost all of our growth is via completely organic channels,
where people are just, like,
sharing amazing outputs that they have,
and then people are asking each other,
how do you do that,
and then it's, like, spreading this way.
So how has, like, registered user growth been at the company?
Is it still as quick as it was before in, like, percentage terms or gross number terms?
How should we think about growth at the company, essentially?
Yeah, I mean, obviously, there was a very large initial spike when we launched,
but we do still see, like, steady growth every single month.
and we believe that once we launch new versions of the model,
there will be renewed excitement
and as people find new ways of controlling the outputs
and using the model for their own production needs.
Okay. All right.
Well, I mean, I'm going to be watching with very close eyes
because I'm a user and now a customer.
But one thing I've heard from VCs lately,
I think Sarah Tavel from Benchmark wrote about this,
and she said that a lot of the AI,
the big AI model companies, the OpenAIs and so forth,
are going to go kind of upstack in time.
And so that startups, not yours,
but some startups that do build products using well-known commercial models,
for example, might eventually get supplanted by their model provider,
essentially going upstack and taking their lunch.
Are you at all worried as a company about one of the larger model companies,
a Mistral, an Anthropic, an OpenAI,
going, hey, music is cool.
We should do that too.
and then kind of bulldozing into your market.
Yeah, so we think that inevitably in the future
there will be more companies that enter this music creation space.
We believe that music is sufficiently different from text,
and there's a significant product element as well.
You want to have the right interfaces for people to interact with these models.
So, like, a chatbot, kind of
like ChatGPT, is probably not the right interface
for people who want to create music.
And so I think there's actually like an open question
on how to best produce this type of product.
It's something that we're working towards.
It's a tight coupling between the model and the product
and like getting the right level of controls in the model
so that you can expose them to the user in an intuitive way in the product.
Okay, and then just to wrap things up here, David.
I just want to ask you one more thing before I let you go,
which is, you know, I think about Udio, and Suno
is the other company that I think people best know in your space.
And so I'm kind of curious,
where do you see Udio today in comparison to Suno
and how many of their engineers are you currently trying to poach?
So I think, like, we want to position ourselves
as allies to artists and songwriters and producers.
We want to focus on, like, giving them the highest quality tools available.
So instead of focusing on, like, meme songs,
in particular, we want to focus on
the really powerful creation tools
to help creatives
like make music and make
high quality music that they're proud of
and maybe eventually they want
to incorporate in their
other music workflows.
That was a very, very deft non-answer,
but
let me take another run at this.
Do you consider Udio's music model
to be the best in the market today?
I would say so, yes.
It's the only model
that produces stereo music
at like 44 kilohertz
sampling rate.
It's a lot higher fidelity
than any other music model that's out there.
It has a better understanding of
genre than, like, almost any other music model.
Okay, I'll take that.
I do want to have you back, though,
in I was going to say a year,
but given that you launched the product in April
and it already feels like we've gone through
two generations, probably sooner than that,
because I'm curious to see
how fast things improve, how competition evolves.
And if you guys do decide to go back out into the market,
because I think that given your traction, early monetization,
and so forth, you should be able to raise more.
So it'll be very curious.
But David, thank you so much for coming by Twist.
I really appreciate the information and the notes.
And thank you for making a new tool for me to play with,
because I absolutely love music.
Yeah, thank you for hosting the podcast.
It's very fun.
All right, everybody.
Twist is back.
We do live news.
If you're not with us on YouTube, see you there,
or also in every single podcast platform you can possibly find,
and we are always trying to find the best and most interesting founders
to explain the market as it is.
This has been Udio, David Ding, and Alex.
Hey, we're out of here.
