The AI Daily Brief: Artificial Intelligence News and Analysis - The Frontier of AI-Generated Music Models
Episode Date: August 3, 2023
On today's episode NLW explores Meta's newly launched AudioCraft compared to Google MusicLM. Before that on the Brief, Goldman Sachs predicts AI investment will reach 4% of GDP by 2025; a 9-figure investment for an Nvidia competitor; new Alibaba AI models and more.
Today's Sponsor: Supermanage - AI for 1-on-1's - https://supermanage.ai/breakdown
ABOUT THE AI BREAKDOWN
The AI Breakdown helps you understand the most important news and discussions in AI.
Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Join the community: bit.ly/aibreakdown
Learn more: http://breakdown.network/
Transcript
Today on the AI Breakdown, we're comparing Google's MusicLM with Meta's just-released AudioCraft.
Before that on the brief, major fundraising in the AI chip space, and Goldman Sachs makes a big prediction when it comes to AI and GDP.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to breakdown.network for more information about our Discord, our YouTube, and our newsletter.
Welcome back to the AI Breakdown Brief.
All the AI headline news you need in around
five minutes. If you'd like to follow along, go to theaibreakdown.beehiiv.com. You can also find that link
at breakdown.network. Today, we kick off with another report making some big prognostications.
This one comes from investment bank Goldman Sachs, and they say that AI investment could eventually
represent up to 4% of US GDP. This comes from the article "AI investment forecast to
approach $200 billion globally by 2025," which was published a couple of days ago.
Now, even in a world of bombastic AI reports, this one is bombastic. The article begins,
Innovations in electricity and personal computers unleashed investment booms of as much as 2% of US GDP
as the technologies were adopted into the broader economy. Now investment in artificial intelligence
is ramping up quickly and could eventually have an even bigger impact on GDP, according to Goldman
Sachs Economics Research. So first, let's look at their estimates of AI investment in the U.S., in China,
and across the world. In 2021, the world saw $93.54 billion of investment
in AI. That actually came down slightly in 2022, to $91.9 billion. But in 2023, Goldman estimates that
global investment in AI will equal $110.19 billion. Of that, they estimate the U.S. will
represent $56.83 billion, and China will represent $24.74 billion. By 2024, Goldman sees the
global number going up to $132 billion, with the U.S. at around $68 billion and China at around $30 billion,
and the lines just go up from there. Part of Goldman's argument rests on the fact that
people are broadly coming to understand just how much generative AI could increase labor productivity
across a huge number of different industries and sectors. However, as Goldman puts it,
for large-scale transformation to happen, businesses will need to make significant upfront investment
in physical, digital, and human capital to acquire and implement new technologies,
and reshape business processes. So effectively what they're saying is that there will be this
period of transition where a huge amount of upfront money will have to be spent in order to retrofit
the business world for the AI era. That means investing in actual infrastructure, which might
mean things like compute infrastructure, but it also means reskilling employees, which of course
comes with a price tag as well. When you view this $200 billion number not as some generically
inflated investment amount, but as representative of a key transitional period, I think it starts to look
a lot more realistic. To give evidence of that, they point to previous tech-driven productivity booms,
which they say have been driven by large investment cycles. In the case of both electricity
and personal computing, Goldman shows how increases in investment in the underlying infrastructure
(for personal computing, information processing equipment and software; for electricity, manufacturing
equipment and plant) led, with a lag of a few years to half a decade, to productivity growth that broadly
mirrored that level of investment. Goldman writes, AI-related investment is climbing from a relatively low starting point and
will likely take a few years to have a major impact on the economy. Over the longer term,
AI-related investment could peak as high as 2.5 to 4% of GDP in the U.S. and 1.5 to 2.5% of GDP
in other major AI leaders. They also point out that there has been a seismic shift just recently.
Goldman writes, even though it will take time for AI to boost productivity, market interest in
AI has already increased rapidly, with more than 16% of companies in the Russell 3000
mentioning the technology on earnings calls, up from less than 1% of those firms in 2016.
Roughly half of that spike came after the release of ChatGPT in the fourth quarter of
2022. For those of you who are listening rather than watching, Goldman's chart of the percentage of
companies mentioning AI in their earnings calls looks like the type of up-and-to-the-right graph that
gets VCs excited to get up in the morning.
Now, what about who benefits from this $200 billion of new spend?
Goldman expects it to be concentrated in four key business segments.
The first is companies that train and develop AI models.
The second is those that supply the infrastructure, such as data centers.
The third is companies that develop software to run AI-enabled applications.
And the fourth is enterprise end users that pay for those software and cloud infrastructure services.
Now, when it comes to CEO expectations around how AI will impact labor:
over the next year, a little over 20% of Fortune 500 CEOs surveyed
said that they thought AI would decrease labor needs,
while over 40% said that they thought that labor needs would be unchanged.
However, zooming five years out,
over 70% of Fortune 500 CEOs said that they anticipated that less labor would be needed,
while under 20% said that labor would be unchanged.
The TL;DR: this report is just another example of a major financial institution
that thinks we are at the very beginning of a major, transformative AI cycle.
Speaking of AI-related investment, next up on today's AI Breakdown Brief, we are looking at yet another firm that is trying to, if not dethrone Nvidia in the AI chip space, at least provide some competition.
Reuters reports that AI chip firm Tenstorrent has raised $100 million in fresh capital from investors including Hyundai and Samsung.
Prior to this funding, Tenstorrent had raised $234.5 million and was already in the unicorn club, and that interest has been driven in part by the fact that it's led by chip industry veterans such as Jim Keller, who previously developed chips for companies including Tesla, Intel, and Apple.
Over in the world of LLMs, Alibaba Cloud has released two open source models to compete with
Meta's Llama 2. Now, interestingly, a couple weeks ago, Alibaba also announced that it would be
supporting Meta's Llama 2, and so it appears from this news that Alibaba is not putting all of its
chips in any one basket, even its own. It is worth noting, however, that we kind of have to put
open source in quotation marks, as it's similar to Llama 2's version of open source, which one might consider
mostly open source, albeit with a few restrictions. In the same way that Meta set
some limits around needing special approvals and permissions should monthly active users exceed
a certain number, Alibaba's Qwen-7B models come with a similar restriction. Finally, today, if there
was any doubt about where Google Search is heading, the company has just released a number of updates for
its experimental Search Generative Experience (SGE). On Wednesday, August 2nd, it released
a blog post called "three new things you can do with generative AI in Search." Now remember,
the whole point of SGE is that in addition to Google's classic set of little blue links from all
around the web, there is a top section that is generated by AI that brings together information
that perhaps lives within those links, but is custom curated by Google's artificial intelligence.
The biggest update this week is a move into the world of multimodality. As Google writes,
sometimes it's more powerful to understand something by seeing it. So we recently brought
images to even more AI-powered overviews. For example, when you search for something like
tiniest birds of prey, you'll quickly be able to reference what the bird looks like and get relevant
information from the web. And over the next week, you'll begin to see videos within some
overviews where it's helpful to see something in motion, such as a demonstration of a yoga pose
or how to get stains out of marble. Now, outside of that major substantive update,
Google has also increased the speed with which results populate, saying that they've reduced
the time it takes to generate AI overviews by half, and they're also adding small features like
showing when the different recommended links were published,
to help you decide which pathways you want to follow.
There's an interesting bifurcation happening right now
where the announcements from companies like OpenAI and Meta around generative AI
tend to be big and technical and have implications for the way that the field develops.
Google, on the other hand, is extremely focused, it seems,
on the productizing of AI for regular people.
In other words, videos showing up in SGE may not be as big of an announcement
as something like Code Interpreter,
but it might be relevant for a whole lot more people.
Anyways, guys, that is going to do it for today's AI Breakdown Brief.
Appreciate you listening or watching,
and I'll be back soon with the main AI Breakdown.
Before we get into the main AI Breakdown,
I want to tell you about today's sponsor, Supermanage.
If you work in a professional setting,
you probably have some version of a one-on-one meeting,
either with the people that work for you or the people that you work with.
Unfortunately, all too often,
those one-on-one meetings become glorified catch-up calls.
Don't you wish you could jump right to the stuff that really matters?
That's where Supermanage comes in.
Supermanage AI magically distills your team's public Slack channels
into a real-time brief on any employee, any time.
Catch up on contributions, work in progress, challenges they're facing, sentiment,
everything you need to show up ready for a truly meaningful conversation.
And it's completely free.
Visit supermanage.ai/breakdown today
to start making the most of your one-on-ones.
And thanks again to Supermanage for sponsoring the AI Breakdown.
Welcome back to the AI Breakdown.
I think when we look back at the cultural battles that are fought around generative AI,
one of the front lines is going to be in the realm of music.
Music has always had a particular place when it comes to the disruption of new technologies.
And in many ways, the introduction of Napster and the battle that ensued with music lawyers
made that industry more prepared than many others
for what would come over the next couple decades of technology.
I don't think it's an accident that music is the industry
where the incumbent power, i.e. the record labels,
probably maintains more control relative to the startups
than just about any other space.
What's more, we've already seen AI start to cause
that sort of immune response once again.
Earlier this year, "Heart on My Sleeve,"
which of course was the beat that you heard over the intro of this episode,
if you're listening as a podcast,
really scared the hell out of music executives because of how good it was.
It was an AI version of Drake and an AI version of The Weeknd,
and the song was an absolute banger,
going completely viral and getting millions and millions of streams and downloads,
before effectively the entire music industry freaked out
and started throwing its legal weight around like crazy,
getting it pushed off of platforms.
But of course, it's a digital artifact,
and it's still all over YouTube and everywhere else.
More playfully, but no less significantly,
we are constantly getting AI remixes like this one of Frank Sinatra singing Lil Jon's "Get Low"
that become sensations on TikTok or YouTube or wherever they're premiered,
and serve if nothing else to remind us just how good this technology is.
Earlier this year, Google released research on something that they called MusicLM.
In the same way that Midjourney or Stable Diffusion is a text-to-image generator,
MusicLM was meant to be a text-to-music generator.
For example, here's audio generated from the prompt,
the main soundtrack of an arcade game.
It is fast-paced and upbeat,
with a catchy electric guitar riff.
The music is repetitive and easy to remember,
but with unexpected sounds,
like cymbal crashes or drum rolls.
Now, part of what made MusicLM
exciting is that Google actually made it available
in its AI Test Kitchen.
For those lucky enough to get access,
yours truly included,
you can play around with prompting MusicLM
and seeing how it does.
Here's an example of the prompt,
a simple classical melody evoking the feeling of fall,
emphasis on a melodic top line played by flute.
Now, of course, the example I gave has an inherent element of subjectivity,
given that I asked it to evoke the feeling of fall,
but both of the options it came up with did a pretty good job of,
one, being a simple classical melody,
and two, emphasizing a melodic top line played by the flute.
Frankly, subjectively, too,
I think they did an okay job with this evoking the feeling of fall piece.
The way the Test Kitchen works is that once you've got your two examples,
you give a trophy to the one you think did a better job,
to help improve the model. In this case, I think the second did a slightly better job.
Let's try another: a driving 1980-style electronic synthwave track in a minor key that sounds like
it might be a dramatic interlude in a video game. Both of these frankly nail it. I guess I'll
give the trophy to the first just because I kind of like it better. And I think this is exactly
what gets me so excited about this type of technology. Music is so much about a feeling
and a vibe that being able to describe not just an instrumentation and a key but an emotional
register, and to start to see even these very nascent examples actually achieve that, is pretty
remarkable.
Well, just this week, Google's MusicLM got some competition, and that is Meta's AudioCraft.
Like MusicLM, AudioCraft is a way to generate audio and music from simple text prompts.
Meta's announcement post writes, imagine a professional musician being able to explore new compositions
without having to play a single note on an instrument,
or a small business owner adding a soundtrack
to their latest video ad on Instagram with ease.
That's the promise of AudioCraft,
our latest AI tool that generates
high-quality, realistic audio and music from text.
Now, interestingly, AudioCraft isn't actually just one model.
Instead, it's a combination of three models
called MusicGen, AudioGen, and EnCodec.
MusicGen, as you might guess,
is a music generation model.
Meta writes, music tracks are more complex
than environmental sounds,
and generating coherent samples with long-term structure is especially important when creating novel music pieces.
Now, AudioGen is for the generation of audio that isn't necessarily music, but is perhaps environmental.
Now, EnCodec is a little bit different.
They call it a state-of-the-art, real-time, high-fidelity audio codec leveraging neural networks.
EnCodec is trained specifically to compress any kind of audio and reconstruct the original signal with high fidelity.
And of course, when it comes to these tools, what we really want to know is how it actually sounds.
You had a chance to hear the MusicLM version, so let's try those same two prompts in the Hugging Face Spaces demo for AudioCraft.
So here we go, a simple classical melody evoking the feeling of fall, emphasis on a melodic top line played by flute.
Not bad. And now let's try a driving 1980-style electronic synthwave track in a minor key that sounds like it might be a dramatic interlude in a video game.
Again pretty good, although maybe a little bit more major key than minor key there.
Perhaps unsurprisingly, I think that which of these tools is better may, in spite of the name
of this episode, not really be the most important question. The more important question, to me, is likely
to be how fast people are going to put this code into end-user experiences that musicians can actually
get their hands on. In my conversations with musician friends, there are a lot of mixed feelings
about this stuff. On the one hand, it's scary. People are worried about their skills being commoditized
and reduced to something that a computer does with the press of a button. At the same time, there's a lot
of excitement and enthusiasm to get their hands on these models and actually start to
try to use them as part of composition. My strong, strong feeling is that in the same way that there are
tens of thousands of times more people who can play guitar, play piano, or sing than there are
people who can take those skills and turn them into great songwriting or composition, just making
it easier for anyone to dabble with music and audio creation isn't going to undermine the fact
that the best outputs are likely still going to come from the same people who are creating music
right now. This is obviously an extremely nascent part of the AI space, and I can't wait to see how it
develops. That's going to do it for today's AI Breakdown. Come join us on the AI Breakdown Discord,
drop in your best creations with these tools, and until next time, peace.
