The AI Daily Brief: Artificial Intelligence News and Analysis - The Frontier of AI-Generated Music Models

Starting point is 00:00:00 Today on the AI Breakdown, we're comparing Google's MusicLM with Meta's just-released AudioCraft. Before that on the brief, major fundraising in the AI chip space, and Goldman Sachs makes a big prediction when it comes to AI and GDP. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to breakdown.network for more information about our Discord, our YouTube, and our newsletter. Welcome back to the AI Breakdown Brief. All the AI headline news you need in around five minutes. If you'd like to follow along, go to theAIbreakdown.beehiiv.com. You can also find that link at breakdown.network. Today, we kick off with a story that is another report with some big prognostications.

Starting point is 00:00:43 This one comes from investment bank Goldman Sachs, and they say that by 2025, AI investment could represent up to 4% of US GDP. This comes from the article "AI investment forecast to approach $200 billion globally by 2025," which was published a couple days ago this week. Now, even in a world of bombastic AI reports, this one is bombastic. The article begins, Innovations in electricity and personal computers unleashed investment booms of as much as 2% of US GDP as the technologies were adopted into the broader economy. Now investment in artificial intelligence is ramping up quickly and could eventually have an even bigger impact on GDP, according to Goldman Sachs economics research. So first, let's look at what they're estimating those investment numbers in

Starting point is 00:01:28 AI are in the U.S., in China, and across the world. In 2021, the world saw $93.54 billion of investment in AI. That actually came down slightly in 2022 to $91.9 billion. But in 2023, Goldman estimates that global investment in AI will equal $110.19 billion. Of that, they estimate the U.S. will represent $56.83 billion, and China will represent $24.74 billion. By 2024, Goldman sees the world number going up to $132 billion, with the U.S. at around $68 billion, and China at around $30 billion, and the lines just go up from there. Part of what Goldman's argument rests upon is the fact that people are understanding, broadly speaking, just how much labor productivity could be increased by generative AI across a huge number of different industries and sectors. However, as Goldman puts it,

Starting point is 00:02:18 for large-scale transformation to happen, businesses will need to make significant upfront investment in physical, digital, and human capital to acquire and implement new technologies and reshape business processes. So effectively what they're saying is that there will be this period of transition where a huge amount of upfront money will have to be spent in order to retrofit the business world for the AI era. That means investing in actual infrastructure, which might mean things like compute infrastructure, but it also means reskilling employees, which of course comes with a price tag as well. When you view this $200 billion number, not as some generically inflating investment amount, but as representative of a key transitional period, I think it starts to look

Starting point is 00:02:59 a lot more realistic. To give evidence of that, they point to previous tech-driven productivity booms, which they say have been driven by large investment cycles. In the case of both electricity and personal computing, Goldman shows how increases in investment in the underlying infrastructure (in the case of personal computing, that means information processing equipment and software investment, and in the case of electricity, that means manufacturing equipment and plant investment) led then, with a lag of a few years or a half decade, to productivity growth that broadly speaking mirrored that level of investment. Goldman writes, AI-related investment is climbing from a relatively low starting point and will likely take a few years to have a major impact on the economy. Over the longer term,

Starting point is 00:03:37 AI-related investment could peak as high as 2.5 to 4% of GDP in the U.S. and 1.5 to 2.5% of GDP in other major AI leaders. They also point out that there has been a seminal shift just recently. Goldman writes, even though it will take time for AI to boost productivity, market interest in AI has already increased rapidly, with more than 16% of companies in the Russell 3000 mentioning the technology on earnings calls, up from less than 1% of those firms in 2016. Roughly half of that spike came after the release of ChatGPT in the fourth quarter of 2022. For those of you who are listening rather than watching, Goldman's chart of the percentage of companies that are mentioning AI in their earnings calls looks like the type of up-and-to-the-right graph that

Starting point is 00:04:18 gets VCs excited to get up in the morning. Now, what about who benefits from this $200 billion of new spend? Goldman says that they expect it to be concentrated in four key business segments. The first is companies that train and develop AI models. The second is those that supply the infrastructure, i.e. data centers. The third is companies that develop software to run AI-enabled applications. And the fourth is enterprise end users that pay for those software and cloud infrastructure services. Now, when it comes to CEO expectations around how AI will impact labor,

Starting point is 00:04:48 over the next year, a little over 20% of Fortune 500 CEOs surveyed said that they thought AI would decrease labor needs, while over 40% said that they thought that labor needs would be unchanged. However, zooming five years out, over 70% of Fortune 500 CEOs said that they anticipated that less labor would be needed, while under 20% said that labor needs would be unchanged. The TLDR on this report is that it's just another example of a major financial institution that thinks we are just at the very beginning of a major transformative AI cycle.

Starting point is 00:05:15 Speaking of AI-related investment, next up on today's AI Breakdown Brief, we are looking at yet another firm that is trying to, if not dethrone Nvidia in the AI chip space, at least provide some competition. Reuters reports that AI chip firm Tenstorrent has raised $100 million in fresh capital from investors including Hyundai and Samsung. Previous to this funding, Tenstorrent had raised $234.5 million and was already in the unicorn club, and that interest has been driven in part by the fact that it's led by chip industry veteran Jim Keller, who previously developed chips for companies including Tesla, Intel, and Apple. Over in the world of LLMs, Alibaba Cloud has released two open source models to compete with Meta's Llama 2. Now, interestingly, a couple weeks ago, Alibaba also announced that it would be supporting Meta's Llama 2, and so it appears from this news that Alibaba is not putting all of its chips in any one basket, even its own. It is worth noting, however, that we kind of have to put

Starting point is 00:06:10 open source in brackets, as it's similar to Llama 2's version of open source, which one might consider as mostly open source, however, with a few different restrictions. In the same way that Meta set some limits around needing special approvals and permissions should monthly active users exceed a certain number, Alibaba's Qwen-7B models come with a similar restriction. Finally, today, if there was any doubt where Google Search is moving, the company has just released a number of updates for their experimental Search Generative Experience. On Wednesday, August 2nd, they released a blog post called three new things you can do with generative AI in search. Now remember, the whole point of SGE is that in addition to Google's classic set of little blue links from all

Starting point is 00:06:51 around the web, there is a top section that is generated by AI that brings together information that perhaps lives within those links, but is custom curated by Google's artificial intelligence. The biggest update this week is a move into the world of multimodality. As Google writes, sometimes it's more powerful to understand something by seeing it. So we recently brought images to even more AI-powered overviews. For example, when you search for something like tiniest birds of prey, you'll quickly be able to reference what the bird looks like and get relevant information from the web. And over the next week, you'll begin to see videos within some overviews where it's helpful to see something in motion, such as a demonstration of a yoga pose

Starting point is 00:07:27 or how to get stains out of marble. Now, outside of that major substantive update, Google has also increased the speed with which results populate, saying that they've reduced the time it takes to generate AI overviews by half, and they're also adding small features like making sure that you understand when different links that are being recommended were published to help you decide which little pathways you want to follow down. There's an interesting bifurcation happening right now where the announcements from companies like OpenAI and Meta around generative AI tend to be big and technical and have implications for the way that the field develops.

Starting point is 00:07:58 Google, on the other hand, is extremely focused, it seems, on the productizing of AI for regular people. In other words, videos showing up in the SGE experience may not be as big of an announcement as something like Code Interpreter, but it might be relevant for a whole lot more people. Anyways, guys, that is going to do it for today's AI Breakdown Brief. Appreciate you listening or watching, and I'll be back soon with the main AI Breakdown.

Starting point is 00:08:22 Before we get into the main AI Breakdown, I want to tell you about today's sponsor, Supermanage. If you work in a professional setting, you probably have some version of a one-on-one meeting, either with the people that work for you or the people that you work with. Unfortunately, all too often, those one-on-one meetings become glorified catch-up calls. Don't you wish you could jump right to the stuff that really matters?

Starting point is 00:08:44 That's where Supermanage comes in. Supermanage AI magically distills your team's public Slack channels into a real-time brief on any employee, any time. Catch up on contributions, work in progress, challenges they're facing, sentiment, everything you need to show up ready for a truly meaningful conversation. And it's completely free. Visit supermanage.ai/breakdown today to start making the most of your one-on-ones.

Starting point is 00:09:07 And thanks again to Supermanage for sponsoring the AI Breakdown. Welcome back to the AI Breakdown. I think when we look back at the cultural battles that are fought around generative AI, one of the front lines is going to be in the realm of music. Music has always had a particular place when it comes to the disruption of new technologies. And in many ways, the introduction of Napster and the battle that ensued with music lawyers made that industry more prepared than many others for what would come over the next couple decades of technology.

Starting point is 00:09:40 I don't think it's an accident that music is the industry where the incumbent power, i.e. the record labels, probably maintains more control relative to the startups than just about any other space. What's more, we've already seen AI start to cause that sort of immune response once again. Earlier this year, Heart on My Sleeve, which of course was the beat that you heard over the intro of this episode,

Starting point is 00:10:01 if you're listening as a podcast, really scared the hell out of music executives because of how good it was. It was an AI version of Drake and an AI version of The Weeknd, and the song was an absolute banger, going completely viral, getting millions and millions and millions of streams and downloads, before effectively the entire music industry freaked out and started throwing legal weight around like crazy, getting it pushed off of platforms.

Starting point is 00:10:25 But of course, it's a digital artifact, and it's still all over YouTube and everywhere else. More playfully, but no less significantly, we are constantly getting AI remixes like this one of Frank Sinatra singing Lil Jon's Get Low that become sensations on TikTok or YouTube or wherever they're premiered, and serve if nothing else to remind us just how good this technology is. Earlier this year, Google released research on something that they called MusicLM. In the same way that a Midjourney or a Stable Diffusion is a text-to-image generator,

Starting point is 00:10:55 MusicLM was meant to be a text-to-music generator. For example, here's audio generated from the prompt, the main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls.

Starting point is 00:11:17 Now, part of what made MusicLM exciting is that Google actually made it available in its AI Test Kitchen. For those lucky enough to get access, yours truly included, you can play around with prompting MusicLM and seeing how it does. Here's an example of the prompt,

Starting point is 00:11:32 a simple classical melody evoking the feeling of fall, emphasis on a melodic top line played by flute. Now, of course, the example I gave has an inherent element of subjectivity, given that I asked it to evoke the feeling of fall, but both of these options that it came up with did a pretty good job of, one, being a simple classical melody, and two, emphasizing a melodic top line that was played by the flute. Frankly, subjectively, too,

Starting point is 00:12:34 I think they did an okay job with this evoking the feeling of fall piece. The way that the Test Kitchen works is once you've got your two examples, you give the one that you think did a better job a trophy, to help improve the model. In this case, I think the second did a slightly better job. Let's try another. A driving 1980s-style electronic synthwave track in a minor key that sounds like it might be a dramatic interlude in a video game. Both of these frankly nail it. I guess I'll give the trophy to the first just because I kind of like it better. And I think this is exactly what gets me so excited about this type of technology, is that music is so much about a feeling

Starting point is 00:13:48 and a vibe, that being able to describe not just an instrumentation and a key, but an emotional register and start to see even these very nascent examples actually achieve that is pretty remarkable. Well, just this week, Google's MusicLM got some competition, and that is Meta's AudioCraft. Like MusicLM, AudioCraft is a way to generate audio and music from simple text prompts. Meta's announcement post writes, imagine a professional musician being able to explore new compositions without having to play a single note on an instrument, or a small business owner adding a soundtrack

Starting point is 00:14:22 to their latest video ad on Instagram with ease. That's the promise of AudioCraft, our latest AI tool that generates high-quality, realistic audio and music from text. Now, interestingly, AudioCraft isn't actually just one model. Instead, it's a combination of three models called MusicGen, AudioGen, and EnCodec. MusicGen, as you might guess,

Starting point is 00:14:42 is a music generation model. Meta writes, music tracks are more complex than environmental sounds, and generating coherent samples on the long-term structure is especially important when creating novel music pieces. Now, AudioGen is for the generation of audio that isn't necessarily music, but is perhaps environmental. Now, EnCodec is a little bit different. They call it a state-of-the-art, real-time, high-fidelity audio codec leveraging neural networks. EnCodec is trained specifically to compress any kind of audio and reconstruct the original signal with high fidelity.

Starting point is 00:15:10 And of course, when it comes to these tools, what we really want to know is how it actually sounds. You had a chance to hear the MusicLM version, so let's try those same two prompts in the Hugging Face Spaces demo environment for AudioCraft. So here we go, a simple classical melody evoking the feeling of fall, emphasis on a melodic top line played by flute. Not bad. And now let's try a driving 1980s-style electronic synthwave track in a minor key that sounds like it might be a dramatic interlude in a video game. Again pretty good, although maybe a little bit more major key than minor key there. Perhaps unsurprisingly, I think when it comes to which of these tools is better, that may, in spite of the name of this episode, not really be the most important question. The more important question to me is likely to be how fast are people going to put this code into end-user experiences that musicians can actually

Starting point is 00:16:26 get their hands on? In my conversations with musician friends, there are a lot of mixed feelings about this stuff. On the one hand, it's scary. People are worried about their skills being commoditized and reduced to something that a computer does with the press of a button. At the same time, there's a lot of excitement and enthusiasm to get their hands on these models and actually start to try to use them as part of composition. My strong, strong feeling is that in the same way that there are tens of thousands of times the number of people who can play guitar or play piano or sing as there are people who can take those skills and turn them into great songwriting or composition, just making it easier for anyone to dabble with music and audio creation isn't going to undermine the fact

Starting point is 00:17:07 that the best outputs are going to likely still come from the same people who are creating music right now. This is obviously an extremely nascent part of the AI space, and I can't wait to see how it develops. That's going to do it for today's AI Breakdown. Come join us on the AI Breakdown Discord, drop in your best creations with these tools, and until next time, peace.
