The AI Daily Brief: Artificial Intelligence News and Analysis - The 5 Important AI Models Released This Week

Episode Date: April 11, 2024

This week witnessed the introduction of five notable AI models, highlighting the field's swift advancements. Udio emerged as a standout music generator, rivaling the previously leading Suno. Updates and new releases included Mixtral 8×22B, Google's Gemini 1.5 Pro (now publicly available with enhanced features), OpenAI's improved GPT-4 Turbo, and anticipated versions of Meta's Llama 3. ** CHECK OUT THE JUST-LAUNCHED SUPERINTELLIGENT PLATFORM - 300+ AI video tutorials https://besuper.ai/ ** ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/

Transcript
Starting point is 00:00:00 Today on the AI Breakdown, we're looking at five new important models that have launched just this week. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information about our YouTube, our Discord, and our newsletter. Hello, AI friends. A quick note before we get into today's show: if you've been listening, you will of course know that we launched Superintelligent yesterday. For those of you who don't know, Superintelligent is our new platform for teaching people how to use AI. We've got more than 300 fun, fast, and, most importantly, useful, practical video tutorials. Each of them has a companion project, which is a set of step-by-step instructions that allow people to actually use the tools and create the things that are being created in these tutorials.
Starting point is 00:00:52 And we're adding between 30 and 50 new video tutorials each week. It has just gone live to the public. It is $20 a month for unlimited access. If you're interested, check it out at besuper.ai, or email me at nlw@besuper.ai if you have questions. It was directly inspired by conversations that I had with all of you AI Breakdown listeners, so I'm really excited for you to check it out. Because of that launch, though, this week has been a little bit chaotic, so today, instead of our normal brief-and-main-episode format, we're doing a bit of a countdown of the five important models that have launched this week,
Starting point is 00:01:22 including four LLMs as well as one music model that has everyone buzzing. So without any further ado, let's dive in. Welcome to the AI Breakdown. Everyone anticipated, I think, that 2024 was going to be quite the year when it came to AI competition and technical advances. And this week is a great example, a little snapshot that shows just what passes for normal in 2024. We're going to look at five new models that were released this week, or updated in some cases, four of which are LLMs, but the first of which is a new music generator called Udio that has absolutely captured the AI Twittersphere's attention.
Starting point is 00:01:58 For the last couple of months, the big hot player in AI music has been Suno. We've talked about them here on the show, and they really represented a major advance from things that we had seen before. However, a couple of weeks ago, we started seeing little drips and drops that there was something even better coming. Last week, the first contraband examples of this music generator's output started to be published to X, and this week we finally got the service itself, which again is called Udio. Now, just to give a quick example of what this thing can do, let's listen to a snippet of Dune, the Broadway musical. Pretty amazing stuff.
Starting point is 00:02:35 Consequently, there have been a ton of posts like this one from Min Choi, which reads: This is wild. Udio just dropped and it's like Sora for music. The music is insane quality, 100% AI. I thought it was interesting, though, when Bilawal Sidhu tweeted: Anyone else noticed this pattern with new AI features? One, cool new tool drops. Two, early adopters rake in insane views because it's novel. Three, everyone jumps on the bandwagon.
Starting point is 00:02:57 Four, overuse equals oversaturation equals no one cares anymore equals views drop. Five, it always comes back to the creators making something truly unique. It's the quintessential hype cycle in three months, and it only seems to be compressing further. It's Suno now, but Sora will be the same when it hits GA. If anyone can do it on demand, it loses value because it's not special anymore. At that point, it's back to creative self-expression, except now you're creating at a higher level of abstraction with your newfound superpowers to make something unique that cuts through the noise.
Starting point is 00:03:23 Now, set aside whether, in these specific examples, this is happening to Suno or will happen to Udio. I think it gets at a profound truth about the world that we're moving into. In a world where so much can be created so easily, the quality of what actually breaks through and captures attention is consequently going to have to go up incredibly. It seems extremely unlikely to me that the best songs, even from a sophisticated application like Udio, are going to come from a random person on Twitter who just happened to stumble into it, rather than from someone who is probably already a musician or songwriter, adapting to and mastering a new medium.
Starting point is 00:03:56 Now, while by and large most of the folks that I've seen have ranked Udio ahead of Suno at this point, developer Nick Dobos disagrees. He writes: Udio has better audio quality, but I haven't found a single song I've liked; it's missing a certain something-something. Meanwhile, I have around five Suno songs stuck in my head right now. Another AI creator, Boris, followed up and said: Same. Suno generates complete bangers. Udio? Interesting snippets, no control over the lyrics, too experimental in style in general. Getting at the broader point that I was just making, Nick Dobos also tweeted: With the rise of gen AI, we are going to see an interesting clash of cultures as new creatives start and learn with gen AI tools first. A big cohort of coders, visual artists, musicians, and video makers will go Midjourney, DALL-E, Suno, Udio, and Runway before Photoshop, Canva, Final Cut, Ableton, and FruityLoops. Expect some wildly different styles as this younger cohort learns creativity at a higher abstraction level and then learns traditional media and productivity tools backwards. Also expect 90% of the previous cohort to cry and whine about it: this isn't real art. The other 10% will embrace the new tech and build the most amazing things you've ever seen by combining new and old techniques. I could not agree more that this is likely to be the
Starting point is 00:04:59 pattern. And whatever you think about this, if you haven't had a chance to play with Udio yet or go listen to some of the creations, it is highly worth taking a few minutes to do so. From there, let's move into our LLM announcements. The first wasn't really an announcement at all. Mistral, as they have classically come to do, just dropped a link to a release of a new model on Twitter without any further explanation. The model is called 8x22B MoE, and Gigazine writes: Although details are unknown, 8x22B MoE may have more than three times the number of parameters of Mixtral 8x7B, which has been shown to outperform GPT-3.5 and Llama 2 70B in many benchmarks. They also add that the total number of parameters may be up to 176 billion.
Starting point is 00:05:39 The context length that can be handled is said to be 65K. Now, the open source community is really excited about this one, not only because it appears to have increased capacity, but also because the last Mistral model that was announced was their first closed source model, which was to be distributed exclusively through Microsoft. At the time, I talked about how most of the people that I saw in the open source community were willing not just to give them the benefit of the doubt, but to understand that they had to fund the business somehow. Still, seeing them continue to advance on the open source side
Starting point is 00:06:05 has a lot of people breathing a sigh of relief and staying up all night to start hacking. Today's podcast is brought to you by Plum. Is your product team struggling to keep up with the incredible pace of AI development? Are you tired of spending countless engineering hours just to test out small prompt changes in your product? Thankfully, there's Plum. Build cutting-edge AI experiences for your users in a fraction of the time. Say goodbye to the slow, tedious process of hand coding and hello to the future of AI development. Get ahead of your competition and start moving as fast as AI does.
Starting point is 00:06:36 Check out Useplum.com and shoot me a message to get early access. Another update this week came from Google, building on a surprise release from a little bit earlier in the year. You'll remember that back in December, facing down intense pressure to try to keep up or catch up with OpenAI, Google announced its suite of Gemini models. The problem with that announcement was that their biggest and most performant model, the one that they said actually beat GPT-4 on many benchmarks, wasn't going to be available until sometime in 2024. When that Ultra model did finally come out, Google surprised everyone by announcing, just about a week later, Gemini 1.5 Pro, an even more advanced model that most notably was said to have a 1 million token context window, completely blowing the doors off of everything we had seen
Starting point is 00:07:17 before. At the time, Gemini 1.5 Pro was only available to developers through Google's AI Studio, but this week Gemini 1.5 Pro moved into a public preview period and can be accessed via the Gemini API. In addition to being more widely available, it has also gained native audio (speech) understanding capabilities, as well as a File API to make it easy to handle files. Their announcement post reads: We're also launching new features like system instructions and JSON mode to give developers more control over the model's outputs. Deedy at Menlo Ventures writes: Gemini 1.5 Pro's video understanding is the most underrated thing in AI. In 50 seconds it "saw" an 11-minute YouTube video, around 175K tokens of the most iconic moments in sports,
Starting point is 00:07:59 and was able to list all 18 of the moments perfectly, to my knowledge. There is no other video AI this good. What Deedy is getting at is that one of the reasons people have been so excited about Gemini 1.5 Pro and its longer context window is that it really does open up totally new use cases that just aren't possible with shorter input lengths. Not to be totally outdone, OpenAI announced what they called a "majorly improved" GPT-4 Turbo model, first available through the API and slowly rolling out into ChatGPT as well. Now, of course, when a lab calls something majorly improved, it opens it up for people to ask: is it majorly improved? Professor Ethan Mollick writes: As is usual with AI, a "majorly improved" GPT-4
Starting point is 00:08:37 model comes with no real change logs or release notes. It's going to be better at many things and worse at some other things, and also different in some ways you aren't expecting. Or that just might be in your head. AI is weird. He goes on: Benchmarking is especially hard when we don't even agree on what the words involved mean. I believe that reasoning has been improved, but I'm not sure what that actually translates to. The only way to figure it out is to put in hours testing it yourself? Still, from a marketing standpoint, OpenAI is going hard on what the new capabilities mean. They shared a thread on Twitter/X where they showed off use cases of the new GPT-4 Turbo that include that very impressive Devin AI software engineering assistant. They pointed to a nutrition app that
Starting point is 00:09:15 used GPT-4 Turbo with Vision to get better insights about what was in food, and they talked about tldraw, which is an incredibly advanced AI-powered UI design tool that's gotten a lot of buzz on Twitter recently as well. Now, it appears that one of the things OpenAI is excited about is the increase in code generation performance with this new GPT-4 Turbo. Some are skeptical, however. Benjamin De Kraker, for example, tweets: This is why OpenAI staff have been acting all cheeky. Newest GPT-4 Turbo update is doing very well on coding benchmarks, very well indeed. Here's the thing: I don't really believe it. For example, these also show the old version of GPT-4
Starting point is 00:09:49 beating Claude 3 Opus at code generation. In my real-world experience, that's not true. Opus has been significantly better. TL;DR: the upgrade is probably good, but I'm not convinced it's actually back at number one for code. Real world is what matters, which is always hard to measure. Now, on the flip side is Pietro Schirano, who writes: Side-by-side comparison between the latest version of GPT-4 Turbo and the previous one,
Starting point is 00:10:09 0125-preview. Not only is the new version less verbose and goes directly into code, but it also, rightfully so, decides to add a flag to download the highest quality video. Smart. This, of course, in some ways agrees with what Benjamin was saying: it's not evaluations that are going to matter, but ultimately what people find in practice. Now, one meta note that was really interesting to me
Starting point is 00:10:29 comes once again from Bilawal Sidhu, who writes: Is it just me, or was today the first time OpenAI was unable to overshadow a Google AI announcement? Gemini 1.5 Pro is pretty wild. Just dropped in an audio file, an hour-long video interview, and now it's helping me package it up for YouTube. Multimodality plus a one million token context window is clutch. Threw some thumbnails in and it has the prior context to help me decide the best option. Titles, tags, description, promotional posts, all doable with rich context, not just transcripts. So the point here, which was something that
Starting point is 00:10:57 I noticed as well, is that OpenAI has in general done a very good job of undercutting everyone else's announcements with more impressive things of their own. In fact, I could be misremembering this, but I'm pretty sure that Sora came out right around the same time that Gemini 1.5 Pro was first announced, and it just totally sucked all of the oxygen out of the room. I tend to agree that, from my observational point of view, Gemini 1.5 Pro got at least as much buzz as the GPT-4 Turbo upgrades, which I think has a lot to do with how much people are now just waiting for OpenAI to actually make a big jump forward. I don't believe right now that OpenAI has distinctly found themselves behind, although I do think that many people believe that
Starting point is 00:11:32 Claude 3 is the most performant LLM right now; it's more just that they've been at parity for so long that something just feels strange. For my money, though, the thing that had people the most excited so far this week was the reports that came out on Monday that Meta was planning to launch some versions of Llama 3 as early as next week. This was initially a report from The Information. Their source was an employee, who said that the company was planning to launch two small versions of Llama 3 as a precursor to the launch of the biggest version, which was expected this summer. This was confirmed at an event in London on Tuesday. Said Nick Clegg, Meta's president of global affairs: Within the next month, actually less, hopefully in a very short period of time, we hope to start rolling out our new suite
Starting point is 00:12:10 of next-generation foundation models, Llama 3. There will be a number of different models with different capabilities, different versatility, released during the course of this year, starting really very soon. Meta Chief Product Officer Chris Cox said the plan is to power multiple products across Meta with Llama 3. One thing that's shifted in Meta's discussion, all the way from Zuckerberg on down, is that they are no longer content to just be the best open source model. They want to be state of the art, competing against any model. Reinforcing that was Joelle Pineau, the vice president of AI research, who said: Our goal over time is to make a Llama-powered Meta AI be the most useful assistant in the world.
Starting point is 00:12:43 And so between this Meta Llama 3 announcement and the new Mistral model, the open source community had a lot to be excited about this week. Anyways, guys, we'll wrap there, but what I will say is that part of why this week feels so reflective to me of just where we are is that all of these big announcements, while cool, aren't getting people to jump up and down and scream and say, wow, what a crazy week, although I'm sure some of the other YouTubers will have titles to that effect. Instead, this feels, if certainly not like a slow week, then still like something that is to be expected.
Starting point is 00:13:11 In other words, not too many standard deviations away from the mean. So that's the world you're living in now: five big models coming out in a week, and everyone just excitedly going on with their lives. Anyways, that is going to do it for today's AI Breakdown. If you haven't yet, check out besuper.ai. It's the fast, fun, and, most importantly, useful way to learn AI through video tutorials and companion how-tos. Appreciate you listening or watching as always, and until next time, peace.
