The AI Daily Brief: Artificial Intelligence News and Analysis - The 5 Important AI Models Released This Week
Episode Date: April 11, 2024
This week witnessed the introduction of five notable AI models, highlighting the field's swift advancements. Udio emerged as a standout music generator, rivaling the previously leading Suno. Updates and new releases included Mixtral 8×22B, Google's publicly available Gemini 1.5 Pro with enhanced features, OpenAI's improved GPT-4 Turbo, and anticipated versions of Meta's Llama 3.
** CHECK OUT THE JUST-LAUNCHED SUPERINTELLIGENT PLATFORM - 300+ AI video tutorials https://besuper.ai/ **
ABOUT THE AI BREAKDOWN
The AI Breakdown helps you understand the most important news and discussions in AI.
Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Join the community: bit.ly/aibreakdown
Learn more: http://breakdown.network/
Transcript
Today on the AI Breakdown, we're looking at five new important models that have launched just this week.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to Breakdown.network for more information about our YouTube, our Discord, and our newsletter.
Hello, AI friends.
A quick note before we get into today's show. If you've been listening, of course, you will know that we launched Superintelligent yesterday.
And for those of you who don't know, Superintelligent is our new platform for teaching people how to use AI.
We've got more than 300 fun, fast, and most importantly useful, practical video tutorials.
Each of them has a companion project, which is a set of step-by-step instructions that allow people to actually use the tools and create the things that are being created in these tutorials.
And we're adding between 30 and 50 new video tutorials each week.
That has just gone live to the public. It is $20 a month for unlimited access.
If you're interested, check it out at besuper.ai or email me at nlw@besuper.ai if you have questions.
It was directly inspired by conversations that I had with all of you AI Breakdown listeners,
and so I'm really excited for you to check it out.
Because of that launch, though, this week has been a little bit chaotic,
and so today, instead of our normal brief-in-dom episode format,
we're actually doing a bit of a countdown of the five important models that have been launched this week,
including four LLMs as well as one music model that has everyone buzzing.
So without any further ado, let's dive in.
Welcome to the AI Breakdown.
Everyone anticipated, I think, that 2024 was going to be quite the year when it came
to AI competition and technical advances. And this week is a great example, a little snapshot that
shows just what passes for normal in 2024. We're going to look at five new models that were
released this week or updated in some cases, four of which are LLMs, but the first of which is a new
music generator called Udio that has absolutely captured the AI Twittersphere's attention.
For the last couple of months, the big hot player in AI music has been Suno. We've talked about
them here on the show, and they really represented a major advance from things that we had seen
before. However, a couple of weeks ago, we started seeing little drips and drops that there was
something even better coming. Last week, the first contraband examples of this music generator's
output started to be published to X, and this week we finally got the service itself, which again is
called Udio. Now, just to give a quick example of what this thing can do, let's listen to a snippet
of Dune, the Broadway musical.
Pretty amazing stuff.
Consequently, there have been a ton of posts like this one from Min Choi, which reads,
This is wild. Udio just dropped and it's like Sora for music.
The music are insane quality, 100% AI.
I thought it was interesting, though, when Bilawal Sidhu tweeted,
Anyone else notice this pattern with new AI features?
One, cool new tool drops.
Two, early adopters rake in insane views because it's novel.
Three, everyone jumps on the bandwagon.
Four, overuse equals oversaturation equals no one cares anymore equals views drop.
Five, it always comes back to the creators making something truly unique.
It's the quintessential hype cycle in three months, and it only seems to be compressing further.
It's Suno now, but Sora will be the same when it hits GA.
If anyone can do it on demand, it loses value because it's not special anymore.
At that point, it's back to creative self-expression, except now you're creating at a higher
level of abstraction with your newfound superpowers to make something unique that cuts through
the noise.
Now, hold aside the specifics of whether this is happening to Suno or will happen
to Udio.
I think it gets at a profound truth about the world that we're moving into.
In a world where so much can be created so easily, the quality of what actually breaks through
and captures attention is going to consequently have to go up incredibly. It is extremely unlikely
to me that the best songs that come even from a sophisticated application like Udio are going
to come from a random person on Twitter who just happened to stumble into it versus someone
who is probably a musician or a songwriter already adapting to and mastering a new medium.
Now, while by and large, most of the folks that I've seen have ranked Udio ahead of Suno at this
point, developer Nick Dobos disagrees. He writes, Udio has better audio quality, but I haven't
found a single song I've liked, missing a certain something-something. Meanwhile, I have around five
Suno songs stuck in my head right now. Another AI creator, Boris, followed up and said, same. Suno generates
complete bangers. Udio? Interesting snippets, no control over the lyrics, too experimental styles in
general. Getting at the broader point that I was just making, Nick Dobos also tweeted,
With the rise of Gen AI, we are going to see an interesting clash of cultures as new creatives start and learn with Gen AI tools first. A big cohort of coders, visual artists, musicians and video makers will go Midjourney, DALL-E, Suno, Udio, and Runway before Photoshop, Canva, Final Cut, Ableton, and FruityLoops. Expect some wildly different styles as this younger cohort learns creativity at a higher abstraction level, and then learns traditional media and productivity tools backwards. Also expect 90% of the previous cohort to cry and whine about it, this isn't real art. The other 10% will embrace the new tech and build the most amazing things you've ever
seen by combining new and old techniques. I could not agree more that this is likely to be the
pattern. And whatever you think about this, if you haven't had a chance to play with Udio yet
or go listen to some of the creations, it is highly worth taking a few minutes to do so.
From there, let's move into our LLM announcements. The first wasn't really an announcement at all.
Mistral, as they have classically come to do, just dropped a link to a release of a new model on
Twitter without any further explanation. The model is called 8x22B MoE, and Gigazine writes,
Although details are unknown, 8x22B MoE may have more than three times the number of parameters of the model Mixtral 8x7B,
which has been shown to outperform GPT-3.5 and Llama 2 70B in many benchmarks.
They also add, the total number of parameters may be up to 176 billion.
The context length that can be handled is said to be 65K.
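The "more than three times" and "up to 176 billion" figures quoted above come from simple multiplication, which can be sketched as follows. This is a rough illustration only: it assumes a name like "8x22B" reads as 8 experts of about 22 billion parameters each, and the helper name `naive_total_params_b` is made up for this example. The product is an upper bound, since experts in a mixture-of-experts model typically share some layers, which is why the quote says "up to."

```python
# Naive parameter arithmetic for mixture-of-experts (MoE) model names.
# Assumption: "NxMB" means N experts of roughly M billion parameters each.
# The product is an upper bound, not an official parameter count.

def naive_total_params_b(num_experts: int, params_per_expert_b: float) -> float:
    """Return the naive upper-bound total in billions of parameters."""
    return num_experts * params_per_expert_b

new_model = naive_total_params_b(8, 22)  # the new 8x22B release
old_model = naive_total_params_b(8, 7)   # the earlier 8x7B model
ratio = new_model / old_model

print(f"8x22B naive total: {new_model:.0f}B")
print(f"8x7B naive total:  {old_model:.0f}B")
print(f"ratio: {ratio:.2f}x")
```

Under that reading, the new model's naive total is 176 billion against 56 billion for the earlier one, a ratio a little over three.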
Now, the open source community is really excited about this one, not only because it appears that it might have increased capacity,
but also because the last Mistral model that was announced was their first closed source model,
which was to be distributed exclusively through Microsoft.
At the time, I talked about how most of the people that I saw in the open source community
were willing to not just give them the benefit of the doubt,
but understand that they had to fund the business somehow,
but still, seeing them continue to advance on the open source side
has a lot of people breathing a sigh of relief and staying up all night to start hacking.
Today's podcast is brought to you by Plum.
Is your product team struggling to keep up with the incredible pace of AI development?
Are you tired of spending countless engineering hours just to test out small prompt changes in your product?
Thankfully, there's Plum.
Build cutting-edge AI experiences for your users in a fraction of the time.
Say goodbye to the slow, tedious process of hand coding and hello to the future of AI development.
Get ahead of your competition and start moving as fast as AI does.
Check out useplum.com and shoot me a message to get early access.
Another surprise release a little bit earlier in the year came from Google.
You'll remember that back in December, facing down intense pressure to try to keep up or catch up with OpenAI,
Google announced its suite of Gemini models. The problem with that announcement was that their biggest
and most performant model, the one that they said actually beat GPT-4 on many benchmarks, wasn't going to be
available until sometime in 2024. When that Ultra model did finally come out, Google surprised everyone
by just about a week later announcing Gemini 1.5 Pro, an even more advanced model that most notably
was said to have a 1 million token context window, completely blowing the doors off of everything we had seen
before. At the time, Gemini 1.5 Pro was only available to developers through Google's AI Studio,
but this week, Gemini 1.5 Pro has moved into a public preview period and can be accessed via
the Gemini API. In addition to just being more widely available, they've also added
native audio or speech understanding capabilities, as well as a file API to make it easy
to handle files. Their announcement post writes, we're also launching new features like system
instructions and JSON mode to give developers more control over the model's outputs. Deedy at Menlo Ventures
writes, Gemini 1.5 Pro's video understanding is the most underrated thing in AI. In 50 seconds it
quote-unquote saw an 11-minute YouTube video, around 175K tokens of the most iconic moments in sports,
and was able to, perfectly, to my knowledge, list all 18 of the moments. There is no other
video AI this good. What Deedy is getting at is that one of the reasons people have been so excited
about Gemini 1.5 Pro and its longer context window is that it really does open up totally new use
cases that just aren't possible with shorter input lengths. Not to be totally
outdone, OpenAI announced what they called a, quote, majorly improved GPT-4 Turbo model,
first available through the API and slowly rolling out into ChatGPT as well. Now, of course,
when a lab calls something majorly improved, it opens it up for people to ask, is it majorly
improved? Professor Ethan Mollick writes, as is usual with AI, a, quote, majorly improved GPT-4
model comes with no real change logs or release notes. It's going to be better at many things
and worse at some other things and also different in some ways you aren't expecting. Or that just
might be in your head. AI is weird. He goes on, benchmarking is especially hard when we don't even
agree as to what the words involved mean. I believe that reasoning has been improved, but I'm not sure what that
actually translates to. The only way to figure it out is to put in hours to test yourself?
Still, from a marketing standpoint, OpenAI is going hard on what the new capacities mean.
They shared a thread on Twitter slash X, where they showed off use cases of the new GPT-4 Turbo that
include that very impressive Devin AI software engineering assistant. They pointed to a nutrition app that
used GPT-4 Turbo with Vision to get better insights about what was in food, and they talked about
tldraw, which is an incredibly advanced AI-powered UI designing tool that's gotten a lot of buzz
on Twitter recently as well. Now, it appears that one of the things that OpenAI is excited about
is the increase in code generation performance with this new GPT-4 Turbo. Some are skeptical, however.
Benjamin De Kraker, for example, tweets, this is why OpenAI staff have been acting all cheeky.
Newest GPT-4 Turbo update is doing very well on coding benchmarks, very well indeed. Here's the thing.
I don't really believe it.
For example, these also show the old version of GPT-4
beating Claude 3 Opus at code generation.
In my real-world experience, that's not true.
Opus has been significantly better.
TLDR, the upgrade is probably good,
but I'm not convinced it's actually back at number one for code.
Real world is what matters, which is always hard to measure.
Now, on the flip side is Pietro Schirano, who writes,
side-by-side comparison between the latest version of GPT-4 Turbo and the previous one,
0125-preview.
Not only is the new version less verbose and it goes directly into code,
but it also rightfully so decides to add a flag to download the highest quality video.
Smart.
This, of course, in some ways agrees with what Benjamin was saying,
that it's not going to be evaluations that matter,
but ultimately what people find in practice.
Now, one meta note that was really interesting to me
comes once again from Bilawal Sidhu, who writes,
is it just me or was today the first time OpenAI was unable to overshadow a Google AI announcement?
Gemini 1.5 Pro is pretty wild.
Just dropped in an audio file, an hour-long video interview,
and now it's helping me package it up for YouTube.
Multimodality plus one million context window is clutch. Throw some thumbnails at it and it has the prior
context to help me decide the best option. Titles, tags, description, promotional posts,
all doable with rich context, not just transcripts. So the point here, which was something that
I noticed as well, is that OpenAI has done a very good job in general of undercutting everyone
else's announcements with more impressive things of their own. In fact, I could be misremembering
this, but I'm pretty sure that Sora came out right around the same time that Gemini 1.5
Pro was first announced and it just totally sucked all of the oxygen out of the room. I tend to
agree that from my observational point of view, Gemini 1.5 Pro got at least as much buzz as
GPT-4 Turbo's upgrades, which I think has a lot to do with how much people are just waiting now
for OpenAI to actually make a big jump forward. I don't believe right now it's that OpenAI
has distinctly found themselves behind, although I do think that many people believe that
Claude 3 is the most performant LLM right now, but more just that they've been at parity for so long,
something just feels strange. For my money, though, the thing that had people the most excited so far this
week were reports that came out on Monday that Meta was planning to launch some versions of Llama 3 as
early as next week. This was initially a report from The Information. Their source was an
employee who said that the company was planning to launch two small versions of Llama 3 as a precursor
to the launch of the biggest version, which was expected this summer. This was confirmed at an event
in London on Tuesday. Said Nick Clegg, Meta's president of global affairs, within the next month,
actually less, hopefully in a very short period of time, we hope to start rolling out our new suite
of next-generation foundation models, Llama 3. There will be a number of different
models with different capabilities, different versatility, released during the course of this year
starting really very soon. Said Meta Chief Product Officer Chris Cox, the plan is to power multiple
products across Meta with Llama 3. One thing that's shifted with Meta's discussion, all the way
from Zuckerberg on down, is that they are no longer content to just be the best open source model.
They want to be state-of-the-art, competing against any model. Reinforcing that was Joelle Pineau,
the vice president of AI research, who said, our goal over time is to make a Llama-powered Meta AI be
the most useful assistant in the world.
And so between this Meta Llama 3 announcement, as well as the new Mistral model, the open source
community had a lot to be excited about this week.
Anyways, guys, we'll wrap there, but what I will say is that part of why this week feels
so reflective to me of just where we are, is that all of these big announcements, while
cool, aren't getting people to jump up and down and scream and say, wow, what a crazy week,
although I'm sure some of the other YouTubers will have titles to that effect.
Instead, this feels, if certainly not like a slow week, then certainly still something
that is to be expected.
In other words, not too many standard deviations away from the mean.
So that's the world you're living in now.
Five big models coming out a week, and everyone just excitedly going on with their life.
Anyways, that is going to do it for today's AI Breakdown.
If you haven't yet, check out besuper.ai.
It's the fast, fun, and most importantly, useful way to learn AI through video tutorials and companion how-tos.
Appreciate you listening or watching as always, and until next time, peace.
