The AI Daily Brief: Artificial Intelligence News and Analysis - Llama 3 Is Here (And Seemingly Better Than Expected)
Episode Date: April 18, 2024
Meta has released the first of its Llama 3 models, enhancing the landscape of open-source large language models. These models, including the initial 8B and 70B versions, promise advanced capabilities ...and integration into Meta's consumer products, solidifying Meta's commitment to leading in AI innovation. The release indicates Meta's strategic push to compete in the open-source domain and extend these high-level AI functionalities through Meta AI across its platforms, setting a new standard for accessibility and performance in the AI community.
Transcript
Today on the AI Breakdown, the anticipated Llama 3 release has just happened.
Before that on the brief, Microsoft's VASA-1 model is turning heads.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to Breakdown.network for more information about our YouTube, our Discord, and our newsletter.
Welcome back to the AI Breakdown Brief, all the AI headline news you need in around five minutes.
Yesterday, everyone started talking about a new model from Microsoft
Research called VASA-1. Bindu Reddy summed it up this way. She called it the first AI-generated
video that looks super real and said it takes a single portrait photo and speech audio and produces
a hyper-realistic talking face video with precise lip-audio sync, lifelike facial behavior, and
naturalistic head movements generated in real time. This is amazing given that the AI-generated
video looks very real. So for those of you who are watching this rather than listening to it,
let's check out a quick example. You ever had, maybe you're in that place right now where you wanted to
turn your life around and you know somewhere deep in your soul there could be some decision.
It really is incredibly lifelike. And indeed, that's the way that they describe it in their research paper.
The paper is called VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time.
TL;DR, they write, single portrait photo plus speech audio equals hyper-realistic talking face video
with precise lip-audio sync, lifelike facial behavior, and naturalistic head movements generated in
real time. In the abstract, they write, the model is capable
of not only producing lip movements that are exquisitely synchronized with the audio,
but also capturing a large spectrum of facial nuances and natural head motions that contribute
to the perception of authenticity and liveliness. The core innovations, they say, include a holistic
facial dynamics and head movement generation model that works in a face latent space,
and the development of such an expressive and disentangled face latent space using videos.
Part of what they're so excited about is that the model not only delivers realistic,
lifelike videos, but supports, as they write, 512 by 512 videos at up to 40 frames per second with negligible
starting latency. What that means is that conceivably we're not too far away, not only from
lifelike AI-generated videos, but from actually being able to interact with these lifelike avatars in real
time. Now, in addition to producing just lifelike videos from real photos, they say that their method
also has the ability to handle photo and audio inputs that are out of their training distribution.
For example, they can handle illustrations or paintings, singing audio, or non-English speech,
none of which were present in the training set.
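To put the 512 by 512, 40 frames per second figure in perspective, here is a rough back-of-the-envelope sketch in Python. The function name and the assumption of three color channels (RGB) are illustrative and not taken from the paper; the sketch only works out the time budget per frame and the raw output volume those two numbers imply.

```python
def frame_budget(width: int, height: int, fps: int, channels: int = 3) -> dict:
    """Per-frame time budget and raw output volume implied by a resolution and frame rate.

    channels=3 (RGB) is an assumption for illustration, not a detail from the paper.
    """
    pixels_per_frame = width * height
    return {
        # Milliseconds available to produce each frame at the given frame rate.
        "ms_per_frame": 1000.0 / fps,
        "pixels_per_frame": pixels_per_frame,
        # Individual color values that must be produced every second.
        "values_per_second": pixels_per_frame * channels * fps,
    }


budget = frame_budget(512, 512, 40)
print(budget)
```

At 40 frames per second, the model has roughly 25 milliseconds to produce each frame, which is the kind of budget a live, interactive avatar would need to hit.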
Again, for those of you watching, here's an example of that lack of latency, which makes it
opportune for real time.
But you know what I decided to do?
I decided to focus all my attention, all my time on listening.
So instead of doing something else, I just listened, listened and listened,
because I'm a true believer that if you're really bad at something.
Now the paper points out that this is not a technology without risk.
The closer we get to truly lifelike, and especially real-time generation like this,
the greater the chance that it's misused for disinformation or misinformation,
or for impersonating people without their permission.
And at this time, they write,
we have no plans to release an online demo, API, product,
additional implementation details, or any related offerings
until we are certain that the technology will be used responsibly
and in accordance with proper regulations.
And ultimately, the biggest caveat or asterisk on any of this
is that none of us have actually gotten a chance to play with it.
This is just a paper with cherry-picked examples
that, while super impressive,
theoretically might not represent the standard output
that you would get if everyone was allowed to use this.
Still, it does seem like a fairly significant jump
in this sort of avatar capacity,
and so it is very much something to be watching for in the future.
Next up, we get an update from OpenAI.
The company has released a new version of its Assistants API.
Now, the Assistants API is OpenAI's tool that allows people to build agent-like assistants
that have some specific purpose that they use ChatGPT for.
One of the big changes is, as Brian Roemmele writes, the new OpenAI Assistants API can access
up to 10,000 documents in a vector database RAG.
It's quite useful, and I have already mentioned it to clients that need larger-scale RAGs.
Sully Omarr from Cognosys says, new OpenAI assistant updates are really good, hard to bet against
them.
Tested retrieval and it's insanely fast and almost instant.
Seems like they want a piece of the RAG pie as well.
At what point does it become cheaper to use them versus building your own?
For those of you drowning in acronyms, RAG stands for retrieval-augmented generation
and basically refers to the idea of an LLM pulling from a specific set of information for some specific purpose.
This is a popular strategy right now for, for example, enterprises that want some version of an LLM
to be able to pull from their proprietary information in order to give people in the company insight that is specific to the company.
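For anyone who wants to see the idea in miniature, here is a minimal sketch of retrieval-augmented generation in Python. It is not how OpenAI's Assistants API works internally. The word-count similarity is a toy stand-in for a real embedding model and vector database, and the documents, function names, and prompt wording are all invented for illustration. The sketch stops at building the prompt that would be handed to an LLM.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words counts; a toy stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list[str], k: int = 2) -> str:
    """Place the retrieved context ahead of the question, ready to send to an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical company documents, invented for this example.
docs = [
    "Expense reports are due on the fifth business day of each month.",
    "The VPN client must be updated before connecting from home.",
    "Vacation requests need manager approval two weeks in advance.",
]
question = "When are expense reports due?"
top = retrieve(question, docs, k=1)
prompt = build_prompt(question, docs, k=1)
print(prompt)
```

The retrieval step is what a hosted offering takes off your hands: a production system would swap the word counts for learned embeddings and the sorted list for a vector index, but the shape of the pipeline (find relevant text, then prepend it to the question) stays the same.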
There are tons and tons of solutions for how an enterprise might go about that, and Sully's pointing
out that the better that OpenAI and ChatGPT get at it, the more incentive that companies have
just to stay in that ecosystem. Of course, ultimately, it's not just a question of technical
capacity, but a question of data trust. And my perception, at least, is that the biggest reason
that enterprises are choosing not to use OpenAI, and instead to spin up their own models or customize open
source models, is because of those concerns around privacy. Still, like Sully says, the fact that this is an
easy and fast approach that works really well could shift the balance of that conversation a little bit.
Lastly today, an interesting one from the world of entertainment. There's been a lot of discourse in the
entertainment sector and in Hollywood about artificial intelligence. Last year, the writers' strike and
the SAG strike were some of the first instances in which concerns around AI really started to get
into the mainstream. My question at the time was not so much if the concerns were real or not, but
whether, if the choice was on the one hand try to ban or prohibit all this technology or on the
other try to profit from it, how long would it take before Hollywood shifted over to the
try-to-profit-from-it side? Apparently, the Creative Artists Agency, CAA, one of the big Hollywood talent agencies,
is thinking in a similar way and testing a new program called CAA Vault that allows the talent
they represent, or at least a small handful of A-listers that are testing it, to create a digital
double of themselves that they can profit from. Said Alexandra Shannon, CAA's head of
corporate strategic development, on one hand, there's concerns that technology is being misused
to exploit name, image, likeness, voice, body of work without consent. But we also recognize that it's
creating opportunities for talent and an explosion of creativity in so many different ways. Shannon said,
these technologies should not devalue the human. If somebody's digital likeness is being used in a
campaign instead of them in person, it is still the value of that person and what they stand for as a
representative of a brand. Of course, the concern that many have is that if the marginal cost of
production comes down, and there is a much larger supply of celebrity spokespeople in the form of
these digital doubles, what will that actually do to the value of any individual instance of that?
It seems like a pretty price-deflationary force. Of course, the Brad Pitt brand, for example,
is going to continue to retain value, but if there are digital Brad Pitts running around everywhere,
how much is each individual instance of that going to be able to charge? Ultimately, those are
questions for CAA, not for us, and so that is going to do it for the AI Breakdown Brief.
Next up, the main AI Breakdown.
Today's podcast is brought to you by Plum.
Is your product team struggling to keep up with the incredible pace of AI development?
Are you tired of spending countless engineering hours just to test out small prompt changes in your product?
Thankfully, there's Plum.
Build cutting-edge AI experiences for your users in a fraction of the time.
Say goodbye to the slow, tedious process of hand coding and hello to the future of AI development.
Get ahead of your competition and start moving as fast as AI does.
Check out Useplum.com and shoot me a message to get early access.
Attention AI Breakdown listeners.
Consensus 2024 marks the 10th gathering for all things crypto, blockchain, and Web 3.
However, importantly, this year's agenda will also dive deep into AI-driven transformation.
And the speaker lineup includes the leading minds and innovators at the forefront of this digital renaissance.
Don't miss the Consensus AI Summit to cut through the hype to find where true transformation and opportunity lie.
Listeners to this show can get 15% off registration with the code AIBREAKDOWN.
Visit Consensus24.com to learn more.
Some of the folks who will be at Consensus this year include Guillaume Verdon,
aka Beff Jezos, founder and CEO of Extropic, as well as spiritual leader
of the accelerationist movement; Neal Stephenson, co-founder of Lamina1; and Brendan Eich,
the CEO of Brave Software.
Again, go to Consensus24.coindesk.com to learn more and get 15% off registration with the code
AIBREAKDOWN.
Welcome back to the AI Breakdown.
Last week, reports came out that Meta was on the verge of releasing its latest open-source
LLM, Llama 3.
Now at first this was an unconfirmed rumor, but then within a couple days, Meta seemed to
indicate that yes, this was coming, or at least a set of small versions of their next
model were coming in advance of the largest version, which would be coming out over the
summer.
All week then, we have been waiting with bated breath to see what Meta would put out, with
tons and tons of speculation on just how good it would be.
A couple hours before I recorded this episode, we started to get hints that Llama 3
was about to be dropped.
First we saw Llama 3 8B Instruct listed on the Azure
marketplace with little nuggets of information like this line. The fine-tuned versions are optimized
for dialogue use cases. And then on replicate.com, we saw pricing for four different models: Llama 3 70B,
Llama 3 70B chat, Llama 3 8B, and Llama 3 8B chat. People got in their last polls on how good others anticipated
this to be. Yam Peleg wrote, final moments to guess, Llama 3 will be, and then gave the options
state-of-the-art open-source software, same level as state-of-the-art open-source software,
same level as state-of-the-art closed software or outperforming state-of-the-art closed software.
Outperforming state-of-the-art closed was the lowest-ranked option with 8.8%.
Same level as state-of-the-art closed had 12.8%.
And then state-of-the-art open source and same level as state-of-the-art open source were
almost exactly the same, getting 38.9% and 39.6% of the vote respectively.
Accelerate Harder asked the simpler version of the same question.
Llama 3 will be either amazing or a letdown, with 47.2% saying a letdown and 52.8%
saying it would be amazing.
Just a few minutes later, we got the actual announcement.
The AI at Meta account on X wrote, introducing Meta Llama 3, the most capable openly
available LLM to date.
Today, we're releasing 8B and 70B models that deliver on new capabilities such as improved
reasoning and set a new state-of-the-art for models of their size.
Today's release includes the first two Llama 3 models.
In the coming months, we expect to introduce new capabilities, longer context windows,
additional model sizes, and enhanced performance,
plus the Llama 3 research paper for the community to learn from our work.
Chief AI scientist at Meta Yann LeCun gave a few more bits of information.
He said 8B and 70B models available today, 8K context length,
trained with 15 trillion tokens on a custom-built 24K GPU cluster,
great performance on various benchmarks,
with Llama 3 8B doing better than Llama 2 70B in some cases.
We'll come back to that 8K context length because it sticks out kind of like a sore thumb
relative to other models we've had recently.
But as he has started to do,
Zuckerberg took to their own
networks to talk about the new release. He wrote on Instagram, big AI news today. We're releasing
the new version of Meta AI, our assistant that you can ask any question across our apps and
glasses. Our goal is to build the world's leading AI. We're upgrading Meta AI with our new
state-of-the-art Llama 3 AI model, which we're open sourcing. With this new model, we believe
Meta AI is now the most intelligent AI assistant that you can freely use. We're making Meta AI
easier to use by integrating it into the search boxes at the top of WhatsApp, Instagram, Facebook, and
Messenger. We also built a website, meta.ai, for you to use on web. We also built some unique
creation features like the ability to animate photos. Meta AI now generates high quality images so
fast that it creates and updates them in real time as you're typing. It'll also generate a playback
video of your creation process. Enjoy Meta AI, and you can follow our new meta.ai Instagram for
more updates. So a couple of things that are notable from this. One,
Meta is continuing to reinforce their message of open source. And that's something we'll see even more
in some other parts of the announcement in just a minute.
Second, as had been intimated,
Zuckerberg is clearly not content
with just being the state of the art for open source.
They want to go after the state of the art in general.
You can see the sort of big language they're using.
Our goal is to build the world's leading AI.
With this new model, we believe
Meta AI is now the most intelligent AI assistant
that you can freely use.
Even if there are little caveats in there,
i.e. you can freely use,
the ambition to be the best,
full stop, is clearly on display.
Another really interesting thing, however,
from this announcement,
is the extent to which they are integrating this into products right out of the gate.
This is clearly not meant to just be a developer release,
but something that is immediately impacting consumer products as of today.
Alongside the announcement, they put out a lot of new information.
Some of it's for developers, but some of it, of course, is benchmarks.
Almost immediately, people's eyes started bugging out at those benchmarks.
Matt Shumer writes, holy S.
Llama 3 70B cleanly beats Claude 3 Sonnet,
small enough to host at scale without breaking the bank.
What he's referring to is both the MMLU, where Meta Llama 3 70B claims an 82 versus Claude 3 Sonnet's 79,
and HumanEval, where Meta Llama 3 claims an 81.7 versus Claude 3 Sonnet's 73.
Bindu Reddy from Abacus writes, historic moment.
Llama 3 70B numbers are insane.
At 82 MMLU, it's far and away the best open-source model.
GSM8K, MATH, and HumanEval are mind-blowing as well.
The open-source community is definitely going to beat GPT-4 in a matter of weeks.
Shumer also pointed out that,
quote, Llama 3 70B cleanly beats Mixtral 8x22B.
We've talked a lot on this show about the extent to which Mistral had stolen some of the
open source thunder from Meta over the course of the end of last year, and this certainly
seems to be Meta's clapback.
Overall, Professor Ethan Mollick writes, Meta released their open source AI Llama 3 today.
As a key leader in LLMs, their models are often the most advanced open source ones out
there.
Based on benchmarks, the current model is not quite GPT-4 class, but their larger ones still training
will reach GPT-4 level.
And indeed, this is what some people are the most
excited about. Shumer again writes, the craziest Llama 3 reveal: the 400B plus version of the model
is on par with Claude 3 Opus and it's still training. Soon we'll have a better than Opus fully
open source model. The implications are huge. Llama 3 400B's reported MMLU score is right on par
with Claude 3 Opus, with their grade school math score and HumanEval score right around there as well.
Aston Zhang from the Meta team wrote, Llama 3 has been my focus since joining the Llama team
last summer. Together we've been tackling challenges across pre-training and human data, pre-training
long context, post-training, and evaluations.
It's been a rigorous yet thrilling journey.
He continues, scaling is the recipe,
demanding more than better scaling laws and infrastructure,
e.g. managing high effective training time across 16K GPUs requires innovative strategies.
Still, I think the big implications are summed up by Dr. Jim Fan from Nvidia,
who writes,
the upcoming Llama 3 400B will mark the watershed moment
that the community gains open-weight access to a GPT-4-class model.
It will change the calculus for many research efforts and grassroots startups.
Llama 3 400B is still training,
and will hopefully get even better in the next few months.
There is so much research potential that can be unlocked with such a powerful backbone.
Expect a surge in builder energy across the ecosystem.
Now, I think that's absolutely true.
In many ways, the story of 2024 so far has been this standardization of the GPT-4-class models.
And so it's perhaps not surprising that open source seems to be catching up there as well.
However, surprising or not, as Jim points out,
the implications in terms of what people can build and how and for how much are fairly significant.
Now, I mentioned that to the extent that there has been any quick critique, it's that 8K context
window. Some pointed out that they thought that the community would extend that pretty quickly,
and others inside Meta explained it as well. Aston Zhang again wrote,
we've set the pre-training context window to 8K tokens. A comprehensive approach to data, modeling,
parallelism, inference, and evaluations would be interesting. More updates on longer
context later. In other words, there are clearly tradeoffs that they were willing to make
based on their goals with this release. Now, speaking of this release, one other really interesting
little detail. Meta seems to have taken it to heart to go directly to the community, because in
addition to all the traditional PR strategies, Zuckerberg popped up on a number of creator shows
that are beloved and well-known inside AI circles, but not even close to the size of mainstream outlets.
Roberto Nickson, who runs some of the most popular Instagram and TikTok channels on AI and future
technology, did an interview with Zuck, and Dwarkesh, whose podcast has quickly become, I think,
the most high-value interview show that exists, released an hour-and-20-minute episode
with Zuckerberg that gets deep not only on Llama 3, but as Dwarkesh writes, open sourcing
towards AGI, custom silicon, synthetic data, and energy constraints on scaling, along with, you know,
intelligence explosions, bio-weapons, $10 billion models, and much more.
Overall, I would say the first impressions are extremely exciting.
I think even more exciting than the community thought they were going to be when they heard
that a couple of small models were coming last week.
I'm sure in the coming days, I will be able to add more context around what people are
finding in terms of actual performance,
but for now, it's a cool day with lots to explore.
That is going to do it for today's AI Breakdown.
Until next time, peace.
