The AI Daily Brief: Artificial Intelligence News and Analysis - Llama 3.1 405B Eliminates Gap Between Open and Closed Source AI
Episode Date: July 24, 2024
According to the earliest benchmarks, the newly released Llama 3.1 405B has almost entirely (if not entirely) closed the gap between closed and open source AI. At the very least, it's clear that 405B is a GPT-4o class model. Concerned about being spied on? Tired of censored responses? AI Daily Brief listeners receive a 20% discount on Venice Pro. Visit https://venice.ai/nlw and enter the discount code NLWDAILYBRIEF. Learn how to use AI with the world's biggest library of fun and useful tutorials: https://besuper.ai/ Use code 'podcast' for 50% off your first month. The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614 Subscribe to the newsletter: https://aidailybrief.beehiiv.com/ Join our Discord: https://bit.ly/aibreakdown
Transcript
Today on the AI Daily Brief, Meta's Llama 3.1 405B may have closed the gap in the state of the art between open and closed source models.
Before that on the brief, OpenAI updates their safety policies without actually updating anything.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
To join the conversation, follow the Discord link in our show notes.
Welcome back to the AI Daily Brief Headlines edition, all the AI Daily News you need in around five minutes.
We kick off today with the latest announcements from OpenAI on their safety strategy.
They tweeted very early this morning, East Coast time:
Making sure AI can benefit everyone starts with building AI that is helpful and safe.
We want to share some updates on how we're prioritizing safety in our work.
We believe that frontier AI models can greatly benefit society.
To help ensure our readiness, our preparedness framework helps evaluate and protect against
the risks posed by increasingly powerful models.
We won't release a new model if it crosses a medium risk threshold until we implement
sufficient safety interventions. We're developing levels to help us and stakeholders categorize and
track AI progress. This is a work in progress and we'll share more soon. Editor's note, this is, I think,
what we got some information about last week with their various levels of AI proceeding towards
AGI. This tweet thread continues, however. In May, our board of directors launched a new safety and
security committee to evaluate and further develop safety and security recommendations for OpenAI
projects and operations. The committee includes leading cybersecurity expert, retired U.S. Army General
Paul Nakasone. This review is
underway and we'll share more on the steps we'll be taking after it concludes. Our whistleblower
policy protects employees' rights to make protected disclosures. We also believe rigorous debate about
this technology is important, and have made changes to our departure process to remove non-disparagement
terms. Safety has always been central to our work, from aligning model behavior to monitoring for
abuse, and we're investing even further as we develop more capable models. Now, what's notable about this,
at least to me, is that there's nothing new that's actually announced. They frame it as we want
to share some updates on how we're prioritizing safety, but there's nothing actually new here.
This suggests to me that there was some reason that they felt they wanted to remind everyone of these efforts
and put it in a nice, easy-to-point-to place, but not that they had something big and new that they wanted to highlight.
As I was wondering what the reasoning behind this might be, it started to pop up on X that a group of five senators had sent Sam Altman a note.
The note, which was dated yesterday, July 22nd, reads, Dear Mr. Altman,
we write to you regarding recent reports about OpenAI's safety and employment practices.
OpenAI has announced a guiding commitment to the safe, secure, and responsible development of artificial intelligence in the public interest.
These reports raise questions about how OpenAI is addressing emerging safety concerns.
We seek additional information from OpenAI about the steps the company is taking to meet its public commitments on safety,
how the company is internally evaluating its progress on those commitments, and on the company's identification and mitigation of cybersecurity threats.
The letter goes on to ask 12 questions, requesting that information be shared by August 13th of this year.
One of the questions that many on Twitter have commented on is this one.
Will OpenAI commit to making its next foundation model available to U.S. government agencies
for pre-deployment testing, review, analysis, and assessment?
Now, it's likely that I'm going to dig deeper into each of these questions in an episode later this week.
So for now, I will just note that it seems like there is an increasing moment of scrutiny
around OpenAI from the U.S. government and that that might be driving some of this increased
conversation around safety and preparedness.
Next up, Perplexity is back in the news, although this time not for a funding announcement.
No, unfortunately for them, this time, it's because Condé Nast has sent a cease
and desist letter to the company. The Information reports that Condé Nast, which owns publications
like The New Yorker, Vogue, and Wired, has sent a cease and desist demanding that Perplexity
stop using content from these publications in its search results. This follows a letter last
month from Forbes, where Forbes accused Perplexity of infringing on its copyright. Condé
Nast similarly claims that Perplexity is plagiarizing its content. It seems that the
ire around Perplexity isn't around training data, as that's not what Perplexity
does, but around the way that Perplexity's AI summarizes news articles. Now, there are a lot of
specific technical details in here around whether and where the mechanisms that let publishers block
the bots that companies like Perplexity use actually work, but ultimately this comes down to
this new format for search. It feels to me fairly inevitable that the sort of combination of AI-generated
summary plus links is going to be the default and norm in the future, but the path to get there may be
littered with a lot of legal battles, proving once again that when it comes right down to it,
the only group of people who always win are the lawyers.
Moving to some science, a new Google model has, quote,
helped make a breakthrough in accurate long-range weather and climate predictions.
Writes the Financial Times, using a hybrid of machine learning and existing forecasting
tools, a model led by Google called NeuralGCM
successfully harnessed AI to conventional atmospheric physics models to track decades-long
climate trends and extreme weather events such as cyclones.
A recent paper said NeuralGCM proved faster, more accurate,
and used less computing power in tests against a current forecasting model
based on atmospheric physics tools called X-SHiELD.
In one trial, they say, NeuralGCM identified almost the same number of tropical cyclones
as conventional extreme weather trackers did, and twice the number of X-SHiELD.
In another test based on temperature and humidity, the error rate was between 15 and 50% less.
One of the things that often gets lost as we debate the big legal and ethical issues of AI
is how much it's likely to impact scientific discovery.
But that ultimately is going to be the subject for another show.
That is going to do it for today's AI Daily Brief Headlines edition.
Next up, the main episode.
Today's episode is brought to you by Superintelligent,
the platform for fun, fast AI learning.
Super has a ton of new things going on.
We recently announced our partnership with Spotify,
through which users of that app
can now access Superintelligent content
directly from their mobile apps.
We've also just launched the AI learning feed.
In addition to seeing the tutorials that we're dropping,
there are polls, news items with related lessons,
and a chance for people to show off the projects
and use cases that are making AI come alive
for them. We've also just kicked off the Super Summer Challenge, where each week we'll share a new
challenge that you can use to discover new AI tools and use cases. Go to besuper.ai and use code
super fun for 50% off your first two months. That's besuper.ai. Today's episode is brought to you
by Venice. Venice is a private, uncensored generative AI app. It accesses open source models
to enable text, image, and code generation without the fear of being spied on or having your data
exploited. Discuss anything with Venice without concerns about it being monitored, sold,
or given to advertisers and governments. Venice is different because your conversations and
creations are kept securely within your own browser, never stored or accessible by Venice. Unlike
other AI apps, Venice won't tell you what's okay to say or not. Venice won't patronize you.
It simply provides direct access to machine intelligence. No topics are off limits, no ideas are
taboo. With Venice, you're in control of the AI, as you should be. Pro subscriptions are available for
$49 a year or $8 per month. Try it for free without an
account at venice.ai. Welcome back to the AI Daily Brief. Today we are talking about some leaks
around the forthcoming Meta Llama 3.1 405B model. As Deedy from Menlo Ventures puts it, this is potentially
the biggest news in AI in several weeks. Llama 3.1 405B leaked day before on 4chan and obliterates
GPT-4o on most benchmarks. Writes Runway's Siqi Chen, for the first time ever, an open source model is
state-of-the-art, outperforming OpenAI's GPT-4o and Anthropic's Sonnet 3.5 across multiple benchmarks.
So today we are going to talk about what we have learned so far about Llama 3.1 405B and what the
implications are of the full closure of the gap between open source and closed source.
Now, to go back, we have been waiting for some time for Meta's largest Llama 3 model.
About a week and a half ago, we got reports from The Information and others that Meta was planning
on releasing the 405 billion parameter version of Llama 3 today,
on July 23rd. Back in April, we got two smaller models from Llama 3, the 8 billion
and 70 billion parameter models, but this was always going to be the big show. In a post last week
about why we should care about this release, Tom's Guide wrote, the Llama 3 400B model is particularly
exciting as it approaches performance parity with OpenAI's GPT-4o model despite using less than half
the parameters. Apart from the potential benefits to cost and energy efficiency, there's another
significant advantage. One of the most compelling aspects of Llama 3 is its open license for research
and commercial use. If the 400B model is released under the same open license, it would democratize
access to state-of-the-art language capabilities, allowing researchers and developers to leverage this
powerful tool for their projects without relying on expensive proprietary APIs. Still, as Tom's Guide pointed out,
there had also been some scuttlebutt that Meta was not going to exactly open source this model.
Notorious OpenAI leaker Jimmy Apples wrote, Meta plans to not open the weights for its 400B model.
The hope is that we would quietly not notice and let it slide. Don't let it slide.
He followed up saying Dustin Moskovitz, who is one of the big funders of the EA space and the AI safety movement, is, quote, having a loud voice to the ears of lab CEOs behind doors.
Then later, however, Jimmy Apples updated to say, apparently at the moment, they do plan to open source it despite Dustin's objections to Zuck.
Open source AI advocate and AI CEO, Bindu Reddy, made this prediction.
In a couple of days, we will stop talking about politics, hopefully.
Llama 405B will be the topic of the day.
The gap between the closed and open-weight models will finally close.
And indeed, that is what seems to have happened, at least with the information we have so far.
Matthew Berman says, suddenly the world has access to an open source model considered state-of-the-art.
It beats GPT-4o on many benchmarks.
What a time to be alive.
Data Economy writes,
In a dramatic turn of events, early benchmarking data for the forthcoming Llama 3.1 models,
including 8B, 70B, and the colossal 405B, were leaked on the LocalLLaMA subreddit today.
The preliminary results suggest that Llama 3.1 405B could potentially surpass the performance of the current industry
leader, OpenAI's GPT-4o, across several critical benchmarks. Should that happen, it would represent
the first instance of an open source model eclipsing a leading closed source LLM. There are a couple of other things
that the community has homed in on. Hold aside the question of whether it is the state of the art
and which of the benchmarks we should care most about. If these benchmarks are anywhere close to accurate,
the thing that is for sure is that it is a GPT-4o class model. In the same way that we can debate
whether GPT-4o or Claude 3.5 Sonnet is exactly the better or more performant model, this would
add Llama 3.1 405B to that conversation. It's important to note here that in addition to Llama 3.1
405B, we're also getting the 3.1 updates to the 8B and 70B models. Maxime Labonne writes,
the new 70B also looks insane with a significant boost of performance compared to the previous
version, and that I think is a part of the story that is actually flying a little bit under
the radar comparatively, i.e. how much better these smaller models have gotten as well.
Chris (@hingeloss) also noted that there seemed to be an updated license, which removed the
prohibition on using Llama 3 to improve other models. He used Diffchecker to compare the 3.1
versus 3 Llama license. Going back to this idea that the big story once everything settles
might be the smaller models, Kyle Corbitt points out that if the leaked benchmarks are correct,
Llama 3.1 70B beats GPT-4o Mini. Aidan McLau really sums it up when he says,
if these Llama 3.1 405B benchmarks are real, this will be the world's best model in the hands of
everyone to tune, cheaper than GPT-4o. It's hard to overstate how fast everything is changing.
Picking up on some of the themes that we were discussing in yesterday's show about the move of competition
to the smaller end of the spectrum, Latent Space's swyx called what's happening the under 100B model
red wedding. He writes, I do not think that people who criticize OpenAI have sufficiently absorbed
the magnitude of disruption that has happened because of 4o Mini. He compares Llama 3 70B and
GPT-4o Mini, noting that GPT-4o Mini is priced at a sixth of the price of Llama 3 70B.
He points out that many of the models that were state of the art just three months ago are now being
dominated by new counterparts. Swyx writes, what's the depreciation rate on the FLOPs it took to train them?
GPT-4 took 500 million to train and it lasted a year. Intelligence too cheap to meter, but also
too ephemeral to support greater than five players doing R&D. Is there an angle here I'm missing?
Swyx also compared the new 3.1 benchmarks to Llama 3.0 and came to the conclusion that there was
a huge bump for the 8B and that the instruct 70B is mildly better, but that the 405B is
still behind flagship models. Now, we are slated to actually get this model today and should be
able to independently start verifying these benchmarks, and until then we won't know exactly what
the situation is. However, if things are close to what they appear to be, we really are, it appears,
living in a new paradigm where open source has, by and large, caught up and closed the gap with
closed source. One thing I will be watching is whether that prompts OpenAI and Anthropic to
release new models that redefine the state of the art once again. For now, though, pretty interesting
times, lots to pay attention to. And of course, I appreciate you hanging out and listening or watching
as always. Until next time, peace.
