The AI Daily Brief: Artificial Intelligence News and Analysis - Llama 3.1 405B Eliminates Gap Between Open and Closed Source AI

Episode Date: July 24, 2024

According to the earliest benchmarks, the newly released Llama 3.1 405B has almost entirely (if not entirely) closed the gap between closed and open source AI. At the very least, it's clear that 4...05B is a GPT-4o class model. Concerned about being spied on? Tired of censored responses? AI Daily Brief listeners receive a 20% discount on Venice Pro. Visit https://venice.ai/nlw and enter the discount code NLWDAILYBRIEF. Learn how to use AI with the world's biggest library of fun and useful tutorials: https://besuper.ai/ Use code 'podcast' for 50% off your first month. The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614 Subscribe to the newsletter: https://aidailybrief.beehiiv.com/ Join our Discord: https://bit.ly/aibreakdown

Transcript
Starting point is 00:00:00 Today on the AI Daily Brief, Meta's Llama 3.1 405B may have closed the gap in the state of the art between open and closed source models. Before that on the brief, OpenAI updates their safety policies without actually updating anything. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. We kick off today with the latest announcements from OpenAI on their safety strategy. They tweeted very early this morning East Coast time: Making sure AI can benefit everyone starts with building AI that is helpful and safe.
Starting point is 00:00:41 We want to share some updates on how we're prioritizing safety in our work. We believe that frontier AI models can greatly benefit society. To help ensure our readiness, our preparedness framework helps evaluate and protect against the risks posed by increasingly powerful models. We won't release a new model if it crosses a medium risk threshold until we implement sufficient safety interventions. We're developing levels to help us and stakeholders categorize and track AI progress. This is a work in progress and we'll share more soon. Editor's note, this is, I think, what we got some information about last week with their various levels of AI proceeding towards
Starting point is 00:01:12 AGI. This tweet thread continues, however. In May, our board of directors launched a new Safety and Security Committee to evaluate and further develop safety and security recommendations for OpenAI projects and operations. The committee includes leading cybersecurity expert, retired U.S. Army General Paul Nakasone. This review is underway and we'll share more on the steps we'll be taking after it concludes. Our whistleblower policy protects employees' rights to make protected disclosures. We also believe rigorous debate about this technology is important, and have made changes to our departure process to remove non-disparagement terms. Safety has always been central to our work, from aligning model behavior to monitoring for
Starting point is 00:01:44 abuse, and we're investing even further as we develop more capable models. Now, what's notable about this, at least to me, is that there's nothing new that's actually announced. They frame it as we want to share some updates on how we're prioritizing safety, but there's nothing actually new here. This suggests to me that there was some reason that they felt they wanted to remind everyone of these efforts and put it in a nice, easy-to-point-to place, but not that they had something big and new that they wanted to highlight. As I was wondering what the reasoning behind this might be, it started to pop up on X that a group of five senators had sent Sam Altman a note. The note, which was dated yesterday, July 22nd, reads, Dear Mr. Altman, we write to you regarding recent reports about OpenAI's safety and employment practices.
Starting point is 00:02:23 OpenAI has announced a guiding commitment to the safe, secure, and responsible development of artificial intelligence in the public interest. These reports raise questions about how OpenAI is addressing emerging safety concerns. We seek additional information from OpenAI about the steps the company is taking to meet its public commitments on safety, how the company is internally evaluating its progress on those commitments, and on the company's identification and mitigation of cybersecurity threats. The letter goes on to ask 12 questions, requesting that information be shared by August 13th of this year. One of the questions that many on Twitter have commented on is this one. Will OpenAI commit to making its next foundation model available to U.S. government agencies for pre-deployment testing, review, analysis, and assessment?
Starting point is 00:03:00 Now, it's likely that I'm going to dig deeper into each of these questions in an episode later this week. So for now, I will just note that it seems like there is a moment of increasing scrutiny around OpenAI from the U.S. government and that that might be driving some of this increased conversation around safety and preparedness. Next up, Perplexity is back in the news, although this time not for a funding announcement. No, unfortunately for them, this time it's because Condé Nast has sent a cease and desist letter to the company. The Information reports that Condé Nast, which owns publications like The New Yorker, Vogue, and Wired, has sent a cease and desist demanding that Perplexity
Starting point is 00:03:32 stop using content from these publications in its search results. This follows a letter last month from Forbes, where Forbes accused Perplexity of infringing on its copyright. Condé Nast similarly claims that Perplexity is plagiarizing its content. It seems that the ire around Perplexity isn't around training data, as that's not what Perplexity does, but around the way that Perplexity's AI summarizes news articles. Now, there are a lot of specific technical details in here around whether and where the mechanisms that allow publishers to block the crawlers of companies like Perplexity actually work, but ultimately this comes down to this new format for search. It feels to me fairly inevitable that the sort of combination of AI-generated
Starting point is 00:04:09 summary plus links is going to be the default and norm in the future, but the path to get there may be littered with a lot of legal battles, proving once again that when it comes right down to it, the only group of people who always win are the lawyers. Moving to some science, a new Google model has, quote, helped make a breakthrough in accurate long-range weather and climate predictions. Writes the Financial Times, using a hybrid of machine learning and existing forecasting tools, a model led by Google called NeuralGCM successfully harnessed AI to conventional atmospheric physics models to track decades
Starting point is 00:04:37 long climate trends and extreme weather events such as cyclones. A recent paper said NeuralGCM proved faster, more accurate, and used less computing power in tests against a current forecasting model based on atmospheric physics tools called X-SHiELD. In one trial, they say, NeuralGCM identified almost the same number of tropical cyclones as conventional extreme weather trackers did, and twice the number of X-SHiELD. In another test based on temperature and humidity, the error rate was between 15 and 50% less. One of the things that often gets lost as we debate the big legal and ethical issues of AI
Starting point is 00:05:06 is how much it's likely to impact scientific discovery. But that ultimately is going to be the subject for another show. That is going to do it for today's AI Daily Brief Headlines edition. Next up, the main episode. Today's episode is brought to you by Superintelligent, the platform for fun, fast AI learning. Super has a ton of new things going on. We recently announced our partnership with Spotify,
Starting point is 00:05:28 through which users of that app can now access Superintelligent content directly from their mobile apps. We've also just launched the AI learning feed. In addition to seeing the tutorials that we're dropping, there are polls, news items with related lessons, and a chance for people to show off the projects and use cases that are making AI come alive
Starting point is 00:05:45 for them. We've also just kicked off the Super Summer Challenge, where each week we'll share a new challenge that you can use to discover new AI tools and use cases. Go to besuper.ai and use code super fun for 50% off your first two months. That's besuper.ai. Today's episode is brought to you by Venice. Venice is a private, uncensored generative AI app. It accesses open source models to enable text, image, and code generation without the fear of being spied on or having your data exploited. Discuss anything with Venice without concerns about it being monitored, sold, or given to advertisers and governments. Venice is different because your conversations and creations are kept securely within your own browser, never stored or accessible by Venice. Unlike
Starting point is 00:06:24 other AI apps, Venice won't tell you what's okay to say or not. Venice won't patronize you. It simply provides direct access to machine intelligence. No topics are off limits, no ideas are taboo. With Venice, you're in control of the AI, as you should be. Pro subscriptions are available for $49 a year or $8 per month. Try it for free without an account at venice.ai. Welcome back to the AI Daily Brief. Today we are talking about some leaks around the forthcoming Meta Llama 3.1 405B model. As Deedy from Menlo Ventures puts it, this is potentially the biggest news in AI in several weeks. Llama 3.1 405B leaked a day before on 4chan and obliterates GPT-4o on most benchmarks. Writes Runway's Siqi Chen, for the first time ever, an open source model is
Starting point is 00:07:09 state-of-the-art, outperforming OpenAI's GPT-4o and Anthropic's Sonnet 3.5 across multiple benchmarks. So today we are going to talk about what we have learned so far about Llama 3.1 405B and what the implications are of the full closure of the gap between open source and closed source. Now, to go back, we have been waiting for some time for Meta's largest Llama 3 model. About a week and a half ago, we got reports from The Information and others that Meta was planning on releasing the 405 billion parameter version of Llama 3 today, on July 23rd. Back in April, we got two smaller models from Llama 3, including the 8 billion and 70 billion parameter models, but this was always going to be the big show. In a post last week
Starting point is 00:07:49 about why we should care about this release, Tom's Guide wrote, the Llama 3 400B model is particularly exciting as it approaches performance parity with OpenAI's GPT-4o model despite using less than half the parameters. Apart from the potential benefits to cost and energy efficiency, there's another significant advantage. One of the most compelling aspects of Llama 3 is its open license for research and commercial use. If the 400B model is released under the same open license, it would democratize access to state-of-the-art language capabilities, allowing researchers and developers to leverage this powerful tool for their projects without relying on expensive proprietary APIs. Still, as Tom's Guide pointed out, there had also been some scuttlebutt that Meta was not going to exactly open source this model.
Starting point is 00:08:29 Notorious OpenAI leaker Jimmy Apples wrote, Meta plans to not open the weights for its 400B model. The hope is that we would quietly not notice and let it slide. Don't let it slide. He followed up saying Dustin Moskovitz, who is one of the big funders of the EA space and the AI safety movement, is, quote, having a loud voice to the ears of lab CEOs behind doors. Then later, however, Jimmy Apples updated to say, apparently at the moment, they do plan to open source it despite Dustin's objections to Zuck. Open source AI advocate and AI CEO Bindu Reddy made this prediction. In a couple of days, we will stop talking about politics, hopefully. Llama 405B will be the topic of the day. The gap between the closed and open-weight models will finally close.
Starting point is 00:09:08 And indeed, that is what seems to have happened, at least with the information we have so far. Matthew Berman says, suddenly the world has access to an open source model considered state-of-the-art. It beats GPT-4o on many benchmarks. What a time to be alive. Data Economy writes, In a dramatic turn of events, early benchmarking data for the forthcoming Llama 3.1 models, including 8B, 70B, and the colossal 405B, were leaked on the LocalLLaMA subreddit today. The preliminary results suggest that Llama 3.1 405B could potentially surpass the performance of the current industry
Starting point is 00:09:38 leader, OpenAI's GPT-4o, across several critical benchmarks. Should that happen, it would represent the first instance of an open source model eclipsing a leading closed source LLM. And a couple other things that the community has honed in on. Hold aside the question of whether it is the state of the art and which of the benchmarks we should care most about. If these benchmarks are anywhere close to accurate, the thing that is for sure is that it is a GPT-4o class model. In the same way that we can debate around whether GPT-4o or Claude 3.5 Sonnet is exactly the better or more performant model, this would add Llama 3.1 405B to that conversation. It's important to note here that in addition to Llama 3.1 405B, we're also getting the 3.1 updates to the 8B and 70B models. Maxime Labonne writes,
Starting point is 00:10:20 the new 70B also looks insane with a significant boost of performance compared to the previous version, and that I think is a part of the story that is actually flying a little bit under the radar comparatively, i.e. how much better these smaller models have gotten as well. Chris at Hinge Loss also noted that there seemed to be an updated license, which removed the prohibition on using Llama 3 to improve other models. He used Diffchecker to compare the 3.1 versus 3 Llama license. Going back to this idea that the big story once everything settles might be the smaller models, Kyle Corbitt points out that if the leaked benchmarks are correct, Llama 3.1 70B beats GPT-4o mini. Aidan McLau really sums it up when he says,
Starting point is 00:10:57 if these Llama 3.1 405B benchmarks are real, this will be the world's best model in the hands of everyone to tune, cheaper than GPT-4o. It's hard to overstate how fast everything is changing. Picking up on some of the themes that we were discussing in yesterday's show about the move of competition to the smaller end of the spectrum, Latent Space's swyx called what's happening the under 100B model red wedding. He writes, I do not think that people who criticize OpenAI have sufficiently absorbed the magnitude of disruption that has happened because of 4o mini. He compares Llama 3 70B and GPT-4o mini, noting that GPT-4o mini is priced at a sixth of the price of Llama 3 70B. He points out that many of the models that were state of the art just three months ago are now being
Starting point is 00:11:37 dominated by new counterparts. Swyx writes, what's the depreciation rate on the flops it took to train them? GPT-4 took $500 million to train and it lasted a year. Intelligence too cheap to meter, but also too ephemeral to support greater than five players doing R&D. Is there an angle here I'm missing? Swyx also compared the new 3.1 benchmarks to Llama 3.0 and came to the conclusion that there was a huge bump for the 8B and that the instruct 70B is mildly better, but that the 405B is still behind flagship models. Now, we are slated to actually get this model today and should be able to independently start verifying these benchmarks, and until then we won't know exactly what the situation is. However, if things are close to what they appear to be, we really are, it appears,
Starting point is 00:12:18 living in a new paradigm where open source has, by and large, caught up and closed the gap with closed source. One thing I will be watching is whether that prompts OpenAI and Anthropic to release new models that redefine the state of the art once again. For now, though, pretty interesting times, lots to pay attention to. And of course, I appreciate you hanging out and listening or watching as always. Until next time, peace.
