The AI Daily Brief: Artificial Intelligence News and Analysis - Llama 3 Is Here (And Seemingly Better Than Expected)

Starting point is 00:00:00 Today on the AI Breakdown, the anticipated Llama 3 release has just happened. Before that on the brief, Microsoft's VASA-1 model is turning heads. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information about our YouTube, our Discord, and our newsletter. Welcome back to the AI Breakdown Brief, all the AI headline news you need in around five minutes. Yesterday, everyone started talking about a new model from Microsoft Research called VASA-1. Bindu Reddy summed it up this way. She called it the first AI-generated video that looks super real and said it takes a single portrait photo and speech audio and produces

Starting point is 00:00:45 a hyper-realistic talking face video with precise lip-audio sync, lifelike facial behavior, and naturalistic head movements generated in real time. This is amazing given that the AI-generated video looks very real. So for those of you who are watching this rather than listening to it, let's check out a quick example. You ever had, maybe you're in that place right now where you want to turn your life around and you know somewhere deep in your soul there could be some decision. It really is incredibly lifelike. And indeed, that's the way that they describe it in their research paper. The paper is called VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time. TL;DR, they write, single portrait photo plus speech audio equals hyper-realistic talking face video

Starting point is 00:01:27 with precise lip-audio sync, lifelike facial behavior, and naturalistic head movements generated in real time. In the abstract, they write, the model is capable of not only producing lip movements that are exquisitely synchronized with the audio, but also capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness. The core innovations, they say, include a holistic facial dynamics and head movement generation model that works in a face latent space, and the development of such an expressive and disentangled face latent space using videos. Part of what they're so excited about is that the model not only delivers realistic,

Starting point is 00:02:00 lifelike videos, but supports, as they write, 512 by 512 videos at up to 40 frames per second with negligible starting latency. What that means is that conceivably we're not too far away, not only from lifelike AI-generated videos, but from actually being able to interact with these lifelike avatars in real time. Now, in addition to producing just lifelike videos from real photos, they say that their method also has the ability to handle photo and audio inputs that are out of their training distribution. For example, they can handle illustrations or paintings, singing audio, or non-English speech, none of which were present in the training set. Again, for those of you watching, here's an example of that lack of latency, which makes it

Starting point is 00:02:38 opportune for real time. But you know what I decided to do? I decided to focus all my attention, all my time on listening. So instead of doing something else, I just listened, listened and listened, because I'm a true believer that if you're really bad at something. Now the paper points out that this is not a technology without risk. The closer we get to truly lifelike, and especially real-time generation like this, the greater the chance that it's misused for disinformation or misinformation,

Starting point is 00:03:12 for impersonating people without their permission. And at this time, they write, we have no plans to release an online demo, API, product, additional implementation details, or any related offerings until we are certain that the technology will be used responsibly and in accordance with proper regulations. And ultimately, the biggest caveat or asterisk on any of this is that none of us have actually gotten a chance to play with it.

Starting point is 00:03:32 This is just a paper with cherry-picked examples that, while super impressive, theoretically might not represent the standard output that you would get if everyone was allowed to use this. Still, it does seem like a fairly significant jump in this sort of avatar capacity, and so it is very much something to be watching for in the future. Next up, we get an update from OpenAI.

Starting point is 00:03:51 The company has released a new version of its Assistants API. Now, the Assistants API is OpenAI's tool that allows people to build agent-like assistants that have some specific purpose that they use ChatGPT for. One of the big changes is, as Brian Roemmele writes, the new OpenAI Assistants API can access up to 10,000 documents in a vector database RAG. It's quite useful, and I have already mentioned it to clients that need larger-scale RAGs. Sully Omar from Cognosys says, new OpenAI assistant updates are really good, hard to bet against them.

Starting point is 00:04:21 Tested retrieval and it's insanely fast and almost instant. Seems like they want a piece of the RAG pie as well. At what point does it become cheaper to use them versus building your own? For those of you drowning in acronyms, RAG stands for retrieval augmented generation and basically refers to the idea of an LLM pulling from a specific set of information for some specific purpose. This is a popular strategy right now for, for example, enterprises that want some version of an LLM to be able to pull from their proprietary information in order to give people in the company insight that is specific to the company. There are tons and tons of solutions for how an enterprise might go about that, and Sully's pointing

Starting point is 00:04:55 out that the better that OpenAI and ChatGPT get at it, the more incentive that companies have just to stay in that ecosystem. Of course, ultimately, it's not just a question of technical capacity, but a question of data trust. And my perception is at least that the biggest reason that enterprises are choosing not OpenAI, but to spin up their own models or to customize open source models, is because of those concerns around privacy. Still, like Sully says, the fact that this is an easy and fast approach that works really well could shift the balance of that conversation a little bit. Lastly today, an interesting one from the world of entertainment. There's been a lot of discourse in the entertainment sector and in Hollywood about artificial intelligence. Last year, the writers' strike and

Starting point is 00:05:34 the SAG strike were some of the first instances in which concerns around AI really started to get into the mainstream. My question at the time was not so much if the concerns were real or not, but whether, if the choice was on the one hand try to ban or prohibit all this technology or on the other try to profit from it, how long would it take before Hollywood shifted over to the try-to-profit-from-it side? Apparently, the Creative Artists Agency, CAA, one of the big Hollywood talent agencies, is thinking in a similar way and testing a new program called CAA Vault that allows the talent they represent, or at least a small handful of A-listers that are testing it, to create a digital double of themselves that they can profit from. Said Alexandra Shannon, CAA's head of

Starting point is 00:06:13 corporate strategic development, on one hand, there's concerns that technology is being misused to exploit name, image, likeness, voice, body of work without consent. But we also recognize that it's creating opportunities for talent and an explosion of creativity in so many different ways. Shannon said, these technologies should not devalue the human. If somebody's digital likeness is being used in a campaign instead of them in person, it is still the value of that person and what they stand for as a representative of a brand. Of course, the concern that many have is that if the marginal cost of production comes down, and there is a much larger supply of celebrity spokespeople in the form of these digital doubles, what will that actually do to the value of any individual instance of that?

Starting point is 00:06:50 It seems like a pretty price-deflationary force. Of course, the Brad Pitt brand, for example, is going to continue to retain value, but if there are digital Brad Pitts running around everywhere, how much is each individual instance of that going to be able to charge? Ultimately, those are questions for CAA, not for us, and so that is going to do it for the AI Breakdown Brief. Next up, the main AI Breakdown. Today's podcast is brought to you by Plum. Is your product team struggling to keep up with the incredible pace of AI development? Are you tired of spending countless engineering hours just to test out small prompt changes in your product?

Starting point is 00:07:21 Thankfully, there's Plum. Build cutting-edge AI experiences for your users in a fraction of the time. Say goodbye to the slow, tedious process of hand coding and hello to the future of AI development. Get ahead of your competition and start moving as fast as AI does. Check out Useplum.com and shoot me a message to get early access. Attention AI Breakdown listeners. Consensus 2024 marks the 10th gathering for all things crypto, blockchain, and Web 3. However, importantly, this year's agenda will also dive deep into AI-driven transformation.

Starting point is 00:07:52 And the speaker lineup includes the leading minds and innovators at the forefront of this digital renaissance. Don't miss the Consensus AI Summit to cut through the hype to find where true transformation and opportunity lie. Listeners to this show can get 15% off registration with the code AI breakdown. Visit Consensus24.com to learn more. Some of the folks who will be at Consensus this year include Guillaume Verdon, aka Beff Jezos, founder and CEO of Extropic, as well as spiritual leader of the accelerationist movement; Neal Stephenson, co-founder of Lamina1; and Brendan Eich, the CEO of Brave Software.

Starting point is 00:08:24 Again, go to Consensus24.coindex.com to learn more and get 15% off registration with the code AI breakdown. Welcome back to the AI Breakdown. Last week, reports came out that Meta was on the verge of releasing its latest open-source LLM, Llama 3. Now at first this was an unconfirmed rumor, but then within a couple days, Meta seemed to indicate that yes, this was coming, or at least a set of small versions of their next model were coming in advance of the largest version which would be coming out over the

Starting point is 00:08:52 summer. All week then, we have been waiting with bated breath to see what Meta would put out, with tons and tons of speculation on just how good it would be. A couple hours before I recorded this episode, we started to get hints that Llama 3 was about to be dropped. First we saw Llama 3 8B Instruct be listed on the Azure Marketplace with little nuggets of information like this line. The fine-tuned versions are optimized for dialogue use cases. And then on replicate.com, we saw pricing for four different models. Llama 3 70B,

Starting point is 00:09:19 Llama 3 80B chat, and Llama 3 8B chat. People got in their last polls on how good others anticipated this to be. Yam Peleg wrote, final moments to guess, Llama 3 will be, and then gave the options state-of-the-art open-source software, same level as state-of-the-art open-source software, same level as state-of-the-art closed software, or outperforming state-of-the-art closed software. Outperforming state-of-the-art closed was the lowest-ranked option with 8.8%. Same level as state-of-the-art closed had 12.8%. And then state-of-the-art open source and same level as state-of-the-art open source were almost exactly the same, getting 38.9% and 39.6% of the vote respectively.

Starting point is 00:09:57 Accelerate Harder asked the simpler version of the same question. Llama 3 will be either amazing or a letdown, with 47.2% saying a letdown and 52.8% saying it would be amazing. Just a few minutes later, we got the actual announcement. The AI at Meta account on X wrote, introducing Meta Llama 3, the most capable openly available LLM to date. Today, we're releasing 8B and 70B models that deliver on new capabilities such as improved reasoning and set a new state-of-the-art for models of their size.

Starting point is 00:10:24 Today's release includes the first two Llama 3 models. In the coming months, we expect to introduce new capabilities, longer context windows, additional model sizes, and enhanced performance, plus the Llama 3 research paper for the community to learn from our work. Chief AI scientist at Meta, Yann LeCun, gave a few more bits of information. He said 8B and 70B models available today, 8K context length, trained with 15 trillion tokens on a custom-built 24K GPU cluster, great performance on various benchmarks,

Starting point is 00:10:50 with Llama 3 8B doing better than Llama 2 70B in some cases. We'll come back to that 8K context length because it sticks out kind of like a sore thumb relative to other models we've had recently. But as he has started to do, Zuckerberg took to their own networks to talk about the new release. He wrote on Instagram, Big AI news today. We're releasing the new version of Meta AI, our assistant that you can ask any question across our apps and glasses. Our goal is to build the world's leading AI. We're upgrading Meta AI with our new

Starting point is 00:11:17 state-of-the-art Llama 3 AI model, which we're open sourcing. With this new model, we believe Meta AI is now the most intelligent AI assistant that you can freely use. We're making Meta AI easier to use by integrating it into the search boxes at the top of WhatsApp, Instagram, Facebook, and Messenger. We also built a website, meta.ai, for you to use on web. We also built some unique creation features like the ability to animate photos. Meta AI now generates high quality images so fast that it creates and updates them in real time as you're typing. It'll also generate a playback video of your creation process. Enjoy Meta AI and you can follow our new meta.ai Instagram for more updates. So a couple of things that are notable from this. One,

Starting point is 00:11:55 Meta is continuing to reinforce their message of open source. And that's something we'll see even more in some other parts of the announcement in just a minute. Second, as had been intimated, Zuckerberg is clearly not content with just being the state of the art for open source. They want to go after the state of the art in general. You can see the sort of big language they're using. Our goal is to build the world's leading AI.

Starting point is 00:12:15 With this new model, we believe Meta AI is now the most intelligent AI assistant that you can freely use. Even if there are little caveats in there, i.e. you can freely use, the ambition to be the best, full stop, is clearly on display. Another really interesting thing, however,

Starting point is 00:12:29 from this announcement, is the extent to which they are integrating this into products right out of the gate. This is clearly not meant to just be a developer release, but something that is immediately impacting consumer products as of today. Alongside the announcement, they put out a lot of new information. Some of it's for developers, but some of it, of course, is benchmarks. Almost immediately, people's eyes started bugging out at those benchmarks. Matt Shumer writes, holy S.

Starting point is 00:12:54 Llama 3 70B cleanly beats Claude 3 Sonnet, small enough to host at scale without breaking the bank. What he's referring to is both the MMLU, where Meta Llama 3 70B claims an 82 versus Claude 3 Sonnet's 79, and HumanEval, where Meta Llama 3 claims an 81.7 versus Claude 3 Sonnet's 73. Bindu Reddy from Abacus writes, historic moment. Llama 3 70B numbers are insane. At 82 MMLU, it's far and away the best open-source model. GSM8K, MATH, and HumanEval are mind-blowing as well.

Starting point is 00:13:24 The open-source community is definitely going to beat GPT-4 in a matter of weeks. Shumer also pointed out that, quote, Llama 3 70B cleanly beats Mixtral 8x22B. We've talked a lot on this show about the extent to which Mistral had stolen some of the open source thunder from Meta over the course of the end of last year, and this certainly seems to be Meta's clapback. Overall, Professor Ethan Mollick writes, Meta released their open source AI Llama 3 today. As a key leader in LLMs, their models are often the most advanced open source ones out

Starting point is 00:13:50 there. Based on benchmarks, the current model is not quite GPT-4 class, but their larger ones still training will reach GPT-4 level. And indeed, this is what some people are the most excited about. Shumer again writes, the craziest Llama 3 reveal. The 400B plus version of the model is on par with Claude 3 Opus and it's still training. Soon we'll have a better than Opus fully open source model. The implications are huge. Llama 3 400B's reported MMLU score is right on par with Claude 3 Opus, with their grade school math score and HumanEval score right around there as well.

Starting point is 00:14:21 Aston Zhang from the Meta team wrote, Llama 3 has been my focus since joining the Llama team last summer. Together we've been tackling challenges across pre-training and human data, pre-training long context, post-training, and evaluations. It's been a rigorous yet thrilling journey. He continues, scaling is the recipe, demanding more than better scaling laws and infrastructure, e.g. managing high effective training time across 16K GPUs requires innovative strategies. Still, I think the big implications are summed up by Dr. Jim Fan from Nvidia,

Starting point is 00:14:47 who writes, The upcoming Llama 3 400B will mark the watershed moment that the community gains open-weight access to a GPT-4-class model. It will change the calculus for many research efforts and grassroots startups. Llama 3 400B is still training, and will hopefully get even better in the next few months. There is so much research potential that can be unlocked with such a powerful backbone. Expect a surge in builder energy across the ecosystem.

Starting point is 00:15:09 Now, I think that's absolutely true. In many ways, the story of 2024 so far has been this standardization of the GPT-4-class models. And so it's perhaps not surprising that open source seems to be catching up there as well. However, surprising or not, as Jim points out, the implications in terms of what people can build and how and for how much are fairly significant. Now, I mentioned that to the extent that there has been any quick critique, it's that 8K context window. Some pointed out that they thought that the community would extend that pretty quickly, and others inside Meta explained it as well. Aston Zhang again wrote,

Starting point is 00:15:42 we've set the pre-training context window to 8K tokens. A comprehensive approach to data, modeling, parallelism, inference, and evaluations would be interesting, more updates on longer context later. In other words, there are clearly tradeoffs that they were willing to make based on their goals with this release. Now, speaking of this release, one other really interesting little detail. Meta seems to have taken it to heart to go directly to the community, because in addition to all the traditional PR strategies, Zuckerberg popped up on a number of creator shows that are beloved and well-known inside AI circles, but not even close to the size of mainstream outlets. Roberto Nickson, who runs some of the most popular Instagram and TikTok channels on AI and future

Starting point is 00:16:19 technology, did an interview with Zuck, and Dwarkesh, whose podcast has quickly become, I think, the most high-value interview show that exists, released an hour-and-20-minute episode with Zuckerberg that gets deep not only on Llama 3, but, as Dwarkesh writes, open sourcing towards AGI, custom silicon, synthetic data, and energy constraints on scaling, along with, you know, intelligence explosions, bio-weapons, $10 billion models, and much more. Overall, I would say the first impressions are extremely exciting. I think even more exciting than the community thought they were going to be when they heard that a couple of small models were coming last week.

Starting point is 00:16:52 I'm sure in the coming days, I will be able to add more context around what people are finding in terms of actual performance, but for now, it's a cool day with lots to explore. That is going to do it for today's AI Breakdown. Until next time, peace.
