The AI Daily Brief: Artificial Intelligence News and Analysis - REPORT: Llama 3 Coming in July (with Fewer Guardrails)

Starting point is 00:00:00 Today on the AI Breakdown, we're talking about the unbelievably fast-moving advances in the AI video space. Before that on the brief, new information about Meta's ambitions for Llama 3, including when we can expect it. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown Network for more information about our YouTube, our Discord, and our newsletter. Welcome back to the AI Breakdown Brief, all the AI headline news you need in around five minutes. We've gotten some interesting information about Llama 3 over the last couple months. Meta CEO Mark Zuckerberg has talked about it a little bit more. Some of the things that he mentioned were that they underestimated how important coding was not just to the utility

Starting point is 00:00:42 of an LLM for a Meta user community, but just for the sheer act of reasoning, for how smart an LLM was. He also discussed how, while Llama 1 and Llama 2 were leaders in the open source LLM space, they really wanted Llama 3 to be state of the art in general. Well, now we've heard some new things courtesy of The Information. They recently published a piece, Meta Wants Llama 3 to Handle Contentious Questions as Google Grapples With Gemini Backlash. According to The Information, the safeguards that are currently on Llama 2 have made it too quote unquote safe in the perception of Meta senior leaders, as well as among researchers who work on the model. By way of describing the difference between a reasonable safe query to block and perhaps a little bit more questionable one,

Starting point is 00:01:23 The Information pointed out there was a big difference between not answering how to make a bomb versus how a worker could get around coming into the office on a mandatory in-office day. The Information says, of course, as you might expect, that Meta's conservative approach with Llama 2 was a PR-related strategy, that they didn't want to deal with PR-related issues. They're now trying to get a little bit smarter about how they handle these things, for example, not just shutting down around certain words, but trying to get better context. If someone asks how to kill a vehicle's engine, Llama 3 should be able to figure out that they're asking about shutting an engine off, not a type of murder. Of course, the context for all of this is the backlash that we've

Starting point is 00:01:58 been following around Gemini and its production of historically inaccurate images, such as adding people of color to Nazi uniforms and more historical inaccuracies like that. I, of course, have no idea how long this perception of Llama 2 being too safe has been going on inside the company. What I do know is that what Zuckerberg has shown when it comes to AI strategy is that he is very comfortable bobbing when others weave. Meta, for example, would not necessarily be the big tech company you most expected to go open source, and yet that is where they planted their generative AI flag. Part of that, I think, had to do with OpenAI and Microsoft going in the opposite direction. Even if it was ideological on some level, it was also opportunistic.

Starting point is 00:02:38 Is getting Llama 3 more open to answering tricky questions another example of that? Certainly seems possible. Whatever the case, The Information piece also reinforces that Llama 3 is a very big deal. They want it to match the performance of GPT-4, although these reports say that they haven't decided whether it will be multimodal yet. And of course, even if it does match GPT-4, another question will be, by the time July rolls around, will we all be using GPT-5? Still, interesting details that show Meta's continuing evolution in their role in the space and something to watch for sure. Speaking of

Starting point is 00:03:11 OpenAI, we shift now to another set of lawsuits around OpenAI and copyright. The Intercept, Raw Story, and AlterNet have all filed separate lawsuits against both OpenAI and Microsoft, although they're all being handled by the same law firm. They claim that ChatGPT, quote, at least some of the time, reproduces verbatim or nearly verbatim copyright-protected works of journalism without providing author, title, copyright, or terms of use information contained in those works. The Raw Story and AlterNet lawsuits go even further, alleging that OpenAI and Microsoft, quote,

Starting point is 00:03:40 had reason to know that ChatGPT would be less popular and generate less revenue if users believed that ChatGPT responses violated third-party copyrights. These lawsuits are basically saying that OpenAI and Microsoft were aware of potential copyright infringement and point to evidence like OpenAI's opt-out system that gives website owners the ability to block content from OpenAI web crawlers. Interestingly, at the same time this is happening, OpenAI has filed to dismiss parts of the New York Times lawsuit. On Monday, they filed an argument that ChatGPT is, quote, not in any way a substitution for a subscription to the New York Times. The filing said, in the real world, people do not use ChatGPT or any other OpenAI product for that

Starting point is 00:04:16 purpose, nor could they. In the ordinary course, one cannot use ChatGPT to serve up Times articles at will. Indeed, OpenAI alleges that the New York Times went out of their way to quote-unquote hack the ChatGPT system to produce evidence that looks worse for OpenAI. There was also one more lawsuit filed against OpenAI this week. A Florida woman filed a lawsuit in California calling for a shutdown of the site, which basically is a litany of AI safety arguments, including some of the most extreme. Lawyers for the plaintiffs wrote, Technological safety measures must be added to the products that will prevent the technology from surpassing human intelligence and harming others. The lawyers argued that the Microsoft $10 billion investment represents a 180-degree shift from OpenAI's original mission and OpenAI, quote, prioritizing short-term financial gains over long-term

Starting point is 00:05:00 safety and ethical considerations. Meanwhile, OpenAI's main partner, Microsoft, keeps on pumping out the jams. Their latest offering is a new customized Copilot for finance teams. Reuters writes, Microsoft previewed an artificial intelligence tool for customers' finance departments, part of a strategy tailoring new software to industries, professionals, and ultimately individuals. The new tool is, of course, called Microsoft Copilot for Finance, and is specifically optimized for reviewing datasets, producing reports from raw numbers, and other finance department functions like that. This is not the first function-specific Copilot app we've seen. Microsoft

Starting point is 00:05:35 has also announced tools for salespeople as well as customer service representatives. Over in Apple Land, there has been much discussion about their AI strategy in the wake of reports that they had canceled their multi-billion-dollar electric car project and shifted many of those resources over to AI, but a shareholder proposal to require Apple to disclose how it was using AI in its operations and any sort of guidelines that it was creating has been rejected at their annual shareholders' meeting. Still, it's quite clear that the market really wants to know what Apple's plans with AI are. CEO Tim Cook, for example, reiterated that they will announce progress on generative AI later this year. It's widely anticipated that where we'll start to see generative AI come

Starting point is 00:06:12 to the Apple ecosystem in a big way is a little bit at the Worldwide Developers Conference in the summer and a lot around iOS 18 in the fall. Internet denizens, however, like Jason Calacanis, are starting to notice subtle features that are already finding their way into iOS updates that point towards Apple's AI future. Between Meta's moves, Microsoft, and Apple, it is going to be a very competitive year in the world of AI. So strap in and subscribe to make sure you don't miss a thing. That's going to do it for today's AI Breakdown Brief. Up next, the main AI Breakdown. Welcome back to the AI Breakdown. The last few weeks have seen a number of really big announcements from a technical perspective

Starting point is 00:06:51 in artificial intelligence. We had Groq totally transforming the speed and latency of how fast LLMs could produce results. We had, before everyone cared about Gemini and its historically inaccurate image making, a huge announcement from Google around a million-plus token context window in Gemini 1.5 Pro. Paired with that, we also got claims that Magic.com has an even bigger context window. And the third category of big advance was, of course, Sora. Now, I've talked extensively about how Sora has been so profoundly differentiated in terms of the quality of the video that it's triggered a response not just among AI folks, like you and me

Starting point is 00:07:29 who are listening to slash making this show, but among normies who just see things careening ahead at a speed that they can't really imagine. Since OpenAI's announcement of Sora a couple weeks ago, we've continued to get dribs and drabs of creations, each seemingly more impressive than the last. But what's additionally interesting is how Sora has seemed to really kick up the competition around AI video in a huge way. Obviously, that's not to say that there weren't a ton of projects already working on this, but now there is a lot of attention being paid, and we're getting increasingly exciting stuff. This week, we've seen two examples of this. One of them is called LTX Studio. It's a new platform from Lightricks. The idea is to be able to generate

Starting point is 00:08:08 not just these short clips, not the four-second clips of the Runways and Pikas of the world, or even the 60-second clips of the Soras, but actual full-length videos, with multiple generated scenes, consistent characters, consistent lighting, storyboarding tools. In other words, it promises a full AI-powered movie creation suite. Nick St. Pierre, @nickfloats on Twitter, writes, this is the first video AI tool I've seen that's focused on helping you generate an actual story, not just clips, storyboarding, casting, consistent characters, sound effects, and editing all together in LTX Studio. Now, right now, LTX Studio is on a waitlist model. So we haven't

Starting point is 00:08:47 had a ton of people who've been able to actually get in there and try it. But it seems pretty clearly on trend with both of the big trends of this year that we frequently talk about: trying to push the state of the art when it comes to generative capacity, but also newly designed, custom-purpose software built around specific use cases. If you've watched AI video generation evolve over the last, call it, six to 12 months, you've seen people do more and more extensive things: videos that have complete narrative arcs. But to do so, they've really had to hack together a bunch of different tools. Initial shots and reference points in Midjourney, ported into Runway or Pika, edited together in another piece of software, using something different for sound

Starting point is 00:09:23 layered on top. The examples that LTX has released so far look like they could have been one of those creations, but the value, theoretically, is that they were made much more simply in a single place. The other big video software that everyone has been talking about this week is a research model that comes from Alibaba that they call EMO. EMO is short for Emote Portrait Alive, and it can take a single reference image and a vocal or audio source and turn that into a generated video with accurate lip syncing matching the soundtrack. Writes VentureBeat,

Starting point is 00:09:53 the system is able to create fluid and expressive facial movements and head poses that closely match the nuances of a provided audio track. This represents a major advance in audio-driven talking head video generation, an area that has challenged AI researchers for years. The lead author of the paper said, Traditional techniques often fail to capture the full spectrum of human expressions and the uniqueness of individual facial styles. To address those issues, we propose EMO,

Starting point is 00:10:15 a novel framework that utilizes a direct audio-to-video synthesis approach, bypassing the need for intermediate 3D models or facial landmarks. Trying to explain that more simply, VentureBeat goes on, unlike previous methods that rely on 3D face models or blend shapes to approximate facial movements, EMO directly converts the audio waveform into video frames. This allows it to capture subtle motions and identity-specific quirks associated with natural speech. Unfortunately for us, a lot of the examples that have been released

Starting point is 00:10:41 use audio from popular media, which would get this flagged and taken down from YouTube, but just go on Twitter slash X and search up EMO, and there is a lot to be very impressed about. It should be noted that as all of these new contenders have popped up, the previous best-known tools for video generation, notably Pika and Runway, have also been making moves. Pika, for example, just announced their own lip syncing tool. Earlier this week, they tweeted, we know there's been a lot of talk about AI-generated video recently. Well, look who's talking now. Early access to lip sync is available for pro users. And while it's not a specific feature that was announced, I even noticed today someone tweeting,

Starting point is 00:11:16 has Runway ML made a model update in the past day? Is anyone else getting significantly crisper and more coherent generations from the same prompt? This gets back to what we were saying at the beginning of the show, that the introduction of Sora is sort of making everyone kick up their game. Another tool that is playing in the same space as LTX Studio is called Augie. VC and AI builder Yohei tweeted recently, video generation tools like Sora and Pika are cool, but you still need to stitch them together, add voiceover, etc. Augie does exactly that. He then shared a full video, all in a consistent retro neon animation sort of style, that had an overarching coherent narrative and apparently came from a single prompt. Yohei writes, this is a full

Starting point is 00:11:55 video generated by AI from a single prompt. Transcript, voice, multiple AI generated videos, then stitched together all in one go. What's more, I believe that right now, Augie is not on a waitlist and you can actually try it out. Not to be left out, Stability has announced, well, multiple things in this space. This week, they announced a partnership with Morph Studio. Announcing it in a tweet, they write, Morph x Stability AI, a groundbreaking collaboration bringing you the next gen AI video creation workflow. Just let your ideas flow in and watch as vivid videos come out. With Morph Studio's all-in-one video generation solution, your inspirations from text, images, or existing videos evolve into captivating stories. Experience seamless creation with multi-parallel connections for every spark of

Starting point is 00:12:34 creativity. This is a very differently organized, full production suite, but a full production suite nonetheless. Across all these projects, just to reiterate, we are seeing two very different things happening simultaneously. The first is a push for advances in the actual models powering these videos. That is, of course, what Sora, for example, represents. But you're also seeing a really strong focus on workflows, the tools and interfaces that surround models to help people get the most out of them. In some cases, you're seeing both integrated at once. Whichever of these tools catch on, whichever of these models ultimately end up the most advanced, right now is a very, very good time to be a video creator or to become one if you're not yet. That, however, is going to do

Starting point is 00:13:14 it for today's AI Breakdown. Until next time, peace.
