The AI Daily Brief: Artificial Intelligence News and Analysis - REPORT: Llama 3 Coming in July (with Fewer Guardrails)
Episode Date: February 29, 2024
The Information reports that Meta's most advanced model Llama 3 will be arriving in July, and will specifically have safety features dialed back. Also on this episode, AI video is heating up!
INTEREST...ED IN THE AI EDUCATION BETA? ONE DAY LEFT TO REGISTER! Learn more and sign up https://bit.ly/aibeta
ABOUT THE AI BREAKDOWN
The AI Breakdown helps you understand the most important news and discussions in AI.
Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Join the community: bit.ly/aibreakdown
Learn more: http://breakdown.network/
Transcript
Today on the AI Breakdown, we're talking about the unbelievably fast-moving advances in the AI video space.
Before that on the brief, new information about Meta's ambitions for Llama 3, including when we can expect it.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to breakdown.network for more information about our YouTube, our Discord, and our newsletter.
Welcome back to the AI Breakdown Brief, all the AI headline news you need in around five minutes.
We've gotten some interesting information about Llama 3 over
the last couple months. Meta CEO Mark Zuckerberg has talked about it a little bit more. Some of the
things that he mentioned were that they underestimated how important coding was not just to the utility
of an LLM for the Meta user community, but for the sheer act of reasoning, for how smart an LLM was.
He also discussed how, while Llama 1 and Llama 2 were leaders in the open source LLM space, they really
wanted Llama 3 to be state of the art in general. Well, now we've heard some new things courtesy
of The Information. They recently published a piece, "Meta Wants Llama 3 to
Handle Contentious Questions as Google Grapples With Gemini Backlash." According to The Information,
the safeguards that are currently on Llama 2 have made it too quote-unquote safe in the perception
of Meta's senior leaders, as well as among researchers who work on the model. By way of describing
the difference between a query that is reasonable to block for safety and perhaps a more questionable one,
The Information pointed out there was a big difference between not answering how to make a bomb
versus how a worker could get around coming into the office on a mandatory in-office day. The Information
says, of course, as you might expect, that Meta's conservative approach with Llama 2 was a PR-related
strategy, that they didn't want to deal with PR issues. They're now trying to get a little
bit smarter about how they handle these things, for example, not just shutting down around certain
words, but trying to get better context. If someone asks how to kill a vehicle's engine,
Llama 3 should be able to figure out that they're asking about shutting an engine off,
not a type of murder. Of course, the context for all of this is the backlash that we've
been following around Gemini and its production of historically inaccurate images, such as depicting
people of color in Nazi uniforms and other inaccuracies like that. I, of course, have no idea
how long this perception of Llama 2 being too safe has been going on inside the company. What I do
know is that what Zuckerberg has shown when it comes to AI strategy is that he is very comfortable
bobbing when others weave. Meta, for example, would not necessarily be the big tech company
you most expected to go open source, and yet that is where they planted their generative AI flag.
Part of that, I think, had to do with OpenAI and Microsoft going in the opposite direction.
Even if it was ideological on some level, it was also opportunistic.
Is getting Llama 3 more open to answering tricky questions another example of that?
It certainly seems possible.
Whatever the case, The Information piece also reinforces that Llama 3 is a very big deal.
They want it to match the performance of GPT-4, although these reports say that they haven't decided
whether it will be multimodal yet. And of course, even if it does match GPT-4, another question will be,
by the time July rolls around, will we all be using GPT-5? Still, these are interesting details that show
Meta's continuing evolution in its role in the space, and something to watch for sure. Speaking of
OpenAI, we shift now to another set of lawsuits around OpenAI and copyright. The Intercept,
Raw Story, and AlterNet have all filed separate lawsuits against both OpenAI and Microsoft,
although they're all being handled by the same law firm.
They claim that ChatGPT, quote, at least some of the time,
reproduces verbatim or nearly verbatim copyright-protected works of journalism
without providing author, title, copyright, or terms of use information contained in those works.
The Raw Story and AlterNet lawsuits go even further,
alleging that OpenAI and Microsoft, quote,
had reason to know that ChatGPT would be less popular
and generate less revenue if users believed that ChatGPT responses violated third-party copyrights.
These lawsuits are basically saying that OpenAI and Microsoft were aware of potential
copyright infringement and point to evidence like OpenAI's opt-out system that gives website owners
the ability to block content from OpenAI web crawlers. Interestingly, at the same time this is happening,
OpenAI has filed to dismiss parts of the New York Times lawsuit. On Monday, they filed an argument that
ChatGPT is, quote, not in any way a substitute for a subscription to the New York Times.
The filing said, in the real world, people do not use ChatGPT or any other OpenAI product for that
purpose, nor could they. In the ordinary course, one cannot use ChatGPT to serve up Times articles at will.
Indeed, OpenAI alleges that the New York Times went out of its way to quote-unquote hack the ChatGPT system to produce evidence that looks worse for OpenAI.
There was also one more lawsuit filed against OpenAI this week.
A Florida woman filed a lawsuit in California calling for a shutdown of the site. The suit is basically a litany of AI safety arguments, including some of the most extreme.
Lawyers for the plaintiffs wrote,
"Technological safety measures must be added to the products that will prevent the technology from surpassing human intelligence and harming others."
The lawyers argued that Microsoft's $10 billion investment represents a 180-degree shift from
OpenAI's original mission, with OpenAI, quote, prioritizing short-term financial gains over long-term
safety and ethical considerations.
Meanwhile, OpenAI's main partner, Microsoft, keeps on pumping out the jams.
Their latest offering is a new customized Copilot for finance teams.
Reuters writes, Microsoft previewed an artificial intelligence tool for customers' finance
departments, part of a strategy tailoring new software to industries, professionals, and ultimately
individuals. The new tool is, of course, called Microsoft Copilot for Finance, and is specifically
optimized for reviewing datasets, producing reports from raw numbers, and other finance department
functions like that. This is not the first function-specific Copilot app we've seen. Microsoft
has also announced tools for salespeople as well as customer service representatives.
Over in Apple Land, there has been much discussion about their AI strategy in the wake of reports that
they had canceled their multi-billion-dollar electric car project and shifted many of those resources
over to AI, but a shareholder proposal to require Apple to disclose how it was using AI in its
operations and any sort of guidelines that it was creating has been rejected at their annual
shareholders' meeting. Still, it's quite clear that the market really wants to know what Apple's
plans with AI are. CEO Tim Cook, for example, reiterated that they will announce progress on
generative AI later this year. It's widely anticipated that we'll start to see generative AI come
to the Apple ecosystem in a big way, a little bit at the Worldwide Developers Conference in the
summer and a lot around iOS 18 in the fall. Internet denizens like Jason Calacanis, however,
are starting to notice subtle features that are already finding their way into iOS updates
that point towards Apple's AI future. Between Meta's moves, Microsoft, and Apple, it is going to be
a very competitive year in the world of AI. So strap in and subscribe to make sure you don't miss
a thing. That's going to do it for today's AI Breakdown Brief. Up next, the main AI Breakdown.
Welcome back to the AI Breakdown.
The last few weeks have seen a number of really big announcements from a technical perspective
in artificial intelligence.
We had Groq totally transforming the speed and latency with which LLMs could produce results.
We had, before everyone cared about Gemini and its historically inaccurate image making,
a huge announcement from Google around a million-plus token context window in Gemini 1.5 Pro.
Paired with that, we also got claims that Magic.dev
has an even bigger context window. And the third category of big advance was, of course, Sora.
Now, I've talked extensively about how Sora has been so profoundly differentiated in terms of the
quality of the video that it's triggered a response not just among AI folks, like you and me
who are listening to slash making this show, but among normies who just see things careening
ahead at a speed that they can't really imagine. Since OpenAI's announcement of Sora a couple
weeks ago, we've continued to get dribs and drabs of creations, each seemingly more impressive than the last.
But what's additionally interesting is how Sora has seemed to really kick up the competition
around AI video in a huge way. Obviously, that's not to say that there weren't a ton of projects
already working on this, but now there is a lot of attention being paid, and we're getting
increasingly exciting stuff. This week, we've seen two examples of this. One of them is called
LTX Studio. It's a new platform from Lightricks. The idea is to be able to generate
not just these short clips, not the four-second clips of the Runways and Pikas of the world,
or even the 60-second clips of the Soras, but actual full-length videos,
with multiple generated scenes, consistent characters, consistent lighting, and storyboarding tools.
In other words, it promises a full AI-powered movie creation suite.
Nick St. Pierre, @nickfloats on Twitter, writes,
this is the first video AI tool I've seen that's focused on helping you generate an actual story,
not just clips. Storyboarding, casting, consistent characters, sound effects, and editing
all together in LTX Studio. Now, right now, LTX Studio is on a waitlist model. So we haven't
had a ton of people who've been able to actually get in there and try it. But it seems pretty
clearly in line with both of the big trends of this year that we frequently talk about:
trying to push the state of the art when it comes to generative capacity, but also
newly designed, custom-purpose software built around specific use cases. If you've watched AI video
generation evolve over the last, call it, six to 12 months, you've seen people do more and more
extensive things, videos that have complete narrative arcs. But to do so, they've really had to
hack together a bunch of different tools: initial shots and reference points in Midjourney, ported into
Runway or Pika, edited together in another piece of software, using something different for sound
layered on top. The examples that LTX has released so far look like they could have been one of those
creations, but the value, theoretically, is that they were made much more simply in a single place.
The other big video software that everyone has been talking about this week
is a research model that comes from Alibaba that they call EMO.
EMO is short for Emote Portrait Alive,
and it can take a single reference image and a vocal or audio source
and turn that into a generated video with accurate lip syncing matching the soundtrack.
VentureBeat writes,
the system is able to create fluid and expressive facial movements
and head poses that closely match the nuances of a provided audio track.
This represents a major advance in audio-driven talking head video generation,
an area that has challenged AI researchers for years.
The lead author of the paper said,
Traditional techniques often fail to capture the full spectrum of human expressions
and the uniqueness of individual facial styles.
To address those issues, we propose EMO,
a novel framework that utilizes a direct audio-to-video synthesis approach,
bypassing the need for intermediate 3D models or facial landmarks.
Trying to explain that more simply, VentureBeat goes on,
unlike previous methods that rely on 3D face models
or blend shapes to approximate facial movements,
EMO directly converts the audio waveform into video frames.
This allows it to capture subtle motions and identity-specific quirks associated with natural speech.
Unfortunately for us, a lot of the examples that have been released
use audio from popular media, which would get this flagged and taken down from YouTube,
but just go on Twitter slash X and search up EMO, and there is a lot to be very impressed about.
It should be noted that as all of these new contenders have popped up,
the previous best-known tools for video generation, notably Pika and Runway,
have also been making moves. Pika, for example, just announced their own lip syncing tool.
Earlier this week, they tweeted, we know there's been a lot of talk about AI-generated video recently.
Well, look who's talking now. Early access to lip sync is available for pro users.
And while it's not a specific feature that was announced, I even noticed today someone tweeting,
has Runway ML made a model update in the past day? Is anyone else getting significantly crisper and
more coherent generations from the same prompt? This gets back to what we were saying at the
beginning of the show that the introduction of Sora is sort of making everyone kick up their game.
Another tool that is playing in the same space as LTX Studio is called Augie. VC and AI builder
Yohei tweeted recently, video generation tools like Sora and Pika are cool, but you still need to
stitch them together, add voiceover, etc. Augie does exactly that. He then shared a full video,
all in a consistent retro neon animation sort of style, that had an overarching coherent narrative
and apparently came from a single prompt. Yohei writes, this is a full
video generated by AI from a single prompt. Transcript, voice, multiple AI generated videos, then
stitched together all in one go. What's more, I believe that right now, Augie is not on a waitlist
and you can actually try it out. Not to be left out, Stability has announced, well, multiple things
in this space. This week, they announced a partnership with Morph Studio. Announcing it in a tweet,
they write, Morph x Stability AI, a groundbreaking collaboration bringing you the next gen AI video
creation workflow. Just let your ideas flow in and watch as vivid videos come out. With Morph Studio's
all-in-one video generation solution, your inspirations from text, images, or existing videos evolve into
captivating stories. Experience seamless creation with multi-parallel connections for every spark of
creativity. This is a very differently organized, full production suite, but a full production
suite nonetheless. Across all these projects, just to reiterate, we are seeing two very different
things happening simultaneously. The first is a push for advances in the actual models powering
these videos. That is, of course, what Sora, for example, represents. But you're also seeing a
really strong focus on workflows, the tools and interfaces that surround models to help people
get the most out of them. In some cases, you're seeing both integrated at once. Whichever of these
tools catch on, whichever of these models ultimately end up the most advanced, right now is a very,
very good time to be a video creator or to become one if you're not yet. That, however, is going to do
it for today's AI Breakdown. Until next time, peace.
