The AI Daily Brief: Artificial Intelligence News and Analysis - Meta's New Movie Gen Model Could Change AI Video
Episode Date: October 9, 2024
Meta has unveiled its latest breakthrough in AI with MovieGen, a powerful new video generation model designed to take personalized video and audio creation to the next level. From high-definition text-to-video to precise video editing and audio generation, MovieGen promises to be a game changer in the AI space. In this episode, explore what makes this release so exciting, how it compares to competitors like Sora and Runway, and what industry experts are saying about Meta's latest AI tool. Find out why MovieGen could revolutionize the future of video creation! Concerned about being spied on? Tired of censored responses? AI Daily Brief listeners receive a 20% discount on Venice Pro. Visit https://venice.ai/nlw and enter the discount code NLWDAILYBRIEF. The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614 Subscribe to the newsletter: https://aidailybrief.beehiiv.com/ Join our Discord: https://bit.ly/aibreakdown
Transcript
Today on the AI Daily Brief, Meta has jumped with both feet into the video generation game.
Before that in the headlines, OpenAI's big fundraising is leading to a new wave of big AI deals.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
To join the conversation, follow the Discord link in our show notes.
Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes.
This is perhaps completely to be expected, but
in the wake of OpenAI announcing their $6.6 billion fundraising deal at a $157 billion post-money
valuation, that momentum is being carried right on through into a set of other deals from
OpenAI competitors.
As we discussed last week, one of the things that made the deal notable was that OpenAI
explicitly asked their investors to avoid backing a group of companies which they seem to view
as competitive.
Those competitors included Elon's xAI, Anthropic, Perplexity, and Glean among others.
And that reads exactly like a list of people who are going back to the well,
trying to capture some of this momentum from the OpenAI fundraise
to top up their own cash reserves as well.
The Information has just put out a list of deals that their sources see materializing.
xAI, who is of course the maker of Grok,
is starting to receive interest from prospective backers
for a potential raise to close by the end of the first quarter of next year.
Anthropic has recently floated a $40 billion valuation in their own fundraising talks.
Perplexity, which most recently raised $250 million at a valuation of $3 billion back in June,
has started to receive some inquiries as well.
One more big deal that isn't in the OpenAI competitive set necessarily, but which still
remains notable, is a new round from enterprise-focused Writer, who are looking to raise
between $150 million and $200 million at a $1.9 billion valuation.
The deal space is heating up significantly enough that some are saying it's reminiscent of the
2021 fundraising boom.
Now, while that fundraising period was all about low interest rates and
the post-COVID period, this new wave of dealmaking is extremely concentrated in just AI.
Interestingly, The Information also reports that beyond just the excitement and momentum of a hype
cycle, quote, investors say they've also been encouraged by the fact that more startups are
generating revenue, even though profits may be years away. For example, Writer was generating
$40 million in annual recurring revenue at the end of September, up from $30 million
at the end of June. It projects that it will reach $50 million ARR by the end of the year.
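As a rough check on those Writer numbers, here is a minimal Python sketch of the growth they imply. The three ARR figures are the ones quoted above; the function name and the quarter-over-quarter framing are illustrative assumptions, not something The Information reported.

```python
def growth_rate(previous: float, current: float) -> float:
    """Fractional growth from one period to the next."""
    return (current - previous) / previous


# ARR figures in millions of dollars, as quoted in the episode.
june_arr = 30.0
september_arr = 40.0
projected_december_arr = 50.0  # Writer's own projection, not a reported result

# Growth over the quarter that already happened.
q3_growth = growth_rate(june_arr, september_arr)

# Growth Writer would need in the following quarter to hit its projection.
q4_growth_needed = growth_rate(september_arr, projected_december_arr)

print(f"June to September: {q3_growth:.1%}")
print(f"September to projected December: {q4_growth_needed:.1%}")
```

In other words, the projection assumes the same $10 million of added ARR per quarter, which is a slightly lower percentage growth rate on the larger base.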
ElevenLabs is also mentioned by The Information as another company that's looking
to raise at a potential multi-billion dollar valuation. One drag on the potential fundraising is the number
of companies whose founders have chosen to resign to go join big tech companies in this sort of weird
non-acquisition acqui-hire model that seems to be becoming more common in AI. Leaders from companies
including Inflection, Character, Adept, and Covariant have all taken that path, which isn't exactly
a great one when it comes to investors looking for the next runaway success. Anyways, it definitely
feels like there is some funding momentum right now, and I will of course keep you posted as those
deals start to materialize. Now, speaking of OpenAI, last week's Dev Day was the big topic
of conversation, and friend of the show, Swyx, did a great set of polls, which serve as a vibe check
for how people were actually interacting with OpenAI's announcements a few days after they were made.
When it comes to the Realtime API, those who have or haven't tried but were interested or
impressed were beating out those who haven't tried and don't care or have tried and found it
meh. The have-tried and impressed stood at 14.7% versus the have-tried and not impressed at 10.4%.
And those who haven't tried but were interested represented 46% of respondents versus 28.8% of those
who haven't tried and don't care. ChatGPT Canvas, I thought, was interesting and maybe reflects
the fact that Swyx's audience is more developer-heavy than normie, because while among those who
had tried it more were impressed than were unimpressed, it wasn't by nearly as much as I would
have thought. 22% of his respondents had tried it and found it impressive versus 17.4% who had tried
it and found it meh. I think Canvas is a massive UI upgrade, but again, if you were looking
for model performance upgrades, this is more or less strictly speaking a user experience thing.
When it comes to model distillation, the vast majority haven't tried yet, and there were more
who don't care than were interested, although not by much. On vision fine-tuning, it was pretty
similar. Most people hadn't tried, and more of those who hadn't tried didn't care than were
interested. But the big one was the last question. Inclusive of all recent technical and non-technical
news at OpenAI and its competitors, Swyx asked, one, do you think that OpenAI is headed on the
right track? And number two, do you think that Frontier AI as a whole is headed on the right track?
56% said yes to both. 22% said that OpenAI was not on the right track, but Frontier AI was.
4.5% said that OpenAI was on the right track, but Frontier AI as a whole wasn't. And 16.6% said
that neither OpenAI nor Frontier AI as a whole were on the right track.
56% of respondents saying that OpenAI is headed on the right track and Frontier AI is headed in that
direction, and basically 60% overall saying OpenAI is headed in the right direction, strikes me as a win
given how contentious this company is.
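For anyone who wants to see where the roughly 60% comes from, here is a minimal Python sketch that adds up the four poll answers quoted above. The dictionary layout and variable names are illustrative assumptions; the percentages are the ones read out in the episode.

```python
# Keys are (openai_on_right_track, frontier_ai_on_right_track).
# Values are the shares of respondents quoted in the episode, in percent.
poll = {
    (True, True): 56.0,
    (False, True): 22.0,
    (True, False): 4.5,
    (False, False): 16.6,
}

# Everyone who said OpenAI is on the right track, whatever they think of the field.
openai_on_track = sum(share for (openai, _), share in poll.items() if openai)

# Everyone who said frontier AI as a whole is on the right track.
frontier_on_track = sum(share for (_, frontier), share in poll.items() if frontier)

# The quoted shares do not add up to exactly 100, presumably due to rounding.
total = sum(poll.values())

print(f"OpenAI on the right track: {openai_on_track:.1f}%")
print(f"Frontier AI on the right track: {frontier_on_track:.1f}%")
print(f"Sum of quoted shares: {total:.1f}%")
```

The same tally shows that respondents were noticeably more positive about frontier AI as a whole than about OpenAI specifically.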
Speaking of contentious, our last story today in the headlines, after lots and lots of bluster
about how Andreessen Horowitz was going to support the Trump campaign, founding partner Ben Horowitz
shared with employees that he planned to donate to Kamala Harris's run for president as well.
On Twitter, he said, I sent an internal email that Axios got a hold of. Here it is. This is what it is. There is nothing else no matter how it gets characterized. In that note to the a16z team, he said, I wanted to give you an update on my political activity. As I mentioned before, Felicia and I have known Vice President Harris for over 10 years, and she has been a great friend to both of us during that time. She's also been a friend to the firm in our early days, helping with several events at my house when we built the original Cultural Leadership Fund network. As a result of our friendship, Felicia and I will be making a significant donation to entities who support the Harris-Walz
campaign. From a firm perspective, we continue to only take positions consistent with our Little
Tech agenda and how the various candidates support or don't support policies to build a strong
startup technology industry. Although I have had several conversations with Vice President Harris and her
team on their likely tech policies, and am encouraged by my belief in her, they have not yet
stated what their tech policy will be, so the firm will not be updating its position in that
regard. As we stated earlier, the Biden administration has been exceptionally destructive on tech
policy across the industry, but especially as it relates to crypto and blockchain and AI. So while I am
very hopeful that the Harris administration will be much better, they have not yet stated their
intentions. A fairly common response was summed up by Amaran, who responded, I don't quite comprehend
the "this admin has been destructive, but I'll vote for them in hopes they won't be destructive."
But by and large, the response was that it was another example of people playing all sides.
So ultimately, the question is, does this mean that Horowitz thinks that a Harris administration
would be meaningfully better for tech than a second Biden administration would have been?
Or is this simply a reflection of either A, standard rich person opportunism, or B, genuine personal
relationships? Only time will tell, but for now, that is going to do it for today's AI Daily Brief
Headlines edition. Next up, the main episode. Today's episode is brought to you by Venice. Venice is a
private, uncensored generative AI app. It accesses open source models to enable text, image,
and code generation without the fear of being spied on or having your data exploited. Discuss anything
with Venice without concern about it being monitored, sold, or given to advertisers and governments.
Venice is different because your conversations and creations are kept securely within the browser,
never stored or accessible by Venice. Unlike other AI apps, Venice won't tell you what's okay
to say or not. Venice won't patronize you. It simply provides direct access to machine intelligence,
no topics are off limits, no ideas are taboo. With Venice, you're in control of the AI, as you
should be. Pro subscriptions are available for $49 a year or $8 per month. AI Daily Brief listeners
receive a 20% discount on Venice Pro.
Visit venice.ai/nlw and enter the discount code NLWDAILYBRIEF. That's NLWDAILYBRIEF,
all one word.
Today's episode is brought to you by Plumb. Want to use AI to automate your work but don't
know where to start? Plumb lets you create AI workflows by simply describing what you want.
No coding or API keys required. Imagine typing out, AI, analyze my Zoom meetings and send me your
insights in Notion, and watching it come to life before your eyes. Whether you're an operations leader,
marketer, or even a non-technical founder, Plumb gives you the power of AI without the technical hassle.
Get instant access to top models like GPT-4o, Claude Sonnet 3.5, AssemblyAI, and many more.
Don't let technology hold you back. Check out Use Plumb, that's Plumb with a B, for early access
to the future of workflow automation. Today's episode is brought to you by Superintelligent.
Every single business workflow and function is being remade and reimagined with artificial intelligence.
There is a huge challenge, however, of going from the potential of AI
to actually capturing that value.
And that gap is what Superintelligent is dedicated to filling.
Superintelligent accelerates AI adoption and engagement to help teams actually use AI
to increase productivity and drive business value.
An interactive AI use case registry gives your company full visibility into how people
are using artificial intelligence right now.
Pair that with capabilities building content in the form of tutorials, learning paths,
and a use case library.
And Superintelligent helps people inside your company show how they're getting value out of
AI, while providing resources for people to put that inspiration into action. The next three teams that
sign up with 100 or more seats are going to get free embedded consulting. That's a process by which
our Superintelligent team sits with your organization, figures out the specific use cases that matter
most to you, and helps actually ensure support for adoption of those use cases to drive real value.
Go to besuper.ai to learn more about this AI enablement network, and now back to the show.
Welcome back to the AI Daily Brief.
On Friday, Meta announced Meta Movie Gen.
They claim these are the most advanced media foundation models to date, and people are pretty excited
about it.
Today we're going to talk about what they announced, what the response is, where the skepticism
might lie, and basically use this as a chance to check in on video generation more broadly.
The company writes, Movie Gen delivers state-of-the-art results across a range of capabilities.
Movie Gen Video is a 30-billion-parameter transformer model that can generate
high quality and high definition images and videos from a single text prompt.
Movie Gen Audio is a 13-billion-parameter transformer model that can take a video input
along with optional text prompts for controllability to generate high-fidelity audio synced to the video.
They go on.
It can generate ambient sound, instrumental background music, and Foley sound, delivering state-of-the-art
results in audio quality, video-to-audio alignment, and text-to-audio alignment.
There is also video editing. Using a generated or existing video and accompanying text instructions
as an input, Movie Gen can perform localized edits such as adding, removing, or replacing elements,
as well as global changes like background or style changes, and there is also personalization
where you can use an image of a person or a text prompt with the model generating a video
with state-of-the-art results on character preservation and natural movement in video.
A lot of state-of-the-art talk, as you can see. That message was reiterated by chief product
officer Chris Cox on Threads, who wrote, we're sharing our progress today on Movie Gen,
our project to develop the state-of-the-art for AI video generation.
As of today, our evals show it's industry-leading on text-to-video quality across a number of
dimensions with 16 seconds of continuous length, plus a leap forward for state-of-the-art on video-matched
audio, precise editing, and character consistency and personalization.
Cox continues, the feedback we heard from filmmakers and video creators was to prioritize ease of editing,
but even more the ability to generate videos with a specified character or image,
which these models now faithfully achieve.
A research blog post shared more about what the model can do.
They write, given a text prompt, we can leverage a joint model that has been optimized for both text to image and text to video to create high quality and high definition images and videos.
The 30 billion parameter transformer model has the ability to generate videos of up to 16 seconds at a rate of 16 frames per second.
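To put that in concrete terms, here is a minimal Python sketch of the frame count those two numbers imply. The 16 seconds and 16 frames per second come from the blog post as quoted above; the function name is an illustrative assumption.

```python
def frame_count(duration_seconds: int, frames_per_second: int) -> int:
    """Number of frames in a clip of the given length and frame rate."""
    return duration_seconds * frames_per_second


# Maximum clip length and frame rate quoted from Meta's research blog post.
max_duration_seconds = 16
frames_per_second = 16

max_frames = frame_count(max_duration_seconds, frames_per_second)
print(max_frames)  # 256
```

So a maximum-length clip at that frame rate works out to 256 generated frames that all have to stay consistent with one another.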
They continue, we find that these models can reason about object motion, subject object interactions, and camera motion,
and they can learn plausible motions for a wide variety of concepts, making them state-of-the-art models in their categories.
Now, in terms of demonstration, Meta showed off lifelike water, skin texture, hair, and movement.
They presented the example of a koala on a surfboard with choppy waves splashing on the camera lens,
all perfectly rendered, a person dancing around in a bed sheet ghost costume, complete with a
reflection in the mirror, and of course, this underwater shot of a baby hippo swimming around,
complete with distortions and surface reflections.
A lot of the emphasis is around this personalized video generation.
A photo of a subject can be uploaded and transformed into a video clip.
For example, Meta gave us a fairly standard selfie combined with the prompt
a woman DJ in a pink jacket spins records with a cheetah by her side.
The potentially game-changing thing here is character consistency.
For many use cases, this would be exactly what takes generated video from a cool novelty
to an extremely powerful professional tool.
This would, of course, massively expand the use cases for not only filmmakers but also
advertisers and anyone who needs any sort of multi-step video that's longer than just a single
clip.
Video editing also potentially represents a really big update.
The example that Meta gave of precise video editing is of dressing up penguins in Victorian dresses,
then adding some beach umbrellas before turning the whole scene into an animation in the style of a pencil sketch.
They also showed off audio generation.
An ATV engine roars and accelerates with guitar music,
and rustling leaves and snapping twigs with orchestral music.
They summed up their vision by stating,
as we continue to improve our models and move towards a potential future release,
we'll work closely with filmmakers and creators to integrate their feedback.
By taking a collaborative approach, we want to ensure we're creating tools
that help people enhance their inherent creativity in new ways
they may have never dreamed would be possible.
Imagine animating a day-in-the-life video to share on Reels
and editing it using text prompts
or creating a customized animated birthday greeting for a friend
and sending it to them on WhatsApp.
With creativity and self-expression taking charge,
the possibilities are infinite.
So a couple things.
This is actually quite a revealing endpoint.
First of all, this technology is not available currently.
So basically, we have a Sora situation on our hands,
where it appears really impressive,
but no one has the chance to actually test it out
in real life. Chris Cox dealt with this directly in his Threads post saying,
we aren't ready to release this as a product anytime soon. It's still expensive and
generation time is too long, but we wanted to share where we are since the results are getting
quite impressive. Meta clearly thinks that some of this is going to be controversial. In that
same blog post, they wrote, while there are many exciting use cases for these foundation models,
it's important to note that generative AI isn't a replacement for the work of artists and
animators. We're sharing this research because we believe in the power of this technology to
help people express themselves in new ways and to provide opportunities to people who might not
otherwise have them. Our hope is that perhaps one day in the future, everyone will have the opportunity
to bring their artistic visions to life and create high-definition videos and audio using Movie Gen.
So what were people's reactions to this? Bilawal Sidhu, the host of the TED AI Show, writes,
okay, finally dug into Meta's new Movie Gen paper. Text-to-video is cool and all, but to me,
the precise editing feature is the game changer. It can handle complete VFX tasks,
like replacing environments, doing set extensions, swapping characters, removing items,
adding particle effects with realistic lighting interaction.
Now, Bilawal also got a little wonky about how they had figured out how to train this.
He said the coolest bit to me is how they trained this model, because paired before/after
VFX editing datasets are super scarce.
TL;DR, they taught it video editing through a clever three-stage process.
One, started with image editing data, treating it like single-frame video edits.
Two, created synthetic video editing tasks by animating still image edits and using AI models for object segmentation.
Three, the model generated edited videos and then learned to reconstruct the originals from the edited
version. Meta calls this video editing via back translation. Lots of people also landed on the Sora
comparison. Chubby on X writes, Meta has landed an absolute hit today with Meta MovieGen. Not only
is the video quality at Sora level, you can even upload your own photo and integrate it into the video.
Speaking of Sora, Andrew Curran points out that Meta is not uncomfortable calling out competitors
by name. He flagged a section of the paper which reads, on text-to-video generation, we outperform
prior state-of-the-art, including commercial systems such as Runway Gen-3, Luma Labs, and OpenAI
Sora, on overall video quality. Audio outperforms prior state-of-the-art, including commercial systems
such as Pika Labs and ElevenLabs for sound effect generation, music generation, and audio extension.
That paper, by the way, also pointed out that this model was trained on 6,144 H100 GPUs.
Roberto Nixon pointed out that, holding aside the quality, Meta also has the distribution.
He writes, Meta is going to be very hard to beat. They just announced Movie Gen, which
looks to be at the very least on par with the best video models currently available.
They have 3.6 billion daily active users they can serve this to, and free.
It'll be rolled out to all Instagram users early next year.
And that is, of course, what makes Meta so different and such a huge giant in this space.
Between WhatsApp, Instagram, and Facebook itself, they cover a huge portion of the Earth's population.
To the extent that there is any skepticism, it's certainly in the fact that this isn't available yet,
but the pretty clear sentiment is that this is going to be a very big deal for video generation.
I know that if the practical results are actually anywhere near as good as the demonstration videos,
Meta is going to be a major player in the video space.
Super interesting stuff.
Can't wait to get my hands on it, but for now, that is going to do it for the AI Daily Brief.
Appreciate you listening as always, and until next time,
