The AI Daily Brief: Artificial Intelligence News and Analysis - Meta's New Movie Gen Model Could Change AI Video

Starting point is 00:00:00 Today on the AI Daily Brief, Meta has jumped with both feet into the video generation game. Before that in the headlines, OpenAI's big fundraising is leading to a new wave of big AI deals. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. This is perhaps completely to be expected, but in the wake of OpenAI announcing their $6.6 billion fundraising deal at a $157 billion post-money valuation, that momentum is being carried right on through into a set of other deals from

Starting point is 00:00:45 OpenAI competitors. As we discussed last week, one of the things that made the deal notable was that OpenAI explicitly asked their investors to avoid backing a group of companies which they seem to view as competitive. Those competitors included Elon's xAI, Anthropic, Perplexity, and Glean, among others. And that reads exactly like a list of companies that are going back to the well, trying to capture some of this momentum from the OpenAI fundraise to top up their own cash reserves as well.

Starting point is 00:01:12 The Information has just put out a list of deals that their sources see materializing. xAI, which is of course the maker of Grok, is starting to receive interest from prospective backers for a potential raise to close by the end of the first quarter of next year. The Information reports that xAI recently floated a $40 billion valuation in its own fundraising talks. Perplexity, which most recently raised $250 million at a valuation of $3 billion back in June, has started to receive some inquiries as well. One more big deal that isn't necessarily in the OpenAI competitive set, but which still

Starting point is 00:01:40 remains notable, is a new round from enterprise-focused Writer, which is looking to raise between $150 million and $200 million at a $1.9 billion valuation. The deal space is heating up significantly enough that some are saying it's reminiscent of the 2021 fundraising boom. Now, while that fundraising period was all about low interest rates in the post-COVID period, this new wave of dealmaking is extremely concentrated in just AI. Interestingly, The Information also reports that beyond just the excitement and momentum of a hype cycle, quote, investors say they've also been encouraged by the fact that more startups are

Starting point is 00:02:11 generating revenue, even though profits may be years away. For example, Writer was generating $40 million in annual recurring revenue at the end of September, up from $30 million at the end of June, and it projects that it will reach $50 million ARR by the end of the year. ElevenLabs is also mentioned by The Information as another company that's looking to raise at a potential multi-billion-dollar valuation. One drag on the potential fundraising is the number of companies whose founders have chosen to leave and join big tech companies in this sort of weird non-acquisition acqui-hire model that seems to be becoming more common in AI. Leaders from companies including Inflection, Character.AI, Adept, and Covariant have all taken that path, which isn't exactly

Starting point is 00:02:47 a great one when it comes to investors looking for the next runaway success. Anyway, it definitely feels like there is some funding momentum right now, and I will of course keep you posted as those deals start to materialize. Now, speaking of OpenAI, last week's DevDay was the big topic of conversation, and friend of the show Swyx did a great set of polls, which serve as a vibe check for how people were actually interacting with OpenAI's announcements a few days after they were made. When it comes to the Realtime API, those who had tried it and were impressed, or hadn't tried it but were interested, were beating out those who hadn't tried it and didn't care, or had tried it and found it meh. The have-tried-and-impressed group stood at 14.7%, versus the have-tried-and-not-impressed group at 10.4%.

Starting point is 00:03:26 And those who haven't tried but were interested represented 46% of respondents, versus 28.8% who haven't tried and don't care. ChatGPT Canvas, I thought, was interesting and maybe reflects the fact that Swyx's audience is more developer-heavy than normie, because while among those who had tried it more were impressed than unimpressed, it wasn't by nearly as much as I would have thought. 22% of his respondents had tried it and found it impressive, versus 17.4% who had tried it and found it meh. I think Canvas is a massive UI upgrade, but again, if you were looking for model performance upgrades, this is, strictly speaking, more or less a user experience thing. When it comes to model distillation, the vast majority haven't tried it yet, and there were more

Starting point is 00:04:05 who don't care than were interested, although not by much. On vision fine-tuning, it was pretty similar. Most people hadn't tried it, and more of those who hadn't tried it didn't care than were interested. But the big one was the last question, inclusive of all recent technical and non-technical news at OpenAI and its competitors. Swyx asked, number one, do you think that OpenAI is headed on the right track? And number two, do you think that frontier AI as a whole is headed on the right track? 56% said yes to both. 22% said that OpenAI was not on the right track, but frontier AI was. 4.5% said that OpenAI was on the right track, but frontier AI as a whole wasn't. And 16.6% said that neither OpenAI nor frontier AI as a whole was on the right track.

Starting point is 00:04:45 56% of respondents saying that both OpenAI and frontier AI are on the right track, and basically 60% overall (56% plus 4.5%) saying OpenAI is headed in the right direction, strikes me as a win given how contentious this company is. Speaking of contentious, our last story today in the headlines: after lots and lots of bluster about how Andreessen Horowitz was going to support the Trump campaign, founding partner Ben Horowitz shared with employees that he planned to donate to Kamala Harris's run for president as well. On Twitter, he said, I sent an internal email that Axios got a hold of. Here it is. This is what it is. There is nothing else no matter how it gets characterized. In that note to the a16z team, he said, I wanted to give you an update on my political activity. As I mentioned before, Felicia and I have known Vice President Harris for over 10 years, and she has been a great friend to both of us during that time. She's also been a friend to the firm in our early days, helping with several events at my house when we built the original Cultural Leadership Fund Network. As a result of our friendship, Felicia and I will be making a significant donation to entities who support the Harris-Walz campaign. From a firm perspective, we continue to only take positions consistent with our little

Starting point is 00:05:48 tech agenda and how the various candidates support or don't support policies to build a strong startup technology industry. Although I have had several conversations with Vice President Harris and her team on their likely tech policies, and am encouraged by my belief in her, they have not yet stated what their tech policy will be, so the firm will not be updating its position in that regard. As we stated earlier, the Biden administration has been exceptionally destructive on tech policy across the industry, but especially as it relates to crypto and blockchain and AI. So while I am very hopeful that the Harris administration will be much better, they have not yet stated their intentions. A fairly common response was summed up by Amaran, who responded, I don't quite comprehend

Starting point is 00:06:23 the "this admin has been destructive, but I'll vote for them in hopes they won't be destructive" stance. But by and large, the response was that it was another example of people playing all sides. So ultimately, the question is: does this mean that Horowitz thinks that a Harris administration would be meaningfully better for tech than a second Biden administration would have been, or is this simply a reflection of either A, standard rich-person opportunism, or B, genuine personal relationships? Only time will tell, but for now, that is going to do it for today's AI Daily Brief Headlines edition. Next up, the main episode. Today's episode is brought to you by Venice. Venice is a private, uncensored generative AI app. It accesses open source models to enable text, image,

Starting point is 00:07:02 and code generation without the fear of being spied on or having your data exploited. Discuss anything with Venice without concern about it being monitored, sold, or given to advertisers and governments. Venice is different because your conversations and creations are kept securely within the browser, never stored or accessible by Venice. Unlike other AI apps, Venice won't tell you what's okay to say or not. Venice won't patronize you. It simply provides direct access to machine intelligence: no topics are off-limits, no ideas are taboo. With Venice, you're in control of the AI, as you should be. Pro subscriptions are available for $49 a year or $8 per month. AI Daily Brief listeners receive a 20% discount on Venice Pro.

Starting point is 00:07:39 Visit venice.ai/nlw and enter the discount code NLW Daily Brief. That's NLW Daily Brief, all one word. Today's episode is brought to you by Plumb. Want to use AI to automate your work but don't know where to start? Plumb lets you create AI workflows by simply describing what you want. No coding or API keys required. Imagine typing out, AI, analyze my Zoom meetings and send me your insights in Notion, and watching it come to life before your eyes. Whether you're an operations leader, marketer, or even a non-technical founder, Plumb gives you the power of AI without the technical hassle. Get instant access to top models like GPT-4o, Claude 3.5 Sonnet, AssemblyAI, and many more.

Starting point is 00:08:16 Don't let technology hold you back. Check out Use Plumb, that's Plumb with a B, for early access to the future of workflow automation. Today's episode is brought to you by Superintelligent. Every single business workflow and function is being remade and reimagined with artificial intelligence. There is a huge challenge, however, of going from the potential of AI to actually capturing that value. And that gap is what Superintelligent is dedicated to filling. Superintelligent accelerates AI adoption and engagement to help teams actually use AI to increase productivity and drive business value.

Starting point is 00:08:49 An interactive AI use case registry gives your company full visibility into how people are using artificial intelligence right now. Pair that with capability-building content in the form of tutorials, learning paths, and a use case library, and Superintelligent helps people inside your company show how they're getting value out of AI, while providing resources for people to put that inspiration into action. The next three teams that sign up with 100 or more seats are going to get free embedded consulting. That's a process by which our Superintelligent team sits with your organization, figures out the specific use cases that matter

Starting point is 00:09:21 most to you, and helps actually ensure support for adoption of those use cases to drive real value. Go to besuper.ai to learn more about this AI enablement network, and now back to the show. Welcome back to the AI Daily Brief. On Friday, Meta announced Meta Movie Gen. They claim it comprises the most advanced media foundation models to date, and people are pretty excited about it. Today we're going to talk about what they announced, what the response is, where the skepticism might lie, and basically use this as a chance to check in on video generation more broadly.

Starting point is 00:09:55 The company writes, Movie Gen delivers state-of-the-art results across a range of capabilities. Movie Gen Video is a 30-billion-parameter transformer model that can generate high-quality and high-definition images and videos from a single text prompt. Movie Gen Audio is a 13-billion-parameter transformer model that can take a video input, along with optional text prompts for controllability, to generate high-fidelity audio synced to the video. They go on: it can generate ambient sound, instrumental background music, and Foley sound, delivering state-of-the-art results in audio quality, video-to-audio alignment, and text-to-audio alignment.

Starting point is 00:10:25 There is also video editing: using a generated or existing video and accompanying text instructions as input, Movie Gen can perform localized edits such as adding, removing, or replacing elements, as well as global changes like background or style changes. And there is personalization, where you can provide an image of a person along with a text prompt, with the model generating a video with state-of-the-art results on character preservation and natural movement in video. A lot of state-of-the-art talk, as you can see. That message was reiterated by chief product officer Chris Cox on Threads, who wrote, we're sharing our progress today on Movie Gen, our project to develop the state of the art for AI video generation.

Starting point is 00:10:59 As of today, our evals show it's industry-leading on text-to-video quality across a number of dimensions, with 16 seconds of continuous length, plus a leap forward for state-of-the-art on video-matched audio, precise editing, and character consistency and personalization. Cox continues, the feedback we heard from filmmakers and video creators was to prioritize ease of editing, but even more the ability to generate videos with a specified character or image, which these models now faithfully achieve. A research blog post shared more about what the model can do. They write, given a text prompt, we can leverage a joint model that has been optimized for both text-to-image and text-to-video to create high-quality and high-definition images and videos.

Starting point is 00:11:36 The 30-billion-parameter transformer model has the ability to generate videos of up to 16 seconds at a rate of 16 frames per second. They continue, we find that these models can reason about object motion, subject-object interactions, and camera motion, and they can learn plausible motions for a wide variety of concepts, making them state-of-the-art models in their categories. Now, in terms of demonstrations, Meta showed off lifelike water, skin texture, hair, and movement. They presented the example of a koala on a surfboard with choppy waves splashing on the camera lens, all perfectly rendered; a person dancing around in a bedsheet ghost costume, complete with a reflection in the mirror; and, of course, an underwater shot of a baby hippo swimming around, complete with distortions and surface reflections.

Starting point is 00:12:14 A lot of the emphasis is on personalized video generation. A photo of a subject can be uploaded and transformed into a video clip. For example, Meta gave us a fairly standard selfie combined with the prompt, a woman DJ in a pink jacket spins records with a cheetah by her side. The potentially game-changing thing here is character consistency. For many use cases, this would be exactly what takes generated video from a cool novelty to an extremely powerful professional tool. This would, of course, massively expand the use cases for not only filmmakers but also

Starting point is 00:12:43 advertisers and anyone who needs any sort of multi-step video that's longer than just a single clip. Video editing also potentially represents a really big update. The example that Meta gave of precise video editing involves dressing up penguins in Victorian dresses, then adding some beach umbrellas, before turning the whole scene into an animation in the style of a pencil sketch. They also showed off audio generation, with examples like an ATV engine roaring and accelerating with guitar music, and rustling leaves and snapping twigs with orchestral music.

Starting point is 00:13:22 They summed up their vision by stating, as we continue to improve our models and move towards a potential future release, we'll work closely with filmmakers and creators to integrate their feedback. By taking a collaborative approach, we want to ensure we're creating tools that help people enhance their inherent creativity in new ways they may have never dreamed would be possible. Imagine animating a day-in-the-life video to share on Reels and editing it using text prompts

Starting point is 00:13:41 or creating a customized animated birthday greeting for a friend and sending it to them on WhatsApp. With creativity and self-expression taking charge, the possibilities are infinite. So, a couple of things. This is actually quite a revealing endpoint. First of all, this technology is not currently available. So basically, we have a Sora situation on our hands,

Starting point is 00:13:59 where it appears really impressive, but no one has the chance to actually test it out in real life. Chris Cox dealt with this directly in his threads post saying, we aren't ready to release this as a product anytime soon. It's still expensive and generation time is too long, but we wanted to share where we are since the results are getting quite impressive. Meta clearly thinks that some of this is going to be controversial. In that same blog post, they wrote, while there are many exciting use cases for these foundation models, it's important to note that generative AI isn't a replacement for the work of artists and

Starting point is 00:14:25 animators. We're sharing this research because we believe in the power of this technology to help people express themselves in new ways and to provide opportunities to people who might not otherwise have them. Our hope is that perhaps one day in the future, everyone will have the opportunity to bring their artistic visions to life and create high-definition videos and audio using Movie Gen. So what were people's reactions to this? Bilawal Sidhu, the host of The TED AI Show, writes, okay, finally dug into Meta's new Movie Gen paper. Text-to-video is cool and all, but to me, the precise editing feature is the game changer. It can handle complete VFX tasks, like replacing environments, doing set extensions, swapping characters, removing items,

Starting point is 00:14:59 adding particle effects with realistic lighting interaction. Now, Bilawal also got a little wonky about how they figured out how to train this. He said the coolest bit to me is how they trained this model, because paired before-and-after VFX editing datasets are super scarce. TL;DR, they taught it video editing through a clever three-stage process. One, it started with image editing data, treating it like single-frame video edits. Two, it created synthetic video editing tasks by animating still image edits and using AI models for object segmentation. Three, the model generated edited videos and then learned to reconstruct the originals from the edited

Starting point is 00:15:28 version. Meta calls this video editing via back translation. Lots of people also landed on the Sora comparison. Chubby on X writes, Meta has landed an absolute hit today with Meta Movie Gen. Not only is the video quality at Sora level, you can even upload your own photo and integrate it into the video. Speaking of Sora, Andrew Curran points out that Meta is not uncomfortable calling out competitors by name. He flagged a section of the paper which reads, on text-to-video generation, we outperformed prior state-of-the-art, including commercial systems such as Runway Gen-3, Luma Labs, and OpenAI Sora, on overall video quality. Audio outperforms prior state-of-the-art, including commercial systems such as Pika Labs and ElevenLabs, for sound effect generation, music generation, and audio extension.
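To make stage three of Bilawal's summary concrete, here is a minimal sketch of how back-translated training pairs could be assembled. It assumes that "learned to reconstruct the originals" means the model's own edit becomes the input, an instruction that undoes the edit becomes the prompt, and the untouched real video becomes the target. The names `back_translate`, `edit_model`, `invert_instruction`, and `EditExample` are hypothetical, and the "video" is just nested lists of integers; this is an illustration of the idea, not Meta's code.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical toy representation: a video is a list of frames,
# and each frame is a list of integer pixel values.
Frame = List[int]
Video = List[Frame]


@dataclass
class EditExample:
    source: Video      # video the model receives as input
    instruction: str   # text instruction describing the edit
    target: Video      # video the model is trained to output


def back_translate(
    real_videos: List[Video],
    instructions: List[str],
    edit_model: Callable[[Video, str], Video],
    invert_instruction: Callable[[str], str],
) -> List[EditExample]:
    """Build edit-training examples whose targets are always real videos."""
    examples: List[EditExample] = []
    for video, instruction in zip(real_videos, instructions):
        # The current model edits a real video; its output may be imperfect.
        edited = edit_model(video, instruction)
        # Derive an instruction that would undo that edit (assumed step).
        inverse = invert_instruction(instruction)
        # Swap roles: the generated video is the input and the
        # original, real video is the training target.
        examples.append(EditExample(source=edited, instruction=inverse, target=video))
    return examples


if __name__ == "__main__":
    # Stand-in "model" that brightens every pixel by 1.
    def brighten(video: Video, instruction: str) -> Video:
        return [[pixel + 1 for pixel in frame] for frame in video]

    inverses = {"brighten the scene": "darken the scene"}
    pairs = back_translate([[[0, 5], [3, 3]]], ["brighten the scene"], brighten, inverses.__getitem__)
    print(pairs[0])
```

The presumable appeal of this arrangement is that the training target is always a real video, so any artifacts in the model's own edits appear only on the input side rather than being learned as desired output.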

Starting point is 00:16:08 That paper, by the way, also pointed out that this model was trained on 6,144 H100 GPUs. Roberto Nixon pointed out that, quality aside, Meta also has the distribution. He writes, Meta is going to be very hard to beat. They just announced Movie Gen, which looks to be at the very least on par with the best video models currently available. They have 3.6 billion daily active users they can serve this to for free. It'll be rolled out to all Instagram users early next year. And that is, of course, what makes Meta so different and such a huge giant in this space. Between WhatsApp, Instagram, and Facebook itself, they cover a huge portion of the Earth's population.

Starting point is 00:16:45 To the extent that there is any skepticism, it's certainly in the fact that this isn't available yet, but the pretty clear sentiment is that this is going to be a very big deal for video generation. I know that if the practical results are actually anywhere near as good as the demonstration videos, Meta is going to be a major player in the video space. Super interesting stuff. Can't wait to get my hands on it, but for now, that is going to do it for the AI Daily Brief. Appreciate you listening as always, and until next time.
