The AI Daily Brief: Artificial Intelligence News and Analysis - What Runway Gen-3 and Luma Say About the State of AI

Episode Date: June 18, 2024

Explore the latest in AI video technology with Runway Gen-3 and Luma Labs Dream Machine. From the advancements since Will Smith’s AI spaghetti video to the groundbreaking multimodal models by OpenAI... and Google DeepMind, this video covers the current state of AI development. Discover how companies are pushing the boundaries of video realism and accessibility, and what this means for the future of AI-generated content. Learn how to use AI with the world's biggest library of fun and useful tutorials: https://besuper.ai/ Use code 'youtube' for 50% off your first month. The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614 Subscribe to the newsletter: https://aidailybrief.beehiiv.com/ Join our Discord: https://bit.ly/aibreakdown

Transcript
Starting point is 00:00:00 Today on the AI Daily Brief, a look at AI video generation and what the release of Runway Gen-3, Luma, and Kling in the last few weeks says about the broader state of AI development. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, visit the Discord link in our show notes. Welcome back to the AI Daily Brief Headlines edition, all the daily AI headline news you need in around five minutes. Today we kick off with a warning from the International Monetary Fund over the potential that AI will increase inequality. This is from a new report from the IMF published Monday, June 17th, that basically said that the IMF had what they called profound concerns about significant labor disruptions and rising
Starting point is 00:00:47 inequality. What's more, it pointed out that unlike previous disruptive technologies, AI was coming not just for blue-collar jobs, but for higher-skilled occupations as well. While acknowledging that generative AI held significant potential to boost productivity and even advance the delivery of public services, it was also, they believed, going to cause some big problems. So what were the suggested remediations? Some of them were things like improving unemployment insurance, but a big focus was on policies around education and training. From the Financial Times, in its report, the IMF said that policies on education and training needed
Starting point is 00:01:19 to adapt to new realities to help prepare workers for a rapidly changing job market in the future, with a greater focus on offering lifelong learning. Sector-based training, apprenticeships, and reskilling could play a greater role in helping workers with the transition to new tasks and sectors. Said Era Dabla-Norris, deputy director of the IMF's fiscal affairs department and co-author of the report, quote, the transition could be painful for workers, and older workers may not have the skills that are needed in the age of AI, and it may require more time than in the past to acquire those new skills. We want people to be able to benefit more broadly from the potential that this technology holds, and we want to ensure that there are opportunities created for people.
Starting point is 00:01:54 Obviously, it will come as a surprise to none of you, given the fact that I've chosen to spend all my time on Superintelligent, a scalable platform to help educate the world on how to use AI, that I also think that a big part of the answer to this question is in education. Next up, we move over to NATO. Something you might not know is that NATO has a $1.1 billion innovation fund, the NIF. This fund was unveiled back in summer 2022, a couple months after the Russian invasion of Ukraine. One of the stated goals of the fund was to invest in technologies that could enhance NATO's defenses. The fund is currently backed by 24 of NATO's 32 member states. Earlier today, on Tuesday, June 18th, the NATO Innovation Fund announced that it had invested in four European tech companies,
Starting point is 00:02:35 including Fractile, which Reuters described as a London-based computer chipmaker aiming to make LLMs like those that power ChatGPT run faster, Germany's ARX Robotics, which is an unmanned robotics company, iCOMAT, which makes lighter materials for vehicles, and Space Forge, a company that uses specific conditions of space, including microgravity and vacuum conditions, to build semiconductors in orbit. So there you have it. One of the biggest military alliances in the world is officially investing in AI. Speaking of interesting collaborations, OpenAI has continued to expand its footprint in the healthcare space, teaming up with Color Health on a new cancer copilot. Color was founded as a genetics testing company back in 2013 and has developed an AI assistant using the GPT-4o model. According to the
Starting point is 00:03:18 Wall Street Journal, quote, the copilot helps doctors create cancer screening plans as well as pre-treatment plans for people who have been diagnosed with cancer. Said Othman Laraki, the co-founder and CEO of Color, the copilot is intended to assist doctors, not replace them. Indeed, the use of that term copilot is meant to reference engineering copilots that are augmenting software engineers rather than replacing them. This is not the first time that OpenAI has done a deal with a company in the health space. Back in April, for example, they announced a deal with Moderna, where that company would use AI to speed up business processes such as selecting optimal doses for clinical trials.
Starting point is 00:03:52 And Brad Lightcap, OpenAI's COO, said we see a perfect fit for AI technology because they can help bring relevant information to the surface faster. They can give clinicians more tools to understand medical records, to understand data, to understand labs and diagnostics. We spend a lot of time on this show and in the media in general talking about the impact of AI on creative industries and entertainment, but in more heavy science-based industries, there is a ton of interest and excitement that could lead to some incredible benefits from artificial intelligence. Finally, just to make it clear once again how hot the AI agent or AI assistant space is, You.com is raising $50 million to get into that space.
Starting point is 00:04:31 You.com had previously been positioned as an AI-infused search engine, but as all search engines have started to AIify themselves, they've started to move more directly into this AI assistant space. Reuters writes, You.com's 11 million visitors in May reflected a year-to-date rise in web traffic, but the number was still below its 20 million peak in February 2023. Its app downloads have decreased an estimated 69% so far in 2024. Against this backdrop, the company has morphed You.com into an AI assistant, one that is focused on productivity as well as internet search. Lots and lots of activity in this AI assistant space, but also tons of competition,
Starting point is 00:05:05 something we will continue to keep an eye on here at the AI Daily Brief. However, for now, that is going to do it for the headlines. Up next, the main episode. A quick note before we get back to the show: today's episode is brought to you by Superintelligent. Super is, of course, the platform that we built and released a couple months ago to help people learn how to use AI. It's built around fun, fast tutorials that get you actually using AI in minutes, not hours, and certainly not days. The learning all happens in the context of an engaged community, and we've got a bunch of exciting features rolling out in the weeks to come. One of those is a new teams version of the platform, which includes a custom
Starting point is 00:05:42 curated playlist, as well as a showcase where people on your team can share their use cases and projects in AI across the organization. If you are interested in being a part of the Super for Teams beta, go to besuper.ai slash partner so you can learn about the program. Welcome back to the AI Daily Brief. There has been a ton of activity over the last few weeks in the AI video space, and I want to use this episode not only to catch you up on what's been happening there, but also to ground our sense of where we are in AI development more broadly. And for that, I think we have to go back to about 15 months ago. You may remember in late March of last year when AI-generated videos of Will Smith eating spaghetti
Starting point is 00:06:24 first started hitting the internet. This was right around the time of the six-month pause letter. It was right as the AI safety narrative was starting to take hold in mainstream media. And it was certainly still in a period where people were being dismissive of AI because of the weirdness in its generations, the weird extra fingers, strange extraneous body parts happening with no apparent connection to anything else going on. And these videos were sort of Exhibit A in all that. They were at once disturbing and also weirdly compelling, and people just watched them on repeat. However, for our purposes, they matter much more as a waymarker of where things were, given the advances we've seen since then. At the end of last year,
Starting point is 00:07:01 our friends over at Latent Space published something that they called the Four Wars of the AI stack. These were basically the big sets of questions they saw as being unanswered and for which the answers had a major impact on how the AI field was likely to develop. One of those wars was what they called the multimodality war, specialist models versus everything models. So on the one hand, you had specialist models, text-to-image startups like Midjourney, text-to-audio startups, etc., versus the everything models that companies like OpenAI and Google DeepMind were pursuing. And at the time that this came out and we were first talking about it on this show, I think for as advanced as the everything models were, there was still a sense that those specialist
Starting point is 00:07:39 models really could remain differentiated. Specifically, I think a lot of people felt that Midjourney was still ahead, for example, of OpenAI's DALL-E 3, even if DALL-E 3 had made some advances, like being better with text. But then in February, OpenAI released the first generations from its video model called Sora. Less than a year from when we saw those Will Smith eating spaghetti examples, we got these hyper-photorealistic generations that absolutely blew people away. There was the famous woman walking through a Tokyo street, woolly mammoths running through snow,
Starting point is 00:08:12 a wide panning shot of a lighthouse on what looks like the California coast, and then there were examples like the pirate ship swirling around in what appears to be a cup of coffee, which showed just how much better this model seemed to understand physics. For many people, the release of Sora was a moment that absolutely reignited our sense of the unfettered possibilities of AI generation. The problem was that Sora hasn't
Starting point is 00:08:35 really been widely available. Sure, OpenAI has done a lot of sharing of Sora with very select artists and creators, and they've clearly been making a pitch towards Hollywood and the traditional film and entertainment industry, but it hasn't been available broadly to the public. The same is true for Google Veo, which was their answer to Sora shown off a couple months later, which also was limited in its distribution. So the situation we've been in for the last couple months is one in which the state of the art when it comes to video generation has not been widely available. The last few weeks have seen some major moves in that area. Fast forward to yesterday, Monday, and Runway became the third AI video company in a matter
Starting point is 00:09:13 of three weeks to release their latest model. Runway's was called Gen-3 Alpha. The announcement post on Twitter reads, Gen-3 Alpha can create highly detailed video with complex scene changes, a wide range of cinematic choices and detailed art directions. Runway continues, Gen-3 Alpha is the first of an upcoming series of models trained by Runway on a new infrastructure built for large-scale multimodal training and represents a significant step towards our goal of building general world models.
Starting point is 00:09:38 Gen-3 Alpha was trained on both videos and images, and powers Runway's text-to-video, image-to-video, and text-to-image tools. Apparently, Runway has also been collaborating with entertainment companies, thinking about the obvious applications of this technology, and while it wasn't available immediately upon announcement, they said that it would be rolling out to everyone over the coming days. The early impressions of Gen-3 were definitely that it represented a significant realism upgrade. And in addition to the fidelity of the actual generations,
Starting point is 00:10:04 they also promised more fine-grained control. The Runway website reads, Gen-3 Alpha has been trained with highly descriptive, temporally dense captions, enabling imaginative transitions and precise keyframing of elements in the scene. Almost immediately, people started doing their threads of the most impressive generations,
Starting point is 00:10:20 although initially many of them did come from the Runway team itself, and there were some interesting notes as well. While a lot of folks were focused on the increase in realism enabled by the new model, others, like habitualization on Twitter, who does support at Runway, tweeted a very weird, stylized video and said, hyperrealism is sick, but I am unfortunately someone who will die on the hill
Starting point is 00:10:39 of avant-garde experimentalism and thus spent a while making surreal, grainy, noisy craziness in Runway's new Gen-3 model to great success. And then shared a set of five examples of these distinctly non-realistic generations. Tom's Guide went a little bit further in explaining what Runway means when it says that its end goal is general world models. They describe them as, quote, an AI system that can build an internal representation of an environment and use it to simulate events inside that environment. Tom's Guide also noted that, quote, each video is about 10 seconds long, which is about twice as long as a Luma default and of a similar length to Sora videos. It is also nearly three times the length of the current Runway Gen-2 videos. Andrew Curran also mentioned Luma, saying
Starting point is 00:11:18 Luma has started the avalanche. What he means is that last week, Luma Labs absolutely dominated the AI Twitter conversation with their release of Dream Machine. While for most it wasn't nearly on the same level of realism as something like Sora, the fact that it was widely available made all the difference in the world. One of the first use cases that popped off online was people animating classic memes. Even Andrej Karpathy, formerly of OpenAI and Tesla, said, wow, the new model from Luma Labs extending images into videos is really something else. I understood intuitively that this would become possible very soon, but it's still something
Starting point is 00:11:52 else to see it and think through future iterations of. It should be noted, though, that when it comes to what really started this flood, it wasn't Luma Labs, but actually Kling, from Chinese company Kuaishou. Kuaishou is basically a TikTok competitor with 400 million plus daily users, and when Kling was released, it was immediately available right then and there. Kling was good enough that some folks, like Deedy at Menlo Ventures, said China might be surpassing the U.S. at AI. Now, of course, one of the big questions for folks with all this is whether OpenAI would
Starting point is 00:12:21 actually respond and get Sora out to the public. Matt Wolfe writes, maybe we'll see Sora soon amongst all this competition popping up? When we do, will people still be excited by it? Then again, as Dogenioro writes, the leap of Sora was so big that it feels like nothing surprises me anymore. Anyone else feel this way? AI and design, however, responded, the surprise lies with the fact that while OpenAI has been dicking around pandering to Hollywood, other players have achieved comparable quality and are
Starting point is 00:12:45 making it available to the public in four months. That's the pleasant surprise. It's clear that the race is on. In fact, even as Runway Gen-3 was being announced, Luma Labs was talking about what was coming next. They tweeted, coming soon to Dream Machine, powerful editability and intuitive controls. They shared a video of a new editing suite that allows for much more fine-grained precision editing when it comes to these video generations, which is something that hasn't been widely
Starting point is 00:13:08 available yet. They also released something called extend video, which they said is aware of what's happening in your video and extends it in a consistent way to follow instructions. It's clear that as many advantages as the big companies have, the smaller startups in this space have some advantages as well. Dan Kay, formerly of Google, responded to someone who was tweeting about Luma, now you know why I left Google to join Luma. I was in the team that developed Veo early on, but knew it would never be shipped to the masses for quite a long time, same as Sora.
Starting point is 00:13:35 Not until a company like Luma forces their hand, that is. For me, it's hard to look back 15 months to Will Smith eating spaghetti, see the variety of models that have come to fruition in the last three weeks, and take seriously any of the claims that have started to float around that AI is somehow plateauing or slowing down. The speed of evolution is unbelievable and is going to come with changes that we can't even imagine yet. Investor Jared Hecht writes, given the pace at which Luma, Sora, and Kling are emerging and improving, this may be the final generation of the global movie star. There is a future where Hollywood-quality film can be created by anyone with an idea, computer, and internet connection.
Starting point is 00:14:13 Who knows how this will all go down, but right now, it is an absolutely flowering moment for AI video generation, and that's pretty cool to see. Quick shout out if you're interested in learning more about how to use these tools. Even if you haven't signed up for Superintelligent yet at besuper.ai, we did just release one of our Luma tutorials for free on our YouTube channel. You can find that at YouTube.com slash at besuperai. I'll also include a link in the show notes, but for now, that is going to do it for this AI Daily Brief. Until next time, peace.
