The AI Daily Brief: Artificial Intelligence News and Analysis - Are World Models AI’s Next Big Frontier?

Starting point is 00:00:00 Today on the AI Daily Brief, are world models AI's next big thing? Before that in the headlines, how a voice marketplace shows how industries are going to end up collaborating with AI. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick notes before we dive in. First of all, thank you to today's sponsors, KPMG, Rovo, Robots & Pencils, and Blitzy. To get an ad-free version of the show, go to Patreon.com slash AI Daily Brief or subscribe on Apple Podcasts. If you are interested in sponsoring the show, send us a note at sponsors at AIDailybrief.ai. In fact, you can head to AIDailybrief.ai to learn anything about the show, including seeing any job opportunities, figuring out how to reach out if you want me to come speak.

Starting point is 00:00:50 And of course, finding a link to our AI ROI benchmarking study. It is live now. You can find out more about that at R.Oisurvey.com. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. Today we kick off with a story that is interesting on its own terms but is also reflective of a broader pattern of how I think different industries are going to ultimately interact with AI, which is of course to say that they are going to, rather than fight it, ultimately collaborate with it. We're talking of course today about ElevenLabs launching a new voice marketplace that will allow users to access celebrity voices including Michael Caine, Judy Garland and John Wayne. The Iconic Voices Marketplace, that's what it's called, will allow companies to partner with ElevenLabs to license the celebrity voices for content and advertisements. So far, the list of 28 available voices is largely concentrated around historical figures and deceased entertainers. Mark Twain,

Starting point is 00:01:45 Alan Turing, and Thomas Edison are available, although it's questionable how much people would recognize their voices. The mainstays of the list are deceased celebrities with iconic voices like Maya Angelou and Burt Reynolds. ElevenLabs suggested the service, which places them as the middleman in the licensing deals, addresses many of the critiques around AI content. They call their marketplace the, quote, consent-based, performer-first approach the industry has been calling for. One of the first living celebrities who lent his voice to the service is Michael Caine. In a statement, he said, For years, I've lent my voice to stories that moved people, tales of courage, of wit, of the human spirit.

Starting point is 00:02:18 Now I'm helping others find theirs. With ElevenLabs, we can preserve and share voices, not just mine, but anyone's. He continued that the company is, quote, using innovation not to replace humanity but to celebrate it, and that it's, quote, not about replacing voices, it's about amplifying them. There is an aspect of legacy in his statement that maybe suggests that he views this more as a legacy preservation technology as opposed to just a new revenue stream. Matthew McConaughey, while an investor in ElevenLabs, is going a little slower, allowing the company to translate his newsletter, Lyrics of Livin', into a Spanish language audio version

Starting point is 00:02:51 that uses his voice. Between this new marketplace and the recent settlement between UMG and Udio, I think we might be at the beginning phases of the commercialization of IP for AI as opposed to just outright fighting AI. I think the more examples they have of people who are actually still living and providing their own consent, the better. I think for a lot of folks, even though everything is legally on the up and up with the deceased celebrities, there's going to be a lingering feeling like they couldn't actually consent to this in life that may diminish people's enthusiasm. Moving over to markets, the big conversation yesterday was all about SoftBank dumping

Starting point is 00:03:24 their Nvidia stock to fund their investments in OpenAI. In this week's quarterly results, SoftBank disclosed that they've sold all 32.1 million of their Nvidia shares for around $5.8 billion. The sale appears to confirm what was already pretty clear from recent reporting that SoftBank is reaching deep into their pockets in order to fund their $30 billion commitment to OpenAI this year. The final $22.5 billion is due to be paid in December after OpenAI successfully converted to a for-profit corporation. SoftBank has issued several billion dollars in bonds and borrowed $5 billion against their Arm stock in order to fund the deal. Now for some, this is another indication of the AI bubble bursting, although it's pretty clear that this is just SoftBank digging in the couch cushions to finance

Starting point is 00:04:02 its commitments to OpenAI. It's also worth noting that SoftBank's CEO Masayoshi Son is possibly the worst Nvidia trader on the planet. He owned a substantial stake in the company for many years, but sold it all in 2019, ultimately missing out on $100 billion in gains if he had held on for just a few more years. SoftBank started buying small tranches of the stock again in 2020, but the bulk of their investment came in March of this year. Gavin Baker, the managing partner at Atreides Capital Management, said someone should look into what happened after SoftBank sold their entire 4.9% stake in Nvidia back in 2019. Although most analysts don't think Masa Son was really calling the top on the AI bubble, Nvidia stock was still down 3% on the day, and SoftBank itself took a beating,

Starting point is 00:04:40 losing 10%. Staying on the theme of OpenAI and its big capital needs, the company's Project Stargate has received $3 billion in new investment from Blue Owl Capital. Blue Owl is one of the largest private capital firms in the U.S. and has recently moved aggressively into data center development. They announced a string of data center funding over the past months, including contributing $7 billion to a Meta facility in Louisiana. The firm now has over a thousand people working on designing, building, and operating data centers within a new division called Stack Infrastructure. The Stargate deal in this case relates to a data center being constructed in New Mexico in collaboration with Oracle. Blue Owl will contribute $3 billion in equity,

Starting point is 00:05:15 while a group of banks will put together $18 billion in debt funding. Meanwhile, AMD CEO Lisa Su said that her company could carve out a double-digit market share in data center chips over the next three to five years. Speaking at a company event on Tuesday, Su said that she anticipates average revenue growth of 35% over that period, holding the current pace. However, she expects to see the data center business grow at 60%, driven by, in her words, insatiable demand for AI chips. She said, this is what we see as our potential given the customer traction, both with the announced customers, as well as customers that are currently working very closely with us. Now, next year will serve as a litmus test for AMD as they

Starting point is 00:05:51 attempt to take Nvidia on directly. AMD has released a handful of GPUs suitable for AI workloads over recent years, but so far none of them have captured a significant slice of the market. In recent financial reports, however, AMD has boasted of strong growth in their data center sales, but they haven't split out GPU sales from their CPUs. AMD will be pushing their own rack scale deployments next year with the release of the MI400X chips. The servers will contain 72 chips, which is crucial to run the largest AI models. OpenAI recently committed to deploying a gigawatt of AMD's new chips, and the company also has long-term deals with Oracle and Meta.

Starting point is 00:06:22 One interesting one that has some people scratching their heads, Meta AI has apparently seen a surprising surge in users over the past month. According to Similarweb data, the Meta AI web app saw 105% growth in traffic between September and October. That made them by far the fastest growing AI web app for the month, easily outpacing Perplexity at 29% and Claude at 25% growth. Even on a full year basis, Meta is doing surprisingly well. Now, Gemini is the breakout hit of the year with 305% traffic growth, but Meta is right there in second place with 149%.

Starting point is 00:06:54 By comparison, traffic to chatgpt.com grew by 68% for the year. Now, there were two main explanations offered. Either this is huge growth off a tiny baseline, making it less impressive than it seems at first glance, or it reflects Meta's Sora competitor Vibes, released in late September, which seems to be a sleeper hit. Traffic numbers don't seem to have been that low heading into October, and you can see a huge spike in app downloads that corroborates the success of Vibes.

Starting point is 00:07:18 Still, for all of this, many are skeptical that it's a sustainable trend. Chubby wrote, Who use Meta AI intentionally and for a specific purpose? I don't know, man, I think it's easy to be skeptical, but at the same time, it's very possible to me that the terminally online AI X community might be in a huge bubble when it comes to understanding what normal people actually use and like. Is it possible that having a free and open version of Sora has really benefited Meta in ways that the hardcore AI community just isn't appreciating?

Starting point is 00:07:44 Seems totally possible. Anyway, something to keep an eye on, but for now, that is going to do it for today's headlines. Next up, the main episode. What if AI wasn't just a buzzword, but a business imperative? On You Can with AI, we take you inside the boardrooms and strategy sessions of the world's most forward-thinking enterprises. Hosted by me, Nathaniel Whittemore, and powered by KPMG, this seven-part series delivers real-world insights from leaders who are scaling AI with purpose,

Starting point is 00:08:15 from aligning culture and leadership to building trust, data readiness, and deploying AI agents. Whether you're a C-suite executive, strategist, or innovator, this podcast is your front row seat to the future of enterprise AI. So go check it out at www.kpmg.org.us slash AI podcasts or search You Can with AI on Spotify, Apple Podcasts, or wherever you get your podcasts. Meet Rovo, your AI-powered teammate. Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio.

Starting point is 00:08:50 Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work. Connect Rovo to your favorite SaaS app so no knowledge gets left behind. Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one. Rovo is already built into Jira, Confluence, and Jira Service Management standard, premium and enterprise subscriptions. Know the feeling when AI turns from tool to teammate. If you Rovo, you know.

Starting point is 00:09:22 Discover Rovo, your new AI teammate powered by Atlassian. Get started at R-O-V, as in victory, O.com. AI changes fast. You need a partner built for the long game. Robots & Pencils works side by side with organizations to turn AI ambition into real human impact. As an AWS certified partner, they modernize infrastructure, design cloud native systems, and apply AI to create business value. And their partnerships don't end at launch.

Starting point is 00:09:48 As AI changes, Robots & Pencils stays by your side so you keep pace. The difference is close partnership that builds value and compounds over time. Plus, with delivery centers across the U.S., Canada, Europe, and Latin America, clients get local expertise and global scale. For AI that delivers progress, not promises, visit robotsandpencils.com slash AI Daily Brief. This episode is brought to you by Blitzy,

Starting point is 00:10:12 the Enterprise Autonomous Software Development Platform with Infinite Code Context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale code bases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task. Blitzy delivers 80% plus of the development work autonomously, while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org.

Starting point is 00:10:51 Visit blitzy.com and press get a demo to learn how Blitzy transforms your SDLC from AI assisted to AI native. Welcome back to the AI Daily Brief. Today we have one of those interesting times when two pieces of seemingly unrelated news might actually be telling part of the same story. On the one hand, we have Meta's chief AI scientist, Yann LeCun, reportedly leaving Meta and launching his own company. And on the other hand, we have a new essay from Dr. Fei-Fei Li, all about world models and spatial intelligence. At the root of both of these stories is a question about whether the right path to the next generation of advanced AI is via large language models, or if indeed there is a different

Starting point is 00:11:38 approach. Let's talk first, though, about Yann LeCun leaving Meta. This is something of the end of an era. LeCun has served as Meta's chief AI scientist since 2013. In other words, long before anyone was trying to catch up to ChatGPT, even before there was an OpenAI bombing around the valley thinking about the future of artificial intelligence. LeCun drove the development of Meta's early Llama models through his leadership at Meta's Fundamental AI Research, or FAIR, lab.

Starting point is 00:12:06 LeCun is a highly decorated AI scientist, having won a Turing Award in 2018 for pioneering work done in the 1990s and 2000s. And yet for many, the writing for this was on the wall as soon as Zuckerberg began his hiring spree over the summer to establish Meta's Superintelligence Division and TBD Labs, and especially since he bought a big chunk of Scale in order to recruit 28-year-old Alexandr Wang to run the new initiative. Now, in all these moves, LeCun's FAIR lab did continue to exist, but it was rolled up within the new division, led by Alexandr Wang. Now, as part of the move, Wang became the chief AI officer, while former OpenAI lead scientist Shengjia Zhao became chief scientist for superintelligence labs. At the time, LeCun reaffirmed his role at the company, publicly writing,

Starting point is 00:12:51 my role as chief scientist for FAIR has always been focused on long-term AI research and building the next AI paradigms. My role and FAIR's mission are unchanged. In other words, the party line at the time was that LeCun had always been concerned about longer-term building rather than immediate term initiatives and this didn't change that. And yet, rumors swirled of resources and personnel being drained from FAIR as well as a broader shift away from pure research towards AI that could be commercialized. For some commentators, this is just the latest in a string of personnel issues surrounding Meta's AI. Deedy Das of Menlo Ventures writes, Meta's AI org is in disarray. First, Soumith Chintala, the inventor of PyTorch, leaves. Now, Yann LeCun,

Starting point is 00:13:32 their AI head, leaves. They have $600 billion in compute commits until 2028, but I guess it's up to Alex Wang and Nat Friedman to deliver. Computer science professor Pedro Domingos noted that the news wiped $30 billion off of Meta's market cap, approximately twice what they paid to get Alexandr Wang. Andrew Orlowski of the Daily Telegraph posted, Zuck basically hired Jian-Yang, the hot dog, not hot dog guy from Silicon Valley, and made LeCun report to him.

Starting point is 00:13:57 I'm surprised he took so long to bail, but very underreported, Zuck hasn't a clue what he's doing. Yet that was far from the only take. For some, this still feels like the natural fallout of adjustments in personnel when the stakes are this large. Jordan Thibodeau, formerly of Google and Slack,

Starting point is 00:14:14 responded to Deedy Das, saying, bro, come on, you've been around the block. Anytime a regime change happens, reorgs and exits happen. You got to give the story time to bake before jumping to conclusions. Of course the old regime is leaving. They did well during the status quo, but now it's all hands on deck and Facebook is under wartime and not many in the AI community are up for that. Others basically just thought that it was probably time to rip the Band-Aid.

Starting point is 00:14:36 LeCun has not only been away from day-to-day duties for a while, he's been vocally against LLMs as a stepping stone towards AGI. You might remember that big Wall Street Journal piece from about a year ago, where he very famously said that current AIs were dumber than a cat. Brasser X writes, Yann LeCun leaving Meta is significant and probably overdue. He's a foundational figure in AI, but his research-first mindset often put Meta out of sync

Starting point is 00:15:00 with the pace of the current landscape. While competitors pushed aggressively towards large-scale product-ready models, Meta spent years debating theory. With LeCun moving on, Meta now has room to align its AI strategy with reality rather than philosophy. Less nostalgia, more execution. Others thought, honestly, that as smart as LeCun is, he just hasn't been the asset that Meta needed him to be

Starting point is 00:15:20 when it came to Meta competing in the AI race. John Hernandez writes, we all saw this coming, didn't we? First, if you're a big name on AI, anything you do will raise several billion overnight, hard to get that much money on a salary. Second, if you are a legend and they make you report to a kid that could be your grandson,

Starting point is 00:15:34 no matter how good he is, you won't feel appreciated. But truth be told, he hasn't helped Meta much on the AI race, so it's a win-win. Jeffrey Emanuel writes, Yann LeCun is better off working in a Bell Labs or Xerox PARC setting where things are measured in decades and there's no expectation or pressure to deliver anything commercially useful in the short or even medium term. Meta is way past that now given their AI capital spending. The point that Jeffrey is making is that Zuckerberg is betting the farm on a huge infrastructure buildout, and that's going to force them to live in the real world of what they can deliver right now.

Starting point is 00:16:04 Emanuel continues, I get the sense that he doesn't care enough about winning in the marketplace or about products to make a compelling startup now given the intense level of competition. Also, LLMs are the tech we have now, and he doesn't believe in them long term. Startups need to move fast. I think John's note, however, that if you have one of those big names, you can raise a ton of money very quickly, is well taken. A cynical take on this is that by starting his own new lab, LeCun is basically locking in a two or three billion dollar hiring bonus when eventually that lab gets bought by Google DeepMind. And that might especially be the case if his interest in world models really starts to bear fruit.

Starting point is 00:16:39 Now, even within Meta's FAIR lab, LeCun and his team have taken some steps towards working on that, but the Financial Times piece suggests that he's going to go much farther with this new startup effort. They write, Within FAIR, LeCun has instead focused on developing an entirely new generation of AI systems that he hopes will power machines with human-level intelligence, known as world models. These systems aim to understand the physical world by learning from videos and spatial data rather than just language. LeCun has said it could take a decade to fully develop the architecture. LeCun's next endeavor is focused on furthering his work on world models, according to two people familiar with the matter. Which brings us to our

Starting point is 00:17:12 second companion story, which is not so much a story as this essay and accompanying Twitter post from Fei-Fei Li. On X, she writes, AI's next frontier is spatial intelligence, a technology that will turn seeing into reason, perception into action, and imagination into creation. The essay she released is called From Words to Worlds: Spatial Intelligence Is AI's Next Frontier. In it, she calls large language models wordsmiths in the dark, eloquent but inexperienced, knowledgeable but ungrounded. Instead, she says, quote, Spatial Intelligence will transform how we create and interact with real and virtual worlds, revolutionizing storytelling, creativity, robotics, scientific discovery, and beyond.

Starting point is 00:17:51 This, she says, is AI's next frontier. So what does she mean by spatial intelligence? Now, first of all, it should be noted that whereas LeCun is rather dismissive of LLMs, Fei-Fei Li is less so. She writes, it's no longer a question of whether AI will change the world, by any reasonable definition it already has. Yet she points out many of the big dreams and visions that we have for AI lie outside of our reach. For example, she says the dream of massively accelerated research in fields like

Starting point is 00:18:15 disease curation, new material discovery, and particle physics remains largely unfulfilled. And the promise of AI that truly understands and empowers human creators remains beyond reach. She writes, to learn why these capabilities remain elusive, we need to examine how spatial intelligence evolved and how it shapes our understanding of the world. Going back to the history of evolution, she suggests that long before we could communicate with language, the quote, simple act of sensing quietly sparked an evolutionary journey toward intelligence. She continues, the seemingly isolated ability to glean information from the external world,

Starting point is 00:18:47 whether a glimmer of light or the feeling of texture, created a bridge between perception and survival that only grew stronger and more elaborate as the generations passed. Layer upon layer of neurons grew from that bridge, forming nervous systems that interpret the world and coordinate interactions between an organism and its surroundings. Thus, many scientists have conjectured that perception and action became the core loop driving the evolution of intelligence,

Starting point is 00:19:09 and the foundation on which nature created our species. She goes on to point out all of the various ways in which spatial intelligence impacts our everyday lives, but points out that it's not just functional but also at the root of our creativity. Ultimately, she writes, spatial intelligence is the scaffolding upon which our cognition is built. It's at work when we passively observe or actively seek to create, it drives our reasoning and planning even on the most abstract topics, and it's essential to the way we interact verbally or physically with our peers

Starting point is 00:19:35 or with the environment itself. Unfortunately, today's AI doesn't think like this, yet. Now you might be thinking, what then about multimodal LLMs? She writes that while they've had some progress, there are still real limitations. Multimodal LLMs, she writes, trained with voluminous multimedia data in addition to textual data, have introduced some basics of spatial awareness. Today's AI can analyze pictures, answer questions about them, and generate hyper-realistic images and short videos. Yet the candid truth is that AI's spatial capabilities remain far from human level, and the limits reveal themselves quickly. State-of-the-art MLLM models rarely

Starting point is 00:20:09 perform better than chance on estimating distance, orientation, and size, or mentally rotating objects by regenerating them from new angles. They can't navigate mazes, recognize shortcuts, or predict basic physics. AI-generated videos, nascent and yes, very cool, often lose coherence after a few seconds. And ultimately, while this doesn't make LLMs not useful for the use cases that they're useful for, Li argues that there is a whole world of use cases that are just outside of AI's capabilities. She provides then three essential capabilities that will define world models.

Starting point is 00:20:37 The first is generative. World models can generate worlds with perceptual, geometrical, and physical consistency. She writes, world models must be capable of spawning endlessly varied and diverse simulated worlds that follow semantic or perceptual instructions while remaining geometrically, physically, and dynamically consistent. Next is multimodal. World models being multimodal by design.

Starting point is 00:20:57 She writes, just as animals and humans do, a world model should be able to process inputs, known as prompts in the generative AI realm, in a wide range of forms. Given partial information, whether images, videos, depth maps, text, instructions, gestures, or actions, world models should predict or generate world states as complete as possible. Third and finally is interactive. World models can output the next states based on input actions.

Starting point is 00:21:20 Finally, she says if actions and or goals are part of the prompts to a world model, its outputs must include the next state of the world, represented either implicitly or explicitly. Together, she says, the scope of this challenge exceeds anything AI has faced before. While language is a purely generative phenomenon of human cognition, worlds play by much more complex rules. Here on Earth, gravity governs motion, atomic structures determine how light produces colors and brightness, and countless physical laws constrain every interaction. Even the most fanciful creative worlds are composed of spatial objects

Starting point is 00:21:50 and agents that obey the physical laws and dynamic behaviors that define them. Reconciling all of this consistently, the semantic, the dynamic, and physical, demands entirely new approaches. The dimensionality of representing a world is vastly more complex than that of a one-dimensional, sequential signal like language, which of course creates a whole set of new challenges. Some of the research topics at her World Labs include developing a new universal task function for training, figuring out how to extract deeper spatial information from two-dimensional image or film-based training data, and creating new model architectures and representational learning. The payoffs, however, she argues, are immense.

Starting point is 00:22:25 In addition to new superpowers around creativity and creating new types of immersive gaming or visual experiences, there are the implications for robotics, which she calls embodied intelligence in action. Even beyond those things, though, she argues that the real unlock for many use cases in science, health care, and education will come from this sort of spatial intelligence. For example, she writes, in healthcare, spatial intelligence will reshape everything from laboratory to bedside. AI can accelerate drug discovery by modeling molecular interactions in multi-dimensions, enhance diagnostics by helping radiologists spot patterns in medical imaging, and enable ambient monitoring systems that support patients and caregivers without replacing the

Starting point is 00:23:01 human connection that healing requires. I think the point of all of this is just a reminder that as locked as we are in this current paradigm of LLMs, there are other paths to advanced AI. And while some will view Yann LeCun's departure as the inevitable byproduct of personnel changes, I think Dr. Li's essay reminds us that there are reasons that someone of LeCun's stature would want to go work on something different. If he does start a new world model focused lab and gets billions of dollars, frankly, I think relative to a lot of the things that we're spending AI VC on, we could do a lot

Starting point is 00:23:31 worse. For now, that's going to do it for today's AI Daily Brief. Appreciate you listening or watching as always. Until next time, peace.
