TED Talks Daily - With spatial intelligence, AI will understand the real world | Fei-Fei Li

Episode Date: May 15, 2024

In the beginning of the universe, all was darkness — until the first organisms developed sight, which ushered in an explosion of life, learning and progress. AI pioneer Fei-Fei Li says a similar moment is about to happen for computers and robots. She shows how machines are gaining "spatial intelligence" — the ability to process visual data, make predictions and act upon those predictions — and shares how this could enable AI to interact with humans in the real world.

Transcript
Starting point is 00:00:00 TED Audio Collective. You're listening to TED Talks Daily, where we bring you new ideas to spark your curiosity every day. I'm your host, Elise Hu. At TED 2024, the Vancouver Conference Center buzzed with new ideas every morning. Each of the speakers this morning helped me to have a far more expansive view of what the beautiful and powerful potential of AI is. AI. It's the topic on everyone's mind lately, and we've certainly brought it up on this podcast many, many times. So amid all of the hype, it just made sense to welcome one of the pioneers of artificial intelligence to the TED stage. It was really exciting to see some of the people who have kind of invented the ground we walk on in the AI space.
Starting point is 00:00:50 Fei-Fei Li, for instance, she really talks about the relationship between the real world and then this theoretical AI world. And I just really adore that. For more than two decades, Fei-Fei Li has been at the forefront of AI. She is the founding director of the Stanford Institute for Human-Centered AI. In her talk, she takes us on a journey through the past, present, and future of this technology. But first, a quick break to hear from our sponsors. Support for this show comes from Airbnb. If you know me, you know I love staying in Airbnbs when I travel. They make my family feel most at home when we're away from home.
Starting point is 00:01:26 As we settled down at our Airbnb during a recent vacation to Palm Springs, I pictured my own home sitting empty. Wouldn't it be smart and better put to use welcoming a family like mine by hosting it on Airbnb? It feels like the practical thing to do, and with the extra income, I could save up for renovations to make the space even more inviting for ourselves and for future guests. Your home might be worth more than you think. Find out how much at Airbnb.ca slash host. AI keeping you up at night?
Starting point is 00:02:00 Wondering what it means for your business? Don't miss the latest season of Disruptors, the podcast that takes a closer look at the innovations reshaping our economy. Join RBC's John Stackhouse and Sonia Sinek from Creative Destruction Lab as they ask bold questions like, why is Canada lagging in AI adoption and how to catch up? Don't get left behind. Listen to Disruptors, the innovation era, and stay ahead of the game in this fast-changing world.
Starting point is 00:02:28 Follow Disruptors on Apple Podcasts, Spotify, or your favorite podcast platform. I want to tell you about a podcast I love called Search Engine, hosted by PJ Vogt. Each week, he and his team answer these perfect questions, the kind of questions that, when you ask them at a dinner party completely derail conversation. Questions about business, tech, and society. Like, is everyone pretending to understand inflation? Why don't we have flying cars yet? And what does it feel like to believe in God? If you find this world bewildering but
Starting point is 00:03:01 also sometimes enjoy being bewildered by it, check out Search Engine with PJ Vogt, available now wherever you get your podcasts. And now, our TED Talk of the day. Let me show you something. To be precise, I'm going to show you nothing. This was the world 540 million years ago. Pure, endless darkness. It wasn't dark due to a lack of light. It was dark because of a lack of sight. Although sunshine did filter a thousand meters beneath the surface of ocean, and light permeated from hydrothermal vents to seafloor. Brimming with life, there was not a single eye to be found in these ancient waters.
Starting point is 00:03:55 No retinas, no corneas, no lenses. So all this light, all this life, went unseen. There was a time that the very idea of seeing didn't exist. It was simply never been done before. Till it was. So for reasons we're only beginning to understand, trilobites, the first organisms that could sense light, emerged. They're the first inhabitants of this reality that we take for granted,
Starting point is 00:04:35 first to discover that there's something other than oneself, a world of many selves. The ability to see is thought to have ushered in Cambrian explosion, a period in which a huge variety of animal species entered fossil records. What began as a passive experience, the simple act of letting light in, soon became far more active. The nervous system began to evolve. Sight turned into insight. Seeing became understanding.
Starting point is 00:05:10 Understanding led to actions, and all these give rise to intelligence. Today, we're no longer satisfied with just nature's gift of visual intelligence. Curiosity urges us to create machines to see just as intelligently as we can, if not better. Nine years ago, on the stage, I delivered an early progress report on computer vision,
Starting point is 00:05:39 a subfield of artificial intelligence. Three powerful forces converged for the first time. A family of algorithms called neural network, fast, specialized hardware called graphic processing units, or GPUs, and big data, like the 15 million images that my lab spent years curating called ImageNet. Together, they ushered in the age of modern AI. We've come a long way. Back then, just putting labels on images were a big breakthrough. But the speed and accuracy of these algorithms just improved rapidly. The annual ImageNet challenge led by my lab gauged the performance of this progress. We went a step further and created algorithms
Starting point is 00:06:32 that can segment objects or predict the dynamic relationships among them. And there's more. Recall last time I showed you the first computer vision algorithm that can describe a photo in human natural language. That was work done with my brilliant former student, Andre Kapathy. At that time, I pushed my luck and said, Andre, can we make computers to do the reverse? And Andre said, haha, that's impossible. The impossible has become possible. That's thanks to a family of diffusion models
Starting point is 00:07:09 that powers today's generative AI algorithm, which can take a human-prompted sentence and turn them into photos and videos of something that's entirely new. Many of you have seen the recent impressive results of Sora by OpenAI. But even without the enormous number of GPUs, my student and our collaborators have developed a generated video model called Vault months before Sora. There is room for improvement. We will learn from these mistakes and create a future we imagine. And in this future, we want AI to do everything it can for us,
Starting point is 00:07:57 or to help us. For years, I have been saying that taking a picture is not the same as seeing and understanding. Today, I would like to add to that. Simply seeing is not enough. Seeing is for doing and learning. When we act upon this world in 3D space and time, we learn and learn to see and do better. Nature has created this virtuous cycle of seeing and doing
Starting point is 00:08:28 powered by spatial intelligence. The urge to act is innate to all beings with spatial intelligence, which links perception with action. And if we want to advance AI beyond its current capabilities, we want more than AI that can see and talk. We want AI that can do. Indeed, we're making exciting progress. The recent milestones in spatial intelligence is teaching computers to see, learn, do, and learn to see and do better. This is not easy. It took nature millions of years to evolve spatial intelligence,
Starting point is 00:09:15 which depends on the eye taking light, project 2D images on the retina, and the brain to translate this data into 3D information. Only recently, a group of researchers from Google are able to develop an algorithm to take a bunch of photos and translate that into 3D space. My student and our collaborators have taken a step further and created an algorithm that takes one input image and turns that into a 3D shape. Recall we talked about computer programs that can take a human sentence and turn it into videos.
Starting point is 00:09:57 A group of researchers at the University of Michigan have figured out a way to translate that line of sentence into a 3D room layout. And my colleagues at Stanford and their students have developed an algorithm that takes one image and generates infinitely plausible spaces for viewers to explore. These are prototypes of the first budding science of a future possibility, one in which the human race can take our entire world and translate into digital forms and model the richness and nuances. What nature did to us implicitly in our individual minds,
Starting point is 00:10:48 spatial intelligence technology can hope to do for our collective consciousness. As the progress of spatial intelligence accelerates, a new era in this virtual cycle is taking place in front of our eyes. This back and forth is catalyzing robotic learning, a key component for any embodied intelligence system that needs to understand and interact with the 3D world. A decade ago, ImageNet from my lab enabled a database of millions of high-quality photos to help train computers to see. Today, we're doing the same with behaviors and actions
Starting point is 00:11:32 to train computers and robots how to act in the 3D world. But instead of collecting static images, we developed simulation environments powered by 3D spatial models We're also making exciting progress in robotic language intelligence. Using large language model-based input, my students and our collaborators are among the first team that can show a robotic language intelligence in a very simple way. We're also developing a new technology called AI,
Starting point is 00:12:02 which is a new technology that is being developed to help students learn to act in a very simple way. intelligence. Using large language model-based input, my students and our collaborators are among the first team that can show a robotic arm performing a variety of tasks based on verbal instructions, like unplugging a charged phone or making sandwiches using bread, lettuce, tomatoes, and even putting a napkin for the user. Typically, I would like a little more for my sandwich, but this is a good start. In that primordial ocean in our ancient times, the ability to see and perceive one's environment kicked off the Cambrian explosion of interactions with other life forms. Today, that light is reaching the digital minds. Spatial intelligence are allowing machines to interact
Starting point is 00:12:57 not only with one another, but with humans and with 3D worlds, real or virtual. And as that future is taking shape, it'll have a profound impact to many lives. Let's take healthcare as an example. For the past decade, my lab has been taking some of the first steps in applying AI to tackle challenges that impact patient outcome and medical staff burnout. Together with our collaborators from Stanford School of Medicine and partnering hospitals, we're piloting smart sensors that can detect clinicians going into patient rooms without properly washing their hands, or keep track of surgical instruments, or alert care teams when a patient is at physical risk, such as falling. We consider these techniques a form
Starting point is 00:13:54 of ambient intelligence, like extra pairs of eyes that do make a difference. But I would like more interactive help for our patients, clinicians and caretakers who desperately also need an extra pair of hands. Imagine autonomous robots transporting medical supplies while caretakers focus on our patients. Or augmented reality guiding surgeons to do safer, faster, and less invasive operations. Or imagine patients with severe paralysis controlling robots with their thoughts, that's right, brainwaves, to perform everyday tasks that you and I take for granted.
Starting point is 00:14:41 The emergence of vision half a billion years ago turned a world of darkness upside down. It set off the most profound evolutionary process, the development of intelligence in animal world. AI's breathtaking progress in the last decade is just as astounding. But I believe the full potential of this digital Cambrian explosion won't be fully realized until we power our computers and robots with spatial intelligence, just like what nature did to all of us. It's an exciting time to teach our digital companion
Starting point is 00:15:23 to learn to reason and to interact with this beautiful 3D space we call home, and also create many more new worlds that we can all explore. To realize this future won't be easy. It requires all of us to take thoughtful steps and develop technologies that always put humans in the center. But if we do this right, the computers and robots powered by spatial intelligence
Starting point is 00:15:54 will not only be useful tools, but also trusted partners to enhance and augment our productivity and humanity while respecting our individual dignity and lifting our collective prosperity. What excites me the most in the future is a future in which that AI grows more perceptive, insightful and spatially aware,
Starting point is 00:16:21 and they join us on our quest to always pursue a better way to make a better world. Thank you. Support for this show comes from Airbnb. If you know me, you know I love staying in Airbnbs when I travel. They make my family feel most at home when we're away from home. As we settled down at our Airbnb during a recent vacation to Palm Springs, I pictured my own home sitting empty. Wouldn't it be smart and better put to use welcoming a family like mine by hosting it on Airbnb? It feels like the practical thing to do,
Starting point is 00:17:00 and with the extra income, I could save up for renovations to make the space even more inviting for ourselves and for future guests. Your home might be worth more than you think. Find out how much at airbnb.ca slash host. That was Fei-Fei Li speaking at TED 2024. If you're curious about TED's curation, find out more at TED.com slash curation guidelines. And that's it for today. TED Talks Daily is part of the TED Audio Collective. This episode was produced and edited by our team, Martha Estefanos, Oliver Friedman, Brian Green, Autumn Thompson, and Alejandra Salazar.
Starting point is 00:17:38 It was mixed by Christopher Faisy-Bogan. Additional support from Emma Taubner, Daniela Balarezo, and Will Hennessy. I'm Elise Hugh. I'll be back tomorrow with a fresh idea for your feed. Thanks for listening. Looking for a fun challenge to share with your friends and family? TED now has games designed to keep your mind sharp while having fun. Visit TED.com slash games to explore the joy and wonder of TED Games.
