TED Talks Daily - With spatial intelligence, AI will understand the real world | Fei-Fei Li
Episode Date: May 15, 2024
In the beginning of the universe, all was darkness — until the first organisms developed sight, which ushered in an explosion of life, learning and progress. AI pioneer Fei-Fei Li says a similar moment is about to happen for computers and robots. She shows how machines are gaining "spatial intelligence" — the ability to process visual data, make predictions and act upon those predictions — and shares how this could enable AI to interact with humans in the real world.
Transcript
TED Audio Collective.
You're listening to TED Talks Daily,
where we bring you new ideas to spark your curiosity every day.
I'm your host, Elise Hu.
At TED 2024, the Vancouver Conference Center buzzed with new ideas every morning.
Each of the speakers this morning helped me to have a far more expansive view of what the beautiful and powerful potential of AI is.
AI. It's the topic on everyone's mind lately, and we've certainly brought it up on this podcast many, many times.
So amid all of the hype, it just made sense to welcome one of the pioneers of artificial intelligence to the TED stage. It was really exciting to see some of the people who have kind of invented the ground we walk on in the AI space.
Fei-Fei Li, for instance, she really talks about the relationship between the real world and then this theoretical AI world.
And I just really adore that.
For more than two decades, Fei-Fei Li has been at the forefront of AI.
She is the founding director of the Stanford Institute
for Human-Centered AI. In her talk, she takes us on a journey through the past, present,
and future of this technology. But first, a quick break to hear from our sponsors.
Support for this show comes from Airbnb. If you know me, you know I love staying in Airbnbs when
I travel. They make my family feel most at home when we're away from home.
As we settled down at our Airbnb during a recent vacation to Palm Springs,
I pictured my own home sitting empty.
Wouldn't it be smart and better put to use welcoming a family like mine by hosting it on Airbnb?
It feels like the practical thing to do, and with the extra income,
I could save up for renovations to make the space even more inviting for ourselves and for future guests.
Your home might be worth more than you think.
Find out how much at Airbnb.ca slash host.
AI keeping you up at night?
Wondering what it means for your business?
Don't miss the latest season of Disruptors, the podcast that takes a closer look at the innovations reshaping our economy.
Join RBC's John Stackhouse and Sonia Sennik from Creative Destruction Lab
as they ask bold questions like,
why is Canada lagging in AI adoption, and how can it catch up?
Don't get left behind.
Listen to Disruptors, the innovation era,
and stay ahead of the game in this fast-changing world.
Follow Disruptors on Apple Podcasts, Spotify,
or your favorite podcast platform.
I want to tell you about a podcast I love
called Search Engine, hosted by PJ Vogt.
Each week, he and his team answer these perfect questions,
the kind of questions that, when you ask them at a dinner party, completely derail the conversation. Questions about business,
tech, and society. Like, is everyone pretending to understand inflation? Why don't we have flying
cars yet? And what does it feel like to believe in God? If you find this world bewildering but
also sometimes enjoy being bewildered by it, check out Search Engine with PJ Vogt, available now wherever you get your podcasts. And now, our TED Talk of the day.
Let me show you something. To be precise, I'm going to show you nothing. This was the world 540 million years ago. Pure, endless darkness.
It wasn't dark due to a lack of light.
It was dark because of a lack of sight.
Although sunshine did filter a thousand meters beneath the surface of the ocean,
and light permeated from hydrothermal vents to a seafloor brimming with life,
there was not a single eye to be found in these ancient waters.
No retinas, no corneas, no lenses.
So all this light, all this life, went unseen.
There was a time when the very idea of seeing didn't exist.
It had simply never been done before.
Until it was.
So for reasons we're only beginning to understand,
trilobites, the first organisms that could sense light, emerged.
They're the first inhabitants of this reality that we take for granted,
first to discover that there's something other than oneself,
a world of many selves.
The ability to see is thought to have ushered in the Cambrian explosion, a period in which a huge variety of animal species entered the fossil record.
What began as a passive experience, the simple act of letting light in,
soon became far more active.
The nervous system began to evolve.
Sight turned into insight.
Seeing became understanding.
Understanding led to actions,
and all these give rise to intelligence.
Today, we're no longer satisfied
with just nature's gift of visual intelligence.
Curiosity urges us to create machines to see just as intelligently as we can,
if not better.
Nine years ago, on this stage,
I delivered an early progress report on computer vision,
a subfield of artificial intelligence.
Three powerful forces converged for the first time: a family of algorithms called neural networks; fast, specialized hardware called graphics processing units, or GPUs; and big data, like the 15 million images that my lab spent years curating, called ImageNet. Together, they ushered in the age of modern AI. We've come a long way. Back then, just putting labels on images was a big breakthrough.
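The recipe those three forces enabled can be sketched in miniature: a neural network trained by gradient descent on labeled examples. The toy below stands in for ImageNet-style training, with random 8x8 "images" in two made-up classes (dark vs. bright) instead of 15 million photos, and a plain Python loop instead of GPUs. All the data and sizes here are illustrative, not from the talk.

```python
import numpy as np

# Toy labeled dataset: class 0 = "dark" images, class 1 = "bright" images.
rng = np.random.default_rng(0)
n, d = 200, 64
X = np.vstack([rng.normal(0.2, 0.1, (n, d)),   # dark
               rng.normal(0.8, 0.1, (n, d))])  # bright
y = np.array([0] * n + [1] * n)

# A tiny one-hidden-layer neural network.
W1 = rng.normal(0, 0.1, (d, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, (16, 2)); b2 = np.zeros(2)
lr = 0.5
for _ in range(200):
    h = np.maximum(X @ W1 + b1, 0)               # ReLU hidden layer
    logits = h @ W2 + b2
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)                 # softmax probabilities
    # Gradient of cross-entropy loss, backpropagated through both layers.
    g = p.copy(); g[np.arange(len(y)), y] -= 1; g /= len(y)
    dW2, db2 = h.T @ g, g.sum(0)
    dh = g @ W2.T; dh[h <= 0] = 0
    dW1, db1 = X.T @ dh, dh.sum(0)
    W2 -= lr * dW2; b2 -= lr * db2; W1 -= lr * dW1; b1 -= lr * db1

pred = (np.maximum(X @ W1 + b1, 0) @ W2 + b2).argmax(1)
accuracy = (pred == y).mean()
```

The same loop, scaled up to deep networks, GPU hardware, and ImageNet-sized labeled data, is what turned image labeling from a breakthrough into a solved baseline.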
But the speed and accuracy of these algorithms improved rapidly. The annual ImageNet challenge, led by my lab, gauged this progress.
We went a step further and created algorithms
that can segment objects
or predict the dynamic relationships among them.
And there's more.
Recall last time I showed you the first computer vision algorithm that could describe a photo in natural human language. That was work done with my brilliant former student, Andrej Karpathy.
At that time, I pushed my luck and said, "Andrej, can we make computers do the reverse?"
And Andrej said, "Haha, that's impossible." The impossible has become possible.
That's thanks to a family of diffusion models
that power today's generative AI algorithms,
which can take a human-prompted sentence
and turn it into photos and videos of something entirely new.
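The core idea behind the diffusion models mentioned here is a two-way process: gradually corrupt training images with noise, then train a network to run that corruption in reverse. The sketch below shows only the forward (noising) half on a toy 1-D "image"; the noise schedule values are illustrative assumptions, not from any particular model.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 32)        # toy 1-D "image" (a simple ramp)
betas = np.linspace(1e-3, 0.2, 50)    # assumed noise schedule

# Forward diffusion: at each step, shrink the signal slightly and mix
# in fresh Gaussian noise. After the full schedule, almost nothing of
# the original image remains.
x_t = x.copy()
for beta in betas:
    x_t = np.sqrt(1 - beta) * x_t + np.sqrt(beta) * rng.normal(size=x_t.shape)

# How much of the original signal survives? Very little.
signal_left = abs(np.corrcoef(x, x_t)[0, 1])
```

A trained diffusion model learns the reverse direction: starting from pure noise, it denoises step by step, guided by the text prompt, until a new image emerges.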
Many of you have seen the recent impressive results of Sora by OpenAI. But even without the enormous number of GPUs, my student and our collaborators had developed a generative video model called Walt months before Sora. There is room for improvement.
We will learn from these mistakes and create a future we imagine.
And in this future, we want AI to do everything it can for us,
or to help us.
For years, I have been saying that taking a picture is not the same as seeing and understanding.
Today, I would like to add to that.
Simply seeing is not enough.
Seeing is for doing and learning.
When we act upon this world in 3D space and time,
we learn, and we learn to see and do better.
Nature has created this virtuous cycle of seeing and doing
powered by spatial intelligence. The urge to act is innate to all beings with spatial intelligence,
which links perception with action. And if we want to advance AI beyond its current capabilities,
we want more than AI that can see and talk.
We want AI that can do.
Indeed, we're making exciting progress.
The recent milestones in spatial intelligence
are teaching computers to see, learn, do, and learn to see and do better.
This is not easy. It took nature millions of years to evolve spatial intelligence,
which depends on the eye taking in light, projecting 2D images onto the retina, and the brain translating this data into 3D information.
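The forward half of that process, how a 3D point lands on a 2D image, is the classic pinhole camera model; the reconstruction algorithms described next effectively invert it. A minimal sketch, with an arbitrary illustrative focal length:

```python
import numpy as np

def project(points_3d, focal=1.0):
    """Project Nx3 camera-frame points onto the 2D image plane (Nx2).

    Pinhole model: u = f*X/Z, v = f*Y/Z. Depth Z is divided out, which
    is exactly the information a 3D reconstruction must recover.
    """
    X, Y, Z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    return np.stack([focal * X / Z, focal * Y / Z], axis=1)

# Two points at the same (X, Y) but different depths: the farther one
# projects closer to the image center. That lost depth is what the
# brain, and these algorithms, must infer back.
pts = np.array([[1.0, 1.0, 2.0],
                [1.0, 1.0, 4.0]])
uv = project(pts)
# uv[0] = [0.5, 0.5], uv[1] = [0.25, 0.25]
```

Because many different 3D scenes project to the same 2D image, inverting this mapping is ill-posed, which is why it took both nature and computer vision so long to solve.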
Only recently, a group of researchers from Google
was able to develop an algorithm that takes a bunch of photos
and translates them into 3D space.
My student and our collaborators have taken a step further
and created an algorithm that takes one input image and turns that into a 3D shape.
Recall we talked about computer programs
that can take a human sentence and turn it into videos.
A group of researchers at the University of Michigan
have figured out a way to translate a sentence into a 3D room layout.
And my colleagues at Stanford and their students
have developed an algorithm that takes one image
and generates infinitely plausible spaces for viewers to explore.
These are prototypes of the first budding science of a future possibility,
one in which the human race can take our entire world, translate it into digital forms, and model its richness and nuances. What nature did for us implicitly in our individual minds,
spatial intelligence technology can hope to do for our collective consciousness.
As the progress of spatial intelligence accelerates,
a new era in this virtuous cycle is taking place in front of our eyes.
This back and forth is catalyzing robotic learning, a key component
for any embodied intelligence system that needs to understand and interact with the 3D world.
A decade ago, ImageNet from my lab enabled a database of millions of high-quality photos
to help train computers to see.
Today, we're doing the same with behaviors and actions
to train computers and robots how to act in the 3D world.
But instead of collecting static images,
we developed simulation environments powered by 3D spatial models.
We're also making exciting progress in robotic language intelligence.
Using large language model-based input, my students and our collaborators
are among the first teams to show a robotic arm performing a variety of tasks
based on verbal instructions, like unplugging a charged phone
or making sandwiches using bread, lettuce, tomatoes, and even placing a napkin for the user.
Typically, I would like a little more for my sandwich, but this is a good start.
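The idea behind such language-conditioned control is that an instruction is translated into a sequence of low-level primitives (pick, place, grasp) that an arm controller can execute. In the systems the talk describes, a large language model produces that plan; in the toy sketch below, a hypothetical keyword-based planner stands in for the LLM so the pipeline is runnable. Every primitive and object name here is an illustrative assumption.

```python
def plan(instruction: str) -> list[tuple[str, str]]:
    """Map a verbal instruction to (primitive, object) steps.

    A stand-in for an LLM-based planner: real systems generate these
    action sequences from free-form language rather than keywords.
    """
    steps: list[tuple[str, str]] = []
    text = instruction.lower()
    if "sandwich" in text:
        for item in ("bread", "lettuce", "tomato", "bread"):
            steps += [("pick", item), ("place", item)]
    if "unplug" in text:
        steps += [("grasp", "charging cable"), ("pull", "charging cable")]
    if "napkin" in text:
        steps += [("pick", "napkin"), ("place", "napkin")]
    return steps

actions = plan("make a sandwich and put a napkin for the user")
# first step: ("pick", "bread"); last step: ("place", "napkin")
```

The hard part, of course, is not emitting the plan but grounding each primitive in perception of the 3D scene, which is exactly where spatial intelligence comes in.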
In that primordial ocean in ancient times, the ability to see and perceive one's environment
kicked off the Cambrian explosion of interactions with other life forms.
Today, that light is reaching the digital minds.
Spatial intelligence is allowing machines to interact
not only with one another,
but with humans and with 3D worlds, real or virtual.
And as that future takes shape, it'll have a profound impact on many lives. Let's take healthcare as an example. For the past decade,
my lab has been taking some of the first steps in applying AI to tackle challenges that impact patient outcomes and medical staff burnout.
Together with our collaborators from Stanford School of Medicine and partnering hospitals,
we're piloting smart sensors that can detect clinicians going into patient rooms without
properly washing their hands, or keep track of surgical instruments, or alert care teams
when a patient is at physical risk, such as falling. We consider these techniques a form
of ambient intelligence, like extra pairs of eyes that do make a difference. But I would like more interactive help for our patients, clinicians and caretakers,
who also desperately need an extra pair of hands.
Imagine autonomous robots transporting medical supplies
while caretakers focus on our patients.
Or augmented reality guiding surgeons
to do safer, faster, and less invasive operations.
Or imagine patients with severe paralysis controlling robots with their thoughts,
that's right, brainwaves, to perform everyday tasks that you and I take for granted.
The emergence of vision half a billion years ago turned a world of darkness
upside down. It set off the most profound evolutionary process: the development of
intelligence in the animal world. AI's breathtaking progress in the last decade is just as astounding.
But I believe the full potential of this digital Cambrian explosion
won't be fully realized
until we power our computers and robots with spatial intelligence,
just like what nature did to all of us.
It's an exciting time to teach our digital companions
to learn to reason and to interact
with this beautiful 3D space we call home,
and also create many more new worlds that we can all explore.
To realize this future won't be easy.
It requires all of us to take thoughtful steps
and develop technologies that always put humans in the center.
But if we do this right,
the computers and robots powered by spatial intelligence
will not only be useful tools,
but also trusted partners
to enhance and augment our productivity and humanity
while respecting our individual dignity
and lifting our collective prosperity.
What excites me the most
is a future in which AI grows more perceptive,
insightful and spatially aware,
and joins us on our quest to always pursue a better way
to make a better world. Thank you.
That was Fei-Fei Li speaking at TED 2024.
If you're curious about TED's curation, find out more at TED.com slash curation guidelines.
And that's it for today.
TED Talks Daily is part of the TED Audio Collective.
This episode was produced and edited by our team,
Martha Estefanos, Oliver Friedman, Brian Green,
Autumn Thompson, and Alejandra Salazar.
It was mixed by Christopher Faisy-Bogan.
Additional support from Emma Taubner,
Daniela Balarezo, and Will Hennessy.
I'm Elise Hu. I'll be back tomorrow with a fresh idea for your feed. Thanks for listening.
Looking for a fun challenge to share with your friends and family? TED now has games designed
to keep your mind sharp while having fun. Visit TED.com slash games to explore the joy and wonder
of TED Games.