Behind The Tech with Kevin Scott - Fei-Fei Li: Human-Centered AI

Episode Date: June 21, 2019

Fei-Fei Li is a professor at Stanford University and one of the world’s pioneering researchers in AI. Her work focuses on human-centered artificial intelligence. Hear her ideas about everything from... visual intelligence to cognitive neuroscience.

Transcript
Starting point is 00:00:00 Whenever humanity creates a technology as powerful and potentially useful as AI, we owe it to ourselves and our future generation to make it right. Hi, everyone. Welcome to Behind the Tech. I'm your host, Kevin Scott, Chief Technology Officer for Microsoft. In this podcast, we're going to get behind the tech. We'll talk with some of the people who've made our modern tech world possible and understand what motivated them to create what they did. So join me to maybe learn a little bit about the history of computing and get a few behind-the-scenes insights into what's happening today. Stick around. Hello, and welcome to Behind the Tech. I'm Christina Warren, Senior Cloud Advocate at Microsoft.
Starting point is 00:00:58 And I'm Kevin Scott. And today, we're continuing our conversation about AI, this time with Stanford professor and researcher Fei-Fei Li. Now, if you didn't catch the last episode, be sure to check it out because Kevin has a conversation about AI with another Stanford academic and researcher, Surya Ganguly. And you guys went deep. Yeah, it's sometimes very hard to contain myself. We got super technical just because it was super fascinating what we were chatting about. Yeah. And so one of the things that we talked about last time, Kevin,
Starting point is 00:01:32 was the need for inspiring and positive stories about AI. And I think that you and I, I think we made a promise to write a screenplay. Yeah. In our copious free time, right? Exactly. We have so much of it. But, you know, I have been thinking about the screenplay. And so recently I was reading an article about AI and animals and how they think that they're only about a decade out from having a kind of language translator. So you mean like the dogs from the Pixar film Up? Squirrel! Yes, yes, yes. The small mailman smells like chocolate. So the research might have applications in animal welfare where, for instance, they might be able to use AI to track the faces of sheep or cows in order to detect pain and illness and then provide faster medical treatment. They've also done some really incredible studies with prairie dogs, who they believe have a very complex language system. And so it would be really tempting, in my opinion, to kind of write a show with the typical trope of the talking family pet, you know, like Mr. Ed. But, you know,
Starting point is 00:02:34 grumpy cat, may she rest in peace. You know, maybe she wasn't grumpy at all. Maybe she was like happy-go-lucky. But kind of back to the prairie dogs, you know, I do think that that's one that hasn't been done yet. You know, we could maybe do like a Little House on the Prairie remake, but with AI. Yeah, you know, we're trying to make a stab at comedy here. But funny enough, when Fei-Fei and I were chatting before the podcast, she was telling me about this Chinese company that does face recognition for cows. So, like, you may not be that far off. But despite that, like, maybe we should stick to our day jobs. Maybe, I guess. Okay. All right. So, we should definitely meet Fei-Fei Li.
Starting point is 00:03:16 Yeah, let's do it. Next up, we'll meet with Fei-Fei Li. Fei-Fei is considered one of the pioneering researchers in the field of artificial intelligence. She's a computer science professor at Stanford University and the co-director of the Human-Centered AI Institute there. Fei-Fei served as the director of Stanford's AI Lab, and during a recent sabbatical, she was a VP at Google, serving as chief scientist of AI and machine learning at Google Cloud. So thank you so much, Fei-Fei, for coming in today. I've been wanting to chat with you for a really long time now. Likewise, Kevin. Thank you for inviting me. So I usually start these things by trying to understand a little bit of the story of the folks that we're chatting with. And I'd be really interested to understand how you started to really get seriously interested in computer science. So I came to computer science through a pretty convoluted detour.
Starting point is 00:04:11 So I was always kind of a STEM kid, so to say. So I was interested in nature, in the stars, the animals and all that. But my first passion, first love was physics. Awesome. So starting in junior high and then high school, I was just passionate about physics, studying relativity, you know, reading. And what was it about? Was it that, like, it gave you a lens to understand the world? Was it that you just liked the mathematics of it? What was the thing?
Starting point is 00:04:44 I think it's the combination of the imagination and the mathematical rigor. But it's really about peeling off question after question to go after the very original questions, right? Like, where do I come from? Where do humans come from? What are humans made of? Where do atoms come from? Where do, you know, the first atoms come from? Very soon you go to the Big Bang.
Starting point is 00:05:09 So basically you're infinitely curious. Yeah, so I was. So physics was my love and I majored in physics at Princeton when I went to college. Awesome. And, you know, Princeton is the mecca for physics. And first day in the freshman year physics class, the professor said, this is the very lecture hall that Einstein was sitting in. It was just like a dream, right? So, but around the sophomore, junior year in college,
Starting point is 00:05:41 I started reading books about these great, great physicists of the 20th century, like Schrödinger and Einstein. And as I read, I noticed that towards the later half of their lives, their interest turned from the physical atomic world to the more life-science world. They started to ask questions about the origin of life, of intelligence, and that really piqued my interest. I started to get very interested in the question of intelligence. So I joined neuroscience research, and as a summer intern research student, I was literally recording from a mammalian brain and listening to the neurons seeing the world. So fun. Yeah. So I decided to apply for grad school. And even there,
Starting point is 00:06:34 I chose to go to Caltech because I was able to find two advisors, one in neuroscience and one in what we now call AI but at that time called computer vision, to do my PhD study in that combination. And why vision? Yeah, good question. In the neuroscience lab, I was recording from cats' neurons and watching their neuronal activity when cats see the world, you know, see oriented edges, see complex features. But really, if you think about vision, I want to tell you a story. 540 million years ago, the world was very different. It was mostly water and simple animals floating on Earth. And there were
Starting point is 00:07:26 only a few species. And life was chill. You know, you just hang out by floating. And then what zoologists have found or evolutionary biologists have found is this incredibly mysterious phenomenon called Cambrian explosion within a short 10 million year span. In the history of Earth or in the evolution sense, it's such a narrow slice of time. The animal kingdom just exploded. Many, many animal species got created or evolved. And people called this the Big Bang of Evolution.
Starting point is 00:08:02 And no one understood why. So why, from 540 million to 530 million years ago, did this Big Bang of Evolution happen? So fast forward to about a decade ago: a young evolutionary biologist called Andrew Parker from Australia studied a lot of fossils and conjectured, and it became a very convincing theory, that it was the onset of vision. The first animal, some kind of floating trilobite, developed a pinhole structure. It's very simple. It literally just collected some light. Once you see light, life changes.
Starting point is 00:08:41 You become active. You see your food, and you can hide and escape so you don't become someone else's food. So it becomes an evolutionary arms race for animal species. And with that kind of active lifestyle, so to say, because of vision, animals started to evolve much faster. Interesting. And fast forward 540 million years later, visual intelligence is, to me, the most fascinating and complex sensory system of the human brain. Half of our brain is involved in visual processing and understanding. And that's a lot. And it's super interesting, because I think a lot
Starting point is 00:09:25 of people, you know, if you're trying to just sort of reflect on your, you know, your own intelligence, like a lot of that is about language. Like, in fact, even the, you know, sort of the meta process of thinking about your intelligence is linguistic. Like, you're sort of having this dialogue with yourself and with everyone. But your suggestion is that, like, vision is the more fundamental thing, maybe. So I'm not saying that language is not important, and it's a unique part of human intelligence. But I recommend you read a book by Alison Gopnik called The Scientist in the Crib. And she's a developmental psychologist and a philosopher who studies babies, very young babies. So when you say language, I just want to challenge you,
Starting point is 00:10:08 are babies intelligent before they develop language, right? In fact, they're the most fascinating creatures, because in their first two years of life, when language is not the primary tool, they are just curious creatures exploring, understanding, and interacting with the world. They develop a theory of other minds. They develop the sense of objects. They develop social intelligence.
Starting point is 00:10:37 They do face recognition. They navigate. They manipulate. They crawl. They understand space. And this is all without language. So I just want to highlight how incredibly deep, important, and useful visual intelligence is. And, of course, as soon as language gets developed, you can see the interaction between vision and language.
Starting point is 00:11:02 One of the most exciting areas of research that I'm doing right now is the interplay between vision and language. But vision, for me, is just highly fascinating. So, tell us a little bit about your PhD work. So, like, you're in this program, you've got a neuroscience advisor and, like, basically a computer vision advisor. And so, like, what does your dissertation research look like? Yeah. So, that was a great question. So I literally did a combination of cognitive neuroscience
Starting point is 00:11:31 and computer vision. So on the cognitive neuroscience part, I started my PhD in the first year of the 21st century, 2000. And little did the public know that even today's AI revolution owes a lot to the incredible advances in cognitive science, starting from the 70s and 80s and going well into the 90s, because we were mapping out some of the incredible capabilities of the human intelligence system, including vision. And one of the most fascinating areas of vision I was studying at that time is our ability to understand the natural world. An earlier study by an English scientist living in France, Simon Thorpe, showed that within 150 milliseconds of seeing a complex visual scene, humans are already capable of understanding whether the scene contains an animal or it doesn't.
Starting point is 00:12:33 And here we're talking about all kinds of potential animals in all kinds of environments. So that processing was fascinating. One of my studies during my PhD was actually to quantify how much we see the moment we open our eyes, from objects to people to movements. And how do you do that experiment? Oh, that's fun. So in cognitive neuroscience, it's called psychophysics. So what I would do at that time is collect hundreds of photos from Flickr, actually. And these photos are all daily user-uploaded photos. So it goes from, like, birthday parties to, you know, surfing in the ocean to all kinds of topics. And then you code up a program, and then you put a human
Starting point is 00:13:22 subject in front of a computer screen. And then you flash the photo quickly, and you control the amount of time you flash the photo. We literally went down to 27 milliseconds all the way to 500 milliseconds to show the photo to the human. And then we control how long the picture gets seen. Right. And then we ask the people to write down what they see. Interesting. And we pay $10 for the undergrads who participate in the experiment. And then we collect a lot of data.
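(A concrete aside for readers: the trial structure described here, flashing each photo for a controlled duration and then collecting a free-form report, can be sketched in a few lines of Python. The durations, photo names, and stub display and response functions below are illustrative assumptions, not details taken from the actual study.)

```python
import random

# Sketch of a psychophysics trial schedule: every photo is paired with
# every exposure duration, trials run in random order, and the subject's
# free-form report is logged. All names and numbers are illustrative.

PRESENTATION_MS = [27, 53, 107, 253, 500]  # assumed exposure times (ms)

def build_trials(photos, durations, seed=0):
    """Cross photos with durations and shuffle into a trial order."""
    trials = [(photo, ms) for photo in photos for ms in durations]
    random.Random(seed).shuffle(trials)
    return trials

def run_trial(photo, duration_ms, show, collect_report):
    """One trial: flash the photo for duration_ms, then record a report."""
    show(photo, duration_ms)  # a real experiment would also mask the image
    return {"photo": photo, "ms": duration_ms, "report": collect_report()}

# Usage with stub display/response functions:
trials = build_trials(["birthday.jpg", "surfing.jpg"], PRESENTATION_MS)
log = [run_trial(p, ms, show=lambda *a: None,
                 collect_report=lambda: "an outdoor scene")
       for p, ms in trials]
print(len(log))  # 2 photos x 5 durations = 10 trials
```

Randomizing the trial order, as the shuffle does here, is what lets the statistical analysis mentioned next separate the effect of exposure time from fatigue or learning effects.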
Starting point is 00:13:53 Of course, scientific rigor had to kick in. Right. And then we statistically analyzed this and understood it. It was one of the first studies that ever quantified how much people see within literally a glance of a scene. That's really interesting. So, like, in 150 milliseconds, like, your brain can see this image and decode enough of it where you can sort of explain back what's in it. Test it yourself: just open your eyes and close them, and that will be longer than 150 milliseconds. But within that very short amount of time, the comprehension you have of the visual world is so rich. And that inspired my AI work.
Starting point is 00:14:34 Because at that time, most of computer vision was still, you know, recognizing letters. Oh, I remember. Right. And they were non-neural-network models. Right. Well, neural networks had been invented. In fact,
Starting point is 00:14:49 my first course at Caltech was called Neural Networks. But they were working on very simple stimuli like digits and numbers. And then computer vision
Starting point is 00:14:58 was still trying to understand edges. And then my advisor, Pietro Perona, was one of the pioneering scientists in computer vision who said, why don't we venture into real-world object recognition? And with my study on the neuroscience side, we also had evidence that this is what humans are capable of and are good at,
Starting point is 00:15:21 and we really have to move computer vision towards that human capability. So I focused my AI research on enabling computers to see and understand everyday objects. That's a big leap. I mean, we sort of take for granted that image classifiers are reasonably good now. But when you're doing this back in 2000, to have the idea that I want to go from this relatively sort of simplistic, and not to denigrate in any way all the computer vision research,
Starting point is 00:15:52 because some of the best people I know, the smartest people I know were doing that work. But it's a big leap from that to I want to do whole object recognition inside of images. It was a leap. I mean, I didn't single-handedly do it. Like I said, there were a few incredibly forward-thinking scientists,
Starting point is 00:16:11 Jitendra Malik, David Lowe, Pietro Perona. They were starting to think in that way. But we've, as a field, made mistakes and had detours, right? At that time, we were thinking about how to mathematically construct those handmade models to describe objects. And that took years and years to do, and it didn't deliver the results we wanted. And one of the projects, and it's really ironic but also fun to reflect back on, is that data was so scarce at that time that my very first object recognition project was called one-shot learning. It worked in a setting where we only have one or two pictures to train the algorithm on. And today, you think about the big data age, and what I did for ImageNet, it was
Starting point is 00:17:05 almost the polar opposite. But it's a capability humans have. And we try to replicate that. Yeah. And I want to come back to that in a minute. But, like, you brought up ImageNet. So, like, this is one of the things you're most well known for. And, you know, as I listened to you talking about your PhD work, it sort of seems like a natural extension, to an extent, of what you're doing. So, like, for the audience, why don't you describe what ImageNet is? Okay, so ImageNet was a project we started in 2007 and more or less completed in 2009. The end result was, at that time, the largest database of natural object images in the world. It consisted of 15 million images organized by 22,000 everyday English words,
Starting point is 00:18:01 mostly nouns. And we collected this data set for about three years by labeling, cleaning, and sorting almost a billion internet pictures. And what ImageNet did is it provided one of the most critical ingredients, data, for enabling neural network architectures to train. And that was the onset of the deep learning revolution. Yeah, I mean, like, you sort of are being a little bit understated about it. But, like, it's almost impossible to imagine that rapid iteration loop, with the exploration of these DNN architectures and, like, the techniques that we developed to train them quickly on GPUs and whatnot. None of that really could have happened without this big database of training data.
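(To make concrete why labeled data is the critical ingredient being discussed: supervised training fits model parameters by descending the gradient of a loss computed over input-label pairs, and more pairs mean a better-constrained model. Below is a toy sketch in which multinomial logistic regression stands in for a deep network and small synthetic vectors stand in for images; nothing here is drawn from the actual ImageNet pipeline.)

```python
import numpy as np

# Toy illustration of supervised learning from labeled pairs: softmax
# (multinomial logistic) regression trained by gradient descent. The
# synthetic (feature, label) pairs stand in for (image, annotation)
# pairs; this is not the actual ImageNet/deep-learning setup.

rng = np.random.default_rng(0)
n, d, k = 300, 5, 3                       # samples, feature dim, classes
W_true = rng.normal(size=(d, k))          # hidden labeling rule
X = rng.normal(size=(n, d))
y = np.argmax(X @ W_true, axis=1)         # labels produced by that rule

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((d, k))
for _ in range(500):                      # plain full-batch gradient descent
    p = softmax(X @ W)
    grad = X.T @ (p - np.eye(k)[y]) / n   # gradient of mean cross-entropy
    W -= 0.5 * grad

train_acc = float((np.argmax(X @ W, axis=1) == y).mean())
print(round(train_acc, 2))
```

The same loop, scaled up to millions of labeled images with a deep network in place of the linear model, is the training pattern that a dataset of ImageNet's size made feasible.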
Starting point is 00:18:51 Well, thank you. That's very nice of you to say. And we designed and developed ImageNet because we believed strongly, around 2006 and 2007, that we had to hit the reset button for machine learning. The work we had been doing the past few years exploring different models didn't quite work for the scale and the scope of our real natural visual world, of so many different objects and the varieties they represent. So my students and I conjectured that the way to really think about modeling objects through machine learning techniques is to think through
Starting point is 00:19:33 data. And that was a pretty bold statement at that time, because at that time, people were constructing small probabilistic models through hand design and a lightweight training of parameters. So for us to go in and say, hold on, let's just rethink this whole thing from a data point of view, was kind of, you know, a minority way of thinking. Yeah. But, you know, again, just sort of absolutely necessary. And I'm curious what you think about this: we're sort of in this state right now where we're beginning to see people
Starting point is 00:20:13 do really interesting things with reinforcement learning and unsupervised learning where you're getting a little bit away from requiring so much explicitly labeled data. So like the data is still very important, but, like, you don't have to go through this exercise of annotating, like, oh,
Starting point is 00:20:29 there's a cat in this image, there's a red ball in this image. And, like, the interesting results that have come out over the past couple of years have mostly been natural language things with these big unsupervised models. Like, do you think that's sort of a trend that will continue across a bunch of different domains? Absolutely. I think this is very exciting. I think, you know, if you reflect on human intelligence, our way of learning is very multidimensional, right?
Starting point is 00:20:57 We do have training and supervision-based learning, especially when you teach a kid manners. It seems lots of supervision is needed. But children and grownups learn from trial and error, learn from few shots, learn in unsupervised settings, learn with rewards and punishments sometimes. So that kind of flexibility is clearly critical, and evolution has built it into human intelligence. And for machine intelligence to become more robust, to serve the human world better in different settings, I think that kind of unsupervised learning or few-shot learning or reinforcement learning is absolutely needed. I'll give you an example. I work with the healthcare industry a lot because part of my research these days
Starting point is 00:21:46 deals with AI and healthcare. And one of the settings that we work with is senior well-being and senior safety. And fall detection is a huge thing for seniors. In fact, falls account for billions and billions of dollars of medical spending for American seniors. And, you know, it can be fatal. And even if not fatal, it can cause a lot of pain and issues for our aging population.
Starting point is 00:22:15 Well, when it comes to falling, it's a rare event. We immediately run into a lack-of-data problem when we are working with the doctors, right? It's very, very hard to collect an ImageNet of seniors falling, and you don't want to. Right. And, like, the seniors don't have cameras in their homes. Right. And they fall in such different ways, and the situation is complex. Yep.
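(One standard way to attack this kind of scarce-data problem, and transfer learning comes up in the conversation right after this, is to reuse a feature extractor pretrained on abundant data and fit a very small classifier on the few labeled examples available. The sketch below uses a nearest-centroid classifier with a stubbed-out backbone; the labels and vectors are invented for illustration, and a real backbone would be a pretrained network.)

```python
import numpy as np

# Few-shot classification via transfer learning, sketched: a frozen,
# pretrained backbone maps each input to a feature vector, and a tiny
# nearest-centroid classifier is fit on a handful of labeled examples
# (e.g. "fall" vs "no_fall"). The backbone here is a stub; in practice
# it would be a network pretrained on a large dataset.

def backbone(x):
    """Stand-in for a pretrained feature extractor."""
    return np.asarray(x, dtype=float)  # real code would run the network

def fit_centroids(support):
    """support: {label: [raw examples]} -> {label: mean feature vector}."""
    return {label: np.mean([backbone(x) for x in xs], axis=0)
            for label, xs in support.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest in feature space."""
    feat = backbone(x)
    return min(centroids, key=lambda lbl: np.linalg.norm(feat - centroids[lbl]))

# Usage: two labeled examples per class are enough to start predicting.
support = {"fall": [[5.0, 1.0], [6.0, 2.0]],
           "no_fall": [[0.0, 0.0], [1.0, 1.0]]}
centroids = fit_centroids(support)
print(predict(centroids, [5.5, 1.5]))  # -> fall
```

The design choice that makes this work with so few examples is that all the hard representation learning has already been paid for in the pretrained backbone; only the cheap centroid step depends on the rare-event data.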
Starting point is 00:22:38 So when you want to work on these critical issues, you're immediately in this so-called few-shot learning situation. And you probably have to consider transfer learning, consider simulated learning, and all that. So it just shows that the field now, I'm very glad to see, is moving beyond just large-data supervised learning. You mentioned healthcare, which is, I think, like, one of the most promising focus areas for AI right now. And, you know, I know that you're one of the co-directors of the
Starting point is 00:23:13 Stanford Human-Centered AI Institute. And you're one of our advisors. So how are you thinking about what we need to do to get AI to better serve the interests of everyone? Well, Kevin, that's a big topic. And I think that's a really important topic, right? Whenever humanity creates a technology as powerful and potentially useful as AI, we owe it to ourselves and our future generation to make it right. So first of all, I think the institute that both of us are involved in is really laying out a framework for thinking about this. And the framework is human-centered: it spans from the get-go, from the design and the basic science development of this technology, all the way to the application and impact of this technology.
Starting point is 00:24:06 We want to make it benevolent to humans. And with this framework in mind, the institute at Stanford works on three founding principles to cover different aspects of human-centered AI. The first principle, which is actually what we've been talking about, is to continue to develop AI technology, basic science technology, that is human-inspired, betting on the combination of cognitive science, psychology, behavioral science, and neuroscience to push AI forward, so that the technology we will be using has better coherency, better capability, to serve human society. So that's the first principle. The second principle is, and I would love to hear your thoughts on this, that we were trained as a generation of technologists when technology was solidly considered an engineering
Starting point is 00:25:11 field or computer science field. But I think AI really has turned a chapter. AI is no longer just a computer science field. AI is so interdisciplinary today. In fact, some of the most interesting fields that AI should really contribute to, and also welcome to join forces with, are the social sciences and humanities: economists, ethicists, philosophers, education experts, legal scholars, and all that. To do this, our goal is to understand what this technology is really about, understand its impact, but also forecast, anticipate the perils, anticipate the pitfalls, anticipate unintended consequences, really with the eventual goal of guiding
Starting point is 00:26:04 and recommending policies that are good for all of us. So that's the second principle: really understand, anticipate, and guide AI's human and societal impact. The third, and last but not least, principle is something I know you and I feel passionate about: really to emphasize the word enhance instead of replace. Because AI technology is talked about as a technology to replace humans. I think we should stay vigilant about job displacement and the labor market,
Starting point is 00:26:40 but the real potential is using this technology to enhance and augment human capability, to improve productivity, to increase safety, and really, eventually, to improve the well-being of humans. And that's what this technology is about. And here we're talking about healthcare. Other verticals that we put a lot of passion and resources into are education, sustainability, manufacturing, and automation. These are really humanly and societally important areas of development. Well, just sort of sticking with healthcare and, like, your elder care example, like, this is something that I don't think a whole lot of people spend time thinking about unless they're taking care of an elderly parent or relative. We're not thinking about how systemically we can make the lives of elderly people better. And we're certainly not thinking about the big demographic shifts that are about to come.
Starting point is 00:27:40 It's going to come globally. Yeah, globally. I mean, you and I have chatted about this before, but, you know, we sort of see in almost all of the industrialized economies, but also in Japan, Korea, and China. Yeah, absolutely. You have this very large bubble of working age population that's getting older and older, and we just don't have high enough fertility rates in these younger generations to replace us. So, at some point, like, we, across the entire world, we're going to have far more old people than we will have working age people. And you have, like, a couple of big questions when that happens.
Starting point is 00:28:16 Like, who takes care of all the old people? And, like, who's going to do all the work? And it's actually not far enough away that we can afford not to think about it. 2035, I think, is when the last baby boomers, the youngest of them, join the aging population, though we'd have to find the actual number. So we're very close to that. And also, to do this research on the aging population, I spend a lot of time in senior homes and senior centers. One thing I learned as a technologist is that we should really develop the kind of empathy and understanding of what we really are working on and working for.
Starting point is 00:28:57 For example, I cannot tell you how many Silicon Valley startups are out there creating robots as senior companions. And when some of them feel robots can replace family, nurses, friends, I really worry. And I really want to encourage these entrepreneurs to spend a lot of time with the seniors. One thing I learned about well-being for the aging population is that dignity and social connection are the biggest part of aging. And so my dream technology is something that you don't notice, but it's quietly there to help, to assist, to connect people, to ensure safety, rather than this big robot, you know, sitting in the middle of the living room and replacing the human connectivity. It's really funny that you're bringing all of this up. I'm writing a book right now
Starting point is 00:29:55 on why I think people should be hopeful about the potential of AI, particularly in rural and middle America. And for the book, I went back to where I grew up in rural central Virginia, in, like, this, you know, very small town. And I visited the nursing home where three of my grandparents spent the last chunk of their lives. And I was just chatting with some of the people there. And I asked the nurses and the managers in this place, like, you know, what do you think of AI? And when I say AI, the vision that conjures is, like, oh, there's going to be some human-equivalent android coming in. And they'd be like, no, the residents would be terrified by this thing. Whereas, like, they've
Starting point is 00:30:39 got a bunch of things, like, dispensing medicine, for instance. Like, you know, when you're elderly, like you're taking this like complicated cocktail of medicines and like getting it dispensed in the right amounts at the right time through the day, making sure that you actually take the medicine. Like, that's a problem that we could solve with AI-like technologies, like, you know, combination of robotics and computer vision. But it wouldn't be like this talking, walking, you know, robot. It would be like a set of things that sort of disappear into the background and just sort of become part of the operation of the place. And like that, I think we should have more ambition for that sort of thing rather than this, you know.
Starting point is 00:31:19 That's why Stanford HAI wants to encourage that. The best technology is you don't notice the technology, but your life is better. Yes. That's the best technology. I could not agree more. And also, just talking about the rural America, this is something I feel passionate about. And I have a story to share with you. So you probably know that I co-founded and chair this nonprofit education organization called AI for All, right?
Starting point is 00:31:45 Yep. It started as a summer camp at Stanford about five years ago to encourage students from diverse backgrounds to get involved in AI, especially through human-centered AI study and research experience, and to encourage them to stay in the field. And our goal is that in 10 years, we will change the workforce composition. Now it has become a national nonprofit, seed-granted by Melinda Gates and the Jensen Huang Foundation. That's awesome. I didn't know Jensen was involved. That's great. Yeah, it's the Jensen and Lori Huang Foundation.
Starting point is 00:32:22 And this year, we're on 11 campuses nationwide. One of the populations we put a lot of focus on, in addition to gender, race, and income, is geographic diversity and serving rural communities. For example, our CMU campus is serving rural communities in Pennsylvania. We also have an Arizona campus. One story that actually came out of our Stanford camp is Stephanie's.
Starting point is 00:32:52 Stephanie is still a high school junior now, and she grew up against the backdrop of strawberry fields in rural California, in a trailer park with a Mexican mom. And she comes from that extremely rural community, but she's such a talented student and has this knack and interest for computer science. And she came to our AI for All program at Stanford two years ago. And after learning some basics about AI, one thing that really inspired her is she realized this technology is not cold-blooded, not just a bunch of code.
Starting point is 00:33:32 It really can help people. So she went back to her rural community and started thinking about what she could do using AI to help. And one of the things she came up with is water quality. Yes. It really matters to her community.
Starting point is 00:33:45 And so she started to use machine learning techniques to look at water quality through water samples. And that's just such a beautiful example. I just love her story. It shows that when we democratize this technology to communities, to diverse communities, especially these communities that technology hasn't reached enough, the young people, the leaders, and the citizens of those communities will come up with such innovative and relevant ideas and solutions to help those communities. And I think that getting this technology democratized is sort of a one-two punch. So, like, there's the technical things that you have to do.
Starting point is 00:34:28 So, open source and, like, making sure that the research is open and freely available, and being able to run these things on cloud platforms and, you know, at the edge. Like, all of that's super important. It's actually amazing. Cloud and edge. Yes. Cloud and edge for sure. And it's really amazing, like, amazing how much is possible now. I know you probably have this all the time.
Starting point is 00:34:51 It's like you're sitting in 2019 and seeing what your students can do, and you sort of compare that to what you could do in 2000. That's because you have bright students, but it's also because the tools are, like, incredibly sophisticated now. But that's only half of the story. Like, the other half, and, like, I'm so glad that you're doing this nonprofit work, is that if we really want the benefits of this technology to be, you know, sort of equitably and widely distributed, you have to have people who have a connection to the communities and the human beings that the technology needs to serve. Absolutely. Because it's not that anybody's bad. It's just, like, if you don't have that context and that empathy, you just don't really know what to do or maybe even how to do it.
Starting point is 00:35:36 Absolutely. We had an alum whose grandparent unfortunately passed away due to a delay in ambulance service. Now she's working on machine learning for optimizing ambulance dispatch. I think that's why we need people from all walks of life, because they bring the understanding and empathy you mentioned, and also the experience to innovate and create in ways that just one slice of people couldn't possibly cover. And you said the right thing.
Starting point is 00:36:07 People are at the heart of all this. When AI4ALL was founded, our slogan was: AI will change the world. Who will change AI? Right. That is the core of this problem. Yeah, that's awesome. So what are you most excited about?
Starting point is 00:36:26 And I'll ask it two different ways. So, like, what are you most excited about from a research perspective right now in AI? And, like, what are you most excited about from a social good perspective? And hopefully they actually are not mutually exclusive. Yes, I very much hope that they're not. I think from a basic research science point of view, there's one direction that I'm exploring with my collaborators and students at Stanford that really excites me. And it goes back to what we were saying about the babies and scientists in the crib. Because early childhood is this rich period of learning about the world in such fascinating ways. This is where you're not labeling a thousand cat images and showing them to a baby, saying,
Starting point is 00:37:12 cat, cat, cat, right? That just doesn't work. They're just exploring out of curiosity and all that. So there's a project at Stanford I'm involved in, and I have students working on it, is curiosity-based learning. It's where we design machine learning agents and put them in unfamiliar environment, and they have capability to interact with the objects in the environment and watch how the agent, through this kind of curiosity-based learning, develop capabilities of recognizing objects or understanding physical properties of objects.
Starting point is 00:37:47 And is this a variation of reinforcement learning where – It uses reinforcement learning. It definitely uses deep learning as the early representation of the world. Deep learning is very useful. Right. It's a combination of deep learning and reinforcement learning. It's curiosity-driven.
Starting point is 00:38:07 And how do you articulate the curiosity? So, like, there must be some metric for it, right? Right. Curiosity is expressed through the difference between your known model of the world and what you observe. Oh, interesting. And for babies, it's the same, right? If they keep seeing the same thing, they get bored. So they want to explore different aspects. They want to create new things. So
Starting point is 00:38:25 if you give a baby a ball, maybe he or she would first look at the ball, and then he or she would drop the ball. If you give him or her two balls, they will start banging the two balls. So these are the different aspects of interacting with the world. So we start seeing that. And it's still early research, but what I would love to see is behavioral patterns emerge from the machine learning agent. And then we can do human experiment to contrast and compare and see if we can improve our machine learning algorithm, but also to see what emerges from machines that are different from humans. And do you imagine in this research that the models will be very large just because you want something that's sort of expansive and has the room to learn different sorts of
Starting point is 00:39:16 representations in this space? So this is where it's very different from the brain. We start small. The models, because they simulate environments, they're pretty simple. So a couple of objects with simple shape and color and material. But we want to grow the world, the machine agent world. And I'm not going to be surprised if this model becomes larger and larger. Yeah, I mean, the reason I ask is, like, one of the things that really has started to intrigue me over the past few years is, and, like, I think this has sort of been true for, like,
Starting point is 00:39:55 the past decade or so, the things that have been making the fastest progress in AI are things that have some sort of connection to one or more things that are growing exponentially fast. Chips. So like computing data have been the two big things, you know, that are driving. They're not the sole things. They're not driving progress at all, actually. They're facilitating very rapid progress.
Starting point is 00:40:19 And so, like, I'm always looking for that connection. On the other hand, a human brain operates on less than 20 watts. Yeah, I know. It's a brilliantly efficient thing. Exactly. It doesn't take that many neurons to get the first impression of the world when you open your eyes. So there are some really interesting contrasts between biological intelligence and machine intelligence. Yeah, I'm probably getting the details on this wrong, but I remember, like, even a
Starting point is 00:40:46 couple of years ago reading, reading a, like a little short note in Science or Nature about how someone had used fMRI to map out the primate neural network, like biological neural network that does face recognition. And it was like tiny, like a little, little bitty network. It's called, yeah, the central area is called FFA. In fact, that was in 1990s, late 1990s, the MIT researcher Nancy Kowensher and many of her colleagues were at the forefront of that study and really give rise to a lot of neuro-correlate belief that there are areas of brain with those kind of expertise. And they're not that huge. Yeah.
Starting point is 00:41:28 And so before we get onto the social stuff, which I'm super interested in, like, tell me a little bit more about this work that you're doing that sort of blends vision and language together, because that seems really quite exciting. Yeah. So it actually is a continuation or a step forward from ImageNet. If you look at what ImageNet is, for every picture, we give one label of an object. Fine. That's cool. You have 15 million of them. It becomes a large data set to drive object recognition. But it's such an impoverished representation of the visual world. So the next step forward is obviously to look at multiple objects and, you know, be able to recognize more. But what's even more fascinating to me is not the list of 10 or 20 objects in a scene.
Starting point is 00:42:20 It's really the story. And so right after the bunch of work we have done with ImageNet around 2014 when deep learning was, you know, showing its power, my students and I started to work on what we call image storytelling or captioning. And we show you a picture. You say that two people are sitting in a room having a conversation. That's the storytelling. And that is a sentence or two, right? And honestly, I'll tell you, Kevin, when I was in grad school in early 2000, I thought I wouldn't see that happen in my lifetime.
Starting point is 00:42:58 Because it's such an unbelievable capability humans have to connect visual intelligence with language, with that. But in early 2015, my group and my students and I published the first work that shows computers having the capability of seeing a picture and generate a sentence that describes the scene. And that's the storytelling work. And we used, obviously, a lot of deep learning algorithm, especially on the language side, we use recurrent models like LSTM to train the language model, whereas on the image side, we use convolutional neural network representation. But stitching those together and seeing the effect was really quite a wow-ing moment. I could not believe that I saw that in my lifetime, that capability. Yeah.
Starting point is 00:43:59 I sort of wonder whether or not, with these big unsupervised language models right now, these transformer things that people are building, the models that come out of them are just very large, and you sort of barely have any signal in the parameters at all. It's just diffuse across the entire model. I just wonder whether getting, like, a vision model coordinated with training these things is going to be the way that they learn more concisely. Oh, I see. Well, yeah, I mean, human intelligence is very multimodal.
Starting point is 00:44:39 So, multimodality is definitely not only complementary, but sometimes it's more efficient. So we should also just recognize that, by and large, these storytelling models are still fitting patterns. They lack the kind of comprehension and abstraction and deep understanding that humans have. They can say two people are sitting in a room having a conversation, but they lack the common-sense knowledge of the social interactions, or, you know, why we are having eye contact, or whatever, right? So there are a lot of deeper things going on that we don't know how to do yet.
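Going back to the curiosity-based learning discussed a few minutes earlier: the metric Fei-Fei articulated, the difference between your known model of the world and what you observe, is commonly implemented as a prediction-error reward. Here is a minimal sketch under that assumption; the linear `ForwardModel` is a hypothetical stand-in for a learned deep model, not the Stanford project's code:

```python
import numpy as np

class ForwardModel:
    """The agent's learned guess at how an action changes the state."""
    def __init__(self, dim, lr=0.1):
        self.W = np.zeros((dim, dim + 1))   # starts out knowing nothing
        self.lr = lr

    def predict(self, state, action):
        return self.W @ np.append(state, action)

    def update(self, state, action, next_state):
        x = np.append(state, action)
        error = next_state - self.W @ x
        self.W += self.lr * np.outer(error, x)   # simple online regression step

def curiosity_reward(model, state, action, next_state):
    """Curiosity as prediction error: large when the world surprises the
    agent's model, shrinking toward zero ('boredom') as the model learns."""
    return float(np.sum((next_state - model.predict(state, action)) ** 2))

# The same transition, seen over and over: the reward decays, the agent gets bored.
model = ForwardModel(dim=2)
s, a, s_next = np.array([1.0, 0.0]), 1.0, np.array([0.5, -0.5])
first = curiosity_reward(model, s, a, s_next)
for _ in range(50):
    model.update(s, a, s_next)
later = curiosity_reward(model, s, a, s_next)
```

Because the reward decays as the model's predictions improve, repeating the same interaction becomes unrewarding, which mirrors the "babies get bored" intuition and pushes the agent toward novel interactions with the objects around it.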
Starting point is 00:45:18 And so, on the social side, like, what are you excited about? And obviously, like, you've already talked about a ton of it. Like, you're doing this really interesting work in healthcare. HAI, like, I think both in and of itself and as, you know, just sort of an example and role model for the, like, wider academic world, is, like, a fantastically good thing, and sort of calling out that, like, this has to be, you know, inclusive and multidisciplinary. But, like, what are you hopeful about in the future? Oh, God, so many things, right? Even on the healthcare side, we just touched on the aging population. But what I really feel so passionate about is that my collaborator, Dr. Arnold Milstein, and I really see that while there's a lot of talk about AI in healthcare, much of it is on the diagnosis and genomics side. But there's a huge open issue on the
Starting point is 00:46:08 care delivery side. In fact, in America, that medical error-induced fatality is causing a quarter of a million lives every year, right? Hospital-acquired infection alone kills three times more people than car accidents. And you mentioned smart sensors. The same technology we're using for self-driving cars between smart sensors and deep learning algorithms can become so critical and helpful in improving care quality from our surgical rooms to ICUs to senior homes. And so we are passionate about continue working that space and to change healthcare delivery quality. In addition to that, both by working Stanford's HAI and AI for All,
Starting point is 00:46:58 what I really want to do is to create this platform. I cannot possibly do all the work. I don't possibly have all the good ideas. But creating a platform to welcome the kind of talented people, thinkers, students, leaders, practitioners, policymakers, civic society, to participate in this effort and movement, I think will be critical for our future. You know, we talk about how to predict our future. Well, the best way to predict is to create. Yeah.
Starting point is 00:47:30 Oh, God, I could not agree more. And, like, showing everyone that, you know, there are far more hopeful paths than there are sort of pessimistic ones, like, gives everyone both inspiration and permission to go off and like create that more hopeful future. Yeah. And I particularly want to encourage and inspire people. You do not have to be a coder to join AI and to change AI. I think that myth from Silicon Valley that you have to be a, you know, a coder from 11-year-old and no TensorFlow or whatever inside out in order to be part of this AI movement.
Starting point is 00:48:11 That's absolutely not true. We need artists. We need writers. We need social scientists. We need philosophers. I totally agree. We need more people involved in more ways with this technology than we ever have in the, like, sort of lifetime of digital technologies. And, like, I would even argue that AI itself is making
Starting point is 00:48:32 the task of developing things, like the engineering task, different and more inclusive. Yes. Like, you and I, you know, got into, you know, sort of computer science because, like, you and I got into sort of computer science because we have a certain sort of analytical way of seeing the world, and we really enjoy all of the apparatus of that analytical world. But there are these machine teaching systems where you're going to be able to, rather than tell the computer what to do in these minute step-by-step algorithmic ways, you're going to be able to teach a computer what to do in these, like, minute, you know, step-by-step algorithmic ways, you're going to be able to teach a computer how to do something.
Starting point is 00:49:09 Exactly. And, like, that is a really, like, much broader mode of, you know, sort of building these bits of technology. I can't tell you how many artists have reached out to me and to Stanford HAI about AI helping the creative process. They're so excited. Yeah, it's super exciting. You know, I hope to be able to get him on the podcast at some point, but there's this fantastically talented young jazz musician named Jacob Collier, who's, like, a genius with harmonic theory.
Starting point is 00:49:41 And he does like all of these like super interesting, innovative arrangements. And like he got famous by recording these layered things that he was making on YouTube, but he like really enjoys performing these things live. And so he's been collaborating with this really talented engineer at MIT to build instruments where he can reproduce some of this self-harmonization stuff. Oh, that's so cool. And, like, AI is going to do nothing but help him, like, be able to, like, deliver these richer experiences to his audience. I mean, it's just amazing. Like, this is stuff that makes me really, really super excited.
Starting point is 00:50:22 So, like, one last question before we wrap up. So, I know you're a mom, and you've got a nonprofit. You're an institute director. You're a professor. Yeah, researcher. You know, like, you were just telling me you've got, like, this stack of submissions that are going into the neural infrastructure. In 24 hours. Yeah, and so, like, thank you, by the way, for doing the podcast when you've got this big deadline.
Starting point is 00:50:44 But, you know, aside from these things, like, what do you do for fun? You know what? My students asked me the same question a month ago. And they even laughed. They don't think I could have a good answer. And I don't know if I could have a good answer. So, what do you define fun? The bliss for myself is that my work is fun to me.
Starting point is 00:51:04 Hanging out with my kids is fun for me. I mean, granted, if they throw a tantrum, it's not a fun moment. But I love being with my kids. I love my students, talking to them about research ideas. Even if we come up with a bunch of stupid ideas, that process is fun. HAI is so much fun. You know, we have 200-plus faculty across the campus working on different aspects. Just talking to any one of them is fun.
Starting point is 00:51:30 So from that point of view, I mean, I do miss some of the early pre-kids, two-people world where my husband and I would go for movies or travel to a foreign country. I haven't had that for a while. I mean, I travel, but not for vacation. But I love reading. I always read a lot of different books. I love food. Good food is always fun. Yeah, so when I was a kid, I actually do painting.
Starting point is 00:52:01 One day I'll pick that up. Well, I think you're right. What do you do for fun? I pray for more time because I – Podcasting is fun. It's really just a great thing to be able to enjoy the work that you're doing, to sort of combine the things that you're most interested in with the things that, you know, are somehow creating some sort of positive benefit for other people.
Starting point is 00:52:31 Yeah, like feeling passionate. And so, like, that for me is fun. Yeah, exactly. Like, in one month, Stanford is going to welcome our 2019 class of AI for all students. I'm just so looking forward to that, right? Like, I'll be getting to know another group of 32 unbelievable high schoolers. And that's fun. Yeah. My fun, like if I had to boil it down into two things, it is being able to do something that fulfills my curiosity and to be able to make
Starting point is 00:53:00 things, you know, aside from like my number one fun thing is being with my kids. But, you know, like if I'm just sort of looking selfishly at myself, it's like the, you know, sort of the curiosity in making. Absolutely. Yeah. Absolutely. Awesome. Being a professor is a lot of fun. Yeah. Which is great to hear because like I contemplated being a professor for a long while.
Starting point is 00:53:22 It's never too late. No, I think it might be too late for me, Fei Fei. Well, if you don't try, how do you know? Well, thank you. Okay, thank you. Thank you so much for being on the podcast. Thank you, Kevin. And more importantly, like, thank you for, like, all of the great work that you're doing now.
Starting point is 00:53:38 And good luck to your book. I look forward to that. Thank you. Awesome. Thank you. Awesome. Thank you. We hope you enjoyed Kevin's interview with Fei-Fei Li, researcher and professor at Stanford University. So, what I thought was really
Starting point is 00:54:00 interesting about this conversation, Kevin, was she described her convoluted, in her words, entry point to computer science and that her passion for physics was kind of what led her into this. But this is something that we've kind of seen with other guests on the show, where people have an untraditional way of getting into these subjects. Yeah. And it's really amazing, like a bunch of different people find their way to AI from a bunch of different paths, although the physics one is not that uncommon. I was going to say, that's the one we've kind of heard again and again, is that there's something to that, I guess. I've been thinking for a bit about this idea that maybe artificial intelligence for human intelligence is sort of the same thing as physics is for the natural world.
Starting point is 00:54:46 So, like, when you think about physics, it's the way that a curious person can approach, like, all of these super complicated phenomena that occur in the natural world. And so, like, you can describe them and, like, build these models of them and understand them and be able to predict them. And human intelligence is a super duper complicated thing. I'm sure that one of the first thoughts that human beings had as soon as we were self-aware and had language was like, what is this thing? Like, you know, why do I have the thoughts that I have? Like, what's the nature of my own intelligence? And so we've been thinking about it philosophically for thousands and thousands of years, just sort of the nature of human intelligence. And we've been thinking about it scientifically for several hundred years now with increasingly that, you know, one of the things
Starting point is 00:55:48 that may be very interesting about AI is that it could be a system that shines light on how human intelligence actually works by giving us a way to model it in some, you know, analytical system. Yeah, no, and that's really interesting, too. I think when you look at the role that physics obviously plays with things like quantum and how that could then also go into looking at those models and furthering AI and whatnot. And that kind of leads me to another thing that Feifei was talking about, and you were as well, but this concept of human-centered AI and the idea that AI isn't only computer science, that it's interdisciplinary. Yeah. And I think we can see that more and more all the time. I mean, it's very obvious to me,
Starting point is 00:56:32 at least, that if you are building a technology that's going to have such a massive potential impact on the world that you want everyone to be thinking as hard as they possibly can about how to make sure that it is sort of providing some set of human-centered benefits that are equitably distributed and sort of fair, like all of the things that we just sort of want for society itself, like we should embed into AI and guide its development accordingly. And I think that's not just a, I mean, it's obviously not a computer science only thing. It has to be about philosophers and ethicists and economists and business folks and historians and writers and artists.
Starting point is 00:57:24 And so, like, we really, really have to make all of this a multidisciplinary effort if we want to get a thing that is truly a reflection of our own humanity. One of the things that I really liked about your discussion is that oftentimes you and I, when we talk about AI, it's about, like, the downsides or the potential challenges and maybe the scary aspects. In this case, really, the idea of AI, you know, augmenting or enhancing and helping rather than replacing how things work in the world. How can we use this to make things better rather than how is this a threat? Yeah, and this is the thing that I tell people all the time. AI is just another tool that we human beings have invented to do things.
Starting point is 00:58:09 And, like, we get to choose what we have the tool do. And, like, when we make choices about, say, for instance, applying AI to healthcare to make things less expensive and more accessible and higher quality for everyone, like, it obviously creates this amazing positive human benefit. And so, like, I think the trick to getting, you know, the balance of AI to be beneficial and good for everyone is us choosing to do that. And so, it's, like, really, really amazing to have a computer scientist and one of the pioneers of the field like Fei Fei spending so much of her energy thinking about what those beneficial applications of AI are. No, I totally agree. It definitely makes me feel better, I guess, about like the future, both of, humanity and, you know, the world we live in with all of this stuff.
Starting point is 00:59:07 Yeah. I have faith in us. Great. Okay. So we are out of time for now. But if you haven't listened to all of our past podcasts, you might want to spend a few minutes catching up. Now, Kevin, do you have a favorite past episode? I love them all equally, like my children.
Starting point is 00:59:25 My children are, like like my media collection, so I know how you feel. Okay, but our listeners will have to make the decision for themselves. But you can also write to us anytime at BehindTheTech at Microsoft.com and tell us what's your favorite show and maybe what you'd like to hear more about. Yeah, absolutely. We'd love to hear from you. And with that, we'll see you next time.
