Y Combinator Startup Podcast - Fei-Fei Li: Spatial Intelligence is the Next Frontier in AI

Starting point is 00:00:00 My entire career is going after problems that are just so hard, bordering delusional. To me, AGI will not be complete without spatial intelligence. And I want to solve that problem. I just love being an entrepreneur. Forget about what you have done in the past. Forget about what others think of you. Just hunker down and build. That is my comfort zone.

Starting point is 00:00:25 So I'm super excited here to have Dr. Fei-Fei Li. She has such a long career in AI. I'm sure a lot of you know her, right? Raise your hand. I know you too. She's been named the godmother of AI. One of the first projects that you created was ImageNet in 2009, 16 years ago. Oh my God.

Starting point is 00:01:05 Don't remind me of that. Now it has over 80,000 citations, and it really kicked off one of the legs of the stool for AI, which is the data problem. Tell us about how that project came about. It was pretty pioneering work back then. Yeah, well, first of all, Diana and Garry and everybody, thanks for inviting me here. I'm so excited to be here because I feel like I'm just one of you. I'm also an entrepreneur right now.

Starting point is 00:01:35 I just started a small company, so very excited to be here. ImageNet was, yeah, you're right. We actually conceived that almost 18 years ago. Time really flies. I was a first year assistant professor at Princeton. Oh, wow. Hi! Hi, Tigers.

Starting point is 00:01:59 Yeah, and the world of AI and machine learning was so different at that time. There was very little data. Algorithms, at least in computer vision, did not work. There was no industry. As far as the public was concerned, the word AI didn't exist. But there was still a group of us, starting from the founding fathers of AI, right? John McCarthy, and then we go through people like Geoff Hinton.

Starting point is 00:02:30 I think we just had an AI dream. We really, really wanted to make machines think and work. And with that dream, my own personal dream was to make machines see. Because seeing is such a cornerstone of intelligence. Visual intelligence is not just perceiving, it's really understanding the world and doing things in the world. So I was obsessed with the problem of making machines see.

Starting point is 00:02:57 And as I was obsessively, obsessively developing machine learning algorithms, at that time, we did try neural networks, but they didn't work. We pivoted to Bayes nets, to support vector machines, whatever it was. But one problem always haunted me, and it was the problem of generalization. If you work in machine learning, you have to respect that generalization is the core mathematical foundation or goal of machine learning. And in order to generalize, these algorithms need data, yet no one had data at that time in computer vision.

Starting point is 00:03:34 And I was the first generation of grad student who was starting to dabble into data because I was the first generation of graduate student who saw the internet, the big internet of things. So fast forward around 2007-ish, my student and I decided that we have to take a bold bet. We have to bet that there needs to be a paradigm shift in machine learning. And that paradigm shift has to be led by data-driven methods. And there was no data. So we're like, OK, let's go to the internet, download a billion images. That's the highest number we can get on the internet, and then just create the world's

Starting point is 00:04:18 the entire world's visual taxonomy. And we use that to train and benchmark machine learning algorithms. And that was why ImageNet was conceived and came to life. And it took a while until there were algorithms that were promising. It wasn't until 2012 when AlexNet came out, and that was the second part of the equation of getting to AI, which was getting the compute and throwing enough at it, and algorithms. Tell us about what was that moment where you started to see,

Starting point is 00:04:52 oh, you seeded it with data, and now the community started to figure more things out for AI. Right. So between 2009, when we published this tiny little CVPR poster, and 2012, the AlexNet moment, there were three years where we really believed that data would drive AI, but we had very little signal in terms of whether that was working. So we did a couple of things. One is we open-sourced. We believed from the get-go we had to open-source this to the entire research community

Starting point is 00:05:32 for everybody to work on this. The other thing we did is we created a challenge because we wanted the whole world's smartest students and researchers to work on this problem. So that was what we call the ImageNet Challenge. So every year we release a testing data set. Well, the whole ImageNet is there for training, but we release testing,

Starting point is 00:05:58 and then we invite everybody openly to participate. And then the first couple of years was really setting the baseline. You know, the performance was in the 30% error rate. It wasn't zero. I mean, it wasn't completely random, but it wasn't that great. But the third year, 2012, you know, I wrote this in a book that I published, but I still remember it was around the end of that summer that we were taking all the results of the ImageNet Challenge and running them on our servers.

Starting point is 00:06:39 And I remember it was late at night. One day I got a ping from my graduate student. I was home, and he said, we got a result that really, really stands out. And you should take a look. And we looked into it. It was a convolutional neural network. It wasn't called AlexNet at that time. That team, Geoff Hinton's team, was called SuperVision. It was a very clever play on the word super as well as supervised learning,

Starting point is 00:07:11 so SuperVision. And we looked at what SuperVision did. It was an old algorithm. The convolutional neural network was published in the 1980s. There were a couple of tweaks in terms of the algorithm, but it was pretty surprising at the beginning for us to see that there was such a step change. And of course, I mean, the rest of the history, you all know. We presented this in the ImageNet Challenge workshop at that year's ECCV in Florence, Italy, and Alex Krizhevsky came, and many people came. I remember Yann LeCun also came, and now the world knows this moment as the ImageNet Challenge AlexNet moment. I do want to say that it's not just the convolutional neural network.

Starting point is 00:08:11 It was also the first time that two GPUs were put together by Alex and his team, and were used for the computing of deep learning. So it was really the first moment of data, GPUs, and neural networks coming together. Now, following this trend of the arc of intelligence for computer vision, ImageNet was really the seed to solve the concept of object recognition. Then, right after that, AI also got to the point that it could solve scenes, right? Because you had a lot of the work with your students like Andrej Karpathy being able to describe scenes. Tell us about that transition from objects to scenes.

Starting point is 00:08:55 Yeah, so ImageNet was solving the problem of you're presented with an image and then you call out objects. There's a cat, there's a chair and all that. That's a fundamental problem in visual recognition. But ever since I was a graduate student entering the field of AI, I had a dream. I thought it was a 100-year dream, which is storytelling of the world, which is that when humans open their eyes, imagine you just open your eyes in this room. You don't just see person, person, person, chair, chair.

Starting point is 00:09:31 You actually see a conference room, you know, with a screen, with a stage, with people, with the crowd, the cameras. You actually can describe the entire scene. And that's a human ability that is at the foundation of visual intelligence. And it's so critical for us to use in terms of our everyday life. So I really thought that problem would take my entire life. I literally, when I graduated as a graduate student, told myself, on my deathbed, if I can create an algorithm that can tell the story of a scene, I've succeeded. That was how I thought

Starting point is 00:10:14 my career would be. The ImageNet AlexNet moment came. Deep learning took off. And then when Andrej and then later Justin Johnson entered my lab, we started to see signals of natural language and vision starting to collide. And then Andrej and I proposed this problem of captioning images, or storytelling.

Starting point is 00:10:41 And long story short, around 2015, Andrej and I published a series of papers that were among the first, with a couple of concurrent papers, of literally making a computer caption an image. I almost felt like, what am I going to do with my life? That was my lifelong goal, you know. It was such an incredible moment for both of us. And, you know, last year I gave a TED talk, and I actually used something that Andrej tweeted a couple of years ago. Around the time he finished the image captioning work,

Starting point is 00:11:30 that was pretty much his dissertation, I actually joked with him. I said, hey, Andrej, why don't we do the reverse? Take a sentence and generate an image. And of course, he knew I was joking, and he said, ha ha, I'm out of here. The world was just not ready. But now fast forward, now we all know, generative AI.

Starting point is 00:11:53 You know, now we can take a sentence and generate beautiful pictures. So the moral of the story is AI has seen incredible growth. And personally, I feel I'm the luckiest person in the world because my entire career started at the very beginning of the end of the AI winter, the beginning of AI starting to, you know, take off. And so much of my own work, my own career, is part of this change or helped with this change. So I feel so fortunate and lucky and in a way proud. And I think the wildest thing, even to achieve your lifelong dream of describing a scene,

Starting point is 00:12:42 and even generating them with diffusion models, you actually dreamed even bigger, because the whole arc of computer vision went from objects to scenes and now this concept of worlds. And you actually decided to move from academia, being a professor, to now being a founder and CEO of World Labs. Tell us about what world is. It's even harder than scenes and objects.

Starting point is 00:13:07 Yeah, it is. It is kind of wild. So, of course, you all know the past, it's really hard to summarize the past five or six years. For me, we're living in such a civilizational moment of this technology's progress, right? As a computer vision scientist, we're seeing this incredible growth, you know, from ImageNet to image captioning to image generation using some of the diffusion techniques. While this is happening in a very exciting way, we also have another extremely exciting thread, which is language, which is LLMs, which is that really in November 2022 ChatGPT blasted open the door of truly working generative models that can basically

Starting point is 00:14:04 pass the Turing test and all that. So this becomes very inspirational, even for someone as old as me, to really think audaciously about what's next. And I have a habit as a computer vision scientist. A lot of my inspiration actually comes from evolution as well as brain science. I find myself in many moments of my career where I'm looking for the next North Star problem to solve. I ask myself what evolution has done or what brain development has done. And there's something that's really important to notice or to appreciate. The development of human language in evolution took about, if you're super generous,

Starting point is 00:15:01 let's just say it took about 300 to 500 thousand years, less than a million years. That's the length of evolution that it took to develop human language. And pretty much humans are the only animals that have sophisticated language. We can argue about animal language, but really language in its totality, in terms of being a tool of communication, reasoning, abstraction, it's really humans. So that took less than even half a million years. But think about vision. Think about the capability of understanding the 3D world, figuring out what to do in this 3D world,

Starting point is 00:15:44 navigate the 3D world, interact with the 3D world, comprehend the 3D world, communicate the 3D world. That journey took evolution 540 million years. The first trilobite developed a sense of vision underwater 540 million years ago. And since then, really vision was the reason that set off this evolutionary arms race. Before vision, animals were simple. For, you know, the half billion years before vision, there were just simple animals. But the next half billion years, 540 million years, because of the capability of seeing the world,

Starting point is 00:16:29 understanding the world, the evolutionary arms race began, and animal intelligence just started to race each other. So for me, solving the problem of spatial intelligence, to understand the 3D world, to generate the 3D world, to reason about the 3D world, to do things in the 3D world, is a fundamental problem of AI. To me, AGI will not be complete without spatial intelligence. And I want to solve that problem.

Starting point is 00:17:00 And that involves creating world models, world models that go beyond flat pixels, world models that go beyond language, world models that truly capture the 3D structure and the spatial intelligence of the world. And the luckiest thing in my life is no matter how old I am, I always get to work with the best young people. So I founded a company with three incredible young but world-class technologists, Justin Johnson, Ben Mildenhall and Christoph Lassner. And we are just going to try to solve, in my opinion, the hardest problem in AI right now. Which is incredible talent. I mean, Chris, he was the creator of Pulsar, which was the initial

Starting point is 00:17:55 seed before Gaussian splats for differentiable rendering. There's Justin Johnson, your former student, who really has this super system engineering mind that got real-time neural style transfer. Then you got Ben, who was the author of the NeRF paper. So this is a super crack team. And you need such a crack team because we were chatting a bit about that, that vision is actually harder than LLMs to some extent. Maybe this is a controversial thing to say, because LLMs are basically 1D, right? But you're talking about understanding a lot of the 3D structures. Why is this so hard? And it's still behind language research. Yeah, no, I really appreciate it, Diana. You empathize with how hard our problem is. Yeah. So language is fundamentally 1D, right? Syllables come in sequence.

Starting point is 00:18:53 I mean, this is why sequence-to-sequence, sequence modeling, is so classic. There's something else about language that people don't appreciate. Language is purely generated. There's no language in nature. You don't touch language. You don't see language. Language literally comes out of everybody's head, and that's a purely generative signal. Of course, you put it on a piece of paper, it's there, but the generation, the construction, the utility of language is very, very generative. The world is far more complex than that. First of all, the real world is 3D. And if you add time, it's 4D,

Starting point is 00:19:37 but let's just confine ourselves within space. It's fundamentally 3D. So that by itself is a combinatorially much harder problem. Second, the sensing, the reception of the visual world is a projection. Whether it's your eye, your retina, or a camera, it's always collapsing 3D to 2D. And you have to appreciate how hard it is. It's mathematically ill-posed. So this is why humans and animals have multiple sensors, and then you have to solve that problem. And third, the world is not purely generated. Yes, we could generate a virtual 3D world. It

Starting point is 00:20:27 still has to obey physics and all that, but there is also a real world out there. You are now suddenly dialing between generation and reconstruction in a very fluid way, and the user behavior, the utility, the use cases are very different. If you dial all the way to generation, we can talk about gaming and metaverse and all that. If you dial all the way to the real world, you're talking about robotics and all that. But all this is on the continuum of world modeling and spatial intelligence. And of course the elephant in the room is there's a lot of data on the internet for language.

Starting point is 00:21:14 And where is the data for spatial intelligence? You know, it's all in our heads, of course, but it's not as easily accessible as language. So these are the reasons it's so hard, but frankly, it excites me because if it's easy, somebody else has solved it. And my entire career is going after problems that are just so hard, bordering delusional. And I think this is the delusional problem. Thank you for supporting that. And even thinking about this from first principles, the human brain has a lot more in the visual cortex in terms of the amount of neurons that process visual data as opposed to language. How does that translate into the model architectures? They're very different from LLMs, from what you're kind of finding out, right? Yeah.

Starting point is 00:22:15 That's actually a really good question. I mean, there are still different schools of thought out there, right? A lot of what we see in LLMs is really riding the scaling law all the way to a happy ending. And you can almost just brute force self-supervision all the way. Constructing world models might be a little more nuanced. The world is more structured. There might be signals that we need to use to guide it. You can call it in the shape of a prior. You can call it supervision in your data, whatever it is. I think that these are some of the open questions that we have to solve, but you're right.

Starting point is 00:23:04 And also, if you think about humans, first of all, we don't have all the answers even to human perception, right? How 3D works in human vision is not a solved problem. We know mechanically the two eyes have to triangulate information, but even after that, where is the mathematical model? And we're not that great. Humans are not that great as 3D animals. So there's a lot that is to be answered. So at World Labs, I'm really counting on one thing. I'm counting on that we have the smartest people in the pixel world to solve this. Is it fair to say that what you're building at World Labs is these

Starting point is 00:23:51 whole new foundation models where the outputs are 3D worlds? And what are some of the applications that you're envisioning? Because I think you listed everything from perception to generation. There is always this tension between generative models and discriminative models. So what would these 3D worlds do? Yeah, so I'm not going to be able to talk too much about the details of World Labs per se, but in terms of spatial intelligence, that's what also excites me. Just like language, the use case is so huge, from creation, where you can think about

Starting point is 00:24:34 designers, architects, industrial designers, as well as just artists, 3D artists, game developers, from creation all the way to robotics, robotic learning. The utility of spatial intelligence models or world models is really, really big. And then there are many related industries, from marketing to entertainment to even the metaverse. I'm actually really, really excited by the metaverse. I know so many people are kind of still like, ugh, it's still not working. I know it's still not working.

Starting point is 00:25:20 That's why I'm excited, because I think the convergence of hardware and software will be coming. So that's also another great use case down the road. I'm personally very excited that you're solving the metaverse. I gave it a try in my previous company, so I'm so excited that you're doing that. Yeah, well, I think there's more signal. I mean, I do think hardware is part of the hurdle, but you know, you need content creation. I mean, metaverse content creation needs world models. Let's switch gears a little bit.

Starting point is 00:25:52 So maybe to some of the audience, they might find your transition from academia to now being a founder and CEO to be sudden. But you actually have had a remarkable journey through your whole life. This is not the first time you've gone zero to one. You were telling me about how you immigrated to the U.S. and you didn't speak any English in your teens. And you even ran a laundromat for a good number of years. Tell us about how all those skills shaped who you are now. Right.

Starting point is 00:26:25 I'm sure you guys are here trying to listen to how to start a laundromat. That was when you were 19, right? Yeah, I was 19 and that was out of desperation. So I had no means of supporting my family, my parents, and I needed to go to college to be a physics major at Princeton. So I started a dry cleaning shop. And in Silicon Valley language, I fundraised. I was the founder and CEO.

Starting point is 00:26:56 I was also the cashier and all the other things. And I exited. So after seven years. All right. You guys are very kind. I've never got claps for my laundromat. But thank you. But anyway, I think that.

Starting point is 00:27:17 Diana's point, especially to all of you, I look at you, I'm so excited for you because you're like literally half my age or even, you know, maybe 30% of my age, and you're so talented. Just do it. Don't be afraid. You know, in my entire career, of course, I did the laundromat. But even as a professor, a couple of times I chose to go to departments where I was the first computer vision professor. And that was against a lot of advice. You know, as a young professor, you should go to a place where there's a community and senior mentors. Of course, I would love to have senior mentors. But if they're not there, I still have to blaze my trail, right? So I wasn't afraid of that.

Starting point is 00:28:04 And then I did go to Google to learn a lot about business in Google Cloud and B2B and all those. And then I started a startup within Stanford because around 2018 AI was not only taking over the industry, AI became a human problem. Humanity will always advance our technology, but we cannot lose our humanity. And I really care about creating a beacon of light in the progress of AI and trying to imagine how AI can be human-centered, how we can create AI to help humanity. So I went back to Stanford and created the Human-Centered AI Institute and ran that as a startup for five years. Probably some people were not too happy.

Starting point is 00:28:56 I ran it as a startup for five years in a university, but I was very proud of that. So in a way, I think I just love being an entrepreneur. I love the feeling of ground zero, like standing on ground zero. So forget about what you have done in the past. Forget about what others think of you. Just hunker down and build. That is my comfort zone, and I just love that. The other really cool thing about you, another, on top of all the awesome things you've done,

Starting point is 00:29:31 you advised a lot of legendary researchers like Andrej Karpathy, Jim Fan, who's at NVIDIA, Jia Deng, who's your co-author for ImageNet. They all went on to have these incredible careers. What really stood out about them when they were students? Advice for the audience, that you could tell, ah, this person is going to change the field of AI, and you could tell. So first of all, I'm the lucky one. I think I owe more to my students than the other way around.

Starting point is 00:30:02 They really make me a better person, better teacher, better researcher. And having worked with so many, like you said, legendary students is really the honor of my life. So they're very, very different. Some of them are just pure scientists trying to hunker down and solve a scientific problem. Some of them are industrial leaders. Some of them are, you know, the greatest disseminators of AI knowledge. But I think there is one thing that

Starting point is 00:30:41 unifies them, and I would encourage every single one of you to think about this. Also, for those founders who are hiring, this is also my hiring criterion: I look for intellectual fearlessness. I think it doesn't matter where you come from, it doesn't matter what problem we're trying to solve. That courage, that fearlessness of embracing something hard and going about it and being all in and trying to solve it in whatever way you want is really a core characteristic of people who succeed. I learned this from them, and I really look for young people who have that. And as a CEO at World Labs, in my hiring, I look for that quality. So you're hiring a lot for World Labs.

Starting point is 00:31:41 So you're looking for that same trait, right? Yes. I got permission from Diana to say that we're hiring. So yes, we are hiring a lot. We're hiring engineering talent. We're hiring product talent. We're hiring 3D talent. We're hiring generative model talent. So if you feel you're fearless and you're passionate about solving spatial intelligence, talk to me or come to our website. Cool. We're going to open it up for questions for the next 10 minutes. Hi, Fei-Fei, thank you for your talk. I'm a big, big, big fan. And yeah, so my question is, more than two decades ago, you worked on visual recognition. I want to start my PhD. What should I work on so I become a legend like you are? I want to give you a thoughtful answer, because I can always say, do whatever excites you. So first of all,

Starting point is 00:32:41 I think AI research has changed, because if you're starting a PhD, you're in academia. Academia no longer has most of the AI resources. It's very different from my time, right? The chips, the compute and the data are really low in terms of resourcing academia. And then there are problems that industry can run a lot faster. So as a PhD student, I would recommend you look for those North Stars that are not on a collision course with problems that industry can solve better using better compute, better data, and team science. But there are some really fundamental problems that we can still identify in academia, where it doesn't matter how many chips you have. You can make a lot of progress, you know,

Starting point is 00:33:40 First of all, interdisciplinary AI to me is a really, really exciting area in academia, especially for scientific discovery. There's just so many disciplines that can cross AI. I think that's a big area that one could go to. On the theoretical side, I find it fascinating the AI capability has 100% outrun theory. We don't know how, you know, we don't have explainability. We don't know how to figure out the causality. There's just so much in the models.

Starting point is 00:34:17 We don't understand that one could push forward. And, you know, the list could go on. In computer vision, there are still representational problems we haven't solved, and also, you know, small data. That's another really interesting domain. So, yeah, these are the possibilities. Thank you so much, Fei-Fei. Thank you, Professor Li, and congratulations again on your honorary doctorate from Yale.

Starting point is 00:34:47 I was honored to be there to witness that moment one month ago. And my question is, in your perspective, will AGI emerge more likely as a single unified model or as a multi-agent system? The way you ask this question already has two kinds of definitions. One definition is more theoretical, which defines AGI as if there is an IQ test that one passes, and that defines AGI. The other half of your question is much more utilitarian. Is it functional if it's agent-based? What tasks can it do?

Starting point is 00:35:28 I struggle with this definition of AGI, to be honest. Here's why. The founding fathers of AI, who came together in 1956 at Dartmouth, you know, the John McCarthys and Marvin Minskys of them, they wanted to solve the problem of machines that can think. And that's a problem that Alan Turing also put forward a few years earlier, 10 years or whatever earlier than them. And that statement is not narrow, it's not a narrow AI.

Starting point is 00:36:06 It's a statement of intelligence. So I don't really know how to differentiate that founding question of AI versus this new word AGI. To me, they're the same thing. But I get it that the industry today likes to call it AGI as if that's beyond AI. And I struggle with that because I don't know exactly how AGI is different from AI. If we say today's AGI-ish systems perform better than the narrower AI systems of the 70s, 80s, 90s or whatever, I think that's right.

Starting point is 00:36:46 That's just the progression of the field. But fundamentally, I think the science of AI is the science of intelligence, which is to create machines that can think and do things as intelligently as, or even more intelligently than, humans. So I don't know how to define AGI. So without defining it, I don't know if it's monolithic. If you look at the brain, it's one thing. You know, you can call it monolithic, but it does have different functionalities. There's Broca's area for language, there's a visual cortex, there's a motor cortex.

Starting point is 00:37:25 So I don't really know how to answer that question. Hi, my name is Yashna. And I just want to say thank you. I think it's really inspiring to see a woman playing a leading role in this field. And as a researcher, educator, and entrepreneur, I wanted to ask what type of person do you think should pursue graduate school in this rapid rise of AI? That's a great question.

Starting point is 00:37:56 That's a question even parents ask me. I really think graduate school is the four or five years where you have burning curiosity. You're led by curiosity. And that curiosity is so strong that there's no better place to do it. It's different from a startup, because you have to be a little careful. A startup cannot be just led by curiosity.

Starting point is 00:38:28 Your investors will be mad at you. A startup has a more focused commercial goal. And some part of it is curiosity, but it's not just curiosity. Whereas for grad school, that curiosity to solve problems or to ask the right questions is so important that I think those going in with that intense curiosity would really enjoy the four or five years. Even if the outside world is passing by at the speed of light, you'll still be happy because you're there following that curiosity. First I want to say thank you for your time. Thank you for coming out to speak to us. You mentioned that open sourcing was a big part of the growth from ImageNet. And now with the recent release and growth of large language models, we've seen organizations taking different approaches with open source, with some organizations staying fully closed source, some organizations fully releasing their entire research stack,

Starting point is 00:39:29 some being somewhere in the middle, open sourcing weights, or having restrictive licenses and things of that nature. So I wanted to ask, what do you think of these different approaches to open source, and what do you believe the right way to go about open source as an AI company is? I think the ecosystem is healthy when there are different approaches. I'm not religious in terms of you must open source or you must close source. It depends on the company's business strategy. For example, it's clear why Facebook, Meta, wants to open source, right?

Starting point is 00:40:07 Right now, their business model is not selling the model yet. They're using it to grow the ecosystem so that people come to their platform. So open source makes a lot of sense. Whereas another company that is really monetizing on the model, even when monetizing, you can think about an open source tier and a closed source tier. So I'm pretty open to that. At a meta level, I think open source should be protected. I think if there are efforts of open source, both in the public sector, like academia, as well as the private sector, it's so important. It's so important for the

Starting point is 00:40:50 entrepreneurial ecosystem. It's so important for the public sector that I think that should be protected. It shouldn't be penalized. Hi, my name is Carl. I flew in from Estonia. I have a question about data. So you called very well the shift in machine learning towards data-driven methods with ImageNet. Now you're working on world models, and you mentioned that we don't have this spatial data on the internet. It exists only in our heads. How are you solving this problem? What are you betting on? Are you collecting this data from the real world? Are you doing synthetic data? Do you believe in that, or do you believe in good old priors? Thanks. You should join World Labs and I'll tell you. Oh, it's a good one. Look, as a company, I'm not going to be

Starting point is 00:41:46 able to share a lot, but I think it's important to acknowledge that we're taking a hybrid approach. It is really important to have a lot of data, but also to have a lot of quality data. At the end of the day, there is still garbage in, garbage out if you're not careful with the quality of data. We'll do one last question. Hi, Dr. Li. My name is Annie, and thank you very much for speaking with us. So in your book, The Worlds I See, you talk about the challenges you faced as an immigrant girl and woman in STEM. I'm curious to know if there's a time that you felt the moment of being a minority in the workplace, and if so, how did you manage to overcome this or persuade others?

Starting point is 00:42:35 Thank you for that question. I want to be very, very careful or thoughtful in answering you because we all come from different backgrounds and how each of us feels is very unique. It almost doesn't even matter what the big categories are. All of us have moments where we feel we're the minority or the only person in the room. So of course I've felt that way.

Starting point is 00:43:01 Sometimes it's based on who I am. Sometimes it's based on my idea. Sometimes it's just based on, I don't know, the color of my shirt, whatever that is. I have. But this is where I do want to encourage everybody. Maybe it is because, since I was young coming to this country, I kind of have experienced it. It is what it is. I am an immigrant woman. I almost developed a capability to not over-index on that. I'm here just like every one of you. I'm here to learn or to do things or to create

Starting point is 00:43:44 things. Thank you. That was a great answer. And really, all of you, you're about to embark on something or are in the middle of embarking on something, and you're going to have moments of weakness or strangeness. I feel this every day. Especially in startup life. Sometimes I'm like, oh my God, I don't know what I'm doing. Just focus on doing it. Gradient descend yourself to the optimized solution. All right.

Starting point is 00:44:16 That's a great way to end. Thank you, Dr. Li.
