ACM ByteCast - Anima Anandkumar - Episode 42
Episode Date: August 21, 2023. In this episode of ACM ByteCast, Rashmi Mohan hosts Anima Anandkumar, a Bren Professor of Computing at the California Institute of Technology (the youngest named chair professor at Caltech) and the Senior Director of AI Research at NVIDIA, where she leads a group developing the next generation of AI algorithms. Her work has spanned healthcare, robotics, and climate change modeling. She is the recipient of a Guggenheim Fellowship and an NSF CAREER Award, and was most recently named an ACM Fellow, among many other prestigious honors and recognitions. Her work has been extensively covered on PBS, in Wired magazine, MIT Tech Review, YourStory, and Forbes, with a focus on using AI for good. Anima talks about her journey, growing up in a house where computer science was a way of life and family members who served as strong role models. She shares her path in education and research at the highly selective IIT Madras, the importance of a strong background in math in her computing work, and some of the breakthrough moments in her career, including work on using tensor algorithms to process large datasets. Anima spends some time discussing topic modeling and reinforcement learning, what drives her interests, the possibilities of interdisciplinary collaboration, and the promise and challenges brought about by the age of generative AI.
Transcript
This is ACM ByteCast, a podcast series from the Association for Computing Machinery,
the world's largest education and scientific computing society.
We talk to researchers, practitioners, and innovators
who are at the intersection of computing research and practice.
They share their experiences, the lessons they've learned,
and their visions for the future of computing.
I am your host, Rashmi Mohan.
If you had a crystal ball to predict into the future, what would you ask of it?
Whatever it might be, our next guest might have the answer.
An AI expert, a mathematician, a changemaker, a computer scientist, an advocate,
Anima Anandkumar wears all these hats with aplomb.
Anima is the Bren Professor of Computing,
while also being the youngest named Chair Professor at the California Institute of Technology, or Caltech.
She is also the Director of Machine Learning at NVIDIA,
where she leads a group developing the next generation of AI algorithms.
With a string of awards, including a Guggenheim Fellowship, an NSF CAREER Award, Young Investigator
Awards from the US Air Force and the Army, and most recently being named an ACM Fellow, she is no stranger
to the spotlight. Her work has been extensively covered on PBS and in Wired magazine,
MIT Tech Review, YourStory, and Forbes. With a laser focus on using AI for good tech,
she is a force to be reckoned with. Anima, welcome to ACM ByteCast.
Thank you, Rashmi. That's very kind.
Wonderful. We're so excited to have you here. And I'd love to lead with a simple question
that I ask all my guests, Anima. If you could please introduce yourself and talk about what
you currently do, and also give us some insight into what drew you into the field of computing.
Yeah, it's a pleasure to be on this podcast. And at the moment, I'm very excited about how AI
intersects all the other domains, right? Especially in
scientific domains, how we can integrate AI into existing workflows, as well as create new ones.
Can we reimagine how we do science and along with it, bring better inventions and discoveries
at a much faster pace? And that's a great place to be because I'm always
learning about new areas and seeing how AI intersects and enables all of these different
domains. That's amazing. I'd love to sort of get into more of that and understand. It just could
go in so many different directions that I'm really looking forward to digging deeper into that.
But what was that introduction into computing or computer science, Anima?
Was there early exposure that drove that interest? Yeah, really it was a privilege to be in a household where math, science, and computer science were very much part of daily life, right?
So both my parents are engineers.
My mother is an engineer, which was not so common in the community I grew up in. In fact, you know, she tells me that she went on a hunger strike for three days to get into engineering, because my grandparents were worried that she may not get a suitable husband after studying
engineering. But she was a trailblazer. And that really helped me a lot to think of this as not
an anomaly, right? It's something that women could be very good at. And I had a role model
at home itself. That was great to have. And it was also great to see that intersection between
science and math and engineering. So my parents ran a small scale industry,
manufacturing automotive components, different parts of the manufacturing processes. So I would
go see those machines. And at some point, they decided to make them computerized, you know, what we call
CNC or computerized numerically controlled machines. And so that helped me think about
programming as something very tangible and physical, because the program code would go and
move these huge components, remove material from them. So seeing that at an early age, and seeing computing
as not something in its own bubble, but something that interacts with the physical world in such a real
way, was really great. That sounds amazing. I mean, for one, I think just the idea of,
you know, inspiration starts at home. And I think you had enough inspiration from both
your parents, which is incredible. And I think the other thing that you bring up is, yes, I mean,
I think for us, many of us, computing may have started only with, you know, introduction to maybe
software programming, right? But you saw the applications of it and how it could actually
sort of solve engineering problems or real life problems. And I can imagine how, you know,
that could have basically not made this a difficult
or a challenging sort of field to get into, right?
It felt very real.
It felt like something that you could definitely achieve.
Yeah, absolutely.
And I would just, you know, do these exercises at home for fun.
Like my parents would have these manuals
and, you know, we would just be like looking at
how to move like the turret to a
different location. So it was very geometric and real, and the program enabled that. So
all of these different areas came together that way. I also had my grandparents be very passionate
about math. My grandfather was a math teacher, so he would always give me like problem sets and concepts even before they were taught in school.
And that was, you know, he would initially say, oh, no, no, that's too advanced for you.
And that was almost like a primer to me rebelling and saying, no way, you know, I can do this, bring it on.
And so we had this back and forth.
And my grandmother as well would give me all these puzzles. She was very much hands on, like, you know, showing us different puzzles and embroidery and all these aspects that are very geometric and, you know, physical. So seeing both the abstract and the physical come together was just a great way to grow up. That sounds like such an amazing environment
to be exposed to all of these topics. I mean, we often talk about how the pipeline to get women
interested in computing is so weak. And, you know, maybe we should start earlier to be able to expose
folks to the areas of both technology and computing. Sounds like you had that perfectly set
up for you. So we'll definitely get into that in a little bit.
But I know that for your undergraduate education,
Anima, you went to IIT Madras.
Was there anything there that sort of changed
the way you thought about the field
or did it inspire you
or did all the sort of the training
whether conscious or unconscious
that you received through your childhood
help you be successful in your undergraduate education?
Yeah, I mean, first of all, getting into IIT, as you would be aware, is not easy, right?
So it's an entrance exam that is highly competitive and you have to be in the top few hundred to get into an IIT with your choice of major. To me, I think tackling that exam and doing it in Mysore,
where I grew up, where there weren't a lot of formal preparation classes. So, you know, I did
consult with university professors, but I had to take the initiative. You know, it was
me being very proactive and mindful about how do I tackle such a competitive exam and finding the
right mentors and supporters to help with that, you know, it gave me a sense of purpose and a
sense of almost entrepreneurship, because what I needed to do was well beyond what the
standard syllabus would cover. But that also really forced me to understand the concepts
deeply, you know, and I think this will connect to how we think about, you know, education in the
future, especially with ChatGPT and so on. I think the IIT entrance exams were ones that
ChatGPT wouldn't easily solve, I'm pretty sure, because, you know, they weren't standardized problems,
right? So they required some out of the box thinking that required you to really understand
the concepts deeply. And that challenged me. So having that kind of exam gave me that
reason to go deep. And then I realized this would be great to do it all my life to really not just touch the
surface, but go deeply into concepts. And once I landed in IIT, being in a cohort of students who
are just amazingly smart and also multidimensional, I think that surprised me because I assumed
everybody would be just, like, heads down in books and notes, which, you know, they're great at.
But also we did like these, we organized student-run conferences
where we brought in researchers from across the world.
We built, like, bridges made entirely of paper on which we could walk.
So we did a lot of cool nerdy things.
So having that cohort of students who are still my
friends was great. And, of course, all the professors and the environment that it brought
about. So yeah, I have great memories from that. That's amazing. I mean, sounds like you've been a
hustler throughout your life. And I love the fact that you were talking about multidimensional
exposure, right? Because if I look at your sort of career and the work that you've done and the work that you're
doing, it is how you sort of branched out. It's not sort of, you know, the straight and narrow,
you've used the computing principles and the skills that you've learned to apply it to solve
real world problems. And, you know, I'd love to talk about that as well. Yeah, absolutely. I think it's
important to think about education, not as a fixed curriculum or a fixed path, right? And even
the idea that the job you land will be the one you do for life, you know, that, unfortunately, is mostly over for
most of us. And so the way we keep up with this ever-changing technology
and the landscape of jobs is to keep reskilling, to keep learning. And I think inculcating that
early on is so important. And to me, I think having my parents be entrepreneurs who are
always bringing on new technology, they're not just satisfied with something that's already working,
but taking big risks in terms of capital investment and bringing new technologies to the ground was something that I grew up with as an everyday thing.
I also take inspiration from my great-great-grandfather, Dr. Shama Shastri,
who, from what I hear, the story goes that he left his village as a teenager,
came to Mysore, ultimately did his PhD in English and Sanskrit, rediscovered the Arthashastra from 300 BC.
So it was really a needle in a haystack. I mean, it's this ability to understand that there is something
deep in these palm leaves, in a language that is unknown, and being able to crack the code and
connect it way back in history to such a landmark text, that gave me the inspiration that
if you follow the path of curiosity, great things will come.
Yeah, no, I mean, it's incredible what you said about your great-great-grandfather in itself is
mind-blowing. But also, I wonder if that kind of inspired your journey, because you seek out
challenges from what I've read, from what I've heard of you. That's something that is core to
who you are and the way you approach your work.
I was wondering if you could maybe take us through some of the early work that you did. I know during
your PhD, you were working with low power, lightweight communication methods and IoT devices.
That was many years before maybe it became as popular as it is now. What were some of those
breakthrough moments? How did you pick those areas
of interest? And where did you feel like you really developed your skill as a researcher?
Oh, yeah, that credit there goes absolutely to my advisor Lang Tong, because he was working on
sensor networks and connecting them wirelessly way before it became this edge AI and IoT or whatever you may call it now.
And the core mathematical challenges are still the same, right?
If you have these noisy sensors with limited battery power, limited abilities to communicate,
how much do you process on the sensors and how much do you send it out to the cloud?
Because ultimately, the goal is not to record all the raw data from the sensors, but to
really make inferences about, say, if it's a chemical plant, is this working normally
or anomalous, right?
If it's a temperature field, what is maybe the average temperature and other such measurements. And so thinking about co-design
of communication, learning, quantization, everything together was something that I
learned very early on. And all of those lessons still hold up.
Yeah, no, that sounds amazing. And so through that work, Anima, did you sort of gain your interest in looking at these large data sets and starting to think about using machine learning to make sense of them? Yes, it was this highly interdisciplinary problem. It wasn't just wireless communication, it was statistical
signal processing. And that certainly shares all the foundations with machine learning.
But as I started to think about modeling the correlated measurements in these different
sensors, what would be a good way to model high dimensional distributions. And that's what got me into the area of
probabilistic graphical models. And I connected with Alan Willsky at MIT and spent the last year
of my PhD as well as a year of postdoctoral research there. And so that got me almost
seamlessly into machine learning because as I delved more into the probabilistic modeling questions,
it became clear that there is just such a wealth of problems to be solved.
You know, there is theoretical foundations that are yet to be resolved.
There's, of course, hugely practical implications.
And so, yeah, it was great to see all that even before deep learning took off, because
it was clear that we do have now lots and lots more data available.
And we need methods that are able to take advantage of that.
And classical statistics wasn't the answer.
Yeah, no, that's amazing. And also, it sounds like your really strong background in mathematics set you up so well to be able to think about solutions to these problems, right?
Like, I mean, you're often referred to as, I guess, the inventor of the use of tensor algorithms to process large data sets.
How did that come about? How did you think about that application?
Yeah, so first of all, I'd like to emphasize that, you know, math is a very important foundation for thinking about these
problems deeply. And, you know, in high school, the first time I took probability courses and saw
that there was this foundation to deal with uncertainty, right? Because until then, you look
at Newton's laws of physics, geometry, everything deterministic, where the answer, there can be only one.
But the world is certainly not deterministic.
So seeing those tools of probability that are so powerful for us to make sense of this
uncertain world around us, I think, had a deep effect on me in high school.
And so I was always kind of trying to take courses like that. You know,
I was doing statistical signal processing when that wasn't a compulsory course. I was always
going towards that. So in that sense, I was preparing for machine learning even before
it was considered as a mainstream topic or there were many courses offered when I was in school.
And yeah, so I'm very thankful to have that foundation.
But I also want to emphasize that that shouldn't be a limiting factor, right? So I can go into that
in detail later. But as deep learning started taking off, there are certain things we won't
be able to analyze in a mathematically rigorous manner. But that doesn't mean they don't work, right? So
sometimes the theory and practice are not happening at the same time. And I follow both those
journeys and see where it takes us. And strongly performed empirical work, where we are very careful
about trying to understand the phenomenon and trying to test it rigorously, is as strong as any theoretical work.
And we need to do both.
Yeah, so with that in mind, you know, coming back to your question about tensors and how I came about that, you know, as I was working on probabilistic graphical models,
a natural question was, how do we model these high dimensional distributions with many,
many variables? We do have more data to learn, but still not enough to learn an arbitrary
distribution over them. By which I mean, these are from, say, some real world scenarios where
typically the actual distribution is lower dimensional, right? So think
of, for instance, if you want to model all the topics of conversation on some social media,
let's say Twitter, these topics are constantly evolving. Even the acronyms and the words are new.
So there's no way we can just pre-label a few topics and decide that this is all we need to categorize.
So we need what we call an unsupervised learning method.
So we cannot just go by labels of pre-categorized topics.
And so how we went about modeling these kind of scenarios is to think about hidden variables.
So the topics as hidden variables that relate to the observed distribution of words.
And by looking at co-occurrences of words together,
we can ask what are the underlying topics.
And so these kind of what we call latent variable models
where we have hidden variables
in addition to what we observe.
So in this example, the hidden variables
are the topics, the observed variables are the words we see in our Twitter feeds, for instance.
Modeling that relationship allows us to then ask about efficient algorithms to extract information from them and to learn the underlying topics.
And so we went about thinking about what are efficient ways to extract information, right?
So one simple way that people have been doing for a long time is to think about the occurrence
of words in different conversations and trying to do a low-rank decomposition,
meaning somehow like, you know, assume that if words tend to be spoken similarly for a topic,
like you can hope to extract the topics. But it turns out there's some deep algebraic reasons
why matrix decomposition methods do not always tell you the underlying topics.
So what we call fundamental identifiability. So doing it as matrix operations is just not enough
to extract all the underlying topics that may be present in these Twitter data. So what we instead
did was to ask, okay, matrices are not enough. What is the next object we can consider?
And that's where we happened upon tensors, which are extensions of matrices to higher dimensions.
And so here we are now thinking about co-occurrence of triplets of words or higher
order co-occurrences. So it's not just looking at pairs of words, but higher order
co-occurrences. And that intuition was deep because it said now with tensor algebra, we are able to
extract topics that wouldn't be possible with just matrix algebra. Wow. I mean, I think that's
amazing that you were able to sort of extrapolate or hit upon that idea. I was wondering, Anima,
would you be able to give us an example of how that might work or how you have used it?
Yeah, I mean, going back to topic modeling, think about now looking at all the words that
occur together in a tweet, right? So let's say I tell you that in a tweet the word apple occurs. You know, just if I tell you only
this information, what can you say about the topic? You know, you can probably say that it's either the
fruit or the company, right? But you don't know which of the two. So what if now I told you a pair
of words, like there's apple and orange? But orange is also a company, so there is
still some ambiguity. But what if I told you the words apple, orange, and banana occur together
in this tweet? You're now more certain that the topic that is talked about is a fruit, right? So
some conversation around fruits. So this is just a very simple toy example to
drive home the point that if you look at co-occurrence of multiple words in these
different tweets, you will have much more information than looking at just pairs of
words co-occurring in these tweets. And that way you can extract the underlying topics
much more effectively.
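The pair-versus-triple intuition above can be sketched in a few lines of Python. This is a toy illustration only: the tweets are invented, and the actual tensor methods she describes go on to decompose these co-occurrence counts, which this sketch omits:

```python
from collections import Counter
from itertools import combinations

# Toy "tweets" (invented for illustration)
tweets = [
    ["apple", "orange", "banana"],   # clearly about fruit
    ["apple", "orange", "stock"],    # could be about companies
    ["apple", "iphone", "stock"],    # clearly about companies
]

# Pair co-occurrence counts play the role of a matrix;
# triple co-occurrence counts play the role of a third-order tensor.
pairs, triples = Counter(), Counter()
for words in tweets:
    vocab = sorted(set(words))
    pairs.update(combinations(vocab, 2))
    triples.update(combinations(vocab, 3))

# The pair (apple, orange) shows up in both a fruit tweet and a
# company tweet, so pairwise counts leave the topic ambiguous:
print(pairs[("apple", "orange")])              # 2
# The triple (apple, banana, orange) occurs only in the fruit tweet,
# so higher-order co-occurrence pins the topic down:
print(triples[("apple", "banana", "orange")])  # 1
```

The tensor methods from her work then factor these higher-order counts to recover the hidden topics, which, as she notes, matrix factorization alone cannot always identify.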
ACM ByteCast is available on Apple Podcasts, Google Podcasts, Podbean, Spotify, Stitcher, and TuneIn.
If you're enjoying this episode,
please subscribe
and leave us a review on your
favorite platform. Yeah, no, that's perfect for a lay person like me. I think it's perfect for me
to understand the concept that you were driving. I've also seen the video that you often share,
Anima, the one where the robot is doing the backflip, which you compared with a
dog. And you talk about reinforcement learning, and that's been maybe a cornerstone of some of
your work. I was wondering, could you maybe explain that further and how you've used it?
Yeah, that's a great question, Rashmi. And things are developing so fast in that area, right? So maybe I should just go to what's happening now,
especially with foundation models
and our ability to come up with zero-shot learning
of high-level concepts
and with that be able to do a variety of tasks together.
So when I showed the video before a few years ago,
I was commenting
on how even if the Boston Dynamics robot can do backflips very impressively, it's not as agile or
as general as a dog. Like, a dog can take our commands, learn new skills, you know, generalize
to new tasks very quickly. But that story is fast changing, right? So in a
paper that we'll have appearing at ICML conference over the summer, but that's already available
online, called VIMA, for visuomotor learning, what we show is that now you can have models that can take in text and image commands.
So for instance, if you want to instruct your robot manipulator to rearrange the chess pieces in a certain pattern, it can go and do that.
So you don't tell that as a very laborious text command, but you also show the image and say, rearrange it this way.
And so that's much more convenient.
So this ability to give what we call multimodal prompts
and interact with the robot in a very natural way
is something we show is possible.
And this way it can just generalize
to a variety of different tasks.
You can ask it to rearrange,
you can ask it to put one block over another.
You can do these different tasks
in what we call zero-shot manner,
meaning you can easily come up with good performance
on new tasks without having to explicitly train on them.
And so that's really exciting to see
how far we've come along in such a short amount
of time. That is very impressive and definitely is a great view into sort of where our sort of
future is headed. But I'd love to understand from you, Anima, in terms of the areas that you sort of
apply both machine learning as well as moving into AI and the tools that you've built
around it. I know you've done some work around healthcare. You've obviously done work in the
robotics world. And I know that there is, you know, recent interest also, there's a lot of
work that you've done around climate change. Any of those topics, would love to hear more.
But also, how do you determine sort of where to go next? What drives your interest?
Yeah, and I think that's a great question. You know, like, to me, it's a mix of exploration and exploitation, right, to use the
reinforcement learning terminology, because I am curious, and I want to explore enough about new
topics, especially in ones that I'm certainly not the expert, you know, when it comes to
climate modeling, or climate change or even
healthcare, working with surgeons.
There is the deep domain expertise that I can never hope to replicate unless I put all
my focus and energy on that.
But the idea is to explore enough to learn abstractions, right?
So how do I go from something that is so complicated to enough of an abstraction
to see what can machine learning do here? And it could be something very straightforward. You know,
there's already an existing model, you just plug it in. Or it could be something very deep in terms
of requiring us to invent new algorithms, new frameworks, new architectures. And I've seen both, but you can
only do that with enough exploration to learn new areas where machine learning is still a big
challenge. And that's what I've done in all of these areas that you mentioned. And also trying
to see how that unifies, right? So we can go to each of these areas, try to solve the problems.
But along with the abstraction comes the benefit that once you invent these algorithms, we can apply them to many, many other areas. So is it that someone comes to you and says, here is a problem that is unique and could be a great application for ML?
And so let's seek out sort of new, potentially new algorithms that can then be applied to other areas.
I wish it were so systematic, but most of the time it starts with conversations, right? And this is what I think, especially the value of in-person and chance meetings are, whether it's at Caltech, whether it's NVIDIA or other conferences and events.
So it's learning new things and extracting knowledge from all that.
And many times also it's scientists reaching out to me to ask how machine learning could make a difference.
So it's a combination
of all this. But what I always try to do is to learn enough about it and keep my curiosity open
that even if it's not something I can help today, maybe in the future, right, this is something that
can be solved. When it comes to specific topics you mentioned, you know, in healthcare, we've been working on a number of areas. In
particular, I've been collaborating with Dr. Andrew Hung from USC, and now he's recently joined
Cedars-Sinai here in LA, who works on robotic surgery. And so he's a robotic surgeon. He has
a wealth of data and expertise in terms of collecting videos, both of live surgery,
as well as students being trained to become surgeons.
And so our goal was to ask, you know, can AI understand these gestures, right?
And this is such a safety critical application.
Although one day we may see AI completely take over human surgeons, I don't think it'll be anytime soon.
And it requires very careful testing.
So instead, we wanted to ask, how can AI help today the current surgeons as well as future surgeons for training?
And so this is where we are developing now systems that provide real-time feedback.
So as the surgery is happening, can you provide
feedback? And we've started testing this on trainees for doing it on 3D printed tissues.
And so with safety-critical applications, it's also a great lesson on how we want to start
in cases where safety is not critical and build our way up. And it's great to see
so many interdisciplinary areas come together. Of course, computer vision and video understanding
is important. There's also kinematics and robot control, but as well as the human interface aspect,
when you're having surgeons performing these careful maneuvers, how do you ensure that you give them
the right feedback? Don't distract them. And we are also personalizing that because some people
prefer audio feedback. Some may prefer like a quick vibration. So this is something that I
think is such a nascent area. And so by working closely, we are bringing back this lesson of how do we
bring AI to safety-critical applications? How do we think about the human interaction aspect
as we are delivering real-time feedback? Yeah. So I wonder, Anima, all of these areas
sound absolutely amazing. They sound like great sort of, you know, deep areas of research,
absolute dream for somebody who is in computing research. Do you feel like these opportunities
come your way because you've been able to sort of, you have the credibility of having been at
very strong academic institutions, as well as having understanding of how these things work
in industry and what does it mean to
actually build a product out of some of these concepts?
I think both Caltech and NVIDIA have been just great places for me to connect with amazing
people and having, of course, the history of working with interdisciplinary areas, I
think will encourage people to approach me more.
So that's probably the case.
But on the other hand, I'd also think of the conversations these days
and the connections these days can happen much more easily.
Before Twitter made big changes,
we used to have so many technical conversations on Twitter.
And so we need forums like that, where it becomes much more
democratized and easy to connect researchers across different domains, across different age
groups, different institutions, much more seamlessly. So we need a replacement, to be honest.
For sure. Yeah. I mean, I'm sure the pandemic had its own share of sort of challenges
thrown in with us not being able to meet. I'm sure there were many conferences. That's probably a
great hub to meet like-minded folks or folks that are working on interesting areas that you'd like
to collaborate with, and you don't have the ability to do that. Hopefully now with things
sort of, you know, returning back to in-person, there'll be more of that. Yeah, I do think certainly the in-person brainstorming has its qualities, but at the
same time, you know, the pandemic showed what can be done remotely, right? And especially
people with families or having long commutes, or disabilities, all these aspects that may prevent
them from always being there in person, we showed that we can also be
very productive. And this was also the time where we were able to do a lot of interdisciplinary
collaborations online and communicate across different areas because we had started developing
this framework of what I call neural operators, which are fundamentally a new way of expanding upon
standard neural networks to ones that can handle different resolutions, different shapes of grids.
And this is very applicable for solving partial differential equations and other phenomena on
continuous domains. You mentioned climate models, weather forecasting.
So weather is a phenomenon that happens all across the globe, right?
We may discretize and have measurements only at a few points,
but the underlying phenomenon is continuous.
And so we've developed frameworks that not only take in data at one resolution, but can
ingest training data at different resolutions, and more importantly, can tell you predictions
at any location, at any different resolution.
And I think this is fundamental because this allows us to now think about both a theoretical framework where we show we can accurately
capture these kind of complex multi-scale phenomena, but we can also see practical gains
where we've seen orders of magnitude speed ups. In the case of weather forecasting, it was about
45,000 times faster. In the case of climate change mitigation, where we modeled how carbon dioxide
would be pumped underground into wells with water, we were able to do this multi-phase flow modeling
very accurately and very fast, about 700,000 times faster than current numerical simulations. And so seeing both this theory get developed,
but taking that to practice
and working with so many amazing researchers
and collaborators with deep domain expertise
has been very fulfilling.
Yeah, those are some stupendous results, right?
And to see those kinds of results
come from the work that you're doing, I'm sure, is very gratifying.
What would you say, Anima, are sort of the biggest challenges then that you face today from your perspective in this field of work?
Yeah, so I think AI for science is something where we're only just getting started, in terms of bringing all these developments that we've seen in
generative AI to the scientific domains, right?
So when we think about text models like ChatGPT or image models like diffusion models,
we see such impressive results.
But right now, they're still in the creative realm.
You know, they don't have guarantees of being physically valid or
correct. You may generate a very fashionable-looking shoe, but it's not clear that in the real
world it would be comfortable or healthy on your feet. And to do that, we need to incorporate the
physics. We need to incorporate what I call multi-scale structure, right? So you really need to accurately model the physical
phenomenon or chemical phenomenon or biological phenomenon. And that could be like equations,
like I mentioned, like partial differential equations. It could be symmetries. For instance,
you know, the weather is happening on a sphere, right, on the globe, not on a flat surface. So those kinds
of geometries and symmetries we need to incorporate. So we need to think about all these
domain-specific constraints and knowledge, and how do we combine that with learning frameworks
like neural operators that I mentioned. And that brings about some deep optimization challenges.
So one thing that has happened in deep learning is to think of optimization as very simple.
Stochastic gradient descent, Adam, all of these optimizers are readily available in the software
packages, and they mostly work well. If you have enough data, they work well. But when it comes to scientific modeling, we need to be very mindful of the constraints
because maybe there isn't enough data, right?
So in fact, in the case of climate modeling, we have no data of the future, right?
That's impossible.
But we have equations and we have data of the past.
What was the weather over all these decades
since we started recording it?
And so how do we take that and extrapolate it
under climate change,
under now a new level of carbon dioxide?
So that extrapolation requires us
to use the knowledge of physics.
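As a toy illustration of combining sparse data with known physics (my own sketch, not the actual loss used in her climate work), imagine fitting a trajectory to a few past observations while also penalizing the residual of a known equation, here the simple decay law du/dt = -u; the physics term is what constrains the extrapolation where the data runs out:

```python
import numpy as np

def physics_informed_loss(u, t, t_data, u_data, lam=1.0):
    """Toy objective: match sparse observations AND satisfy du/dt = -u.
    The physics residual constrains the fit where there is no data."""
    # data term: squared error at the few observed points
    data_loss = np.mean((np.interp(t_data, t, u) - u_data) ** 2)
    # physics term: finite-difference residual of du/dt + u = 0
    residual = np.gradient(u, t) + u
    physics_loss = np.mean(residual ** 2)
    return data_loss + lam * physics_loss

t = np.linspace(0.0, 2.0, 201)
t_data = np.array([0.0, 0.5, 1.0])   # sparse "historical" measurements
u_data = np.exp(-t_data)

good = np.exp(-t)     # satisfies both the data and the equation
bad = 1.0 - 0.5 * t   # roughly fits early data, violates the physics
good_loss = physics_informed_loss(good, t, t_data, u_data)
bad_loss = physics_informed_loss(bad, t, t_data, u_data)
```

The candidate that obeys the equation scores far better even though both pass near the early data; in real physics-informed training, `u` would be a model's output and this combined loss would be minimized by one of the standard optimizers mentioned above.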
And so my one-line answer is we cannot escape physics,
which means we cannot escape optimization challenges. And so my earlier work
on tensor algorithms was to deeply study the optimization landscapes of these non-convex surfaces. So we need to go back to thinking about those fundamental
tools and how we can improve upon them. So in short, there are new challenges that standard deep
learning doesn't address. Yeah, no, thank you for that really detailed answer. The other
thing I also think about is, as you're trying to answer some
of these questions that none of us have thought about, and that many will obviously rely on
for guidance and decisions, I'm guessing governments and agencies will be
interested in the work that you're putting out. You know, no conversation about AI can go by without
talking about ethics in AI. And I know that that's
a very deep area of interest for you as well. I would love to hear your thoughts around that.
Like, why is that everyone's responsibility? How do we collectively take ownership of defining
and solving problems around ethics and AI? Yeah, that's absolutely right, Rashmi. It's much more urgent now as these models are out in the wild, right? So we have ChatGPT not always giving you the right answers, and the ease with which misinformation can flow, and how it is already taking over large parts of the internet, is disturbing. And, you know, I work with Mike Alvarez here at Caltech, where we recently launched a new center for science, society, and public policy.
And that's very timely because we want to understand how we can test these generative models at scale. So how would you even test them for bias or toxicity or misinformation? Because it's not
just giving one input or one dimension, right? So it's highly complicated to test this across
a distribution of different inputs that others may be using. So that's one aspect. The other one is,
of course, how this interacts with how humans perceive this knowledge and how humans interact. Mike Alvarez works on social media conversations, and we worked with him on using the earlier tensor algorithms for extracting topics at scale very quickly. And now the next stage of that is to understand how misinformation at
scale can be tackled, and how we provide everybody on this planet with the right tools to
glean what is real and what's not, right? So I think this is something we need to figure out
as a community. So that's one important aspect. The other is, of course,
thinking deeply about new algorithms that ideally come with some kind of guarantees or
some kind of principled approach: the more we build into it, the more we can trust the outputs,
whether it's robotic surgery, where if you bring that to live surgery, that is very critical. You need strong guarantees.
It's not just a few test cases that will convince anybody to take it there. So we need to architect
it in a way where we can prove that it is going to do the things we expect it to do. And same with
self-driving cars, drones, all these different areas. So by working
on those areas, I think it also forces us to tackle these algorithmic issues head-on. And
that'll help us move towards trustworthy AI in all domains.
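The tensor algorithms she refers to here, and earlier for extracting topics at scale, center on decomposing moment tensors, and their workhorse is the tensor power method. A minimal sketch on a hand-built, orthogonally decomposable two-component tensor (a toy of mine; the real topic-modeling pipeline first estimates and whitens empirical moment tensors):

```python
import numpy as np

def tensor_power_iteration(T, n_iter=100, seed=0):
    """Recover one component of T = sum_i w_i * v_i (x) v_i (x) v_i
    (orthogonal v_i) via the tensor power method."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = np.einsum('ijk,j,k->i', T, v, v)  # contract T along two modes
        v /= np.linalg.norm(v)
    w = np.einsum('ijk,i,j,k->', T, v, v, v)  # the component's weight
    return w, v

# Toy tensor built from two orthogonal "topics" with weights 3 and 1
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
T = 3.0 * np.einsum('i,j,k->ijk', e1, e1, e1) \
    + 1.0 * np.einsum('i,j,k->ijk', e2, e2, e2)
w, v = tensor_power_iteration(T)
```

The iteration converges to one of the true components; subtracting `w * v⊗v⊗v` from `T` and repeating ("deflation") recovers the rest. This fixed-point structure is exactly the kind of non-convex optimization landscape her tensor work analyzed.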
Yeah, no, I mean, these are difficult, difficult problems to solve, right? I mean, they inherently make your job so much more challenging. And one thing I've observed through all of your conversations, Anima, through this interview as well as previous ones, is that you have this aura of fearless leadership. You champion a thought or an idea and you stand by it. I would love to understand: how do you build that sort of a muscle?
Thank you.
That's very kind.
Yeah, you know, I think to me, it's all the support I have.
As I said, growing up, it was my parents who encouraged me to take risks, to
prepare for this entrance exam on my own. But also like now,
my husband Benedict is always just there for me as I think about very
hard challenges, and encourages me to go for it. That's wonderful. Really happy to hear that. And
thank you. Thank you for sharing that. It certainly is an inspiration to me and I'm sure to many of our listeners, because oftentimes some of these problems,
if you've never encountered them and they're in domains that are completely
unexplored, can seem pretty daunting. And I think to, you know, believe in your skills and
believe in your ability to break down a problem and just go for it, like you say,
is probably the best way to move forward.
I think of that as an adventure.
And I also, you know, for researchers who are younger and who may want to take fewer
risks, you can think of it as a portfolio approach, right?
So you could try something that is very new as an adventure, but you could also have another
project where, you know, things will mostly work out.
And so that way you hedge.
And if the new adventure pays off, that's great.
Otherwise you still learn something.
Absolutely.
So yeah, for our final bite, Anima,
I would love to understand
what are you most excited about
in the field of AI over the next
five years? So I'm very excited about the future in this era of generative AI and bringing that
to scientific domains, right? So can we think of one universal foundation model that could
understand different areas of sciences? What would that look like? What would be the kind of data that would
be available for that? How do we bring in the physics knowledge and all of these domain constraints?
How do we solve the optimization challenges? How do we incorporate stochasticity in such
complex domains, and handle the multi-scale nature of many of these datasets? So to me, I think we're just
getting started here. Wonderful. Lots of mathematics ahead of us. Math, engineering,
science, everything, social sciences for human interaction, everything, right? Perfect.
All of it. Wonderful. This has been such an inspiring conversation. Thank you so much for taking the time to speak with us at ACM ByteCast.
Thank you, Rashmi. It's been such a pleasure.
ACM ByteCast is a production of the Association for Computing Machinery's Practitioners Board. To learn more about ACM and its activities, visit acm.org.
For more information about this and other episodes, please visit our website at learning.acm.org.
That's learning.acm.org.