ACM ByteCast - Anima Anandkumar - Episode 42
Episode Date: August 21, 2023. In this episode of ACM ByteCast, Rashmi Mohan hosts Anima Anandkumar, a Bren Professor of Computing at the California Institute of Technology (the youngest named chair professor at Caltech) and the Senior Director of AI Research at NVIDIA, where she leads a group developing the next generation of AI algorithms. Her work has spanned healthcare, robotics, and climate change modeling. She is the recipient of a Guggenheim Fellowship and an NSF CAREER Award, and was most recently named an ACM Fellow, among many other prestigious honors and recognitions. Her work has been extensively covered on PBS, in Wired magazine, MIT Tech Review, YourStory, and Forbes, with a focus on using AI for good. Anima talks about her journey, growing up in a house where computer science was a way of life and family members who served as strong role models. She shares her path in education and research at the highly selective IIT Madras, the importance of a strong background in math in her computing work, and some of the breakthrough moments in her career, including work on using tensor algorithms to process large datasets. Anima spends some time discussing topic modeling and reinforcement learning, what drives her interests, the possibilities of interdisciplinary collaboration, and the promise and challenges brought about by the age of generative AI.
Transcript
This is ACM ByteCast, a podcast series from the Association for Computing Machinery,
the world's largest education and scientific computing society.
We talk to researchers, practitioners, and innovators
who are at the intersection of computing research and practice.
They share their experiences, the lessons they've learned,
and their visions for the future of computing.
I am your host, Rashmi Mohan.
If you had a crystal ball to predict into the future, what would you ask of it?
Whatever it might be, our next guest might have the answer.
An AI expert, a mathematician, a changemaker, a computer scientist, an advocate,
Anima Anandkumar wears all these hats with aplomb.
Anima is the Bren Professor of Computing,
while also being the youngest named Chair Professor at the California Institute of Technology, or Caltech.
She is also the Director of Machine Learning at NVIDIA,
where she leads a group developing the next generation of AI algorithms.
With a string of awards, including a Guggenheim Fellowship, an NSF CAREER Award, Young Investigator
Awards from the US Air Force and the Army, and most recently being named an ACM Fellow, she is no stranger
to the spotlight. Her work has been extensively covered on PBS and in Wired magazine,
MIT Tech Review, YourStory, and Forbes. With a laser focus on using AI for good tech,
she is a force to be reckoned with. Anima, welcome to ACM ByteCast.
Thank you, Rashmi. That's very kind.
Wonderful. We're so excited to have you here. And I'd love to lead with a simple question
that I ask all my guests, Anima. If you could please introduce yourself and talk about what
you currently do, and also give us some insight into what drew you into the field of computing.
Yeah, it's a pleasure to be on this podcast. And at the moment, I'm very excited about how AI
intersects all the other domains, right? Especially in
scientific domains, how we can integrate AI into existing workflows, as well as create new ones.
Can we reimagine how we do science and along with it, bring better inventions and discoveries
at a much faster pace? And that's a great place to be because I'm always
learning about new areas and seeing how AI intersects and enables all of these different
domains. That's amazing. I'd love to sort of get into more of that and understand. It just could
go in so many different directions that I'm really looking forward to digging deeper into that.
But what was that introduction into computing or computer science, Anima?
Was there early exposure that drove that interest? Yeah, really it was a privilege to be in a household where math, science, and computer science were very much part of daily life, right?
So both my parents are engineers.
My mother is an engineer, which was not so common in the community I grew up in. In fact, you know, she tells me that she went on a hunger strike for three days to get into engineering, because my grandparents were worried that she may not get a suitable husband after studying
engineering. But she was a trailblazer. And that really helped me a lot to think of this as not
an anomaly, right? It's something that women could be very good at. And I had a role model
at home itself. That was great to have. And it was also great to see that intersection between
science and math and engineering. So my parents ran a small scale industry,
manufacturing automotive components, different parts of the manufacturing processes. So I would
go see those machines. And at some point, they decided to make them computerized, you know, what we call
CNC or computerized numerically controlled machines. And so that helped me think about
programming as something very tangible and physical, because the program code would go and
move these huge components, remove material from them. So seeing that at an early age, and seeing computing
as not something in its own bubble, but something that interacts with the physical world in such a real
way, was really great. That sounds amazing. I mean, for one, I think just the idea of,
you know, inspiration starts at home. And I think you had enough inspiration from both
your parents, which is incredible. And I think the other thing that you bring up is, yes, I mean,
I think for us, many of us, computing may have started only with, you know, introduction to maybe
software programming, right? But you saw the applications of it and how it could actually
sort of solve engineering problems or real life problems. And I can imagine how, you know,
that could have basically not made this a difficult
or a challenging sort of field to get into, right?
It felt very real.
It felt like something that you could definitely achieve.
Yeah, absolutely.
And I would just, you know, do these exercises at home for fun.
Like my parents would have these manuals
and, you know, we would just be like looking at
how to move like the turret to a
different location. So it was very geometric and real, and the program enabled that. So
all of these different areas came together that way. I also had my grandparents be very passionate
about math. My grandfather was a math teacher, so he would always give me like problem sets and concepts even before they were taught in school.
And that was, you know, he would initially say, oh, no, no, that's too advanced for you.
And that was almost like a primer to me rebelling and saying, no way, you know, I can do this, bring it on.
And so we had this back and forth.
And my grandmother as well would give me all these puzzles. She was very much hands on, like, you know, showing us different puzzles and embroidery and all these aspects that are very geometric and, you know, physical. So seeing both the abstract and the physical come together was just a great way to grow up. That sounds like such an amazing environment
to be exposed to all of these topics. I mean, we often talk about how the pipeline to get women
interested in computing is so weak. And, you know, maybe we should start earlier to be able to expose
folks to the areas of both technology and computing. Sounds like you had that perfectly set
up for you. So we'll definitely get into that in a little bit.
But I know that for your undergraduate education,
Anima, you went to IIT Madras.
Was there anything there that sort of changed
the way you thought about the field
or did it inspire you
or did all the sort of the training
whether conscious or unconscious
that you received through your childhood
help you be successful in your undergraduate education?
Yeah, I mean, first of all, getting into IIT, as you would be aware, is not easy, right?
So it's an entrance exam that is highly competitive and you have to be in the top few hundred to get into an IIT with your choice of major. To me, I think tackling that exam and doing it in Mysore,
where I grew up, where there weren't a lot of formal preparation classes. So, you know, I did
consult with university professors, but I had to take the initiative. You know, it was
me being very proactive and mindful about how do I tackle such a competitive exam and finding the
right mentors and supporters to help with that, you know, it gave me a sense of purpose and a
sense of almost entrepreneurship, because what I needed to do was well beyond what the
standard syllabus would cover. But that also really forced me to understand the concepts
deeply, you know, and I think this will connect to how we think about, you know, education in the
future, especially with ChatGPT and so on. I think the IIT entrance exams were ones that
ChatGPT wouldn't easily solve, I'm pretty sure, because, you know, they weren't standardized problems,
right? So they required some out of the box thinking that required you to really understand
the concepts deeply. And that challenged me. So having that kind of exam gave me that
reason to go deep. And then I realized this would be great to do it all my life to really not just touch the
surface, but go deeply into concepts. And once I landed in IIT, being in a cohort of students who
are just amazingly smart and also multidimensional, I think that surprised me because I assumed
everybody would be just, like, heads down in books and notes, which, you know, they're great at.
But also we did like these, we organized student-run conferences
where we brought in researchers from across the world.
We built, like, bridges made entirely of paper on which we could walk.
So we did a lot of cool nerdy things.
So having that cohort of students who are still my
friends was great. And, of course, all the professors and the environment that it brought
about. So yeah, I have great memories from that. That's amazing. I mean, sounds like you've been a
hustler throughout your life. And I love the fact that you were talking about multidimensional
exposure, right? Because if I look at your sort of career and the work that you've done and the work that you're
doing, it is how you sort of branched out. It's not sort of, you know, the straight and narrow,
you've used the computing principles and the skills that you've learned to apply it to solve
real world problems. And, you know, I'd love to talk about that as well. Yeah, absolutely. I think it's
important to think about education, not as a fixed curriculum or a fixed path, right? And even
the idea that the job you land will be the one you do for life, you know, that, unfortunately, is mostly over for
most of us. And so the way we keep up with this ever-changing technology
and the landscape of jobs is to keep reskilling, to keep learning. And I think inculcating that
early on is so important. And to me, I think having my parents be entrepreneurs who are
always bringing on new technology, they're not just satisfied with something that's already working,
but taking big risks in terms of capital investment and bringing new technologies to the ground was something that I grew up with as an everyday thing.
I also take inspiration from my great-great-grandfather, Dr. Shama Shastri,
who, from what I hear, the story goes that he left his village as a teenager,
came to Mysore, ultimately did his PhD in English and Sanskrit, rediscovered the Arthashastra from 300 BC.
So it was really a needle in a haystack. I mean, it's this ability to understand that there is something
deep in these palm leaves, in a language that is unknown, and being able to crack the code and
connect it way back in history to such a landmark text, that gave me the inspiration that
if you follow the path of curiosity, great things will come.
Yeah, no, I mean, it's incredible what you said about your great-great-grandfather in itself is
mind-blowing. But also, I wonder if that kind of inspired your journey, because you seek out
challenges from what I've read, from what I've heard of you. That's something that is core to
who you are and the way you approach your work.
I was wondering if you could maybe take us through some of the early work that you did. I know during
your PhD, you were working with low power, lightweight communication methods and IoT devices.
That was many years before maybe it became as popular as it is now. What were some of those
breakthrough moments? How did you pick those areas
of interest? And where did you feel like you really developed your skill as a researcher?
Oh, yeah, that credit there goes absolutely to my advisor Lang Tong, because he was working on
sensor networks and connecting them wirelessly way before it became this edge AI and IoT or whatever you may call it now.
And the core mathematical challenges are still the same, right?
If you have these noisy sensors with limited battery power, limited abilities to communicate,
how much do you process on the sensors and how much do you send it out to the cloud?
Because ultimately, the goal is not to record all the raw data from the sensors, but to
really make inferences about, say, if it's a chemical plant, is this working normally
or anomalous, right?
If it's a temperature field, what is maybe the average temperature and other such measurements. And so thinking about co-design
of communication, learning, quantization, everything together was something that I
learned very early on. And all of those lessons still hold up.
Yeah, no, that sounds amazing. And so through that work, Anima, did you sort of gain your interest in looking at these large data sets and starting to think about using machine learning to make sense of them? Yes, it was this highly interdisciplinary problem. It wasn't just wireless communication, it was statistical
signal processing. And that certainly shares all the foundations with machine learning.
But as I started to think about modeling the correlated measurements in these different
sensors, what would be a good way to model high dimensional distributions. And that's what got me into the area of
probabilistic graphical models. And I connected with Alan Willsky at MIT and spent the last year
of my PhD as well as a year of postdoctoral research there. And so that got me almost
seamlessly into machine learning because as I delved more into the probabilistic modeling questions,
it became clear that there is just such a wealth of problems to be solved.
You know, there is theoretical foundations that are yet to be resolved.
There's, of course, hugely practical implications.
And so, yeah, it was great to see all that even before deep learning took off, because
it was clear that we do have now lots and lots more data available.
And we need methods that are able to take advantage of that.
And classical statistics wasn't the answer.
Yeah, no, that's amazing. And also, it sounds like your really strong background in mathematics set you up so well to be able to think about solutions to these problems, right?
Like, I mean, you're often referred to as, I guess, the inventor of the use of tensor algorithms to process large data sets.
How did that come about? How did you think about that application?
Yeah, so first of all, I'd like to emphasize that, you know, math is a very important foundation for thinking about these
problems deeply. And, you know, in high school, the first time I took probability courses and saw
that there was this foundation to deal with uncertainty, right? Because until then, you look
at Newton's laws of physics, geometry, everything deterministic, where the answer, there can be only one.
But the world is certainly not deterministic.
So seeing those tools of probability that are so powerful for us to make sense of this
uncertain world around us, I think, had a deep effect on me in high school.
And so I was always kind of trying to take courses like that. You know,
I was doing statistical signal processing when that wasn't a compulsory course. I was always
going towards that. So in that sense, I was preparing for machine learning even before
it was considered as a mainstream topic or there were many courses offered when I was in school.
And yeah, so I'm very thankful to have that foundation.
But I also want to emphasize that that shouldn't be a limiting factor, right? So I can go into that
in detail later. But as deep learning started taking off, there are certain things we won't
be able to analyze in a mathematically rigorous manner. But that doesn't mean they don't work, right? So
sometimes the theory and practice are not happening at the same time. And I follow both those
journeys and see where it takes us. And strongly performed empirical work, where we are very careful
about trying to understand the phenomenon and trying to test it rigorously, is as strong as any theoretical work.
And we need to do both.
Yeah, so with that in mind, you know, coming back to your question about tensors and how I came about that, you know, as I was working on probabilistic graphical models,
a natural question was, how do we model these high dimensional distributions with many,
many variables? We do have more data to learn, but still not enough to learn an arbitrary
distribution over them. By which I mean, these are from, say, some real world scenarios where
typically the actual distribution is lower dimensional, right? So think
of, for instance, if you want to model all the topics of conversation on some social media,
let's say Twitter, these topics are constantly evolving. Even the acronyms and the words are new.
So there's no way we can just pre-label a few topics and decide that this is all we need to categorize.
So we need what we call an unsupervised learning method.
So we cannot just go by labels of pre-categorized topics.
And so how we went about modeling these kind of scenarios is to think about hidden variables.
So the topics as hidden variables that relate to the observed distribution of words.
And by looking at co-occurrences of words together,
we can ask what are the underlying topics.
And so these kind of what we call latent variable models
where we have hidden variables
in addition to what we observe.
So in this example, the hidden variables
are the topics, the observed variables are the words we see in our Twitter feeds, for instance.
Modeling that relationship allows us to then ask about efficient algorithms to extract information from them and to learn the underlying topics.
And so we went about thinking about what are efficient ways to extract information, right?
So one simple way that people have been doing for a long time is to think about the occurrence
of words in different conversations and trying to do a low-rank decomposition,
meaning somehow like, you know, assume that if words tend to be spoken similarly for a topic,
like you can hope to extract the topics. But it turns out there's some deep algebraic reasons
why matrix decomposition methods do not always tell you the underlying topics.
So what we call fundamental identifiability. So doing it as matrix operations is just not enough
to extract all the underlying topics that may be present in these Twitter data. So what we instead
did was to ask, okay, matrices are not enough. What is the next object we can consider?
And that's where we happened upon tensors, which are extensions of matrices to higher dimensions.
And so here we are now thinking about co-occurrence of triplets of words or higher
order co-occurrences. So it's not just looking at pairs of words, but higher order
co-occurrences. And that intuition was deep because it said now with tensor algebra, we are able to
extract topics that wouldn't be possible with just matrix algebra. Wow. I mean, I think that's
amazing that you were able to sort of extrapolate or hit upon that idea. I was wondering, Anima,
would you be able to give us an example of how that might work or how you have used it?
Yeah, I mean, going back to topic modeling, think about now looking at all the words that
occur together in a tweet, right? So let's say I tell you that in a tweet the word apple occurs. You know, just if I tell you only
this information, what can you say about the topic? You know, you can probably say that it's either the
fruit or the company, right? But you don't know which of the two. So what if now I told you a pair
of words, like there's apple and orange? But orange is also a company, so there is
still some ambiguity. But what if I told you the words apple, orange, and banana occur together
in this tweet? You're now more certain that the topic that is talked about is a fruit, right? So
some conversation around fruits. So this is just a very simple toy example to
drive home the point that if you look at co-occurrence of multiple words in these
different tweets, you will have much more information than looking at just pairs of
words co-occurring in these tweets. And that way you can extract the underlying topics
much more effectively.
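The pair-versus-triple intuition above can be sketched in a few lines of Python. This is a toy illustration only: the tweets are invented, and the actual tensor methods she describes go on to decompose these co-occurrence counts, which this sketch omits:

```python
from collections import Counter
from itertools import combinations

# Toy "tweets" (invented for illustration)
tweets = [
    ["apple", "orange", "banana"],   # clearly about fruit
    ["apple", "orange", "stock"],    # could be about companies
    ["apple", "iphone", "stock"],    # clearly about companies
]

# Pair co-occurrence counts play the role of a matrix;
# triple co-occurrence counts play the role of a third-order tensor.
pairs, triples = Counter(), Counter()
for words in tweets:
    vocab = sorted(set(words))
    pairs.update(combinations(vocab, 2))
    triples.update(combinations(vocab, 3))

# The pair (apple, orange) shows up in both a fruit tweet and a
# company tweet, so pairwise counts leave the topic ambiguous:
print(pairs[("apple", "orange")])              # 2
# The triple (apple, banana, orange) occurs only in the fruit tweet,
# so higher-order co-occurrence pins the topic down:
print(triples[("apple", "banana", "orange")])  # 1
```

The tensor methods from her work then factor these higher-order counts to recover the hidden topics, which, as she notes, matrix factorization alone cannot always identify.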
ACM ByteCast is available on Apple Podcasts, Google Podcasts, Podbean, Spotify, Stitcher, and TuneIn.
If you're enjoying this episode,
please subscribe
and leave us a review on your
favorite platform. Yeah, no, that's perfect for a lay person like me. I think it's perfect for me
to understand the concept that you were driving. I've also seen the video that you often share,
Anima, the one where the robot is doing the backflip, which you compared with a
dog. And you talk about reinforcement learning, and that's been maybe a cornerstone of some of
your work. I was wondering, could you maybe explain that further and how you've used it?
Yeah, that's a great question, Rashmi. And things are developing so fast in that area, right? So maybe I should just go to what's happening now,
especially with foundation models
and our ability to come up with zero-shot learning
of high-level concepts
and with that be able to do a variety of tasks together.
So when I showed the video before a few years ago,
I was commenting
on how even if the Boston Dynamics robot can do backflips very impressively, it's not as agile or
as general as a dog. Like, a dog can take our commands, learn new skills, you know, generalize
to new tasks very quickly. But that story is fast changing, right? So in a
paper that we'll have appearing at ICML conference over the summer, but that's already available
online, called VIMA, for visuomotor learning, what we show is that now you can have models that can take in text and image commands.
So for instance, if you want to instruct your robot manipulator to rearrange the chess pieces in a certain pattern, it can go and do that.
So you don't tell that as a very laborious text command, but you also show the image and say, rearrange it this way.
And so that's much more convenient.
So this ability to give what we call multimodal prompts
and interact with the robot in a very natural way
is something we show is possible.
And this way it can just generalize
to a variety of different tasks.
You can ask it to rearrange,
you can ask it to put one block over another.
You can do these different tasks
in what we call zero-shot manner,
meaning you can easily come up with good performance
on new tasks without having to explicitly train on them.
And so that's really exciting to see
how far we've come along in such a short amount
of time. That is very impressive and definitely is a great view into sort of where our sort of
future is headed. But I'd love to understand from you, Anima, in terms of the areas that you sort of
apply both machine learning as well as moving into AI and the tools that you've built
around it. I know you've done some work around healthcare. You've obviously done work in the
robotics world. And I know that there is, you know, recent interest also, there's a lot of
work that you've done around climate change. Any of those topics, would love to hear more.
But also, how do you determine sort of where to go next? What drives your interest?
Yeah, and I think that's a great question. You know, like, to me, it's a mix of exploration and exploitation, right, to use the
reinforcement learning terminology, because I am curious, and I want to explore enough about new
topics, especially in ones that I'm certainly not the expert, you know, when it comes to
climate modeling, or climate change or even
healthcare, working with surgeons.
There is the deep domain expertise that I can never hope to replicate unless I put all
my focus and energy on that.
But the idea is to explore enough to learn abstractions, right?
So how do I go from something that is so complicated to enough of an abstraction
to see what can machine learning do here? And it could be something very straightforward. You know,
there's already an existing model, you just plug it in. Or it could be something very deep in terms
of requiring us to invent new algorithms, new frameworks, new architectures. And I've seen both, but you can
only do that with enough exploration to learn new areas where machine learning is still a big
challenge. And that's what I've done in all of these areas that you mentioned. And also trying
to see how that unifies, right? So we can go to each of these areas, try to solve the problems.
But along with the abstraction comes the benefit that once you invent these algorithms, we can apply them to many, many other areas. So is it that someone comes to you and says, here is a problem that is unique and could be a great application for ML?
And so let's seek out sort of new, potentially new algorithms that can then be applied to other areas.
I wish it were so systematic, but most of the time it starts with conversations, right? And this is what I think, especially the value of in-person and chance meetings are, whether it's at Caltech, whether it's NVIDIA or other conferences and events.
So it's learning new things and extracting knowledge from all that.
And many times also it's scientists reaching out to me to ask how machine learning could make a difference.
So it's a combination
of all this. But what I always try to do is to learn enough about it and keep my curiosity open
that even if it's not something I can help today, maybe in the future, right, this is something that
can be solved. When it comes to specific topics you mentioned, you know, in healthcare, we've been working on a number of areas. In
particular, I've been collaborating with Dr. Andrew Hung from USC, and now he's recently joined
Cedars-Sinai here in LA, who works on robotic surgery. And so he's a robotic surgeon. He has
a wealth of data and expertise in terms of collecting videos, both of live surgery,
as well as students being trained to become surgeons.
And so our goal was to ask, you know, can AI understand these gestures, right?
And this is such a safety critical application.
Although one day we may see AI completely take over human surgeons, I don't think it'll be anytime soon.
And it requires very careful testing.
So instead, we wanted to ask, how can AI help today the current surgeons as well as future surgeons for training?
And so this is where we are developing now systems that provide real-time feedback.
So as the surgery is happening, can you provide
feedback? And we've started testing this on trainees for doing it on 3D printed tissues.
And so with safety-critical applications, it's also a great lesson on how we want to start
in cases where safety is not critical and build our way up. And it's great to see
so many interdisciplinary areas come together. Of course, computer vision and video understanding
is important. There's also kinematics and robot control, but as well as the human interface aspect,
when you're having surgeons performing these careful maneuvers, how do you ensure that you give them
the right feedback? Don't distract them. And we are also personalizing that because some people
prefer audio feedback. Some may prefer like a quick vibration. So this is something that I
think is such a nascent area. And so by working closely, we are bringing back this lesson of how do we
bring AI to safety-critical applications? How do we think about the human interaction aspect
as we are delivering real-time feedback? Yeah. So I wonder, Anima, all of these areas
sound absolutely amazing. They sound like great sort of, you know, deep areas of research,
absolute dream for somebody who is in computing research. Do you feel like these opportunities
come your way because you've been able to sort of, you have the credibility of having been at
very strong academic institutions, as well as having understanding of how these things work
in industry and what does it mean to
actually build a product out of some of these concepts?
I think both Caltech and NVIDIA have been just great places for me to connect with amazing
people and having, of course, the history of working with interdisciplinary areas, I
think will encourage people to approach me more.
So that's probably the case.
But on the other hand, I'd also think of the conversations these days
and the connections these days can happen much more easily.
Before Twitter made big changes,
we used to have so many technical conversations on Twitter.
And so we need forums like that, where it becomes much more
democratized and easy to connect researchers across different domains, across different age
groups, different institutions, much more seamlessly. So we need a replacement, to be honest.
For sure. Yeah. I mean, I'm sure the pandemic had its own share of sort of challenges
thrown in with us not being able to meet. I'm sure there were many conferences. That's probably a
great hub to meet like-minded folks or folks that are working on interesting areas that you'd like
to collaborate with, and you don't have the ability to do that. Hopefully now with things
sort of, you know, returning back to in-person, there'll be more of that. Yeah, I do think certainly the in-person brainstorming has its qualities, but at the
same time, you know, the pandemic showed what can be done remotely, right? And especially
people with families or having long commutes, or disabilities, all these aspects that may prevent
them from always being there in person, we showed that we can also be
very productive. And this was also the time where we were able to do a lot of interdisciplinary
collaborations online and communicate across different areas because we had started developing
this framework of what I call neural operators, which are fundamentally a new way of expanding upon
standard neural networks to ones that can handle different resolutions, different shapes of grids.
And this is very applicable for solving partial differential equations and other phenomena on
continuous domains. You mentioned climate models, weather forecasting.
So weather is a phenomenon that happens all across the globe, right?
We may discretize and have measurements only at a few points,
but the underlying phenomenon is continuous.
And so we've developed frameworks that not only take in data at one resolution, but can
ingest training data at different resolutions, and more importantly, can tell you predictions
at any location, at any different resolution.
And I think this is fundamental because this allows us to now think about both a theoretical framework where we show we can accurately
capture these kind of complex multi-scale phenomena, but we can also see practical gains
where we've seen orders of magnitude speed ups. In the case of weather forecasting, it was about
45,000 times faster. In the case of climate change mitigation, where we modeled how carbon dioxide
would be pumped underground into wells with water, we were able to do this multi-phase flow modeling
very accurately and very fast, about 700,000 times faster than current numerical simulations. And so seeing both this theory get developed,
but taking that to practice
and working with so many amazing researchers
and collaborators with deep domain expertise
has been very fulfilling.
Yeah, those are some stupendous results, right?
And to see those kinds of results
come from the work that you're doing, I'm sure, is very gratifying.
What would you say, Anima, are sort of the biggest challenges then that you face today from your perspective in this field of work?
Yeah, so I think AI for science is something where we're only just getting started, in terms of bringing all these developments that we've seen in
generative AI to the scientific domains, right?
So when we think about text models like ChatGPT or image models like diffusion models,
we see such impressive results.
But right now, they're still in the creative realm.
You know, they don't have guarantees of being physically valid or
correct. You may generate a very fashionable-looking shoe, but it's not clear that in the real
world it would be comfortable or healthy on your feet. And to do that, we need to incorporate the
physics. We need to incorporate what I call multi-scale structure, right? So you really need to accurately model the physical
phenomenon or chemical phenomenon or biological phenomenon. And that could be like equations,
like I mentioned, like partial differential equations. It could be symmetries. For instance,
you know, the weather is happening on a sphere, right, on the globe, not on a flat surface. So those kinds
of geometries and symmetries we need to incorporate. So we need to think about all these
domain-specific constraints and knowledge, and how do we combine that with learning frameworks
like neural operators that I mentioned. And that brings about some deep optimization challenges.
So one thing that has happened in deep learning is to think of optimization as very simple.
Stochastic gradient descent, Adam, all of these optimizers are readily available in the software
packages, and they mostly work well. If you have enough data, they work well. But when it comes to scientific modeling, we need to be very mindful of the constraints
because maybe there isn't enough data, right?
So in fact, in the case of climate modeling, we have no data of the future, right?
That's impossible.
But we have equations and we have data of the past.
What was the weather over all these decades
since we started recording it?
And so how do we take that and extrapolate it
under climate change,
under now a new level of carbon dioxide?
So that extrapolation requires us
to use the knowledge of physics.
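As a toy illustration of combining sparse data with known physics (my own sketch, not the actual loss used in her climate work), imagine fitting a trajectory to a few past observations while also penalizing the residual of a known equation, here the simple decay law du/dt = -u; the physics term is what constrains the extrapolation where the data runs out:

```python
import numpy as np

def physics_informed_loss(u, t, t_data, u_data, lam=1.0):
    """Toy objective: match sparse observations AND satisfy du/dt = -u.
    The physics residual constrains the fit where there is no data."""
    # data term: squared error at the few observed points
    data_loss = np.mean((np.interp(t_data, t, u) - u_data) ** 2)
    # physics term: finite-difference residual of du/dt + u = 0
    residual = np.gradient(u, t) + u
    physics_loss = np.mean(residual ** 2)
    return data_loss + lam * physics_loss

t = np.linspace(0.0, 2.0, 201)
t_data = np.array([0.0, 0.5, 1.0])   # sparse "historical" measurements
u_data = np.exp(-t_data)

good = np.exp(-t)     # satisfies both the data and the equation
bad = 1.0 - 0.5 * t   # roughly fits early data, violates the physics
good_loss = physics_informed_loss(good, t, t_data, u_data)
bad_loss = physics_informed_loss(bad, t, t_data, u_data)
```

The candidate that obeys the equation scores far better even though both pass near the early data; in real physics-informed training, `u` would be a model's output and this combined loss would be minimized by one of the standard optimizers mentioned above.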
And so my one-line answer is we cannot escape physics,
which means we cannot escape optimization challenges. And so my earlier work
on tensor algorithms was to deeply study the optimization landscapes of these non-convex surfaces. So we need to go back to thinking about those fundamental
tools and how we can improve upon them. So in short, there are new challenges that standard deep
learning doesn't address. Yeah, no, thank you for that really detailed answer. The other
thing I also think about is, as you're trying to answer some
of these questions that none of us have thought about, and that many will obviously rely on
for guidance and decisions, I'm guessing governments and agencies will be
interested in the work that you're putting out. You know, no conversation about AI can go by without
talking about ethics in AI. And I know that that's
a very deep area of interest for you as well. I would love to hear your thoughts around that.
Like, why is that everyone's responsibility? How do we collectively take ownership of defining
and solving problems around ethics and AI? Yeah, that's absolutely right, Rashmi. It's much more urgent now as these models are out in the wild, right? So we have ChatGPT not always giving you the right answers, and the ease with which misinformation can flow, and how it is already taking over large parts of the internet, is disturbing. And, you know, I work with Mike Alvarez here at Caltech, where we recently launched a new center for science, society, and public policy.
And that's very timely because we want to understand how we can test these generative models at scale. So how would you even test them for bias or toxicity or misinformation? Because it's not
just giving one input or one dimension, right? So it's highly complicated to test this across
a distribution of different inputs that others may be using. So that's one aspect. The other one is,
of course, how this interacts with how humans perceive this knowledge and how humans interact. Mike Alvarez works on social media conversations, and we worked with him on using the earlier tensor algorithms for extracting topics at scale very quickly. And now the next stage of that is to understand how misinformation at
scale can be tackled, and how we provide everybody on this planet with the right tools to
glean what is real and what's not, right? So I think this is something we need to figure out
as a community. So that's one important aspect. The other is, of course,
thinking deeply about new algorithms that ideally come with some kind of guarantees or
some kind of principled approach: the more we build into it, the more we can trust the outputs,
whether it's robotic surgery, where if you bring that to live surgery, that is very critical. You need strong guarantees.
It's not just a few test cases that will convince anybody to take it there. So we need to architect
it in a way where we can prove that it is going to do the things we expect it to do. And same with
self-driving cars, drones, all these different areas. So by working
on those areas, I think it also forces us to tackle these algorithmic issues head-on. And
that'll help us move towards trustworthy AI in all domains.
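The tensor algorithms she refers to here, and earlier for extracting topics at scale, center on decomposing moment tensors, and their workhorse is the tensor power method. A minimal sketch on a hand-built, orthogonally decomposable two-component tensor (a toy of mine; the real topic-modeling pipeline first estimates and whitens empirical moment tensors):

```python
import numpy as np

def tensor_power_iteration(T, n_iter=100, seed=0):
    """Recover one component of T = sum_i w_i * v_i (x) v_i (x) v_i
    (orthogonal v_i) via the tensor power method."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = np.einsum('ijk,j,k->i', T, v, v)  # contract T along two modes
        v /= np.linalg.norm(v)
    w = np.einsum('ijk,i,j,k->', T, v, v, v)  # the component's weight
    return w, v

# Toy tensor built from two orthogonal "topics" with weights 3 and 1
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
T = 3.0 * np.einsum('i,j,k->ijk', e1, e1, e1) \
    + 1.0 * np.einsum('i,j,k->ijk', e2, e2, e2)
w, v = tensor_power_iteration(T)
```

The iteration converges to one of the true components; subtracting `w * v⊗v⊗v` from `T` and repeating ("deflation") recovers the rest. This fixed-point structure is exactly the kind of non-convex optimization landscape her tensor work analyzed.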
Yeah, no, I mean, these are difficult, difficult problems to solve, right? I mean, they inherently make your job so much more challenging. And one thing I've observed through all of your conversations, Anima, through this interview as well as previous ones, is that you have this aura of fearless leadership. You champion a thought or an idea and you stand by it. I would love to understand: how do you build that sort of a muscle?
Thank you.
That's very kind.
Yeah, you know, I think to me, it's all the support I have.
As I said, growing up, it was my parents who encouraged me to take risks, to
prepare for this entrance exam on my own. But also like now,
my husband Benedict is always just there for me as I think about very
hard challenges, and encourages me to go for it. That's wonderful. Really happy to hear that. And
thank you. Thank you for sharing that. It certainly is an inspiration to me and I'm sure to many of our listeners, because oftentimes some of these problems,
if you've never encountered them and they're in domains that are completely
unexplored, can seem pretty daunting. And I think to, you know, believe in your skills and
believe in your ability to break down a problem and just go for it, like you say,
is probably the best way to move forward.
I think of that as an adventure.
And I also, you know, for researchers who are younger and who may want to take fewer
risks, you can think of it as a portfolio approach, right?
So you could try something that is very new as an adventure, but you could also have another
project where, you know, things will mostly work out.
And so that way you hedge.
And if the new adventure pays off, that's great.
Otherwise you still learn something.
Absolutely.
So yeah, for our final bite, Anima,
I would love to understand
what are you most excited about
in the field of AI over the next
five years? So I'm very excited about the future in this era of generative AI and bringing that
to scientific domains, right? So can we think of one universal foundation model that could
understand different areas of sciences? What would that look like? What would be the kind of data that would
be available for that? How do we bring in the physics knowledge and all of these domain constraints?
How do we solve the optimization challenges? How do we incorporate stochasticity in such
complex domains, and handle the multi-scale nature of many of these datasets? So to me, I think we're just
getting started here. Wonderful. Lots of mathematics ahead of us. Math, engineering,
science, everything, social sciences for human interaction, everything, right? Perfect.
All of it. Wonderful. This has been such an inspiring conversation. Thank you so much for taking the time to speak with us at ACM ByteCast.
Thank you, Rashmi. It's been such a pleasure.
ACM ByteCast is a production of the Association for Computing Machinery's Practitioners Board. To learn more about ACM and its activities, visit acm.org.
For more information about this and other episodes, please visit our website at learning.acm.org.
That's learning.acm.org.