Microsoft Research Podcast - 063 - Competing in the X Games of machine learning with Dr. Manik Varma

Episode Date: February 13, 2019

If every question in life could be answered by choosing from just a few options, machine learning would be pretty simple, and life for machine learning researchers would be pretty sweet. Unfortunately, in both life and machine learning, things are a bit more complicated. That’s why Dr. Manik Varma, Principal Researcher at MSR India, is developing extreme classification systems to answer multiple-choice questions that have millions of possible options and help people find what they are looking for online more quickly, more accurately and less expensively. On today’s podcast, Dr. Varma tells us all about extreme classification (including where in the world you might actually run into 10 or 100 million options), reveals how his Parabel and Slice algorithms are making high quality recommendations in milliseconds, and proves, with both his life and his work, that being blind need not be a barrier to extreme accomplishment.

Transcript
Starting point is 00:00:00 In 2013, I thought there is no way we can learn 10 million or 100 million classifiers. And even if you could learn them, where would we store them? And even if you could store them, how would we make a prediction in a millisecond? And so I just turned away from one versus all approaches and we tried developing trees and embeddings. But today, we've actually managed to overcome all of those limitations. And the key trick is to go from linear time training and prediction to log time training and prediction. You're listening to the Microsoft Research Podcast, a show that brings you closer to the cutting edge of technology research and the scientists behind it. I'm your host, Gretchen Huizinga. If every question in life could be answered by
Starting point is 00:00:50 choosing from just a few options, machine learning would be pretty simple, and life for machine learning researchers would be pretty sweet. Unfortunately, in both life and machine learning, things are a bit more complicated. That's why Dr. Manik Varma, Principal Researcher at MSR India, is developing extreme classification systems to answer multiple choice questions that have millions of possible options and help people find what they're looking for online more quickly, more accurately, and less expensively. On today's podcast, Dr. Varma tells us all about extreme classification, including where in the world you might actually run into 10 or 100 million options, reveals how his Parabel and Slice algorithms are making high-quality recommendations in milliseconds, and proves with both his life and his work that being blind need not be a barrier to extreme accomplishment. That and much more on this episode of the Microsoft Research Podcast.
Starting point is 00:01:57 Manik Varma, welcome to the podcast. Thanks, Gretchen. The pleasure is entirely mine. So you're a principal researcher at MSR India and an adjunct professor of computer science at IIT Delhi. In addition to your work in computer science, you're a physicist, a theoretician, an engineer, and a mathematician. You were a Rhodes Scholar at Oxford and a university scholar when you did your doctoral work there. And you were a postdoc at MSRI Berkeley. And you're blind. I'm not going to call you Superman, but I would really love to know what has inspired you to get to this level of accomplishment in your life. What gets you up in the morning?
Starting point is 00:02:35 I guess it's a combination of my hopes and desires on the one hand, and I guess fears as well on the other. So I guess hopes and desires. I hope every day I get up, I learn something new. That's one of the best feelings I've had. And that's what's driven me all this while. And actually, that's why I'm in research, because I get to ask new questions and learn about new things throughout my career. So that's fantastic. And the other hope and desire that I have that drives me is to build things and build things that will help millions of people around the world. And I guess there's some fears lurking behind that as well. I've been worried about the imposter syndrome all my life. And yeah, so I guess the best way to tackle that is actually to try and do things and get things out there and have people use them and be happy with them. So I guess that's the fear that's driving the hopes and desires. So, Manik, all of this has been without the use of your eyes. How have you gone about all this? Right. So I have a condition where the cells in my retina deteriorate over time.
Starting point is 00:03:41 So in about 2009, I think I started using a blind stick and then I lost the ability to read papers, then recognize faces, stuff like that. But it makes life very interesting, right? I go up to my wife and ask, who are you? That's the secret to happily married life. Well, so you haven't been blind since birth? No. Well, it's a hereditary condition, but nobody else in my family has it. So it started deteriorating from the time I was in school and I lost the ability to do some things in school. But it's only over the last, let's say, 10 years where it's become really significant. But again, it's been no big deal also, right?
Starting point is 00:04:22 It's only in the last decade where I've had to think about it. And that's, I think, because of my kids, right? I want to set them an example that they can, because this is hereditary, there's a probability they might have it. And if they do, I don't want them to feel that they can't do something, right? So if they want, they can go climb Mount Everest or become the best historian or the best scientist or whatever. As long as they set their minds up, they can do that. We're going to talk about this relatively new area of machine learning research called extreme classification in a minute. But let's start a bit upstream of your current work first.
Starting point is 00:05:01 Give us a quick review of what we might call the machine learning research universe, as it were, focusing on a bit of the history and the types of classification that you work with. And this is just to give us a framework for how extreme classification is different from other methodologies and why we need it. So if you look at the history of machine learning, then the most well-studied problem is binary classification, where we learn to answer yes-no questions involving uncertainty. And then the community realized that actually there are many high-impact applications out there in the real world that are not just simple yes-no questions, right? They're actually multiple-choice questions. And this leads to the field of multi-class
Starting point is 00:05:45 classification and then after that the community realized that there's some high impact applications that are not only multiple choice but they also have multiple correct answers and this led to the establishment of the area of multi-label classification so just to take examples of all of these if you ask the question, is Gretchen speaking right now or not, then that's binary classification. Whereas if you turn this into a multiple choice question, such as who's speaking right now, is it Gretchen or Manik or Satyana Dela? So that's a multiple choice question that'll be multi-class classification. But now suppose you threw a cocktail party and you invited all the top machine learning AI scientists over there. And then you took a short clip and asked who's speaking
Starting point is 00:06:29 in this clip. That becomes a multi-label classification problem because now multiple people could be speaking at the same time. And so if you have L choices in front of you, then in multi-class, the output space is L dimensional or order L but if you have a multi-label then the output space is two to the power L because every person may or may not be speaking so you go from two choices in binary classification to tens to hundreds and thousands of choices in multi-class and multi-label learning and if you looked at the state of the art in about 2012 the largest multi-label data set out there had about 5,000 labels. And I remember all my colleagues like running their algorithms for weeks on big clusters to try and solve this problem because 2 to the power 5,000 is way more than the number of atoms in the universe.
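The jump from L outcomes in multi-class to 2 to the power L outcomes in multi-label can be sketched in a few lines of Python. This is a toy illustration built around the speakers from the example above, not code from any system discussed in the episode:

```python
# Toy illustration of the three settings described above.
# Binary: one yes/no answer. Multi-class: exactly one of L options.
# Multi-label: any subset of the L options, so 2**L possible outputs.
from itertools import chain, combinations

speakers = ["Gretchen", "Manik", "Satya"]  # L = 3 labels

binary_outputs = [True, False]             # is Gretchen speaking?
multiclass_outputs = list(speakers)        # exactly one speaker

# Every subset of speakers may be talking at once in the cocktail-party clip.
multilabel_outputs = list(
    chain.from_iterable(combinations(speakers, r) for r in range(len(speakers) + 1))
)

print(len(binary_outputs))      # 2
print(len(multiclass_outputs))  # 3
print(len(multilabel_outputs))  # 8, i.e. 2**3
```

With 5,000 labels the same subset count becomes 2 to the power 5,000, which is the number the guest compares to the atoms in the universe.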
Starting point is 00:07:19 So it's a hard problem. So this has been a problem for quite some time. It's not brand new, right? But it's getting bigger. Is this our issue? Right. And actually, that's how extreme classification got started. So as I mentioned, in 2012, the largest publicly available multi-label data set had about 5000
Starting point is 00:07:39 labels. But then in 2013, we published a paper which exploded the number of labels being considered in a multi-label classifier from 5,000 to 10 million. Wow. And that really changed the nature of the game. So the motivating application was to build a classifier that could be used as a tool by advertisers that would predict which Bing queries would lead to a click on the ad or the document. And you can well imagine from the context of the application that this is a really important problem from both the research as well as the commercial perspective.
Starting point is 00:08:15 And so many sophisticated natural language processing, machine learning, information retrieval techniques have been developed in the literature to solve this problem. But unfortunately, none of these were working for our ads team. They had billions of ads for which all these sophisticated approaches were not making good quality predictions. And so we decided to go back to the drawing board.
Starting point is 00:08:37 We set aside all of these approaches and simply reformulated the problem as a multi-label classification problem, where we would take the document as input, and then we would treat each of the top queries on Bing as a label. So we took the top 10 million monetizable queries on Bing, and now you just learn a classifier that will predict for this particular document or ad
Starting point is 00:08:59 which subset of top 10 million Bing queries will lead to a click on the ad. Top 10 million. Yeah, so from 5,000 to 10 million. This was just a very different and completely new way of looking at the problem. And it took us two years to build this system, run the experiments, publish our results and check everything out. But once the results came in, we found that our approach was much better than all these traditional approaches. So the number of ads for which we were making good quality recommendations went up from about 60% for the Bing system to about 95, 98% for us. And the quality of our recommendations also improved a lot. And so that led to the establishment of the area of extreme
Starting point is 00:09:43 classification, which deals with multi-class and multi-label problems in extremely large label spaces, say millions or billions of labels. And I think that's exactly why extreme classification grew to be a whole new research area in itself. And that's because I think fundamentally new research questions arise when you go from, let's say, 100 labels to 100 million labels. Let me just give you a couple of examples, if you'll permit me the time. Yes, please. The whole premise in supervised machine learning is that there's an expert sitting out there who we can give our data to, and he or she will label the data with ground truth: what's the right answer, right?
Starting point is 00:10:25 What's the right prediction to make for this? But unfortunately, in extreme classification, there is no human being who can go through a list of 100 million labels to tell you what are the right predictions to make for this data point. So even the most fundamental machine learning techniques, such as cross-validation, might go for a toss at the extreme scale. And you'll have missing labels in your test set, in your validation set, in your training set. And this is like a fundamental difference that you have with traditional classification, where a human being could go through a list of 100 labels and mark out the right subset. Another really interesting question is that the whole notion of what constitutes a good prediction changes when you go from 100 labels to, let's say, 100 million. When you have 100 labels, you need to go through that list of 100 and say, okay, which labels are relevant, which labels are irrelevant. But when you have 100 million,
Starting point is 00:11:21 nobody has the time or patience to go through it. So you need to give your top five best predictions very quickly. And you need to have them ranked, with your best prediction at the very top and then the worst one at the bottom. And you need to make sure that you handle this missing labels problem, because some of the answers that you predict might not have been marked by the expert. So all of this changes as you go from one scale to the next scale. Let's talk for a second about how extreme classification can be applied in other areas besides advertising. Tell us a little bit about the different applications in this field and where you think extreme classification is going. I think one of the most interesting questions that came out of our research was, when or where in the world will you ever have 10 million or 100 million labels to choose from? If you just think about it for a minute, a hundred million is a really large number. Just to put it
Starting point is 00:12:32 in context, to see how big a number this is: when I was doing my PhD in computer vision, the luminaries in the field would get up and tell the community that, according to Biederman's counts, there are only 60,000 object categories in the world. So none of the classical visual problems will make the cut. And even if you were to pick up a fat Oxford English dictionary, it would have somewhere around half a million words in it. So many traditional NLP problems might not make the cut either. And over the last five years, people have actually found very high-impact applications of extreme classification. And so, for example, one of them leads to the reformulation
Starting point is 00:13:11 of well-known problems in machine learning like ranking and recommendation, which are critical for our industry. So suppose you wanted to, for instance, design a search engine, right? You can treat each document on the web as a label, and now, when a query comes in, you can learn a classifier that will take the query's feature vector as input and predict which subset of documents on the web are relevant to this particular query. And so then
Starting point is 00:13:38 you can show those documents, and you can rank them on the strength of the classifier's probabilities. And you can reformulate ranking as a classification problem. And similarly, think about recommendation, right? So suppose you were to go on to a retailer's website. They have product catalogs that run into the millions, if not hundreds of millions, and so no human being can go through the entire catalog to find what they're looking for, and therefore recommendation becomes critical for helping users find the things they're looking for. And now you can treat each of the 100 million products that the retailer is selling as a particular category, and you learn a classifier that takes the user's feature vector as input and simply predicts which subset of categories is going to be of interest to the user. And you recommend the items corresponding to those categories to the user.
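The reformulation described here, one classifier per product label whose scores are sorted into a short ranked list, can be sketched as follows. This is a hedged toy illustration with random weights, not the actual Parabel or Slice machinery; the `recommend` function and the tiny dimensions are invented for the sketch:

```python
# Toy sketch of recommendation as multi-label classification:
# one linear classifier per product label, score them all, return the
# top-k labels. Real extreme classifiers avoid this brute-force scan.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_labels = 16, 1_000   # tiny stand-ins for millions of products

W = rng.standard_normal((n_labels, n_features))  # one weight vector per label

def recommend(user_features, k=5):
    """Score every label and return the top-k label ids, best first."""
    scores = W @ user_features           # one score per product label
    return np.argsort(-scores)[:k].tolist()

user = rng.standard_normal(n_features)   # a user's feature vector
print(recommend(user))                   # five product ids, best first
```

Note that the brute-force `W @ user_features` scan is exactly the linear-in-labels cost that the algorithms discussed later are designed to avoid.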
Starting point is 00:14:28 And so you can reformulate ranking and recommendation as extreme classification. And sometimes this can lead to very large performance gains as compared to traditional methods such as collaborative filtering or learning to rank or content-based methods. And so that's what extreme classification is really good for. Let's talk about results for a minute. How do we measure the quality of any machine learning classification system? I imagine there are some standard benchmarks, but if it's like any other discipline,
Starting point is 00:14:55 there are trade-offs and you can't have everything. So tell us, how should we think about measurement? What kinds of measurements are you looking at? And how does extreme classification help there? So the axes along which we measure the quality of a solution are training time, prediction time, model size, and accuracy. So let's look at these one by one. And they're all linked also.
Starting point is 00:15:19 So if you look at training time, this is absolutely critical for us. We cannot afford to let our extreme classifiers take months or years to train. So think of the number of markets in which Bing operates all over the world. If it were to take you, let's say, two days to train an extreme classifier, then every time you wanted to deploy in a new market, that would be two days gone. Every time you wanted to run an experiment where you change a feature, two days gone. Every time you want to tune hyperparameters, two days gone. And again, when I'm saying two days, this is probably on a cluster of a thousand cores, right? So how many people or how many groups
Starting point is 00:16:00 have access to those kinds of clusters? Whereas if you could bring your training time down to, let's say, 17 minutes on a single core of a standard desktop, now you can run millions of experiments and anyone can run these experiments on their personal machines. And so the speed with which you can run experiments and improve the quality of your product
Starting point is 00:16:21 and your algorithm, there's a sea change in that. Yeah. So training time becomes very important. But as you pointed out, right, you want to maintain accuracy. So you can't say, oh, I'll cut down my training time, but then I'll make really bad predictions. That's not acceptable. And then the second thing is model size. So if your classifier is going to take, let's say, a terabyte to build its model, then your cost of
Starting point is 00:16:46 making recommendations will go up. You'll need to buy more expensive hardware. And again, the number of machines that have a terabyte is limited, so the number of applications you can deal with in one go is limited. So again, you want to bring your model size down to a few gigabytes, so that it can fit on standard hardware and anybody can use this for making predictions. And again, there's a trade-off between model size and prediction time, right? You can always trade speed for memory. But now the question is, how can you bring down your model size to a few gigabytes without hitting your prediction time too much? And the reason prediction time is important is because, again, at the scale of Bing, we might have 100 billion documents that we need to process regularly. So if your prediction time is more than a few milliseconds per document, then there is no way you can make 100 billion predictions in a reasonable amount of time.
Starting point is 00:17:38 So your solution simply will not fly. And all of this, as I said, is tied to accuracy, because people will not suffer poor recommendations. No, I won't. Well, so circling back, how does extreme classification impact your training time, model size, speed, accuracy, all of that? If you look at the papers that we are publishing now, like Parabel and Slice, we can train on that same data set in, as I said, 17 minutes on a single core of a standard machine. And so we've really managed to cut down training time, let's say by 10,000 times, over the span of five years. We've managed to reduce our model size from a terabyte to a few gigabytes. Our prediction time is a few milliseconds per test point. And we managed to increase accuracy from about 19% precision at one to about 65% today.
Starting point is 00:18:43 So precision at one is, if you just look at the top prediction that was made, was that correct or not? So on a standard benchmark data set, it was 19% in 2013, and we managed to increase that to 65% today. So there's been a remarkable improvement in almost all metrics over the last five, six years. You just mentioned Parabel and Slice, and let's talk about those right now. First of all, Parabel: it's a hierarchical algorithm for extreme classification. Tell us about it and how it's advanced both the literature of machine learning research and the field in general. So for the last five years, we've been trying out lots of different approaches to extreme classification. We tried out trees, we tried out embeddings and deep learning, and we looked at one versus all approaches.
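Precision at one, as defined here, is the k = 1 case of the precision-at-k metric used on the extreme classification benchmarks. A minimal sketch, with an illustrative helper rather than code from the benchmark repository:

```python
# Minimal sketch of precision at k: of the top-k ranked predictions,
# what fraction are actually relevant?

def precision_at_k(predicted, relevant, k):
    """predicted: label ids ranked best first; relevant: ground-truth set."""
    top_k = predicted[:k]
    return sum(1 for label in top_k if label in relevant) / k

# Top prediction correct -> precision at 1 is 1.0.
print(precision_at_k([7, 3, 9], {7, 9}, k=1))  # 1.0
# Two of the top three are correct -> 2/3.
print(precision_at_k([7, 3, 9], {7, 9}, k=3))
```

Because ground-truth labels are often missing at the extreme scale, a relevant prediction absent from the marked set can unfairly lower this score, which is the missing-labels issue raised earlier in the conversation.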
Starting point is 00:19:27 And in a one versus all approach, you have to learn a classifier per label. And then you use all of them at prediction time to see which ones have the highest scores, and those are the classes or the labels you recommend to the user. And in 2013, I thought there is no way we can learn 10 million or 100 million classifiers. And even if you could learn them, where would we store them? And even if you could store them, how would we make a prediction in a millisecond? And so I just turned away from one versus all approaches, and we tried developing trees and embeddings. But today, we've actually managed to overcome all of those limitations. And the key trick is to go from
Starting point is 00:20:07 linear time training and prediction to log time training and prediction. And originally I thought this is not possible, right? Because you have to learn one classifier per label. So if you have 100 million labels, you have to have 100 million classifiers, so your training time has to be linear in the number of labels. But with Parabel, we managed to come up with a log time training and a log time prediction algorithm. And the key is to learn a hierarchy over your labels. So each node in the hierarchy inherits only about half of its parent's labels. So there's an exponential reduction as you go down the hierarchy.
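The exponential halving described here is what buys logarithmic-time prediction. A hedged sketch of the idea, using a balanced binary label tree with a stand-in for the trained node classifiers; Parabel's actual tree construction and node classifiers are described in the paper:

```python
# Sketch of log-time prediction over a balanced binary label tree.
# Each internal node keeps roughly half of its parent's labels, so a
# root-to-leaf walk visits O(log2 L) nodes instead of scoring all L labels.
# `go_right` stands in for a trained node classifier choosing a branch.

def predict_leaf(labels, go_right):
    """Walk down a balanced tree over `labels`; return (leaf label, steps)."""
    steps = 0
    while len(labels) > 1:
        mid = len(labels) // 2
        labels = labels[mid:] if go_right(labels) else labels[:mid]
        steps += 1
    return labels[0], steps

labels = list(range(1024))                        # L = 1024 labels
leaf, steps = predict_leaf(labels, lambda c: True)
print(leaf, steps)                                # 1023 10 (log2(1024) = 10)
```

For L = 1024 labels the walk takes 10 branching decisions instead of 1024 classifier evaluations, and the same gap is what separates milliseconds from minutes at 100 million labels.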
Starting point is 00:20:43 And that lets you make predictions in logarithmic time now the trick in parable was how do you make sure that the training is also logarithmic time because at the leaves of the hierarchy you will have one let's say label in the leaf node and so you have to have at least as many leaves as classified so that gives you back to linear costs but somehow par, you'll need to read the paper in order to get the trick. It's a really cute trick and where you can get away with long time training. And so that's why Parable manages to get this 10,000 times speed up in prediction, in training, in reduction in model size and accuracy is great. It's a really cool algorithm. More recently, you've published a paper on what you've called the Slice algorithm.
Starting point is 00:21:25 What's the key innovation in Slice? How is it different from, or how does it build on, Parabel? Right. So Slice is also a one versus all algorithm, where you learn a classifier per label. It also has log time training and prediction, but it's based on a completely different principle. So in Parabel, our approach was to learn a hierarchy, so you keep splitting the label space in half as you go down each node.
Starting point is 00:21:52 Slice has a very different approach to the problem. It says, okay, let me look at my entire feature space. And now, if I look at a very small region in the feature space, then only a small number of labels will be active in this region. So now, when a new user comes in, if I can find out very quickly what region of feature space he or she belongs in, then I can just apply the classifiers in that region. And that's the key
Starting point is 00:22:18 approach that Slice takes to cutting down training time and prediction time from linear to logarithmic. And it's about to appear in WSDM this month. And it scales to 100 million labels. I mean, if you look at ImageNet, right, that was a thousand classes with a thousand training points per class, and now we have a hundred million classes. So we used it for the problem of recommending related searches on Bing. So when you go and type in a query on a search engine, the search engine will often recommend other queries that you could have asked that might have been more helpful to you, or that will highlight other facets of the topic. And so for obvious queries, the recommendations are pretty standard, and everyone can do a good job.
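The region idea behind Slice can be sketched as a two-stage lookup: shortlist the labels active near a point in feature space, then score only those. A real system uses an approximate nearest-neighbor index; the brute-force stand-in below, and every name and size in it, is invented for illustration:

```python
# Hedged sketch of region-based prediction: shortlist labels near x in
# feature space, then apply only the classifiers on the shortlist.
import numpy as np

rng = np.random.default_rng(1)
n_features, n_labels = 8, 500

centroids = rng.standard_normal((n_labels, n_features))  # one region per label
W = rng.standard_normal((n_labels, n_features))          # per-label classifiers

def predict(x, shortlist_size=20, k=5):
    """Shortlist labels near x, score only those, return top-k label ids."""
    dists = np.linalg.norm(centroids - x, axis=1)
    shortlist = np.argsort(dists)[:shortlist_size]       # labels active near x
    scores = W[shortlist] @ x                            # score 20, not 500
    return shortlist[np.argsort(-scores)[:k]].tolist()

x = rng.standard_normal(n_features)
print(predict(x))    # five label ids drawn from the local shortlist
```

Replacing the brute-force distance scan with a sub-linear nearest-neighbor structure is what turns this sketch's per-query cost from linear in the number of labels into something closer to logarithmic.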
Starting point is 00:23:02 The real fight is in the tail, for these queries that are rare, that we don't know how to answer well. Can you still make good quality recommendations? And there, getting even a 1% lift in the success rate is a big deal. Like, it takes dedicated time and effort to do that. And we managed to get a 12% lift in performance with Slice in the success rate. So that's like really satisfying. Yeah. And you had some other percentages that were pretty phenomenal too in other areas. Can you talk about those a little? Right. So we also saw increases across the board in trigger coverage, suggestion density, and so on. So trigger coverage is the number of queries for which you could make
Starting point is 00:23:46 recommendations. And we saw a 52% increase in that. 52. Yeah, that's right. That was amazing. Statistically significant on steroids. And then the suggestion density is the number of recommendations you make per query. And there was a 33% increase in that as well. So, yeah, we had pretty significant lifts. And I'm very glad to say like Slice is making billions of recommendations so far. And people are really happy. It's really improved the success rate of people asking queries on Bing so far. That's cool. Speaking of where people can
Starting point is 00:24:25 find stuff, I imagine a number of listeners would be interested to plunge in here. Where can they get this? Is it available? Where are the resources? So we maintain an extreme classification repository, which makes it really easy for researchers and practitioners who are new to the area to come in and get started. If you go to Bing and search for the extreme classification repository or your favorite search engine, you can find it very easily. And there you'll find not just our code, but you'll find data sets, you'll find metrics on how to evaluate the performance of your own algorithm, you'll find benchmark results showing you what everybody else has achieved in the field so far. You'll find publications. And if you look at my homepage,
Starting point is 00:25:10 you'll also find a lot of talks. So you can go and look at the recordings to explore more about whatever area fascinates you. And all of this is publicly available, freely available to the academic community. So people can come in and explore whatever area of extreme classification they like. So, Manik, we've talked about the large end of extreme classification, but there's another extreme that lies at the small end of the spectrum, and it deals with really, really,
Starting point is 00:25:53 really tiny devices. Tell us a bit about the work you've done with what you call resource-efficient ML. Yeah, that's the only other project I'm working on. And that's super cool too, right? Because for the first time in the world, we managed to put machine learning algorithms on a microcontroller that's smaller than a grain of rice. Think of the possibilities that opens up, right? So you can now implant these things in the brains of people who might be suffering from seizures
Starting point is 00:26:22 to predict the onset of a seizure. You could put them all over a farm to try and do precision agriculture, to tell you where to water, where to put fertilizer, and where to put pesticide, and all of that. The applications are completely endless, especially once you start thinking about the Internet of Things: any number of applications in the medical domain, in smart cities, smart housing. So in 2017, we put two classical machine learning algorithms based on trees and nearest neighbors, called Bonsai and ProtoNN, onto this microcontroller. It has only two kilobytes of RAM, 32 kilobytes of read-only flash
Starting point is 00:26:58 memory, and no hardware support for floating point operations, and yet we managed to get our classifiers to run on them. And then last year, we released two recurrent neural network algorithms called FastGRNN and EMI-RNN. And again, all of these are publicly available from GitHub. So if you go to github.com slash Microsoft slash EdgeML, you can download all these algorithms and play with them and use them for your own applications. So while we're on the topic of your other work, you said that was the only other project you're working on, but it isn't.
Starting point is 00:27:30 And maybe they're tied together, but I've also heard you're involved in some rather extraterrestrial research. Can you tell us about the work you're doing with SETI? Yeah, so they're related, but apparently some of these algorithms that we've been developing could have many applications in astronomy and astrophysics. So if you look at our telescopes right now, they're receiving data at such a high rate, that it's not possible to process all of this data or even store all of this data. So because the algorithms we've developed are so efficient, if we could put them on the telescope itself, it could help us analyze all types of signals that we are receiving, including perhaps our search for extraterrestrial intelligence. So that's a really cool project being run out of Berkeley. But there are also lots of other applications because if you're trying to put something on a satellite, I'm told by my astronomer friends that the amount of energy that an algorithm can consume is very limited because energy is at a premium
Starting point is 00:28:35 out there in space. And so things that are incredibly energy efficient or have very low response time are very interesting to the astronomers. So, Manik, I ask all my guests some form of the question, is there anything that keeps you up at night? You're looking at some really interesting problems, big ones and small ones. Is there anything you see that could be of concern? And what are you doing about it? Extreme classification touches people. People use it to find things they're looking for. And so they reveal a lot of personal information. So we have to be extremely careful that we behave in a trustworthy fashion, where we make sure that everybody's data is private to them. And this is not just at the individual level,
Starting point is 00:29:26 but also the corporate level, right? So if you're a company that's looking to come to Microsoft and leverage extreme classification technology, then again, your transaction history and your logs and stuff, we'll make sure that those are private and you can trust us and we won't share them with anybody else. And because we're making recommendations, there are all these issues about fairness,
Starting point is 00:29:47 about transparency, about explainability. And these are really important research challenges and ones that we are thinking of at the extreme scale. At the beginning of the podcast, we talked briefly about your CV and it's phenomenally impressive. But tell us a little bit more about yourself. How did your interest in physics and theory and engineering and math shape what you're doing at Microsoft Research? Yeah, so because I've been exposed to all of these different areas, it helps me really
Starting point is 00:30:18 appreciate all the different facets of the problem that we're working on. So the theoretical underpinnings are extremely important. And then I've come to realize how important the mathematical and statistical modeling of the problem is. And then, once you have built the model, there's engineering a really good quality solution: how to do that, what kind of approximations to make, so that you start from theoretically well-founded principles but then make good engineering approximations that will help you deliver a world-class solution. And so it helps me look at all of these aspects of the problem and try and tackle them holistically. So what about the physics part? Actually, to tell you the truth, I'm a miserable physicist. I completely failed. Yeah, I'm not very good at physics, unfortunately, which is why I switched. You know, I gotta be honest, I'll bet your bad at physics is way better than my good at physics. Well, it's time for my guests to offer some parting advice or inspiration to listeners, many of them aspiring researchers in the field of AI and machine learning.
Starting point is 00:31:32 What big, interesting problems await researchers just getting into the field right now? Yeah, we've been working on it for several years, but we barely scratched the surface. I mean, there's so many exciting and high impact problems that are still open and need good quality solutions, right? So if you're interested in theory, even figuring out what are the theoretical underpinnings of extreme classification, how do we think about generalization error? And how do we think about the complexity of a problem that would be such a fundamental contribution to make. If you're interested in engineering or in the machine learning side of things, then how do you come up with algorithms that bring down your training prediction and cost and model size from linear to
Starting point is 00:32:14 logarithmic? So can we have an algorithm that is log time training, log time prediction, log model size, and yet it is as accurate as a one versus all classified that has linear costs and if you're interested in deep learning then how can you do deep learning at the extreme scale how do you learn at the scale where you have a hundred million classes to deal with how do you personalize extreme classification if you wanted to build something that was be personalized for each user how would you do that and if you're interested in applications then how do you take extreme classification and apply it to all the really cool problems that there are in search advertising recommendation
Starting point is 00:32:57 retail computer vision all of these problems are open and we're looking for really talented people to come and join us and work with us on this team. And location is no bar. No matter where you're in the world, we're happy to have you. Level is no bar. As long as you're passionate about having impact on the real world and reaching millions of users, we'd love to have you with us. So we've talked about going from 100 to 100 million labels. What's it going to take to go from 100 million to a billion labels, I want to go to an infinite set of labels. That would be the next extreme and extreme classification. And that's really important, again, for the real world,
Starting point is 00:33:51 because if you think about applications on Bing or search or recommendation, you have new documents coming on the web all the time, or you have new products coming into a catalog all the time. And currently, our classifiers don't deal with that. Thankfully, we managed to cut down our training costs so that you can just retrain the entire classifier periodically when you have a batch of new items come in. But if you have new items coming in at a very fast rate, then you need to be able to deal with them from the get go. And so we now are starting to look at problems where there's no limit on the number of classes that you're dealing with. Well, I imagine the theory and the math and then the
Starting point is 00:34:30 engineering to get us to that level is going to be significant as well. But it'll be a lot of fun. You should come and join us. Come on, it'll be fun, he said. Manik Varma, it's been an absolute delight talking to you today. Thanks for coming on the podcast. Thank you, Gretchen. As I said, the pleasure is entirely mine. To learn more about Dr. Manik Varma and how researchers are tackling extreme problems with extreme classification, visit Microsoft.com slash research.
