ACM ByteCast - Ilias Diakonikolas - Episode 76

Episode Date: October 22, 2025

In this episode of ACM ByteCast, Bruke Kifle hosts 2024 ACM Grace Murray Hopper Award recipient Ilias Diakonikolas, Professor at the University of Wisconsin, Madison, where he researches the algorithmic foundations of machine learning and statistics. Ilias received the prestigious award for developing the first efficient algorithms for high-dimensional statistical tasks that are also robust, meaning they perform well even when the data significantly deviates from ideal modelling assumptions. His other honors and recognitions include a Sloan Fellowship, the NSF CAREER Award, the best paper award at NeurIPS 2019, and the IBM Research Pat Goldberg Best Paper Award. He authored a textbook titled Algorithmic High-Dimensional Robust Statistics. In the interview, Ilias describes his early love of math as a student in Greece, which led him on a research journey in theoretical statistics and algorithms at Columbia University and, later, at UC Berkeley. He defines “robust statistics” and how it aids in detecting “data poisoning.” Ilias and Bruke explore statistical vs. computational efficiency, the practical applications of this research in machine learning and trustworthy AI, and future directions in algorithmic design. Ilias also offers valuable advice to future researchers.

Transcript
Starting point is 00:00:00 This is ACM ByteCast, a podcast series from the Association for Computing Machinery, the world's largest educational and scientific computing society. We talk to researchers, practitioners, and innovators who are at the intersection of computing research and practice. They share their experiences, the lessons they've learned, and their own visions for the future of computing. I am your host, Bruke Kifle. Today we explore the foundations of robust algorithmic statistics, the study of designing algorithms that can reliably learn from real-world data, even when it's noisy or corrupted.
Starting point is 00:00:39 Such methods are increasingly vital in today's world, where data is abundant, but oftentimes messy, incomplete, or even adversarial, making robustness essential for building reliable and trustworthy AI systems. Our next guest is Professor Ilias Diakonikolas, who is on the faculty of the University of Wisconsin-Madison and was recently awarded the prestigious 2024 ACM Grace Murray Hopper Award for his breakthrough work in high-dimensional robust statistics, solving problems that had puzzled researchers since the 1960s. He co-authored the textbook Algorithmic High-Dimensional Robust Statistics and has been honored with an NSF CAREER Award, a Sloan Fellowship, and a NeurIPS Best Paper Award.
Starting point is 00:01:23 Professor Ilias, welcome to ACM ByteCast. Good to be with you, Bruke. You know, I like to start off with kind of a question to understand your origins. And so as you kind of look back on your personal journey and your path into the field of computing and, you know, perhaps theoretical computer science and algorithms, what are some key inflection points over the course of your life that have led you into this domain? So I grew up in Greece, in Athens. So my undergraduate studies were not in computer science, they were in electrical and computer engineering at the National Technical University of Athens. So the curriculum of that school is quite broad, so I was able to focus more on computer
Starting point is 00:02:06 science courses towards the second half of the studies. It's like a five-year program. So I distinctly remember that the semester when I took algorithms, so I really liked math as a high schooler and also during undergrad, but somehow that semester of taking algorithms for the first time changed my life. Because I found this to be the most interesting topic that I have ever encountered. That's the time when I made the decision to do a PhD in computer science, and in particular, with a focus on theoretical questions, with focus on algorithms.
Starting point is 00:02:37 So after that, I moved for a PhD to New York, so at Columbia University. This is, you know, the time when I started doing research, so I didn't have any experience of undergraduate research. And, you know, I'm here now. Very interesting. And maybe as you look back, was there, you know, a particular moment or project that highlighted to you, the importance or sort of the value of algorithms and potentially specifically robustness in data analysis?
Starting point is 00:03:06 Right. So, you know, throughout the sort of my career, I have worked on many different algorithm problems. Like, for example, my thesis, my PhD thesis was on a completely different topic. It wasn't even within the span of learning theory. It was on approximation algorithms for optimization problems with multiple objectives. So something that is really completely different. However, during the PhD, I also had the opportunity to immerse myself into sort of questions
Starting point is 00:03:34 in learning theory that didn't end up being part of my thesis, but I did a lot of work in learning theory during the time of my PhD, various topics, various theoretical topics. Sort of this topic of robustness came up after I had finished my PhD, sort of during my postdoc at Berkeley. I was thinking on various questions of unsupervised. learning, in particular something that's called dense estimation. This is like the problem of estimating a distribution from observations, from independent samples.
Starting point is 00:04:05 And during that time, I did a bunch of work on this topic in a completely different setting than the one related to this awarded work. At that point, I realized that robustness is important, that this assumption that we typically make, that you get IID samples that follow, you know, your model and you get super clean data is something that, of course, is unrealistic. Everyone can imagine that. But there is also sort of a crucial difference between algorithms that work under this assumption and algorithms that do not need it. So in particular, I realized like during those years, maybe 2011 to 2013, that essentially all algorithms that we have for high dimensional data analysis are not robust.
Starting point is 00:04:49 They're very, very sensitive to errors in the inputs. And this is what gave me the motivation to work in the field. So really, the motivation didn't come from any, of working on this, at least initially, did not come from any practical considerations. It's not that I had a specific data set or specific application in mind. I just had this theoretical question in mind. And in fact, before we sort of solved the first version of this problem, which happened around 2014 when I was on the faculty of the University of Edinburgh,
Starting point is 00:05:22 we didn't even know the robust statistics literature. well. We defined this problem, we thought it was natural, and we solved it, and then we understood the literature after that. Very interesting. And, you know, I think you kind of described sort of the core problem a bit, but, you know, for those who are less familiar, maybe if you could help me concretely understand, when you say robust statistics, what does that refer to and why does that matter in today's world? Right. Okay, so everyone understands to some degree what the statistics means, right? So basically, you get observations and you want to use them to make an estimation or to make a deduction inference. So the most basic task in this setting, arguably is
Starting point is 00:06:06 that of estimating the mean of this. We get many samples, let's say, distributed, assumed to be distributed as a Gaussian. And what you do not know is you do not know the mean of the Gaussian. You do not know the standard deviation. And your goal is to estimate those parameters. And as a result of estimating those parameters, you also estimate the underlying distribution. So what is the easiest way to estimate the mean of a distribution on numbers? You just take the average, right? You sum up all the samples and you divide by the number of observations. And it turns out that this is a very good estimator for the mean under this vanilla
Starting point is 00:06:43 setting of having clean IID observations. And it turns out that as soon as you depart from this assumption, all bets are off. And sort of this is what robust statistics studies. It studies the possibilities and limitations of estimating distributions when you cannot make assumptions for a small fraction of your dataset. Let's say you observe a million data points, 90% of them actually come from the model that you're interested in, and 10% of them you have no model for. So what do you do?
Starting point is 00:07:19 So in this particular example that we talked about of estimating the mean, If you actually use the average, it would be terribly off. Even a single point that doesn't come through the model could skew the empirical mean arbitrary. The question is what to do? Now, for this vanilla-sharing problem, there is a simple answer since at least the 60s. People knew that if you just take the median of your dataset
Starting point is 00:07:47 as opposed to the empirical mean as opposed to the average, then this is going to be robust. and it's going to be robust to even a constant fraction of incorrect observations. The question that we addressed was, like, how do you solve the problem when your observations are not numbers, but they are high-dimensional vectors? So is it the high-dimensional version of mean estimation and, you know, generalizations of this? I see. So to some extent, is it fair to describe robustness as a form of, quote-unquote, air-tolerance
Starting point is 00:08:21 in algorithms? You know, that's a perfect way of phrasing it. Okay, so you want algorithms that perform well, even in the presence of errors. And then it's very important how you define this notion of error. Okay, so the definition of error changes the type of algorithm that you would use and it changed what is possible and what is not. So when you think about real-world scenarios where this problem arises, whether it's noisy data or corrupted data,
Starting point is 00:08:50 what are maybe some compelling examples and how do robust algorithms actually help manage that? Right. One thing I would like to say is that we've been using the estimation of a mean of a random variable as a running example because this is something that I expect everyone with basic background can relate to.
Starting point is 00:09:09 But to actually solve real world problems, we need to be able to robustly estimate much more complex statistical tasks. So there are a number of applications. like the easiest one to describe is something that's called data poisoning in engineering. So these are like training time
Starting point is 00:09:28 attacks. These are all jargon of adversarial ML. So what does this mean? Let's say you have some kind of machine learning system. Let's say a recommendation system, Amazon and the inputs to the system come from the outside. Okay, you have various users giving recommendations. And now
Starting point is 00:09:44 imagine that there is a statistical algorithm taking those recommendations and making predictions. So What happens if, let's say, you have significant fraction of the users, let's say 20%, that try to give polluted data, malicious data to the system as inputs, how would this affect the predictions of the system? So ideally, what you want to do is detect those sort of inputs as malicious and ignore them in your predictions, because if you don't, they will pollute the result.
Starting point is 00:10:15 So you could view a robust algorithm as a method of doing so, of efficiently detect. and removing the outlying data, the malicious data. Now, the difficulty of doing that is that, you know, you don't know necessarily a priori if a data point is malicious or not. If you think about this problem geometrically, you can think, let's say, you have a cluster of points that look kind of like a sphere, and then you have a point that's very far from this sphere. You could imagine reasonably that this point is an outlier,
Starting point is 00:10:47 but these are like obvious outliers that we knew how to detect and remove a long time ago. But unfortunately, the high dimensions, these are not the only outliers that we need to deal with in the sense that there are much more subtle outliers that could arise, that you cannot just eyeball. And detecting those is important. Maybe not to get too technical,
Starting point is 00:11:10 but what are some of the primary methodologies that you can actually detect and remove these outlier data points? As you mentioned, I think in more traditional settings, it might be easier to say if it's outside, you know, a typical range or whatever it might be. So how do you actually undergo the task of flagging something as adversarial data or an outline piece of data? And I'm sure this is decades' worth of research, but... Yeah, so this is kind of the main technical contribution of this line of works.
Starting point is 00:11:39 I'm not sure I can describe it in a way that is going to be, let's say, approachable to a non-technical audience. But roughly speaking, the idea is that... you need to sort of understand the global structure of the data sets, as opposed to the local structure. Like the oboos outliers that you can detect just by eyeballing the dataset, the ones that are, let's say, far out, are basically sort of depend only on distances between points. Basically, you have that this outlier is going to be far from the rest of the dataset. So this is kind of, in some sense, a local property. Unfortunately, sort of in high dimensions, the outliers that you need to detect,
Starting point is 00:12:19 depend on more global properties of the data set, and this is a challenge. If you want me to tell you the algorithm, like for the most basic definition, the kind of the idea is to, at least for the basic task of mean estimation, is to use the data, use the entire data sets, to find specific directions in which the outliers stand out. So essentially what you do is you use the data to reduce your high dimensional problem to many low dimensional problems
Starting point is 00:12:53 and then for each one of those carefully selected low dimensional problems the outliers can stand out I see, okay so somewhat a stage of dimensionality reduction to help better uncover. Yes, yes, but the way that you do, this
Starting point is 00:13:09 dimensionality reduction is non-standard. In particular, like you cannot use random projections, which is something that people do for other purposes. for example, for the purpose of compression. In the setting of robust statistics, random projections probably fail. You need to look at too many of them to be able to get any useful information.
Starting point is 00:13:31 I see. Okay. Super interesting. Now, you know, as you think about how to find the practical application of these kind of breakthroughs, I'm sure one of the biggest considerations is balancing statistical efficiency with computational efficiency. And I understand that your work has done a good job of actually, balancing that. So how do you generally think about this trade-off in practice? So ensuring that we have these efficient, robust algorithms that are able to achieve better performance, even in the presence of noisy data, but doing it in a way that's constrained on maybe compute? Right. I mean,
Starting point is 00:14:05 so the initial algorithms we develop around, published first around 2016, had the disadvantage in the sense that they needed to use more data than what was needed information theoretically. So in statistical tasks, there are two limits. One limit is a statistical limit, which means that ignoring compute, how many data points do you need to achieve a desired accuracy? And the second is the computational statistical limits, which is like, how many data points do you need to achieve a desired level of accuracy given a budget of compute?
Starting point is 00:14:42 So this is a very interesting dichotomy that for many of the basic problems in robust statistics does not cause. any problems. In particular, the two limits actually are essentially identical. You can achieve statistically optimal, robust estimate errors that give you very good compute. In fact, for the basic problems we have been able to eventually get algorithms whose compute is near linear in the size of the input, so in the size of the data sets. But it's not always the case. So for more complex tasks, there are trade-offs between sample size, accuracy, and compute. And one of the interesting byproducts of this line of work is that it gave us methodologists,
Starting point is 00:15:26 it gave us tools to actually rigorously argue about those traders. So in particular, like, one of the reasons that I like robust statistics from a computational standpoint is not just the fact that it did make progress on these old statistical questions, but because the methodologies we developed, the algorithms and complexity tools have been useful outside. the field. They have been able to give us implications for problems that are not about robustness, both in terms of algorithms and in terms of computational limitations. And what are some of the more interesting implications that this has had,
Starting point is 00:16:06 maybe outside of the direct field, without getting maybe into some of the practical tools, which I would love to discuss. Like, for example, and this is also in the citation of the Hopper Award, like we were able to actually get algorithms for learning mixture models out of this work that have complexity much better than the previously known algorithms, even though robustness is not obviously related to the task of learning mixture models. Another direction is that we were able to actually prove statistical computational trade-offs for various five-dimensional learning tasks that were surprising.
Starting point is 00:16:45 people didn't expect that these status would exist, and it turns out that we were able to establish them by drawing on techniques developed in the context of algorithmic robust statistics. Very interesting. And maybe as you think about the implications of this research, not just on other areas of research, but also how these approaches make their way
Starting point is 00:17:07 into practical tools or industry systems. Like you mentioned Amazon recommendation system in passing earlier, but are there specific real-world systems, whether it's healthcare or finance or recommendation systems, whatever, it might be that benefit from these kind of robust algorithms that really tolerate noisy data? All right. So let me sort of start from the get-goes.
Starting point is 00:17:28 I cannot talk about private things. But I can say that there have been many, at least I am not aware of any, let's say, industrial settings where these tools are currently deployed. But I am aware of many academic papers. where these techniques are used for the purpose of, you know, academic research. Like one sort of line of work is in the context of trustworthy machine learning, where, you know, these tools have become standard in the context of defending against poisoning attacks and different types of attacks.
Starting point is 00:18:05 I had a paper a couple of years ago with one of my colleagues here at Madison on using some of these ideas for something that's called out-of-distribution detection. which is not the same model as the model that we consider in robust statistics. It's related, but still, you know, the tools, the algorithmic ideas are useful to get empirical improvements over prior work. There are biological settings. Another application I didn't mention much is in the context of biological data sets, where there outliers arise naturally.
Starting point is 00:18:38 They are not the product of some malicious entity. You just have a real data set and some of the data points there are very different than the rest in some sense, but in a way that is not obviously detectable. It does require sort of high-dimensional robust statistics to be able to detect them. So in those cases, what you care about is not to detect the outlier
Starting point is 00:18:59 and throw it away, as you would in the case of a, let's say, malicious Amazon recommendation. You want to look at it because it might teach you something about the underlying phenomenon you didn't know before. And in fact, in one of the first applications that we had of this line of work
Starting point is 00:19:17 and perhaps the first application, we actually get improvements for a biological setting, again in an academic context, with these algorithms. ACM Bytecast is available on Apple Podcasts, Google Podcasts, Podbean, Spotify, Stitcher, and Tunein. If you're enjoying this episode, please subscribe and leave us a review
Starting point is 00:19:40 on your favorite platform. Very interesting. You know, one thing that stood out to me earlier, as you described your journey into this field was, you know, it wasn't necessarily a problem or a data set that you initially encountered that had motivated your interest in the space. I'm curious now, as we've seen sort of this rapid acceleration of massive datasets, sources of data from multiple inputs, how much of a close connection do you keep with, you know, industry to maybe benefit from new data sets that. might drive or motivate some of your new areas of further research. That's a good question. So I don't have current collaboration with the industry on these topics, but I do have, let's say, more practical ML colleagues that have questions of that flavor, and we try
Starting point is 00:20:30 to do academic research in their field using these ideas. But one of them, typically most difficult questions in this domain, is what is the right model of robustness? because if you want to have algorithms with rigorous performance guarantees, you need to know against what type of adversary these algorithms are supposed to work. And one could define various types of, let's say, contamination models. Some of them are impossible. Some of them are too easy.
Starting point is 00:21:00 Some of them give you good algorithms. But one of the biggest challenges is to figure out what is the correct model that you need to use in every specific application. For example, like one of the things I've been doing now is looking at, some biological settings and drawing on ideas by some old ideas by statisticians like Ephron that define certain sort of less stringent contamination models that allow for sort of stronger algorithms than, you know, what we had before. Like the dream in this domain would be to have sort of optimally adaptive algorithmic methods in the sense that if your adversary is too strong, you're not going to be able to do much.
Starting point is 00:21:42 if it so happens that the adversary is easier to deal with, your algorithm can realize that and do much better. Very interesting. Super, super insightful. I also noticed, I know it's been maybe a year or two, but that you recently co-authored a book on algorithmic high dimensional robust statistics. So I'm curious what inspired that book, who's it for, and the adoption since the publication. Right.
Starting point is 00:22:06 So, I mean, the book happened a little bit by chance. So I was at a statistics conference, the yearly statistics meeting in 2019, and an editor from Cambridge University Press contacted me during the event, asking me if I was interested in writing a book on the topic. Somehow they knew about the work. It was actually the right time for me to do this, because during the first three years of progress after the initial paper,
Starting point is 00:22:34 there was an explosion of work by various communities. like statisticians, picked this up, ML people pick this up, and other TST theory people who picked it up. So somehow we wanted to sort of present the progress, my collaborator, Daniel, Kane, and myself, we wanted to present sort of the techniques in a way that makes sense, that is kind of unified and give the correct algorithm and the correct proof. Sort of that was the motivation, right? So it was a new field, and we were kind of the people who started it.
Starting point is 00:23:08 So we wanted to present it in a cohesive way. Very interesting. And I think to your point, yes, robustness is certainly a growing research area. And I know I've made a lot of emphasis on the practical applications, and it's primarily because there's so much value to be realized from these advancements and how they can actually improve performance in modern-day systems. And so what progress have you seen in bridging theory with real-world deploy? and, you know, what gaps do you think still remain?
Starting point is 00:23:42 Right. Maybe I'm not the best person to answer this question because, you know, I come from the theory side. But certainly the adoption of some of the initial ideas in our algorithms, in the context of detecting training time, like defending against training time attacks, makes sense. In reality, you know, you're not going to have a clean model. You're not going to have a clean notion of corruptions. Things are much more messy than they are in theory. certainly you do not have the theoretical assumptions. Like even the assumption that you have 90% of clean data is a big assumption.
Starting point is 00:24:16 Somehow it turns out that the algorithms that we developed, at least some of them, are practically useful. Okay, you can run them on real datasets without any theoretical assumptions and observe improvements over what was previously possible. And okay, like the word that I have done is basically in these two different branches, the practical work, One of them is in data poisoning. The other one is in this biological outlier detection setting. But I am aware of many sort of different contexts where these algorithms have been used. One of them being even in the context of reinforcement learning, when you have like 40 feedback.
Starting point is 00:24:54 So, you know, these types of problems are so basic that some form of them appears hidden in various other disciplines. When you are able to detect that, then you're able to make progress in that series. And I think even beyond that, I think, you know, you'd mentioned sort of the domain of trustworthy AI, and it seems like a lot of the core principles of what you've described are the foundations of robust statistics really intersect with some of the key foundations of fairness, of adversarial defenses. So how do you see this line of work intersecting with the broader field of responsible AI where we consider things like fairness and privacy and are there synergies across these
Starting point is 00:25:35 concerns. Right. So that's a good question. I should have mentioned that already. So it turns out there are technical connections that people have not observed. For example, like in my early talks on the topic, I had the question at the end, like can we have formal connections between privacy and robustness? And it turns out that this has been done. Okay. So there are stylized statistical settings where like one can prove that any robust algorithm, can be transformed in a completely black box way to a private algorithm. So in some sense that robustness implies privacy. Not a sort of generic fact that is universally true,
Starting point is 00:26:18 but in a bunch of interesting statistical settings, this is the case. So there are successes in this domain of connecting these different notions of, in some sense, algorithmic stability, privacy and robustness is a very successful and I expect more is going to come out in the next few years. so I've recently been working on a notion of stability called replicability that I'm not going to define. But it turns out that this also may have non-trivial connections to robust statistics. I see.
Starting point is 00:26:49 Well, super exciting that there's some technical connections there. And I think to maybe your last point, as you look forward maybe over the next few years, what do you see as some of the key research frontiers in robust algorithm design? Right. So there are many. So from the theory point of view, there are still some core questions that we have not been able to address. It seems that they need new technical tools. And I don't want to go into this in too much detail because it would make the discussion perhaps a little bit less approachable.
Starting point is 00:27:22 But I would say that at the conceptual point of view, like a good question is to understand like what are sort of the right models of robustness that strike the right trade-off between being realistic. and allow for efficient algorithms. An example of that is this paper I had five years ago with my postdoc at the time and my colleague, Christo Jamos, on supervised learning, like binary classification, in the presence of a type of noise that's called massar noise. This is a different corruption model,
Starting point is 00:27:54 a different contamination model, that doesn't touch the feature vectors and only touches the labels in a very basic way. Okay, so basically, if Y is the correct label of point X, then we probability, at most, one-third, we're going to observe, so probability, there's at least two-thirds, we're going to observe the correct label Y, and probability at most a third, we're going to observe the opposite. Okay, so the correct label is flipped, with probability at most a third. Okay, that's the model. And now this one-third, the probability of, you know, seeing the incorrect. label is not uniform across feature vectors. Maybe some feature vectors, you always see the
Starting point is 00:28:39 correct label. Some others, you flip with probability one in a hundred. But what is true is that no matter with X's, the probability of viewing the incorrect label is going to be bounded above by one-third. So this is also called bounded noise for obvious reasons. So this was a classical model that was defined in theory of computer science in the early 80s. Much later, statisticians, in particular Massar, studied them from a statistical point of view. And it turns out that for a long time, nothing was known. And it's interesting because if you were in the setting where every label is going to be flipped with probability exactly a third, which means that you add more noise to every
Starting point is 00:29:23 label, okay? It's very easy to learn. And the reason is that you can essentially invert the noise. The noise is predictable. On the other hand, when the noise is bounded, but we actually do not know a priori if a point is potentially corrupted and if so by, you know, with what probability, this makes the task algorithmically much more challenging. Now, why would we study this model? You know, it is natural. It arises in various settings.
Starting point is 00:29:49 Imagine that you have a human expert classifying images, right? So, you know, in some settings, they are certain that an image has a specific label. In some other settings, they're less certain. okay so the probability of error depends on the particular image that they observe so it arises in various applications and we knew nothing like we couldn't even get even for linear classifiers which is the basic class that we start from in machine learning we didn't know how to get error 0.49 so 0.5 is the obvious error just randomly randomly select without looking at the data you're going to get 0.5 we didn't know how to get 0.49 it turns out that it is possible to develop better algorithms.
Starting point is 00:30:33 And what's the advantage? The advantage is that these algorithms work for any distribution on the future vectors. So you do not need any distributional assumptions. And this is something very rare that is impossible in robust statistics settings where, let's say if you're trying to estimate the mean of a distribution and you know nothing about the distribution,
Starting point is 00:30:54 robustness is impossible, even in terms of information theory. not even considering compute. And it turns out that in this sort of less adversarial model, in this semi-random Massar model, it is possible to do much better. Very interesting. That's quite compelling,
Starting point is 00:31:12 and I think even some of the real-world sort of scenarios or applications that you described are quite compelling as well. You know, maybe somewhat of a bonus question, but something that kind of came up to mind, a lot of interesting applications of your line of work in obviously modern AI systems. And I think for traditional folks, maybe in industry, I think we see the benefits of AI in improving efficiency and, you know, operations, etc. As a theorist, how and what has been the role of AI in your research beyond just the work that you do to improve AI systems, how have you benefited maybe as a user of AI and advancing your line of work? maybe not at all
Starting point is 00:31:57 I don't know I mean I have used the chat GPT maybe three times in my life the first one to understand the hallucination effect some years ago it was actually funny that I was asking for references in my line of work and it was producing references
Starting point is 00:32:16 that of course do not exist but they look realistic this title looks and these authors and they look legit okay what is this paper But no, I have not used AI in any way that is beneficial in my research. Well, hopefully we can get to a point where AI can actually help advanced theoretical research as well. I mean, there are potential settings where potentially can help you solve some math problems, but just I have not explored that direction.
Starting point is 00:32:45 So I don't want to say that it is not useful currently. I can only say that it has not been used by me. That's a good clarification to make. As we wrap up, I just want to get maybe some reflections and advice. You know, you've made a lot of foundational contributions, solving problems that have stumped researchers for decades. And, you know, as you've described, some of it has had practical applications in real-world settings. And so as a professor, as a researcher, what advice would you give to early career researchers
Starting point is 00:33:16 who want to advance their area of research and hopefully have contributions to real-world systems as well? Right. So I'm very bad at giving advice is, like, one of the reasons is I do not believe that advice works, right? I believe people, you know, look at examples, right, and, you know, based on the examples. Like, people want to see an example as opposed to, you know, advice. Like, what has helped me, like, over the years, has been sort of curiosity, sort of I remain curious on a daily basis. And some kind of consistency. Sort of research is not like a sort of 100-meter dust. It's a marathon. So, you know, if you sort of remain consistent over a long period of time with long-term goals, but also intermediate steps, okay, that are reachable within a reasonable period of time, you eventually, like, once in a while, get something that is important. So curiosity and consistency. And maybe as you look back on your journey so far, you know, solving long-standing problems to building a research community, to, you know, publishing texts and pioneering a new field, what's been the most meaning?
Starting point is 00:34:24 for you. Right. So I want to say that, you know, there is an aspect of luck involved in all of these things, right? So I don't necessarily think that it's only, you know, hard work and anything like, you know, luck plays a big role. It's a combination of lack of lack of preparation. But you need the both, okay, to be able to do well. So, sorry, I missed the second part of your question. Well, you know, what's been the most meaningful or fulfilling part of your journey so far. I mean, the thing that makes me sort of content on a daily basis is the fact that I can define the types of questions that I'm going to think about. I try to think about things that are interesting to me, not necessarily having sort of success or immediate results as a goal,
Starting point is 00:35:13 but doing something that I consider sort of scientifically valid. And, you know, like it's, as I said, a matter of luck if in a period of time, you know, this is going to work. But in general, like in contrast to some recent trends, and, you know, in particular in, let's say, some branches of a mail, I try to think a little bit more about what I consider meaningful as opposed to what is currently the trend. For example, like robust statistics as a field, at least from the algorithmic standpoint, did not exist before we made the first progress in 2016. There was no results in high dimensions.
Starting point is 00:35:51 for a long time to the extent that some statisticians, when we told them about the result, they didn't believe us that this would be possible. So sometimes it pays off to try to make your own path as opposed to following some trend that is the current focus. So stay curious, stay consistent, and don't be scared to carve your own path. Professor Elias, I think this has been a very insightful conversation.
Starting point is 00:36:20 I think certainly building these systems and algorithms that can handle noisy and corrupted or adversarial data isn't just an academic challenge, as we've uncovered during our conversation. It's really central to making modern-day systems more trustworthy and more effective. And so thanks to your pioneering work, I think we're getting closer to learning systems that function reliably, even in the presence of messy and unpredictable conditions of the real world. And so thank you so much for your contributions to the field. and thank you so much for joining us on Bightcasts. Thank you for your time.
Starting point is 00:36:53 It was great talking to you. ACM Bycast is a production of the Association for Computing Machinery's Practitioner Board. To learn more about ACM and its activities, visit acm.org. For more information about this and other episodes, please visit our website at learning.acm.org. C-A-S-T. That's learning.acm.org slash bikecast.
