ACM ByteCast - Jelani Nelson - Episode 21
Episode Date: October 26, 2021

In this episode of ACM ByteCast, our special guest host Scott Hanselman (of The Hanselminutes Podcast) welcomes Jelani Nelson, Professor of Electrical Engineering and Computer Science, a member of the Theory Group at the University of California, Berkeley, and a Research Scientist at Google. His areas of interest include the theory of computation, as well as the design and analysis of algorithms, especially for massive datasets. Jelani is a member of ACM's Special Interest Group on Algorithms and Computation Theory (SIGACT)'s Committee for the Advancement of Theoretical Computer Science (CATCS). Among his honors, he won the 2014 Presidential Early Career Award for Scientists and Engineers. He is the creator of AddisCoder, a computer science summer program for Ethiopian high school students in Addis Ababa. Jelani and Scott discuss his journey from learning HTML when he was 12 to becoming a theoretical computer scientist. They talk about the spectrum between software engineering and theory and how even theoretical CS research can have an impact on industry practice; teaching his introduction to algorithms course of more than 700 students; running a highly successful algorithmic boot camp for students in Ethiopia to learn coding; and the times he feels most accomplished in his work.

Links: AddisCoder, People of ACM interview with Jelani Nelson, CATCS, SIGACT
Transcript
This is ACM ByteCast, a podcast series from the Association for Computing Machinery, the world's largest education and scientific computing society.
We talk to researchers, practitioners, and innovators who are at the intersection of computing research and practice.
They share their experiences, the lessons they've learned, and their own visions for the future of computing.
I'm your host today, Scott Hanselman. Hi, I'm Scott Hanselman. This is another episode of Hanselminutes, and we're doing this one in association with the ACM ByteCast. So thank you to them
for this partnership. And today we're talking with Dr. Jelani Nelson, who's a professor at the
Department of EECS at UC Berkeley and a member of the Theory Group. How are you, sir? I'm doing
well. Thanks for having me. Yeah, thanks for hanging out. If people go and Google for you, you have been doing all kinds
of really cool stuff online and in the computer science world for many, many years. It's really
impressive. It's very different than I think a lot of people's backgrounds, because I'm very
much a practitioner who has very minimal computer science background.
And I'm really trying to get my head around, we're trying to build new coders,
trying to build new developers, trying to build new computer science people.
It's such a completely different spectrum between theory and algorithms and computer science
and text boxes over data, which is what a lot of developers find themselves doing right now.
Are these completely
different concepts or how do we deal with that spectrum? How do we kind of reconcile that between
the HTML of it all and the JavaScript of it all and the really good work that happens in the world
of computer science itself? And when you ask, are they really different? You mean software engineering
versus doing theory? Yeah, because I'm sure you code all the time, you know, certainly.
But like if you were going to make a website today, what would you do?
Versus when I go and make a website, the algorithms that I might use would be very naive and very inefficient and never the twain shall meet.
Yeah, actually, speaking of websites, it's funny you ask, because I learned HTML when I was maybe 12 or so. That was my knowledge of web design. And then when I was a grad student, at some point I wanted to have a website, so I remembered that knowledge I acquired when I was 12 and made myself a website in Emacs, just typing the HTML by hand. It wasn't very good, but it had the information that I wanted to put there,
so it was good enough. Yeah, exactly.
And then more recently, I wanted to make my website a little nicer, so I learned a little
Bootstrap and Hugo and put something together that's a little bit better. Not a lot better,
but it's okay. That is very different from what I usually do. I am a theorist. I've been a theorist
since, well, even as an undergrad,
I would say I primarily took theory courses, but there were some non-theory courses I was
required to take. And as a theorist, you know, as an academic too, my main output has been
research papers, right? And not code. Although I did code for fun a lot starting undergrad,
I was really into programming competitions, carried that into grad school too. Recently,
I actually did have an itch to do more stuff that's a little bit more connected
to software engineering.
Because, you know, I teach courses like the titles like algorithms for big data, for example,
right?
But as an academic, it's like, well, I'm just in theory land.
I'm not actually dealing with real data usually, right?
So I wanted to do something a little bit more applied just, you know, to do something different.
So starting this past
summer of 2021, I actually spend a day a week at Google. So I'm actually also a Google employee,
but like, you know, 20% time. And it's been pretty interesting. I mean, I get a chance to
use my theory knowledge to do real things and interface with other researchers at Google,
but also like software engineers. What does that feel like? So like you're spending 20 plus years doing theory and then suddenly
you're in the weeds. What's a similar analogy? It's like you're a biologist and now you're in
surgery and someone's cut open in front of you and you're like, mitosis.
Yeah. It hasn't been exactly like that because the team that I'm working with there is within
the research part of Google. And my manager is someone I've known
since I was a starting graduate student. We've written papers on similar topics or even the same
topic. There are still some people on that team who act as intermediaries between me
and people actually working on products, although I do meet with the product people too.
It's been pretty interesting in the sense that I knew that people were using some of the kinds of things I worked on in the real world outside of academia.
But, you know, I was like a little removed from that.
And now I get to see like why they actually care about these things.
I mean, I kind of knew why they cared, but I get to see it like right in front of me and give my input and share my expertise in ways that hopefully help.
That's cool.
That is the goal, right? Like the theory
group at Berkeley is trying to take those applications of the theoretical computer
science and apply it not just to software engineering, but economics and physics and
pure mathematics. Yeah. I mean, I would say theory is very broad. So yes, you know, there are parts
of theory that are more connected to software engineering and, you know, here's a faster
algorithm. Here's an algorithm that uses less memory, less bandwidth,
or whatever resource you care about.
I think theory is more than that.
Theoretical computer science is also philosophical.
What is the power of randomness in computation?
Do we need randomness to have efficient algorithms?
That's like complexity theories,
like studying the limits of computation
and what resources are required to solve various computational problems or like quantum computing
right? We don't actually have quantum computers yet, so it's somewhere on the boundary between science and computing. If we have them, what would they be able to do? What would they not be able to do? There's actually one quantum complexity theorist, Scott Aaronson, who likes to say that quantum complexity theory is about what we can't do with computers that we don't have. But I mean, there's some philosophical aspect to it too. There's also the theory footprint,
the theory CS footprint on areas that are not part of CS at all, like economics. So now there's
a lot of people have been working in that intersection
between economics and computer science.
And what does theoretical computer science
have to say about economics, for example?
Like there's someone who's an alumnus of UC Berkeley.
He got his PhD here, Costis Daskalakis,
who was working on,
and a former faculty member here,
Christos Papadimitriou, that was his advisor,
you know, proving things about the hardness of finding Nash equilibrium, right?
So a Nash equilibrium, that's a concept that was invented by economists, right?
Which Nash, I think, won a Nobel Prize for in economics.
But, you know, computer science came along, or theorists from computer science came along
and said, hey, wait, you say that this equilibrium exists,
but is it computationally tractable to find one, right? And if it's not computationally tractable to find one,
then how important is the concept, right? Does it actually model what people really do if they
can't efficiently compute this thing? And then you've got this interesting relationship between
the math world in the context of John Forbes Nash Jr., who's a mathematician,
then game theory, which brings in economics, and now you're bringing in computer science. And it's this cross-collaborative triangle
trying to figure that problem out. And there are a lot of these examples, right? Now, another big
one is privacy, differential privacy, right? So law, the field of law cares about privacy, right?
But then now computer science is coming in and saying, hey, there's this notion that,
you know, was invented or introduced
maybe more than 15 years ago called differential privacy. You can mathematically define what it
means to maintain privacy and then design algorithms that provably or mechanisms that
provably maintain privacy under this mathematical definition. So I mean, theory CS has really,
I think, made impact. I mean, to software engineering, yes, but also beyond, even beyond computer science.
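The differential privacy idea mentioned above has a very short concrete instance: the classic Laplace mechanism. This is a textbook sketch, not anything specific from the episode; the count, epsilon, and seed below are made up for illustration.

```python
import math
import random

def laplace_noise(scale, rng=random):
    """Sample from a Laplace(0, scale) distribution via inverse transform."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(true_count, epsilon, rng=random):
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon suffices for the epsilon-DP guarantee.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
noisy = private_count(1000, epsilon=1.0, rng=rng)
```

The noise has mean zero, so individual answers are perturbed but aggregate accuracy is preserved; smaller epsilon means stronger privacy and noisier answers.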
So that means that it's not just people like me that are doing, like I said, text boxes
over data.
It's my parents and my non-technical friends that these things are all affecting.
It's things that show up in the news all the time, the differential privacy one being a
perfect example when we have conversations about metadata versus data, trying to explain
what appear to be subtle differences to my non-technical family members, but are actually huge differences. And even now,
dealing with data, everyone seems to become a scientist now during this time of COVID. It seems
that people who are least likely to do any work on the group project in high school also seem to be
the ones that are telling me that I should do my own research. Right. Yeah. I've seen a lot of memes
centered around that. Yeah. So you're teaching a ton of courses at the undergraduate and graduate
level, but if you're an undergrad in computer science and you're taking one of your classes,
for example, efficient algorithms and intractable problems, what kind of a student and what kind of
a researcher are you creating? Where do you anticipate that the person would go? Would
they go into the computer science theory direction or they just become a more well-rounded engineer?
Right. So that class you just mentioned is the UC Berkeley introductory algorithms course.
So you learn techniques like various graph problems, Dijkstra's and Bellman-Ford,
various dynamic programming algorithms and the technique of dynamic programming,
max flow. There are many different things and there are some randomized algorithms. The class is quite big. This semester
I have on the order of 730 students in that class. Wow. Yeah. The vast majority are not going to go
into research, right? They're going to go into industry. A lot of software engineers. A lot of
students view the class as a way to train themselves for, you know, coding puzzles that the big software
engineering companies like to give,
right? Algorithmic questions. I think aside from puzzles, there is value in knowing how to design
efficient algorithms, minimizing resource consumption, whether it be time or memory.
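To make one of those course topics concrete, here is a minimal sketch of Dijkstra's algorithm with a binary heap. The graph below is a made-up example, not material from the course.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths for non-negative edge weights.

    graph: dict mapping node -> list of (neighbor, weight) pairs.
    Returns a dict of shortest distances from source.
    """
    dist = {source: 0}
    heap = [(0, source)]  # (distance, node) priority queue
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry; a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Tiny example graph
graph = {
    "a": [("b", 4), ("c", 1)],
    "c": [("b", 2), ("d", 5)],
    "b": [("d", 1)],
}
print(dijkstra(graph, "a"))  # {'a': 0, 'b': 3, 'c': 1, 'd': 4}
```

With a binary heap this runs in O((V + E) log V) time, which is exactly the kind of resource analysis the course asks students to carry out.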
How do you not lose 600 of them? I mean, that's a gauntlet. It's almost the definition of
gatekeeping to have a 700 person class. How do you make an inclusive course with that many people and not lose people just in the chaff?
More teaching staff.
So we have a teaching staff whose size is on the order of 50.
Lots of office hours.
Piazza.
I don't know if you know about Piazza.
It's a website that lots of universities use for, it's kind of like a forum where students
can ask questions, possibly anonymously, and get answers from teaching staff members and or other students.
So hands-on. So while it might sound like it's 700 people thrown into a pile and it's a big,
you know, battle royale for algorithmic complexity superiority, it's really,
there's a community within that organization itself, 50 TAs and beyond.
Oh, yeah.
It's a lot.
Yeah.
TAs and graders combined.
Yeah.
Yes.
Around 50.
And are you finding that a certain flavor of person pops out the other side?
Is it a certain personality that succeeds in a course like this?
Or are we now in 2021 building more inclusive classes than I took in the late 80s and the
early 90s?
Well, I can't compare with the 80s and 90s because I wasn't in CS.
I was a little kid back then.
Of course, but you know what I'm saying?
Like you've been teaching for some years.
Can you maybe think about how has your style changed in the last even 10 years
to make sure that you make more successful theorists,
more successful scientists, and more successful engineers?
My style has definitely changed. I think when I first started, because I learned this stuff
myself so long ago, and it's second nature to me, I forgot how hard it is to absorb it the
first time around. So I think the first time I taught a class like this, I probably went too
fast and tried to pack too many problems in each homework assignment. Actually, you mentioned this right before we started talking today.
I run this program in Ethiopia where I train high school students there in algorithms.
We've trained more than 500 students so far in the last 10 years.
We started in 2011.
And I remember the first time I ran that program.
That was July 2011.
It was a four-week program.
We met Monday through Friday every weekday for about two and a half hours.
Almost none of the students knew how to write any code at the beginning of that program.
By the last week, I was showing them IOI problems.
This is like the high school programming Olympics, right?
Kids who have been training in algorithms for years and were like the top in their country. I was showing them problems from
those Olympic contests and like solutions to them. In retrospect, I was going way too fast,
like for kids who had never coded a day in their lives. Three weeks of training followed by a fourth
week of like Olympic level programming is probably a little too fast. But yeah, I've turned that down
a little bit since then. The IOI being the International Olympiad in Informatics, as you said, the World Championships,
the Olympics of Algorithms. This is a four-week course for high schoolers, and it's not even
nine to five. It's two or three hours a day. That's another change I made. Now it is nine to
five. It is nine to five. Okay. So this would be a bootcamp. It's an algorithmic bootcamp.
Yes, that's right. Yes. Okay. And again, is it just one of those things like a bootcamp where if you survive, you'll
be a better person on the other side?
How do you keep from losing people?
And what's the secret sauce that makes your graduates so successful?
Because if you take a look at the Addis Coder website that we'll link to in the show notes,
you have not only 500 students, but tons of them have gone on to MIT and become very successful
and well thought of in their space. Yeah, right. So let me back up a little bit. So you said,
what helps to make the students successful? Yeah, like what's the special sauce? Is it just you?
How can someone become successful in four weeks? That seems like they may have had the right head
space on the way in. Yeah. So first of all, we recruit our students from all across
the country. So like the most populous city in Ethiopia is the capital, Addis Ababa, but less
than 10% of our students come from the capital. We bring them from everywhere, from every single
region of the country. And we do that in partnership with help from our co-organizing
organization. It's called the Meles Zenawi Foundation. They're based in Ethiopia.
They help us with the on-the-ground logistics. They help us work with the Ministry of Education,
the government, to get top students from all across the country. And Ethiopia has,
on the order of 110 million people, it's a big country. It's the second most populous country
in Africa after Nigeria. Nowadays, one summer program might train 175 students or so. So you're pulling the top 175 students out of a population of 110 million, right? So first of all, they're really smart, and that helps a lot. We do some admissions on our own, outside the ministry, but we tell the ministry, the students you try to find for our program, make sure they're top in mathematics, top in physics, anything mathematical, because algorithms is a mathematical thing. It's a mathematical field. And then when
they come, they're really not doing anything for those four weeks except our assignments, right?
So they're really into it. We have a very large teaching staff. It's not just me. We usually have
three to four instructors over the summer. I'm one of them. But we also have a teaching staff that I think typically has on the order of 40 TAs. Usually half of them are based in Ethiopia,
various undergraduates, you know, undergraduates from various institutions in Ethiopia, or some
of them are lecturers at universities or work in industry in Ethiopia. And then the other half,
we fly in from all over the world. And they're really strong TAs. We have a lot of PhD holders or current PhD students or people who themselves did IOI when they were in high school
or similar kinds of competitions. We've had early engineers at big companies. I think the second
hire at Dropbox was a TA for AddisCoder a few years ago, 2018. As was another early Dropboxer.
And we've had people from Google, from Microsoft, from other companies. So it helps that we have very high quality
teaching staff. The importance of algorithms. Do you think that amongst people like myself,
who are more on the practitioner side, do you think that we're sloppy? Do you think that maybe
there's a style of coding right now that is not respecting algorithms and stuff like that? Because
things like IOI that I even hadn't thought of,
maybe you were surprised that I hadn't thought about it.
The rank and file developer doesn't think as much about algorithms
as I think people in your world do.
Is that something that you think that the computer scientists on Twitter
or the software engineers on Twitter
maybe need to be refreshing themselves on?
Possibly.
I mean, again, most of my life I've been in academia.
So it's only recently that I've started more closely interacting with industry.
And computer science is a big space, right? So there's room for a lot of different kinds of
people with different expertise. Not everyone, I think, needs to be an expert in making websites,
as you asked me earlier. Not everyone needs to be an expert in algorithms.
But even me as an algorithmist, I have an idea, and then I code it up and do some experiments, and I want to have a pretty interface to display the results of my experiments. I still need to
have some kind of UI there that is useful. So it still could be useful to have knowledge from
something that's not your core.
Right. Well, the reason that I asked that, and the reason why I'm kind of pounding on that is trying to understand the lenses through which different people look at different problems,
you know, and there's that idea that like, when all you have is a hammer, everything looks like
a nail. And I, having now been in software engineering, not with a computer science,
but with a software engineering degree in the background for now 30 years, I'm starting to realize, looking back, how limited a toolbox I had been pulling from. And maybe I need to go back and fill
in some of those missing Tetris pieces in my own knowledge and help make a more well-rounded,
either academic or a more rounded practitioner. And that's why I'm picking on you a little bit
with those questions. Yeah. Again, though, I think that CS is so broad that maybe it's
necessary, maybe it's not, I'm not sure. You were at Microsoft, you mentioned. I don't know, how long were you at Microsoft?
14 years.
Okay. I interacted with some theorists who were at Microsoft. Microsoft has a pretty
strong theory group. For example, in Seattle and Redmond, there are also some theorists in,
there used to be more, but I think there's still some in the Boston area right next to MIT.
There was a really good lab that unfortunately Microsoft
closed seven years ago that was in Silicon Valley. I visited that place before. I gave a talk there
once and I knew a lot of the people there. I want to say two of the researchers there,
Parikshit Gopalan and Sergey Yekhanin, they had some expertise in an area called coding theory,
which is a part of theoretical computer science. And my understanding is that they designed some coding scheme that was implemented in Azure storage and saved Microsoft
a lot of money. I don't know the details myself. It's not exactly my field, but there's coding
theory, right? Coding theory even traditionally is, I think, considered part of electrical
engineering. There's room for, I think, all kinds of different people and different areas of
expertise to contribute at a large company like
that. ACM ByteCast is available on Apple Podcasts, Google Podcasts, Podbean, Spotify, Stitcher,
and TuneIn. If you're enjoying this episode, please do subscribe and leave us a review on
your favorite platform. What is the sense of achievement that you get? When do you feel like
you're done or you've done a thing? Like, is it applying an algorithm or someone taking one of your research papers and
taking it and saving someone money or saving someone's life or something like that? Or like,
when does, I know I feel achievement when I hit deploy and then that, you know, the thing has a
URL or the app is in the app store. But if someone took an algorithm of yours and did something with
it or a paper of yours and applied it, does that give an extra sense of accomplishment?
I mean, it gives some sense of accomplishment, yes. I would say, though,
most of the senses of accomplishment I obtain are from just pursuit of knowledge, academic pursuit,
right? So there's this problem that seems like a really foundational or interesting problem,
important problem, and no one in the world knows how to do it, right? And you keep thinking about
it and thinking about it, maybe with some friends or colleagues, and finally you crack it, right?
And then, you know, just that feeling of like cracking the problem and solving it, I think is
the sense of accomplishment. Then you write it up and move on. Sometimes, you know, it might make
practical impact. In my case, most of the time, I think it doesn't, but it's okay.
Yeah, it is okay. It does make me feel some kind of way though, when you say, and then I write it up and move on because it's out there. Like
someone could need that information. Yes. Yeah. How are you thinking about that? I asked Leslie
Lamport the same question and I'm interested, is this a thing where you are, you know, fingers
steepled and you're just walking around the quad with your head down? Or is this a eureka moments
in the shower? Or is it
really just years of hard work, and then it slowly cracks open? I've been in academia now since I
started my PhD. It's like 15 years. So I've been involved in a number of projects. I would say both
have occurred. I've had projects where we had the problem crisply defined, and we're thinking about
it. We got stuck. Maybe we could solve some special cases.
We didn't eventually solve it until a couple of years later, or in some cases we never solved it.
Right. And then I've had other projects where we read someone else's paper and realize there
was some problem there. And it's like, oh, we happen to like really understand the tools really
well that would be useful for this problem. And we work it out and it just works, you know, it's like a day or two, right? So, I mean, that's happened too. Wow.
So I've had the full range. Yeah. What are some of the problems
that you've been most proud of over the last 15 years? I know you did a lot of work in streaming
algorithms and things like that. Is there anything that you're particularly proud of looking back?
Yeah. I mean, some of the papers I like of my own. I would say one was in my thesis,
the distinct elements problem.
So what is that problem?
You see, you have your website, your server, right?
People visit your website.
When they visit your website,
they have some IP address
and you want to know
what is the number of distinct IP addresses
that have visited my website.
So you don't want to just maintain a counter
because if the same person
visits your website multiple times,
you'll count them multiple times. I only want to count them once because I want to know distinct count, right?
So how much memory do you need to solve this problem? So one of the things I did in my PhD
thesis together with two other researchers, Daniel Kane and David Woodruff, was to basically nail the
complexity of this problem. So the relevant quantities are the universe that the items in
the stream are coming from. So IP addresses say integers from 0 to 2 to the 32 minus 1, right?
So that's the universe.
It's an approximation algorithm.
So it doesn't give you exactly the right answer, but it gives you the right answer up to, say, 1% error or 5% error, like a tunable parameter.
So the memory goes up if you want better accuracy, right?
And understanding how to trade off the memory with the accuracy.
So basically, we can understand exactly what the optimal space complexity was of this problem. Actually, one thing we didn't fully nail was the dependence on the failure
probability because it's a randomized algorithm that has some chance of outputting a wrong answer.
But one of my PhD students actually, when I became a professor, went back and revisited the problem
and also nailed the dependence on the failure probability.
And he wrote a paper on that. It was in his thesis. He won a best student paper award. So yeah, I mean, I think that's something that has been valuable.
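The distinct elements problem described above can be illustrated with a much simpler cousin of the optimal algorithm: a K-Minimum-Values (KMV) sketch. To be clear, this is an illustrative sketch, not the Kane-Nelson-Woodruff algorithm, and the stream of "IP addresses" below is made up.

```python
import hashlib
import heapq

def kmv_distinct(stream, k=256):
    """K-Minimum-Values sketch: keep only the k smallest hash values seen.

    If hashes are uniform in (0, 1], the k-th smallest is about k/n for
    n distinct items, so (k - 1) / kth_smallest estimates n using O(k)
    memory regardless of stream length. Duplicates hash identically,
    so repeats don't distort the estimate.
    """
    def h(x):
        # Hash to a float in (0, 1]
        digest = hashlib.blake2b(str(x).encode(), digest_size=8).digest()
        return (int.from_bytes(digest, "big") + 1) / 2**64

    smallest = []  # heap of negated hashes: -smallest[0] is the current max
    in_heap = set()  # hashes currently in the heap, to skip duplicates
    for item in stream:
        hv = h(item)
        if hv in in_heap:
            continue
        if len(smallest) < k:
            heapq.heappush(smallest, -hv)
            in_heap.add(hv)
        elif hv < -smallest[0]:
            in_heap.discard(-heapq.heappop(smallest))
            heapq.heappush(smallest, -hv)
            in_heap.add(hv)
    if len(smallest) < k:
        return len(smallest)       # fewer than k distinct items: exact count
    return (k - 1) / -smallest[0]  # -smallest[0] is the k-th smallest hash

# 10,000 distinct "IP addresses", each appearing 3 times
stream = [f"10.0.{i // 256}.{i % 256}" for i in range(10000)] * 3
est = kmv_distinct(stream, k=256)
```

With k = 256 the relative error is typically a few percent, and the accuracy/memory trade-off he describes is exactly the knob k: more memory, tighter estimates.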
Related to counting, I had another counting paper, actually, which we're about to submit to a conference in a couple months. And that is just: maintain a counter. So imagine now,
I just want to know how many visits
have I had in my website, distinct or not, it doesn't matter. So every time I see a new user,
I just increment, increment, increment. And at the end, query, tell me the value of the counter.
So the naive thing is just maintain a counter, right? So if the number gets to be as big as n,
you would need log n bits to write down a number that's as big as n.
But it turns out that you can actually do better if you're willing to settle for approximation
and randomness.
And there was an algorithm from the 70s due to this guy named Rob Morris.
Not Rob Morris, the professor at MIT and the inventor of the Morris worm, but his father.
The father, Rob Morris, he worked for Bell Labs for a while.
He was chief scientist at the NSA for some time.
But back in, I think, the early
days of Unix, he wrote a spell checker. And I mean, the story's online, let me not go into it.
But basically, in the course of writing that spell checker, he realized that he wanted to
maintain a lot of different counts in his code. And the computers he had back then didn't have
enough memory to store all these counts explicitly. So he needed some way to like compress them. So he needed to be able to store, you know, a 20-bit number in only 10 bits, right? What does that really mean? But,
you know, you can do it if you're willing to approximate. So based on the 10 bits,
you won't know exactly what that 20-bit number was, but you can know it up to say 1% error,
right? And how do you increment the counter when you're only maintaining this compressed
representation? So he gave an algorithm for this back in the 70s.
What me and this colleague of mine, Huacheng Yu, he's now faculty at Princeton, showed was a lower bound: any solution to this problem must use at least so many bits of memory.
And then we showed that the lower bound we proved is actually tight.
That is, there's an algorithm for this problem that we can analyze and prove matches it.
So there is no better algorithm up to a small constant factor.
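Morris's trick can be sketched in a few lines. This is a textbook rendition of the 1970s counter, not the optimal variant from the paper discussed here, and the counts and seed are arbitrary: store only an exponent X, increment it with probability 2^-X, and estimate the count as 2^X - 1.

```python
import random

class MorrisCounter:
    """Approximate counter: stores only an exponent, so conceptually it
    needs O(log log n) bits rather than the log n bits of an exact counter."""

    def __init__(self, rng):
        self.x = 0
        self.rng = rng

    def increment(self):
        # Bump the exponent with probability 2^-x, so 2^x tracks the count.
        if self.rng.random() < 2.0 ** -self.x:
            self.x += 1

    def estimate(self):
        return 2.0 ** self.x - 1  # E[2^X - 1] equals the true count

rng = random.Random(42)
counters = [MorrisCounter(rng) for _ in range(200)]
for _ in range(10000):
    for c in counters:
        c.increment()

# A single counter is very noisy; averaging independent copies sharpens it.
avg = sum(c.estimate() for c in counters) / len(counters)
```

The estimate is unbiased but high-variance, which is why practical versions average many independent counters or use a base closer to 1 to trade memory for accuracy.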
That's very cool.
And then you take these concepts, though, after you've written the paper.
And then a couple of years later, for example, you might go and teach something about that.
Like you've taught the distinct elements problem in your Harvard CS 229r class.
So something that you worked on, a paper you worked on now becomes part of who you
are and part of how you teach, thereby moving things forward for everyone else.
Yeah. That class you mentioned, I actually taught a version of it at Berkeley too in fall 2020.
That's a graduate class. Whenever I teach graduate classes, I always have a final project component,
which is almost half the grade. And I tell the students, find an interesting problem or research
problem and try to solve it. If you fail to do so, it's okay. Research problems are hard. You can fall
back on writing a survey of what you learned or, you know, doing an implementation project where
you implement lots of different algorithms you've learned and experimentally compare them.
But I encourage them to try and actually solve an open problem in research. And some of them
actually do. There have been a few projects that turned into research papers that actually solved some open problem.
That's amazing. That's kind of that Good Will Hunting idea, where someone who you don't expect suddenly solves the whole impossible problem out in the hallway there.
Right. Yeah, I remember seeing that movie. I think there is a little extreme where,
you know, it was like just written on a board in the hallway and he just solves it, right?
Whereas here it's like these students are working hard.
Yeah, they're working very hard.
Speaking of working very hard, back to this concept as we get towards the end of our show here, we were talking about the IOI, the International Olympiad in Informatics.
Yes.
And you had an interesting observation about who goes to these.
And I was trying to get to that root idea of how do we make more people?
How do we have them come from different places, be different backgrounds, age, race, socioeconomic? Who goes to these events?
Sure. Well, I can tell you a little bit about the United States. I know less about how it works,
how the teams are selected in other countries. So just for everyone listening, a little bit more
about the IOI. It's a high school competition. It's a programming competition where most of the problems
are algorithmic in nature, meaning you need like efficient code, not just code that solves the
problem. And it's like the Olympics, right? So every country has a team, selects a team,
and that team represents the country in these Olympics and the location of Olympics changes
from year to year. Every team has four students on it. You ask the question, how do you get to
be on the USA team, right? How do you get to represent the United States in this Olympics? So
there's an organization called USACO, the USA Computing Olympiad. And they run a sequence
of contests throughout the year. Most of them, or maybe all of them are online. Anyone can go to
their website, I think it's train.usaco.org, or you can Google for USACO Training Gateway.
You can make an account, do practice problems on their website, and also participate in their contests.
You start off in some division, I forget which, and when you do well enough there,
you get promoted, eventually to Platinum. And if you're one of the top Platinum high schoolers in America,
you get invited to a live training camp at Clemson in South Carolina.
I think because the head coach of the USA team is a faculty member there, that's why they have it there.
And, you know, on the order of like 15 to 20, I forgot exactly the number of high schoolers, make it to that camp every summer. And then they run some contests there and provide some training as well and have some
invited speakers to give lectures.
And then based on your performance in that camp, you either make the team or you don't,
right?
This team of four.
Okay.
So about, you know, who participates in this?
You know, when I was in college, I was class of 05 at MIT.
Many of the former IOI USA team members would go to MIT.
And I noticed just anecdotally, like the ones I was meeting, not all, but many of them came
from this one particular high school in Virginia called Thomas Jefferson High School for Science
and Technology.
It's a really strong public high school.
It's a magnet school in Northern Virginia.
And I always found that odd because,
you know, America is a big place. And how did it happen that in a team of four, two of them are from the same high school in Northern Virginia? You know, I was invited to give a talk to the
summer team, the training camp team in summer 2020. And I mostly talked about my research and
I tried to get them excited about
modern algorithmic research. But I also had a portion at the end of the talk on the following.
So in preparation for that talk, I went to the USACO website and I looked at who made the training
camp in the last decade. It had the list of all the student names and the high schools they had
attended. And I made a heat map for myself. I can actually send you a link to that heat map. So if you want to put it in a note for the podcast,
and I just looked at like, you know, what states were the students coming from?
And the first thing that really stood out was like half the country had zero students.
Yeah. Like in the past decade, half the states were such that no student from that state made
it to the camp. So let's look at the half of the states which did send students.
Of those, in half of them, all the students who came from that state
came from a single high school, which was weird.
I think it was like 26 states had participated in a decade.
13 of those had the property that only one high school in the state
ever had representation at this camp. Okay. And then we look at the top performers in
terms of sending kids to the camp? California was number one. It sent something like 60 people to
the camp over a decade. Virginia was number two, sent maybe a dozen or a little more than a dozen.
And then it quickly started falling off. I think Massachusetts, Texas, and I don't remember who else,
but like New York sent like three or four in a decade
compared to California, which sent 60.
And I thought that was weird.
And then I zoomed in a little more on the California kids
and I looked at where were they coming from?
And of the 60 Californian kids,
I think 57 were from the Bay Area. And from those 57, I believe all but one were from the
South Bay. So what that means is, you know, maybe like a five to 10 mile radius around the Stanford
University campus, right? Like Cupertino, Sunnyvale, you know, et cetera, San Jose.
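The per-state tally Jelani describes can be sketched in a few lines of Python. This is a minimal sketch with hypothetical roster records; the real data would come from the camp attendee lists published on the USACO site, and the school and student names below are made up for illustration:

```python
from collections import defaultdict

# Hypothetical camp roster: (student, high_school, state).
attendees = [
    ("Student A", "Lynbrook HS", "CA"),
    ("Student B", "Lynbrook HS", "CA"),
    ("Student C", "Monta Vista HS", "CA"),
    ("Student D", "TJHSST", "VA"),
    ("Student E", "Phillips Exeter", "NH"),
]

TOTAL_STATES = 50

# Group the high schools represented by each state.
schools_by_state = defaultdict(set)
for _, school, state in attendees:
    schools_by_state[state].add(school)

# States with at least one attendee over the period.
states_represented = len(schools_by_state)

# States that never sent anyone to the camp.
states_with_zero = TOTAL_STATES - states_represented

# States where every attendee came from a single high school.
single_school_states = sum(
    1 for schools in schools_by_state.values() if len(schools) == 1
)

print(states_represented, states_with_zero, single_school_states)
```

With a real decade of rosters, the same counts reproduce the observations above: how many states sent no one, and in how many states a single school accounted for every attendee.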
And I thought that was really weird, right? You know, the same phenomenon that I noticed when I
was in college, now it's like shifted. Virginia is still there. A lot of those kids came from TJ,
but now we see a lot of kids from the Bay Area. And I realized it was all about exposure and
access. So it turns out there's an afterschool training camp in the South Bay where kids can
pay to train, right? So that camp, it's an after-school program. They hire like former Olympiad competitors
and other really smart people to train these kids.
And this is a kind of training
that you're not going to get in public school
or even in most private schools, actually, right?
You really can't access this kind of training
unless you live in the Bay Area, right?
So I think that's a big part of it, right?
It's like, first of all, even knowing that it exists. When I was in high school, I didn't even know to train
for this thing because I didn't know this thing existed, right? I didn't know what an algorithm
was until I went to college, let alone training for algorithms as a high school student. By the
way, I didn't even know what MIT was before I applied to MIT. I found out that MIT existed
the fall of my senior year in high school.
My dad bought this US News college ranking magazine. He brought it home in September.
I looked at it. I saw that MIT was ranked first at the time in CS and first in math.
And then I only applied to MIT. I also didn't know that it was very hard to get in.
That reduced the stress of my life, not knowing, but it could have been a disaster.
Yeah, I'm feeling a little bit inadequate myself, having just now learned that this
algorithmic Olympiad exists even now.
So access, organizational willpower, someone putting on that event, someone, a parent or
a teacher encouraging someone to go to that event, it becomes a snowball that goes downhill, which is great for the people within a five-mile radius of one particular area
of a certain demographic in Northern California. But what does it mean for
the rest of us? Getting that access to folks, whether it be through programs like AddisCoder
or their online resources, seems essential for the future.
Yeah. Actually, now that you mentioned AddisCoder, again, access and, you know, awareness. We do surveys in AddisCoder, right, where we ask them, prior to AddisCoder, what did
you want to major in in college? Most popular answer, medicine. And then we ask them at the
end of the program, now what do you want to major in in college? And some of them still say medicine,
but a lot of them start saying computer science. I forget exactly what the
number was, but it like tripled, right? The number who said they wanted to do it
tripled, or more than tripled, from before the program to after.
And it's just because if you're growing up in rural Ethiopia, you know, like what your
parents tell you, what society tells you is like, if you're really smart and you want
to like be successful and have a good salary, be a doctor, right?
People just don't know that CS is out there and like what CS really means,
right? Maybe they've like used email before, or they have a Facebook account, or I guess,
Instagram or TikTok for the young folks now, but they don't really know what it means to be a
computer scientist, right? And I think part of what AddisCoder does is make them aware
and they realize, oh, this is actually interesting. Very cool. Access and awareness is really where
it all starts.
I think it plays a big role.
Well, thank you so much, Dr. Jelani Nelson, for chatting with me today in cooperation and in partnership with the ACM ByteCast.
This has been another episode of Hansel Minutes, and we'll see you again next week.
Thank you.
ACM ByteCast is a production of the Association for Computing Machinery's Practitioner Board.
To learn more about ACM and its activities, visit acm.org. For more information about this and other episodes, please do visit our website at learning.acm.org.
That's B-Y-T-E-C-A-S-T.
learning.acm.org slash ByteCast.