ACM ByteCast - Jelani Nelson - Episode 21
Episode Date: October 26, 2021

In this episode of ACM ByteCast, our special guest host Scott Hanselman (of The Hanselminutes Podcast) welcomes Jelani Nelson, Professor of Electrical Engineering and Computer Science, a member of the Theory Group at the University of California, Berkeley, and a Research Scientist at Google. His areas of interest include the theory of computation, as well as the design and analysis of algorithms, especially for massive datasets. Jelani is a member of ACM's Special Interest Group on Algorithms and Computation Theory (SIGACT)'s Committee for the Advancement of Theoretical Computer Science (CATCS). Among his honors, he won the 2014 Presidential Early Career Award for Scientists and Engineers. He is the creator of AddisCoder, a computer science summer program for Ethiopian high school students in Addis Ababa. Jelani and Scott discuss his journey from learning HTML when he was 12 to becoming a theoretical computer scientist. They talk about the spectrum between software engineering and theory and how even theoretical CS research can have an impact on industry practice; teaching his introduction to algorithms course of more than 700 students; running a highly successful algorithmic boot camp for students in Ethiopia to learn coding; and the times he feels most accomplished in his work.

Links: AddisCoder, People of ACM interview with Jelani Nelson, CATCS, SIGACT
Transcript
This is ACM ByteCast, a podcast series from the Association for Computing Machinery, the world's largest education and scientific computing society.
We talk to researchers, practitioners, and innovators who are at the intersection of computing research and practice.
They share their experiences, the lessons they've learned, and their own visions for the future of computing.
I'm your host today, Scott Hanselman. Hi, I'm Scott Hanselman. This is another episode of Hanselminutes, and we're doing this one in association with the ACM ByteCast. So thank you to them
for this partnership. And today we're talking with Dr. Jelani Nelson, who's a professor at the
Department of EECS at UC Berkeley and a member of the Theory Group. How are you, sir? I'm doing
well. Thanks for having me. Yeah, thanks for hanging out. If people go and Google for you, you have been doing all kinds
of really cool stuff online and in the computer science world for many, many years. It's really
impressive. It's very different than I think a lot of people's backgrounds, because I'm very
much a practitioner who has very minimal computer science background.
And I'm really trying to get my head around, we're trying to build new coders,
trying to build new developers, trying to build new computer science people.
It's such a completely different spectrum between theory and algorithms and computer science
and text boxes over data, which is what a lot of developers find themselves doing right now.
Are these completely
different concepts or how do we deal with that spectrum? How do we kind of reconcile that between
the HTML of it all and the JavaScript of it all and the really good work that happens in the world
of computer science itself? And when you ask, are they really different? You mean software engineering
versus doing theory? Yeah, because I'm sure you code all the time, you know, certainly.
But like if you were going to make a website today, what would you do?
Versus when I go and make a website, the algorithms that I might use would be very naive and very inefficient and never the twain shall meet.
Yeah, actually, speaking of websites, it's funny you ask, because I learned HTML when I was maybe 12 or so. That was my knowledge of web design. And then when I was a grad student, at some point I wanted to have a website, so I remembered that knowledge I acquired when I was 12 and made myself a website in Emacs, just typing the HTML by hand. It wasn't very good, but it had the information that I wanted to put there,
so it was good enough. Yeah, exactly.
And then more recently, I wanted to make my website a little nicer, so I learned a little
Bootstrap and Hugo and put something together that's a little bit better. Not a lot better,
but it's okay. That is very different from what I usually do. I am a theorist. I've been a theorist
since, well, even as an undergrad,
I would say I primarily took theory courses, but there were some non-theory courses I was
required to take. And as a theorist, you know, as an academic too, my main output has been
research papers, right? And not code. Although I did code for fun a lot starting undergrad,
I was really into programming competitions, carried that into grad school too. Recently,
I actually did have an itch to do more stuff that's a little bit more connected
to software engineering.
Because, you know, I teach courses like the titles like algorithms for big data, for example,
right?
But as an academic, it's like, well, I'm just in theory land.
I'm not actually dealing with real data usually, right?
So I wanted to do something a little bit more applied just, you know, to do something different.
So starting this past
summer of 2021, I actually spend a day a week at Google. So I'm actually also a Google employee,
but like, you know, 20% time. And it's been pretty interesting. I mean, I get a chance to
use my theory knowledge to do real things and interface with other researchers at Google,
but also like software engineers. What does that feel like? So like you're spending 20 plus years doing theory and then suddenly
you're in the weeds. What's a similar analogy? It's like you're a biologist and now you're in
surgery and someone's cut open in front of you and you're like, mitosis.
Yeah. It hasn't been exactly like that because the team that I'm working with there is within
the research part of Google. And my manager is someone I've known
since I was a starting graduate student. We've written papers on similar topics or even the same
topic. There are still some people on that team who act as intermediaries between me
and people actually working on products, although I do meet with the product people too.
It's been pretty interesting in the sense that I knew that people were using some of the kinds of things I worked on in the real world outside of academia.
But, you know, I was like a little removed from that.
And now I get to see like why they actually care about these things.
I mean, I kind of knew why they cared, but I get to see it like right in front of me and give my input and share my expertise in ways that hopefully help.
That's cool.
That is the goal, right? Like the theory
group at Berkeley is trying to take those applications of the theoretical computer
science and apply it not just to software engineering, but economics and physics and
pure mathematics. Yeah. I mean, I would say theory is very broad. So yes, you know, there are parts
of theory that are more connected to software engineering and, you know, here's a faster
algorithm. Here's an algorithm that uses less memory, less bandwidth,
or whatever resource you care about.
I think theory is more than that.
Theoretical computer science is also philosophical.
What is the power of randomness in computation?
Do we need randomness to have efficient algorithms?
That's like complexity theories,
like studying the limits of computation
and what resources are required to solve various computational problems or like quantum computing
right? We don't actually have quantum computers yet, so it's somewhere on the boundary between science and computing. If we have them, what would they be able to do? What would they not be able to do? There's actually one quantum complexity theorist, Scott Aaronson, who likes to say that quantum complexity theory is about what we can't do with computers that we don't have. But I mean, there's some philosophical aspect to it too. There's also the theory footprint,
the theory CS footprint on areas that are not part of CS at all, like economics. So now there's
a lot of people have been working in that intersection
between economics and computer science.
And what does theoretical computer science
have to say about economics, for example?
Like there's someone who's an alumnus of UC Berkeley.
He got his PhD here, Costis Daskalakis,
who was working on,
and a former faculty member here,
Christos Papadimitriou, that was his advisor,
you know, proving things about the hardness of finding Nash equilibrium, right?
So a Nash equilibrium, that's a concept that was invented by economists, right?
Which Nash, I think, won a Nobel Prize for in economics.
But, you know, computer science came along, or theorists from computer science came along
and said, hey, wait, you say that this equilibrium exists,
but is it computationally tractable to find one, right? And if it's not computationally tractable to find one,
then how important is the concept, right? Does it actually model what people really do if they
can't efficiently compute this thing? And then you've got this interesting relationship between
the math world in the context of John Forbes Nash Jr., who's a mathematician,
then game theory, which brings in economics, and now you're bringing in computer science. And it's this cross-collaborative triangle
trying to figure that problem out. And there are a lot of these examples, right? Now, another big
one is privacy, differential privacy, right? So law, the field of law cares about privacy, right?
But then now computer science is coming in and saying, hey, there's this notion that,
you know, was invented or introduced
maybe more than 15 years ago called differential privacy. You can mathematically define what it
means to maintain privacy and then design algorithms that provably or mechanisms that
provably maintain privacy under this mathematical definition. So I mean, theory CS has really,
I think, made impact. I mean, to software engineering, yes, but also beyond, even beyond computer science.
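The differential privacy idea mentioned above has a very short concrete instance: the classic Laplace mechanism. This is a textbook sketch, not anything specific from the episode; the count, epsilon, and seed below are made up for illustration.

```python
import math
import random

def laplace_noise(scale, rng=random):
    """Sample from a Laplace(0, scale) distribution via inverse transform."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(true_count, epsilon, rng=random):
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon suffices for the epsilon-DP guarantee.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
noisy = private_count(1000, epsilon=1.0, rng=rng)
```

The noise has mean zero, so individual answers are perturbed but aggregate accuracy is preserved; smaller epsilon means stronger privacy and noisier answers.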
So that means that it's not just people like me that are doing, like I said, text boxes
over data.
It's my parents and my non-technical friends that these things are all affecting.
It's things that show up in the news all the time, the differential privacy one being a
perfect example when we have conversations about metadata versus data, trying to explain
what appear to be subtle differences to my non-technical family members, but are actually huge differences. And even now,
dealing with data, everyone seems to become a scientist now during this time of COVID. It seems
that people who are least likely to do any work on the group project in high school also seem to be
the ones that are telling me that I should do my own research. Right. Yeah. I've seen a lot of memes
centered around that. Yeah. So you're teaching a ton of courses at the undergraduate and graduate
level, but if you're an undergrad in computer science and you're taking one of your classes,
for example, efficient algorithms and intractable problems, what kind of a student and what kind of
a researcher are you creating? Where do you anticipate that the person would go? Would
they go into the computer science theory direction or they just become a more well-rounded engineer?
Right. So that class you just mentioned is the UC Berkeley introductory algorithms course.
So you learn techniques like various graph problems, Dijkstra's and Bellman-Ford,
various dynamic programming algorithms and the technique of dynamic programming,
max flow. There are many different things and there are some randomized algorithms. The class is quite big. This semester
I have on the order of 730 students in that class. Wow. Yeah. The vast majority are not going to go
into research, right? They're going to go into industry. A lot of software engineers. A lot of
students view the class as a way to train themselves for, you know, coding puzzles that the big software
engineering companies like to give,
right? Algorithmic questions. I think aside from puzzles, there is value in knowing how to design
efficient algorithms, minimizing resource consumption, whether it be time or memory.
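To make one of those course topics concrete, here is a minimal sketch of Dijkstra's algorithm with a binary heap. The graph below is a made-up example, not material from the course.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths for non-negative edge weights.

    graph: dict mapping node -> list of (neighbor, weight) pairs.
    Returns a dict of shortest distances from source.
    """
    dist = {source: 0}
    heap = [(0, source)]  # (distance, node) priority queue
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry; a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Tiny example graph
graph = {
    "a": [("b", 4), ("c", 1)],
    "c": [("b", 2), ("d", 5)],
    "b": [("d", 1)],
}
print(dijkstra(graph, "a"))  # {'a': 0, 'b': 3, 'c': 1, 'd': 4}
```

With a binary heap this runs in O((V + E) log V) time, which is exactly the kind of resource analysis the course asks students to carry out.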
How do you not lose 600 of them? I mean, that's a gauntlet. It's almost the definition of
gatekeeping to have a 700 person class. How do you make an inclusive course with that many people and not lose people just in the chaff?
More teaching staff.
So we have a teaching staff whose size is on the order of 50.
Lots of office hours.
Piazza.
I don't know if you know about Piazza.
It's a website that lots of universities use for, it's kind of like a forum where students
can ask questions, possibly anonymously, and get answers from teaching staff members and or other students.
So hands-on. So while it might sound like it's 700 people thrown into a pile and it's a big,
you know, battle royale for algorithmic complexity superiority, it's really,
there's a community within that organization itself, 50 TAs and beyond.
Oh, yeah.
It's a lot.
Yeah.
TAs and graders combined.
Yeah.
Yes.
Around 50.
And are you finding that a certain flavor of person pops out the other side?
Is it a certain personality that succeeds in a course like this?
Or are we now in 2021 building more inclusive classes than I took in the late 80s and the
early 90s?
Well, I can't compare with the 80s and 90s because I wasn't in CS.
I was a little kid back then.
Of course, but you know what I'm saying?
Like you've been teaching for some years.
Can you maybe think about how has your style changed in the last even 10 years
to make sure that you make more successful theorists,
more successful scientists, and more successful engineers?
My style has definitely changed. I think when I first started, because I learned this stuff
myself so long ago, and it's second nature to me, I forgot how hard it is to absorb it the
first time around. So I think the first time I taught a class like this, I probably went too
fast and tried to pack too many problems in each homework assignment. Actually, you mentioned this right before we started talking today.
I run this program in Ethiopia where I train high school students there in algorithms.
We've trained more than 500 students so far in the last 10 years.
We started in 2011.
And I remember the first time I ran that program.
That was July 2011.
It was a four-week program.
We met Monday through Friday every weekday for about two and a half hours.
Almost none of the students knew how to write any code at the beginning of that program.
By the last week, I was showing them IOI problems.
This is like the high school programming Olympics, right?
Kids who have been training in algorithms for years and were like the top in their country. I was showing them problems from
those Olympic contests and like solutions to them. In retrospect, I was going way too fast,
like for kids who had never coded a day in their lives. Three weeks of training followed by a fourth
week of like Olympic level programming is probably a little too fast. But yeah, I've turned that down
a little bit since then. The IOI being the International Olympiad in Informatics, as you said, the World Championships,
the Olympics of Algorithms. This is a four-week course for high schoolers, and it's not even
nine to five. It's two or three hours a day. That's another change I made. Now it is nine to
five. It is nine to five. Okay. So this would be a bootcamp. It's an algorithmic bootcamp.
Yes, that's right. Yes. Okay. And again, is it just one of those things like a bootcamp where if you survive, you'll
be a better person on the other side?
How do you keep from losing people?
And what's the secret sauce that makes your graduates so successful?
Because if you take a look at the Addis Coder website that we'll link to in the show notes,
you have not only 500 students, but tons of them have gone on to MIT and become very successful
and well thought of in their space. Yeah, right. So let me back up a little bit. So you said,
what helps to make the students successful? Yeah, like what's the special sauce? Is it just you?
How can someone become successful in four weeks? That seems like they may have had the right head
space on the way in. Yeah. So first of all, we recruit our students from all across
the country. So like the most populous city in Ethiopia is the capital, Addis Ababa, but less
than 10% of our students come from the capital. We bring them from everywhere, from every single
region of the country. And we do that in partnership with help from our co-organizing
organization. It's called the Meles Zenawi Foundation. They're based in Ethiopia.
They help us with the on-the-ground logistics. They help us work with the Ministry of Education,
the government, to get top students from all across the country. And Ethiopia has,
on the order of 110 million people, it's a big country. It's the second most populous country
in Africa after Nigeria. Nowadays, one summer program might train 175 students or so. So you're pulling the top 175 students out of a population of 110 million, right? So first of all, they're really smart, and that helps a lot. We do some admissions on our own, outside the ministry, but we tell the ministry, the students you try to find for our program, make sure they're top in mathematics, top in physics, anything mathematical, because algorithms is a mathematical thing. It's a mathematical field. And then when
they come, they're really not doing anything for those four weeks except our assignments, right?
So they're really into it. We have a very large teaching staff. It's not just me. We usually have
three to four instructors over the summer. I'm one of them. But we also have a teaching staff that I think typically has on the order of 40 TAs. Usually half of them are based in Ethiopia,
various undergraduates, you know, undergraduates from various institutions in Ethiopia, or some
of them are lecturers at universities or work in industry in Ethiopia. And then the other half,
we fly in from all over the world. And they're really strong TAs. We have a lot of PhD holders or current PhD students or people who themselves did IOI when they were in high school
or similar kinds of competitions. We've had early engineers at big companies. I think the second
hire at Dropbox was a TA for AddisCoder a few years ago, 2018. As was another early Dropboxer.
And we've had people from Google, from Microsoft, from other companies. So it helps that we have very high quality
teaching staff. The importance of algorithms. Do you think that amongst people like myself,
who are more on the practitioner side, do you think that we're sloppy? Do you think that maybe
there's a style of coding right now that is not respecting algorithms and stuff like that? Because
things like IOI that I even hadn't thought of,
maybe you were surprised that I hadn't thought about it.
The rank and file developer doesn't think as much about algorithms
as I think people in your world do.
Is that something that you think that the computer scientists on Twitter
or the software engineers on Twitter
maybe need to be refreshing themselves on?
Possibly.
I mean, again, most of my life I've been in academia.
So it's only recently that I've started more closely interacting with industry.
And computer science is a big space, right? So there's room for a lot of different kinds of
people with different expertise. Not everyone, I think, needs to be an expert in making websites,
as you asked me earlier. Not everyone needs to be an expert in algorithms.
But even me as an algorithmist, I have an idea, and then I code it up and do some experiments, and I want to have a pretty interface to display the results of my experiments. I still need to
have some kind of UI there that is useful. So it still could be useful to have knowledge from
something that's not your core.
Right. Well, the reason that I asked that, and the reason why I'm kind of pounding on that is trying to understand the lenses through which different people look at different problems,
you know, and there's that idea that like, when all you have is a hammer, everything looks like
a nail. And I, having now been in software engineering, not with a computer science,
but with a software engineering degree in the background for now 30 years, I'm starting to realize, looking back, how limited a toolbox I had been pulling from. And maybe I need to go back and fill
in some of those missing Tetris pieces in my own knowledge and help make a more well-rounded,
either academic or a more rounded practitioner. And that's why I'm picking on you a little bit
with those questions. Yeah. Again, though, I think that CS is so broad that maybe it's
necessary, maybe it's not, I'm not sure. You were at Microsoft, you mentioned. I don't know, how long were you at Microsoft?
14 years.
Okay. I interacted with some theorists who were at Microsoft. Microsoft has a pretty
strong theory group. For example, in Seattle and Redmond, there are also some theorists in,
there used to be more, but I think there's still some in the Boston area right next to MIT.
There was a really good lab that unfortunately Microsoft
closed seven years ago that was in Silicon Valley. I visited that place before. I gave a talk there
once and I knew a lot of the people there. I want to say two of the researchers there,
Parikshit Gopalan and Sergey Yekhanin, they had some expertise in an area called coding theory,
which is a part of theoretical computer science. And my understanding is that they designed some coding scheme that was implemented in Azure storage and saved Microsoft
a lot of money. I don't know the details myself. It's not exactly my field, but there's coding
theory, right? Coding theory even traditionally is, I think, considered part of electrical
engineering. There's room for, I think, all kinds of different people and different areas of
expertise to contribute at a large company like
that. ACM ByteCast is available on Apple Podcasts, Google Podcasts, Podbean, Spotify, Stitcher,
and TuneIn. If you're enjoying this episode, please do subscribe and leave us a review on
your favorite platform. What is the sense of achievement that you get? When do you feel like
you're done or you've done a thing? Like, is it applying an algorithm or someone taking one of your research papers and
taking it and saving someone money or saving someone's life or something like that? Or like,
when does, I know I feel achievement when I hit deploy and then that, you know, the thing has a
URL or the app is in the app store. But if someone took an algorithm of yours and did something with
it or a paper of yours and applied it, does that give an extra sense of accomplishment?
I mean, it gives some sense of accomplishment, yes. I would say, though,
most of the senses of accomplishment I obtain are from just pursuit of knowledge, academic pursuit,
right? So there's this problem that seems like a really foundational or interesting problem,
important problem, and no one in the world knows how to do it, right? And you keep thinking about
it and thinking about it, maybe with some friends or colleagues, and finally you crack it, right?
And then, you know, just that feeling of like cracking the problem and solving it, I think is
the sense of accomplishment. Then you write it up and move on. Sometimes, you know, it might make
practical impact. In my case, most of the time, I think it doesn't, but it's okay.
Yeah, it is okay. It does make me feel some kind of way though, when you say, and then I write it up and move on because it's out there. Like
someone could need that information. Yes. Yeah. How are you thinking about that? I asked Leslie
Lamport the same question and I'm interested, is this a thing where you are, you know, fingers
steepled and you're just walking around the quad with your head down? Or is this a eureka moments
in the shower? Or is it
really just years of hard work, and then it slowly cracks open? I've been in academia now since I
started my PhD. It's like 15 years. So I've been involved in a number of projects. I would say both
have occurred. I've had projects where we had the problem crisply defined, and we're thinking about
it. We got stuck. Maybe we could solve some special cases.
We didn't eventually solve it until a couple of years later, or in some cases we never solved it.
Right. And then I've had other projects where we read someone else's paper and realize there
was some problem there. And it's like, oh, we happen to like really understand the tools really
well that would be useful for this problem. And we work it out and it just works, you know, it's like a day or two, right? So, I mean, that's happened too. Wow.
So I've had the full range. Yeah. What are some of the problems
that you've been most proud of over the last 15 years? I know you did a lot of work in streaming
algorithms and things like that. Is there anything that you're particularly proud of looking back?
Yeah. I mean, some of the papers I like of my own. I would say one was in my thesis,
the distinct elements problem.
So what is that problem?
You see, you have your website, your server, right?
People visit your website.
When they visit your website,
they have some IP address
and you want to know
what is the number of distinct IP addresses
that have visited my website.
So you don't want to just maintain a counter
because if the same person
visits your website multiple times,
you'll count them multiple times. I only want to count them once because I want to know distinct count, right?
So how much memory do you need to solve this problem? So one of the things I did in my PhD
thesis together with two other researchers, Daniel Kane and David Woodruff, was to basically nail the
complexity of this problem. So the relevant quantities are the universe that the items in
the stream are coming from. So IP addresses say integers from 0 to 2 to the 32 minus 1, right?
So that's the universe.
It's an approximation algorithm.
So it doesn't give you exactly the right answer, but it gives you the right answer up to, say, 1% error or 5% error, like a tunable parameter.
So the memory goes up if you want better accuracy, right?
And understanding how to trade off the memory with the accuracy.
So basically, we can understand exactly what the optimal space complexity was of this problem. Actually, one thing we didn't fully nail was the dependence on the failure
probability because it's a randomized algorithm that has some chance of outputting a wrong answer.
But one of my PhD students actually, when I became a professor, went back and revisited the problem
and also nailed the dependence on the failure probability.
And he wrote a paper on that. It was in his thesis. He won a best student paper award. So yeah, I mean, I think that's something that has been valuable.
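The distinct elements problem described above can be illustrated with a much simpler cousin of the optimal algorithm: a K-Minimum-Values (KMV) sketch. To be clear, this is an illustrative sketch, not the Kane-Nelson-Woodruff algorithm, and the stream of "IP addresses" below is made up.

```python
import hashlib
import heapq

def kmv_distinct(stream, k=256):
    """K-Minimum-Values sketch: keep only the k smallest hash values seen.

    If hashes are uniform in (0, 1], the k-th smallest is about k/n for
    n distinct items, so (k - 1) / kth_smallest estimates n using O(k)
    memory regardless of stream length. Duplicates hash identically,
    so repeats don't distort the estimate.
    """
    def h(x):
        # Hash to a float in (0, 1]
        digest = hashlib.blake2b(str(x).encode(), digest_size=8).digest()
        return (int.from_bytes(digest, "big") + 1) / 2**64

    smallest = []  # heap of negated hashes: -smallest[0] is the current max
    in_heap = set()  # hashes currently in the heap, to skip duplicates
    for item in stream:
        hv = h(item)
        if hv in in_heap:
            continue
        if len(smallest) < k:
            heapq.heappush(smallest, -hv)
            in_heap.add(hv)
        elif hv < -smallest[0]:
            in_heap.discard(-heapq.heappop(smallest))
            heapq.heappush(smallest, -hv)
            in_heap.add(hv)
    if len(smallest) < k:
        return len(smallest)       # fewer than k distinct items: exact count
    return (k - 1) / -smallest[0]  # -smallest[0] is the k-th smallest hash

# 10,000 distinct "IP addresses", each appearing 3 times
stream = [f"10.0.{i // 256}.{i % 256}" for i in range(10000)] * 3
est = kmv_distinct(stream, k=256)
```

With k = 256 the relative error is typically a few percent, and the accuracy/memory trade-off he describes is exactly the knob k: more memory, tighter estimates.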
Related to counting, I had another counting paper, actually, which we're about to submit to a conference in a couple months. And that is just: maintain a counter. So imagine now,
I just want to know how many visits
have I had in my website, distinct or not, it doesn't matter. So every time I see a new user,
I just increment, increment, increment. And at the end, query, tell me the value of the counter.
So the naive thing is just maintain a counter, right? So if the number gets to be as big as n,
you would need log n bits to write down a number that's as big as n.
But it turns out that you can actually do better if you're willing to settle for approximation
and randomness.
And there was an algorithm from the 70s due to this guy named Rob Morris.
Not Rob Morris, the professor at MIT and the inventor of the Morris worm, but his father.
The father, Rob Morris, he worked for Bell Labs for a while.
He was chief scientist at the NSA for some time.
But back in, I think, the early
days of Unix, he wrote a spell checker. And I mean, the story's online, let me not go into it.
But basically, in the course of writing that spell checker, he realized that he wanted to
maintain a lot of different counts in his code. And the computers he had back then didn't have
enough memory to store all these counts explicitly. So he needed some way to like compress them. So he needed to be able to store, you know, a 20-bit number in only 10 bits, right? What does that really mean? But,
you know, you can do it if you're willing to approximate. So based on the 10 bits,
you won't know exactly what that 20-bit number was, but you can know it up to say 1% error,
right? And how do you increment the counter when you're only maintaining this compressed
representation? So he gave an algorithm for this back in the 70s.
What me and this colleague of mine, Huacheng Yu, he's now faculty at Princeton, showed was a lower bound: any solution to this problem must use at least so many bits of memory.
And then we showed that the lower bound we proved is actually tight.
That is, there's an algorithm for this problem that we can analyze and prove matches it.
So there is no better algorithm up to a small constant factor.
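Morris's trick can be sketched in a few lines. This is a textbook rendition of the 1970s counter, not the optimal variant from the paper discussed here, and the counts and seed are arbitrary: store only an exponent X, increment it with probability 2^-X, and estimate the count as 2^X - 1.

```python
import random

class MorrisCounter:
    """Approximate counter: stores only an exponent, so conceptually it
    needs O(log log n) bits rather than the log n bits of an exact counter."""

    def __init__(self, rng):
        self.x = 0
        self.rng = rng

    def increment(self):
        # Bump the exponent with probability 2^-x, so 2^x tracks the count.
        if self.rng.random() < 2.0 ** -self.x:
            self.x += 1

    def estimate(self):
        return 2.0 ** self.x - 1  # E[2^X - 1] equals the true count

rng = random.Random(42)
counters = [MorrisCounter(rng) for _ in range(200)]
for _ in range(10000):
    for c in counters:
        c.increment()

# A single counter is very noisy; averaging independent copies sharpens it.
avg = sum(c.estimate() for c in counters) / len(counters)
```

The estimate is unbiased but high-variance, which is why practical versions average many independent counters or use a base closer to 1 to trade memory for accuracy.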
That's very cool.
And then you take these concepts, though, after you've written the paper.
And then a couple of years later, for example, you might go and teach something about that.
Like you've taught the distinct elements problem in your Harvard CS 229r class.
So something that you worked on, a paper you worked on now becomes part of who you
are and part of how you teach, thereby moving things forward for everyone else.
Yeah. That class you mentioned, I actually taught a version of it at Berkeley too in fall 2020.
That's a graduate class. Whenever I teach graduate classes, I always have a final project component,
which is almost half the grade. And I tell the students, find an interesting problem or research
problem and try to solve it. If you fail to do so, it's okay. Research problems are hard. You can fall
back on writing a survey of what you learned or, you know, doing an implementation project where
you implement lots of different algorithms you've learned and experimentally compare them.
But I encourage them to try and actually solve an open problem in research. And some of them
actually do. There have been a few projects that turned into research papers that actually solved some open problem.
That's amazing. That's kind of that Good Will Hunting idea, where someone who you don't expect suddenly solves the whole impossible problem out in the hallway there.
Right. Yeah, I remember seeing that movie. I think there is a little extreme where,
you know, it was like just written on a board in the hallway and he just solves it, right?
Whereas here it's like these students are working hard.
Yeah, they're working very hard.
Speaking of working very hard, back to this concept as we get towards the end of our show here, we were talking about the IOI, the International Olympiad in Informatics.
Yes.
And you had an interesting observation about who goes to these.
And I was trying to get to that root idea of how do we make more people?
How do we have them come from different places, be different backgrounds, age, race, socioeconomic? Who goes to these events?
Sure. Well, I can tell you a little bit about the United States. I know less about how it works,
how the teams are selected in other countries. So just for everyone listening, a little bit more
about the IOI. It's a high school competition. It's a programming competition where most of the problems
are algorithmic in nature, meaning you need like efficient code, not just code that solves the
problem. And it's like the Olympics, right? So every country has a team, selects a team,
and that team represents the country in these Olympics and the location of Olympics changes
from year to year. Every team has four students on it. You ask the question, how do you get to
be on the USA team, right? How do you get to represent the United States in this Olympics? So
there's an organization called USACO, the USA Computing Olympiad. And they run a sequence
of contests throughout the year. Most of them, or maybe all of them are online. Anyone can go to
their website, I think it's train.usaco.org, or you can Google for USACO Training Gateway.
You can make an account, do practice problems on their website, and also participate in their contests.
You start off in some division, I forget which, and when you do well enough there,
you get promoted, eventually to Platinum. And if you're one of the top Platinum high schoolers in America,
you get invited to a live training camp at Clemson in South Carolina.
I think because the head coach of the USA team is a faculty member there, that's why they have it there.
And, you know, on the order of like 15 to 20, I forgot exactly the number of high schoolers, make it to that camp every summer. And then they run some contests there and provide some training as well and have some
invited speakers to give lectures.
And then based on your performance in that camp, you either make the team or you don't,
right?
This team of four.
Okay.
So about, you know, who participates in this?
You know, when I was in college, I was class of 05 at MIT.
Many of the former IOI USA team members would go to MIT.
And I noticed just anecdotally, like the ones I was meeting, not all, but many of them came
from this one particular high school in Virginia called Thomas Jefferson High School for Science
and Technology.
It's a really strong public high school.
It's a magnet school in Northern Virginia.
And I always found that odd because,
you know, America is a big place. And how did it happen that in a team of four, two of them are from the same high school in Northern Virginia? You know, I was invited to give a talk to the
summer team, the training camp team in summer 2020. And I mostly talked about my research and
I tried to get them excited about
modern algorithmic research. But I also had a portion at the end of the talk on the following.
So in preparation for that talk, I went to the USACO website and I looked at who made the training
camp in the last decade. It had the list of all the student names and the high schools they had
attended. And I made a heat map for myself. I can actually send you a link to that heat map. So if you want to put it in a note for the podcast,
and I just looked at like, you know, what states were the students coming from?
And the first thing that really stood out was like half the country had zero students.
Yeah. Like in the past decade, half the states were such that no student from that state made
it to the camp. So let's look at the half of the states which did send students.
Of those, in half of them, all the students who came from that state
came from a single high school, which was weird.
I think it was like 26 states had participated in a decade.
13 of those had the property that only one high school in the state
ever had representation at this camp. Okay. And then we look at the top performers in
terms of sending kids to the camp? California was number one. It sent something like 60 people to
the camp over a decade. Virginia was number two, sent maybe a dozen or a little more than a dozen.
And then it quickly started falling off. I think Massachusetts, Texas, and I don't remember who else,
but like New York sent like three or four in a decade
compared to California, which sent 60.
And I thought that was weird.
And then I zoomed in a little more on the California kids
and I looked at where were they coming from?
And of the 60 Californian kids,
I think 57 were from the Bay Area. And from those 57, I believe all but one were from the
South Bay. So what that means is, you know, maybe like a five to 10 mile radius around the Stanford
University campus, right? Like Cupertino, Sunnyvale, you know, et cetera, San Jose.
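The per-state tally Jelani describes can be sketched in a few lines of Python. This is a minimal sketch with hypothetical roster records; the real data would come from the camp attendee lists published on the USACO site, and the school and student names below are made up for illustration:

```python
from collections import defaultdict

# Hypothetical camp roster: (student, high_school, state).
attendees = [
    ("Student A", "Lynbrook HS", "CA"),
    ("Student B", "Lynbrook HS", "CA"),
    ("Student C", "Monta Vista HS", "CA"),
    ("Student D", "TJHSST", "VA"),
    ("Student E", "Phillips Exeter", "NH"),
]

TOTAL_STATES = 50

# Group the high schools represented by each state.
schools_by_state = defaultdict(set)
for _, school, state in attendees:
    schools_by_state[state].add(school)

# States with at least one attendee over the period.
states_represented = len(schools_by_state)

# States that never sent anyone to the camp.
states_with_zero = TOTAL_STATES - states_represented

# States where every attendee came from a single high school.
single_school_states = sum(
    1 for schools in schools_by_state.values() if len(schools) == 1
)

print(states_represented, states_with_zero, single_school_states)
```

With a real decade of rosters, the same counts reproduce the observations above: how many states sent no one, and in how many states a single school accounted for every attendee.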
And I thought that was really weird, right? You know, the same phenomenon that I noticed when I
was in college, now it's like shifted. Virginia is still there. A lot of those kids came from TJ,
but now we see a lot of kids from the Bay Area. And I realized it was all about exposure and
access. So it turns out there's an afterschool training camp in the South Bay where kids can
pay to train, right? So that camp, it's an after-school program. They hire like former Olympiad competitors
and other really smart people to train these kids.
And this is a kind of training
that you're not going to get in public school
or even in most private schools, actually, right?
You really can't access this kind of training
unless you live in the Bay Area, right?
So I think that's a big part of it, right?
It's like, first of all, even knowing that it exists. When I was in high school, I didn't even know to train
for this thing because I didn't know this thing existed, right? I didn't know what an algorithm
was until I went to college, let alone training for algorithms as a high school student. By the
way, I didn't even know what MIT was before I applied to MIT. I found out that MIT existed
the fall of my senior year in high school.
My dad bought this US News college ranking magazine. He brought it home in September.
I looked at it. I saw that MIT was ranked first at the time in CS and first in math.
And then I only applied to MIT. I also didn't know that it was very hard to get in.
That reduced the stress of my life, not knowing, but it could have been a disaster.
Yeah, I'm feeling a little bit inadequate myself, having just now learned that this
algorithmic Olympiad exists even now.
So access, organizational willpower, someone putting on that event, someone, a parent or
a teacher encouraging someone to go to that event, it becomes a snowball that goes downhill, which is great for the people within a five-mile radius of one particular area
of a certain demographic in Northern California. But what does it mean for
the rest of us? Getting that access to folks, whether it be through programs like AddisCoder
or their online resources, seems essential for the future.
Yeah. Actually, now that you mentioned AddisCoder, again, access and, you know, awareness. We do surveys in AddisCoder, right, where we ask them, prior to AddisCoder, what did
you want to major in in college? Most popular answer, medicine. And then we ask them at the
end of the program, now what do you want to major in in college? And some of them still say medicine,
but a lot of them start saying computer science. I forget exactly what the
number was, but it like tripled, right? The number who said they wanted to do it
tripled, or more than tripled, from before the program to after.
And it's just because if you're growing up in rural Ethiopia, you know, like what your
parents tell you, what society tells you is like, if you're really smart and you want
to like be successful and have a good salary, be a doctor, right?
People just don't know that CS is out there and like what CS really means,
right? Maybe they've like used email before, or they have a Facebook account, or I guess,
Instagram or TikTok for the young folks now, but they don't really know what it means to be a
computer scientist, right? And I think part of what AddisCoder does is make them aware
and they realize, oh, this is actually interesting. Very cool. Access and awareness is really where
it all starts.
I think it plays a big role.
Well, thank you so much, Dr. Jelani Nelson, for chatting with me today in cooperation and in partnership with the ACM ByteCast.
This has been another episode of Hansel Minutes, and we'll see you again next week.
Thank you.
ACM ByteCast is a production of the Association for Computing Machinery's Practitioner Board.
To learn more about ACM and its activities, visit acm.org. For more information about this and other episodes, please do visit our website at learning.acm.org.
That's B-Y-T-E-C-A-S-T.
learning.acm.org slash ByteCast.