Algorithms + Data Structures = Programs - Episode 185: Name the Algorithm

Episode Date: June 7, 2024

In this episode, Conor and Bryce catch up via phone tag and chat about an algorithm.

Link to Episode 185 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)
Twitter
ADSP: The Podcast
Conor Hoekstra
Bryce Adelstein Lelbach

Show Notes
Date Recorded: 2024-06-04 to 2024-06-06
Date Released: 2024-06-07
Craft Conf 2024
ADSP Episode 157: The Roc Programming Language with Richard Feldman
'Declarative Thinking, Declarative Practice' - Kevlin Henney [ ACCU 2016 ]
Dave Thomas YOW! Vector Talk (can't find the link)
C++Now 2019: Conor Hoekstra "Algorithm Intuition"
Better Algorithm Intuition - Conor Hoekstra @code_report - Meeting C++ 2019
C++ std::nth_element
C++ std::partial_sort_copy
thrust::partition

Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

Transcript
Starting point is 00:00:00 Name the algorithm that you can use to solve this sequentially, not in parallel, that exists in the standard algorithm header. And I actually don't think this algorithm exists as a Thrust algorithm. I could be wrong about that. Welcome to ADSP, the podcast, episode 185, recorded on June 5th, 6th, and 7th. My name is Connor and today with my co-host Bryce, we do phone tag episode number two. And in the episode, we chat about what we've been up to and an algorithm for finding the top K elements in a sequence. Hey, Connor, how are you doing? It's been a while since we've talked. I don't even know.
Starting point is 00:00:55 It's probably been like four or five, maybe even six weeks. I know you're traveling right now. I have no idea where you are, but I know you've been very busy. So I just figured I would, uh, give you a call and, uh, leave you a message. So, uh, I'm pretty busy too, because, uh, we are moving apartments in about two weeks. It's a very big move moving to a completely different area, which is exactly three blocks north of where we currently live, but on the other side of 2nd Avenue. So that's a big move for me. I can actually see the new apartment building from our current apartment
Starting point is 00:01:40 building. But we're getting a bigger place on a higher floor. And some big and sad news, which is that the famed Bryce coffee table is not going to be coming to the new apartment. It is the one piece of furniture that we are saying goodbye to. It is actually my oldest piece of furniture. When I started working at. It is actually my oldest piece of furniture. When I started working at NVIDIA, I bought myself that coffee table. It was the first nice piece of furniture that I would own. So if anybody, if any listeners are interested in acquiring Bryce's coffee table, you can get it for free. If you can figure out how to get it from here in New York to wherever you are, you can have my coffee table.
Starting point is 00:02:27 Anyways, I've been working on ask you about is selection algorithms or like finding the top K of some, you know, set of data. Like I don't want to sort the entire set of data. I just want like the top 10 things or the top, you know, five things. And this is a pretty useful algorithm. It's used in a bunch of places. Apparently it gets used a lot in machine learning. Obviously you can imagine if you're like, you're making a website and you want to, you know, show people the top X results that match some search criteria, you'd probably use some algorithm similar to this. And maybe there's like a related algorithm of finding not just the top 10,
Starting point is 00:03:32 but like the, what if I wanted to find like the 11th through the 20th? But I was sort of thinking about this algorithm because I was thinking about whether there's any sort of scan-based approach to implementing this. I mean, obviously, you have to do some sort of sorting here. But there's sort of a different trade-off than the trade-off you have with typical sorting algorithms because, especially in the case where the number of things that you're looking to pick is much smaller than the entire data set, some sorting algorithms like the tonic sort that may not be efficient as you scale up can actually be a win for this sort of problem.
Starting point is 00:04:21 So I was just wondering if you were familiar with this problem and if you had any thoughts about it. Hope to hear from you soon. Hey, Bryce. Good to hear from you. I apologize for not being able to do this recording in person. I have indeed been incredibly busy over the last two to three weeks, and you are correct. It has been, I think, five or six weeks since we've talked because the last time we chatted was when we were interviewing Doug Greger, which our listeners know was the last five episodes. Fantastic episodes. I hope all the listeners enjoyed. And currently I am in Pacific Grove, which is right next to Monterey, which is probably more familiar to the listeners. And if you aren't familiar with Monterey, it's at the top of Highway 1 on the Pacific Coast
Starting point is 00:05:20 in California. It's a couple hours away from San Jose, or I guess actually an hour and a half. And I'm here at the NVIDIA Research off-site. Technically, it's an on-site for me because I work remotely from Toronto, Canada. And it's been absolutely awesome so far. It's the first time, to be honest, I've been able to meet my coworkers in NVIDIA Research since I joined just under two years ago. And getting to meet your coworkers is quite nice in the post-COVID pandemic days. And yeah, so that's what I'm doing here. We're staying at the Asilomar Conference Center right next to the Pebble Beach, the famous
Starting point is 00:06:07 Pebble Beach golf course. And it's absolutely gorgeous. A little bit cold, to be honest. But anyways, that's what I'm doing right now. Last week, I was at the CraftConf 2024 conference in Budapest. Not for the full week, just for a couple days. And that was awesome as well. A couple of really good talks I saw. Got to meet, probably the highlight was getting to meet folks that I hadn't
Starting point is 00:06:28 met before. I got to meet Richard Feldman, who we've had on the podcast before, link in the show notes. Also got to meet Kevlin Henney for the first time. He's probably one of my favorite speakers of all time. I will link in the show notes, probably my favorite talk of his, which is declarative thinking, declarative practice. And who else did I meet? Jessica Kerr was there who keynoted at CPP North, C++ North 2023, where we will be both at the 2024 edition of that, honestly, in like just a month. And there was a couple other folks. Oh, yeah, Dave Thomas, who used to work at KX, which is the company that sells the Q and KDB Plus executable. And he's given a really good talk in the past at a YOW conference on vector
Starting point is 00:07:22 thinking, which I will also link in the show notes. And I didn't actually realize this, but he's the individual that organizes the GoTo conferences and the Yao conferences. So anyways, got to meet a bunch of people. And now I'm at the NVIDIA Research Offsite. So yeah, just incredibly busy. I've also been preparing because I've got a poster that I'll be presenting on this evening. Anyways, that's what I've been up to. And to answer your question, have I thought about the top K problem? I am shocked. You actually told me about this probably like a month and a half ago, right before we interviewed Doug. And I wish we were recording this in person because this is an algorithm that we have talked about. There might even be an episode of ADSP that is named the name of the algorithm that is used for solving this.
Starting point is 00:08:14 I have talked about it in my Better Algorithm Intuition Talk, which is the follow-up to the original Algorithm Intuition Talk. Kate Gregory has talked about this. There's tons of people that have talked about this algorithm. And it's a standard library algorithm. Now you're talking about it from a parallel, you know, is this implementable in terms of a scan perspective, but we got to out of the gate, out of the gate, folks, we got to we got to talk about how you can solve this in with a sequential algorithm that exists. It exists in the algorithm standard library. So I'm going to throw this back over to you, Bryce. Name the algorithm that you can solve this sequentially, not in parallel, that exists in the standard algorithm. And I actually don't think this
Starting point is 00:08:57 algorithm exists as a thrust algorithm. I could be wrong about that, but I don't think it actually... So it exists in the C++ algorithm header, but I don't think it exists as a thrust parallel algorithm. Anyways, excited to hear back from you. It'll be nice to record, hopefully in person. I'm also traveling next week en route to NDC Oslo, So we're going to have to find some time to record because then I'm going on vacation for a couple of weeks. And I need to edit these before I go on vacation. Anyways, hope to talk in person soon. Looking forward to hearing back from you. I know that you can do these with either nth element or partial sort. Like what the question is, is how do you efficiently implement nth element or partial sort in parallel? And you are correct that thrust does not have any of these algorithms. There's no nth element, partial sort, et cetera, in Thrust. It's been like a longstanding feature request.
Starting point is 00:10:16 What people today do when they want to do this in parallel with Thrust is, like, a lot of people just sort the entire sequence. So I was actually talking to one of our coworkers, Elias, about this the other day, who's looked into these algorithms more than me. And his belief is that the best way to implement these in parallel is with partitioning. And he feels that this is a good approach in particular because partitioning is applicable to like a lot of different problems. So if we add the partitioning primitives that we need to solve selection
Starting point is 00:10:58 and top-k and partial sorting to Thrust and Cobb, then there will be a bunch of other things we can do with them. So the basic idea with partitioning approach is that you do like either a radix select or sample select to put everything into N buckets. And you're going to do multiple rounds of putting things into buckets. And then, so you know, like you want the, you know, 10th element, or you want like the 10th to the 20th elements, or you want like the, the, the 10th to the 20th elements, or you want like the zero to the, like the top 10 elements, the zero to the 10th element.
Starting point is 00:11:31 And after you have the things into buckets, you can figure out based on what things you want, which of these buckets you can get rid of. Now in, in Frost and Cub today, we do have a partitioning primitive, but it's a two-way partitioning, a binary partitioning. So, you have a partition function and you are either in the group or out of the group. So, it splits the input into two different groups. But what we need is we need a multi-way partitioning primitive that instead of a function that takes, you know, an element and returns a bool of whether or not it passes the predicate, you need a function that takes an item and returns an index into, you know, which bucket does this belong into? And then the partitioning, the multi-way partitioning will split the input up into N buckets. And then, you know, you do this process iteratively. You continue doing selection down. So, in the case of like radix selection, you're going through the actual bits of the
Starting point is 00:12:33 number. Now, one of the challenges here, we use radix selection for parallel sort of things that are radix selectable. So things like ints, you know, things that like we can go through and like look at each one of their bits to sort them. But for radix sort, it's a least significant bit radix sort. So we start from the low bit
Starting point is 00:13:03 and then we do some like black magic. And one of the reasons why this is significant is that when you start with the low bit, imagine like sorting ints and you're starting with the low bit. How much bias do you have in your data for this selection mechanism? How likely is it that like everything is going to end up in one bucket? Well, if everything, if all your numbers are even, then you might end up in one bucket. But like, if, you know, you've got, you've got like a variety of numbers, like, you know, of different orders of magnitude, you're, you're, you're fine. And then you move on to the next highest bit, the next highest, the next highest, etc. Now, one of the big advantages of doing this least significant bit radix sorting is that if you have a set of numbers that, you know, like maybe you've got like a million ints from like zero to 10,000.
Starting point is 00:14:06 Or even if you've got like a million ints from like zero to like, you know, a billion or a million. In either case there, even with that huge range of numbers for what your possible values are, you end up with a lot of unused bits on the significant side of things. And so, if you start your selection on the significant side, then you'd end up with a lot of bias, right? That if like, if you've got only small numbers and you're starting on the left side, then everything's going to end up in the same bucket because those first few bits are going to be zero for everybody. So for radix sort, we're able to get around this because we start from the other side. But for reasons that I don't fully understand yet, for partial sort, selection, and top K,
Starting point is 00:15:01 we can't do that because basically for radix sort, we end up going through all the bits anyways. And so like, it doesn't matter which direction we start from. But when you're doing some selection problem, you have to start from the left. Again, for reasons I don't fully understand but the implication of this is that you can end up with like bias in your selection function where you can end up with
Starting point is 00:15:36 you wanted to sort things into 100 buckets but you ended up actually putting everything into one or two buckets something like that. And so that's one of the challenges with this problem space, doing this in parallel. And then for things that are not radix selectable, you need to do some sort of sample select, which is sort of like the basis of something like sample sorts.
Starting point is 00:16:04 The idea here is that you're going to just pick some pivot elements arbitrarily, randomly from within the data set. And then you're going to say like, okay, we're going to select based on these. We're going to partition into buckets based on these things. And figuring out how to pick good samples is also, I think, a little bit tricky. Yeah. So, you know, the other thing that I've been thinking about a bit, too, is if you, like, if you just want to find, so the case of selection is, like, an interesting one where it's, like, I just want, like, the, you know, the 50th or the 500th element or just like the medium and median element.
Starting point is 00:16:48 And it just feels to me like you should be able to do something better than having to do any form of like partial sorting or partial partitioning of the sequence. It feels to me, I know I'm almost certain it's not the case, but it feels to me like it should be something that should, you should be able to do by reduction. You know, you can do a max reduction to find the maximum. If you wanted to find, you know, the second from maximum, you could also do a reduction to find that. You would have to, you know, store whatever the maximum is and then also store the second
Starting point is 00:17:19 from the maximum. And if you want to find like the, you know, the 10th max element, if you're willing to store, you know, a buffer of 10 elements and copy that around with, like, a state that's copied with your reduction element, your reduction operator, and then you'd have to have some way of combining it, presumably, but it's not, I think, an issue. Then you could do that without having to like do a full sort um and uh you know i think essentially what i'm getting at is that in the case where like your your your k is small um like you like you want to do top k with a small k or you want to do selection with like a with a where you want a small, like something near the front of the end, um, you can do more efficient, uh, uh, reduction or maybe even scan based, um, algorithms. Anyways, I'm just rambling now a little bit at this point. Um, I, uh, I will send you this message and then I got to go because, uh, it's just,
Starting point is 00:18:21 just like now hitting me that like i'm moving next week and now i gotta leave to go to albany to pick up the girlfriend on sunday instead of uh like tuesday like i planned so i basically got like three days left in this apartment and a lot of stuff to do so well uh i will uh talk to you soon buddy once you you stop all your traveling. It sounds like you're having a great time. I hope you've got some beautiful weather down there. And, yeah, I'm really glad that you're getting to go to all these conferences. I know how much you enjoy and thrive on that energy. So, yeah, next time we talk, I should be in the new apartment.
Starting point is 00:19:04 And hopefully it will have better acoustics, a little bit less background noise. It's on the 37th floor as opposed to the 11th floor where we're at right now. So we're moving on up in the world. And that means it's a little bit quieter. Anyways, enjoy the rest of your travels. And I will talk to you soon. Hey, Bryce, just got your message. And yeah, good luck with all the moving.
Starting point is 00:19:27 It sounds like you're going to have an incredibly busy weekend, which also probably means we're not going to be able to record episodes 186 through 188. So we will have to figure out what we're going to do for the rest of the month. But that is a problem for future us. And I it is Thursday, I think we exchanged messages yesterday, or potentially, I think it was yesterday while we were mid NVIDIA research offsite. I am now back in Santa Clara, just like five minutes away from the NVIDIA headquarters, where I'll be working for half a day tomorrow before I fly back to Toronto. And yeah, the research offsite's over. It was a blast. Awesome to meet, you know, coworkers, folks, and other research teams that I don't really interact with much. Lots of awesome ideas. And yeah, looking forward to, I'm not sure if there's going to be
Starting point is 00:20:22 one next year, but I think they're definitely doing one once every two years. I think maybe the on-site one will be every two years and there's a virtual one. I'm not actually entirely sure. Anyways, also looking forward to hearing if the coffee table finds a home, because I'm not sure if any listeners have reached out, as you mentioned in your last message, or was it two messages ago, if someone is looking to acquire Bryce's coffee table, it is theirs for the taking. It is a rather large coffee table, so I'm not sure how, if they're not in New York, how they're going to get it. Anyways, back to the topic of the parallel nth element. So you didn't mention it in your first message, but you did answer my question.
Starting point is 00:21:05 The name of the algorithm, probably most of the listeners were thinking about it, is nth element in C++, which gets you the nth element if it were in a sorted sequence. So you can think of this as kind of if you need to find the median or a percentile, this is the exact algorithm that you need. And also one of the properties of the nth element is that it will partition all the values that are less than the element and greater than the element on the left and right sides for the default binary operation. I'm pretty sure you can customize the binary operation on nth element. And you also mentioned you could solve this with partial sort. However, you said that your goal was to solve this in parallel. And as you mentioned,
Starting point is 00:21:51 there is no thrust nth element. So the question is what to do. While you were explaining the different partitioning and radix sort solutions, I was thinking in my head, how large is K or more precisely, how small is K? Because if it's small enough, you could just do a reduction and then store all the values. And that is actually what you said at the end of your message. So I was while you were mentioning the partitioning and radix sorting, I was thinking, if K is small enough, just do a reduce. And then if K is large enough, just do a sort. So yeah, the question is, if k is not small enough to do a reduction and not large enough to just sort the whole thing, what do you do? I don't actually know off the top of my head. So maybe we will discuss this in a follow-up episode. Because yeah, I don't have any great
Starting point is 00:22:39 ideas off the top of my head. A partition takes a unary predicate and kind of does what nth element does, but it doesn't guarantee you that at the nth spot you have the value that would be there if the data was sorted. It just basically does a predicate sort. Everything that passes the predicate goes at the front and everything that fails the predicate goes at the back, but you don't have any other guarantees other than that. So that doesn't really help you much. Yeah, I'll think about this because I think this is bringing us to about the 30 minute mark and we'll revisit this problem in a future episode. So you're probably not going to hear this because you don't listen to the episodes, but good luck moving. Hopefully we'll find some time to chat. If we don't, I'm not exactly sure what's going to happen for episode
Starting point is 00:23:31 186, but the listeners, I promise there'll be something. I promise there'll be something. Anyways, good luck with the move to both you and Ramona. I'm happy you're moving to a higher floor and that it'll be quieter. And I'm looking forward to chatting once you'veona. I'm happy you're moving to a higher floor and that it'll be quieter. And I'm looking forward to chatting once you've moved and I'm done traveling. And yeah, we'll talk to you then. Be sure to check these show notes, either in your podcast app or at ADSPthepodcast.com
Starting point is 00:23:55 for links to anything we mentioned in today's episode, as well as a link to a GitHub discussion where you can leave thoughts, comments, and questions. Thanks for listening. We hope you enjoyed and have a great day. Low quality, high quantity. That is the tagline of our podcast. It's not the tagline. Our tagline is chaos with sprinkles of information.
