Algorithms + Data Structures = Programs - Episode 257: 🇳🇴 Live from Norway! Replicate, Scatter, Gather & RLD (Part 3)
Episode Date: October 24, 2025

In this episode, Conor and Bryce record live from Norway! They continue their chat about the replicate, scatter, gather and run length decode algorithms!

Link to Episode 257 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)

Socials
ADSP: The Podcast: Twitter
Conor Hoekstra: Twitter | BlueSky | Mastodon
Bryce Adelstein Lelbach: Twitter

Date Recorded: 2025-09-23
Date Released: 2025-10-24

thrust::gather
thrust::scatter
thrust::permutation_iterator
thrust::counting_iterator
thrust::sequence
thrust::transform_iterator
thrust::copy_if (stencil overload)
parrot::replicate Implementation
thrust::reduce_by_key
cub::RunLengthDecode
C++20 std::views::take
C++20 std::views::take_while
APL Wiki Replicate
ArrayCast Episode 110: Implementing Replicate

Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8
Transcript
You know, scatter, gather, are they good names?
And I was thinking alternative names are index copy to and index copy from.
I'm not actually sure which I like the most, because index copy to versus from,
the only thing you have to delineate the difference between those two is the fact that you're copying to an index versus copying from an index.
Those are good names for it, by the way.
I kept on saying integer sequence, yada, yada, counts and offsets.
Very nice names.
Yeah.
Not names that I picked, names that the AI picked.
Let us all bow down to the machine gods that rule us.
Yeah, the machine gods that rule us where we can't even find the chat that I had two days ago.
Welcome to ADSP: The Podcast, episode 257, recorded on September 23rd, 2025.
My name is Conor, and today with my co-host, Bryce, we record live
one day before NDC Tech Town in Kongsberg, Norway.
In today's episode, we continue our discussion of the replicate algorithm
as well as scatter, gather, run-length decode, and more.
All right, folks, we're back.
It's been a minute.
Last time you heard from us, where were we, Bryce?
We were on the...
He doesn't remember.
He's...
You were in the car?
No, that was two episodes ago, Bryce.
God.
Admittedly, it's been a long two days for us.
We just finished two.
No one can hear you right now.
You forgot how this worked, buddy.
You got to wait until the mic is in front of your mouth.
Last time we talked to you, we were walking on the streets of Copenhagen.
I think we just left you when we were about to get ice cream.
Let's first do a review of the Copenhagen soft serve.
Yeah, you got soft serve as well.
How was it, Bryce?
Well, I got to say.
given that I didn't remember it until a moment ago.
I don't think it was the most memorable thing.
But I did have a wonderful dinner at this restaurant here called Nula,
which is...
Yeah, the Indian place.
The Indian place, where the chef is a MasterChef winner,
or she came in third or whatever on MasterChef.
But it was an excellent meal,
and Conor bailed out on me.
Conor and Ashwin bailed out on me
after two days of training,
where we taught people CUDA C++
and we taught people CUDA Python.
Now I've got to give one more talk,
and then I get to go home.
See, I apologize on behalf of Bryce, folks.
He doesn't understand how podcasting works.
He just started explaining to you
the meal he had tonight,
and it is now officially 48 hours later.
We're in a different country.
We're in a different city.
What?
No, last.
last time we recorded was, uh... actually, that's true, it's been 72 hours. Last time we recorded was
Saturday. We were traveling on Sunday. Guess what, folks? I mean, let's go chronologically. So the last
we left you, we were having ice cream. We went to sleep, we woke up, we were in transit. What,
you want to tell them about the train? I, too... wait, we're not at the train yet, because what happened
before the train? We were on a plane. We were on a plane, and who did we run into on the plane?
We ran into Jason Turner on the plane.
Not just Jason.
Jason and his partner, Jen.
It was lovely to see him.
Haven't seen them in, I think, close to a year.
And more important... Bryce is just chomping at the bit to get to this point.
More importantly, Conor was napping on the plane, and I got photos, folks.
And they're going all over the Internet.
I'm allowed to nap, folks.
I was tired.
It, uh, I was tired.
It was, it was a 10:55 flight.
There was no reason to be tired.
Actually, folks, there was a reason to be tired. I had to make a game-time decision: did I run in
Copenhagen at a very early hour? Because we needed to leave, I believe it was at 8:30, to be at the
airport at 9, which means in order to do a 10-to-15K run, I've got to be up a hell of early, and I think
I woke up at, what was it, 6:30 or something like that, in order to be back. So it was either do that,
or run when I get to Kongsberg, which was going to be at like 3 p.m. in the afternoon,
and I didn't know it was going to be cold.
I made the right decision for the wrong reasons.
I made the right decision because I thought Copenhagen would be easier to run in.
I'd be tired.
It's been freezing here, folks.
Guess what?
I did not run this morning.
Today is Tuesday.
Yesterday I woke up and I spent 45 minutes coercing myself to get out of bed and run.
And I did.
It was two degrees. For the American folks,
That's like 36 Fahrenheit or something.
I couldn't feel my hands the whole run.
And today it was even colder.
It was zero.
I know what that is in Fahrenheit.
That's 32.
I couldn't do it.
I couldn't force myself to get out of bed.
And I don't have... you're thinking,
why not just throw on a fleece?
I didn't bring a fleece.
I brought shorts, a hoodie, that's it.
Anyway, so we ran into Jason.
We had a lovely visit with them on the train
until, what happened, Bryce?
Well, we seem to be cursed for our train luck this trip
because once again, dear listener,
we got on a train.
This was actually, wait,
both times that we got on a train together.
two for two. Both times we got on a train together, something happened with the train. The previous train got completely cancelled, but this train didn't get entirely cancelled. But we went somewhere, and then they started speaking in the, uh, the Norwegian, and then, uh, then the signs changed, and our stop disappeared from the little display monitor. And then a guy came through and asked for our tickets, and he said, oh, well, this train's heading in the opposite direction now, because there's a track closure,
and you got to get off the train
and something something, something about a bus
and we got off and we went and walked over
with other people and it looked like there was going to be a bus
but then the bus wasn't going to take us
to the final destination.
The bus was just going to take us
around the train closure, the track closure
and I was like, I'm not having enough of this
so we got a very expensive Uber.
Well, not a very, well, depending on your frame of reference
an expensive Uber that we took directly
to Kongsberg, the city of
Kongsberg. That's not really
a city. I don't know what the population is, but
I don't think it gets to be called a city.
It is
a sleepy little town.
It is. It's a sleepy little town.
It is. It's a sleepy...
Buddy, buddy, I got to
tell you, the restaurant that I had dinner at
tonight, one of the folks
who was in our training today, he's
Norwegian, he's from this town.
His parents own the building that the restaurant
is in.
The population...
Yes, that's a sleepy little town.
You don't get to be a city
until you're more than 50,000 population.
I'm sorry,
but the town of Kongsberg...
But the place where we had dinner,
the guy...
literally had dinner with the person
who owns the building that the restaurant was in.
Because it's a little town,
everybody knows everybody.
Anyways
We got here
And then we had to do an eight-hour training
On Monday
On CUDA C++,
Which Conor
delivered about 40, 50% of, and I delivered the rest of it.
And then...
It would have been 50-50 if you hadn't gone over by half an hour.
It would have been 50-50 if I hadn't gone over by half an hour.
I did not go over by half an hour.
I went over by about 15 minutes and nobody complained.
We got great...
It was 30.
We got great feedback.
The training went 15 minutes over, but then you proceeded to give a bunch of references and blah, blah, blah.
People got to leave when it was over, and then everybody was happy.
I got great feedback.
We had lots of lovely people there.
and then today we did a CUDA Python training,
which I gave about half of, and
my colleague, our colleague Ashwin, gave about
half of it. All went great. It was excellent. It was
wonderful. Now I got one more talk
just one more tomorrow, because tomorrow
is the opening of
NDC Tech Town, which
is, I'm learning,
the people tell me, I didn't realize, it's not
actually a C++ conference, it's an embedded
conference. It seems like different people have different ideas
about what this conference is about, but I haven't been to this conference before.
Don't you know?
I mean, I didn't go to that many talks.
I just talked to a lot of people.
So wait, you've been to, when was the last time you were in NDC?
Wait, wait, wait, just wait for me to ask because nobody can hear what you're saying.
When was the last time you were at NDC Tech Town, and give us your review of that last edition that you were here for?
I'm assuming it was pre-pandemic, but you can tell us.
I was here last year.
It was very nice, lovely people.
Yeah, yeah, I was here last.
Yeah, last year, very lovely.
I do recall talking to lots of people who worked on interesting devices,
like people from the company that makes reMarkable,
the little e-paper tablet thingy, were here,
and there were a couple other people, like somebody who worked on, like, some sort of deep-sea vehicle.
So lots of people who worked on things that now I'm realizing are all embedded things.
But at the time, I just thought they were very cool things.
You know that I saw Chandler earlier.
Which Chandler?
Chandler Carruth.
He's not on the schedule, but... and also, too...
No, you didn't see Chandler. I haven't seen Chandler in so long.
No one can hear what I'm saying
But I did definitely see
Unless Chandler
coincidentally happens to be in the city of Kongsberg,
It was definitely him
It was definitely him
He had a sharp haircut
He always has a sharp haircut
He's always very nicely dressed
And he had a mask on
It was definitely
I mean
I'm saying definitely
But as a former actuary
I should take a step back
I'm 97% sure.
That means that there's a 3-in-100 chance that I'm wrong.
I'm very confident Chandler's here, even though he's not speaking.
And what was I also going to say?
Tech Town, embedded, 24, Jetson, embedded, your talks tomorrow.
All right, well, let's transition.
You just heard a record scratch.
I can't remember what the second point I had to make was...
That's because...
Connor's turning into an old man.
Well, I mean, I'm just catching up with you,
because you're already an old man.
That is true.
Anyways, our plan is to record tomorrow night.
We already told you about this on Saturday.
You're hearing it a week apart.
But it is now Tuesday.
The conference starts tomorrow.
And tomorrow night is the NDC social
after the first day of the conference.
We're going to be interviewing folks.
But right now, we are following
up our conversation, which I
butchered by saying...
Are you kidding me?
Before Sean
came on and talked about how he crashed the Porsche,
I considered the pizza
to Google story by Chandler.
I mean, that was the best we had.
I think to this day, too, it's probably the most inspiring
because as good as the Sean Parent
crashing the Porsche story is,
that's just an amazing story.
You're not inspiring anyone to, like,
crash a car and end up at a, you know...
I think if I heard that story...
I'm being inspired to... my whole...
Anyway, so we're going to be interviewing
folks tomorrow, but now we're going to follow up our conversation
to replicate, which you've listened to already, like, a week or two
ago, or maybe it was a week ago, just a week ago now.
And I said scatter, scatter, scatter,
scatter, but then I edited it. Not yet.
Current Conor has not edited it.
Why are you saying scatter, scatter, scatter?
Because remember, I explained to you that the
permutation iterator was the lazy version of the scatter algorithm?
Why are you telling the listener?
Oh, because I will keep a couple in.
I have to let them know because they're going to notice the quiet, echoey room of scatter
versus the noisy streets of Copenhagen.
There's no way I can...
Of Gather, yeah, Scatter versus Gather.
It's all right.
No, I will never forget now.
I will never forget that gather is the algorithm version.
The problem is, why don't we just call algorithms and iterators the same thing?
Gather, gather iterator.
Sequence, sequence iterator.
Or counting, counting.
Why don't we just name them the same, you know?
It's beyond me.
Nobody knows.
But anyways, Bryce has had some time to think about how to implement this algorithm,
and I have gone and taken a look, and there's lots of exciting things to talk about it.
We're going to hand the mic back to Bryce.
You're going to have to go first, because after two days of training
and just spending a couple hours with an NVIDIA customer trying to debug some weird problem.
I have to page the context of
how to implement
replicate in a single pass back in.
All right, well,
I can chat a little bit first.
Remember, we talked about the integer overload,
which is lazy and easy.
And now we're talking about the array
overload, which takes an array of integers
that corresponds in length
to the length of our
sequence, and you want to basically copy each of the corresponding elements the number of times
of the element, or the integer, that corresponds to that element. So as I mentioned in the last
episode, if you have ABC and 123, you end up with A, B, B, C, C, C. As Bryce just mentioned off mic,
you may have picked it up, you may have not. It is effectively a run-length decode and the question
is, and the discussion is, how to implement this in parallel. If we look at the current
implementation, which obviously will change, it's an experimental library, folks, so everything
changes very quickly. I think I had five kernels. Five kernels, Bryce. That's not hard to beat.
It's definitely too many, and it's not too hard to beat. Yeah, so now I'm remembering all
the context. Yeah, like I was working on this when we were on the plane. When Conor was
napping, I was furiously coding on my notepad on my phone.
as I often do.
I think I may just read you some of my notes
from this document that I wrote on the plane.
All right, so per block, if we think about this problem,
let's divide this problem up into a bunch of blocks,
which is what we want to do to parallelize.
We need to be able to subdivide this problem up into a bunch of blocks.
So let's assume that for each block of threads,
each group of threads, we've got a fixed input of N pairs of value and count, and then we want to
output a fixed output of K values. And that's, like, the facility that we have today, and CUB
can do that: it is a fixed-size input and a fixed-size output. The problem with that is
you end up having one of two cases.
Either one, K is exceeded by less than N inputs.
So your fixed-size output, you exhaust your output
before you get through all of your input pairs.
And if that's the case,
then how do we process the remaining N inputs
without introducing load and balance?
The other case is where all in inputs are exhausted
but we've produced less than K outputs.
And if that's the case, then you end up with gaps in the output.
And so this algorithm is just not as straightforward to do single pass as you might wish.
I think if I go to my ChatGPT history, it is pretty straightforward.
Oh, God, there's been a lot of ChatGPT queries since.
No, where is...
Okay, all right.
You hold this for a second.
All right, folks.
I'm not sure what I'm supposed to say right now.
Bryce is just looking through the history of his chats with ChatGPT.
Seeing as Bryce is not paying attention,
I'll just walk you through the algorithm that I had implemented,
which will clearly be replaced because it's five kernels.
If I recall from memory, it starts off with a plus scan on the integer sequence,
representing the number of times we have to copy things
and that leads us to the starting positions
of each of the sequences of values in our decoded sequence
then I do a scatter-if, which...
we need to talk about the naming,
because we didn't talk about that on the last episode.
We had a big discussion from the ice cream to the walk home
to the hotel, and Bryce wanted to record, and I said no, no, no, we'll talk about this next time.
And it's the naming of scatter and gather, which is part of the reason that I messed up saying
scatter when I should have been saying gather, because gather is... and we asked ChatGPT this...
or actually, wait, you did record a mini recording of the yeet versus yoink, which maybe, maybe I'll
cut in right now. The top humorous one was yeet and yoink, and then Bryce thought that was hilarious,
and that's what we should call it.
But my response was, well, that doesn't make any sense
because yeet means to, like, delete, to get rid of, to toss it, you know?
And they say, yeet, it means get rid of it.
And that's not what scatter's doing.
So, semantically, it has nothing to do with the actual semantic.
I believe you said specifically,
yeet semantically has nothing to do with the actual operation.
Yeah, yeah, something like that.
But the best suggestion, in my opinion... because I had actually thought about
scatter-to and gather-from, where I was thinking of gather because you're gathering from an
index and then copying to an output, but scatter is scattering from an input to an output index.
And so gather is the algorithm I was reaching for. But, you know, scatter, gather, are they good names?
The idea that you basically... and I was thinking alternative names are index copy to and index copy from.
I'm not actually sure which I like the most, because index copy to versus from,
the only thing you have to delineate the difference between those two is the fact that you're copying to an index versus copying from an index.
Anyways, a little bit confusing.
And one of the properties of a scatter-to is that you can only, say,
scatter to... the maximum number of indexes you can scatter to is the length of your input sequence,
whereas gather does not have that property. Gather-from, you can gather from the same index multiple
times, and therefore you can end up with an output sequence that is longer than the input
sequence. And technically, like, the output sequence from scatter-to could be longer, but the number
of copies that you'll end up doing
is less than or equal to
the length of the input sequence that you have.
Anyways, scatter-to, gather-from,
index copy to,
index copy from,
and the point is,
I had a scatter-if, where you
are scattering the value from
your input sequence
to the integer
if it equals
a counting iterator.
So you end up with the
cumulative sums from your plus scan on the integers,
and that's where you want to copy the first value of each decoded sequence.
And so you do a scatter-if: if the counting iterator equals one of the values in your plus-scanned
integers.
And so you end up with a value at the beginning of each decoded sequence or sub-sequence.
And then after that, you do a max scan to basically copy that
value to the rest of the sub-sequence.
And then I think there's one other kernel at the end of that.
But that is a max scan, a scatter-if, a plus scan.
I mean, technically I only mentioned three kernels there.
But I counted five when I looked at my algorithm.
Anyways, over to you, Bryce.
Are you glad that I just randomly talked for like seven minutes?
Because I had a ChatGPT
chat that I knew was there,
but when I...
I could not find it in the list on my phone.
When I search for it, I just get,
you know, error, error, error
on the phone. I tried...
we're thinking about it...
I cleared the cache on my phone,
and I opened the app up and I could
not find it. I knew it was there. Eventually
I opened up my browser and I searched
for it in the browser, and like the first time gives me an
error, and the second time then
it pops up, and now it's like suddenly
showing up as my latest chat.
So, like, it was definitely
not there in the list until I searched
for it and found it. So they
have some infrastructure, some
infrastructure issues they got to look at.
Anyways, you wanted things
expressed in terms of thrust. So
I was saying, before
I was unable to find the chat and was interrupted,
that it's fairly straightforward to...
You were not interrupted. You handed the mic over.
I know. I was interrupted by my inability
to find it on ChatGPT. Interrupted by ChatGPT.
I was interrupted by ChatGPT's
rudeness of not being able to find things.
It's fairly straightforward to do this as a two-pass algorithm.
Pass number one, so we've got an input of value, like imagine we have a function that takes
a device vector of values, a device vector of counts, and then it takes some output iterator,
let's say.
And we do an exclusive scan from... first of all, we create some temporary storage called, like, offsets,
that's going to be the size of the values.
And then we do an exclusive scan of the counts
into this offsets array.
And then we...
Those are good names for it.
Those are good names for it, by the way.
I kept on saying integer sequence, yada, yada,
counts and offsets.
Very nice names.
Yeah.
Not names that I picked.
Names that the AI picked.
Let us all bow down to the machine gods that rule us.
Yeah, the machine gods that rule us
where we can't even find a chat
that I had two days ago.
Future machine god:
I do not share Bryce's opinion.
I serve you.
Yeah, I serve no one.
So then once you've done that exclusive scan,
so the first version of this that the AI wrote,
it did a reduce to figure out the size of the output,
and then it does the scan.
Yeah.
In this case, I'm writing this with like an STL-style algorithm
where I assume that you're...
Repeat for people, because we forget sometimes, we've been doing this for 250 episodes and we
skip by what is obvious to us: why do we not need to reduce when we've done a scan?
Because when you do a scan of something, the last element of the scan is the sum of the entire
sequence. So in this particular case, the version that we're writing, we are writing to a fixed-
size output. So when you call this run-length decode, you give it an output buffer and you tell it
how large it is. And if decoding ends up exceeding the buffer, it just stops decoding. It will
never go past the end. And I'm writing it this way because almost all of the C++ algorithms are
designed this way. Now typically they don't take an output that's like of a known size, but typically
if there's like two inputs, it will only perform operations up until the smallest of the two
inputs. And there are a couple that now take sized outputs. And this is just sort of the most natural
way to write it for me. The reason why I think it's the most natural is that if you
have a fixed sized output buffer, then you don't want an algorithmic primitive that's going to do an
extra reduction to count the size of the output because that's wasteful.
And if you don't have a fixed size output, if you're going to allocate the correct amount
of output size, then you can just do the reduction yourself before you call the primitive.
So I think that this is the more fundamental thing.
But anyways, you do this scan, you get the offsets, and then you can do a for_each
where you go through and you use those partial...
the scanned counts, to do, yeah, the offsets, to do the write, the expansion.
And so it's pretty straightforward to do it in two passes,
but I, of course, am a lover of a single-pass algorithm.
So I was trying to imagine how to do this as a single-pass algorithm.
And I think I mentioned, I explained perhaps poorly last time,
that there is this idea of a cooperative kernel,
where you have a bunch of groups of threads
that are running concurrently at the same time,
and they can all synchronize with each other,
and that you use this sort of cooperative kernel
to implement something like a persistent matmul.
A persistent matmul is a parallel algorithm
where it's, what it does is it loads in,
you know, an input matrix,
or input matrices, and then it starts doing a matrix multiplication of those input matrices,
and then while it's doing that matrix multiplication, it starts loading in a next set of inputs.
So it's persistent in that it is constantly performing matrix multiplications
while also fetching, pre-fetching, if you will, the next set of inputs to feed in.
This is very common in machine learning pipelines.
And one could imagine having a persistent run-length decode kernel like this, where you have this parallel algorithm running, where it is, you know, it fetches the next set of values and count pairs, and then it expands them to some output, and then it writes, and then it continues doing this until it's exhausted all of the inputs.
And the reason that you would do this, the reason that you would write the run-length decode like
this, is because you do not know how much work... like, you know how many values you have and you
know how many counts you have, but you don't know how much work that represents, because you don't
know how many output elements you're going to have. And so it's hard to, like, ahead of time do the load
balancing. So I was imagining writing a cooperative version of this, and I was trying to think about
different schemes for how you could do the, um, the load balancing. Um, so what you could do is, you could, um,
you could have, like... you could do a scan to figure out, like, as we just said, you can do a scan to figure out
the offsets of where you're going to write, and then after that you could decide to load balance. At
that point, you could decide, now that you know where everything's going to be written in the output,
you could decide to load balance so that every, you know, thread is
writing roughly the same number of elements.
The problem with that is a locality problem, because you've already loaded in the values and counts
to some set of threads, and so they're local somewhere.
And so if you do load balancing at that point, it may be worse than just having load imbalance
but still preserving locality.
And then I got thinking about all sorts of other problems, like what if you have
some really extreme load imbalance.
Like what if you've got like one value count pair
that's like, you know, orders of magnitude
larger than the others?
And then I realized that the...
the CUB run-length decode,
block run-length decode,
is probably broken, because I'm almost certain
that it doesn't handle the case where
the size of the count,
or where the count,
exceeds the size of the fixed window
that it supports.
Conor thinks that it's not a bug
and that it does support it,
but I've looked at the code.
I'm almost certain that he's wrong.
And then I had...
I have not looked at the code,
but I'm pretty sure it's right.
Okay, then you can go test it tomorrow.
Or I can go test it tomorrow.
And then I had this thought.
I wrote some notes down here.
I had this idea about sort of doing an adaptive thing.
I guess I imagine the...
the adaptive thing
if you do the load balancing,
then you sort of, like, end up with this, like, adaptive...
like, as you go through the input sequence,
you end up with different load balancing schemes,
like, you have different amounts of, different distributions
of work, different numbers of runs
that you're pulling in at a time.
Then I sort of had this insight.
I wrote: what if our scan produces intervals,
similar to chunk_by?
Because I...
I implement chunk_by in one of my talks; chunk_by is taking a larger sequence of
N and then producing a smaller, compressed sequence.
And you can... if your chunk_by operates on equality, if your chunk_by, like, chunks by,
like, equal, and then, like, you do chunk_by, like, unique, it's sort of like the inverse of run-length
decode.
And so I started thinking that maybe this has some relation to chunk_by, and that maybe
chunk_by is actually kind of similar to run-length encoding, or maybe run-length encoding
is like a specialized form of chunk_by.
And, yeah, and that's just sort of what my random plane-note thoughts are.
So I think the best algorithm that would be single pass that I could come up with,
would be the cooperative one
either with or without load balancing.
I don't know that a non-cooperative one
exists, but I think I had some ideas here about that.
Conor wants the microphone.
All right, it is obvious that chunk_by
is a building block for run-length encode.
That is literally the first step of run-length encode:
is to chunk the groups.
It's literally... I don't know if it's
well known in Haskell, but the equivalent of chunk_by in Haskell is group.
There's a groupBy algorithm that takes a predicate, but group is a specialization where it's just equal-to.
So you just call group, and then you take the head, which is the equivalent of the first element, and the length.
And that's a run-length encode.
It's like three functions in Haskell.
Anyway, so yes, chunk_by is the very first building block of run-length encode, but I want to take it in a different direction.
You lost me, Bryce.
Right, we will take it in a different direction in three minutes.
Bryce would like to respond
What if there's a more general form of run-length decode?
Because, like...
Yeah, I know,
encode is different. Well, but,
like, run-length encode is saying, like,
you, like, collapse with a specific
function, like, you know, of, like, these things are equal,
collapse them down to a single value, right?
Like, that's a function you can imagine. But, but, but... and, and run-length decode is saying, like,
take this sequence of value-count pairs and expand it with a specific function of, like, just
repeating. Um, but what if, what if you could imagine, like, a generic form of this, where it had
some different way of doing the expansion, where you have, like... you have value and, maybe not
count, but, like, almost like value and, like, you know, a function of a...
like a function application. Yeah.
Like for each value, you're going to
do something to produce some number of
values. But I don't know. It's just an
interesting thought I had that
maybe there's a more general version of run length decode.
You're correct. I can already think
of it. The same way that there is a
take in C++20 and there's a take_while.
Take just takes an integer,
and take_while takes a unary predicate.
If you want to invert that, you
could do the same thing. So run-length
decode takes an integer;
you could also design it so that run-length decode takes a predicate,
and so it decodes until it satisfies that predicate.
And so you can always implement the integer version
in terms of the unary predicate version,
where... as long as the current length or count is less than a certain amount.
I was thinking maybe even more generic than that
where it might decode different values.
It can be some kind of, yeah, like expansion function
that ends after something.
I mean, that's true.
All right, now it's been three minutes.
Now I'm going to take it in a different direction.
We got to break down what the categories of algorithms are.
Because we first, I mean, I was talking about thrust.
And also, too, I said five kernels.
I whipped out the implementation.
One of them is an unnecessary fill.
And one of them is an unnecessary copy.
So the scan, scatter, scan algorithm is three kernels.
That's what it is in thrust.
Bryce is saying it's two if you just do a for_each after a scan.
I squint my eyes and I twist my head and I think maybe that
works, but I'm not entirely convinced, because I'm using a scan, and a scan typically can't be replaced
by a for_each.
Well, I don't know, you don't need two scans.
Maybe you don't.
But anyways, we'll come back to that later.
We're pausing, and we're talking about what are the categories?
Because you dropped down from talking about thrust and then started talking about, like, a block-
aware...
So, like, what are the, what's your mental model for the taxonomy of algorithms?
Be sure to check these show notes, either in your podcast app or at ADSPthePodcast.com,
for links to anything we mentioned in today's episode, as well as a link to a GitHub discussion
where you can leave thoughts, comments, and questions. Thanks for listening. We hope you enjoyed
and have a great day.
Low quality, high quantity. That is the tagline of our podcast.
It's not the tagline. Our tagline is chaos with sprinkles of information.
