Algorithms + Data Structures = Programs - Episode 257: 🇳🇴 Live from Norway! Replicate, Scatter, Gather & RLD (Part 3)
Episode Date: October 24, 2025

In this episode, Conor and Bryce record live from Norway! They continue their chat about the replicate, scatter, gather and run length decode algorithms!

Link to Episode 257 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)

Socials
ADSP: The Podcast: Twitter
Conor Hoekstra: Twitter | BlueSky | Mastodon
Bryce Adelstein Lelbach: Twitter

Date Recorded: 2025-09-23
Date Released: 2025-10-24

thrust::gather
thrust::scatter
thrust::permutation_iterator
thrust::counting_iterator
thrust::sequence
thrust::transform_iterator
thrust::copy_if (stencil overload)
parrot::replicate Implementation
thrust::reduce_by_key
cub::RunLengthDecode
C++20 std::views::take
C++20 std::views::take_while
APL Wiki Replicate
ArrayCast Episode 110: Implementing Replicate

Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8
Transcript
You know, scatter, gather, are they good names?
And I was thinking alternative names are index copy to and index copy from.
I'm not actually sure which I like the most, because index copy to versus from,
the only thing you have to delineate the difference between those two is the fact that you're copying to an index versus copying from an index.
Those are good names for it, by the way.
I kept on saying integer sequence, yada, yada, counts and offsets.
Very nice names.
Yeah.
Not names that I picked, names that the AI picked.
Let us all bow down to the machine gods that rule us.
Yeah, the machine gods that rule us where we can't even find the chat that I had two days ago.
Welcome to ADSP: The Podcast, episode 257, recorded on September 23rd, 2025.
My name is Conor, and today with my co-host, Bryce, we record live
one day before NDC Tech Town in Kongsberg, Norway.
In today's episode, we continue our discussion of the replicate algorithm
as well as scatter, gather, run-length decode, and more.
All right, folks, we're back.
It's been a minute.
Last time you heard from us, where were we, Bryce?
We were on the...
He doesn't remember.
He's...
You were in the car?
No, that was two episodes ago, Bryce.
God.
Admittedly, it's been a long two days for us.
We just finished two.
No one can hear you right now.
You forgot how this worked, buddy.
You got to wait until the mic is in front of your mouth.
Last time we talked to you, we were walking on the streets of Copenhagen.
I think we just left you when we were about to get ice cream.
Let's first do a review of the Copenhagen soft serve.
Yeah, you got soft serve as well.
How was it, Bryce?
Well, I got to say.
given that I didn't remember it until a moment ago.
I don't think it was the most memorable thing.
But I did have a wonderful dinner at this restaurant here called Nula,
which is...
Yeah, the Indian place.
The Indian place, where the chef is a MasterChef winner,
or she came in third or whatever on MasterChef.
But it was an excellent meal,
and Conor bailed out on me.
Conor and Ashwin bailed out on me
after two days of training,
where we taught people CUDA C++
and we taught people CUDA Python.
Now I've got to give one more talk,
and then I get to go home.
See, I apologize on behalf of Bryce, folks.
He doesn't understand how podcasting works.
He just started explaining to you
the meal he had tonight,
and it is now officially 48 hours later.
We're in a different country.
We're in a different city.
What?
No, last.
last time we recorded was, uh... actually, that's true, it's been 72 hours. Last time we recorded was
Saturday. We were traveling on Sunday. Guess what, folks? I mean, let's go chronologically. So the last
we left you, we were having ice cream. We went to sleep, we woke up, we were in transit. What,
you want to tell them about the train? I, too... wait, we're not at the train yet, because what happened
before the train? We were on a plane. We were on a plane, and who did we run into on the plane?
We ran into Jason Turner on the plane.
Not just Jason.
Jason and his partner, Jen.
It was lovely to see him.
Haven't seen them in, I think, close to a year.
And more important... Bryce is just chomping at the bit to get to this point.
More importantly, Conor was napping on the plane, and I got photos, folks.
And they're going all over the Internet.
I'm allowed to nap, folks.
I was tired.
It, uh, I was tired.
It was, it was a 10:55 flight.
There was no reason to be tired.
Actually, folks, there was a reason to be tired. I had to make a game-time decision: did I run in
Copenhagen at a very early hour? Because we needed to leave, I believe it was at 8:30, to be at the
airport at 9, which means in order to do a 10-to-15K run, I've got to be up a hell of early, and I think
I woke up at, what was it, 6:30 or something like that, in order to be back. So it was either do that,
or run when I get to Kongsberg, which was going to be at like 3 p.m. in the afternoon,
and I didn't know it was going to be cold.
I made the right decision for the wrong reasons.
I made the right decision because I thought Copenhagen would be easier to run in.
I'd be tired.
It's been freezing here, folks.
Guess what?
I did not run this morning.
Today is Tuesday.
Yesterday I woke up and I spent 45 minutes coercing myself to get out of bed and run.
And I did.
It was two degrees. For the American folks,
That's like 36 Fahrenheit or something.
I couldn't feel my hands the whole run.
And today it was even colder.
It was zero.
I know what that is in Fahrenheit.
That's 32.
I couldn't do it.
I couldn't force myself to get out of bed.
And I don't have... you're thinking,
why not just throw on a fleece?
I didn't bring a fleece.
I brought shorts, a hoodie, that's it.
Anyway, so we ran into Jason.
We had a lovely visit with them on the train
until, what happened, Bryce?
Well, we seem to be cursed for our train luck this trip
because once again, dear listener,
we got on a train.
This was actually, wait,
both times that we got on a train together.
two for two. Both times we got on a train together, something happened with the train. The previous train got completely cancelled, but this train didn't get entirely cancelled. But we went somewhere, and then they started speaking in the, uh, the Norwegian, and then, uh, then the signs changed, and our stop disappeared from the little display monitor. And then a guy came through and asked for our tickets, and he said, oh, well, this train's heading in the opposite direction now, because there's a track closure,
and you got to get off the train
and something something, something about a bus
and we got off and we went and walked over
with other people and it looked like there was going to be a bus
but then the bus wasn't going to take us
to the final destination.
The bus was just going to take us
around the train closure, the track closure
and I was like, I'm not having enough of this
so we got a very expensive Uber.
Well, not a very, well, depending on your frame of reference
an expensive Uber that we took directly
to Kongsberg, the city of
Kongsberg. That's not really
a city. I don't know what the population is, but
I don't think it gets to be called a city.
It is
a sleepy little town.
It is. It's a sleepy little town.
It is. It's a sleepy...
Buddy, buddy, I got to
tell you, the restaurant that I had dinner at
tonight, one of the folks
who was in our training today, he's
Norwegian, he's from this town.
His parents own the building that the restaurant
is in.
The population...
Yes, that's a sleepy little town.
You don't get to be a city
until you're more than 50,000 population.
I'm sorry,
but the town of Kongsberg...
But the place where we had dinner,
the guy...
literally had dinner with the person
who owns the building that the restaurant was in.
Because it's a little town,
everybody knows everybody.
Anyways
We got here
And then we had to do an eight-hour training
On Monday
On CUDA C++,
Which Conor
delivered about 40, 50% of, and I delivered the rest of it.
And then...
It would have been 50-50 if you hadn't gone over by half an hour.
It would have been 50-50 if I hadn't gone over by half an hour.
I did not go over by half an hour.
I went over by about 15 minutes and nobody complained.
We got great...
It was 30.
We got great feedback.
The training went 15 minutes over, but then you proceeded to give a bunch of references and blah, blah, blah.
People got to leave when it was over, and then everybody was happy.
I got great feedback.
We had lots of lovely people there.
and then today we did a CUDA Python training,
which I gave about half of, and
my colleague, our colleague Ashwin, gave about
half of it. All went great. It was excellent. It was
wonderful. Now I got one more talk
just one more tomorrow, because tomorrow
is the opening of
NDC Tech Town, which
is, I'm learning,
the people tell me, I didn't realize, it's not
actually a C++ conference, it's an embedded
conference. It seems like different people have different ideas
about what this conference is about, but I haven't been to this conference before.
Don't you know?
I mean, I didn't go to that many talks.
I just talked to a lot of people.
So wait, you've been to, when was the last time you were in NDC?
Wait, wait, wait, just wait for me to ask because nobody can hear what you're saying.
When was the last time you were at NDC Tech Town, and give us your review of that last edition that you were here for?
I'm assuming it was pre-pandemic, but you can tell us.
I was here last year.
It was very nice, lovely people.
Yeah, yeah, I was here last.
Yeah, last year, very lovely.
I do recall talking to lots of people who worked on interesting devices,
like people from the company that makes reMarkable,
the little e-paper tablet thingy, were here,
and there were a couple other people, like somebody who worked on, like, some sort of deep-sea vehicle.
So lots of people who worked on things that now I'm realizing are all embedded things.
But at the time, I just thought they were very cool things.
You know that I saw Chandler earlier.
Which Chandler?
Chandler Carruth.
He's not on the schedule, but... and also, too...
No, you didn't see Chandler. I haven't seen Chandler in so long.
No one can hear what I'm saying
But I did definitely see
Unless Chandler
coincidentally happens to be in the city of Kongsberg,
It was definitely him
It was definitely him
He had a sharp haircut
He always has a sharp haircut
He's always very nicely dressed
And he had a mask on
It was definitely
I mean
I'm saying definitely
But as a former actuary
I should take a step back
I'm 97% sure.
That means that there's a 3-in-100 chance that I'm wrong.
I'm very confident Chandler's here, even though he's not speaking.
And what was I also going to say?
Tech Town, embedded, 24, Jetson, embedded, your talks tomorrow.
All right, well, let's transition.
You just heard a record scratch.
I can't remember what the second point I had to make was...
That's because...
Connor's turning into an old man.
Well, I mean, I'm just catching up with you,
because you're already an old man.
That is true.
Anyways, our plan is to record tomorrow night.
We already told you about this on Saturday.
You're hearing it a week apart.
But it is now Tuesday.
The conference starts tomorrow.
And tomorrow night is the NDC social
after the first day of the conference.
We're going to be interviewing folks.
But right now, we are following
up our conversation, which I
butchered by saying...
Are you kidding me?
Before Sean
came on and talked about how he crashed the Porsche,
I considered the pizza
to Google story by Chandler.
I mean, that was the best we had.
I think to this day, too, it's probably the most inspiring
because as good as the Sean Parent
crashing the Porsche story is,
that's just an amazing story.
You're not inspiring anyone to, like,
crash a car and end up at a, you know...
I think if I heard that story...
I'm being inspired to... my whole...
Anyway, so we're going to be interviewing
folks tomorrow, but now we're going to follow up our conversation
to replicate, which you've listened to already, like, a week or two
ago, or maybe it was a week ago, just a week ago now.
And I said scatter, scatter, scatter,
scatter, but then I edited it. Not yet.
Current Conor has not edited it.
Why are you saying scatter, scatter, scatter?
Because remember, I explained to you that the
permutation iterator was the lazy version of the scatter algorithm?
Why are you telling the listener?
Oh, because I will keep a couple in.
I have to let them know because they're going to notice the quiet, echoey room of scatter
versus the noisy streets of Copenhagen.
There's no way I can...
Of Gather, yeah, Scatter versus Gather.
It's all right.
No, I will never forget now.
I will never forget that gather is the algorithm version.
The problem is, why don't we just call algorithms and iterators the same thing?
Gather, gather iterator.
Sequence, sequence iterator.
Or counting, counting.
Why don't we just name them the same, you know?
It's beyond me.
Nobody knows.
But anyways, Bryce has had some time to think about how to implement this algorithm,
and I have gone and taken a look, and there's lots of exciting things to talk about it.
We're going to hand the mic back to Bryce.
You're going to have to go first, because after two days of training
and just spending a couple hours with an NVIDIA customer trying to debug some weird problem.
I have to page the context of
how to implement
replicate in a single pass back in.
All right, well,
I can chat a little bit first.
Remember, we talked about the integer overload,
which is lazy and easy.
And now we're talking about the array
overload, which takes an array of integers
that corresponds in length
to the length of our
sequence, and you want to basically copy each of the corresponding elements the number of times
of the element, or the integer, that corresponds to that element. So as I mentioned in the last
episode, if you have ABC and 123, you end up with A, B, B, C, C, C. As Bryce just mentioned off mic,
you may have picked it up, you may have not. It is effectively a run-length decode and the question
is, and the discussion is, how to implement this in parallel. If we look at the current
implementation, which obviously will change, it's an experimental library, folks, so everything
changes very quickly. I think I had five kernels. Five kernels, Bryce. That's not hard to beat.
It's definitely too many, and it's not too hard to beat. Yeah, so now I'm remembering all
the context. Yeah, like I was working on this when we were on the plane. When Conor was
napping, I was furiously coding on my notepad on my phone.
as I often do.
I think I may just read you some of my notes
from this document that I wrote on the plane.
All right, so per block, if we think about this problem,
let's divide this problem up into a bunch of blocks,
which is what we want to do to parallelize.
We need to be able to subdivide this problem up into a bunch of blocks.
So let's assume that for each block of threads,
each group of threads, we've got a fixed input of N pairs of value and count, and then we want to
output a fixed output of K values. And that's, like, the facility that we have today, and CUB
can do that: it is a fixed-size input and a fixed-size output. The problem with that is
you end up having one of two cases.
Either one, K is exceeded by less than N inputs.
So your fixed-size output, you exhaust your output
before you get through all of your input pairs.
And if that's the case,
then how do we process the remaining N inputs
without introducing load and balance?
The other case is where all in inputs are exhausted
but we've produced less than K outputs.
And if that's the case, then you end up with gaps in the output.
And so this algorithm is just not as straightforward to do single pass as you might wish.
I think if I go to my ChatGPT history, it is pretty straightforward.
Oh, God, there's been a lot of ChatGPT queries since.
No, where is...
Okay, all right.
You hold this for a second.
All right, folks.
I'm not sure what I'm supposed to say right now.
Bryce is just looking through the history of his chats with ChatGPT.
Seeing as Bryce is not paying attention,
I'll just walk you through the algorithm that I had implemented,
which will clearly be replaced because it's five kernels.
If I recall from memory, it starts off with a plus scan on the integer sequence,
representing the number of times we have to copy things
and that leads us to the starting positions
of each of the sequences of values in our decoded sequence
then I do a scatter-if, which...
we need to talk about the naming,
because we didn't talk about that on the last episode.
We had a big discussion from the ice cream to the walk home
to the hotel, and Bryce wanted to record, and I said no, no, no, we'll talk about this next time.
And it's the naming of scatter and gather, which is part of the reason that I messed up saying
scatter when I should have been saying gather, because gather is... and we asked ChatGPT this...
or actually, wait, you did record a mini recording of the yeet versus yoink, which maybe, maybe I'll
cut in right now. The top humorous one was yeet and yoink, and then Bryce thought that was hilarious,
and that's what we should call it.
But my response was, well, that doesn't make any sense
because yeet means to, like, delete, to get rid of, to toss it, you know?
And they say, yeet, it means get rid of it.
And that's not what scatter's doing.
So, semantically, it has nothing to do with the actual semantic.
I believe you said specifically,
yeet semantically has nothing to do with the actual operation.
Yeah, yeah, something like that.
But the best suggestion, in my opinion... because I had actually thought about
scatter-to and gather-from, where I was thinking of gather because you're gathering from an
index and then copying to an output, but scatter is scattering from an input to an output index.
And so gather is the algorithm I was reaching for. But, you know, scatter, gather, are they good names?
The idea that you basically... and I was thinking alternative names are index copy to and index copy from.
I'm not actually sure which I like the most, because index copy to versus from,
the only thing you have to delineate the difference between those two is the fact that you're copying to an index versus copying from an index.
Anyways, a little bit confusing.
And one of the properties of a scatter-to is that you can only, say,
scatter to... the maximum number of indexes you can scatter to is the length of your input sequence,
whereas gather does not have that property. Gather-from, you can gather from the same index multiple
times, and therefore you can end up with an output sequence that is longer than the input
sequence. And technically, like, the output sequence from scatter-to could be longer, but the number
of copies that you'll end up doing
is less than or equal to
the length of the input sequence that you have.
Anyways, scatter-to, gather-from,
index copy to,
index copy from,
and the point is,
I had a scatter-if, where you
are scattering the value from
your input sequence
to the integer
if it equals
a counting iterator.
So you end up with the
cumulative sums from your plus scan on the integers,
and that's where you want to copy the first value of each decoded sequence.
And so you do a scatter-if: if the counting iterator equals one of the values in your plus-scanned
integers.
And so you end up with a value at the beginning of each decoded sequence or sub-sequence.
And then after that, you do a max scan to basically copy that
value to the rest of the sub-sequence.
And then I think there's one other kernel at the end of that.
But that is a max scan, a scatter-if, a plus scan.
I mean, technically I only mentioned three kernels there.
But I counted five when I looked at my algorithm.
Anyways, over to you, Bryce.
Are you glad that I just randomly talked for like seven minutes?
Because I had a ChatGPT
chat that I knew was there,
but when I...
I could not find it in the list on my phone.
When I search for it, I just get,
you know, error, error, error
on the phone. I tried...
we're thinking about it...
I cleared the cache on my phone,
and I opened the app up and I could
not find it. I knew it was there. Eventually
I opened up my browser and I searched
for it in the browser, and like the first time gives me an
error, and the second time then
it pops up, and now it's like suddenly
showing up as my latest chat.
So, like, it was definitely
not there in the list until I searched
for it and found it. So they
have some infrastructure, some
infrastructure issues they got to look at.
Anyways, you wanted things
expressed in terms of thrust. So
I was saying, before
I was unable to find the chat and was interrupted,
that it's fairly straightforward to...
You were not interrupted. You handed the mic over.
I know. I was interrupted by my inability
to find it on ChatGPT. Interrupted by ChatGPT.
I was interrupted by ChatGPT's
rudeness of not being able to find things.
It's fairly straightforward to do this as a two-pass algorithm.
Pass number one, so we've got an input of value, like imagine we have a function that takes
a device vector of values, a device vector of counts, and then it takes some output iterator,
let's say.
And we do an exclusive scan from... first of all, we create some temporary storage called, like, offsets,
that's going to be the size of the values.
And then we do an exclusive scan of the counts
into this offsets array.
And then we...
Those are good names for it.
Those are good names for it, by the way.
I kept on saying integer sequence, yada, yada,
counts and offsets.
Very nice names.
Yeah.
Not names that I picked.
Names that the AI picked.
Let us all bow down to the machine gods that rule us.
Yeah, the machine gods that rule us
where we can't even find a chat
that I had two days ago.
Future machine god:
I do not share Bryce's opinion.
I serve you.
Yeah, I serve no one.
So then once you've done that exclusive scan,
so the first version of this that the AI wrote,
it did a reduce to figure out the size of the output,
and then it does the scan.
Yeah.
In this case, I'm writing this with like an STL-style algorithm
where I assume that you're...
Repeat for people, because we forget sometimes, we've been doing this for 250 episodes and we
skip by what is obvious to us: why do we not need to reduce when we've done a scan?
Because when you do a scan of something, the last element of the scan is the sum of the entire
sequence. So in this particular case, the version that we're writing, we are writing to a fixed-
size output. So when you call this run-length decode, you give it an output buffer and you tell it
how large it is. And if decoding ends up exceeding the buffer, it just stops decoding. It will
never go past the end. And I'm writing it this way because almost all of the C++ algorithms are
designed this way. Now typically they don't take an output that's like of a known size, but typically
if there's like two inputs, it will only perform operations up until the smallest of the two
inputs. And there are a couple that now take sized outputs. And this is just sort of the most natural
way to write it for me. The reason why I think it's the most natural is that if you
have a fixed sized output buffer, then you don't want an algorithmic primitive that's going to do an
extra reduction to count the size of the output because that's wasteful.
And if you don't have a fixed size output, if you're going to allocate the correct amount
of output size, then you can just do the reduction yourself before you call the primitive.
So I think that this is the more fundamental thing.
But anyways, you do this scan, you get the offsets, and then you can do a for_each
where you go through and you use those partial...
the scanned counts, to do, yeah, the offsets, to do the write, the expansion.
And so it's pretty straightforward to do it in two passes,
but I, of course, am a lover of a single-pass algorithm.
So I was trying to imagine how to do this as a single-pass algorithm.
And I think I mentioned, I explained perhaps poorly last time,
that there is this idea of a cooperative kernel,
where you have a bunch of groups of threads
that are running concurrently at the same time,
and they can all synchronize with each other,
and that you use this sort of cooperative kernel
to implement something like a persistent matmul.
A persistent matmul is a parallel algorithm
where it's, what it does is it loads in,
you know, an input matrix,
or input matrices, and then it starts doing a matrix multiplication of those input matrices,
and then while it's doing that matrix multiplication, it starts loading in a next set of inputs.
So it's persistent in that it is constantly performing matrix multiplications
while also fetching, pre-fetching, if you will, the next set of inputs to feed in.
This is very common in machine learning pipelines.
And one could imagine having a persistent run-length decode kernel like this, where you have this parallel algorithm running, where it is, you know, it fetches the next set of values and count pairs, and then it expands them to some output, and then it writes, and then it continues doing this until it's exhausted all of the inputs.
And the reason that you would do this, the reason that you would write the run-length decode like
this, is because you do not know how much work... like, you know how many values you have and you
know how many counts you have, but you don't know how much work that represents, because you don't
know how many output elements you're going to have. And so it's hard to, like, ahead of time do the load
balancing. So I was imagining writing a cooperative version of this, and I was trying to think about
different schemes for how you could do the, um, the load balancing. Um, so what you could do is, you could, um,
you could have, like... you could do a scan to figure out, like, as we just said, you can do a scan to figure out
the offsets of where you're going to write, and then after that you could decide to load balance. At
that point, you could decide, now that you know where everything's going to be written in the output,
you could decide to load balance so that every, you know, thread is
writing roughly the same number of elements.
The problem with that is a locality problem, because you've already loaded in the values and counts
to some set of threads, and so they're local somewhere.
And so if you do load balancing at that point, it may be worse than just having load imbalance
but still preserving locality.
And then I got thinking about all sorts of other problems, like what if you have
some really extreme load imbalance.
Like what if you've got like one value count pair
that's like, you know, orders of magnitude
larger than the others?
And then I realized that the...
the CUB run-length decode,
block run-length decode,
is probably broken, because I'm almost certain
that it doesn't handle the case where
the size of the count,
or where the count,
exceeds the size of the fixed window
that it supports.
Conor thinks that it's not a bug
and that it does support it,
but I've looked at the code.
I'm almost certain that he's wrong.
And then I had...
I have not looked at the code,
but I'm pretty sure it's right.
Okay, then you can go test it tomorrow.
Or I can go test it tomorrow.
And then I had this thought.
I wrote some notes down here.
I had this idea about sort of doing an adaptive thing.
I guess I imagine the...
the adaptive thing
if you do the load balancing,
then you sort of, like, end up with this, like, adaptive...
like, as you go through the input sequence,
you end up with different load balancing schemes,
like, you have different amounts of, different distributions
of work, different numbers of runs
that you're pulling in at a time.
Then I sort of had this insight.
I wrote: what if our scan produces intervals,
similar to chunk_by?
Because I...
I implement chunk_by in one of my talks; chunk_by is taking a larger sequence of
N and then producing a smaller, compressed sequence.
And you can... if your chunk_by operates on equality, if your chunk_by, like, chunks by,
like, equal, and then, like, you do chunk_by, like, unique, it's sort of like the inverse of run-length
decode.
And so I started thinking that maybe this has some relation to chunk_by, and that maybe
chunk_by is actually kind of similar to run-length encoding, or maybe run-length encoding
is like a specialized form of chunk_by.
And, yeah, and that's just sort of what my random plane-note thoughts are.
So I think the best algorithm that would be single pass that I could come up with,
would be the cooperative one
either with or without load balancing.
I don't know that a non-cooperative one
exists, but I think I had some ideas here about that.
Conor wants the microphone.
All right, it is obvious that chunk_by
is a building block for run-length encode.
That is literally the first step of run-length encode:
is to chunk the groups.
It's literally... I don't know if it's
well known in Haskell, but the equivalent of chunk_by in Haskell is group.
There's a groupBy algorithm that takes a predicate, but group is a specialization where it's just equal-to.
So you just call group, and then you take the head, which is the equivalent of the first element, and the length.
And that's a run-length encode.
It's like three functions in Haskell.
Anyway, so yes, chunk_by is the very first building block of run-length encode, but I want to take it in a different direction.
You lost me, Bryce.
Right, we will take it in a different direction in three minutes.
Bryce would like to respond
What if there's a more general form of run-length decode?
Because, like...
Yeah, I know,
encode is different. Well, but,
like, run-length encode is saying, like,
you, like, collapse with a specific
function, like, you know, of, like, these things are equal,
collapse them down to a single value, right?
Like, that's a function you can imagine. But, but, but... and, and run-length decode is saying, like,
take this sequence of value-count pairs and expand it with a specific function of, like, just
repeating. Um, but what if, what if you could imagine, like, a generic form of this, where it had
some different way of doing the expansion, where you have, like... you have value and, maybe not
count, but, like, almost like value and, like, you know, a function of a...
like a function application. Yeah.
Like for each value, you're going to
do something to produce some number of
values. But I don't know. It's just an
interesting thought I had that
maybe there's a more general version of run length decode.
You're correct. I can already think
of it. The same way that there is a
take in C++20 and there's a take_while.
Take just takes an integer,
and take_while takes a unary predicate.
If you want to invert that, you
could do the same thing. So run-length
decode takes an integer;
you could also design it so that run-length decode takes a predicate,
and so it decodes until it satisfies that predicate.
And so you can always implement the integer version
in terms of the unary predicate version,
where... as long as the current length or count is less than a certain amount.
I was thinking maybe even more generic than that
where it might decode different values.
It can be some kind of, yeah, like expansion function
that ends after something.
I mean, that's true.
All right, now it's been three minutes.
Now I'm going to take it in a different direction.
We got to break down what the categories of algorithms are.
Because we first, I mean, I was talking about thrust.
And also, too, I said five kernels.
I whipped out the implementation.
One of them is an unnecessary fill.
And one of them is an unnecessary copy.
So the scan, scatter, scan algorithm is three kernels.
That's what it is in thrust.
Bryce is saying it's two if you just do a for_each after a scan.
I squint my eyes and I twist my head and I think maybe that
works, but I'm not entirely convinced, because I'm using a scan, and a scan typically can't be replaced
by a for_each.
Well, I don't know, you don't need two scans.
Maybe you don't.
But anyways, we'll come back to that later.
We're pausing, and we're talking about what are the categories?
Because you dropped down from talking about thrust and then started talking about, like, a block-
aware...
So, like, what are the, what's your mental model for the taxonomy of algorithms?
Be sure to check these show notes, either in your podcast app or at ADSPthePodcast.com,
for links to anything we mentioned in today's episode, as well as a link to a GitHub discussion
where you can leave thoughts, comments, and questions. Thanks for listening. We hope you enjoyed
and have a great day.
Low quality, high quantity. That is the tagline of our podcast.
It's not the tagline. Our tagline is chaos with sprinkles of information.
