Algorithms + Data Structures = Programs - Episode 199: std::rotate
Episode Date: September 13, 2024In this episode, Conor and Bryce chat about std::rotate.Link to Episode 199 on WebsiteDiscuss this episode, leave a comment, or ask a question (on GitHub)TwitterADSP: The PodcastConor HoekstraBryce Ad...elstein LelbachShow NotesDate Recorded: 2024-08-26Date Released: 2024-09-13C++ std::rotateProgramming Pearlsthrust::copy_ifthrust::permutation_iteratorMicrosoft MSVC STL rotate ImplementationIntro Song InfoMiss You by Sarah Jansen https://soundcloud.com/sarahjansenmusicCreative Commons — Attribution 3.0 Unported — CC BY 3.0Free Download / Stream: http://bit.ly/l-miss-youMusic promoted by Audio Library https://youtu.be/iYYxnasvfx8
Transcript
Discussion (0)
I also Sean if you're listening Sean it's a specific apology to you because we we should
have done better we should have done better. Welcome to ADSP the podcast episode 199 recorded
on August 26, 2024.
My name is Connor, and today with my co-host Bryce, we chat about Stood Rotate.
Okay, so let's talk about Rotate.
Let's talk about Rotate.
How do you implement Rotate?
Are you kidding me?
Have you not read Programming Pearls?
Have you not watched one of my talks that mentions Programming Pearls?
No. Really? Nope. this is famous man it's famous you connor i am completely shameless you will not make me feel guilty about not knowing something all right put your two hands
up put your two hands up this is called what is it called the hand algorithm you gotta put your
hands up like this all rightap the left one like that.
And so I like you turned it 180 degrees.
Do the same thing with the other one.
And now do the same thing with the whole thing.
What?
It's three reverses, my guy.
You implement rotate with three reverses.
You've got your midpoint.
Hang on.
This hand thing you made.
Whoa.
I got one, two.
It's brilliant.
Each time you flip your hand, that's the equivalent of reverse.
So like say you got the numbers one to ten representing your ten fingers.
You flip the first five.
So now you've got five, four, three, two, one.
You flip the next ones.
So you've got ten, nine, eight, seven, six.
And now you reverse the whole thing.
And what do you end up with?
Beautiful.
Beautiful, folks.
We love it.
6, 7, 8, 9, 10, 1, 2, 3, 4, 5.
We love it, folks.
My hands may be small, but the algorithm is beautiful.
You're rotating at the middle point?
Yeah, that's how you spend the 8.
In this example, yeah.
In this example, yeah. Or you can do it at the third point yeah that's how you spend the in this example yeah in this example
yeah well you can do it at the third point or whatever but let's let's get the programming
pearls uh programming pearls by john bentley we've all read it and by we i mean just me because
bryce has not and everyone should read it it's fantastic book. And if we talk about the rotate. How do you implement reverse?
Reverse?
Just a bunch of swaps.
Okay.
Now walk me through it.
Reverse?
You've got a list of 10 numbers.
Swap the first and last.
Increment, decrement.
Swap the second and second last.
Increment, decrement.
All right.
It's easy.
Beautiful.
So you take the midpoint right you you
but that approach you just okay how do you what do you do if you have like just like a forward
iterator if you just have a forward iterator well then that makes it a lot more complicated
yeah this is why i ask these questions uh because, okay, so let's take a step back.
The approach you just described, you've got a random access iterator to implement reverse.
You take, like, you know, the midpoint or you just have a way of being able to move from the back to the front.
And then you start with the first and to the front and then you start it
you start with the first and the last and then you swap them and then you increment the first
and you decrement the last and then you keep swapping until first equals last correct or
technically it's it might not be not equal to it might be a comparison yeah it's it's it's until
yeah until you've's until yeah until
you've reached the position until they collide or they go past each other yeah yeah um but uh
okay so what is what is stood reverse require iterator categorize but that but that but for
that to work the reason i said forward for that to work you have to be able to take the last and
then decrement it and to move backwards so reverse
reverse requires bi-directional iterators and c++ and this is probably the reason why all right let
me share my screen i will put a hopefully this doesn't crash anything folks i installed ubuntu
24.04 month ago or so and the windowing is terrible and the snapping every once in a while you try
to maximize something and boom crashes your whole ui and then it forces you to log out
so it could happen does it not screen window or tab can you see this actually it does not
indicate to me at all if you're seeing this i can see it this is in the programming pearls book
as we mentioned before by john bentley and it is the doug mcelroy hand waving description i was
searching for the wrong stuff because i'm pretty sure you can find this image on the internet but
i just went and got a version of the book online hopefully this Okay, so you reverse 0 to D minus 1, then you reverse D to N minus 1, then you reverse 0 to N minus 1.
D being the point at which you want to rotate and N being the length of the list.
And then it shows you with two hands, with five fingers, the exact thing that we just did.
It is.
Do we, do we, do we parallelize rotate?
We do.
That is a separate question, though.
It uses this algorithm?
That's actually a great question.
Not reverse, but the question is, does thrust rotate use three different thrust reverses?
That I would probably guess.
I can't imagine that it does, because that would be three separate launches.
I imagine it does something else weird.
Let's find out.
All right.
We are going, or Bryce is going to the Thrust implementation now, folks.
That last episode, episode 198, probably only ended up being 10 minutes.
I probably tightened it up real tight because we didn't talk about much.
But now.
10 minutes?
Yeah.
Why would you post a 10-minute episode?
Well, because we're bringing back algorithms and data structures content and we're just
10xing the rest.
So we had the 10x most of episode 198, but 199 folks, this is the content that the listener
has come for.
We're talking about rotate.
We're talking about reverses.
We're talking about programming pearls and We're talking about programming pearls.
And we're going to find out how we do this
in parallel in Thrust.
If this was a game show, we'd poll the audience
and we'd say, what do you think, folks?
Is it three reverses or is it a custom kernel?
Where is rotate?
Alright, he's taking too long.
Do we have a rotate? We might not.
Tragedy, folks. I'm at
the new CCCL Thl thrust documentation it's a beautiful
site i didn't even try to look at the docs because those docs are a mess all right well i was about
to say they're absolutely beautiful folks beautiful docs beautiful new docs i think
bryce is just a little bitter because he did some work in a dark mode kind of thing and then they
threw that out and they got beautiful new docs now folks they're beautiful i'm bitter because he did some work in a dark mode kind of thing and then they threw that out and they got beautiful new
docs now folks they're beautiful
I'm bitter because the search is not wonderful
the search is fantastic once you learn
how to use it
the team does great work and I shouldn't
and there
is no rotate folks
that's why we don't like C++ anymore
and I'm a full blown array
language programmer.
Let's, let's, let's, let's, gotta be a reverse though, right?
There's definitely a reverse.
Yeah, I see a reverse.
Why is there no rotate?
Probably because, you know, they didn't know how to do it.
They didn't know about Doug McElroy's algorithm.
I don't accept that.
What do you mean you don't accept that?
I do not accept that as the reason that there's no rotate.
You know what language does have a rotate?
APL and BQN.
Beautiful, folks.
Beautiful.
Okay, let's look at
the generic implementation of
reverse. What is looking at
the implementation of reverse going to tell
us about? So it just does it in terms
of swap ranges.
So it does the
midpoint thing, and then it does
swap ranges. Okay.
So what would a parallel
rotate look like? So the naive version is gonna
do three reverses yeah but the problem is we don't want to do three separate launches right so
we got to come up with something better um i mean you could do a you could do a rotate copy
in one pass very easily obviously what would that look like you i mean you the way i would do it
it could look like one of several ways but the way i would do it is i would combine a copy if
with a permutation iterator where the permutation iterator just indexes into the correct element.
And you could do that by wrapping the permutation.
Actually, yeah, you could use that by basically doing a...
They have it in Rapids, QDF.
Yeah, yeah, yeah.
But you could do a counting iterator
with a transform iterator and just do the modulus
or some kind of integer arithmetic to do the indexing.
It's a copy if, which boils down to a scan with some flags.
Yeah, I buy that.
I don't have the proof in front of me, but I buy that.
That would have to be a rotate copy, though, not a rotate. The problem with the in-place one is that – so in your three reverses, the first two reverses can definitely happen in parallel.
And you can come up with some sort of segmented from D to, you know, to the end of the thing.
Those are two completely different, you know, things.
They're completely independent of each other.
You're doing a similar operation.
I suspect, I mean, at the very least,
you could launch those two in parallel.
I suspect you could do them in a single kernel.
But then the problem is then you have to do this third reverse,
which operates in the entire thing and sort of it is by construction
the first two reverses have to happen first.
So maybe the answer is that the best you can do is two passes,
but as anybody who's listened to this podcast faithfully knows,
I find it hard to accept a multi-pass parallel algorithm.
So there's got to be a better way.
So there have to be other algorithms for rotate
that are less efficient in the serial case but maybe are better in the parallel case so
what else what else you got i'm trying to think of the word what do they say when they do like a
over under hyped but it's not hype it's that reverse is a intellectually extremely simple
algorithm but when it comes whoa whoa whoa whoa whoa whoa whoa hang on a second hang on a second
rotate reverse i said reverseate only requires forward iterators.
So that means that the implementation, I think, is not just three reverses.
So there must be some better algorithm than your three reverses.
You don't know.
You don't know.
Come on.
You're supposed to be Mr. Rotate.
You don't know any better way to do rotate than three reverses. You don't know, you don't, you don't know, come on, you're supposed to be Mr. Rotate. You don't know any better way to do, to do rotate than three reverses or a different way to do rotate than three reverses? Rotate algorithm. I'm gonna have Sean on here to come explain some,
there's got to be something else. Parallel rotate. If I google for parallel rotate, I just get,
you know, rotate along a parallel axis that
doesn't help me at all you're being quiet are you cooking something now i'm just looking at the msvc
stl implementation yeah yeah so so so i i looked at that too that those have um states so they're
not immediately well that might explain why you can do it with forward iterators
yeah let's go to cpp reference i want to like you ever watch that show who wants to be a millionaire
and it's like you have you have it was a show in like the late 90s the same age man we both
know what that what that show is i'm explaining the listener. You'd have three – we have some Gen Z and Gen Ys on here.
You'd have three lifelines that you could – it was a quiz show.
You'd answer questions and you'd win progressively more money and you'd have three lifelines.
And one of those lifelines was phone a friend.
And I feel like I want to use my phone a friend um and uh i feel like i want to use my phone a friend i don't know who i'd phone
but uh i guess sean i guess i would phone sean because he's mr rotate i called you mr rotate
before but that's you're just you're just mr rotate fan boy yeah interestingly cpp reference And boy. Yeah. Interestingly, CPP reference says that the complexity is at most a distance first last swaps, which means that it's not the three reverse implementation.
Rotate.
Yeah.
Yeah.
We've established it.
So what's the complexity of the three reverses?
It's two and.
Oh, it's like.
Yeah, it's two and.
Oh, that's awful.
Get that. Get that nonsense out of here why is there not like a wikipedia page for the rotated algorithm i'm sharing my screen again
folks because i you know what i was thinking i looked this up once i looked this up once you
tell me what you see here buddy you tell me what you see here yeah yeah so they do the three
reverses if they can yeah and they fall back to doing the
stateful thing otherwise so is this does this mean it's not i mean cpp reference could be wrong
in saying that um well okay let's let's let's stop reading cpp reference let's go read the standard
find out whether these people are compliant yeah so i was gonna say oh this this episode this
episode might end with us filing some bugs.
I'm not filing any bugs, but you can go ahead and...
Yeah, at most last minus first swaps, the standard says.
So yeah, this code, we will link this code.
It's Microsoft STL slash a bunch of stuff slash slash inc, slash xutility.
And it says, if constexpr is cpp17 random access iterator.
I've elaborated a bit.
That's not exactly how it reads.
And then it's got three calls to reverse.
Then it says, else if constexpr is cpp bidirectional iterator.
It's got the same thing except the auto temporary reverse until sentinel unchecked call and then otherwise it's got an else case where it does does a bunch of you know
uh stateful stuff and a couple iterators uh with a do while loop followed by a while loop you don't see that every day folks okay so i'm filing uh
i'm i'm getting we're getting the bottom of this here what do you find what are you filing i want
to know why they're not following the standard and also is there actually a way to do a rotate
i don't think that well to do a rotate in with last minus first swaps?
Yeah, there is.
There's the way, the stateful way, right?
Well, so say you've got 10 integers and you want to do a three rotate, meaning, and it's
just an iota sequence.
So it's one to 10 and you want to put one, two, three at the back.
So that's a three rotate.
The first thing you could do is just swap the first three with the last three and then you
end up with the sequence assuming that it's one to ten so there's no zero so we're one indexed here
you're going to end up with uh eight nine ten at the beginning then four five six seven one two
three and so then you've only done whatever uh in this case, three out of your 10 possible swaps.
So the question is, can you turn the 8, 9, 10, 4, 5, 6, 7 into 4, 5, 6, 7, 8, 9, 10 in seven swaps?
And if you use four swaps to get the 4, 5, 6, 7 to the front, what do you end up with?
You end up with, so I think it actually is possible. You end up with four, five, six, seven.
We don't have to file a bug against the standard, just against the implementation.
Yeah. Four, five, six, seven. And then the eight, nine, 10 is going to end up a little bit jumbled.
But the point being is that like whatever order the eight, 9, 10 is, you've got three swaps left to get three values into their correct positions.
And you won't need all of them because you really only need.
Right, so it can be done.
You need like maximum two swaps of those three.
And it'll vary depending on the rotate that you're doing.
So it definitely can be done.
Just not this way is that so
alak dot alak dot rotate paragraph four states that the complexity is at most
uh last minus first uh swaps uh but this three reverses approach is...
If you have a forward iterator, though...
Can you not do better with a forward iterator?
No, I was going to say,
does that kind of algorithm work with the forward iterator?
Let's actually read through what this says.
For the three reverses, is it exactly two N swaps?
Oh, wait.
I don't, we're, man, there's a few listeners that are upset.
How many swaps does reverse take?
We're idiots.
Oh, we are idiots.
Takes N over two.
Takes N over two.
Yes, we are idiots.
We are idiots.
We apologize to the listener that's been listening screaming into
their headphones you know what the funny thing is the funny thing is connor what's the funny thing
we like at the front end of this i asked how did we how do we implement reverse and we it's not
it's not like we didn't think about this we literally talked about how you implement reverse
and we talked about oh yeah it's like you take the midpoint and then like you swap the things
up to the midpoint yeah this is the swap the things up to the midpoint. Yeah. This is the problem though.
I almost filed the bug report.
I almost looked like a complete ass.
That's the thing though is in school,
they teach you that the coefficient doesn't matter
when it comes to time complexity.
But this is, and so your brain is a lot of the times,
or at least my brain shaves that.
But the standard in this case,
the standard in this case is not giving a time, is not giving a is not giving a like right they're talking about a number of swaps it's it's it's
saying exact yeah but the promise it will be but that's the problem is that like we have i have a
mental model that that thinks in terms of linear quadratic linear rhythmic and then whenever there
is like a it's technically half linear right it's half O of N, but like you ignore that half.
But in this case, if we had paid attention to it,
we would have realized that the three reverses,
when you've got two partial reverses that equal a full reverse
followed by a full reverse is ultimately that number of swaps.
The first two reverses, each one of those is a fourth of your swaps.
I also, Sean, if you're listening, Sean, it's a specific apology to you because we should have done better.
We should have done better.
But anyways, let me explain for the listener.
It may not be fine.
So the first two reverses each operate on half of the input.
And so each one of those reverses does a fourth of the swaps.
And then the – It's incorrect to say half they it works on a uh you know one minus percentage and then the other
one does the other so technically you can do a zero rotate which is just a reverse yeah then then
the then the the second or then the third reverse the of the whole thing does the other roughly half swaps.
Yes.
And I said, well.
So I'm going to show you how close I was to filing this bug report.
And just to correct what I said.
This is how close I've written the text.
All I had to do was put in the title.
Yes, he is very close to filing the bug.
But yes, when I said zero rotate equals reverse,
it means that the first half of the rotate algorithm would be reverse.
Technically, a zero rotate is a no op, just so that we're not getting, we don't have listeners once again yelling it into their.
So can we do a segmented reverse?
And by that, I mean, can we launch a reverse kernel where we say, here's two different ranges, reverse both of them?
Because that is what essentially we would need to get from three passes down to two passes.
And admittedly, you could also just launch the two independent reverses in parallel but if you didn't want to do that if you wanted to launch one big
single pass as you do when you're gonna gpu matt rhymed i appreciate that um can that be done is
this a scan well i was thinking that you could do something similar to unique by key but then i
remember that actually because like what you what you want is a reverse by key, similar to a reduced by key, but the reduced is a reduction.
So in Thrust, the way reverse gets implemented is you find the midpoint, and you just assume random access.
So you find the midpoint, and then you do swap ranges from first to mid, and then with the reverse iterator to last.
So what is swap ranges?
I don't even know where that is.
Oh, no, there it is. Okay.
I think that one's pretty simple.
Yeah, it's just a parallel four.
So I think that since reverse ends up being implemented with a parallel four, which has no communication,
I think any way of combining, any way of implementing a segmented reverse is going to be worse.
Because implementing a segmented reverse, you're going to have to know when you've reached
the end of one of those segments.
And that means you're going to
need communication. And so that's going to be strictly worse than an embarrassingly parallel
approach. And so I think we've finally hit upon an example on this podcast of an algorithm where
more passes is going to be better. Specifically, I think the best approach for this is going to be
to launch, assuming that there's not some other completely different algorithm, the best approach
to implementing the three reverse or a reverse-based rotate is going to be to launch the two
initial reverses, the two reverses of half the sequence in parallel um and then to launch the
third reverses like as a dependency that depends upon the first two unless unless he said unless
there's a way to do this all in a single pass in which case the benefit of keeping everything in memory may be worth it.
But actually, yeah, so if you could do this in a way where you didn't have to go back to global
memory, where you load and store each thing from global memory a single time, that would, I think, be better than the three reverses approach.
So I think there's a better algorithm out here.
Well, we will leave that to a future episode
because we have hit the limit.
Which brings us to episode 200, folks.
Be sure to check these show notes either in your podcast app or at ADSPthepodcast.com for links to anything we mentioned in today's episode, as well as a link to a GitHub discussion where you can leave thoughts, comments, and questions.
Thanks for listening.
We hope you enjoyed and have a great day.
Low quality, high quantity.
That is the tagline of our podcast.
That's not the tagline.
Our tagline is chaos with sprinkles of information.