Algorithms + Data Structures = Programs - Episode 199: std::rotate

Episode Date: September 13, 2024

In this episode, Conor and Bryce chat about std::rotate.Link to Episode 199 on WebsiteDiscuss this episode, leave a comment, or ask a question (on GitHub)TwitterADSP: The PodcastConor HoekstraBryce Ad...elstein LelbachShow NotesDate Recorded: 2024-08-26Date Released: 2024-09-13C++ std::rotateProgramming Pearlsthrust::copy_ifthrust::permutation_iteratorMicrosoft MSVC STL rotate ImplementationIntro Song InfoMiss You by Sarah Jansen https://soundcloud.com/sarahjansenmusicCreative Commons — Attribution 3.0 Unported — CC BY 3.0Free Download / Stream: http://bit.ly/l-miss-youMusic promoted by Audio Library https://youtu.be/iYYxnasvfx8

Transcript
Discussion (0)
Starting point is 00:00:00 I also Sean if you're listening Sean it's a specific apology to you because we we should have done better we should have done better. Welcome to ADSP the podcast episode 199 recorded on August 26, 2024. My name is Connor, and today with my co-host Bryce, we chat about Stood Rotate. Okay, so let's talk about Rotate. Let's talk about Rotate. How do you implement Rotate? Are you kidding me?
Starting point is 00:00:40 Have you not read Programming Pearls? Have you not watched one of my talks that mentions Programming Pearls? No. Really? Nope. this is famous man it's famous you connor i am completely shameless you will not make me feel guilty about not knowing something all right put your two hands up put your two hands up this is called what is it called the hand algorithm you gotta put your hands up like this all rightap the left one like that. And so I like you turned it 180 degrees. Do the same thing with the other one. And now do the same thing with the whole thing.
Starting point is 00:01:13 What? It's three reverses, my guy. You implement rotate with three reverses. You've got your midpoint. Hang on. This hand thing you made. Whoa. I got one, two.
Starting point is 00:01:24 It's brilliant. Each time you flip your hand, that's the equivalent of reverse. So like say you got the numbers one to ten representing your ten fingers. You flip the first five. So now you've got five, four, three, two, one. You flip the next ones. So you've got ten, nine, eight, seven, six. And now you reverse the whole thing.
Starting point is 00:01:46 And what do you end up with? Beautiful. Beautiful, folks. We love it. 6, 7, 8, 9, 10, 1, 2, 3, 4, 5. We love it, folks. My hands may be small, but the algorithm is beautiful. You're rotating at the middle point?
Starting point is 00:02:02 Yeah, that's how you spend the 8. In this example, yeah. In this example, yeah. Or you can do it at the third point yeah that's how you spend the in this example yeah in this example yeah well you can do it at the third point or whatever but let's let's get the programming pearls uh programming pearls by john bentley we've all read it and by we i mean just me because bryce has not and everyone should read it it's fantastic book. And if we talk about the rotate. How do you implement reverse? Reverse? Just a bunch of swaps.
Starting point is 00:02:28 Okay. Now walk me through it. Reverse? You've got a list of 10 numbers. Swap the first and last. Increment, decrement. Swap the second and second last. Increment, decrement.
Starting point is 00:02:41 All right. It's easy. Beautiful. So you take the midpoint right you you but that approach you just okay how do you what do you do if you have like just like a forward iterator if you just have a forward iterator well then that makes it a lot more complicated yeah this is why i ask these questions uh because, okay, so let's take a step back. The approach you just described, you've got a random access iterator to implement reverse.
Starting point is 00:03:14 You take, like, you know, the midpoint or you just have a way of being able to move from the back to the front. And then you start with the first and to the front and then you start it you start with the first and the last and then you swap them and then you increment the first and you decrement the last and then you keep swapping until first equals last correct or technically it's it might not be not equal to it might be a comparison yeah it's it's it's until yeah until you've's until yeah until you've reached the position until they collide or they go past each other yeah yeah um but uh okay so what is what is stood reverse require iterator categorize but that but that but for
Starting point is 00:03:57 that to work the reason i said forward for that to work you have to be able to take the last and then decrement it and to move backwards so reverse reverse requires bi-directional iterators and c++ and this is probably the reason why all right let me share my screen i will put a hopefully this doesn't crash anything folks i installed ubuntu 24.04 month ago or so and the windowing is terrible and the snapping every once in a while you try to maximize something and boom crashes your whole ui and then it forces you to log out so it could happen does it not screen window or tab can you see this actually it does not indicate to me at all if you're seeing this i can see it this is in the programming pearls book
Starting point is 00:04:45 as we mentioned before by john bentley and it is the doug mcelroy hand waving description i was searching for the wrong stuff because i'm pretty sure you can find this image on the internet but i just went and got a version of the book online hopefully this Okay, so you reverse 0 to D minus 1, then you reverse D to N minus 1, then you reverse 0 to N minus 1. D being the point at which you want to rotate and N being the length of the list. And then it shows you with two hands, with five fingers, the exact thing that we just did. It is. Do we, do we, do we parallelize rotate? We do.
Starting point is 00:05:25 That is a separate question, though. It uses this algorithm? That's actually a great question. Not reverse, but the question is, does thrust rotate use three different thrust reverses? That I would probably guess. I can't imagine that it does, because that would be three separate launches. I imagine it does something else weird. Let's find out.
Starting point is 00:05:47 All right. We are going, or Bryce is going to the Thrust implementation now, folks. That last episode, episode 198, probably only ended up being 10 minutes. I probably tightened it up real tight because we didn't talk about much. But now. 10 minutes? Yeah. Why would you post a 10-minute episode?
Starting point is 00:06:04 Well, because we're bringing back algorithms and data structures content and we're just 10xing the rest. So we had the 10x most of episode 198, but 199 folks, this is the content that the listener has come for. We're talking about rotate. We're talking about reverses. We're talking about programming pearls and We're talking about programming pearls. And we're going to find out how we do this
Starting point is 00:06:28 in parallel in Thrust. If this was a game show, we'd poll the audience and we'd say, what do you think, folks? Is it three reverses or is it a custom kernel? Where is rotate? Alright, he's taking too long. Do we have a rotate? We might not. Tragedy, folks. I'm at
Starting point is 00:06:44 the new CCCL Thl thrust documentation it's a beautiful site i didn't even try to look at the docs because those docs are a mess all right well i was about to say they're absolutely beautiful folks beautiful docs beautiful new docs i think bryce is just a little bitter because he did some work in a dark mode kind of thing and then they threw that out and they got beautiful new docs now folks they're beautiful i'm bitter because he did some work in a dark mode kind of thing and then they threw that out and they got beautiful new docs now folks they're beautiful I'm bitter because the search is not wonderful the search is fantastic once you learn
Starting point is 00:07:12 how to use it the team does great work and I shouldn't and there is no rotate folks that's why we don't like C++ anymore and I'm a full blown array language programmer. Let's, let's, let's, let's, gotta be a reverse though, right?
Starting point is 00:07:34 There's definitely a reverse. Yeah, I see a reverse. Why is there no rotate? Probably because, you know, they didn't know how to do it. They didn't know about Doug McElroy's algorithm. I don't accept that. What do you mean you don't accept that? I do not accept that as the reason that there's no rotate.
Starting point is 00:07:57 You know what language does have a rotate? APL and BQN. Beautiful, folks. Beautiful. Okay, let's look at the generic implementation of reverse. What is looking at the implementation of reverse going to tell
Starting point is 00:08:11 us about? So it just does it in terms of swap ranges. So it does the midpoint thing, and then it does swap ranges. Okay. So what would a parallel rotate look like? So the naive version is gonna do three reverses yeah but the problem is we don't want to do three separate launches right so
Starting point is 00:08:40 we got to come up with something better um i mean you could do a you could do a rotate copy in one pass very easily obviously what would that look like you i mean you the way i would do it it could look like one of several ways but the way i would do it is i would combine a copy if with a permutation iterator where the permutation iterator just indexes into the correct element. And you could do that by wrapping the permutation. Actually, yeah, you could use that by basically doing a... They have it in Rapids, QDF. Yeah, yeah, yeah.
Starting point is 00:09:19 But you could do a counting iterator with a transform iterator and just do the modulus or some kind of integer arithmetic to do the indexing. It's a copy if, which boils down to a scan with some flags. Yeah, I buy that. I don't have the proof in front of me, but I buy that. That would have to be a rotate copy, though, not a rotate. The problem with the in-place one is that – so in your three reverses, the first two reverses can definitely happen in parallel. And you can come up with some sort of segmented from D to, you know, to the end of the thing.
Starting point is 00:10:08 Those are two completely different, you know, things. They're completely independent of each other. You're doing a similar operation. I suspect, I mean, at the very least, you could launch those two in parallel. I suspect you could do them in a single kernel. But then the problem is then you have to do this third reverse, which operates in the entire thing and sort of it is by construction
Starting point is 00:10:40 the first two reverses have to happen first. So maybe the answer is that the best you can do is two passes, but as anybody who's listened to this podcast faithfully knows, I find it hard to accept a multi-pass parallel algorithm. So there's got to be a better way. So there have to be other algorithms for rotate that are less efficient in the serial case but maybe are better in the parallel case so what else what else you got i'm trying to think of the word what do they say when they do like a
Starting point is 00:11:20 over under hyped but it's not hype it's that reverse is a intellectually extremely simple algorithm but when it comes whoa whoa whoa whoa whoa whoa whoa hang on a second hang on a second rotate reverse i said reverseate only requires forward iterators. So that means that the implementation, I think, is not just three reverses. So there must be some better algorithm than your three reverses. You don't know. You don't know. Come on.
Starting point is 00:12:04 You're supposed to be Mr. Rotate. You don't know any better way to do rotate than three reverses. You don't know, you don't, you don't know, come on, you're supposed to be Mr. Rotate. You don't know any better way to do, to do rotate than three reverses or a different way to do rotate than three reverses? Rotate algorithm. I'm gonna have Sean on here to come explain some, there's got to be something else. Parallel rotate. If I google for parallel rotate, I just get, you know, rotate along a parallel axis that doesn't help me at all you're being quiet are you cooking something now i'm just looking at the msvc stl implementation yeah yeah so so so i i looked at that too that those have um states so they're not immediately well that might explain why you can do it with forward iterators yeah let's go to cpp reference i want to like you ever watch that show who wants to be a millionaire
Starting point is 00:12:53 and it's like you have you have it was a show in like the late 90s the same age man we both know what that what that show is i'm explaining the listener. You'd have three – we have some Gen Z and Gen Ys on here. You'd have three lifelines that you could – it was a quiz show. You'd answer questions and you'd win progressively more money and you'd have three lifelines. And one of those lifelines was phone a friend. And I feel like I want to use my phone a friend um and uh i feel like i want to use my phone a friend i don't know who i'd phone but uh i guess sean i guess i would phone sean because he's mr rotate i called you mr rotate before but that's you're just you're just mr rotate fan boy yeah interestingly cpp reference And boy. Yeah. Interestingly, CPP reference says that the complexity is at most a distance first last swaps, which means that it's not the three reverse implementation.
Starting point is 00:13:53 Rotate. Yeah. Yeah. We've established it. So what's the complexity of the three reverses? It's two and. Oh, it's like. Yeah, it's two and.
Starting point is 00:14:03 Oh, that's awful. Get that. Get that nonsense out of here why is there not like a wikipedia page for the rotated algorithm i'm sharing my screen again folks because i you know what i was thinking i looked this up once i looked this up once you tell me what you see here buddy you tell me what you see here yeah yeah so they do the three reverses if they can yeah and they fall back to doing the stateful thing otherwise so is this does this mean it's not i mean cpp reference could be wrong in saying that um well okay let's let's let's stop reading cpp reference let's go read the standard find out whether these people are compliant yeah so i was gonna say oh this this episode this
Starting point is 00:14:44 episode might end with us filing some bugs. I'm not filing any bugs, but you can go ahead and... Yeah, at most last minus first swaps, the standard says. So yeah, this code, we will link this code. It's Microsoft STL slash a bunch of stuff slash slash inc, slash xutility. And it says, if constexpr is cpp17 random access iterator. I've elaborated a bit. That's not exactly how it reads.
Starting point is 00:15:15 And then it's got three calls to reverse. Then it says, else if constexpr is cpp bidirectional iterator. It's got the same thing except the auto temporary reverse until sentinel unchecked call and then otherwise it's got an else case where it does does a bunch of you know uh stateful stuff and a couple iterators uh with a do while loop followed by a while loop you don't see that every day folks okay so i'm filing uh i'm i'm getting we're getting the bottom of this here what do you find what are you filing i want to know why they're not following the standard and also is there actually a way to do a rotate i don't think that well to do a rotate in with last minus first swaps? Yeah, there is.
Starting point is 00:16:07 There's the way, the stateful way, right? Well, so say you've got 10 integers and you want to do a three rotate, meaning, and it's just an iota sequence. So it's one to 10 and you want to put one, two, three at the back. So that's a three rotate. The first thing you could do is just swap the first three with the last three and then you end up with the sequence assuming that it's one to ten so there's no zero so we're one indexed here you're going to end up with uh eight nine ten at the beginning then four five six seven one two
Starting point is 00:16:38 three and so then you've only done whatever uh in this case, three out of your 10 possible swaps. So the question is, can you turn the 8, 9, 10, 4, 5, 6, 7 into 4, 5, 6, 7, 8, 9, 10 in seven swaps? And if you use four swaps to get the 4, 5, 6, 7 to the front, what do you end up with? You end up with, so I think it actually is possible. You end up with four, five, six, seven. We don't have to file a bug against the standard, just against the implementation. Yeah. Four, five, six, seven. And then the eight, nine, 10 is going to end up a little bit jumbled. But the point being is that like whatever order the eight, 9, 10 is, you've got three swaps left to get three values into their correct positions. And you won't need all of them because you really only need.
Starting point is 00:17:32 Right, so it can be done. You need like maximum two swaps of those three. And it'll vary depending on the rotate that you're doing. So it definitely can be done. Just not this way is that so alak dot alak dot rotate paragraph four states that the complexity is at most uh last minus first uh swaps uh but this three reverses approach is... If you have a forward iterator, though...
Starting point is 00:18:15 Can you not do better with a forward iterator? No, I was going to say, does that kind of algorithm work with the forward iterator? Let's actually read through what this says. For the three reverses, is it exactly two N swaps? Oh, wait. I don't, we're, man, there's a few listeners that are upset. How many swaps does reverse take?
Starting point is 00:18:34 We're idiots. Oh, we are idiots. Takes N over two. Takes N over two. Yes, we are idiots. We are idiots. We apologize to the listener that's been listening screaming into their headphones you know what the funny thing is the funny thing is connor what's the funny thing
Starting point is 00:18:50 we like at the front end of this i asked how did we how do we implement reverse and we it's not it's not like we didn't think about this we literally talked about how you implement reverse and we talked about oh yeah it's like you take the midpoint and then like you swap the things up to the midpoint yeah this is the swap the things up to the midpoint. Yeah. This is the problem though. I almost filed the bug report. I almost looked like a complete ass. That's the thing though is in school, they teach you that the coefficient doesn't matter
Starting point is 00:19:14 when it comes to time complexity. But this is, and so your brain is a lot of the times, or at least my brain shaves that. But the standard in this case, the standard in this case is not giving a time, is not giving a is not giving a like right they're talking about a number of swaps it's it's it's saying exact yeah but the promise it will be but that's the problem is that like we have i have a mental model that that thinks in terms of linear quadratic linear rhythmic and then whenever there is like a it's technically half linear right it's half O of N, but like you ignore that half.
Starting point is 00:19:46 But in this case, if we had paid attention to it, we would have realized that the three reverses, when you've got two partial reverses that equal a full reverse followed by a full reverse is ultimately that number of swaps. The first two reverses, each one of those is a fourth of your swaps. I also, Sean, if you're listening, Sean, it's a specific apology to you because we should have done better. We should have done better. But anyways, let me explain for the listener.
Starting point is 00:20:16 It may not be fine. So the first two reverses each operate on half of the input. And so each one of those reverses does a fourth of the swaps. And then the – It's incorrect to say half they it works on a uh you know one minus percentage and then the other one does the other so technically you can do a zero rotate which is just a reverse yeah then then the then the the second or then the third reverse the of the whole thing does the other roughly half swaps. Yes. And I said, well.
Starting point is 00:20:47 So I'm going to show you how close I was to filing this bug report. And just to correct what I said. This is how close I've written the text. All I had to do was put in the title. Yes, he is very close to filing the bug. But yes, when I said zero rotate equals reverse, it means that the first half of the rotate algorithm would be reverse. Technically, a zero rotate is a no op, just so that we're not getting, we don't have listeners once again yelling it into their.
Starting point is 00:21:14 So can we do a segmented reverse? And by that, I mean, can we launch a reverse kernel where we say, here's two different ranges, reverse both of them? Because that is what essentially we would need to get from three passes down to two passes. And admittedly, you could also just launch the two independent reverses in parallel but if you didn't want to do that if you wanted to launch one big single pass as you do when you're gonna gpu matt rhymed i appreciate that um can that be done is this a scan well i was thinking that you could do something similar to unique by key but then i remember that actually because like what you what you want is a reverse by key, similar to a reduced by key, but the reduced is a reduction. So in Thrust, the way reverse gets implemented is you find the midpoint, and you just assume random access.
Starting point is 00:22:15 So you find the midpoint, and then you do swap ranges from first to mid, and then with the reverse iterator to last. So what is swap ranges? I don't even know where that is. Oh, no, there it is. Okay. I think that one's pretty simple. Yeah, it's just a parallel four. So I think that since reverse ends up being implemented with a parallel four, which has no communication, I think any way of combining, any way of implementing a segmented reverse is going to be worse.
Starting point is 00:22:57 Because implementing a segmented reverse, you're going to have to know when you've reached the end of one of those segments. And that means you're going to need communication. And so that's going to be strictly worse than an embarrassingly parallel approach. And so I think we've finally hit upon an example on this podcast of an algorithm where more passes is going to be better. Specifically, I think the best approach for this is going to be to launch, assuming that there's not some other completely different algorithm, the best approach to implementing the three reverse or a reverse-based rotate is going to be to launch the two
Starting point is 00:23:38 initial reverses, the two reverses of half the sequence in parallel um and then to launch the third reverses like as a dependency that depends upon the first two unless unless he said unless there's a way to do this all in a single pass in which case the benefit of keeping everything in memory may be worth it. But actually, yeah, so if you could do this in a way where you didn't have to go back to global memory, where you load and store each thing from global memory a single time, that would, I think, be better than the three reverses approach. So I think there's a better algorithm out here. Well, we will leave that to a future episode because we have hit the limit.
Starting point is 00:24:40 Which brings us to episode 200, folks. Be sure to check these show notes either in your podcast app or at ADSPthepodcast.com for links to anything we mentioned in today's episode, as well as a link to a GitHub discussion where you can leave thoughts, comments, and questions. Thanks for listening. We hope you enjoyed and have a great day. Low quality, high quantity. That is the tagline of our podcast. That's not the tagline. Our tagline is chaos with sprinkles of information.

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.