Algorithms + Data Structures = Programs - Episode 115: Max Gap in C++23

Starting point is 00:00:00 See, the problem is we've added too many of them. Yep. That's why you got me, Bryce. I memorize all this stuff for fun. It makes me happy. See, there's this thing called Google that replaces you pretty well. I don't need Google because, I mean, don't get me wrong. I don't think about the set algorithms very often, like asymmetric difference or et cetera, those ones.

Starting point is 00:00:26 But you know, these ones are, these ones are fundamental. Welcome to ADSP, the podcast episode 115 recorded on February 1st, 2023. My name is Connor, and today with my co-host Bryce, we discuss a problem entitled Max Gap and solve it in C++23. See if you can listen along and solve it yourself. So I will be coming to Toronto, not once, but twice, and not seeing you. What? to toronto not once but twice and not seeing you what because it this is apparently the the easiest routing for me to get upgrades to london on is is the la guardia to to toronto then toronto to um london on british airways it's it's this is my secret this. This is my secret. This is my secret. The LaGuardia

Starting point is 00:01:29 to London by way of Canada special. I mean, you can come hang out with me at the airport. I can't. We already talked about this. Because you'll be in the International. Yeah, yeah, yeah. But I'm going to be

Starting point is 00:01:44 coming twice maybe the second maybe you can you can come somewhere with me come to london with me so uh so what's going on what are we talking about today um oh am i supposed to have a topic because i'm not no no no i've got tons of topics you see kind of that that's what works about this relationship is that I can be lazy and just show up unprepared and you are just, you're just happy and eager to. Oh, yeah. I mean, I've literally got like 17 different episodes that we could chat about right now. Is it actually 17? No, I mean, I just made that number up.

Starting point is 00:02:19 But like 15 of them are off topic, you know, probably like a good five of them have to do with combinatorial logic. Another three have to do with this new term I heard about called collection-oriented programming, which I submitted to give a talk at Closure Conj, which is in April. And anyways, we're not going to, these are things that we could talk about, but I'm not going to talk about just to, you know, if folks are interested, I'll leave a link to this paper, Parallel Block Delayed Sequences, which is about this sort of parallel algorithms model. But it mentions collection-oriented programming, which sent me on this deep dive like a couple weeks ago. What are you going to say? This is going to show you, dear listener, the difference between what Connor thinks about in his spare time and what I think about in my spare time. And you're going to have to air this because this is a call for help.

Starting point is 00:03:14 Here it is. I need somebody who speaks Japanese who can make a phone call for me. All right. Thanks for that, Bryce. You're not going to ask why? Go ahead. Why? Because I need them to call this little restaurant

Starting point is 00:03:34 that I ate at in Kyoto because they had this amazing cheesecake. It was this little French restaurant. It's sort of an immediate problem because my girlfriend really liked it, and I want to make it for her birthday, which is coming up. So please, somebody reach out. I need some help here. Rest assured.

Starting point is 00:04:05 I will fast forward your voice at 10x for most of that and just leave in like a couple words to give people the sense. But someone will DM you or, you know, post a comment on the ADSP Twitter, Twitter post with I speak Japanese, how may I be of service? I'm sure back to all my topics that we're not going to talk about. I mean, a whole other thing I've spent, I have this, I'm working up towards building a course on algorithms that at some point I'm going to create. And it's going to be, it's going to be based around a set of 10 problems. Um, some of which we've discussed like the maximum consecutive ones, that's, uh, the third or second problem. And these are problems that I just have spent a lot of time thinking about and, you know, like to think about them while I'm running. And I recently came across a problem called Max, Max gap. And we should actually throw it to a different episode. But there's like two different flavors of

Starting point is 00:04:58 this problem. And it's just been, it's been amazing. I'll leave a link in the description for folks that want to go. You can't just tell me that and not like now okay okay i'm talking about max gap i just i just don't want to i don't want to dedicate this whole episode maybe we'll dedicate this whole episode we'll keep the chit chat at the beginning in but so max gap it's given a list of integers determine the difference, the maximum difference between two adjacent elements. Okay, they're adjacent. If the list is sorted. So you could simplify the problem

Starting point is 00:05:34 and say, given a sorted list, what's the maximum difference between two adjacent elements? What is it, if it's not sorted, what is it supposed to give you? Well, I mean, so there, so I just simplified the problem. And in the more complicated problem, the steps are sort it first, and then...

Starting point is 00:05:53 Okay, all right. So you want to just assume that we have a sorted list. Yeah, so I'm just simplifying. Like, the actual problem is given an arbitrarily, whatever, shuffled list of integers. If you sort it, what's the maximum difference between two adjacent elements? But honestly, clearly the first step of this if you sort it what's the maximum difference between two adjacent elements but honestly the clearly the first step of this problem is sorted why why is this tricky it's not it's not i'm just like so this problem itself especially like talk me through

Starting point is 00:06:18 how you'd solve it and then i'll add a little extra in here. I guess I would just do it in... My first thought is like an inclusive scan. Oh, okay. My first reaction is, okay, we're looking to find the maximum of something. So we're doing a reduction. Anytime we're looking through a list of things and we're looking to find, you know, the greatest or the smallest or something like that, some sort of local minima or maxima or global minima and maxima, then it's reduction. So, but in this case, it's sort of a stenciled reduction. And so that's why I said inclusive scan. But inclusive scan doesn't actually make sense because we don't need all those partial sums.

Starting point is 00:07:10 So I guess it's just like, it's just a reduction where it's like a zip reduce basically. It's how I would do it where the two input sequences that I would give to the reduction would be, one would be like from the zero element to the n minus one element, and the other one would be from the first element to the n element. And so if you create those two sub-sequences of the original sequence, and then you feed them into a zip, and then you do a reduction on that zipped view then you've you're doing a reduction on the pairs of adjacent elements um and then the reduction you just do you know um max and like if i miss something nope you are 100 correct this is fantastic and i'm actually it's a shame i'm almost positive we talked about this exact thing on a previous – not this exact problem, but something where this thing came up.

Starting point is 00:08:10 We'll come back to it in a second because I'm being vague while I'm referring to thing. My first question I'm going to ask is what is the name of the algorithm where you do the zip to first to second last and then second to last and if you want a hint i can give you a hint because there's actually two correct answers to this question well isn't it isn't it one of the um the window ones the sliding ones yes that is not the name of it though yeah i don't know i don't know what the one is one is an algorithm from C++. Wait, hang on, hang on, hang on. I am the... No, no, no. I see your eyes shifting around, going to CPP reference.

Starting point is 00:08:51 Yeah, I'm going to go look because I know how to find the answer because I am the chair of the C++ Library Evolution Group, and I'm pretty sure that this is the thing that was added under my tenure. So I should know this. I just don't remember the name. But I remember the paper. I remember the paper. I'm talking to you, listener.

Starting point is 00:09:12 And there's a silver medal answer and a gold medal answer. The silver medal answer is from C++98. The gold medal answer is from C++23. And this is just the first step. so it doesn't include the reduction. It's just this. And in functional languages, this is called ziptail. So it's slide, right, is the one that we want for this 23 one? You can solve this with slide, but slide is a...

Starting point is 00:09:41 Slide's not what you have in mind? No, it is not the exact version of what we want. Because slide, there's a C++23 ranges adapter that is very similar to slide, but slide returns you a range of ranges. There's a sibling algorithm to this that returns you a range of tuples and a specialization of that one that we want exactly for this. And we talked about it when we listed off all the different range adapters. However, there's also a sibling algorithm to this back from C++98.

Starting point is 00:10:12 We've added too many of them. Yep. That's why you got me, Bryce. I memorize all this stuff for fun. It makes me happy. See, there's this thing called Google that, you know, replaces you. I don't, I don't, I don't need Google because I mean, don't get me wrong. I've, I don't think about like the set algorithms very often, like asymmetric difference or

Starting point is 00:10:35 et cetera, those ones. But you know, these ones are, these ones are fundamental. Okay. So you're saying it's like slide, but it gives you tuples. Yep. Um, interesting. Um, slide but it gives you tuples yeah um interesting um wow is it is it adjacent yes yeah and even more so i mean of course it's adjacent yeah adjacent so adjacent is basically identical to slide. Slide takes its window size as an argument, variable arguments, whereas adjacent takes its window size as a template argument.

Starting point is 00:11:16 But we even want something that's even more specific than adjacent. I mean, well, in this case, just want like uh adjacent transformed exactly because we are taking the i don't know what you're looking for there we're taking the difference so we're passing stood colon colon minus two yeah yeah oh so it's just adjacent transform than meduse. Oh, that's very elegant. And do you know what the C++98 is? Hang on, hang on, hang on. I want to pick a fight with you here, which is that I believe that my answer of slide, like slide and adjacent are more or less the same thing.

Starting point is 00:11:57 It's just, you know, one of them's, you're specifying the size that you're looking for at compile time versus runtime. Like I think I should get full credit for my answer slide. I will give you a fourth place not making it to the Olympics. You're on the what they call, you're an Olympic alternate. So you didn't get bronze, silver, or gold. The reason is, and specifically adjacent is not really the correct like the gold answer

Starting point is 00:12:26 was adjacent transform and it's specifically because if you're using adjacent or slide spelling that out is way less elegant because in the slide example you're going to pass two to get a range of two a range of ranges where each inner range is two elements but then how are you going to destructure that and take the difference? Yeah, okay. You basically have to call like dot begin and dot end on those. And then even with the adjacent, you still have to unwrap the tuples with either a lambda that uses C++17 structure bindings

Starting point is 00:12:59 or the std colon colon get angle zero, angle one. Whereas adjacent transform was specifically added in the case where and actually there's even one more specialization which if barry is listening to this episode our guest guest from last episode he's going to be going you're actually it's still not the most specific one because there's a view adapter called pairwise which is a specialization of adjacent transform where the size is two i'm not a huge fan of the pairwise name even though it has precedence in f sharp um i think a javascript library and then also the python more iter tool so there's a couple libraries in popular languages

Starting point is 00:13:39 and f sharp has it as specifically this like adjacent i. I don't know that we have it in standard C++. C++ 23? Yes, we do. Well, then somebody needs to go update C++ Reference because adjacent. It's there. No, it's not. Which one? I see adjacent, but not pairwise.

Starting point is 00:14:01 It's there. You're not looking at the right place. Okay. Well, I'm looking at cppre okay well i'm looking at cpp reference dot com slash w slash cpp slash ranges which is where adjacent is oh no no it's it's on the page for uh adjacent view because i think it's just a um specialization of it yeah okay that makes sense no no it's not a specialization it's an alias specialization can't be under a different name it's an alias for adjacent exactly sorry boom you under a different name. It's an alias for adjacent to.

Starting point is 00:14:25 Exactly. Sorry. Boom! You just got owned! It is true. 100% you get that. I am one of the most pedantic people you'll meet, and I'm using the wrong word there. It is an alias cost us nothing. It's like one line. It's like yada, yada, yada, pairwise equals adjacent to, where the yada, yada, yada is inline constexpr auto. Yeah, this is a very, very lightweight argument. This is not a strong opinion loosely held. This is like a weak opinion loosely held. I am not a big fan of it because it sort of falls into the same category of std accumulate and other algorithms coming with default binary operations.

Starting point is 00:15:09 Pairwise is one extra thing for people to learn. And I don't think what it saves us, it saves us two chevrons and actually the number two. So it's three characters that it saves us. And I actually think that adjacent transform is more meaningful to the C++ program when you already have adjacent, adjacent difference, adjacent find. Yeah, I actually completely agree with you that I think it actually reduces clarity. Yeah. I've made these, I had that lightning talk called algorithm selection, where I start with mismatch as the most general algorithm and then show that stood equal and stood any of and stood adjacent find. All these algorithms are quote unquote specializations that can be spelled with stood mismatch. And this is one of these, like I could do the same thing now for adjacent, adjacent transform and pairwise transform and pairwise, sorry, adjacent pairwise, adjacent transform and pairwise transform and pairwise and sorry, adjacent

Starting point is 00:16:05 pairwise, adjacent transform and pairwise transform. And I just think that like, I think back when I was attending the ranges monthly meetings, I voted like weekly against adding this, but other folks were a big fan. And my biggest thing is that like, is there precedence for it? And in other languages, there is precedence for it. And I couldn't find like multiple names. And of the names that you might want to give this to, I guess it's decent. But I just would, I would rather have nothing there the same way I would rather have like

Starting point is 00:16:36 our left fold accumulate, not take a default operation. Yeah. And it's also the same thing as keys and values. I always forget about this, is that there is a view adapter called elements, which gives you the ability if you're looping over tuples, typically this is like two tuples that represent like a key value pair. You can use elements 0, 1, 2, 3, 4 to get access to the nth element from a tuple. And we have, once again, two aliases, keys for element 0 and values for elements 1, which I just think is like pretty undesirable in many cases where what you have is not like

Starting point is 00:17:18 a key value pair. I could technically have like a four tuple that represents like a four dimensional data point for some, I don't know, chemistry or space thing where it's just like a dimension of some four point thing. And I could use values to get the second one, like at index one, just because it happens to be an alias for elements one. And in cases where you have key value pairs, it's all right. But it just, yeah yeah these things i would rather have like the bare like core sort of fundamental algorithms and then in certain cases like any of and all of and none of those provide a lot more readability but also more importantly if you want keys values

Starting point is 00:17:58 or pairwise it's like one line that you can add to your program like if you if it like it's very easy for you to add that yourself. So yeah. Although I would sort of say that's kind of like it's obfuscating code that like the average C++ developer that really goes

Starting point is 00:18:14 and understands these views adapters. If they then see something, they're going to have to go F12 on it, which isn't a ton of work. But anyways, like I said, this is a weak opinion, loosely held. Like it's something that I'm against, but also, it doesn't break my heart that it's there.

Starting point is 00:18:28 I will choose to write adjacent transform bracket to angle bracket and be done with it. So just playing devil's advocate there for a minute. Because the associative containers that have a key value, you know, separations like maps, basically, because the maps store things as a stood pair of key value. to work with these the map containers and um uh and ranges if you didn't have the the helpful aliases like essentially what i'm saying what i'm arguing is that the the existence of the map containers and their design using uh giving you like when you ask for like a view of them um uh you're going to get a sequence of key value pairs um like because of that um and like correct me if i'm wrong but there's no way to like there's like no method on map where i can say like just give me a sequence of the keys is there not that i have used right so i i the alternative, if we didn't want, like, instead of having the alias for keys, I think we could have added a method to map and unordered map and multi-maps that returned you a range that was the keys and the values, and it would just do elements 0 and 1.

Starting point is 00:20:08 I think that would have been a fine alternative to do, although I am loathe to add more member functions to containers. And I think generally in C++ library design, for something that's like a separate concern, like algorithms, we tend to like to have the algorithms be um uh separate uh free functions that operate on generic things not uh things that are um members of uh of our data structures um but i think in this case having that little helper would have been fine and useful and then we wouldn't have had to have this you know stood ranges element stood ranges keys and stood or i guess it's stood views keys and stood views values yeah i mean i haven't spent a ton of time thinking about it but like

Starting point is 00:21:00 i think in the case where you're turning a associative container with keys and values into a range, and then if we didn't have keys and values, the aliases, then you'd have to use elements. In that case, you're losing, and you probably do want to use the aliases, keys, and values. But I think I would, off the top of my head, like, I would prefer, because in the case where you're not dealing with keys and values in some kind of associative container, it is adding semantic context where it's like not useful. And in fact, it's harmful probably. And probably something like first and second, even though in the associative container, when you have keys and values, that's still less helpful than keys and values. When you use first and second on any other context, it's more helpful. And so if we were going to add aliases, I would argue for something like that. But anyways, we're totally getting away from max gap though. So, and also, do you know what the, we still haven't said the silver answer to the C++ 98.

Starting point is 00:21:54 No, I was going to ask about that. Yeah, I don't know. Tell me. It's the second or first worst named algorithm in C++, in my opinion, which is adjacent difference. Oh, yeah, yeah. Adjacent difference would make sense, yes. Which is basically adjacent transform with a bunch of properties that suck because it's encoded the default std minus operation into the name of the algorithm, which I've mentioned on many different places. But so, yes, you do your adjacent transform, and then you basically just do a maximum reduction. And this is problem six of my 10 problems.

Starting point is 00:22:33 I used to have five, and then I recently added two in the last week. You shouldn't ask me any algorithm problems that are solved by reduce, because you know that my first go-to thing for any problem that you give me is, can I solve this by a reduction? Well, that was the thing, though, is if we go back to your original answer, what was my response?

Starting point is 00:22:54 100% correct. A reduction where you're zipping together, you're doing the zip tail trick, the tail from functional languages, meaning you're zipping sort of your original sequence with the tail of it, which is dropping the first element. That's the, that is like semantically the exact thing we're trying to do here. The, then like the follow-up question was like, how do you spell that using adapters? Which you don't lose points for not getting the right names of that stuff. But this is where it gets interesting because problem seven, like that problem itself is

Starting point is 00:23:21 not like, it's, it's nice to know that, oh, there's a nice function in C++ 23 ranges now where you can go adjacent transform. And then currently right now without the pipeline operator, you know, we're not able to pipe that to a max reduction. So you have to pass it to std colon colon ranges, colon colon max, which we might talk about in the next episode. I'm not sure if you saw this tweet that I had, which has become like the third most liked and second most retweeted. And it's caused a ton of opinions having to do with C++26 pipeline operator. But where I'm going with this is there is a problem number seven,

Starting point is 00:23:57 which is a modified version of this problem, which I think is just incredibly intellectually stimulating. And I've spent so much time thinking about it and how to spell this in different languages. Same problem. We'll assume that we already have a sorted list of numbers. And the question is now not what is the maximum of adjacent values,

Starting point is 00:24:18 difference between adjacent values, but what is the count of the maximum difference between adjacent values? And this, this is where things get interesting. So wait, so wait, you want the, you want to know how many times that maximum gap occurs. Yes. And I realized probably I should have done this 20 minutes ago. Also, we might be bleeding in. This might be part two now of this episode.

Starting point is 00:24:42 I probably should have given an example. So for instance, for the first problem, if we had the numbers 1, 3, 5, and 10, the differences are 2, 2, and 5. So the answer to that question is 5. But for the second problem, the answer would be 1 because there is 1, 5. However, if the sequence was 1, 3, 5, 10, 15, 20, our differences are then 2, 2, 5, 5, 5. And so then you want the count of the number of 5s, whatever the maximum difference is. In this case, it would be 3. And it's only a slightly different problem,

Starting point is 00:25:23 but the implications on how you solve this, I think are interesting. Tune in next week to hear part two of this two-part episode where we solve max gap count. Thanks for listening. We hope you enjoyed and have a great day.

Your Ad Here

Algorithms + Data Structures = Programs - Episode 115: Max Gap in C++23

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.