Algorithms + Data Structures = Programs - Episode 115: Max Gap in C++23
Episode Date: February 3, 2023In this episode, Conor and Bryce discuss the C++23 solution to the problem Max Gap.Link to Episode 115 on WebsiteDiscuss this episode, leave a comment, or ask a question (on GitHub)TwitterADSP: The Po...dcastConor HoekstraBryce Adelstein LelbachShow NotesDate Recorded: 2023-02-01Date Released: 2023-02-03Combinatory LogicCollection Oriented ProgrammingClojure/conj 2023Parallel Block-Delayed SequencesMax Gap ProblemMax Gap SolutionC++17 std::reduceC++23 std::inclusive_scanC++23 std::views::slideC++23 std::views::adjacentC++23 std::views::adjacent_transformC++98 std::minusC++23 std::views::pairwiseC++23 std::views::pairwise_transformF# Seq.pairwisePython more_itertools.pairwiseRxJS pairwiseLightning Talk: Algorithm Selection - Conor Hoekstra [ ACCU 2021 ]C++98 std::accumulateC++20 std::views::elementsC++20 std::views::keysC++20 std::views::valuesC++98 std::adjacent_differenceConor’s Tweet about the C++26 Pipeline OperatorIntro Song InfoMiss You by Sarah Jansen https://soundcloud.com/sarahjansenmusicCreative Commons — Attribution 3.0 Unported — CC BY 3.0Free Download / Stream: http://bit.ly/l-miss-youMusic promoted by Audio Library https://youtu.be/iYYxnasvfx8
Transcript
Discussion (0)
See, the problem is we've added too many of them.
Yep.
That's why you got me, Bryce.
I memorize all this stuff for fun.
It makes me happy.
See, there's this thing called Google that replaces you pretty well.
I don't need Google because, I mean, don't get me wrong.
I don't think about the set algorithms very often, like asymmetric difference or et cetera, those ones.
But you know, these ones are, these ones are fundamental.
Welcome to ADSP, the podcast episode 115 recorded on February 1st, 2023.
My name is Connor, and today with my co-host Bryce, we discuss a problem entitled Max Gap and solve it in C++23.
See if you can listen along and solve it yourself.
So I will be coming to Toronto, not once, but twice, and not seeing you.
What? to toronto not once but twice and not seeing you what because it this is apparently the the easiest routing for me to get upgrades to london on is is the la guardia to to toronto
then toronto to um london on british airways it's it's this is my secret this. This is my secret. This is my secret.
The LaGuardia
to
London
by way of Canada special.
I mean, you can
come hang out with me at the
airport. I can't. We already talked about this.
Because you'll be in the International.
Yeah, yeah, yeah. But I'm going to be
coming twice maybe
the second maybe you can you can come somewhere with me come to london with me so uh so what's
going on what are we talking about today um oh am i supposed to have a topic because i'm not
no no no i've got tons of topics you see kind of that that's what works about this relationship is that I can be lazy and just show up unprepared and you are just, you're just happy and eager to.
Oh, yeah.
I mean, I've literally got like 17 different episodes that we could chat about right now.
Is it actually 17?
No, I mean, I just made that number up.
But like 15 of them are off topic, you know, probably like a good five of them have to do with combinatorial logic. Another three have to do with this new term I heard about called
collection-oriented programming, which I submitted to give a talk at Closure Conj,
which is in April. And anyways, we're not going to, these are things that we could talk about,
but I'm not going to talk about just to, you know, if folks are interested, I'll leave a link to this paper, Parallel Block Delayed Sequences,
which is about this sort of parallel algorithms model. But it mentions collection-oriented
programming, which sent me on this deep dive like a couple weeks ago. What are you going to say?
This is going to show you, dear listener, the difference between what Connor thinks about in his spare time and what I think about in my spare time.
And you're going to have to air this because this is a call for help.
Here it is.
I need somebody who speaks Japanese who can make a phone call for me.
All right.
Thanks for that, Bryce.
You're not going to ask why?
Go ahead.
Why?
Because I need them to call this little restaurant
that I ate at in Kyoto
because they had this amazing cheesecake.
It was this little French restaurant.
It's sort of an immediate problem because my girlfriend really liked it,
and I want to make it for her birthday, which is coming up.
So please, somebody reach out.
I need some help here.
Rest assured.
I will fast forward your voice at 10x for most of that and just leave in like a couple words to give people the sense. But someone will DM you or, you know, post a
comment on the ADSP Twitter, Twitter post with I speak Japanese, how may I be of service? I'm sure
back to all my topics that we're not going to talk about. I mean, a whole other thing I've spent, I have this, I'm working up towards building a course on algorithms that at some point I'm going to create.
And it's going to be, it's going to be based around a set of 10 problems. Um, some of which
we've discussed like the maximum consecutive ones, that's, uh, the third or second problem.
And these are problems that I just have spent a lot of time thinking about and, you know,
like to think about them while I'm running. And I recently came across a problem called Max, Max
gap. And we should actually throw it to a different episode. But there's like two different flavors of
this problem. And it's just been, it's been amazing. I'll leave a link in the description
for folks that want to go. You can't just tell me that and not like now okay okay i'm talking about max gap i just i just don't want to i don't want to
dedicate this whole episode maybe we'll dedicate this whole episode we'll keep the chit chat at
the beginning in but so max gap it's given a list of integers determine the difference, the maximum difference
between two adjacent elements.
Okay, they're adjacent.
If the list is sorted.
So you could simplify the problem
and say, given a sorted list,
what's the maximum difference
between two adjacent elements?
What is it,
if it's not sorted,
what is it supposed to give you?
Well, I mean, so there, so I just simplified the problem.
And in the more complicated problem, the steps are sort it first, and then...
Okay, all right.
So you want to just assume that we have a sorted list.
Yeah, so I'm just simplifying.
Like, the actual problem is given an arbitrarily, whatever, shuffled list of integers.
If you sort it, what's the maximum difference between two adjacent elements?
But honestly, clearly the first step of this if you sort it what's the maximum difference between two adjacent elements but
honestly the clearly the first step of this problem is sorted why why is this
tricky it's not it's not i'm just like so this problem itself especially like talk me through
how you'd solve it and then i'll add a little extra in here. I guess I would just do it in... My first thought is like an inclusive scan.
Oh, okay.
My first reaction is, okay, we're looking to find the maximum of something.
So we're doing a reduction.
Anytime we're looking through a list of things and we're looking to find, you know, the greatest or the smallest or
something like that, some sort of local minima or maxima or global minima and maxima, then it's
reduction. So, but in this case, it's sort of a stenciled reduction. And so that's why I said
inclusive scan. But inclusive scan doesn't actually make sense because we don't need all those partial sums.
So I guess it's just like, it's just a reduction where it's like a zip reduce basically.
It's how I would do it where the two input sequences that I would give to the reduction would be, one would be like from the zero element
to the n minus one element, and the other one would be from the first element to the n element.
And so if you create those two sub-sequences of the original sequence, and then you feed them
into a zip, and then you do a reduction on that zipped view then you've you're doing a reduction
on the pairs of adjacent elements um and then the reduction you just do you know um max and
like if i miss something nope you are 100 correct this is fantastic and i'm actually it's a shame
i'm almost positive we talked about this exact thing on a previous – not this exact problem, but something where this thing came up.
We'll come back to it in a second because I'm being vague while I'm referring to thing.
My first question I'm going to ask is what is the name of the algorithm where you do the zip to first to second last and then second to last and if you want a hint i can give you a hint
because there's actually two correct answers to this question well isn't it isn't it one of the
um the window ones the sliding ones yes that is not the name of it though yeah i don't know i
don't know what the one is one is an algorithm from C++. Wait, hang on, hang on, hang on.
I am the...
No, no, no.
I see your eyes shifting around, going to CPP reference.
Yeah, I'm going to go look because I know how to find the answer
because I am the chair of the C++ Library Evolution Group,
and I'm pretty sure that this is the thing that was added under my tenure.
So I should know this.
I just don't remember the name.
But I remember the paper.
I remember the paper.
I'm talking to you, listener.
And there's a silver medal answer and a gold medal answer.
The silver medal answer is from C++98.
The gold medal answer is from C++23.
And this is just the first step. so it doesn't include the reduction.
It's just this.
And in functional languages, this is called ziptail.
So it's slide, right, is the one that we want for this 23 one?
You can solve this with slide, but slide is a...
Slide's not what you have in mind?
No, it is not the exact version of what we want.
Because slide, there's a C++23 ranges adapter that is very similar to slide, but slide returns
you a range of ranges.
There's a sibling algorithm to this that returns you a range of tuples and a specialization
of that one that we want exactly for this.
And we talked about it when we listed off all the different range adapters.
However, there's also a sibling algorithm to this back from C++98.
We've added too many of them.
Yep.
That's why you got me, Bryce.
I memorize all this stuff for fun.
It makes me happy.
See, there's this thing called Google that, you know, replaces you.
I don't, I don't, I don't need Google because I mean, don't get me wrong.
I've, I don't think about like the set algorithms very often, like asymmetric difference or
et cetera, those ones.
But you know, these ones are, these ones are fundamental.
Okay.
So you're saying it's like slide, but it gives you tuples.
Yep.
Um, interesting. Um, slide but it gives you tuples yeah um interesting um wow is it is it adjacent yes yeah and even more
so i mean of course it's adjacent yeah adjacent so adjacent is basically identical to slide. Slide takes its window size as an argument, variable arguments, whereas adjacent takes
its window size as a template argument.
But we even want something that's even more specific than adjacent.
I mean, well, in this case, just want like uh adjacent transformed exactly because we are
taking the i don't know what you're looking for there we're taking the difference so we're passing
stood colon colon minus two yeah yeah oh so it's just adjacent transform than meduse. Oh, that's very elegant. And do you know what the C++98 is?
Hang on, hang on, hang on.
I want to pick a fight with you here,
which is that I believe that my answer of slide,
like slide and adjacent are more or less the same thing.
It's just, you know, one of them's,
you're specifying the size that you're looking for
at compile time versus runtime.
Like I think I should get full credit for my answer slide.
I will give you a fourth place not making it to the Olympics.
You're on the what they call, you're an Olympic alternate.
So you didn't get bronze, silver, or gold.
The reason is, and specifically adjacent is not really the correct like the gold answer
was adjacent transform and it's specifically because if you're using adjacent or slide
spelling that out is way less elegant because in the slide example you're going to pass two to get
a range of two a range of ranges where each inner range is two elements but then how are you going
to destructure that and take the difference?
Yeah, okay.
You basically have to call like dot begin and dot end on those.
And then even with the adjacent, you still have to unwrap the tuples
with either a lambda that uses C++17 structure bindings
or the std colon colon get angle zero, angle one.
Whereas adjacent transform was specifically added in the
case where and actually there's even one more specialization which if barry is listening to
this episode our guest guest from last episode he's going to be going you're actually it's still
not the most specific one because there's a view adapter called pairwise which is a specialization
of adjacent transform where the size is two i'm not a huge
fan of the pairwise name even though it has precedence in f sharp um i think a javascript
library and then also the python more iter tool so there's a couple libraries in popular languages
and f sharp has it as specifically this like adjacent i. I don't know that we have it in standard C++.
C++ 23?
Yes, we do.
Well, then somebody needs to go update C++ Reference because adjacent.
It's there.
No, it's not.
Which one?
I see adjacent, but not pairwise.
It's there.
You're not looking at the right place.
Okay.
Well, I'm looking at cppre okay well i'm looking at cpp reference
dot com slash w slash cpp slash ranges which is where adjacent is oh no no it's it's on the page
for uh adjacent view because i think it's just a um specialization of it yeah okay that makes sense
no no it's not a specialization it's an alias specialization can't be under a different name
it's an alias for adjacent exactly sorry boom you under a different name. It's an alias for adjacent to.
Exactly. Sorry.
Boom! You just got owned!
It is true. 100% you get that. I am one of the most pedantic people you'll meet, and I'm using the wrong word there. It is an alias cost us nothing. It's like one line. It's like yada, yada, yada, pairwise equals adjacent to, where the yada, yada, yada is inline constexpr auto.
Yeah, this is a very, very lightweight argument.
This is not a strong opinion loosely held.
This is like a weak opinion loosely held.
I am not a big fan of it because it sort of falls into the same category of std accumulate
and other algorithms coming with default binary operations.
Pairwise is one extra thing for people to learn.
And I don't think what it saves us, it saves us two chevrons and actually the number two.
So it's three characters that it saves us.
And I actually think that adjacent transform is more meaningful to the C++ program when you already have adjacent, adjacent difference, adjacent find.
Yeah, I actually completely agree with you that I think it actually reduces clarity.
Yeah. I've made these, I had that lightning talk called algorithm selection, where I start with mismatch as the most general algorithm and then show that stood equal and stood any of and stood adjacent find.
All these algorithms are quote unquote specializations that can be spelled with stood mismatch.
And this is one of these, like I could do the same thing now for adjacent, adjacent transform and pairwise transform and pairwise, sorry, adjacent pairwise, adjacent transform and pairwise transform and pairwise and sorry, adjacent
pairwise, adjacent transform and pairwise transform.
And I just think that like, I think back when I was attending the ranges monthly meetings,
I voted like weekly against adding this, but other folks were a big fan.
And my biggest thing is that like, is there precedence for it?
And in other languages, there is precedence for it.
And I couldn't find like multiple names.
And of the names that you might want to give this to, I guess it's decent.
But I just would, I would rather have nothing there the same way I would rather have like
our left fold accumulate, not take a default operation.
Yeah.
And it's also the same thing as keys and values. I always forget about
this, is that there is a view adapter called elements, which gives you the ability if you're
looping over tuples, typically this is like two tuples that represent like a key value pair.
You can use elements 0, 1, 2, 3, 4 to get access to the nth element from a tuple.
And we have, once again, two aliases, keys for element 0 and values for elements 1, which
I just think is like pretty undesirable in many cases where what you have is not like
a key value pair.
I could technically have like a four tuple that represents like a four dimensional data point
for some, I don't know, chemistry or space thing where it's just like a dimension of some four
point thing. And I could use values to get the second one, like at index one, just because it
happens to be an alias for elements one. And in cases where you have key value pairs, it's all
right. But it just, yeah yeah these things i would rather have like
the bare like core sort of fundamental algorithms and then in certain cases like any of and all of
and none of those provide a lot more readability but also more importantly if you want keys values
or pairwise it's like one line that you can add to your program like if you if it like it's very
easy for you to add that yourself.
So yeah.
Although I would sort of say
that's kind of like
it's obfuscating code
that like the average C++ developer
that really goes
and understands these views adapters.
If they then see something,
they're going to have to go F12 on it,
which isn't a ton of work.
But anyways, like I said,
this is a weak opinion,
loosely held.
Like it's something that I'm against, but also, it doesn't break my heart that it's there.
I will choose to write adjacent transform bracket to angle bracket and be done with it.
So just playing devil's advocate there for a minute.
Because the associative containers that have a key value, you know, separations like maps, basically, because the maps store things as a stood pair of key value. to work with these the map containers and um uh and ranges if you didn't have the the helpful
aliases like essentially what i'm saying what i'm arguing is that the the existence of the map
containers and their design using uh giving you like when you ask for like a view of them um uh you're going to get a sequence of
key value pairs um like because of that um and like correct me if i'm wrong but there's no way
to like there's like no method on map where i can say like just give me a sequence of the keys is
there not that i have used right so i i the alternative, if we didn't want, like, instead of having the alias for keys, I think we could have added a method to map and unordered map and multi-maps that returned you a range that was the keys and the values, and it would just do elements 0 and 1.
I think that would have been a fine alternative to do,
although I am loathe to add more member functions to containers.
And I think generally in C++ library design,
for something that's like a separate concern, like algorithms, we tend to like to have the algorithms be
um uh separate uh free functions that operate on generic things not uh things that are
um members of uh of our data structures um but i think in this case having that little helper
would have been fine and useful and then we wouldn't have had to have this you know stood ranges element stood ranges keys and stood or i guess it's stood views keys and stood views
values yeah i mean i haven't spent a ton of time thinking about it but like
i think in the case where you're turning a associative container with keys and values into a range, and then if we didn't have keys and values, the aliases, then you'd have to use elements.
In that case, you're losing, and you probably do want to use the aliases, keys, and values.
But I think I would, off the top of my head, like, I would prefer, because in the case where you're not dealing with keys and values in some kind of associative container, it is adding semantic context where it's like not useful. And
in fact, it's harmful probably. And probably something like first and second, even though
in the associative container, when you have keys and values, that's still less helpful than keys
and values. When you use first and second on any other context, it's more helpful. And so if we
were going to add aliases, I would argue for something like that. But anyways, we're totally getting away from max gap though. So, and also, do you know what the,
we still haven't said the silver answer to the C++ 98.
No, I was going to ask about that. Yeah, I don't know. Tell me.
It's the second or first worst named algorithm in C++, in my opinion, which is adjacent difference.
Oh, yeah, yeah.
Adjacent difference would make sense, yes.
Which is basically adjacent transform with a bunch of properties that suck because it's encoded the default std minus operation into the name of the algorithm, which I've mentioned on many different places.
But so, yes, you do your adjacent transform,
and then you basically just do a maximum reduction.
And this is problem six of my 10 problems.
I used to have five,
and then I recently added two in the last week. You shouldn't ask me any algorithm problems
that are solved by reduce,
because you know that my first go-to thing
for any problem that you give me is, can I
solve this by a reduction?
Well, that was the thing, though, is if we go back to your original answer, what was
my response?
100% correct.
A reduction where you're zipping together, you're doing the zip tail trick, the tail
from functional languages, meaning you're zipping sort of your original sequence with
the tail of it, which is dropping the first element.
That's the, that is like semantically the exact thing we're trying to do here.
The, then like the follow-up question was like, how do you spell that using adapters?
Which you don't lose points for not getting the right names of that stuff.
But this is where it gets interesting because problem seven, like that problem itself is
not like, it's, it's nice to know that, oh, there's a nice function in C++ 23 ranges now where you can go adjacent transform.
And then currently right now without the pipeline operator, you know, we're not able to pipe that to a max reduction.
So you have to pass it to std colon colon ranges, colon colon max, which we might talk about in the next episode. I'm not sure if you saw this tweet that I had, which has become like the third most liked
and second most retweeted.
And it's caused a ton of opinions
having to do with C++26 pipeline operator.
But where I'm going with this
is there is a problem number seven,
which is a modified version of this problem,
which I think is just incredibly
intellectually stimulating.
And I've spent so much time thinking about it
and how to spell this in different languages.
Same problem.
We'll assume that we already have a sorted list of numbers.
And the question is now not what is the maximum of adjacent values,
difference between adjacent values,
but what is the count of the maximum difference between adjacent values?
And this, this is where things get interesting.
So wait, so wait, you want the, you want to know how many times that maximum gap occurs.
Yes.
And I realized probably I should have done this 20 minutes ago.
Also, we might be bleeding in.
This might be part two now of this episode.
I probably should have given an example. So for instance, for the first problem, if we had the numbers 1, 3, 5, and 10, the differences
are 2, 2, and 5. So the answer to that question is 5. But for the second problem,
the answer would be 1 because there is 1, 5. However, if the sequence was 1, 3, 5, 10, 15, 20,
our differences are then 2, 2, 5, 5, 5.
And so then you want the count of the number of 5s,
whatever the maximum difference is.
In this case, it would be 3.
And it's only a slightly different problem,
but the implications on how you solve this, I
think are interesting.
Tune in next week to hear part two of this two-part episode where we solve max gap count.
Thanks for listening.
We hope you enjoyed and have a great day.