Algorithms + Data Structures = Programs - Episode 75: C++ Algorithms with Ben Deane (Part 1)
Episode Date: April 29, 2022In this episode, Bryce and Conor talk to Ben Deane about C++ Algorithms!TwitterADSP: The PodcastConor HoekstraBryce Adelstein LelbachAbout the Guest:For Ben Deane, C++ wasn’t even among the first 10... languages that he learned on his programming journey, but it’s been the one that has paid the bills for the last 20-odd years. He spent most of that time in the games industry; many of the games he worked on used to be fondly remembered but now he’s accepted that they are probably mostly forgotten. These days he works in the finance industry writing high-frequency trading platforms in the most modern C++ that compilers can support.In his spare time he watches a lot of YouTube’s educational sector, practices the Japanese art of tsundoku, reads about the history of programming, avoids doing DIY, and surprises his wife by waking in the middle of the night yelling, “of course, it’s a monad!” before going back to sleep and dreaming of algorithms.Show NotesDate Recorded: 2022-04-14Date Released: 2022-04-22BublyspindriftLa CroixSAPSUCKERBen Deane Cppcast Episode: Programming History, JIT Compilations and Generic AlgorithmsC++Now 2018: Ben Deane “Easy to Use, Hard to Misuse: Declarative Style in C++”C++Now 2019: Ben Deane “Identifying Monoids: Exploiting Compositional Structure in Code”CppCon 2016: Ben Deane “std::accumulate: Exploring an Algorithmic Empire”CppCon 2016: Ben Deane “Using Types Effectively”ADSP Episode 72: C++ Algorithm Family Feud!C++ std::nth_elementC++ std::accumulateC++ std::reduceC++ std::sortIntrosortMerge sortQuicksortPython’s TimsortC++ std::atomicC++ std::partial_sort_copyC++ std::partial_sortIntro Song InfoMiss You by Sarah Jansen https://soundcloud.com/sarahjansenmusicCreative Commons — Attribution 3.0 Unported — CC BY 3.0Free Download / Stream: http://bit.ly/l-miss-youMusic promoted by Audio Library https://youtu.be/iYYxnasvfx8
Transcript
Discussion (0)
Ben, we're talking about Ben. Prolific speaker on the C++ speaker track, algorithm expert, polyglot.
Anthelmint is interesting for another reason.
I have long considered it the sort of most, recorded on April 19th, 2022.
My name is Connor, and today with my co-host Bryce, we talk to algorithm expert Ben Dean about C++ algorithms.
Well, let's do this right off the bat.
We're hopping, well, before we hop into things, we'll get... I know, I have a feeling I know what Connor's about to do.
They're never going to sponsor us, Connor.
They're never going to sponsor us.
Did you hear Netflix down 20% after hours today so maybe maybe maybe nvidia will replace netflix as the n in uh
in fame i feel like that acronym is now increasingly irrelevant with with the players
in the market you know i feel like i throw a lot of shade at netflix on this podcast for absolutely
no reason and if anybody who works who works at netflix is listening to this podcast for absolutely no reason. If anybody who works at Netflix is listening to this podcast,
I really have no problem with you.
It sounds like a great company, but I will continue to throw shade.
All right, Connor, do your thing.
Our thing is that we're terrible.
Well, we're good.
I think we're decent podcast co-hosts,
but we do a terrible job of introducing our guests
as demonstrated by our last recording
with guests when we had patrice and jason i thought this was a bubbly i thought you were
gonna go do a bubbly oh no we well we've got the brand new bubbly flavor it's well so of course
how is bubbly because uh what's the name of the other one? I can't remember. Perrier?
No, no.
Uh-huh.
No.
They even, they spell bubbly wrong.
Oh gosh.
I had one today and I can't even remember the name.
San Pellegrino?
No, it's a flavor.
It's, it's.
La Croix?
La Croix, that's the one.
Oh, see, that's big down in the States.
It's not really that big up here in Canada.
But I've just discovered Spindrift,
which is, which is like La Croix, but nice.
I have to say, they have compromised.
It's not zero calories.
It's a whole five calories a can.
La Croix, if you want to sponsor us, we will make Ben come back and apologize.
I'm not a big La Croix drinker, but I do drink La Croix.
But I've recently discovered Spindrift.
We tolerate it. We tolerate it. Everyone tolerates La Croix drinker, but I do drink LaCroix, but I've recently discovered Spindrift. We tolerate it.
We tolerate it.
Everybody tolerates LaCroix.
Yeah, exactly.
What's the difference?
See, well, I just want to talk.
This is meta because we are literally proving what we're bad at.
We're bad at introducing our guests.
We got two seconds into introducing Ben, and then Bryce was like, weren't you going to talk about bubbly?
And then whatever percentage of attention deficit disorder I have, I was like, no, but let to talk about bubbly? And then whatever percentage of like attention deficit disorder I have,
I was like, no, but let me tell you.
And now we're talking anyways, let's finish this topic.
Spindrift, is that what it's called?
Spindrift?
Spindrift is like La Croix, except they've compromised, as I say,
very, very slightly on the calories for much nicer flavor, in my opinion.
Interesting.
You know, my go-to drink is vitamin water zero i literally
have a fridge full of vitamin water zero at all times it's like i don't drink water i just isn't
isn't that like the crypto isn't that like the crypto of drinks it's just a big scam it's i'm
it tastes too good to have zero calories i'm sure something in it is going to kill me but i am
addicted it's fair enough literally my my mother was here the last week visiting me and she's like
i want a glass of water there's no water in this apartment can i just have some regular water
wait don't you isn't is the tap water in new york not drinkable it, but I don't know.
All of the women in my life want cold water,
and my apartment does not have it.
All right.
Well, any of the companies, aforementioned companies,
including Sapsucker, which might be the spin drift of Canada,
which I think it has like 20 calories or 15 calories a can, but's like it takes like recycled i don't know syrup
or something and um i don't know honestly it's got it's something it comes something from a tree
something from a tree's in it i don't want syrup that somebody else has already had i i don't think
that's i don't think that's what it is but it's like I think there's a process of making syrup,
and then there's some like, you know, like in an oil,
what do they call that stuff in an oil plant or oil refinery,
and then there's like a pool where all the bad stuff goes.
It's called like a something off.
Oh, that makes more sense.
That makes more sense.
I think Ben actually knew the name of it.
What is it called?
You mean like a sump?
Like that just collects the runoff kind of?
Yeah, yeah, runoff. That's what i was thinking of that's the word um my my older sister as an engineer would have definitely uh had it right away and shannon she's um yeah she's in calgary
now left left me all alone in toronto but uh i think it's basically the runoff of like maple
syrup which is still good for stuff but not syrup and
so this drink company could be the spindrift equivalent speaking of syrup i was going to say
the current syrup i have in my fridge i just made pancakes and i noticed on the bottle and this is
going to offend everyone right on the bottle it says product of usa and canada oh that is that is wrong that is wrong i there is one good why there is one good
maple tree product from canada and that's maple butter specifically the type of maple butter that
jf's brother brings me every time i see him and yes this is a coded way of me telling J.F.
to tell his brother to continue to bring me maple butter.
You just said like two episodes ago
that he doesn't have time to listen to this.
You just ripped on him for like 10 minutes.
Somebody else will listen to this,
like Connor or somebody,
and will then tell J.F.,
and then J.F. will tell his brother.
I got a system.
But no, real maple syrup is made in made in vermont it is it is a
product of vermont america all right 10 minutes later today our guest i think other than sean
parent you are the second ever returning guest to the podcast unless if my memory's failing me
we've definitely and for the listener we cut every time we record with someone into like two or three
episodes so if you're thinking what are you we record with someone into like two or three episodes.
So if you're thinking, what are you talking about?
Everyone's been on two or three times.
False.
Everyone's been on once except for Sean Parent.
And then we cut that up into 30 to 35, 40 minute episodes.
So Ben, the last time you were on was during the virtual C++ Now of 2020, I think.
No, it was 2021. I was gonna say this almost this time last year this podcast did not exist at the time of yeah c++ now 2020 would have happened you're on your
74th week or whatever it is yeah let's well let's be honest we've all lost track of time
and uh in in this virtual new world that we live in um but wait bryce don't say
anything we were we were like we were we were almost there we were almost introducing him
so if you if you're in the c++ community you've probably heard of ben uh he's a prolific talk
speaker winner of many best speaker awards at c++ now one of the main talk speaker i'm a talk speaker
well you know honestly folks i should read a talk speaker. Well, you know, honestly, folks,
I should read a sentence that I wrote the other day on Discord.
There were so many skipped words and grammar mistakes.
Anyways, so he's a prolific speaker with dudes.
What's the actual?
Just speaker?
I think that's the word you're asking for.
I've got to be honest.
I just got back from running 18K very quickly at Run Club.
So I'm a little bit, and I'm in like a 2,500 calorie deficit right now.
Did you shower?
I did, yeah.
And I showered right before.
Anyways, Ben, we're talking about Ben.
Prolific speaker on the C++ speaker track.
Algorithm expert, polyglot.
If you've listened to his CppCast episode, I think you started off by going through the plethora of languages that you have studied at one point or used.
Likely.
Jason always asks about history and that sort of thing.
So, yeah. Yes, Ben is a fellow, uh, program language, historian, uh, enthusiast. What do you, what is it? Is it a book of file or, uh, some,
someone you have a library of file, bibliofile. You are really bad at words today, Connor. This
is going to be a rough takeoff. And the reason Ben's on today and Ben's one of my favorite
speakers. What let's list off a couple of your talks. What's the, I'm not even gonna remember any of the names of them, declarative.
Yeah, one was called easy to use,
hard to misuse declarative style in C++.
Yes, fantastic.
Ben, what's your favorite talk that you've ever given?
That's probably a hard question to answer.
That's a hard one, but I suspect like several speakers,
I feel like I've given the same talk
in different guises several times
or variations on a theme. There are two styles of talks that I really think about. The easy style,
which is, here's a thing I did, here's the code, here's a C++ feature, here's how to think about
it, explain it, et cetera, right? The easy style of talk to me is like, you have code, you explain it.
The harder style, the more,
but the more rewarding style
is the sort of more philosophical style.
Can be illustrated with code, of course,
but that style is more of the kind of declarative style talk.
So I like both styles.
Like I like the philosophical style,
but the easier one to put together is definitely like,
here's the code, here's how it works style.
Easy to use.
Yeah, easy to use, hard to misuse.
You had the other one too recently.
I think it was at CppCon, which was the,
or no, that was at C++ Now as well.
Identifying monoids.
Yeah.
And also the talk that I don't think ever gets mentioned enough.
And it definitely, well, it was at Cpp con. So it didn't quote unquote, win any awards, which is maybe why some of the other ones get more buzz. Even though obviously, they're, they're really good is the stood accumulate an algorithm empire. I think that talk is fantastic, because of sort of just the exercise of I think you use the word sort of perversely, like you're using the algorithms in ways they
shouldn't be used.
Yes.
And bending them as an exercise to really see like what they're capable of.
And a bunch of the things you point out is like, don't do this stuff.
But the exercise of doing this with the, you know, where you give a little summary of the
talk.
Well, that talk, like the talk that Jason and I did, Constance for all the things, it's kind of interesting to look back and see that that talk like the talk that jason and i did context for all the things it's kind of
interesting to look back and see that that talk is of its time right and to look back and be like
wow that was like you say that was just an exercise in making accumulate implement different
algorithms and it was a sort of a thought experiment if you like i mean it was an actual
code experiment but in a sense it's a
thought experiment but interesting but interesting to look back at and see you know it's kind of of
its time now and i look like all of us i look back at my own code and i think you know what was i
thinking i shouldn't have done that like there are better ways to do that in a sense for that
talk it's fine because the whole thing is predicated on just this is really
interesting and a good way to think about it but don't ever write this code sort of thing
the key quote from that talk that i've sort of clipped into a couple of my talks is that at a
certain point you go through the exercise of seeing how many algorithms you can actually
implement using std accumulate or some the sort of platonic equivalent of still accumulate yeah yeah and yes so 77 of
the 90 of them right right and i think we had 90 algorithms in 14 something like that connor what
just like yeah you just cut out one sec one sec connor is experiencing undefined behavior right
now yes come back to us connor are you good can people can people hear
me yeah we could hear you throughout the entire ordeal i'm guessing the listener can also hear you
i in the middle of me saying the 77 of the 90 algorithms are implementable in terms of accumulate
this woman started talking in my headphones saying your headphones are out of battery please recharge
turning off okay see but i feel i feel like that was probably not the first warning that you got
that your headphones were low on battery because that's true that's true that lady said something
earlier today and i ignored her um yeah you should have listened all right ben back to you you were
going to say one of your talks one of my talks that had quite a bit of impact
and people still seem to remember,
or at least, let's say,
folks outside of the rarefied bubble of conference speakers
that I talk to sort of in the real world
and or in the games industry as I was in,
they really enjoyed the um using types effectively talk that i gave
i think it was cppcon 16 is that the one where you have the you know how many types i have a
little game yeah a little type theory game um talking about sites uh types as characterized
by the cardinality of the set of values they hold are you are you coming to c++ now this year ben yep yes i'll be there that's in just over that's in less than two weeks now right
yeah yeah by the time this airs what day is it it's the 19th in fact one of the reasons that
we're recording this is because i will be at c++ now and or the one of the reasons we're recording
this now is because i'm going to be gone for the
first two weeks of may and uh and connor needs all the bryce gold to produce to produce the podcast
maybe i'll just end up going solo we've never we've never done an episode where we
no no that's not true we had the one one episode, right? At C++ Now last year.
Right.
There was one episode without me.
Connor and Tony and I.
Yeah.
It was Ben and I for the first bit,
and then Canadians have a habit of showing up halfway into the recordings.
That's not foreshadowing.
There's no random Canadian showing up today.
So how this goes is I listen to your podcast probably every other week.
And I catch up.
And then invariably, it's like on a Saturday morning when I have some time.
And then invariably, I DM Connor on Twitter.
I only use Twitter for DMs these days.
But I DM Connor and I say, I have opinions.
This is true.
And this is why Ben, so Ben, so this is uh gonna be i think i'm not sure
if it's a response to multiple podcasts but ben dm'd me after the c++ algorithm family feud
interview question uh of which oh i should i should call out we've got we've had like two or
three people who have coded yeah two people one of, two people. One of them was from the C++ North Denver
meetup, because I believe Tyler
Oh, right.
What was it? I want to get the name correct here.
Tyler Weaver
made both a YouTube video and
sort of a Godbolt and a QuickBench
and was basically saying that
it looked like
the max element where removing
an element and then just doing another max element
was the fastest and transform reduce was like the slowest i wasn't that ridiculous max element thing
my idea yeah but we said it was a bad idea though hey hey hey hey i sir i sir i'm an architect now. My job is not to write code or to have good ideas.
My job is just to come up with ideas,
and somebody else figures out if they're good or not.
It's, yeah.
So we'll get back to this.
But anyways, Tyler Weaver had a YouTube video,
and there was another individual.
I'll grab his name at some point.
So Ben DMed me after listening to this and said,
we need to chat. And I said, do you want to come on the podcast and he said yeah sure it doesn't matter
to me as long as we get to chat and so we need to have we need to talk we need to have conversation
well this is it like I have opinions right yeah who is it that messed up the worst was it me or
was it Bryce no one messed up and i actually really
enjoyed the format where like the game show format where you're like you're like okay bryce
now name name the answer oh but name the series of answers in reverse order so when you pose the
question which was uh find the top two right or the or the minimum two or the maximum two, same thing, right, in an unsorted array of integers, let's say.
My immediate thought was nth element.
That's what nth element does.
But then, of course, as is your won't,
you solved it with some reduction.
And that's fine.
Of course. This is the reduction yeah yeah
yeah yeah so so it kind of is i still believe the correct name for that algorithm is top two or top
n perhaps i mean i'm not claiming the nth element is particularly a good name for that application
so it occurred to me like you you're interviewing probably for library developers, right?
As opposed to application developers.
And there's a difference there.
That's right.
Because in an application developer, the obvious thing is, well, nth element, right?
In a library developer, what I would like as an application developer is for nth element to be implemented with a reduction under the hood.
If that is faster, taking into account array size, parallelism, element size, that sort of thing.
You made a comment to that effect yourself, Bryce, where you said something about like the,
you know, the performance may vary depending on several of those things anyway. Yeah.
So one of the things I think is a good part of this question, and as you said before, I don't ask technical questions in interviews.
But if I was, and I was going to use this question.
Look at Bryce trying not to say, oh, this was a good question.
Actually, let me correct myself.
It's in a bad category of questions that I don't endorse.
But in that bad category, it's an okay question.
It's more I call into question the entire notion of how traditional tech interviews are conducted.
But that aside, if I'm looking for a library developer, I'm looking for somebody who looks at this question, comes up with multiple different answers.
And then like the really the right way to answer this question is to tell me that there is no one correct answer.
That here are the options and that there are going to be different circumstances under which these will be, you know, each one of these will be the right one.
And that like maybe the way that you would build a high performance implementation, a generic implementation of this would be to use a combination of different strategies.
Yeah. Yeah. Because, I mean, as a very specific question, top two. Great.
The generalization is top N, of course, which in some sense is nth element.
Yes.
And that's where it's like, well, top two, you know, like a pair is very easy, a very small amount of data, assuming you have a small element size.
But if you want top a thousand, like that might be a very different sort of problem that parallelization might not be amenable to. And in general, median is a problem that's annoyingly hard to parallelize, I think.
It's definitely not one of the trivial bread and butter, like a reduction, the bread and butter one.
Yeah.
So that's interesting.
I guess my first thought is how many algorithms, like the first one I can think of is, I think the std sort
has the behavior where depending on the size,
it will choose different sort algorithms.
And I think like the actual name of,
I'm not sure if it's the same across
libstd C++ and libc++,
but I think at least one of them uses introsort,
which is like a combination of merge sort
and I'm not going to get the other one wrong,
but like quicksort or something like that
where above a certain size, it chooses merge sort,
but then once it gets down below a certain size,
it then switches to another one.
And that's actually, it's something similar in Python
that they have some kind of hybrid algorithm and it's named after some individual. So do you know how many
algorithms actually in the algorithm header have behavior like that? So that's basically what we're,
you're arguing here is it should be top N that uses nth element for the cases that on average
it's good for, and then it switches to a simple reduction that just has... Hang on, hang on. I think I would argue that the top end really is just nth element.
I think really the question is, should nth element implementations special case the case
of two elements, or the case of one, two, and three elements?
And actually, it would be an interesting experiment to see what the cutoff point is, which it makes sense to special case this.
The other observation I had is I think the reduction is, you can correct me on this, but I think it's by necessity giving you back the ordered pair.
The way it would have to be implemented.
Whereas nth element, of course course doesn't order the top yep
necessarily well it depends so like in the parallel version where you have the associative
the associative and commutative operation it actually will be but i think the semantic yeah
i think the semantics of reducing the pair mean that you have to designate one as the higher and one is the lower otherwise you'd have to you yeah but why wouldn't why wouldn't the nth element give
you give you the order nth element doesn't have to order because it's not sorted it might end up
ordering a pair i don't exactly know but if you ask you ask it for top end, it's going to pin the nth element.
Everything above it is going to belong above it, but not necessarily in the right order.
So the API for nth element is three iterators, the beginning of the range, the element that you want to get at the right spot and then the end of the range and the only thing
you're guaranteed is that everything below that point your nth element point is less than or
whatever if you're using the default comparator and then everything above it is greater and then
you're guaranteed that at that point is where it would show up in a sort of if it were ordered that
would be a partial sort, right?
Exactly.
And I think Kate Gregory actually at one point,
she in one of her talks says that partial sort copy should be called top end.
And then in a talk that I later gave,
I pointed out that actually partial sort copy should be called top end sorted and nth element copy should be called top end
because the partial sort one is the one that guarantees
that your your
top n are going to be sorted but but if the if the k if you're using nth element to do top two
doesn't it happen to be the case that that it'll be in order just for that case of top two yes
oh yeah actually that's a good point but but i guess the point is it hasn't done any extra
comparisons to make that the case right i just i just meant that like yeah in the case of two
it like it does sort of happen to be um yeah i mean maybe maybe case i don't i don't think that
we need a top in our element i think that the thatend element. I think that really it's just you either use
nth element or you use partial sort copy, depending on whether you want them ordered or not.
But this question of special casing for some of these lower count cases is an interesting one.
A lot of the optimizations that we end up doing in the CUB library, which is our library of
algorithmic primitives, of parallel algorithmic primitives for CUDA.
A lot of the big performance wins that we get is because like we identify a special case that, you know, happens to be easy to optimize and also like a lot of people just
fall into it.
So almost every algorithm has some decision tree of, you know, if it's this case, you do this thing and otherwise you do this thing.
The element is interesting for another reason.
It's I have long considered it the sort of most unloved algorithm or most overlooked algorithm.
And as you know, in I think it was 17, the complexity boundaries of sort were tightened.
It used to say prior to seven, I think it's 17, right?
Maybe 14 or even 11.
Maybe it was 11.
But it used to say n log n on average.
And now the standard says sort is n log n, period.
Right?
I think we cleaned that up in 17 because we cleaned up a lot of the complexities when we added the parallel algorithms and we had to figure out the complexities for them, but I could be wrong.
The same cleanup was not done for nth element.
Nth element is linear on average. But I think essentially the same work, which is
the work of Dave Musser, he did a lot of work on this back in the 2000s, I think it was,
that led to intro sort that Connor was talking about and the possibility of making sort n log n on average i think that same underlying
that same underlying work translates to making nth element linear period if we wanted that
someone's got to write a paper listener out there if you're interested in getting involved in
standard committee work yeah because intro select is the algorithm counterpart to for enthousiasm for intro sort right right yeah this ends part one
of what will probably be a four-part series on algorithms thanks for listening we hope you enjoy
and have a great day