Algorithms + Data Structures = Programs - Episode 75: C++ Algorithms with Ben Deane (Part 1)

Episode Date: April 29, 2022

In this episode, Bryce and Conor talk to Ben Deane about C++ Algorithms!TwitterADSP: The PodcastConor HoekstraBryce Adelstein LelbachAbout the Guest:For Ben Deane, C++ wasn’t even among the first 10... languages that he learned on his programming journey, but it’s been the one that has paid the bills for the last 20-odd years. He spent most of that time in the games industry; many of the games he worked on used to be fondly remembered but now he’s accepted that they are probably mostly forgotten. These days he works in the finance industry writing high-frequency trading platforms in the most modern C++ that compilers can support.In his spare time he watches a lot of YouTube’s educational sector, practices the Japanese art of tsundoku, reads about the history of programming, avoids doing DIY, and surprises his wife by waking in the middle of the night yelling, “of course, it’s a monad!” before going back to sleep and dreaming of algorithms.Show NotesDate Recorded: 2022-04-14Date Released: 2022-04-22BublyspindriftLa CroixSAPSUCKERBen Deane Cppcast Episode: Programming History, JIT Compilations and Generic AlgorithmsC++Now 2018: Ben Deane “Easy to Use, Hard to Misuse: Declarative Style in C++”C++Now 2019: Ben Deane “Identifying Monoids: Exploiting Compositional Structure in Code”CppCon 2016: Ben Deane “std::accumulate: Exploring an Algorithmic Empire”CppCon 2016: Ben Deane “Using Types Effectively”ADSP Episode 72: C++ Algorithm Family Feud!C++ std::nth_elementC++ std::accumulateC++ std::reduceC++ std::sortIntrosortMerge sortQuicksortPython’s TimsortC++ std::atomicC++ std::partial_sort_copyC++ std::partial_sortIntro Song InfoMiss You by Sarah Jansen https://soundcloud.com/sarahjansenmusicCreative Commons — Attribution 3.0 Unported — CC BY 3.0Free Download / Stream: http://bit.ly/l-miss-youMusic promoted by Audio Library https://youtu.be/iYYxnasvfx8

Transcript
Discussion (0)
Starting point is 00:00:00 Ben, we're talking about Ben. Prolific speaker on the C++ speaker track, algorithm expert, polyglot. Anthelmint is interesting for another reason. I have long considered it the sort of most, recorded on April 19th, 2022. My name is Connor, and today with my co-host Bryce, we talk to algorithm expert Ben Dean about C++ algorithms. Well, let's do this right off the bat. We're hopping, well, before we hop into things, we'll get... I know, I have a feeling I know what Connor's about to do. They're never going to sponsor us, Connor. They're never going to sponsor us.
Starting point is 00:00:55 Did you hear Netflix down 20% after hours today so maybe maybe maybe nvidia will replace netflix as the n in uh in fame i feel like that acronym is now increasingly irrelevant with with the players in the market you know i feel like i throw a lot of shade at netflix on this podcast for absolutely no reason and if anybody who works who works at netflix is listening to this podcast for absolutely no reason. If anybody who works at Netflix is listening to this podcast, I really have no problem with you. It sounds like a great company, but I will continue to throw shade. All right, Connor, do your thing. Our thing is that we're terrible.
Starting point is 00:01:38 Well, we're good. I think we're decent podcast co-hosts, but we do a terrible job of introducing our guests as demonstrated by our last recording with guests when we had patrice and jason i thought this was a bubbly i thought you were gonna go do a bubbly oh no we well we've got the brand new bubbly flavor it's well so of course how is bubbly because uh what's the name of the other one? I can't remember. Perrier? No, no.
Starting point is 00:02:07 Uh-huh. No. They even, they spell bubbly wrong. Oh gosh. I had one today and I can't even remember the name. San Pellegrino? No, it's a flavor. It's, it's.
Starting point is 00:02:17 La Croix? La Croix, that's the one. Oh, see, that's big down in the States. It's not really that big up here in Canada. But I've just discovered Spindrift, which is, which is like La Croix, but nice. I have to say, they have compromised. It's not zero calories.
Starting point is 00:02:31 It's a whole five calories a can. La Croix, if you want to sponsor us, we will make Ben come back and apologize. I'm not a big La Croix drinker, but I do drink La Croix. But I've recently discovered Spindrift. We tolerate it. We tolerate it. Everyone tolerates La Croix drinker, but I do drink LaCroix, but I've recently discovered Spindrift. We tolerate it. We tolerate it. Everybody tolerates LaCroix. Yeah, exactly.
Starting point is 00:02:47 What's the difference? See, well, I just want to talk. This is meta because we are literally proving what we're bad at. We're bad at introducing our guests. We got two seconds into introducing Ben, and then Bryce was like, weren't you going to talk about bubbly? And then whatever percentage of attention deficit disorder I have, I was like, no, but let to talk about bubbly? And then whatever percentage of like attention deficit disorder I have, I was like, no, but let me tell you. And now we're talking anyways, let's finish this topic.
Starting point is 00:03:10 Spindrift, is that what it's called? Spindrift? Spindrift is like La Croix, except they've compromised, as I say, very, very slightly on the calories for much nicer flavor, in my opinion. Interesting. You know, my go-to drink is vitamin water zero i literally have a fridge full of vitamin water zero at all times it's like i don't drink water i just isn't isn't that like the crypto isn't that like the crypto of drinks it's just a big scam it's i'm
Starting point is 00:03:38 it tastes too good to have zero calories i'm sure something in it is going to kill me but i am addicted it's fair enough literally my my mother was here the last week visiting me and she's like i want a glass of water there's no water in this apartment can i just have some regular water wait don't you isn't is the tap water in new york not drinkable it, but I don't know. All of the women in my life want cold water, and my apartment does not have it. All right. Well, any of the companies, aforementioned companies,
Starting point is 00:04:18 including Sapsucker, which might be the spin drift of Canada, which I think it has like 20 calories or 15 calories a can, but's like it takes like recycled i don't know syrup or something and um i don't know honestly it's got it's something it comes something from a tree something from a tree's in it i don't want syrup that somebody else has already had i i don't think that's i don't think that's what it is but it's like I think there's a process of making syrup, and then there's some like, you know, like in an oil, what do they call that stuff in an oil plant or oil refinery, and then there's like a pool where all the bad stuff goes.
Starting point is 00:04:54 It's called like a something off. Oh, that makes more sense. That makes more sense. I think Ben actually knew the name of it. What is it called? You mean like a sump? Like that just collects the runoff kind of? Yeah, yeah, runoff. That's what i was thinking of that's the word um my my older sister as an engineer would have definitely uh had it right away and shannon she's um yeah she's in calgary
Starting point is 00:05:16 now left left me all alone in toronto but uh i think it's basically the runoff of like maple syrup which is still good for stuff but not syrup and so this drink company could be the spindrift equivalent speaking of syrup i was going to say the current syrup i have in my fridge i just made pancakes and i noticed on the bottle and this is going to offend everyone right on the bottle it says product of usa and canada oh that is that is wrong that is wrong i there is one good why there is one good maple tree product from canada and that's maple butter specifically the type of maple butter that jf's brother brings me every time i see him and yes this is a coded way of me telling J.F. to tell his brother to continue to bring me maple butter.
Starting point is 00:06:09 You just said like two episodes ago that he doesn't have time to listen to this. You just ripped on him for like 10 minutes. Somebody else will listen to this, like Connor or somebody, and will then tell J.F., and then J.F. will tell his brother. I got a system.
Starting point is 00:06:22 But no, real maple syrup is made in made in vermont it is it is a product of vermont america all right 10 minutes later today our guest i think other than sean parent you are the second ever returning guest to the podcast unless if my memory's failing me we've definitely and for the listener we cut every time we record with someone into like two or three episodes so if you're thinking what are you we record with someone into like two or three episodes. So if you're thinking, what are you talking about? Everyone's been on two or three times. False.
Starting point is 00:06:49 Everyone's been on once except for Sean Parent. And then we cut that up into 30 to 35, 40 minute episodes. So Ben, the last time you were on was during the virtual C++ Now of 2020, I think. No, it was 2021. I was gonna say this almost this time last year this podcast did not exist at the time of yeah c++ now 2020 would have happened you're on your 74th week or whatever it is yeah let's well let's be honest we've all lost track of time and uh in in this virtual new world that we live in um but wait bryce don't say anything we were we were like we were we were almost there we were almost introducing him so if you if you're in the c++ community you've probably heard of ben uh he's a prolific talk
Starting point is 00:07:37 speaker winner of many best speaker awards at c++ now one of the main talk speaker i'm a talk speaker well you know honestly folks i should read a talk speaker. Well, you know, honestly, folks, I should read a sentence that I wrote the other day on Discord. There were so many skipped words and grammar mistakes. Anyways, so he's a prolific speaker with dudes. What's the actual? Just speaker? I think that's the word you're asking for.
Starting point is 00:08:05 I've got to be honest. I just got back from running 18K very quickly at Run Club. So I'm a little bit, and I'm in like a 2,500 calorie deficit right now. Did you shower? I did, yeah. And I showered right before. Anyways, Ben, we're talking about Ben. Prolific speaker on the C++ speaker track.
Starting point is 00:08:26 Algorithm expert, polyglot. If you've listened to his CppCast episode, I think you started off by going through the plethora of languages that you have studied at one point or used. Likely. Jason always asks about history and that sort of thing. So, yeah. Yes, Ben is a fellow, uh, program language, historian, uh, enthusiast. What do you, what is it? Is it a book of file or, uh, some, someone you have a library of file, bibliofile. You are really bad at words today, Connor. This is going to be a rough takeoff. And the reason Ben's on today and Ben's one of my favorite speakers. What let's list off a couple of your talks. What's the, I'm not even gonna remember any of the names of them, declarative.
Starting point is 00:09:06 Yeah, one was called easy to use, hard to misuse declarative style in C++. Yes, fantastic. Ben, what's your favorite talk that you've ever given? That's probably a hard question to answer. That's a hard one, but I suspect like several speakers, I feel like I've given the same talk in different guises several times
Starting point is 00:09:24 or variations on a theme. There are two styles of talks that I really think about. The easy style, which is, here's a thing I did, here's the code, here's a C++ feature, here's how to think about it, explain it, et cetera, right? The easy style of talk to me is like, you have code, you explain it. The harder style, the more, but the more rewarding style is the sort of more philosophical style. Can be illustrated with code, of course, but that style is more of the kind of declarative style talk.
Starting point is 00:10:00 So I like both styles. Like I like the philosophical style, but the easier one to put together is definitely like, here's the code, here's how it works style. Easy to use. Yeah, easy to use, hard to misuse. You had the other one too recently. I think it was at CppCon, which was the,
Starting point is 00:10:16 or no, that was at C++ Now as well. Identifying monoids. Yeah. And also the talk that I don't think ever gets mentioned enough. And it definitely, well, it was at Cpp con. So it didn't quote unquote, win any awards, which is maybe why some of the other ones get more buzz. Even though obviously, they're, they're really good is the stood accumulate an algorithm empire. I think that talk is fantastic, because of sort of just the exercise of I think you use the word sort of perversely, like you're using the algorithms in ways they shouldn't be used. Yes. And bending them as an exercise to really see like what they're capable of.
Starting point is 00:10:51 And a bunch of the things you point out is like, don't do this stuff. But the exercise of doing this with the, you know, where you give a little summary of the talk. Well, that talk, like the talk that Jason and I did, Constance for all the things, it's kind of interesting to look back and see that that talk like the talk that jason and i did context for all the things it's kind of interesting to look back and see that that talk is of its time right and to look back and be like wow that was like you say that was just an exercise in making accumulate implement different algorithms and it was a sort of a thought experiment if you like i mean it was an actual code experiment but in a sense it's a
Starting point is 00:11:25 thought experiment but interesting but interesting to look back at and see you know it's kind of of its time now and i look like all of us i look back at my own code and i think you know what was i thinking i shouldn't have done that like there are better ways to do that in a sense for that talk it's fine because the whole thing is predicated on just this is really interesting and a good way to think about it but don't ever write this code sort of thing the key quote from that talk that i've sort of clipped into a couple of my talks is that at a certain point you go through the exercise of seeing how many algorithms you can actually implement using std accumulate or some the sort of platonic equivalent of still accumulate yeah yeah and yes so 77 of
Starting point is 00:12:06 the 90 of them right right and i think we had 90 algorithms in 14 something like that connor what just like yeah you just cut out one sec one sec connor is experiencing undefined behavior right now yes come back to us connor are you good can people can people hear me yeah we could hear you throughout the entire ordeal i'm guessing the listener can also hear you i in the middle of me saying the 77 of the 90 algorithms are implementable in terms of accumulate this woman started talking in my headphones saying your headphones are out of battery please recharge turning off okay see but i feel i feel like that was probably not the first warning that you got that your headphones were low on battery because that's true that's true that lady said something
Starting point is 00:12:58 earlier today and i ignored her um yeah you should have listened all right ben back to you you were going to say one of your talks one of my talks that had quite a bit of impact and people still seem to remember, or at least, let's say, folks outside of the rarefied bubble of conference speakers that I talk to sort of in the real world and or in the games industry as I was in, they really enjoyed the um using types effectively talk that i gave
Starting point is 00:13:28 i think it was cppcon 16 is that the one where you have the you know how many types i have a little game yeah a little type theory game um talking about sites uh types as characterized by the cardinality of the set of values they hold are you are you coming to c++ now this year ben yep yes i'll be there that's in just over that's in less than two weeks now right yeah yeah by the time this airs what day is it it's the 19th in fact one of the reasons that we're recording this is because i will be at c++ now and or the one of the reasons we're recording this now is because i'm going to be gone for the first two weeks of may and uh and connor needs all the bryce gold to produce to produce the podcast maybe i'll just end up going solo we've never we've never done an episode where we
Starting point is 00:14:19 no no that's not true we had the one one episode, right? At C++ Now last year. Right. There was one episode without me. Connor and Tony and I. Yeah. It was Ben and I for the first bit, and then Canadians have a habit of showing up halfway into the recordings. That's not foreshadowing.
Starting point is 00:14:40 There's no random Canadian showing up today. So how this goes is I listen to your podcast probably every other week. And I catch up. And then invariably, it's like on a Saturday morning when I have some time. And then invariably, I DM Connor on Twitter. I only use Twitter for DMs these days. But I DM Connor and I say, I have opinions. This is true.
Starting point is 00:15:03 And this is why Ben, so Ben, so this is uh gonna be i think i'm not sure if it's a response to multiple podcasts but ben dm'd me after the c++ algorithm family feud interview question uh of which oh i should i should call out we've got we've had like two or three people who have coded yeah two people one of, two people. One of them was from the C++ North Denver meetup, because I believe Tyler Oh, right. What was it? I want to get the name correct here. Tyler Weaver
Starting point is 00:15:33 made both a YouTube video and sort of a Godbolt and a QuickBench and was basically saying that it looked like the max element where removing an element and then just doing another max element was the fastest and transform reduce was like the slowest i wasn't that ridiculous max element thing my idea yeah but we said it was a bad idea though hey hey hey hey i sir i sir i'm an architect now. My job is not to write code or to have good ideas.
Starting point is 00:16:08 My job is just to come up with ideas, and somebody else figures out if they're good or not. It's, yeah. So we'll get back to this. But anyways, Tyler Weaver had a YouTube video, and there was another individual. I'll grab his name at some point. So Ben DMed me after listening to this and said,
Starting point is 00:16:25 we need to chat. And I said, do you want to come on the podcast and he said yeah sure it doesn't matter to me as long as we get to chat and so we need to have we need to talk we need to have conversation well this is it like I have opinions right yeah who is it that messed up the worst was it me or was it Bryce no one messed up and i actually really enjoyed the format where like the game show format where you're like you're like okay bryce now name name the answer oh but name the series of answers in reverse order so when you pose the question which was uh find the top two right or the or the minimum two or the maximum two, same thing, right, in an unsorted array of integers, let's say. My immediate thought was nth element.
Starting point is 00:17:12 That's what nth element does. But then, of course, as is your won't, you solved it with some reduction. And that's fine. Of course. This is the reduction yeah yeah yeah yeah so so it kind of is i still believe the correct name for that algorithm is top two or top n perhaps i mean i'm not claiming the nth element is particularly a good name for that application so it occurred to me like you you're interviewing probably for library developers, right?
Starting point is 00:17:47 As opposed to application developers. And there's a difference there. That's right. Because in an application developer, the obvious thing is, well, nth element, right? In a library developer, what I would like as an application developer is for nth element to be implemented with a reduction under the hood. If that is faster, taking into account array size, parallelism, element size, that sort of thing. You made a comment to that effect yourself, Bryce, where you said something about like the, you know, the performance may vary depending on several of those things anyway. Yeah.
Starting point is 00:18:25 So one of the things I think is a good part of this question, and as you said before, I don't ask technical questions in interviews. But if I was, and I was going to use this question. Look at Bryce trying not to say, oh, this was a good question. Actually, let me correct myself. It's in a bad category of questions that I don't endorse. But in that bad category, it's an okay question. It's more I call into question the entire notion of how traditional tech interviews are conducted. But that aside, if I'm looking for a library developer, I'm looking for somebody who looks at this question, comes up with multiple different answers.
Starting point is 00:19:10 And then like the really the right way to answer this question is to tell me that there is no one correct answer. That here are the options and that there are going to be different circumstances under which these will be, you know, each one of these will be the right one. And that like maybe the way that you would build a high performance implementation, a generic implementation of this would be to use a combination of different strategies. Yeah. Yeah. Because, I mean, as a very specific question, top two. Great. The generalization is top N, of course, which in some sense is nth element. Yes. And that's where it's like, well, top two, you know, like a pair is very easy, a very small amount of data, assuming you have a small element size. But if you want top a thousand, like that might be a very different sort of problem that parallelization might not be amenable to. And in general, median is a problem that's annoyingly hard to parallelize, I think.
Starting point is 00:20:10 It's definitely not one of the trivial bread and butter, like a reduction, the bread and butter one. Yeah. So that's interesting. I guess my first thought is how many algorithms, like the first one I can think of is, I think the std sort has the behavior where depending on the size, it will choose different sort algorithms. And I think like the actual name of, I'm not sure if it's the same across
Starting point is 00:20:40 libstd C++ and libc++, but I think at least one of them uses introsort, which is like a combination of merge sort and I'm not going to get the other one wrong, but like quicksort or something like that where above a certain size, it chooses merge sort, but then once it gets down below a certain size, it then switches to another one.
Starting point is 00:21:00 And that's actually, it's something similar in Python that they have some kind of hybrid algorithm and it's named after some individual. So do you know how many algorithms actually in the algorithm header have behavior like that? So that's basically what we're, you're arguing here is it should be top N that uses nth element for the cases that on average it's good for, and then it switches to a simple reduction that just has... Hang on, hang on. I think I would argue that the top end really is just nth element. I think really the question is, should nth element implementations special case the case of two elements, or the case of one, two, and three elements? And actually, it would be an interesting experiment to see what the cutoff point is, which it makes sense to special case this.
Starting point is 00:21:53 The other observation I had is I think the reduction is, you can correct me on this, but I think it's by necessity giving you back the ordered pair. The way it would have to be implemented. Whereas nth element, of course course doesn't order the top yep necessarily well it depends so like in the parallel version where you have the associative the associative and commutative operation it actually will be but i think the semantic yeah i think the semantics of reducing the pair mean that you have to designate one as the higher and one is the lower otherwise you'd have to you yeah but why wouldn't why wouldn't the nth element give you give you the order nth element doesn't have to order because it's not sorted it might end up ordering a pair i don't exactly know but if you ask you ask it for top end, it's going to pin the nth element.
Starting point is 00:22:48 Everything above it is going to belong above it, but not necessarily in the right order. So the API for nth element is three iterators, the beginning of the range, the element that you want to get at the right spot and then the end of the range and the only thing you're guaranteed is that everything below that point your nth element point is less than or whatever if you're using the default comparator and then everything above it is greater and then you're guaranteed that at that point is where it would show up in a sort of if it were ordered that would be a partial sort, right? Exactly. And I think Kate Gregory actually at one point,
Starting point is 00:23:32 she in one of her talks says that partial sort copy should be called top end. And then in a talk that I later gave, I pointed out that actually partial sort copy should be called top end sorted and nth element copy should be called top end because the partial sort one is the one that guarantees that your your top n are going to be sorted but but if the if the k if you're using nth element to do top two doesn't it happen to be the case that that it'll be in order just for that case of top two yes oh yeah actually that's a good point but but i guess the point is it hasn't done any extra
Starting point is 00:24:06 comparisons to make that the case right i just i just meant that like yeah in the case of two it like it does sort of happen to be um yeah i mean maybe maybe case i don't i don't think that we need a top in our element i think that the thatend element. I think that really it's just you either use nth element or you use partial sort copy, depending on whether you want them ordered or not. But this question of special casing for some of these lower count cases is an interesting one. A lot of the optimizations that we end up doing in the CUB library, which is our library of algorithmic primitives, of parallel algorithmic primitives for CUDA. A lot of the big performance wins that we get is because like we identify a special case that, you know, happens to be easy to optimize and also like a lot of people just
Starting point is 00:25:17 fall into it. So almost every algorithm has some decision tree of, you know, if it's this case, you do this thing and otherwise you do this thing. The element is interesting for another reason. It's I have long considered it the sort of most unloved algorithm or most overlooked algorithm. And as you know, in I think it was 17, the complexity boundaries of sort were tightened. It used to say prior to seven, I think it's 17, right? Maybe 14 or even 11. Maybe it was 11.
Starting point is 00:25:55 But it used to say n log n on average. And now the standard says sort is n log n, period. Right? I think we cleaned that up in 17 because we cleaned up a lot of the complexities when we added the parallel algorithms and we had to figure out the complexities for them, but I could be wrong. The same cleanup was not done for nth element. Nth element is linear on average. But I think essentially the same work, which is the work of Dave Musser, he did a lot of work on this back in the 2000s, I think it was, that led to intro sort that Connor was talking about and the possibility of making sort n log n on average i think that same underlying
Starting point is 00:26:47 that same underlying work translates to making nth element linear period if we wanted that someone's got to write a paper listener out there if you're interested in getting involved in standard committee work yeah because intro select is the algorithm counterpart to for enthousiasm for intro sort right right yeah this ends part one of what will probably be a four-part series on algorithms thanks for listening we hope you enjoy and have a great day

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.