Algorithms + Data Structures = Programs - Episode 135: 🇸🇮 Slovenia 🇸🇮 2023 Road Trip!

Episode Date: June 23, 2023

In this episode, Conor and Bryce record live from Austria while driving and chat about algorithms including scans, unique and more!Link to Episode 135 on WebsiteDiscuss this episode, leave a comment, ...or ask a question (on GitHub)TwitterADSP: The PodcastConor HoekstraBryce Adelstein LelbachShow NotesDate Recorded: 2023-06-18Date Released: 2023-06-23Lambda Days 2023 WebsiteItalian C++KX Con 2023: Algorithms in q - Conor HoekstraSkyline Problem in Top 10scan in BQNdistinct in qdedup in Ruststd::unique in C++C++Now 2019: Conor Hoekstra “Algorithm Intuition”Rainwater Problem in Top 10C++20 std::views::filterC++20 std::views::takeC++20 std::views::dropIntro Song InfoMiss You by Sarah Jansen https://soundcloud.com/sarahjansenmusicCreative Commons — Attribution 3.0 Unported — CC BY 3.0Free Download / Stream: http://bit.ly/l-miss-youMusic promoted by Audio Library https://youtu.be/iYYxnasvfx8

Transcript
Discussion (0)
Starting point is 00:00:00 I just realized we're probably making history right now. We might be the first ever podcast to be recorded during a road trip while one person's behind the wheel, one person's in the passenger seat. Definitely it's got to be the first tech podcast that's ever done something like this. Welcome to ADSP, the podcast episode 135, recorded on June 18th, 2023. My name is Connor, and today with my co-host Bryce, we record live from in the car on the 2023 Slovenia road trip. In today's episode, we're recording from Austria. See, we should have been recording earlier. So what's another problem that...
Starting point is 00:00:59 Wait, is that the start of the episode? Yeah, I mean... Put the mic in front of me. We're here, folks, live in austria is that too loud on the game welcome to the 2023 slovenia road trip we're live on the road i'm behind the wheel we got the ac blasting that's actually not true it's not blasting where are we bryce we're in we are in western austria we're not entirely sure where we are right now because we have no internet signal at the moment. What do you mean? We have no internet? We currently have no internet now.
Starting point is 00:01:33 And I think neither of us downloaded offline maps on Google Maps. So we better not hit the little X on your phone. What's this map here then? That's because we started the navigation. First of all, when you're talking, here's Bryce holding the mic in front of my mouth. Man, I'm loud. Man, if we apologize if the audio quality is terrible, Bryce doesn't know what he's doing. I'm the professional where it comes to the hardware.
Starting point is 00:02:02 That is true. But we're somewhere in Austria. We are making our way to the city of Bled where we are going to swim in Lake Bled. The hills are alive with the sound of music.
Starting point is 00:02:18 That's where it took place in Austria, right? That is correct. That is correct. We should do a little sing-along. My mother, here's a little anecdote for the listeners. I have three sisters, and my mother thought we were the Von Trapp family when we were kids,
Starting point is 00:02:36 and so she made us memorize most of the songs to the sound of music, and she would make us sometimes after dinner perform the, you know, Auf Wiedersehen, Goodbye, whatever that is. I apologize for the folks, is that German? I think it's German and you know, so like two of my younger sisters they would do, we flit, we flat say goodbye and anyways
Starting point is 00:02:59 I was I can't remember which role I played, but anyways sound of music, fantastic we love you Austria, but we, Sound of Music, fantastic. We love you, Austria, but we're only driving through you to get to Slovenia. Slovenia, here we come. We got in, Connor got in in the morning yesterday to Austria. I got in around 5 p.m. and didn't get to the hotel until like 6 or 7 because Connor's luggage got lost.
Starting point is 00:03:28 Oh, yeah. Well you you explain what happened listen folks we've been in europe here for two and a half weeks now we started off in poland we then went to italy lambda days was fantastic uh italian c++ was pretty good and uh and then we went to zurich we went to sw to Switzerland. Oh my God, Switzerland. So beautiful. Uh, the trip has been basically amazing. It's been perfect until, until on route to Vienna, Austria. Uh, the, the wonderful people of Australian, Australian, Austrian airlines, uh, lost my luggage and, uh, I was pretty Zen about it because what is getting upset about something like that do? However, that was all my clothes. Long story short, Bryce ended up with my luggage and brought it to the hotel we were staying at in Vienna. And, uh, I am wearing now my good vibes
Starting point is 00:04:21 t-shirt that I bought that Austria airlines will be paying for because I thought I would need to buy clothes. And the Good Vibes t-shirt is our good luck t-shirt. Good vibes, good luck. You know, it's a, I don't know. I don't know what else to say. What do you got to add to the story, Bryce? We should clarify that when Connor says that I ended up with this luggage, what he means is that my flight arrived later and I picked it up from the lost luggage. Not that the airline somehow gave his luggage to me.
Starting point is 00:04:52 But speaking of Australia. We weren't speaking. That was a slip of the tongue. Yeah. Okay. When my flight from Varna to here, we taxiing on the runway, and I look next to me, and there is a Virgin Australia plane in Bulgaria,
Starting point is 00:05:11 and not like a long-distance plane, like a short-distance plane, like an aerobody plane. And I'm like, how did this come to be here? Just to check, you hit the record button, right? I did hit the record button. We are recording. So we had a fascinating conversation about MaxScan and a few other related algorithmic things.
Starting point is 00:05:36 But we'd already started driving and we hadn't taken the microphone out, so we didn't record any of it but uh i'm i'm a little fascinated with uh the mac scan and some of the properties of scans of that type and so i think we're going to talk a little bit more about it so the the thing that fascinated me was right uh should we tell the problem first well kind of you've already kind of ruined the solution a little bit uh but you know what we're not going to spoil it well we did spoil it, but so this is how it started. I mean, we were talking about a bunch of stuff, but there was a problem recently.
Starting point is 00:06:11 Who has seen it? I believe I talk about it in my KXCon talk, so if that's online and you've seen it, you will know the solution to this problem, but the problem I call the skyline problem, and it's pretty simple you're given a vector of integers that represent heights of buildings in a skyline and the question is if you're standing on the left hand side of these buildings what is the distinct or unique
Starting point is 00:06:39 number of buildings that you can see um and so for example example, a test case would be, say you have five integers, two, one, three, two, five. And the answer, given these five buildings, is that you can see three of them. You can see the first two, and then the second building that has a height of one you can't see because it's hidden behind the first building of two.
Starting point is 00:07:02 You can see the third building with a height of three. The fourth building with a height of two you also can't see because it's blocked behind the first building of two. You can see the third building with a height of three. The fourth building with a height of two, you also can't see because it's blocked by the middle building. And then you can see the last building that has a height of five. So basically any building that has a shorter height than any building to the left of it, you can't see. And that's the question. We'll throw it back to Bryce to say what he found. And we'll pause. We'll talk about how beautiful the view is oh it is gorgeous here uh we are we are driving through the uh the hills and mountains of austria and uh you know it's pretty flat in vienna but um and we were on the highway for a while but we we
Starting point is 00:07:37 now are off the highway we're in sort of like rural Austria, driving west. And then, well, now I think we just took the turn heading south, south to sort of the western part of Slovenia where the city of Bled is. And, yeah, it's absolutely gorgeous. It's, you know, what you think of when you think of Austria, the rolling hills, the sound of music-esque stuff, and there's all these beautiful old buildings that we keep seeing and churches and stuff like that. I just realized we're probably making history right now.
Starting point is 00:08:23 We might be the first ever podcast to be recorded during a road trip while one person's behind the wheel, one person's in the passenger seat, definitely it's got to be the first tech podcast that's ever done something like this. I will assure you, this is not the first road trip podcast. I bet there are lots of road trip podcasts. Are we technical for a technology podcast? We got to be the first. I, perhaps, perhaps I will, I will humor you. But you know this kind of kind of the area we're in right now kind of reminds me of the pacific northwest just because the really really tall
Starting point is 00:08:52 trees um yeah it's absolutely gorgeous all right so people people have had a few minutes to think about we're whipping around these corners a little too fast. We've got to slow down here. We are definitely going to get some speeding tickets. No, I don't think so. Look, it's got a little happy face. No, it's got a sad face. There was a little, like, check your speed sign that's like, and it read our speed
Starting point is 00:09:19 and it had a sad face. Yeah, but people have had a couple of minutes to think about the solution, how they would solve it. Alright, now you can had a couple of minutes to think about the solution, how they would solve it. All right, now you can say what you'd like to say, Bryce. Well, the solution, which my first thought was
Starting point is 00:09:33 that this was some form of adjacent reduce, but Connor talked me out of that because I thought that maybe you just needed to have sort of two pieces of information, like the height of the thing and then how many buildings are in the information that you've reduced. But you actually need more than that. You need sort of like some compression of all of the information on all of the heights of the buildings. So that doesn't really work. But what the decision that Connor put forth was that you do a max scan,
Starting point is 00:10:06 so a scan with, you know, stood max if we're in C++. And then you do a, what did you call it, the deduplicate. So, yeah, in the Q language, this is known as distinct, and I think some functional languages call it distinct as well. In Rust, it's known as deduplicate. We don't actually have this function in C++, at least in the standard algorithms. We have a similar version of this called std unique,
Starting point is 00:10:36 but unique, if you are familiar with, it removes adjacent elements that are equal, whereas deduplicate or distinct removes any uh repeated values regardless of where they show up so in q after you do your max scan you make a call to distinct which basically in our initial example of two one three two five that's going to turn that into two two three three five and then you want to basically remove the duplicate values. So you get rid of one of the twos, one of the threes, and you're left with 235.
Starting point is 00:11:09 And then you just do a count of that, yeah. Yeah, in C++ that's a std size, and Rust it would be, I believe, count or len. But yeah, it's either called count, length, size, something like that in your various programming languages. But the realization that Connor had, so that deduplicate operation, it does not assume that the input sequence is sorted. Correct. And so that means that it's a little bit more expensive.
Starting point is 00:11:44 So how do you implement that? So, yeah, I was doing this in a sort of toy thing I was working on, and my implementation of distinct was just to allocate some small hash set, memory for a small hash set, and then you just keep track of all the values you've seen. So that's the naive implementation of how you remove duplicates from a list. But after thinking about, you know about the solution to this problem, I realized that one of the properties of a max scan is that it's sorted.
Starting point is 00:12:13 And if you know you have a sorted sequence, you don't actually need to make a call to an algorithm that is doing deduplication. You actually can make use of the more efficient algorithm that exists in C++, which is kind of the overarching point algorithm that exists in C++ that's unique, which is kind of the overarching point of this conversation that we had earlier, is that there are certain operations such as sort or max scan that leave you with a sorted sequence, which then potentially you can dispatch to a more efficient algorithm. And in this example, it's the difference between making a call to something like distinct versus making something to that is called uh like unique in c++ yeah
Starting point is 00:12:49 yeah and so i i uh i think that this category of scans that uh leave you with a monotonically increasing sequence for lack of a better term or monotonically decreasing if it's like a min scan, but that leave you with a sequence that is sorted, I think that's very interesting. And I can't recall a specific problem in the past where we've used one of these scans, but I suspect we probably have. But I want to learn more about this type of scan and whether there's other interesting use cases or properties of this. Yeah, can you think of another problem that we've talked about where we've solved it with a max scan or where we've solved it with something with a scan that leaves it in a monotonically increasing state? So off the top of my head, we mentioned this earlier.
Starting point is 00:13:58 This one makes use of two max scans, but it's not followed by a reduction or like a call to unique or something like that. So there's not a huge thing to be taken advantage of there. But for folks that have seen my C++ Now 2019 talk, I covered a problem that I call rainwater. And both Skyline and rainwater are a part of my top 10 GitHub repository. Man, it is so pretty where we are right now. It is absolutely gorgeous.
Starting point is 00:14:27 I was just thinking that. We just passed by this little, we're driving next to this little stream in this like valley in between these beautiful mountains. We gotta take a photo so that we can post this on the episode. I was just about to do that. This is slightly less iconic than it was a moment ago. But this is looking like it's going to return to being iconic and idyllic,
Starting point is 00:14:47 maybe is the word I'm looking for. But yeah, C++ Now, 2019, Rainwater, Skyline, both are on my top 10 GitHub repo. And that problem, it's very similar to the Skyline problem. You're given a skyline once again, except in this problem statement, they call it like a mountainous range. And you have to calculate
Starting point is 00:15:10 if it were to rain on this mountainous range, you know, how much water would be trapped in these kind of pockets formed between these, you know, integer heights of varying heights. And here you do a max scan from the left, a max scan from the right, and then you do what's now called a zip transform in C++, where you take the difference between those two.
Starting point is 00:15:31 So that's an example of when you use a max scan. Another example is, we were talking about this, when you're doing string trimming, if you want to get rid of like the leading zeros or sorry leading spaces at the beginning or end of a string that technically I'm going to stop talking so I don't there's a guy on a motorcycle and Connor's attempting to not hit the guy not hit the guy we've been successful been successful so far stay tuned folks uh you know you'll be you'll be hearing this episode I believe what was it yesterday the 16 You know, you'll be hearing this episode, I believe. What was it yesterday? The 16th.
Starting point is 00:16:07 So you're hearing this either on, like, June 23rd or 24th, depending on how recently or quickly you listen to the episode. And also whether we survive. Yeah, that's true. I guess, though, if we've hit someone, we've hit them by now. By the time you're listening to it, you might not know about it. But we've crashed or hit someone, we've hit them by now. By the time you're listening to it, you might not know about it, but we've crashed or hit someone. Because at this point, you're already back.
Starting point is 00:16:30 Are you stateside by Friday? Yeah, I'll be home by Thursday. You won't be there. You're just staying here until the next conference. Yeah, I'll be Ooh, look at that Lamborghini. Lamborghini in the hills of... Holy shit. A lot of fancy cars just passed by us.
Starting point is 00:16:45 Man, we got to go find out who they were. Yeah. They're probably going to the Sound of Music Museum. Oh, it's definitely got to be a thing. But yes, removing the leading zeros off of strings and stuff like that that basically is a min scan after you transform your string into a sequence or not necessarily a sequence you basically create a sequence of trues and falses aka ones and zeros where the ones correspond to if it's a space or not and then you can do a what's typically known as a logical and scan but uh as we were
Starting point is 00:17:26 talking about earlier in array languages um some of them basically extend the logical and and or to min and max because logical and and or are just the min and max in the space of zeros and ones in array languages because booleans are just zeros and ones, which also is, I think, an incredibly interesting property of array languages because they treat Booleans as just a subset of the sort of values that integers can take, a.k.a. zeros and ones. I don't know. Those are a few examples. Yeah, I feel like we often on this podcast,
Starting point is 00:18:06 we talk about, when we talk about algorithms, we talk about scans and reductions and transforms and various different adjacent algorithms and zips and things of that nature, but we do not frequently talk about sorted sequences and operations on sorted sequences. I mean, I'm sure we've talked about it a little bit, but have you noticed that as well?
Starting point is 00:18:35 I mean, what do you mean when you say we don't talk about operations on sorted sequences? I think they do not come up. I think sorted sequences do not come up as frequently in the algorithms that we talk about and the problems that we talk about. Yeah, well, so I think especially in GPU parallel land, you want to try and avoid sorts as often as possible
Starting point is 00:18:59 because compared to reduces and scans and transforms, they are not the bread and butter that... Obviously, we can accelerate sorts we do but uh compared to like if you can express your problem in terms of reduction scans and transforms that's the preferable way to spell your solution and so like in that regard yeah we don't talk about sorts a ton but i think what's really interesting is that the observation that there are different operations that can leave you with a sorted sequence aka a max scan a max scan is an operation that when you perform on a sequence by definition you will have or sorry you perform on any sequence by definition you will end up with a sorted sequence
Starting point is 00:19:40 and if it's a max scan that brings you to a sorted sequence well now it's now you're talking about like we're still in like the uh bread and butter area of what uh gpus are good at um and i think and that's what's really interesting one of the things that led me to have this eureka moment or sort of realization was that in the Q language, when you actually make a call to the sort function, which they call ASC, or the reverse sort is DECS, so like ask and desk, short for ascending and descending, it will, in the REPL, prefix your integer sequence with like an S backtick, which indicates that it's a sorted sequence i'm not sure how the implementation of q works but i wouldn't be surprised if then you know q is able to take advantage of the knowledge that that sequence or maybe it's just a visual thing
Starting point is 00:20:32 maybe there are no like different dispatches to different algorithm implementations but um if you do a max scan in uh the q repl it doesn't prefix it with an s back tick so like even though it is a sorted sequence, it's not telling you the same information. It only tells you that after you make a call to ASC and DE ask and desk, their two sorting functions, which, like I said, I don't actually know anything about the implementation of queue because it's a proprietary language, but I think it's a missed opportunity if there are some, like,
Starting point is 00:21:03 dispatching to different algorithm implementations based on this little piece of Boolean information. All you would have to do is add this little same Boolean information to a sequence after MaxScan or MinScan. Off the top of my head, are there any other operations that result in sorted sequences that aren't sort or max or min scan? I suspect there have to be. Like you could say like an and scan, a logical and scan, and a logical or scan,
Starting point is 00:21:39 but those are just extensions of min scans and max scans. But still, like some people might think of those as separate things, like technically, I guess any of, none of, and all of those are reductions. But if we had... Okay, but so I'll pause it. No, it's not going to be any comparator.
Starting point is 00:22:01 There must be some other family, right? Because, I mean, maybe they all boil down to max scans or min scans. If you think about, like, sort, like, there's other sorting comparators other than just less than and greater than. So maybe that would help. Maybe if we think about other common sort comparators, that may lead us to other scans that have this property. Well, no, when you say less than, greater than, those are equality operators,
Starting point is 00:22:40 meaning that they always return trues and falses. Right, right, right. But I'm saying, like, that that they always return trues and falses. Right, right, right. But I'm saying, I'm saying like, like that's, um, you know, if you look at like, what is,
Starting point is 00:22:50 what is Stidman and Stidmax do? Like, what's the comparison that they do? Uh, I mean, inside the operation, yes, there's a less than.
Starting point is 00:23:00 Right, right. And so if you think like, if we took like any arbitrary, like, uh, you know, comparator and so if you think like if we took like any arbitrary like uh you know comparator and you you uh like that you'd use for stood sort and you take that arbitrary comparator and you turn it into a you know if true then a then b then and then you do a scan with that
Starting point is 00:23:19 operation so you get that same property yeah off the top of my head, I can't think of any. There's some... Let me look at the specification of sort. Because what's the property that your comparator has to guarantee? We have internet again, which is helpful for most importantly
Starting point is 00:23:55 answering this question and also making sure that we arrive in Slovenia. And also apparently we have to buy this sticker that we put on the car so they don't charge us a bunch of money. Well, listener, if you know of any operations other than sort and min scans and
Starting point is 00:24:16 max scans, I'd be curious to know. I think actually also, too, a few episodes ago now, we were having a discussion and we were trying to talk about algorithms that are similar to filter. I think this was back when we were talking about, uh, parallelizing filter. Um, and I was trying to think of other algorithms that are similar to transformations, but they are, are not shape preserving. So like a filter is not a shape-preserving operation.
Starting point is 00:24:48 And I think off the top of my head, I couldn't think of any, but then I thought of a bunch later. So we've been talking about unique and distinct, and both of those are non-shape-preserving filter-like operations. There was a couple others, too. But anyways, point being is no one, I don't think anyone tweeted at us other algorithms that fit into that category,
Starting point is 00:25:13 but there are a bunch. Yeah, filter. Oh yeah, and then take and drop. Those are not shape-preserving either. They're rank-preserving, but not shape-preserving. So yeah, take, drop, take while, drop while, filter, unique, distinct. There's a bunch of them. preserving but not shape preserving um so yeah take drop take while drop while filter unique distinct there's a bunch of them so back to bryce so um i mean i do i do think that there's actually
Starting point is 00:25:35 like a lot of these like if you do a if you do a wait are we talking about what i just said or are we going back to the max no we're going back to the max scan thing. Sorry. I was... I didn't even hear what you said. If you do just like a scan with like Operator Plus on positive integers, you'll get a sequence that's sort of... This is true. That's actually a classic competitive programming trick is when you need to figure out... Ooh.
Starting point is 00:26:08 Ooh, we took the wrong turn, folks. Did we? Did we take the wrong turn? I think we're good. That's not yelling at us. We're good. It's unclear. Did it say rerouting?
Starting point is 00:26:26 You mean are we connected to the internet? Uh, I am. I think it would be, it would be rerouting if it was, if we had messed up. We're good. We're good. Listen, folks, we're still good. No problemo here. Um, you are definitely correct. And there's a competitive programming trick that if you need to figure out, like, what's the maximum number of things that, like, you can chew up or you can consume,
Starting point is 00:26:50 one of the fastest ways to do this is to do a plus scan and then a binary search with some value on that, basically, the result of your plus scan. So, like, a plus scan on positive values with a binary search is like a classic solution to certain competitive programming problems. Wait, what do you mean by to chew up? So like what, I remember specifically there was this problem from like years ago, why I know the rough details of this is hard for me to fathom. But it's something like you are, you have like, you know, an army of people shooting bows and arrows. I mean, like the context of this problem doesn't really matter.
Starting point is 00:27:30 And then you are going to have these like waves of enemies coming in and they come in in these waves. And there's some like criteria where you're able to defeat each wave. And the question is like, how many waves can you survive? And the solution to this problem is by doing a plus scan and so you know that like each one of your battalions matches up against however many of them but like to do that naively and you end up with like a quadratic solution but if you just do a plus scan in advance on your whole sequence of enemies that you're going to be facing, you can just do like multiple binary searches. So instead of having a quadratic solution, you end up with like an
Starting point is 00:28:11 n log n solution. Or no, yeah, it's a it's like a k log n solution, because log n on the size of the enemies that you're facing, and then K is like the number of times you're able to beat the waves or something like that. That was super hand wavy. I can find, I will go back and find, because I know actually I made a, the reason I remember the detail is now, this is coming back to me,
Starting point is 00:28:37 I made a YouTube video back in the early days of my YouTube channel when it used to all be about competitive programming about this problem and I remember creating a little graphic of these little stickman figures with bows and arrows and I think I put Thor you know animation
Starting point is 00:28:56 or clip art because I was a big fan of Thor Thor, God of Asgard. Anyways I don't even know what we were talking about you asked me to explain what do you mean by chewing? you're consuming a certain number of values, and if you have to do that repeatedly, you're going to end up with something.
Starting point is 00:29:11 Once again, Bryce is not doing a good job with the mic. He's got it in front of his mouth, and I'm the one talking. That is true. But anyways, you avoid a quadratic algorithm, and you end up with a linear plus scan and then repeated log N binary searches. Yeah. All right. What was the thing?
Starting point is 00:29:33 What was, what was the thing you wanted to talk about? The next topic. Wait, what, uh, what time are we at on our recording here? 30 minutes and four seconds.
Starting point is 00:29:43 Perfect folks. We just wrapped. We wrapped episode one of Slovenia 2023 road trip. We're still in Austria. Don't worry, Slovenia. We're coming. We're an hour and 16 minutes away from Lake Bled. My driving has been so fantastic.
Starting point is 00:29:58 We've actually, we've increased our destination arrival time by one minute while we were recording that podcast. Probably shows you how great of a driver I am. What's the next topic? Okay, now we're at episode two. Be sure to check the show notes either in your podcast app or at ADSP the podcast.com for links to any of the things that we mentioned in today's episode, as well as to a GitHub discussion where you can ask questions, leave comments or thoughts. Thanks for listening. We hope you enjoyed and have a great day.

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.