Algorithms + Data Structures = Programs - Episode 135: 🇸🇮 Slovenia 🇸🇮 2023 Road Trip!
Episode Date: June 23, 2023In this episode, Conor and Bryce record live from Austria while driving and chat about algorithms including scans, unique and more!Link to Episode 135 on WebsiteDiscuss this episode, leave a comment, ...or ask a question (on GitHub)TwitterADSP: The PodcastConor HoekstraBryce Adelstein LelbachShow NotesDate Recorded: 2023-06-18Date Released: 2023-06-23Lambda Days 2023 WebsiteItalian C++KX Con 2023: Algorithms in q - Conor HoekstraSkyline Problem in Top 10scan in BQNdistinct in qdedup in Ruststd::unique in C++C++Now 2019: Conor Hoekstra “Algorithm Intuition”Rainwater Problem in Top 10C++20 std::views::filterC++20 std::views::takeC++20 std::views::dropIntro Song InfoMiss You by Sarah Jansen https://soundcloud.com/sarahjansenmusicCreative Commons — Attribution 3.0 Unported — CC BY 3.0Free Download / Stream: http://bit.ly/l-miss-youMusic promoted by Audio Library https://youtu.be/iYYxnasvfx8
Transcript
Discussion (0)
I just realized we're probably making history right now.
We might be the first ever podcast to be recorded during a road trip while one person's behind the
wheel, one person's in the passenger seat. Definitely it's got to be the first tech
podcast that's ever done something like this. Welcome to ADSP, the podcast episode 135, recorded on June 18th, 2023.
My name is Connor, and today with my co-host Bryce, we record live from in the car on the 2023 Slovenia road trip.
In today's episode,
we're recording from Austria.
See, we should have been recording earlier. So what's another problem that...
Wait, is that the start of the episode?
Yeah, I mean... Put the mic in front of me.
We're here, folks, live in austria is that too loud on
the game welcome to the 2023 slovenia road trip we're live on the road i'm behind the wheel we
got the ac blasting that's actually not true it's not blasting where are we bryce we're in we are
in western austria we're not entirely sure where we are right now because we have no internet signal at the moment.
What do you mean? We have no internet?
We currently have no internet now.
And I think neither of us downloaded offline maps on Google Maps.
So we better not hit the little X on your phone.
What's this map here then?
That's because we started the navigation.
First of all, when you're talking, here's Bryce holding the mic in front of my mouth.
Man, I'm loud.
Man, if we apologize if the audio quality is terrible, Bryce doesn't know what he's doing.
I'm the professional where it comes to the hardware.
That is true.
But we're somewhere in Austria.
We are making our way
to the city of Bled
where we are going to swim
in Lake Bled.
The hills are alive
with the sound of music.
That's where it took place
in Austria, right?
That is correct.
That is correct.
We should do a little sing-along.
My mother,
here's a little anecdote for the listeners.
I have three sisters, and my mother thought we were the Von Trapp family when we were kids,
and so she made us memorize most of the songs to the sound of music,
and she would make us sometimes after dinner perform the, you know,
Auf Wiedersehen, Goodbye, whatever that is. I apologize for the folks, is that German?
I think it's German
and you know, so like two of my younger sisters
they would do, we flit, we flat
say goodbye
and anyways
I was
I can't remember which role I played, but anyways
sound of music, fantastic
we love you Austria, but we, Sound of Music, fantastic.
We love you, Austria, but we're only driving through you to get to Slovenia.
Slovenia, here we come.
We got in, Connor got in in the morning yesterday to Austria. I got in around 5 p.m. and didn't get to the hotel until like 6 or 7
because Connor's luggage got lost.
Oh, yeah. Well you you explain what happened listen folks we've been in europe here for two and a half weeks now we started off in
poland we then went to italy lambda days was fantastic uh italian c++ was pretty good and uh
and then we went to zurich we went to sw to Switzerland. Oh my God, Switzerland. So beautiful. Uh, the trip has been basically amazing. It's been perfect until,
until on route to Vienna, Austria. Uh, the, the wonderful people of Australian,
Australian, Austrian airlines, uh, lost my luggage and, uh, I was pretty Zen about it
because what is getting upset about something
like that do? However, that was all my clothes. Long story short, Bryce ended up with my luggage
and brought it to the hotel we were staying at in Vienna. And, uh, I am wearing now my good vibes
t-shirt that I bought that Austria airlines will be paying for because I thought I would need to buy clothes.
And the Good Vibes t-shirt is our good luck t-shirt.
Good vibes, good luck.
You know, it's a, I don't know.
I don't know what else to say.
What do you got to add to the story, Bryce?
We should clarify that when Connor says that I ended up with this luggage, what he means is that my flight arrived later and I picked it up from the lost luggage.
Not that the airline somehow gave his luggage to me.
But speaking of Australia.
We weren't speaking.
That was a slip of the tongue.
Yeah.
Okay.
When my flight from Varna to here, we taxiing on the runway, and I look next to me,
and there is a Virgin Australia plane
in Bulgaria,
and not like a long-distance plane,
like a short-distance plane,
like an aerobody plane.
And I'm like, how did this come to be here?
Just to check, you hit the record button, right?
I did hit the record button.
We are recording.
So we had a fascinating conversation about MaxScan and a few other related algorithmic things.
But we'd already started driving and we hadn't taken the microphone out, so we didn't record any of it but uh i'm i'm a little fascinated with uh the mac scan and some of the
properties of scans of that type and so i think we're going to talk a little bit more about it so
the the thing that fascinated me was right uh should we tell the problem first well kind of
you've already kind of ruined the solution a little bit uh but you know what we're not going
to spoil it well we did spoil it,
but so this is how it started.
I mean, we were talking about a bunch of stuff,
but there was a problem recently.
Who has seen it?
I believe I talk about it in my KXCon talk,
so if that's online and you've seen it,
you will know the solution to this problem,
but the problem I call the skyline problem,
and it's pretty simple
you're given a vector of integers that represent heights of buildings in a skyline and the question
is if you're standing on the left hand side of these buildings what is the distinct or unique
number of buildings that you can see um and so for example example, a test case would be, say you have five integers,
two, one, three, two, five.
And the answer, given these five buildings,
is that you can see three of them.
You can see the first two,
and then the second building that has a height of one
you can't see because it's hidden behind
the first building of two.
You can see the third building with a height of three.
The fourth building with a height of two you also can't see because it's blocked behind the first building of two. You can see the third building with a height of three. The fourth building with a height of two, you also can't see because it's blocked by
the middle building. And then you can see the last building that has a height of five. So basically
any building that has a shorter height than any building to the left of it, you can't see.
And that's the question. We'll throw it back to Bryce to say what he found. And we'll pause.
We'll talk about how beautiful the view is
oh it is gorgeous here uh we are we are driving through the uh the hills and mountains of austria
and uh you know it's pretty flat in vienna but um and we were on the highway for a while but we we
now are off the highway we're in sort of like rural Austria, driving west.
And then, well, now I think we just took the turn heading south,
south to sort of the western part of Slovenia where the city of Bled is.
And, yeah, it's absolutely gorgeous.
It's, you know, what you think of when you think of Austria,
the rolling hills, the sound of music-esque stuff,
and there's all these beautiful old buildings that we keep seeing and churches and stuff like that.
I just realized we're probably making history right now.
We might be the first ever podcast to be recorded during a road trip while one person's behind the wheel,
one person's in the passenger seat,
definitely it's got to be the first tech podcast that's ever done something like this.
I will assure you, this is not the first road trip podcast. I bet there are lots of road trip podcasts.
Are we technical for a technology podcast?
We got to be the first.
I, perhaps, perhaps I will, I will humor you. But you know this kind of kind of the area we're in
right now kind of reminds me of the pacific northwest just because the really really tall
trees um yeah it's absolutely gorgeous all right so people people have had a few minutes
to think about we're whipping around these corners a little too fast. We've got to slow down here.
We are definitely going to get some speeding tickets.
No, I don't think so.
Look, it's got a little happy face.
No, it's got a sad face.
There was a little, like, check your speed sign that's like,
and it read our speed
and it had a sad face.
Yeah, but people have had a couple of minutes
to think about the solution, how they would solve it. Alright, now you can had a couple of minutes to think about the solution,
how they would solve it.
All right, now you can say
what you'd like to say, Bryce.
Well, the solution,
which my first thought was
that this was some form of adjacent reduce,
but Connor talked me out of that
because I thought that maybe
you just needed to have
sort of two pieces of information,
like the height of the thing and then how many buildings are in the information that you've reduced.
But you actually need more than that. You need sort of like some compression of all of the information on all of the heights of the buildings.
So that doesn't really work. But what the decision that Connor put forth was that you do a max scan,
so a scan with, you know, stood max if we're in C++.
And then you do a, what did you call it, the deduplicate.
So, yeah, in the Q language, this is known as distinct,
and I think some functional languages call it distinct as well.
In Rust, it's known as deduplicate.
We don't actually have this function in C++,
at least in the standard algorithms.
We have a similar version of this called std unique,
but unique, if you are familiar with,
it removes adjacent elements that are equal,
whereas deduplicate or distinct removes any uh repeated
values regardless of where they show up so in q after you do your max scan you make a call to
distinct which basically in our initial example of two one three two five that's going to turn
that into two two three three five and then you want to basically remove the duplicate values.
So you get rid of one of the twos, one of the threes,
and you're left with 235.
And then you just do a count of that, yeah.
Yeah, in C++ that's a std size,
and Rust it would be, I believe, count or len.
But yeah, it's either called count, length, size,
something like that in your various programming languages.
But the realization that Connor had, so that deduplicate operation, it does not assume that the input sequence is sorted.
Correct.
And so that means that it's a little bit more expensive.
So how do you implement that?
So, yeah, I was doing this in a sort of toy thing I was working on,
and my implementation of distinct was just to allocate some small hash set,
memory for a small hash set,
and then you just keep track of all the values you've seen.
So that's the naive implementation of how you remove duplicates from a list.
But after thinking about, you know about the solution to this problem,
I realized that one of the properties of a max scan is that it's sorted.
And if you know you have a sorted sequence,
you don't actually need to make a call to an algorithm that is doing deduplication.
You actually can make use of the more efficient algorithm that exists in C++,
which is kind of the overarching point algorithm that exists in C++ that's unique,
which is kind of the overarching point of this conversation that we had earlier,
is that there are certain operations such as sort or max scan that leave you with a sorted sequence,
which then potentially you can dispatch to a more efficient algorithm.
And in this example, it's the difference between making a call to something like distinct versus making something to that is called uh like unique in c++ yeah
yeah and so i i uh i think that this category of scans that uh leave you with a monotonically
increasing sequence for lack of a better term or monotonically decreasing if it's like a min scan,
but that leave you with a sequence that is sorted, I think that's very interesting.
And I can't recall a specific problem in the past where we've used one of these scans, but I suspect we probably have.
But I want to learn more about this type of scan
and whether there's other interesting use cases or properties of this.
Yeah, can you think of another problem that we've talked about where we've solved it with a max scan or where we've solved it with something with a scan that leaves it in a monotonically increasing state?
So off the top of my head, we mentioned this earlier.
This one makes use of two max scans, but it's not followed by a reduction or like a call to unique or something like that.
So there's not a huge thing to be taken advantage of there.
But for folks that have seen my C++ Now 2019 talk,
I covered a problem that I call rainwater.
And both Skyline and rainwater
are a part of my top 10 GitHub repository.
Man, it is so pretty where we are right now.
It is absolutely gorgeous.
I was just thinking that.
We just passed by this little,
we're driving next to this little stream
in this like valley in between these beautiful mountains.
We gotta take a photo so that we can post this on the episode.
I was just about to do that.
This is slightly less iconic than it was a moment ago.
But this is looking like it's going to return to being iconic and idyllic,
maybe is the word I'm looking for.
But yeah, C++ Now, 2019, Rainwater, Skyline,
both are on my top 10 GitHub repo.
And that problem, it's very similar to the Skyline problem.
You're given a skyline once again,
except in this problem statement,
they call it like a mountainous range.
And you have to calculate
if it were to rain on this mountainous range,
you know, how much water would be trapped
in these kind of pockets formed
between these, you know,
integer heights of varying heights.
And here you do a max scan from the left,
a max scan from the right, and then you
do what's now called a zip transform in C++, where you take the difference between those two.
So that's an example of when you use a max scan. Another example is, we were talking about this,
when you're doing string trimming, if you want to get rid of like the leading zeros or sorry leading spaces
at the beginning or end of a string that technically I'm going to stop talking so I don't
there's a guy on a motorcycle and Connor's attempting to not hit the guy not hit the guy
we've been successful been successful so far stay tuned folks uh you know you'll be you'll be hearing
this episode I believe what was it yesterday the 16 You know, you'll be hearing this episode, I believe.
What was it yesterday?
The 16th.
So you're hearing this either on, like, June 23rd or 24th, depending on how recently or quickly you listen to the episode.
And also whether we survive.
Yeah, that's true.
I guess, though, if we've hit someone, we've hit them by now.
By the time you're listening to it, you might not know about it.
But we've crashed or hit someone, we've hit them by now. By the time you're listening to it, you might not know about it, but we've crashed
or hit someone.
Because at this point, you're already back.
Are you stateside by Friday? Yeah, I'll be
home by Thursday.
You won't be there. You're just staying here until
the next conference. Yeah, I'll be
Ooh, look at that Lamborghini.
Lamborghini in the hills
of... Holy shit. A lot of fancy cars
just passed by us.
Man, we got to go find out who they were.
Yeah.
They're probably going to the Sound of Music Museum.
Oh, it's definitely got to be a thing.
But yes, removing the leading zeros off of strings and stuff like that that basically is a min scan after you
transform your string into a sequence or not necessarily a sequence you basically create a
sequence of trues and falses aka ones and zeros where the ones correspond to if it's a space or
not and then you can do a what's typically known as a logical and scan but uh as we were
talking about earlier in array languages um some of them basically extend the logical and and or
to min and max because logical and and or are just the min and max in the space of zeros and ones in
array languages because booleans are just zeros and ones,
which also is, I think, an incredibly interesting property of array languages
because they treat Booleans as just a subset of the sort of values
that integers can take, a.k.a. zeros and ones.
I don't know. Those are a few examples.
Yeah, I feel like we often on this podcast,
we talk about, when we talk about algorithms,
we talk about scans and reductions and transforms
and various different adjacent algorithms
and zips and things of that nature,
but we do not frequently talk about
sorted sequences and operations on sorted sequences.
I mean, I'm sure we've talked about it a little bit,
but have you noticed that as well?
I mean, what do you mean when you say
we don't talk about operations on sorted sequences?
I think they do not come up.
I think sorted sequences do not come up as frequently
in the algorithms that we talk about
and the problems that we talk about.
Yeah, well, so I think especially in GPU parallel land,
you want to try and avoid sorts as often as possible
because compared to reduces and scans and transforms,
they are not the bread and butter that...
Obviously, we can accelerate sorts we do but uh compared to like if you can express your problem
in terms of reduction scans and transforms that's the preferable way to spell your solution
and so like in that regard yeah we don't talk about sorts a ton but i think what's really
interesting is that the observation that there are different operations that can leave you with a sorted sequence
aka a max scan a max scan is an operation that when you perform on a sequence by definition you
will have or sorry you perform on any sequence by definition you will end up with a sorted sequence
and if it's a max scan that brings you to a sorted sequence well now it's now you're
talking about like we're still in like the uh bread and butter area of what uh gpus are good at
um and i think and that's what's really interesting one of the things that led me to
have this eureka moment or sort of realization was that in the Q language, when you actually make a call to the
sort function, which they call ASC, or the reverse sort is DECS, so like ask and desk,
short for ascending and descending, it will, in the REPL, prefix your integer sequence with like
an S backtick, which indicates that it's a sorted sequence i'm not sure how the implementation of q works but i wouldn't be surprised if then you know q is able
to take advantage of the knowledge that that sequence or maybe it's just a visual thing
maybe there are no like different dispatches to different algorithm implementations but um if you
do a max scan in uh the q repl it doesn't prefix it with an s back tick so like even though it is
a sorted sequence,
it's not telling you the same information.
It only tells you that after you make a call to ASC and DE ask and desk,
their two sorting functions, which, like I said,
I don't actually know anything about the implementation of queue because it's a proprietary language,
but I think it's a missed opportunity if there are some, like,
dispatching to different algorithm implementations
based on this little piece of Boolean information.
All you would have to do is add this little same Boolean information to a sequence after MaxScan or MinScan.
Off the top of my head, are there any other operations that result in sorted sequences that aren't sort or max or min scan?
I suspect there have to be.
Like you could say like an and scan,
a logical and scan,
and a logical or scan,
but those are just extensions of min scans
and max scans.
But still, like some people might think of those
as separate things,
like technically,
I guess any of, none of, and all of those are reductions.
But if we had... Okay, but so I'll pause it.
No, it's not going to be any comparator.
There must be some other family, right?
Because, I mean, maybe they all boil down to max scans or min scans.
If you think about, like, sort, like, there's other sorting comparators other than just less than and greater than.
So maybe that would help.
Maybe if we think about other common sort comparators,
that may lead us to other scans that have this property.
Well, no, when you say less than, greater than,
those are equality operators,
meaning that they always return trues and falses.
Right, right, right.
But I'm saying, like, that that they always return trues and falses. Right, right, right. But I'm saying, I'm saying like,
like that's,
um,
you know,
if you look at like,
what is,
what is Stidman and Stidmax do?
Like,
what's the comparison that they do?
Uh,
I mean,
inside the operation,
yes,
there's a less than.
Right,
right.
And so if you think like,
if we took like any arbitrary,
like,
uh, you know, comparator and so if you think like if we took like any arbitrary like uh you know
comparator and you you uh like that you'd use for stood sort and you take that arbitrary comparator
and you turn it into a you know if true then a then b then and then you do a scan with that
operation so you get that same property yeah off the top of my head, I can't think of any.
There's some...
Let me look at the specification of sort.
Because what's the property that your comparator has
to guarantee?
We have internet again,
which is helpful for
most importantly
answering this question and
also making sure that we
arrive in Slovenia. And also
apparently we have to buy this sticker that
we put on the car so they don't charge us a bunch of money.
Well, listener,
if you know of any
operations other than sort and min scans and
max scans, I'd be curious to know.
I think actually also, too, a few episodes ago
now, we were having a discussion
and we were trying to talk about
algorithms that
are similar to filter. I think this was back when we were talking about, uh, parallelizing filter.
Um, and I was trying to think of other algorithms that are similar to transformations, but they
are, are not shape preserving. So like a filter is not a shape-preserving operation.
And I think off the top of my head, I couldn't think of any,
but then I thought of a bunch later.
So we've been talking about unique and distinct,
and both of those are non-shape-preserving filter-like operations.
There was a couple others, too.
But anyways, point being is no one,
I don't think anyone tweeted at us
other algorithms that fit into that category,
but there are a bunch.
Yeah, filter.
Oh yeah, and then take and drop.
Those are not shape-preserving either.
They're rank-preserving, but not shape-preserving.
So yeah, take, drop, take while, drop while,
filter, unique, distinct. There's a bunch of them. preserving but not shape preserving um so yeah take drop take while drop while filter unique
distinct there's a bunch of them so back to bryce so um i mean i do i do think that there's actually
like a lot of these like if you do a if you do a wait are we talking about what i just said or
are we going back to the max no we're going back to the max scan thing. Sorry. I was... I didn't even hear what you said.
If you do just like a scan with like Operator Plus on positive integers,
you'll get a sequence that's sort of...
This is true.
That's actually a classic competitive programming trick
is when you need to figure out...
Ooh.
Ooh, we took the wrong turn, folks.
Did we?
Did we take the wrong turn?
I think we're good.
That's not yelling at us.
We're good.
It's unclear.
Did it say rerouting?
You mean are we connected to the internet?
Uh, I am.
I think it would be, it would be rerouting if it was, if we had messed up.
We're good.
We're good.
Listen, folks, we're still good.
No problemo here.
Um, you are definitely correct. And there's a competitive programming trick that if you need to figure out, like, what's the maximum number of things that, like, you can chew up or you can consume,
one of the fastest ways to do this is to do a plus scan and then a binary search with some value on that, basically, the result of your plus scan.
So, like, a plus scan on positive values with a binary search is like a classic solution to certain competitive programming problems.
Wait, what do you mean by to chew up?
So like what, I remember specifically there was this problem from like years ago,
why I know the rough details of this is hard for me to fathom.
But it's something like you are, you have like, you know,
an army of people shooting bows and arrows.
I mean, like the context of this problem doesn't really matter.
And then you are going to have these like waves of enemies coming in and they come in in these waves.
And there's some like criteria where you're able to defeat each wave.
And the question is like, how many waves can you survive?
And the solution to this problem
is by doing a plus scan and so you know that like each one of your battalions matches up against
however many of them but like to do that naively and you end up with like a quadratic solution
but if you just do a plus scan in advance on your whole sequence of enemies that you're going to be facing, you can just do like
multiple binary searches. So instead of having a quadratic solution, you end up with like an
n log n solution. Or no, yeah, it's a it's like a k log n solution, because log n on the size of the
enemies that you're facing, and then K is like the number of times
you're able to beat the waves or something like that.
That was super hand wavy.
I can find, I will go back and find,
because I know actually I made a,
the reason I remember the detail is now,
this is coming back to me,
I made a YouTube video back in the early days
of my YouTube channel when it used to all be
about competitive programming about this problem and I remember
creating a little
graphic of these little stickman figures
with bows and arrows and I think I put
Thor
you know animation
or clip art because I was a big fan of Thor
Thor, God of Asgard. Anyways
I don't even know what we were talking about
you asked me to explain
what do you mean by chewing?
you're consuming a certain number of values,
and if you have to do that repeatedly,
you're going to end up with something.
Once again, Bryce is not doing a good job with the mic.
He's got it in front of his mouth, and I'm the one talking.
That is true.
But anyways, you avoid a quadratic algorithm,
and you end up with a linear plus scan and then repeated log N binary searches.
Yeah.
All right.
What was the thing?
What was,
what was the thing you wanted to talk about?
The next topic.
Wait,
what,
uh,
what time are we at on our recording here?
30 minutes and four seconds.
Perfect folks.
We just wrapped.
We wrapped episode one of Slovenia 2023 road trip.
We're still in Austria.
Don't worry, Slovenia.
We're coming.
We're an hour and 16 minutes away from Lake Bled.
My driving has been so fantastic.
We've actually, we've increased our destination arrival time by one minute while we were recording that podcast.
Probably shows you how great of a driver I am. What's the next topic? Okay, now we're at episode
two. Be sure to check the show notes either in your podcast app or at ADSP the podcast.com for
links to any of the things that we mentioned in today's episode, as well as to a GitHub discussion
where you can ask questions, leave comments or thoughts. Thanks for listening. We hope you
enjoyed and have a great day.