Algorithms + Data Structures = Programs - Episode 176: 🇺🇸 prior, deltas & Dinner with Phineas
Episode Date: April 5, 2024In this episode, Conor and Bryce chat with Phineas Porter about the functions delta, prior and more over dinner.Link to Episode 176 on WebsiteDiscuss this episode, leave a comment, or ask a question (...on GitHub)TwitterADSP: The PodcastConor HoekstraBryce Adelstein LelbachAbout the Guest:Phineas Porter is a Software Developer at Jump Trading. Previously, he held roles in quant research and technology at UBS and Citibank. He graduated from Columbia University in 2014 with a degree in Operations Research. He lives in New York City with his wife, daughter (Ada) and son (Solomon).Show NotesDate Recorded: 2024-03-06Date Released: 2024-04-05Franchia Vegan Cafethurst::reduce_by_keythrust::inclusive_scanthurst::inclusive_scan_by_keyKXCON23 | Nick Psaris | Matching Algorithms in q and kdbKXCON23 | Phineas Porter | Dynamic Programming Approach to Content Aware Image Resizing | kdb at Jump Tradingthrust::reduceADSP Episode 131: One Algorithm To Rule Them All!Q priorC++23 std::views::adjacent_transformC++98 std::adjacent_differenceC++98 std::partial_sumC++17 std::variantQ deltasC++23 std::views::zipNumPy diffsArrayCast Episode 76: Conor McCarthy, PyKX and kdb+ 4.1ADSP Episode 147: 🇸🇮 SRT23 - Parallel std::unique Revisited (on a Walk in Venice)The Two Egg ProblemIntro Song InfoMiss You by Sarah Jansen https://soundcloud.com/sarahjansenmusicCreative Commons — Attribution 3.0 Unported — CC BY 3.0Free Download / Stream: http://bit.ly/l-miss-youMusic promoted by Audio Library https://youtu.be/iYYxnasvfx8
Transcript
Discussion (0)
Honestly, I don't want to offend our C++ listeners, but the best community is the array community.
It's, you know, they think the right way, you know.
You know, Phineas here, he swims in the land of folds and scans.
You know, we just like, we spend a little bit of time there because we work at NVIDIA and it's important to us.
But like C++ programmers, they're like a scan? What's that?
It's like a partial sum, you know.
Not everybody.
The listeners of this podcast probably know. But most of the C++ devs they're like, a scan? What's that? It's like a partial sum, you know? Not everybody. The listeners of this podcast probably know.
But most of the C++ devs out there, you know, they're still catching up on... One potato, two potato.
Yeah, one potato, two potato. welcome to adsp the podcast episode 176 recorded on march 6 2024 my name is connor and today with
my co-host bryce we chat about the algorithms from the q language prior deltas and more with
phineas porter over dinner i don't know i don't have a plan, Connor.
But I've had a revelation, and the people need to know about the revelation.
Do you want to do the introduction?
This is just like when we were doing the Slovenia 2023 road trip,
and every time he'd start recording, he'd be like,
all right, recording again.
And I'd be like, no, Bryce, we have to do the intro we are at francia this isn't getting picked up this isn't getting
picked up we got one mic now and it's tiny uh we're at the francia vegan cafe a place that
ramona does not like uh that bry Bryce just said she's never even been here.
She's not a big fan of vegan food.
She's Polish, so she likes her meat and potatoes.
And Bryce's cholesterol is too high,
so he's not allowed meat.
I myself did go to Shake Shack for lunch
and got two double hamburgers with bacon.
I calculated the calories.
It was 1,280 or something.
It was worth every calorie.
Anyways, we're here at a cafe.
It's not just me and Bryce.
We're with a never-before-guest on ADSP the podcast.
And honestly, probably he should be on a Raycast, not ADSP the podcast.
We have enough incentive for him to be on the podcast.
I mean, he hasn't said no at this point.
You've got to keep your voice down. You've got to keep your voice down.
You've got to keep your voice down.
We had some noise complaints earlier, and...
I got PTSD from when this just happened right now.
Anyways, no, we're going to let Phineas introduce him.
Connor doesn't seem to understand how consent works.
You have to ask people.
You can't just assume that a non-response means yes.
Phineas, do you want to be on our podcast?
Sounds good.
Also, T, I mean, he hasn't, I don't, have you, you haven't been mentioned by name on ADSP, the podcast.
You probably haven't been mentioned, what do you call it?
Synonymously.
Like, yeah, without, it? Synonymously. Yeah, synonymously.
And that was the same thing on a Raycast, but then at some point
you did get mentioned by name, because I think I
asked you if it was okay. So I have
subtly asked for consent. Anyways, I'm going to
hand it back to Bryce. We just finished dinner.
There's some surprise event happening later.
Nobody knows what's happening.
And Franchio, it was pretty
delicious. It was pretty delicious.
It's quite charged okay good back to bryce
so uh after we had our revelation earlier today that um that reduced by key is implemented in
terms of scan in cub in terms of scan by key in cub which is even worse um i know i we've spent a lot of time
in this pot i had a revelation and we've spent a lot of time in this podcast talking about how
like everything's implemented in terms of you know a couple of primitive algorithms but like more and
more it's just everything is implemented in terms of scan and i realized that all of the CUB algorithms, I think, in some way, shape, and form
use scan other than
just straight reduce
and for. Histogram,
I think, uses scan. The
sorts definitely uses scan.
Scan uses scan.
All the
run length encoding, run length decoding
uses scan.
What other algorithms are in Cub?
Copy if uses scan.
Like they're all just fancy scans is what I'm saying.
Everything is just a fancy scan.
First of all, listeners, I'm disappointed.
This is what we turned the mic on in the middle of dinner for.
This is a little bit disappointing.
Yes, thank you very much.
It was delicious.
Yeah, that's true.
We did have ice cream last night.
Anyways, disappointed that this is what we turned the mic on.
We're not giving it back to you yet.
I've got the power.
But transforms? Transforms are not giving it back to you yet. I've got the power. But transforms?
Transforms are not implemented in terms of the scan.
Should we hand it over to Phineas?
Also, Phineas, in case we don't let him say anything,
because I feel like Bryce is not going to let him,
we will make sure we put a link.
That's cool.
We will put a link to a talk he was featured in.
Nick Cyrus gave a talk at KXCon 2023.
It was fantastic.
You also gave a talk.
What am I even talking about?
I was going to say, I remember it from the Q Gods.
He's a Q God, but he also gave a fantastic talk himself.
Both will be linked.
Hopefully, Phineas will get to say something,
but Bryce is throwing a tantrum across the table
because I'm hoarding the mic.
I don't even remember what I was going to say.
What was I going to say?
What were we talking about?
Oh, yeah.
I'm talking about solely Cub algorithms,
which are like the underlying low-level primitives.
And in Cub's world, like a transform and a 4H are both just 4.
Like Cub has a device 4, and both are just forms of that.
Anyways, I'll let other people talk now.
I don't have anything to say.
Should we interview Phineas?
Let's interview Phineas.
Probably the episode of Which is the Mother Algorithm.
Oh, yeah.
Wait, you want to discuss that again?
No, you should just point out this was already the claim.
Oh, yeah, yeah, yeah.
One of your guests already claimed this.
We already... I think you actually said claim. Oh, yeah, yeah, yeah. One of your guests already claimed this. We already...
I think you actually said this.
Yeah, I probably did.
Just talking to this thing.
I think Bryce said that, actually, on an earlier episode,
that scan was the ultimate algorithm,
because you can implement everything as a scan.
Or maybe he claimed that reduce was a scan,
because if you just keep enough state around,
your reduce can also work as a scan,
technically.
So it really reduces every algorithm.
Yeah.
Yeah, I believe that.
So there is all this reduces.
I mean, yeah,
we will link to ADSP episode,
what was it?
I don't know.
120?
That's a guess.
We'll see how,
I'll look it up after,
but Ben was on.
I'm going to look it up
right now.
Ben Dean was on the episode.
Tristan Brindles was on the episode.
All right, folks. As you might be able to tell, Bryce was in charge of the audio. He brought his
own lapel Rode microphone. Questionable, questionable audio quality. And somehow,
he silenced it or muted the mic for a minute and a half here.
So you're going to miss a minute and a half of conversation.
We apologize.
I apologize on behalf of Bryce.
Back to the conversation.
Talk about whether prior should keep the first element or not.
Yes.
Yeah.
I mean, I think obviously prior should keep the first element.
You can reconstruct the original list that way, and it makes all the lengths the same,
so it's just very convenient for the shape.
But Connor here thinks that you should drop the first element
because it's not really part of the reduction.
So, first of all, prior is the correct name for adjacent difference,
and I've actually adopted the name prior in my own...
Sorry?
Who wants to drop the first element? So Bryce... Sorry? Who wants to drop the first element?
So Bryce has asked who wants to drop the first element.
The answer is me.
And the answer is we did.
Because we added in C++23 a view called adjacent transform,
which is the modern version of adjacent difference.
And that does drop the first element.
If you don't drop the first element, what do you do?
Keep the first element.
You can reconstruct. So cumulative sums
and deltas gives you the original
list if you have the first element. Otherwise, you're missing
a constant. It's like integrating a calculus.
If you drop C, you don't have the C anymore.
You're done. And this is brilliant because
deltas... You just want to keep it
just on its own? You just want to want to keep it just on its own?
You just want to keep the first element just on its own?
Return the len... you have an iterate... you have a container, it has n elements in it,
and the first element is the same element as it was in the original list.
But what if the type changes?
This is the problem with typed languages.
That's the problem.
That's the problem. So this is, that's exactly it.
So adjacent difference has a constraint and because it copies that first element in order to maintain
the circular round trip between partial sum, etc. Or like ratios and we'll give you back the
original series. So if you take ratios between all the numbers,
you know what the percent changes are,
and then you can go back to the original list.
This is like a very convenient feature to have around.
When you do, a lot of times you're looking for like,
I want to find all the cases where something big happened,
and then I want the actual numbers.
I don't have to keep around two lists around with me
the whole time, I just keep around the one element
and then come back.
Bryce is ordering his dessert here.
I will have the chocolate fudge cake.
I will have the tiramisu.
Connor?
Okay.
Connor being boring.
Where were we?
I've had the two Shake Shack burgers.
Did you have a Shake Shack?
I don't like shakes. Bryce asked me.
You don't like shakes?
What is wrong with you?
This thing has auto gain.
It's supposed to be idiot proof.
I had two burgers at Shake Shack.
Then I had lunch at a
Thai place where I had a couple chicken wings and pad thai.
Yeah, yeah.
I told you I was meeting up with an acquaintance, Ashley, if you're listening.
And what?
Ashley.
She's another member from the KXQ community.
Honestly, I don't want to offend our C++ listeners, but the best community
is the array community.
They think the right way.
Phineas here, he
swims in the land of folds and scans.
We just spend
a little bit of time there because we work at NVIDIA
and it's important to us, but C++
programmers, they're like, a scan? What's that?
It's like a partial sum? Not everybody.
The listeners of this podcast probably know.
But most of the C++ devs out there, they're still catching up on it.
Two potato type things.
Yeah, one potato, two potato.
What's the resolution to the fire thing?
How did you two resolve your differences?
Oh, we didn't.
Oh, Bryce.
I asked, how did you two resolve your differences?
And the answer is, I mean, I think we came to an agreement that...
I think for type languages, it makes sense to be...
You have to have at least two types, right?
You need to be able to maintain the fact that, like,
if you change types, you can't keep this first element.
But, like I said, it's, like, much more memory efficient
to, like, not keep around the original list
if you can go back and forth between the two.
And in Q and array languages, your first type can be different.
You can end up with a heterogeneous list, right?
So you can't do that in type languages.
And I think so what we ended up agreeing on is that prior and adjacent difference...
What?
I don't see a reason that you couldn't have a list and an iterator in C++ where the, well.
You can create an iterator, but you can't create a container that has nice properties.
What are you going to wrap it in?
What are you going to wrap it in, like a std variant?
Absolutely not.
So the point being is for convenience, adjacent difference and prior,
especially in array languages that are dynamically typed, is very nice.
However, if you want to design the perfect generic algorithm
that you are able to implement all the different flavors of kind of deltas
and adjacent difference or adjacent transform,
you want the one that doesn't bundle that first type there.
And then from there, you can kind of do everything you want.
There's another solution.
So the other solution is that your function,
the operator you're going to apply between all the adjacent elements
should have a default-like type for that.
Like identity value.
If you have that, then the identity value can push it into the first element, into the correct identity type for the first element.
So if you want to do a prior transform where you're bundling your prior with a binary operation, which is typically what you're doing, right?
Prior and adjacent transform, they're higher order functions that take a binary operation. If your binary operation and your type form, what is it, a monad or a semigroup or whatever it is from category theory,
that comes with an identity value.
And so if you've got integers with plus, that identity value is going to be zero.
And so you can just use that value as the first element.
And then instead of ending up with n minus one elements, you end up with n. just use that value as the first element,
and then instead of ending up with n minus 1 elements,
you end up with n.
And that's the other thing I haven't commented on,
is the convenience thing about the way prior works is that, like, in array languages,
you like to end up with equally lengthed arrays
because then your scalar operations work,
and you don't end up having to jump through hoops.
We care about that less in, like, a ranges-like library
because it's
a sequence.
What about for zip?
Does zip have laziness
built in? I actually don't know how our zip works.
Of course. Sorry, not laziness, like
short-circuiting. Like if one length is
different.
It goes to whichever is the shortest.
Yeah, yeah. And then there's equivalent algorithms.
Right, so Phineas just asked, what about zip?
I said something about laziness, but in my head I meant short-circuiting
because in functional languages like Haskell and whatnot,
they usually have two different algorithms,
zip that short-circuits to whatever the shorter length is,
and then they also have one called zip longest,
which they have varying flavors.
Sometimes you can give it, like, a default value to put in there.
Sometimes it'll just fill it with, like, some Knoll or something like that.
And we're going to hand it back to Bryce because he's had a couple slices of his cake,
and I'm guessing we're going to get a review here.
It's a pretty good cake.
I mean, it's hard to believe it's vegan.
What do you have against vegans, Bryce?
I literally brought us all. I don't have any problem with vegans.
I literally brought us all to a vegan restaurant,
and I'm trying to convert my meat-only girlfriend
to at least sometimes enjoy a vegan meal.
So I don't want to alarm anyone,
but my phone has 30% battery left,
and we will need some percentage battery to navigate to various places and various things.
So we've got to interview rapidly.
I mean, anything we've talked about prior.
I mean, I've got to say, too, prior.
Actually, in my toy work project, which hopefully isn't a toy in the future,
I used the name Prior because I think it's honestly better than Adjacent Transform.
It's shorter.
I do love short names.
I changed Deltas to Diffs to match NumPy a little bit more,
which I actually think Diffs is a little bit too close to one other thing.
But things are matching, quality matching.
Yeah, yeah.
So it's not – all right, I'm going to take a bite of this cake.
I'm not going to identify it as a vegan cake.
Cake is cake, folks.
You know, we don't discriminate here over in Canada.
I guess we're in America right now.
Oh, that's not clear.
But you should mention that I will be in Canada next week.
Yeah, it's pretty crazy.
Well, now I got a mouthful of cake.
Oh, this cake is good.
And I'm not a big tiramisu fan, you know.
It's too complicated. It's got...
What do they call them? What are the sticks?
The fingers? The lady fingers?
The lady fingers, yeah.
Don't get me wrong. I'll eat a tiramisu if...
I love tiramisu.
But yeah, Diff's deltas, prior.
I mean, this actually probably is going to come out in like a month and a half from now
because we've got so much content once again.
But I will link to the episode of ArrayCast.
We're going to be having on someone from KX.
We initially were hoping to get Andrew Wilson from the KX Core team.
I think he said no because he doesn't like us.
Not because he doesn't like us.
I'm just joking around, obviously.
And then we're juggling a few people.
Why did he say no?
I don't know.
Some people just...
I think he has to speak at KXCon because he works for KX and they force him to
but
language people aren't necessarily
the same people that want to get up in front of a crowd
or get in front of a mic
we're anomalies
especially you
although
what were you saying earlier
Connor stop referring to our podcast as a pod
I was like what are you talking about
it's such a large part of our life now I refer to our conversations as was
that one on pod or off pod because I can't there's been so many conversations at this point I can't
keep track I did sort of typically uh record uh Connor earlier today where we were just talking
about some problem because I'm like this could be good podcast content and there are times when I really want to talk about something with Connor but I'll
wait because I'm like well if we start talking about this it might be really good and then I
would want to record it for the podcast that's what happened in Venice when when we were going
to go on that walk I literally was like aha I have it and you were like stop talking and I was
like I haven't even started talking you're like you were about to I mean that walk, I literally was like, aha, I have it. And you were like, stop talking. And I was like, I haven't even started talking.
And you were like, you were about to.
And you wouldn't let me start explaining myself until we were recording.
But you know what?
For all the shit you give me about not contributing to the production values of this podcast,
I do make sure we're recording the good content.
Honestly, this is going to result in us at like 50.
Like once our kids are, or I guess actually we're already in our 30s.
So when we're 60, once our kids are like up in university,
we're going to like get our own, you know,
it won't even be Netflix at that point because GPTs will be so good,
we won't even call them that anymore.
Everyone will be able to go make their own reality TV show.
All we need to do is we'll need to put a couple rings up in the corners of our apartments
and feed it some footage, and we'll be able to have our own, the ADSP, the reality TV show.
And it'll be fantastic.
We won't have to edit anything.
It'll just know when Bryce says certain topics, it'll automatically filter it up.
And then that one time it'll mess up and then we'll get canceled.
We will definitely get canceled before 30 years from now.
Yes, I was going to say we'll link to whoever we do bring on from KX.
Phineas will be on.
I mean, he's already been on ADSP now.
He'll be on ArrayCast at some point.
How does it get into array programming language?
Oh yeah, great question. So Bryce just asked
how did you get into array programming
and specifically
actually you don't actually use Q on your day job.
You used to use Q. Last time I checked
you were using Scala. But anyways, yeah.
Give us a little story of how you
came to be a Q god. Because he is a Q god.
I don't know about a god, but
the way
I got into Q is a queue god. I don't know about a god, but the way I got into queue
was actually quite simple.
I was working on a problem which is a little bit related
to the classic programming problem,
which is suppose I want to know the highest floor
I can drop an egg off of, and I have two eggs.
And so there's like a canonical solution to this problem.
But I was thinking, what if the cost of testing
is not exactly like, you don't have two eggs,
but really think about it as the cost is
proportional to your guesses.
So if you overshoot, you pay some cost
proportional to the error.
And the way this is, the interesting case for this,
that's most interesting, is drug dosing.
I want to find the right dosage of a particular drug,
and I need to know how much your titration.
The way that doctors do this is really stupid.
They go really slowly up from zero
because they don't want to hurt you.
But you're obviously getting harmed
by not having the right drug the entire way through.
So what you should be doing is doing binary search.
But obviously, you can't just do binary search and give the guy a lethal dose of the drug and kill them.
So the idea is you need to find some optimal algorithm.
It's a skewed binary search.
It's a skewed binary search.
So I went to go and create a grid search over this whole thing.
And I started writing my code in Python.
And I had heard that Q was this amazing really fast programming language. And I learned enough
Q to implement the language and have
the solution finish before my Python code
finished running. And I was like, okay, I'm hooked.
Oh, wait, so it was running in
Python. It was still running in Python. And then
you learned enough and coded up, and
then it was done while your Python
solution was still running. Damn.
Also, Python was really slow
back then. This is like Python 2.7.
You know what?
Python is still very slow.
They got the Shannon plan,
though. For those that know, they know.
If you know, you know. Bryce doesn't.
What is the
canonical solution to the two eggs?
You gotta have the mic.
What is the canonical solution to the two eggs? You've got to have the mic. What is the canonical solution to the two eggs
problem?
You basically exponentially
go up the floor. So you go, start from
the first floor, then go to
two, go to four, go all the way up until
the first egg drops, and then go back to the first
floor that the egg didn't drop and then
just walk up one floor at a time until
both eggs are gone.
I mean, you can see why that would not be very welcome to hospitals.
You have, like, two patients are dead now.
Start with one barely living patient.
And for the record, I mean, are the doctors worried about hurting patients
or are they just worried about getting sued?
Do we really know? We don't know.
I didn't understand the better solution to this. hurting patients or are they just worried about getting sued? Do we really know? We don't know.
I didn't understand the better solution
to this,
to the dosing problem. What's the better solution?
I wasn't clear on that.
The better solution is depending on the cost
of overshooting, and it's going to vary,
but you can think of it as proportional to the error
that you overdose by.
By proportional, I mean
it might be some function
of the error between the true value.
Suppose you're supposed to give that patient
10 milligrams of something, and you give them 15.
That 5 is the error, and some function of 5 is the issue.
And it might be some crazy function
where it's exponential, right?
The idea is that cost is going to be factored
into your binary search so that instead of picking the absolute median between the two
values you're looking at, you're going to pick a quarter and go to 25% or 10% and so
on.
But for something like overriding somebody, isn't there a step function to that
cost because there's the step function of they're alive versus they're dead yeah so the the way that you could probably you
approximate this is basically with some like yeah with some exponential you you can approximate it
with some exponential cost function so basically like as the error grows larger you just like
it goes to your cost cost goes effectively to infinity.
But that's kind of how you model it.
Be sure to check these show notes either in your podcast app or at ADSPthepodcast.com
for links to anything we mentioned in today's episode,
as well as a link to a GitHub discussion where you can leave thoughts, comments, and questions.
Thanks for listening. We hope you enjoyed and have a great day.
Low quality, high quantity.
That is the tagline of our podcast.
It's not the tagline.
Our tagline is chaos with sprinkles of information.