Algorithms + Data Structures = Programs - Episode 176: 🇺🇸 prior, deltas & Dinner with Phineas

Episode Date: April 5, 2024

In this episode, Conor and Bryce chat with Phineas Porter about the functions delta, prior and more over dinner.Link to Episode 176 on WebsiteDiscuss this episode, leave a comment, or ask a question (...on GitHub)TwitterADSP: The PodcastConor HoekstraBryce Adelstein LelbachAbout the Guest:Phineas Porter is a Software Developer at Jump Trading. Previously, he held roles in quant research and technology at UBS and Citibank. He graduated from Columbia University in 2014 with a degree in Operations Research. He lives in New York City with his wife, daughter (Ada) and son (Solomon).Show NotesDate Recorded: 2024-03-06Date Released: 2024-04-05Franchia Vegan Cafethurst::reduce_by_keythrust::inclusive_scanthurst::inclusive_scan_by_keyKXCON23 | Nick Psaris | Matching Algorithms in q and kdbKXCON23 | Phineas Porter | Dynamic Programming Approach to Content Aware Image Resizing | kdb at Jump Tradingthrust::reduceADSP Episode 131: One Algorithm To Rule Them All!Q priorC++23 std::views::adjacent_transformC++98 std::adjacent_differenceC++98 std::partial_sumC++17 std::variantQ deltasC++23 std::views::zipNumPy diffsArrayCast Episode 76: Conor McCarthy, PyKX and kdb+ 4.1ADSP Episode 147: 🇸🇮 SRT23 - Parallel std::unique Revisited (on a Walk in Venice)The Two Egg ProblemIntro Song InfoMiss You by Sarah Jansen https://soundcloud.com/sarahjansenmusicCreative Commons — Attribution 3.0 Unported — CC BY 3.0Free Download / Stream: http://bit.ly/l-miss-youMusic promoted by Audio Library https://youtu.be/iYYxnasvfx8

Transcript
Discussion (0)
Starting point is 00:00:00 Honestly, I don't want to offend our C++ listeners, but the best community is the array community. It's, you know, they think the right way, you know. You know, Phineas here, he swims in the land of folds and scans. You know, we just like, we spend a little bit of time there because we work at NVIDIA and it's important to us. But like C++ programmers, they're like a scan? What's that? It's like a partial sum, you know. Not everybody. The listeners of this podcast probably know. But most of the C++ devs they're like, a scan? What's that? It's like a partial sum, you know? Not everybody. The listeners of this podcast probably know.
Starting point is 00:00:31 But most of the C++ devs out there, you know, they're still catching up on... One potato, two potato. Yeah, one potato, two potato. welcome to adsp the podcast episode 176 recorded on march 6 2024 my name is connor and today with my co-host bryce we chat about the algorithms from the q language prior deltas and more with phineas porter over dinner i don't know i don't have a plan, Connor. But I've had a revelation, and the people need to know about the revelation. Do you want to do the introduction? This is just like when we were doing the Slovenia 2023 road trip, and every time he'd start recording, he'd be like,
Starting point is 00:01:22 all right, recording again. And I'd be like, no, Bryce, we have to do the intro we are at francia this isn't getting picked up this isn't getting picked up we got one mic now and it's tiny uh we're at the francia vegan cafe a place that ramona does not like uh that bry Bryce just said she's never even been here. She's not a big fan of vegan food. She's Polish, so she likes her meat and potatoes. And Bryce's cholesterol is too high, so he's not allowed meat.
Starting point is 00:01:55 I myself did go to Shake Shack for lunch and got two double hamburgers with bacon. I calculated the calories. It was 1,280 or something. It was worth every calorie. Anyways, we're here at a cafe. It's not just me and Bryce. We're with a never-before-guest on ADSP the podcast.
Starting point is 00:02:14 And honestly, probably he should be on a Raycast, not ADSP the podcast. We have enough incentive for him to be on the podcast. I mean, he hasn't said no at this point. You've got to keep your voice down. You've got to keep your voice down. You've got to keep your voice down. We had some noise complaints earlier, and... I got PTSD from when this just happened right now. Anyways, no, we're going to let Phineas introduce him.
Starting point is 00:02:45 Connor doesn't seem to understand how consent works. You have to ask people. You can't just assume that a non-response means yes. Phineas, do you want to be on our podcast? Sounds good. Also, T, I mean, he hasn't, I don't, have you, you haven't been mentioned by name on ADSP, the podcast. You probably haven't been mentioned, what do you call it? Synonymously.
Starting point is 00:03:05 Like, yeah, without, it? Synonymously. Yeah, synonymously. And that was the same thing on a Raycast, but then at some point you did get mentioned by name, because I think I asked you if it was okay. So I have subtly asked for consent. Anyways, I'm going to hand it back to Bryce. We just finished dinner. There's some surprise event happening later. Nobody knows what's happening.
Starting point is 00:03:21 And Franchio, it was pretty delicious. It was pretty delicious. It's quite charged okay good back to bryce so uh after we had our revelation earlier today that um that reduced by key is implemented in terms of scan in cub in terms of scan by key in cub which is even worse um i know i we've spent a lot of time in this pot i had a revelation and we've spent a lot of time in this podcast talking about how like everything's implemented in terms of you know a couple of primitive algorithms but like more and more it's just everything is implemented in terms of scan and i realized that all of the CUB algorithms, I think, in some way, shape, and form
Starting point is 00:04:06 use scan other than just straight reduce and for. Histogram, I think, uses scan. The sorts definitely uses scan. Scan uses scan. All the run length encoding, run length decoding
Starting point is 00:04:22 uses scan. What other algorithms are in Cub? Copy if uses scan. Like they're all just fancy scans is what I'm saying. Everything is just a fancy scan. First of all, listeners, I'm disappointed. This is what we turned the mic on in the middle of dinner for. This is a little bit disappointing.
Starting point is 00:04:46 Yes, thank you very much. It was delicious. Yeah, that's true. We did have ice cream last night. Anyways, disappointed that this is what we turned the mic on. We're not giving it back to you yet. I've got the power. But transforms? Transforms are not giving it back to you yet. I've got the power. But transforms?
Starting point is 00:05:07 Transforms are not implemented in terms of the scan. Should we hand it over to Phineas? Also, Phineas, in case we don't let him say anything, because I feel like Bryce is not going to let him, we will make sure we put a link. That's cool. We will put a link to a talk he was featured in. Nick Cyrus gave a talk at KXCon 2023.
Starting point is 00:05:32 It was fantastic. You also gave a talk. What am I even talking about? I was going to say, I remember it from the Q Gods. He's a Q God, but he also gave a fantastic talk himself. Both will be linked. Hopefully, Phineas will get to say something, but Bryce is throwing a tantrum across the table
Starting point is 00:05:46 because I'm hoarding the mic. I don't even remember what I was going to say. What was I going to say? What were we talking about? Oh, yeah. I'm talking about solely Cub algorithms, which are like the underlying low-level primitives. And in Cub's world, like a transform and a 4H are both just 4.
Starting point is 00:06:03 Like Cub has a device 4, and both are just forms of that. Anyways, I'll let other people talk now. I don't have anything to say. Should we interview Phineas? Let's interview Phineas. Probably the episode of Which is the Mother Algorithm. Oh, yeah. Wait, you want to discuss that again?
Starting point is 00:06:19 No, you should just point out this was already the claim. Oh, yeah, yeah, yeah. One of your guests already claimed this. We already... I think you actually said claim. Oh, yeah, yeah, yeah. One of your guests already claimed this. We already... I think you actually said this. Yeah, I probably did. Just talking to this thing. I think Bryce said that, actually, on an earlier episode,
Starting point is 00:06:35 that scan was the ultimate algorithm, because you can implement everything as a scan. Or maybe he claimed that reduce was a scan, because if you just keep enough state around, your reduce can also work as a scan, technically. So it really reduces every algorithm. Yeah.
Starting point is 00:06:49 Yeah, I believe that. So there is all this reduces. I mean, yeah, we will link to ADSP episode, what was it? I don't know. 120? That's a guess.
Starting point is 00:06:59 We'll see how, I'll look it up after, but Ben was on. I'm going to look it up right now. Ben Dean was on the episode. Tristan Brindles was on the episode. All right, folks. As you might be able to tell, Bryce was in charge of the audio. He brought his
Starting point is 00:07:15 own lapel Rode microphone. Questionable, questionable audio quality. And somehow, he silenced it or muted the mic for a minute and a half here. So you're going to miss a minute and a half of conversation. We apologize. I apologize on behalf of Bryce. Back to the conversation. Talk about whether prior should keep the first element or not. Yes.
Starting point is 00:07:39 Yeah. I mean, I think obviously prior should keep the first element. You can reconstruct the original list that way, and it makes all the lengths the same, so it's just very convenient for the shape. But Connor here thinks that you should drop the first element because it's not really part of the reduction. So, first of all, prior is the correct name for adjacent difference, and I've actually adopted the name prior in my own...
Starting point is 00:08:04 Sorry? Who wants to drop the first element? So Bryce... Sorry? Who wants to drop the first element? So Bryce has asked who wants to drop the first element. The answer is me. And the answer is we did. Because we added in C++23 a view called adjacent transform, which is the modern version of adjacent difference. And that does drop the first element.
Starting point is 00:08:25 If you don't drop the first element, what do you do? Keep the first element. You can reconstruct. So cumulative sums and deltas gives you the original list if you have the first element. Otherwise, you're missing a constant. It's like integrating a calculus. If you drop C, you don't have the C anymore. You're done. And this is brilliant because
Starting point is 00:08:42 deltas... You just want to keep it just on its own? You just want to want to keep it just on its own? You just want to keep the first element just on its own? Return the len... you have an iterate... you have a container, it has n elements in it, and the first element is the same element as it was in the original list. But what if the type changes? This is the problem with typed languages. That's the problem.
Starting point is 00:09:04 That's the problem. So this is, that's exactly it. So adjacent difference has a constraint and because it copies that first element in order to maintain the circular round trip between partial sum, etc. Or like ratios and we'll give you back the original series. So if you take ratios between all the numbers, you know what the percent changes are, and then you can go back to the original list. This is like a very convenient feature to have around. When you do, a lot of times you're looking for like,
Starting point is 00:09:33 I want to find all the cases where something big happened, and then I want the actual numbers. I don't have to keep around two lists around with me the whole time, I just keep around the one element and then come back. Bryce is ordering his dessert here. I will have the chocolate fudge cake. I will have the tiramisu.
Starting point is 00:09:52 Connor? Okay. Connor being boring. Where were we? I've had the two Shake Shack burgers. Did you have a Shake Shack? I don't like shakes. Bryce asked me. You don't like shakes?
Starting point is 00:10:11 What is wrong with you? This thing has auto gain. It's supposed to be idiot proof. I had two burgers at Shake Shack. Then I had lunch at a Thai place where I had a couple chicken wings and pad thai. Yeah, yeah. I told you I was meeting up with an acquaintance, Ashley, if you're listening.
Starting point is 00:10:35 And what? Ashley. She's another member from the KXQ community. Honestly, I don't want to offend our C++ listeners, but the best community is the array community. They think the right way. Phineas here, he swims in the land of folds and scans.
Starting point is 00:10:55 We just spend a little bit of time there because we work at NVIDIA and it's important to us, but C++ programmers, they're like, a scan? What's that? It's like a partial sum? Not everybody. The listeners of this podcast probably know. But most of the C++ devs out there, they're still catching up on it. Two potato type things.
Starting point is 00:11:12 Yeah, one potato, two potato. What's the resolution to the fire thing? How did you two resolve your differences? Oh, we didn't. Oh, Bryce. I asked, how did you two resolve your differences? And the answer is, I mean, I think we came to an agreement that... I think for type languages, it makes sense to be...
Starting point is 00:11:31 You have to have at least two types, right? You need to be able to maintain the fact that, like, if you change types, you can't keep this first element. But, like I said, it's, like, much more memory efficient to, like, not keep around the original list if you can go back and forth between the two. And in Q and array languages, your first type can be different. You can end up with a heterogeneous list, right?
Starting point is 00:11:52 So you can't do that in type languages. And I think so what we ended up agreeing on is that prior and adjacent difference... What? I don't see a reason that you couldn't have a list and an iterator in C++ where the, well. You can create an iterator, but you can't create a container that has nice properties. What are you going to wrap it in? What are you going to wrap it in, like a std variant? Absolutely not.
Starting point is 00:12:20 So the point being is for convenience, adjacent difference and prior, especially in array languages that are dynamically typed, is very nice. However, if you want to design the perfect generic algorithm that you are able to implement all the different flavors of kind of deltas and adjacent difference or adjacent transform, you want the one that doesn't bundle that first type there. And then from there, you can kind of do everything you want. There's another solution.
Starting point is 00:12:47 So the other solution is that your function, the operator you're going to apply between all the adjacent elements should have a default-like type for that. Like identity value. If you have that, then the identity value can push it into the first element, into the correct identity type for the first element. So if you want to do a prior transform where you're bundling your prior with a binary operation, which is typically what you're doing, right? Prior and adjacent transform, they're higher order functions that take a binary operation. If your binary operation and your type form, what is it, a monad or a semigroup or whatever it is from category theory, that comes with an identity value.
Starting point is 00:13:33 And so if you've got integers with plus, that identity value is going to be zero. And so you can just use that value as the first element. And then instead of ending up with n minus one elements, you end up with n. just use that value as the first element, and then instead of ending up with n minus 1 elements, you end up with n. And that's the other thing I haven't commented on, is the convenience thing about the way prior works is that, like, in array languages, you like to end up with equally lengthed arrays
Starting point is 00:13:56 because then your scalar operations work, and you don't end up having to jump through hoops. We care about that less in, like, a ranges-like library because it's a sequence. What about for zip? Does zip have laziness built in? I actually don't know how our zip works.
Starting point is 00:14:15 Of course. Sorry, not laziness, like short-circuiting. Like if one length is different. It goes to whichever is the shortest. Yeah, yeah. And then there's equivalent algorithms. Right, so Phineas just asked, what about zip? I said something about laziness, but in my head I meant short-circuiting because in functional languages like Haskell and whatnot,
Starting point is 00:14:35 they usually have two different algorithms, zip that short-circuits to whatever the shorter length is, and then they also have one called zip longest, which they have varying flavors. Sometimes you can give it, like, a default value to put in there. Sometimes it'll just fill it with, like, some Knoll or something like that. And we're going to hand it back to Bryce because he's had a couple slices of his cake, and I'm guessing we're going to get a review here.
Starting point is 00:14:58 It's a pretty good cake. I mean, it's hard to believe it's vegan. What do you have against vegans, Bryce? I literally brought us all. I don't have any problem with vegans. I literally brought us all to a vegan restaurant, and I'm trying to convert my meat-only girlfriend to at least sometimes enjoy a vegan meal. So I don't want to alarm anyone,
Starting point is 00:15:23 but my phone has 30% battery left, and we will need some percentage battery to navigate to various places and various things. So we've got to interview rapidly. I mean, anything we've talked about prior. I mean, I've got to say, too, prior. Actually, in my toy work project, which hopefully isn't a toy in the future, I used the name Prior because I think it's honestly better than Adjacent Transform. It's shorter.
Starting point is 00:15:53 I do love short names. I changed Deltas to Diffs to match NumPy a little bit more, which I actually think Diffs is a little bit too close to one other thing. But things are matching, quality matching. Yeah, yeah. So it's not – all right, I'm going to take a bite of this cake. I'm not going to identify it as a vegan cake. Cake is cake, folks.
Starting point is 00:16:17 You know, we don't discriminate here over in Canada. I guess we're in America right now. Oh, that's not clear. But you should mention that I will be in Canada next week. Yeah, it's pretty crazy. Well, now I got a mouthful of cake. Oh, this cake is good. And I'm not a big tiramisu fan, you know.
Starting point is 00:16:34 It's too complicated. It's got... What do they call them? What are the sticks? The fingers? The lady fingers? The lady fingers, yeah. Don't get me wrong. I'll eat a tiramisu if... I love tiramisu. But yeah, Diff's deltas, prior. I mean, this actually probably is going to come out in like a month and a half from now
Starting point is 00:16:54 because we've got so much content once again. But I will link to the episode of ArrayCast. We're going to be having on someone from KX. We initially were hoping to get Andrew Wilson from the KX Core team. I think he said no because he doesn't like us. Not because he doesn't like us. I'm just joking around, obviously. And then we're juggling a few people.
Starting point is 00:17:16 Why did he say no? I don't know. Some people just... I think he has to speak at KXCon because he works for KX and they force him to but language people aren't necessarily the same people that want to get up in front of a crowd or get in front of a mic
Starting point is 00:17:31 we're anomalies especially you although what were you saying earlier Connor stop referring to our podcast as a pod I was like what are you talking about it's such a large part of our life now I refer to our conversations as was that one on pod or off pod because I can't there's been so many conversations at this point I can't
Starting point is 00:17:53 keep track I did sort of typically uh record uh Connor earlier today where we were just talking about some problem because I'm like this could be good podcast content and there are times when I really want to talk about something with Connor but I'll wait because I'm like well if we start talking about this it might be really good and then I would want to record it for the podcast that's what happened in Venice when when we were going to go on that walk I literally was like aha I have it and you were like stop talking and I was like I haven't even started talking you're like you were about to I mean that walk, I literally was like, aha, I have it. And you were like, stop talking. And I was like, I haven't even started talking. And you were like, you were about to. And you wouldn't let me start explaining myself until we were recording.
Starting point is 00:18:31 But you know what? For all the shit you give me about not contributing to the production values of this podcast, I do make sure we're recording the good content. Honestly, this is going to result in us at like 50. Like once our kids are, or I guess actually we're already in our 30s. So when we're 60, once our kids are like up in university, we're going to like get our own, you know, it won't even be Netflix at that point because GPTs will be so good,
Starting point is 00:19:00 we won't even call them that anymore. Everyone will be able to go make their own reality TV show. All we need to do is we'll need to put a couple rings up in the corners of our apartments and feed it some footage, and we'll be able to have our own, the ADSP, the reality TV show. And it'll be fantastic. We won't have to edit anything. It'll just know when Bryce says certain topics, it'll automatically filter it up. And then that one time it'll mess up and then we'll get canceled.
Starting point is 00:19:28 We will definitely get canceled before 30 years from now. Yes, I was going to say we'll link to whoever we do bring on from KX. Phineas will be on. I mean, he's already been on ADSP now. He'll be on ArrayCast at some point. How does it get into array programming language? Oh yeah, great question. So Bryce just asked how did you get into array programming
Starting point is 00:19:49 and specifically actually you don't actually use Q on your day job. You used to use Q. Last time I checked you were using Scala. But anyways, yeah. Give us a little story of how you came to be a Q god. Because he is a Q god. I don't know about a god, but the way
Starting point is 00:20:04 I got into Q is a queue god. I don't know about a god, but the way I got into queue was actually quite simple. I was working on a problem which is a little bit related to the classic programming problem, which is suppose I want to know the highest floor I can drop an egg off of, and I have two eggs. And so there's like a canonical solution to this problem. But I was thinking, what if the cost of testing
Starting point is 00:20:31 is not exactly like, you don't have two eggs, but really think about it as the cost is proportional to your guesses. So if you overshoot, you pay some cost proportional to the error. And the way this is, the interesting case for this, that's most interesting, is drug dosing. I want to find the right dosage of a particular drug,
Starting point is 00:20:50 and I need to know how much your titration. The way that doctors do this is really stupid. They go really slowly up from zero because they don't want to hurt you. But you're obviously getting harmed by not having the right drug the entire way through. So what you should be doing is doing binary search. But obviously, you can't just do binary search and give the guy a lethal dose of the drug and kill them.
Starting point is 00:21:11 So the idea is you need to find some optimal algorithm. It's a skewed binary search. It's a skewed binary search. So I went to go and create a grid search over this whole thing. And I started writing my code in Python. And I had heard that Q was this amazing really fast programming language. And I learned enough Q to implement the language and have the solution finish before my Python code
Starting point is 00:21:30 finished running. And I was like, okay, I'm hooked. Oh, wait, so it was running in Python. It was still running in Python. And then you learned enough and coded up, and then it was done while your Python solution was still running. Damn. Also, Python was really slow back then. This is like Python 2.7.
Starting point is 00:21:47 You know what? Python is still very slow. They got the Shannon plan, though. For those that know, they know. If you know, you know. Bryce doesn't. What is the canonical solution to the two eggs? You gotta have the mic.
Starting point is 00:22:03 What is the canonical solution to the two eggs? You've got to have the mic. What is the canonical solution to the two eggs problem? You basically exponentially go up the floor. So you go, start from the first floor, then go to two, go to four, go all the way up until the first egg drops, and then go back to the first floor that the egg didn't drop and then
Starting point is 00:22:19 just walk up one floor at a time until both eggs are gone. I mean, you can see why that would not be very welcome to hospitals. You have, like, two patients are dead now. Start with one barely living patient. And for the record, I mean, are the doctors worried about hurting patients or are they just worried about getting sued? Do we really know? We don't know.
Starting point is 00:22:43 I didn't understand the better solution to this. hurting patients or are they just worried about getting sued? Do we really know? We don't know. I didn't understand the better solution to this, to the dosing problem. What's the better solution? I wasn't clear on that. The better solution is depending on the cost of overshooting, and it's going to vary, but you can think of it as proportional to the error
Starting point is 00:23:00 that you overdose by. By proportional, I mean it might be some function of the error between the true value. Suppose you're supposed to give that patient 10 milligrams of something, and you give them 15. That 5 is the error, and some function of 5 is the issue. And it might be some crazy function
Starting point is 00:23:20 where it's exponential, right? The idea is that cost is going to be factored into your binary search so that instead of picking the absolute median between the two values you're looking at, you're going to pick a quarter and go to 25% or 10% and so on. But for something like overriding somebody, isn't there a step function to that cost because there's the step function of they're alive versus they're dead yeah so the the way that you could probably you approximate this is basically with some like yeah with some exponential you you can approximate it
Starting point is 00:23:58 with some exponential cost function so basically like as the error grows larger you just like it goes to your cost cost goes effectively to infinity. But that's kind of how you model it. Be sure to check these show notes either in your podcast app or at ADSPthepodcast.com for links to anything we mentioned in today's episode, as well as a link to a GitHub discussion where you can leave thoughts, comments, and questions. Thanks for listening. We hope you enjoyed and have a great day. Low quality, high quantity.
Starting point is 00:24:24 That is the tagline of our podcast. It's not the tagline. Our tagline is chaos with sprinkles of information.

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.