Algorithms + Data Structures = Programs - Episode 178: Henry the Clock, chunk_by & more!

Starting point is 00:00:00 So, yes, just to recap what has just happened here. Bryce was traveling in Japan, bought a French clock in Japan from the 1900s. Well, no, technically the 1800s, which is the 19th century. Yes. And then traumatized the clock, which has a name and is now part of the podcast lore, Henri the Clock. Welcome to ADSP, the podcast episode 178, recorded on April 17th, 2024. My name is Connor and today with my co-host Bryce, we chat about Henri the Clock, Chunk Buy, and more. Do you know where I just was?

Starting point is 00:00:53 In conference, ACCU? No, no, I was in Dublin. Okay, yes, I am at ACCU in Bristol right now, but I was just in Dublin, which is technically the first ADSP road trip or the first Bryce and Connor road trip. You and I took a bus from the 2019 Belfast C++ committee meeting. We took a bus from Belfast to Dublin to go visit your sister. And if I recall correctly, I basically invited myself and you were somewhat uncomfortable about your plan, which was just to go hang out with your sister, and then suddenly you had this Bryce along with you. That's a pretty accurate description of what happened, folks.

Starting point is 00:01:44 Yeah. Yeah, but no, I am in Bristol now. a pretty accurate description of what happened, folks. Yeah. Yeah, but no, I am in Bristol now. I was in Japan for the Japan C++ Committee meeting, and then I took a bit of vacation. I went to the beautiful art island of Naoshima in the Japanese Inland inland sea um we also spent some time in kyoto where i bought a uh a clock um a clock that is going to cause a great deal of difficulty for uh this podcast connor because this is a striking clock. It strikes every 30 minutes.

Starting point is 00:02:28 And it does not have a silent mode. So there are going to be times when we are recording and when the clock suddenly starts going off. And I have a small apartment. There's nowhere in the apartment that I can put the clock where it won't be audible. Can I ask a question? Yes. What prompts someone to... I got actually two questions.

Starting point is 00:02:52 I got two questions. I got lots of questions, but I'm only gonna ask two. My first question, and I'm gonna ask these in order and then you can answer in whichever order you want. First question, what prompts someone to buy a clock that makes noise every 30 minutes? And question number two, it never is quiet? Do you not plan on sleeping

Starting point is 00:03:13 anymore? I mean, it's a small clock. It's a French carriage clock, an 1890s French carriage clock. So it's maybe four or five inches tall and like two inches wide. Um, and maybe like three inches, uh, deep. So it's not, it's not a big clock. Does that have, does that have anything to do with how much noise it makes or, or cause that doesn't answer either question. Well, yes. I mean, if it, if it, If it was a larger clock, it would make more noise. My mother just returned to the hotel room. I guess worthy of mentioning, which is that my mother is here with me at ACCU.

Starting point is 00:03:55 Does your mom want to be on the pod right now? Maybe. But let's close out the clock business. Right, so it's a small clock. But let's close out the clock business. Right. So it's a small clock. I mean, you want to know why would somebody buy a clock that chimes? And this is an excellent question because some viewers, listeners, some listeners, not viewers because this is not a TV show. Some listeners will recall that I have a dog who has made some appearances on the podcast.

Starting point is 00:04:29 And one of the things that sets off the dog is the doorbell going off. Another thing that sets off the dog, amusingly enough, is the chime that the Lyft app makes when it successfully hails you a ride. And this noise that the Lyft app makes, it does not seem to respect silent notifications. It seems like they have a bug in their app where that particular noise does not pay attention to your notifications volume. It just uses your device-like media volume. And so I can't really mute it. And every time the Lyft app dings, every time I get connected with the driver, it dings and then the dog goes crazy.

Starting point is 00:05:17 And so presumably the dog is not going to like Henri. Henri is the name of the clock, if you were wondering. And yes, I gave the clock a name. This poor clock was traumatized being transported all over Japan and then back home. It's currently not in my apartment, but it's with a clock doctor because Honorea's got an ailment that requires some care. But it makes a very lovely sound. And I was just very charmed with the clock. I was very charmed with the vintage clock store that we got the clock from in Kyoto. We not only got that clock, we also got a small cuckoo clock, not one that makes noise, just a small German cuckoo clock that Ramona

Starting point is 00:06:17 really wanted that got thrown in for free because we were buying the much more expensive clock. And then Ramona got a brooch watch, which is kind of exactly what it sounds like. It is a brooch that one can pin to their clothes that is also a watch. And I thought that there was going to be greater resistance from my partner on getting this lovely but potentially annoying clock. But no, she thought that the noise was very lovely and very soft, and she did not think that it was going to be a problem. So we're going to find out when we get home and when I get Henri back from Um, and yes, some like more, uh, more modern, if you will, um, antique clocks, um, will have a silent mode. Um, but this is a fairly old one and not a particularly fancy one. Um, so it does not have a silent mode. So yeah, at night, you just get used to it.

Starting point is 00:07:32 And I mean, it really didn't bother us when we were sleeping in a hotel room with it. So I mean, there's a disclaimer I have to put out there, which is that there's a lot of people who I have told throughout the years that like, hey, if you're ever in New York, feel free to crash at my apartment. And I should note, listener, this is not a blanket invitation. If you have not explicitly been invited to crash in my apartment, this is not for you. But if you have explicitly been

Starting point is 00:07:56 invited to crash at my apartment, you should now be aware that if you are sleeping in my apartment, there will be a French carriage clock that will be making noise every 30 minutes. So it's still a great apartment. I still highly recommend you take me up on the offer. I'm almost never in the apartment, so you'd probably have the place to yourself, but there is also the clock. So yes, just to recap what has just happened here.

Starting point is 00:08:22 Bryce was traveling in Japan, bought a French clock in Japan from the 1900s. Well, no, technically the 1800s, which is the 19th century. Yes. part of the podcast lore, Henri the clock and is having surgery right now in New York, back where Bryce is not because he's in the UK and stay tuned to find out if Henri makes it. And also you'll probably be hearing from Henri quite often if it's every 30 minutes. And we typically record for more than 30 minutes at a time. Yes.

Starting point is 00:09:04 I do think he's going to be fine. He was he's still keeping good time. It's just his striker. That's a little bit jammed right now. So this this poor clock, we we purchased this clock. And then like the day after we purchased this clock, we had to take a train and then like two buses and then a ferry to get to this island. And then like we were in a different hotel like every night and just in constant motion and traveling the entire time carrying this clock, which needs to remain upright and is very fragile

Starting point is 00:09:37 and can't be bumped. It was a whole ordeal. A very Bryce story. Man, so I've been working on my ECCU talk, which the premise for my talk, it's think parallel. And it's all, it's sort of like attempting to teach people the mindset that you and I have been cultivating for how to approach parallel programming. So it's meant to really be a deep dive into particular algorithms and data structures. And originally when I pitched the talk, I was going to like have it, I was going to do one version of the talk and it was going to sort of cover all the bases. However, that's not how things have panned out. So now the version that I'll be presenting at ECCU will be Think Parallel Part One Scan. And so instead of a talk that touches on a bunch of different topics about

Starting point is 00:10:27 parallel programming, it will touch on one topic. It will be 90 minutes of scan, where we talk about how to implement scan, how we can use scan, all sorts of things about scan. And I am very excited for this talk. And if you are a listener, some of the things that we have talked about in this podcast in the last year will certainly come up during this talk. And right in the like three or four days before I got back from Japan, like a Sunday, and then my mom got in on a Tuesday. And then on Thursday, we flew out to Ireland. And so I had like between that Sunday and that Thursday to make my talk. I'd already given an earlier version of the talk, but I needed to like write this algorithm, this new chunk by algorithm that you and I've talked about a little bit. And I needed to like write it and then like make slides for it. And then like also like we had like dinner plans and like I had like no free time. I had to unpack and repack. But I got some good news, which is that it did require me staying up till 2 a.m. But at 2 a.m., I cracked it. And I had a little silent party because at this point,

Starting point is 00:11:58 my mother was asleep on the couch. My dog was asleep on the couch. My girlfriend was asleep in the bed. And everybody else thought I was a little bit crazy that I'd been working on this all day and all night and had been just distraught during dinner that I could not crack this algorithm. I was there at my favorite sushi place, my mother and my girlfriend having a good time and chatting about life and non-work things as one should at a nice sushi place. And I'm sitting there just like in the zone with a piece of paper and a little notepad, sketching out math and stuff. And they're just like politely letting me be, but I cracked it. I cracked it. And

Starting point is 00:12:41 Connor, this algorithm, when you see it, it is going to blow your mind. It's beautiful. It is beautiful. And I benchmarked it too, and it's fast. It's like, I haven't been this excited since I cracked that rolling reduce algorithm last year. And that rolling reduce algorithm will probably be a whole other, I'll probably do a think parallel like reductions talk. All of that rolling reduce is actually really kind of a scan because it's implemented in terms of two scans. So this whole think parallel talk series might just all be scans. It's all just scans. That's all the that's all it really, really is. Anyways, I'm excited about it. I got into the conference today and I did not go to the conference at all because I needed to like actually finish making the slides. And I've got most of it together now. And I'm pretty happy with the result. Well, first of all, congratulations. Second of all, you said that we've mentioned Chunk Buy

Starting point is 00:13:59 a couple times. We had a whole three-part series chunk by yes part one two and three look man it's all my my life for the past six weeks is all a blur like i have been i was i was in japan then i was in ireland now i'm here a lot of like at some point i was in new york i was writing a lot of code doing a lot of things. It's been chaos. But yes, I do recall now that there was a three-part series. And the algorithm that I poorly describe, I think, in that recording is the algorithm that I have now successfully implemented. So the thing that made it trickier than I had initially anticipated is the boundary conditions, which is that, yes, you can implement this with a scan, but you sort of have to be clever to handle the first element and the last element.

Starting point is 00:15:07 But also then if you're tiling it up and doing it in a single pass, you have to do some trickiness to handle the beginning and the end of each tile. And at one point, I thought that there might, there's this thing in Cub called block discontinuity, which I think is how you would do this if you were doing it in Cub. But I just wanted to write this straight in standard C++.

Starting point is 00:15:29 All these algorithms, everything in my talk is implemented purely in standard C++ parallel algorithms. But yeah, I was eventually able to sort of crack that aspect of it. And I'm not sure, did we talk, when we talked about Shunk Buy, I know I posited a solution for how we could hung up on the fact that we were digging, we were going down the rabbit hole of looking at the reduce by key thrust algorithms and whether or not those were implemented with, I think we determined that it was implemented with a inclusive scan by key. And then we were talking about whether you could implement it as a reduce i think that's where we kind of left stuff right right well it turns out you can do chunk by with the scan um and the key intuition that one must realize to get to the chunk by with the scan

Starting point is 00:16:41 is that first you must realize that you could implement chunk by with multiple scans. Like if I could do like three scans, three very simple scans, I could do chunk by. It might be inefficient, but three very simple scans and I could do it. You know, like I could do scan, i could do a scan to figure out um uh you know how many groups they're going to be and what the indexes those groups are going to be and then i could do a scan to figure out the you know the start and the end of each one of those groups or i could do one scan to figure out the start and one scan to figure out the end and all of those would be fairly simple scans and then the only tricky part that you're presented with is how do I write a single scan that does

Starting point is 00:17:33 all three of those scans at the same time? And that also does the flag setting part too. And that is where the challenging part is. Because then you have to define your own new data type. And then you have to define an operator on that data type, a reduction operator on that data type that does the right thing. And one of the places where I got hung up a good bit is on the identity operation, that I defined my data type, and then I defined my plus operator on that data type, and I defined it in a way that it worked. However, to define a data type that's going to work correctly with a parallel scan, you have to support a identity operation that, at the very least, can be a left identity. So, you need some element that you can take that element

Starting point is 00:18:34 and do that element plus something, and you always get back something. And the reason that you need that is because you're going to have some temporaries that each, you know, once you split up your data, you're going to have some temporaries that each thread is going to accumulate stuff into and then communicate to the other threads. And in the case of something like a decoupled lookback scan, you're not going to only have those temporaries

Starting point is 00:19:02 within a thread, but then you're going to have temporaries that will get created and stored and communicated across different threads and will get added up in interesting ways to implement this decoupled look back mechanism, which is a whole separate podcast episode we should talk about and will also be covered in my talk. But for all of that to work, you need to be able to have a nothing object that you can add to something and get back that same something. And the original version of my code didn't work correctly once I imported it to the single pass version because I had not thought about this question of like an identity element. And I did not have the operation defined in a way

Starting point is 00:19:55 where I could add a nothing thing to it and still get back the identity. And so then I had to sit down and that was sort of the insight that I had the night that I was at the sushi place was that I needed to have some identity operation. And then I had one other insight, which was just sort of an insight about how I could reset a particular count in a scan. So sometimes in a scan, you're counting things, and at some point, you might want to reset them. Like, you know, like, it's very simple to do, like, a scan that just, like, counts all of the things, right? That's just like a scan with, like, you know, a plus operation. That's just sort of our typical scan. Well, maybe that's not a way to put it. Like, if you do a scan where you have an integer plus

Starting point is 00:21:00 and where every element is transformed into, you know, one, then that'll give you a count. Sorry, where was I? Oh, yes, yes. So sometimes you want to do something with a scan where you have some reset condition. And I was having difficulty getting the reset condition correct for chunk by, like when do I reset this count for the next chunk? But then at some point I realized that I had this other piece of information that I was tracking in my interval type. And that piece of information was what is the index, the current index of like in the output, like what number group am I on? You follow? Like one of the things that you need to keep track of is how many chunks have I created thus far? Yeah, makes sense. Yeah. Yeah. And so, okay. And so then I realized, aha, if I have the index of whatever chunk I'm currently on,

Starting point is 00:22:10 then I can tell when I need to reset easily because all I have to do is say, hmm, has the index, like if I have two elements, L and R, if the index of L is different than the index of R, then I'm at a place where I need to reset. With the caveat being that there's a special case if one of them has an index of zero because you may be accumulating something on the right. It's like a local thing. But that's the basic idea that got me to the correct solution, as opposed to the thing that I was doing prior to that, which was far more complicated and stateful, and just simply didn't work. Well, it sounds like it's going to be a super interesting talk. I mean, one of the things that I thought about...

Starting point is 00:23:01 I fear that you may be the only person that'll understand it. No, I don't think that's true at all. Definitely, we should, one of the people that i thought about i i fear that you may be the only person that'll understand it no i i don't think that's true at all definitely we should one of the people that i thought that i think it's he's come up on the podcast before for sure we just haven't reached out to him we should have on bartaj maluski because we are struggling through the terminology of like identity value and stuff but all of this stuff has names like you're referring to basically as a monoid. And then I even looked it up. I was like, what is a monoid that requires commutativity? And ChatGPT or GPT-4 to be precise said that those are referred to as commutative monoids or abelian monoids.

Starting point is 00:23:43 So basically what std reduce requires is an abelian monoid. And I actually think that in this particular case that I actually don't need it to be commutative that in particular it's an I only need because scan is non-commutative because it guarantees that things are going to be done from left to right, I only actually need it to be like a left identity, that this identity element only has to work when it's on the left-hand side of the operation. Interesting. And one thing to watch for in my talk, and I have the code on GitHub. If you go and look on the code on GitHub, one thing to watch for is whenever there is an addition or a scan operation.

Starting point is 00:24:35 In my code examples, I just use addition for simplicity. But whenever there's a scan operation, the choice of which thing is on the left and which thing is on the right is very, very purposeful. And in some places, it seems unintuitive. Like there's a couple places where you might look at the code and you might be like, well, why is he using assignment operator and plus instead of using plus equals? And the reason is that using plus equals there would do things in the wrong order. It would put the thing that should be on the left on the right. And I'm fairly certain that I got it correct in all the places. And I think if I didn't get it correct, the code

Starting point is 00:25:20 would not work because the data type that I'm using for chunk by is not at all commutative, I believe. element times your type equals that same type and it says that this is typically referred to as a left identity monoid which yeah i would do is gpt4 making that up or does it actually the case i'm sure there's someone that's a category theory expert out there but uh which is why we just bring on bartage because probably to the category theory informed person. And that's the thing is I worked my way through his book. And so when I hear you saying like, oh, it's a type, but I didn't, I didn't define this like nothing, you know, all of this stuff is actually already like there's a, you know, category theory for it. And all this stuff has names, and we should do our best to try and know those names if we're going to be. It pains me to say it.

Starting point is 00:26:25 And it pains me especially to say it with my mother in earshot. But this is one of those rare times when my college education actually taught me something useful. For those who are unaware, I dropped out of college the first time. But then the second time, I was working at LSU and was a student part-time and then eventually got a degree there. And my degree was in applied math. And I actually think I would have been stuck and given up on this problem if not for the realization that the issue that I was having was that I did not have a sufficient, that I could not from my actual math schooling. So yeah, there is something that I think I learned in school. And as soon as I had that realization,

Starting point is 00:27:37 I did something that like every now and then, every couple of months or so, there's a time where I come across something in work that requires me to go and do math. And that's what I was doing at the sushi restaurant. I was like, once I had that realization, I was like, all right, I'm going to sit down and I'm going to approach this not as a math, can I define these operations in a very abstract, pseudocode-y fashion? And then can I come up with a proof that this works? And I do somewhere in my apartment in New York have the sushi restaurant paper where I did the proof. Well, there you have it, folks. Lots to look forward to. If you're at ACCU, well, when is your talk before or after Friday? It's going to be Friday.

Starting point is 00:28:35 It's going to be Friday. Is it Friday? Friday. In the afternoon? Six hours ahead, Bristol, right? Yes. Yes. This podcast will air before the talk oh perfect uh so you know

Starting point is 00:28:48 go to bryce's talk if you happen to be at accu if not link will be in the show notes for when it goes online we'll get an update on on i do want to circle back to my my fear that that my talks are not understood which um i i fear that i lose people when i give talks. Or I know that I lose people. And it's funny because on the one hand, I think one of the things that makes me a good speaker is that I try not to have too much assumed knowledge and I try to do things very from first principles. And I think that's one of the things

Starting point is 00:29:18 that's been successful throughout my time in tech I think has been my desire to not assume too much knowledge when presenting things. But on the other hand, I can feel and see when I'm losing people and when I'm moving too fast. And when things seem like very, very obvious set of conclusions to me does not necessarily track for everybody. And I do I do sometimes worry that that I lose people and that and that I'm not sufficiently communicating my ideas. Yeah, I wouldn't be too worried about it. As long as it's the people who don't think they're losing folks. It's like, you know, crazy people aren't worried about being crazy.

Starting point is 00:30:19 Or no, wait. Yeah, yeah. That's exactly it. It's like if you think you're crazy, you're probably not crazy. Meaning that at least it's on your radar i know all the time that i'm losing folks when i start talking about you know the hierarchy of combinators and combinatory logic and a bunch of folks pointed out in the comments but like one of my favorite comments on one of these videos was like you know not useful in my day job and was a bit confusing but uh i appreciate the guy's enthusiasm

Starting point is 00:30:46 like he's clearly excited and uh you know as long as i feel like yeah they are able to pick up something even if they end up getting lost at some point i mean there's tons of talks that i've watched not because necessarily the presenter was doing a bad job but the requisite knowledge that you needed in order to be able to keep up i just didn't have at the time and then you go back you watch their talk later and potentially it hits like in instead of only understanding 10 you understand 90 so it's uh you just hopefully you put you know at least intermediate or expert on the label and then people won't be disappointed yeah i pretty much always do. Yeah. Yeah. But it's also like one of those things

Starting point is 00:31:27 where it's like, I think I barely have an understanding of how and why these things work. It's not so bad for chunk buy, actually. For chunk buy, I think it's pretty straightforward to understand. For the rolling reduction one,

Starting point is 00:31:43 there's a component of the rolling reduction thing that I's a component of the rolling reduction thing that I kind of have a mental model for how it works, but maybe I don't. I cannot explain how it works. It works. Once I see how it works, it's sort of like, oh yeah, that makes sense. But I cannot explain to you how one gets from the problem to this is how you should write the solution. Yeah, it's too bad. That was one of the episodes we lost, which was quite entertaining. But anyways, maybe we'll revisit that.

Starting point is 00:32:19 Oh, we lost that? Oh, yeah. Yeah. We'll definitely revisit it at some point because at some point I'm going to give a talk about rolling reduce. And to give that talk, I'm going to have to come up with an explanation. Anyways, so I mean, if this is going to go in the episode on Friday, I guess we should mention that if you want to meet Bryce's mom, she'll be at his talk on Friday. And you could, in theory, find her and me, uh, after my talk.

Starting point is 00:32:49 So yeah, go to Bryce's talk, get to meet Bryce's mom. All right. I will, I will splice this back into the first episode. Uh, and, and they'll have to be, you'll know that they'll have to be like, uh, I don't know what you call them. An avid listener listener because there's probably a short window of time when when the uh the podcast goes live and then your talk starts so i will like i will be genuinely impressed if somebody comes up to me after the talk and

Starting point is 00:33:19 mentions that they have listened to this episode yeah i, I don't even think it'll happen because they'll be at the conference. There might be a prize. I'll say this. If you do, there's a prize. There is a prize. All right, you heard it here first, folks. And what time actually is your talk at? 11 a.m.

Starting point is 00:33:38 Okay, no. London time. I told you, bro. I told you. The time difference is five hours. The podcast gets released at 9 a.m. EST. Could you make an exception? Sure, why not?

Starting point is 00:33:55 I will release it at 9 a.m. local time for ACCU, which will be, what is that, five or six hours earlier? Well, you know what? Sure. You could probably schedule it. I probably could, but yeah, I'll look into it. I'll make it happen, folks. If I have to wake up at 3 a.m., it's not the most difficult thing I've had to do for ADSP. Be sure to check these show notes either in your podcast app or at ADSPthepodcast.com

Starting point is 00:34:24 for links to anything we mentioned in today's episode, as well as a link to a get up discussion where you can leave thoughts, comments and questions. Thanks for listening. We hope you enjoyed and have a great day. Low quality, high quantity. That is the tagline of our podcast. It's not the tagline. Our tagline is chaos with sprinkles of information.

Your Ad Here

Algorithms + Data Structures = Programs - Episode 178: Henry the Clock, chunk_by & more!

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.