Algorithms + Data Structures = Programs - Episode 242: Thrust & Parallel Algorithms (Part 4)

Episode Date: July 11, 2025

In this episode, Conor and Bryce chat with Jared Hoberock about the NVIDIA Thrust Parallel Algorithms Library and more!

Link to Episode 242 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)

Socials
ADSP: The Podcast: Twitter
Conor Hoekstra: Twitter | BlueSky | Mastodon
Bryce Adelstein Lelbach: Twitter

About the Guest
Jared Hoberock joined NVIDIA Research in October 2008. His interests include parallel programming models and physically-based rendering. Jared is the co-creator of Thrust, a high performance parallel algorithms library. While at NVIDIA, Jared has contributed to the DirectX graphics driver; Gelato, a final frame film renderer; and OptiX, a high-performance, programmable ray tracing engine. Jared received a Ph.D. in computer science from the University of Illinois at Urbana-Champaign. He is a two-time recipient of the NVIDIA Graduate Research Fellowship.

Show Notes
Date Generated: 2025-05-21
Date Released: 2025-07-11
Thrust
Thrust Docs
CUB Library
CCCL Libraries

Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

Transcript
Starting point is 00:00:00 It says, there's a comment, slash slash, we must prepare for his coming. Everything's lowercase except for the H in his. And then struct gozer. And it's a function object that calls something's destructor. Well, the comment made more sense because I think that used to be a forward declaration. So obviously this is a Ghostbusters reference. Welcome to ADSP: The Podcast, episode 242, recorded on May 21st, 2025. My name is Connor and today with my co-host Bryce, we finish part four of our four-part conversation with Jared Hoberock.
Starting point is 00:00:54 We chat about Thrust, parallel algorithms, and more. That actually kind of leads to another question that I had meant to ask earlier, which is, you talked a little bit about the transfer of Thrust from research to production in the handoff, but that was sort of after Thrust had sort of made it. At the time, there were a couple other libraries that were kind of similar to Thrust, like CUDPP. And I think there was one or two other ones. And I just wonder, can you tell us a little bit about when you guys first put Thrust out there?
Starting point is 00:01:41 What was the response? Did it just very rapidly get a lot of adoption and growing popularity? And like, what, why was it that Thrust is the one that succeeded and became so ubiquitous? And like, how quickly did that happen? I mean, the reason it succeeded is because we chose the right interface. How fast it happened, um, I don't really remember. I remember we gave, uh, at the first GTC, we gave like two talks on Thrust, and I remember the crowd being large and enthusiastic. Um, so it at least had succeeded enough to attract some, you know, some people to a talk.
Starting point is 00:02:24 Do you have the slides from those talks? Somewhere. They might be in... Are they not in the repository? They might not be. I don't know. Oh, I don't think so. They're probably in Perforce somewhere. If you look through Perforce, you'll find them.
Starting point is 00:02:38 Um, or I might have them in a Google Drive or something like that. Anyway. Yeah. They were, Nathan gave a talk and I gave a talk at the GTC, and Nathan's talk was sort of, here's what Thrust is, and I think my talk was like, here's how to do some, you know, interesting tricks with Thrust, and the slides just walk through some of the example programs. But, so how long, so like after that, did you ever reach a point where you guys were just like swamped with PRs? And like, when did it become visible to you that there were a lot of people using this? And when did the writing appear on the wall that it was going to have to be taken over by the
Starting point is 00:03:31 engineering team? It never became visible to me that a lot of people were using it. I would have been happy to... Nathan was really pro-productization, for career reasons. I would have rather not productized it so we could continue, you know, developing it in interesting directions, I guess. I guess the problem is when it got productized, it became less flexible, like changing it became harder. You know, before we actually released it on Google Code, like, GitHub didn't even exist at the time. So the
Starting point is 00:04:06 initial releases were on Google Code. And if we wanted to change something, we would just change something. And if people had a bug, we would fix it, you know, the next day or something like that. You can't really do that with a product. But I know, like, hearing you guys talk about it, or hearing you guys talk about people out in the wild using it, I guess, is when it dawned on me that it has a lot of users. Like, at the time it just seemed like, you know, a project we were working on and were happy to work on. Yeah, I sort of always had that sense. And I mean, I guess it's hard for folks who sit in research because you produce
Starting point is 00:04:51 a thing and then if it becomes marginally successful, then it gets picked up by, you know, some engineering team, and then you go work on the next thing and you don't necessarily see what the... I think it works differently now. So Thrust was one of the first things that NVIDIA Research productized. And we didn't have a lot of experience with handing off things to product teams.
Starting point is 00:05:17 So I think if it were to happen today, researchers could be involved in the product, like to the extent that they're interested in. But in those days, it was sort of seen as like, okay, you guys productize this thing, it's time to work on something new. But I think since then, people probably see the value of keeping the original authors sort of in the loop, maybe a bit more if they want to be with more recent projects. Are you proud of Thrust?
Starting point is 00:05:52 I have mixed feelings about it. I guess what I see is the sort of missed opportunities. And I sort of see the mistakes that could be corrected or should have been corrected. But it's gratifying to see that people still like to use it and think it's worthwhile. But it's hard not to focus on things you wish could have been done better. One frequent thing that gets brought up is, well, Thrust is great for many things and it's a great place to start. If your algorithm ends up being one call, a single-pass
Starting point is 00:06:32 thing, Thrust is perfect for that. If you're launching multiple things, because Thrust doesn't have an asynchrony model, because it blocks, it can become a performance bottleneck. That's an often cited criticism of Thrust. And we sometimes think of it as being an on-ramp for parallelism, that it's easy and gets people started, but oftentimes if you need to architect a much more complex application where you're launching a bunch of different kernels, then you have to go and do something a little bit lower level. And had Thrust evolved earlier and more to support asynchrony in a more structured way, perhaps it could have been even more successful than it is today.
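As a rough illustration of the blocking point above (a sketch, not a statement about any particular Thrust version): classic Thrust calls return only when the work is done, while the CUDA backend's execution policies at least let you direct work onto a user-provided stream via thrust::cuda::par.on(stream), and newer releases add thrust::cuda::par_nosync for algorithms that have nothing to return. Exact names and synchronization behavior vary by Thrust/CCCL version, and saxpy_functor below is a made-up example, not something from the Thrust source.

```cpp
// Sketch only: compile with nvcc against the CUDA backend.
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/execution_policy.h>
#include <cuda_runtime.h>

struct saxpy_functor {
  float a;
  __host__ __device__ float operator()(float x, float y) const { return a * x + y; }
};

int main() {
  const int n = 1 << 20;
  thrust::device_vector<float> x(n, 1.0f), y(n, 2.0f), z(n);

  // Classic usage: the call does not return until the device work is finished,
  // which is the blocking behavior being discussed.
  thrust::transform(x.begin(), x.end(), y.begin(), z.begin(), saxpy_functor{2.0f});

  // Stream-aware usage: run the algorithm on a user-provided CUDA stream.
  // Depending on the Thrust version this may still synchronize that stream
  // before returning; thrust::cuda::par_nosync (in newer releases) exists to
  // avoid that for algorithms with no return value.
  cudaStream_t stream;
  cudaStreamCreate(&stream);
  thrust::transform(thrust::cuda::par.on(stream),
                    x.begin(), x.end(), y.begin(), z.begin(), saxpy_functor{2.0f});
  cudaStreamSynchronize(stream);
  cudaStreamDestroy(stream);
  return 0;
}
```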
Starting point is 00:07:24 Oh, for sure. And Thrust predates asynchrony in CUDA. Thrust predates CUDA streams, right? And it was always not clear how to make it work well with CUDA streams, and I don't think it ever really did, or does, work well with CUDA streams. I guess another regret is that in addition to Thrust, we also have CUB, and these two things have interdependencies, like there's cyclic dependencies between the two libraries, which causes problems. I guess it's just a shame that we couldn't find a way to do CUB-like things in Thrust or, you know, Thrust-like
Starting point is 00:08:14 things in CUB. It's sort of an awkward situation to have both libraries. Well, and it's actually interesting that you mentioned that because, you know, we – so when I started at NVIDIA in 2017, I was – Thrust had been essentially without a maintainer for like two or three years. And like on day – well, I didn't even know what I was being hired to do when I started. Like it was unclear from the job description. But – and like my hiring was weird, but just like Olivier had talked to someone and been like, hey,
Starting point is 00:08:49 you should interview this guy. And then I interviewed. And then they're like, oh, you want a job. And they gave me an offer. And then I showed up. And on day one, they're like, so here's this thing, Thrust. You're now in charge of it. There's no product manager.
Starting point is 00:09:01 Nobody's really touched it in two years. Figure out what to do with it. And so then I started the CUDA C++ core libraries team at Nvidia and originally it was just me and then it was me and one other person for some period of time and we productized CUB just as Thrust had been productized and did some amount of work to unify things between
Starting point is 00:09:25 the two. But now, you know, eight years later, that team, the CUDA C++ Core Libraries team, or I think it's now just the Core Compute Libraries team, same acronym but different meaning, it's now, I don't know, a more-than-ten-person team. It could be 15 people or so. It's a lot of people working on it, and they work on Thrust, CUB, and libcu++, which is our port of libc++. And there's a move for all three of these libraries to essentially merge together into
Starting point is 00:10:02 one unified thing. And they're all in a monorepo now. In fact, there's some talk about, not the Thrust name, the Thrust name will essentially always exist because there's so many people who have written code for it, but of migrating the Thrust backend and merging it with some of the other things. So parts of the Thrust code base that have been around for a long time are now essentially going away in the future.
Starting point is 00:10:33 And all of these things merging together into a more unified structure down the road. I think the last thing I would ask you about is that Thrust was open source sort of from the start, and what was that like? I mean, this was a very early time. It was before GitHub. You mentioned it was Google Code. But how did you collaborate with people back in the day? How did users find you? How did you interact?
Starting point is 00:11:23 What was that like? We had a Google Groups mailing list where people would ask, how do I do this in Thrust? That sort of thing. Or maybe they would report bugs. There wasn't, like, Discord or Slack or anything like that. Um, so just through, um, you know, mailing lists. I guess that's how people used to collaborate in the 2000s, you know, through a mailing list or something like that. I remember when we, uh, pushed the code for the first time, we
Starting point is 00:12:01 announced it on Hacker News and some subreddit and just, you know, waited to see what would happen. But as far as I know, Thrust was the first open source thing that NVIDIA ever published. I don't know of any others; there might be a few previous publications, but I don't know of them. So that was, you know, we had to get special permission for that. You are in fact correct that Thrust was in fact the first open source project at NVIDIA. And we know this because Thrust completely predated any open source process at NVIDIA. And every year or two, somebody's like,
Starting point is 00:12:55 hey, why isn't there an entry in our listing of open source things in this database for Thrust? And we're like, well, because it was the first one. It existed before all these things. Yeah, well, just the nature of the software demanded that the code be available, right? Because it's C++ templates. So there's no way to hide it and there's no way to use it unless you have the source. So, had to be open source. Yeah.
Starting point is 00:13:27 And I mean, you mentioned that Thrust lifted from other projects. And in fact, Thrust contains, you know, Thrust Iterator code was just taken from Boost and some other parts were taken from some other places. And it probably would have not evolved as quickly if you had not been able to pull in all this other stuff. That's right. Yeah. I'm not a
Starting point is 00:14:01 expert GPU programmer. I can just kind of get by, but I'm good at taking other people's code and making sense of it. So that's mostly what we did. Jared, if you're not an expert GPU programmer, I don't know what we are. There are levels. There are levels and yeah. What does that mean for us, Jared? What does that mean for the people? Well, the whole idea of Thrust is that you don't have to be
Starting point is 00:14:23 an expert programmer to use a GPU. You just take something off the shelf. Someone else has already figured that out. It's really the... what was the Comrad tagline? The original Comrad tagline. Own the means of computation. It was actually very fitting. When I worked on the HPX project, the original name for the programming model that Thomas
Starting point is 00:14:53 Sterling wanted to use was, he wanted to call it Agincourt for the Battle of Agincourt which was a, I don't know what century battle between the British and the French. The French had an army that was primarily mounted knights and the English army was primarily peasant longbowmen and they defeated the nobility. So Thomas Sterling had this very idyllic story of this. It was this programming model where tons of tiny little cores and tiny little tasks would take over and topple the great evil that
Starting point is 00:15:38 was MPI and sort of synchronous message passing programming models. There was, you know, I couldn't find it. I tried to ask ChatGPT to search for it while we were chatting, but there were a couple interesting snippets of code from like the early days of Thrust that had some quirky comments that, if I ever come across again,
Starting point is 00:16:02 we'll have to have you back on to ask you to explain. What are you thinking of? It was like something like Frobenator or something like that. There was some weirdly named class somewhere, and then there were a couple of them. I think I know what you're talking about. You're talking about Gozer?
Starting point is 00:16:21 Yes! I am talking about Gozer! What was Gozer? Is the code still there or is it gone? I'm looking right now. I really... If they deleted that, I'm gonna have some choice words. Oh no, I think it's gone! How did you spell it? Was it one O or two O's?
Starting point is 00:16:41 Gozer is spelled G-O-Z-E-R. Yeah. Ha ha ha ha. I'm sure it's in the GitHub history somewhere. It's still here. So could you explain yourself? It says, I'm going to read this.
Starting point is 00:16:56 I think it's pretty. It says, there's a comment, slash slash, we must prepare for his coming. Everything's lowercase except for the H in his. And then struct gozer. And it's a function object that calls something's destructor. Well, the comment made more sense, because I think that used to be a forward declaration. So obviously this is a Ghostbusters reference.
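For anyone curious what a function object like that looks like, here is a minimal sketch of a functor that invokes an element's destructor. This is an illustration written for this transcript, not the actual Thrust source; only the name and the comment come from the episode.

```cpp
#include <new>
#include <string>

// we must prepare for His coming
template <typename T>
struct gozer {
  // Invoking the functor on an object runs its destructor in place, leaving
  // raw storage behind (hence the Ghostbusters joke: Gozer the Destructor).
  void operator()(T& x) const { x.~T(); }
};

int main() {
  // Construct a std::string in raw storage, then let gozer destroy it.
  alignas(std::string) unsigned char storage[sizeof(std::string)];
  std::string* s = new (storage) std::string("who ya gonna call?");
  gozer<std::string>{}(*s);
  return 0;
}
```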
Starting point is 00:17:32 Oh man. Obviously I was born in 1990. I don't know what this means. Gozer the Destructor is the reference. So this is a functor that calls a destructor. I remember I got an email a long time ago. I never thought anyone would see this. I just thought I was being funny.
Starting point is 00:17:54 But I got an email from a compiler engineer at NVIDIA because, I don't know, something about Thrust was exposing a compiler bug and it involved this destructor that we're looking at here. And JD, like, posts the repro in the email, and he's got Gozer staring back at me, and I'm like... We didn't discuss the name or... I never thought this would... The real question is, Bryce, did you know what this meant? I had no idea. I, like you... I'm the same age as you, Connor. I mean, you're one year, you're one year younger? Older, younger? What year were you born, Connor? I'm 90. You're... You're 91, right? Yeah. 91. Yeah, you're one year younger. I was gonna say I would feel very bad about myself if you understood the reference and I did not.
Starting point is 00:19:06 I read this and I was like, I don't understand the joke, but here we are. I don't know how many people we have listening. I think over time it'll roughly be three or four thousand, so I mean. It is, it is line 102 in thrust/detail/allocator/destroy_range.inl. I'm sure you've got some boomers in your audience. Jared, I've always wondered this, so I'll ask even though we're over time, but why not.
Starting point is 00:19:40 Two things I'll ask. One, Thrust has a lot of these headers that are .inl, like inline. Was this because you at some point had people that would include these in their .cpp files and you wanted to separate them from the declarations? Or was there at some point a version of Thrust that had a mode where, just like Boost.Asio, you could build like a library version of it in addition to the header-only
Starting point is 00:20:12 version, or, like, what does that .inl naming convention mean? I think that's just a Jaredism. I don't remember why I chose to do that. The other question is, Thrust uses .h for its header files and some other CUDA projects use .cuh and some other extensions. Why .h? Why not .cuh? Why not make up your own extension?
Starting point is 00:20:47 .cuh? I've always thought that... I've always disliked .cuh. C-U-H. But at least at some point there was a goal that you could include any Thrust header with a normal C++ compiler and it wouldn't freak out. And I don't know if that's still the case, but you didn't have to include it with NVCC. So it seemed unnecessary to. That's actually a good point that we didn't talk about this, but Thrust has this backend system that we mentioned, but we didn't talk specifically about the fact that Thrust can target CPUs, it can target CPU parallelism. And it's designed in a way where no part of the Thrust interface says CUDA.
Starting point is 00:21:36 And Thrust has this, it uses the host and device terminology, where it's very abstracted from the GPU. You can have a device backend that's not a GPU backend. Was that intended to help adoption? Why do that? Because it seems like it created a bunch of work too. Because it was helpful when just debugging your program for correctness to be able to use, like, a sequential backend or, I don't know, an OpenMP backend, for example. So I think it definitely helped with adoption.
Starting point is 00:22:19 Just getting the CUDA stuff right in your CUDA C++ program is like a task on its own. And really, if you wanted to, you could write everything, just get your algorithms skeletoned together first, making sure that it works in parallel, before actually targeting a CUDA GPU for real. I mean, I could totally see that being helpful to people. Like, going all in on GPU parallelism from the get-go is often a big, big first step. So it's helpful to phase things in sometimes. And I think the CPU backends can help with that.
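A hedged sketch of that debugging workflow: the same Thrust call can be checked against the sequential host backend, and the device backend itself can be retargeted at the CPU at compile time with -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP (or _CPP), before ever touching a GPU. The macro spellings here follow the Thrust documentation as I understand it and may differ in older releases.

```cpp
// Sketch: compare a device-backend result against a sequential host reference.
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/equal.h>
#include <thrust/execution_policy.h>
#include <cassert>
#include <cstddef>
#include <cstdlib>

int main() {
  thrust::host_vector<int> h(1 << 16);
  for (std::size_t i = 0; i < h.size(); ++i) h[i] = std::rand();

  // Reference result computed sequentially on the host.
  thrust::host_vector<int> expected = h;
  thrust::sort(thrust::host, expected.begin(), expected.end());

  // Same algorithm through the "device" backend. With a default CUDA build
  // this runs on the GPU; compiling with
  //   -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP   (or ..._CPP)
  // retargets the device backend at the CPU, which is handy while debugging
  // for correctness, exactly as described above.
  thrust::device_vector<int> d = h;
  thrust::sort(d.begin(), d.end());

  // Thrust's documented postconditions make checks like this easy to write.
  thrust::host_vector<int> result = d;
  assert(thrust::equal(result.begin(), result.end(), expected.begin()));
  return 0;
}
```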
Starting point is 00:23:02 What was debugging like in the early days? Well, we used to have device emulation in CUDA. So NVCC used to have a CPU backend. You could put it into a mode where it would generate x86 code, and you could actually step through your code in a debugger. So that's what debugging was like in the early days. But then they phased that out very soon.
Starting point is 00:23:27 They used to use an entirely different compiler backend for NVCC. Open64. Yes, that's what they used. Anyway, I think maybe that made that device emulation mode easier. But I don't know, I've never used debuggers. I've always used printf debugging, so that's what it was like for me. Connor has some thoughts. You've always used printf debugging? You've never used debuggers? When I used to use the Microsoft IDE, I had, you know, a pretty good debugger and I would use it then. But no, these days I don't use an IDE.
Starting point is 00:24:09 I use printf and the terminal to program. I also use printf. Is that because you watched The Last of Us season one, episode four, where it was survivalist technology, and you really identified with that, and you wanted to be like, modern technology is not for me and I'm going to bunker down? There used to be this guy at NVIDIA, I think he was a devtech, he had a blog, but he was an actual survivalist. He wouldn't use libc.
Starting point is 00:24:40 He wouldn't use any headers at all, like he would just insert system calls if he wanted to print in his program. I guess because it made his programs fat? I'm not actually sure why, but one thing is it made, you know, compiling and linking his program very fast. But if he needed to debug, I think he would print characters to the screen, and he would do it from a syscall or something like that. So there's levels to survivalism, too. So you're saying you're not even close to what
Starting point is 00:25:21 is really possible in terms of coding survivalism. Not using debuggers and just using printf is just the start. You're not even at the true level of survivalism in coding. I guess I've benefited from the fact that the things I've had to program in the past, they're really well-defined interfaces. And like, think of every Thrust algorithm. Like, the postconditions are spelled out explicitly, what they are, so it makes it really easy to write unit tests for what you're doing. Jared, when you encounter a bug that you have to debug, is it typically, like, a parallelism
Starting point is 00:26:03 bug? No, it's typically a compiler bug. Those are the bugs I run into, too. When you run into a bug where you have to put a printf into your program, is it usually something that's, like, parallel? I don't know if I would say that or not. Bugs can come from, you know, bugs can come from any reason. I will say that now that I spend most days in Rust, I don't spend any time debugging. But when I am writing, if I'm writing like MLIR code
Starting point is 00:26:33 and C++, the first thing I have to do is debug whatever I wrote, because it's not gonna work the first time for sure. And then yeah, I do use printf. I usually use printf because when I'm debugging something, it's usually something that, like, if it's something that's not a parallel bug, it's usually pretty quick for me to get to the bottom of it. But for the most part, if it's a bug, it's a bug that is
Starting point is 00:27:02 happening due to some, like, race condition or synchronization thing. And so my process is, first I run Compute Sanitizer or, like, a memcheck tool that tells me whether I'm randomly writing somewhere in memory. And if that's the case, I get to the bottom of that. And then if that isn't the case, then I usually start adding printfs to see if I've got some incorrect assumption. And if I open it up in a debugger, usually the race condition, like, the timing has changed. And so it's not there. Like, whatever the timing condition is that caused the problem has typically gone away.
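A sketch of that workflow with standard CUDA tooling, under the assumption that the usual compute-sanitizer tools (memcheck, racecheck) are available in your toolkit; the file name and kernel here are made up for illustration, and device-side printf is the fallback once the sanitizers come back clean.

```cpp
// Hypothetical file: race_demo.cu
// Build:  nvcc -g -lineinfo race_demo.cu -o race_demo
// Check:  compute-sanitizer --tool memcheck  ./race_demo   (stray reads/writes)
//         compute-sanitizer --tool racecheck ./race_demo   (shared-memory races)
#include <cstdio>
#include <cuda_runtime.h>

__global__ void sum_kernel(const int* in, int* out, int n) {
  __shared__ int partial;
  if (threadIdx.x == 0) partial = 0;
  __syncthreads();

  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) atomicAdd(&partial, in[i]);
  __syncthreads();

  if (threadIdx.x == 0) {
    // Device-side printf: the quick way to test an assumption once the
    // sanitizer runs are clean.
    printf("block %d partial sum = %d\n", blockIdx.x, partial);
    atomicAdd(out, partial);
  }
}

int main() {
  const int n = 1 << 10;
  int *in = nullptr, *out = nullptr;
  cudaMallocManaged(&in, n * sizeof(int));
  cudaMallocManaged(&out, sizeof(int));
  for (int i = 0; i < n; ++i) in[i] = 1;
  *out = 0;

  sum_kernel<<<4, 256>>>(in, out, n);
  cudaDeviceSynchronize();

  std::printf("total = %d (expected %d)\n", *out, n);
  cudaFree(in);
  cudaFree(out);
  return 0;
}
```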
Starting point is 00:27:56 One of the first jobs I had at NVIDIA, I was working on the OptiX team. So OptiX is like a ray tracing framework. And so one of the first jobs I had was to sit and babysit an emulator waiting for a race condition in the chip that some ray tracing code exposed. And you just sit down there for several hours just waiting for something to happen or not. But that's what I spent a week doing shortly after coming to NVIDIA. Race conditions are tricky. I had a colleague who, one time, I was, uh, I was at the office, it was like 7 p.m., and he's sitting across from me. And, um, he, uh, he like stands up very abruptly and I'm
Starting point is 00:28:48 like, where are you going? He's like, I have to go, like, I have to go monitor this bring-up machine in some other building. And I'm like, well, why can't you monitor it from here? And he's like, you don't understand, it could catch fire at any moment. So I later came to understand that when you're on bring-up duty, one of the jobs is you just go and sit next to the bring-up machine and there's a fire extinguisher there, because it's got these giant heat sinks and fans and nobody's figured out what clock speed you can run this
Starting point is 00:29:22 thing at safely yet. So sometimes, like, bad things might happen and it might overheat and it might spark and catch on fire, but we figure out all those bugs before they, you know, get into gamers' hands. Yeah, exactly. Oh, all right. I think that's all I got. Connor, you got anything else you want to ask Jared? I mean, we've kept him 30 minutes past what we said we were gonna interview him for, and I mean, I got my sequence question answered. I got my counting iterator question answered. I feel like we got a lot of questions answered. I mean, we're going to have to bring you on in the future.
Starting point is 00:30:06 We talked, we alluded to, if you were going to start from scratch today. And, you know, you do work in research, so you're probably working on something. And at the point in time where you're working on 2.0, not of Thrust necessarily, but of your next project, we will definitely have to have you back on. Oh, but maybe that's a good thing to close with, which is, Jared, do you want to, you got anything you want to plug? You want to say anything or anything about-
Starting point is 00:30:38 It's not what you want to plug. Do you have anything that you can plug? And if you can't plug, what's the general direction in which you are working on things? So I think I mentioned, like, I have kind of a wish list for a programming language that would make software like Thrust much easier to write and maintain. So I would like to try to do something in that space.
Starting point is 00:31:04 And hopefully that will be successful. We'll see. If people are interested, they can just look at what's going on on my GitHub. There's not a lot going on there, but there are some hints there of what might be next. Some hints on Jared's GitHub, link in the show notes, folks, go and drop some stars. Here's the real question: if I really do release this as, like, a two-and-a-half-hour episode, what do I call it? You should not do that. I mean, I shouldn't, but I feel like if I titled this episode,
Starting point is 00:31:45 I mean, I won't, but I could title it, NVIDIA engineers discuss, like, CUDA programming, given the zeitgeist. Do you know the term that I heard over the last week? It's called Jensanity. Have you heard that? He went to Taiwan and he gave his keynote at Computex on Sunday night. And on Monday, Tuesday, all these shows were talking about Jensanity. Because he announced not just that we were launching our global headquarters in Taipei, but also that we opened some platform to third-party vendors or
Starting point is 00:32:26 something. Yeah, NVLink. NVLink, yeah, NVLink Fusion. I'm an engineer. I don't really understand what's going on, but the point is, the MSNBCs of the world and the Bloombergs of the world, they love us, folks. And so if we called this podcast, Three NVIDIA Engineers Discuss the Parallel Algorithms That Power the World, it would blow up. I'm not gonna do that, but I could. Oh, we gotta ask Jared, what's your favorite algorithm? Oh yeah, yeah, yeah.
Starting point is 00:32:59 My favorite algorithm? Yeah. Inside of Thrust and outside of Thrust, if they're different algorithms. Metropolis... Metropolis light transport is my favorite algorithm outside of Thrust. What was that? Metropolis? Metropolis light transport.
Starting point is 00:33:22 Metropolis light transport. That sounds like something from Altered Carbon, like the Netflix show. No, it's something from graphics, computer graphics. Is that like a global illumination thing? Yeah, it's like a global illumination thing. It applies Metropolis-Hastings to light transport. Light transport is just what they call rendering, a fancy term for rendering. You can tell someone's an old-timer at NVIDIA if they don't currently work in graphics but have, like, deep graphics chops.
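For listeners who haven't run into it: Metropolis light transport applies Metropolis-Hastings sampling to light paths. A minimal, generic Metropolis-Hastings sketch (sampling a 1D unnormalized density, nothing renderer-specific) looks roughly like this:

```cpp
#include <cmath>
#include <cstdio>
#include <random>

// Unnormalized target density. In Metropolis light transport the analogous
// quantity is the image contribution of an entire light path.
double target(double x) { return std::exp(-x * x / 2.0); }

int main() {
  std::mt19937 rng(42);
  std::uniform_real_distribution<double> unit(0.0, 1.0);
  std::normal_distribution<double> step(0.0, 0.5);

  double x = 0.0, mean = 0.0;
  const int n = 100000;
  for (int i = 0; i < n; ++i) {
    const double candidate = x + step(rng);               // propose a mutation
    const double accept = target(candidate) / target(x);  // acceptance ratio
    if (unit(rng) < accept) x = candidate;                 // accept, or keep current state
    mean += x;
  }
  std::printf("sample mean ~= %f (expected near 0 for this target)\n", mean / n);
  return 0;
}
```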
Starting point is 00:34:00 Be sure to check these show notes, either in your podcast app or at adspthepodcast.com, for links to anything we mentioned in today's episode, as well as a link to a GitHub discussion where you can leave thoughts, comments, and questions. Thanks for listening. We hope you enjoyed and have a great day. Low quality, high quality. That is the tagline of our podcast. It's not the tagline. Our tagline is chaos with sprinkles of information.
Starting point is 00:34:23 All right. All right. All right. We should let Jared go. The green grass in the background is gone, folks. Whether this was one episode or five episodes, it started out lush and green in the background and now it is a dark sky in the background of Kansas. We appreciate Jared for taking all this time tonight to chat with us. We will definitely have you back in the future.
Starting point is 00:34:47 I think we got to have more NVIDIA internal folks. We should get Duane Merrill on. We should get Chris Cech on. We should get a bunch of these folks on. We're a podcast hosted by two NVIDIA engineers. We should have more NVIDIA folks on. I will say very clearly that this is not a work activity.
Starting point is 00:35:14 This is something we do outside of work. I mean, yes, if that was not clear from what I just said, it should be clear now. But the point being is, we do have access to, we know folks that work at NVIDIA, hence why Jared... Let's ask Jared. Who should we interview next? Oh yeah, that's a good question. Ian Buck. Do you think Ian would come on the podcast? A second question to Jared: do you think he would come on the podcast? He might, but... He probably has some good stories. Oh yeah. So I mean, if my history is correct,
Starting point is 00:35:48 you did Thrust. Ian, before CUDA was a thing, did something called Brook. Yeah. Right, which was basically CUDA before it was CUDA. I think it was quite different from CUDA. I think it had a much different programming model than CUDA does. Thank you for doing this, Jared.
Starting point is 00:36:11 Thanks for having me. You're welcome.
