Algorithms + Data Structures = Programs - Episode 227: Re: The CUDA C++ Developer’s Toolbox
Episode Date: March 28, 2025
In this episode, Conor and Bryce chat about Bryce's talk The CUDA C++ Developer's Toolbox from NVIDIA GTC 2025.
Link to Episode 227 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)
Socials
ADSP: The Podcast: Twitter
Conor Hoekstra: Twitter | BlueSky | Mastodon
Bryce Adelstein Lelbach
Show Notes
Date Generated: 2025-03-20
Date Released: 2025-03-28
NVIDIA GTC 2025
NVIDIA GTC Trip Report
⭐ The CUDA C++ Developer's Toolbox - GTC 2025 - Bryce Lelbach
Thrust
RAPIDS.ai
CUTLASS
CUB
nvbench
How to Make Beautiful Code Presentations
Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8
Transcript
Like I think honestly, this is your best work, Bryce.
It's your best work maybe of your career.
And because CUDA, let's be honest, has a lot of work to do
when it comes to its onboarding and education experience.
And I think that this is the start of something beautiful.
Welcome to ADSP: The Podcast, episode 227, recorded on March 20th, 2025.
My name is Conor, and today with my co-host Bryce, I ask him questions about his GTC 2025 talk entitled The CUDA C++ Developer's Toolbox, and more.
One, two, three, four, five screenshots, five screenshots.
So not eight or nine, but definitely the most I've taken of any other talk I've watched is one,
which leads me to, I mean, I'm not gonna screen share
because we don't need to.
You'll know the slides I'm talking about
and maybe we'll include it in the show notes.
By far, my favorite slide,
arguably better than Jensen's equivalent folks.
And Jensen said it was his favorite slide
if you watch the keynote.
And Jensen, if you're listening to this for some reason,
your slide was beautiful too,
but for the discussion that's about to ensue,
you'll understand why I like this one better.
It is the four by six grid of logos of libraries.
And this is why I've been up, I've lost sleep.
I think I've had like four or five hours of sleep.
I couldn't get to sleep for like an hour.
I woke up at another point in the night,
like couldn't get back to sleep for an hour
because all I am doing is thinking about this slide deck.
First of all, where did you get all the logos from?
Oh man, so much work went into that slide. I know, I know. I saw the slide and I've never,
I've lost sleep over this slide. So just so that the listener is not like, what are they talking
about? We've got a four by six grid on a single slide of 24 different libraries. And we'll list off a couple.
Thrust, libcu++, RAPIDS, cuFFT, cuTENSOR, cuDNN,
CUB, CUTLASS, etc.
Some of them have logos.
Some of them have modified versions of their logos to make it look better.
Some of them just have text.
Anyways, and I haven't seen most of these logos. We do have to clarify that they are not logos. They are graphic signifiers.
All right, fantastic. This is exactly what I was hoping to get into.
Some of them are logos. Thrust? Is thrust a logo? That's a logo.
It is a graphic signifier.
Okay, none of these are logos. Even rapids? Rapids doesn't count?
Nope.
They're all graphic signifiers. All right, I'm sure that's for some legal reason that I'm not involved.
But walk me through what went into this slide.
So none of these are official logos.
Why do some of them not have graphics?
Where did you get these graphics from?
Was AI involved?
Yeah, so actually, AI was involved, but the place where AI was involved
was a different place than those graphic signifiers.
The place where I used AI most heavily
was on the slide introducing Thrust and Cub,
where I list the four different kinds of things that
are in Thrust and Cub.
And there's a little graphic signifier for each one.
And those were all generated by O1,
although it required me holding a baseball bat over the model.
It's an interesting question of why I would generate images
with a large language model, right?
Let me put it differently: there are large language models specifically designed
for generating images. There's one I've used in the past, I forget the name of
it. I did not use one of those models and the reason I did not use one of those
models is because I wanted to generate very simple vector graphic images.
In particular, I wanted to generate SVGs.
And so I viewed this as being more of a coding problem
than a image generation problem.
And so I thought it would be better to use something
like O1 or Claude.
I just used O1 out of convenience; that seems to be the day-to-day model that I use for
things.
So I had it generate those SVGs and it took a bit of back and forth.
I wanted them in a specific color scheme, which it was able to do. But then there were things like, I told it to
make a 200 by 200 SVG, and it would make something, but it would put padding around it, right?
What I want in these little graphic signifiers is for the graphic to touch the boundaries
in either the horizontal or the vertical dimension,
because I'll add padding myself.
And it took a little work to convince it that, actually, I know I don't want any padding.
And there was a bunch of iteration back and forth. Like, one of the mistakes it made frequently was it would get the
layering wrong. Some of these graphic signifiers
have dots and then arrows going between them, and
sometimes it would have the arrows be on the layer in front of the dots, so you'd
see the start of the arrow in the middle of the dot, and it doesn't
look good.
And I would be like, no, put them in this order.
And I was surprised that if I told it a complex instruction
like have the dots in front of the arrows, and then the
arrows in front of the other element, and then the other
element in front of this, it kind of got the idea.
And then the graphics on that page that you saw: all the CUDA math libraries have their own
little graphic signifier, those are the ones on the left,
and I love those because they have a very nice uniform style.
And for slideware, I love having these simple vector graphic images.
And I've been trying to think about how to describe the property of what I'm looking for in a slide graphic, and I
think it's like: one,
I want something with no shading, no shadows or anything
like that.
Sometimes they're 3D, sometimes they're 2D;
usually it's like a 2D flat perspective.
I usually use very few colors.
And I want it to be a vector graphic so that I can scale it
and have it look very nice.
And I want it to be simple.
And one of the reasons that I like it to be simple is
because if it's simple, then I can make it as small or as
large as I want.
And I don't have to worry about there being a loss of the detail of it.
Some of the libraries didn't have graphic signifiers,
and I just made one; like that little graphic signifier for nvbench
did not exist a week ago.
And then some of them, like the cuDNN and the TensorRT ones,
I found there was a blog post on cuDNN,
and a blog post on TensorRT.
And those are not official logos for those libraries.
But the blog post had a nice image
that was at the header of the blog post.
It was an official NVIDIA blog post.
And I was like, this is a nicer way than just putting
cuDNN in text up on the slide.
So yeah, and in at least a couple of these libraries, I
just went to the person who maintains the library.
And I'm like, hey, look, your library is going to be on this
slide.
Do you have a graphic signifier that you want me to use?
And I think the next time I give this talk,
there will be more of these that will actually
have graphic signifiers and not just text.
Because at least one person promised me
that they would create the graphic signifier.
And I didn't have a chance to check back in with them.
But they have an idea of what it would look like.
We just haven't produced it yet.
But I know how much you, Conor, love having programming
language logos.
And I was talking with some people internally about this.
So CUDA C++ does not have a programming language logo. And one of the points that I made to people
is that I think the best programming language
logos and the best graphic signifiers in general
are the ones where you don't have to put text below it
to say what it is.
Like the C++ logo, the C logo, the Fortran logo, you don't have
to put the text below it saying that it's C++ or Fortran or C. Now the Python logo
doesn't; like, if you have no familiarity, you're not going to know. But I feel like
Python is ubiquitous enough that maybe it's not a problem. I don't know about the Rust logo;
like, maybe it's fine.
But the ones that I like best for programming languages are the ones where, even if you don't know
the logo, you can tell what the programming language is.
I mean, I've got so many thoughts and follow-up questions, and this is clearly
gonna turn into two parts. So maybe the start of my questions will be part two of this episode.
So you got the GTC recap, which probably was only 25 minutes,
but at this point this is gonna blow way past the 35-minute, like, max one episode.
And I'm about to tweet, low on sleep, folks.
I did mention I lost sleep over this slide, and I still got other questions to get to.
Um, I'm about to tweet, with your permission,
Jensen versus Bryce slide deck. Is that fine? I'm not gonna add Jensen.
That's fine. I'm just gonna pretend that I didn't hear you ask me the question.
All right, but it needs to be done. Mostly, I mean, I don't need to put "Jensen versus Bryce
slide deck."
It's just this slide that I'm talking about versus the one... because, do you know the slide
that I'm referring to in Jensen's talk? Then actually I don't need to tweet it.
But now that I said I will, I will, because I needed you to, and actually it probably will
be good.
So head to Twitter.
I'm about to tweet this right now.
Actually, should I just put Jensen versus Bryce?
No commentary.
People can infer what they want. And once you're at Twitter, and this is why I've lost so much sleep, I'm
about to explain. So once again, for the audio listener, Jensen's slide is, we will say,
much more consistent in that every single one of the, what I'll assume is a cell phone
screen with some artistic art on it, it looks roughly the same.
Unlike Bryce's slide,
there are no missing graphics signifiers.
That being said, they're also way less meaningful
because these are all just artistic art things
that have nothing to do with the actual underlying
technologies.
And on top of that, it does some weird things, or weird,
or maybe it's the future: all of the cu-prefix libraries are followed by uppercase letters, which is not the case in your slide deck,
and which is also not the way that these libraries are named.
But at least on Jensen's slide that is the case.
And...
So, I have slightly mixed feelings about that. You know, I do feel like we ought to be consistent in the style and the spelling.
And the math libraries all consistently do cu and then all uppercase letters now.
And I think part of that is because cuBLAS, which is one of the first ones, cuBLAS and
cuFFT were the first two
CUDA libraries.
Well, BLAS is an acronym and FFT is an acronym.
So I think that started the trend.
And then originally cuSOLVER, and solver is not an acronym, solver is a word, cuSOLVER
was originally cu, capital S, and then lowercase for the rest of solver. And
the same for cuSPARSE. There were periods of time where it was spelled differently.
But I think that for consistency with cuBLAS, cuFFT, and cuDSS,
it became this way.
I mean, that's fair enough. I completely agree that we should be consistent.
I'm just merely pointing out that there's a delta.
And one of the biggest things that I've lost sleep over
is the fact that on your slide, Rapids is there.
And one, first of all, Rapids is purple and...
I'm gonna get in so much trouble.
I'm gonna get in so much trouble.
I mean, it's there and it's been greenified.
I don't necessarily, I'm not too upset by the fact that it's been greenified because you want to make the slide look nice.
If it was purple it would look awful. But on Jensen's version of the slide, he doesn't mention Rapids.
He just puts cuDF and cuML. And so for folks outside of NVIDIA not familiar with RAPIDS,
which is where I started my NVIDIA career, on the RAPIDS team before switching to research: RAPIDS is basically an umbrella
term for a bunch of these libraries, which include cuDF.
The DF in cuDF stands for DataFrame, which is the Pandas equivalent, and the ML in cuML for machine
learning.
And the reason I've lost sleep
is that it is indicative of this kind of,
I don't wanna say marketing problem,
but like... And also, too, there was another slide
from the Python talk where they show this graphic
of all the different Python technologies, of which NVIDIA was not a part. It shows PyCUDA in 2010,
then Numba, CuPy, PyTorch, JAX, and then RAPIDS, Warp, CUDA Python, and Triton,
with RAPIDS, Warp, and CUDA Python being the only NVIDIA-built projects. All the other ones
were outside, which I thought was kind of curious: that for the first decade NVIDIA didn't have any
direct ownership over these projects.
But, and I'm not sure about Warp
because I actually haven't used it,
I believe RAPIDS requires Conda at the moment.
cuPyNumeric requires Conda,
whereas like JAX and PyTorch, you can just pip install them.
And anyway, so there's just this, what do you call it,
like, disconnect: you've got, in the first two verticals, all the cu libraries, and then you've got
RAPIDS, which is kind of like an umbrella for a bunch of cu libraries. So I'm getting to my
question. What are your thoughts and feelings about this?
Yeah, so you know, I actually, so one, I do have to say, I liked Jensen's slide. And I
think that his slide and my slide
are trying to do different things, because they're
trying to speak to different people.
My slide and my talk is targeting purely
a developer audience.
And Jensen's speaking to not just developers,
but also people who are in these particular industries.
He's speaking to analysts, to everybody who cares about GPUs.
And I think that his slide is far more effective for a broader audience.
But I also love that, despite the fact that Jensen's got a much wider audience that he
has to speak to, he's still able
to have a talk that's so developer focused.
I think it's great that he's still got a talk where he's got a slide there where he calls
out and speaks about a lot of the great libraries that we built.
That's I think really special.
I think there's not a lot of tech CEOs who will talk about, you know, developer libraries,
like specific developer libraries that their company's building.
So I actually was kind of excited by Jensen's slide.
I debated whether I should, and I actually think if I had had time, I would have taken
RAPIDS off of my slide and I would have put cuDF up there instead.
It's not solely because of the purple-logo nature of RAPIDS.
It's because RAPIDS isn't a library.
RAPIDS is a collection of libraries, and to some degree it's more of a product.
I wanted this slide to be not product focused,
but library focused.
Like the point of my talk was to,
or the point of this slide in my talk was to point people
to all the tools that are available to them.
And so I think that when I give this talk in the future,
I'll probably, instead of RAPIDS, just have cuDF there.
The problem is there's not enough real estate in the slide
for me to do what I would really want which would be to list all of the various Rapids
libraries, of which there are many. But maybe what I would do is I would take off one or
two of the math libraries and put some of the Rapids libraries up there too.
Well, I'm thinking of this slide, which I know will clearly evolve over time
through the iterations that you give this talk.
But like I think, you know, one of the questions
at the end of the talk was about docs and, you know,
best practices in terms of education,
which is one of the primary focuses of CCCL and NVIDIA now.
But what would be amazing is if like this was,
there was a JavaScript, I don't know if you need D3
for the kind of animations that I'm thinking of,
but like imagine this as a landing page website
where you hover, you take your mouse and hover over these
and it does a little jiggle every time you go over each one
and then for RAPIDS, you would get this little balloon
that expanded into cuDF and cuML.
And it would be a good starting point.
The other thing, too, which is a whole other layer of this,
is: what is the language that you are interacting with?
Technically RAPIDS at the highest level is Python, right?
There are C++, CUDA C++ libraries
that you can target that lie beneath them.
But it seems a bit odd to have RAPIDS sitting right on top of Thrust.
I mean, actually, it doesn't seem odd, because RAPIDS is kind of built on top of Thrust,
but Thrust is a CUDA C++ library, whereas RAPIDS is primarily a Python library
that has C++ lower-level libraries that you can target. Anyway, so depending on the,
what do we call them, graphic signifiers, you have access to different languages
per tool or per library.
So the hardest part about this slide was figuring out the ordering and the grouping of these.
And there are a couple of different axes along which I needed to group them. First, there
were libraries that do similar things. So on the left-hand side, there are eight libraries that are typically known as
the CUDA math libraries.
And then we have the three CUDA C++ core libraries: Thrust, libcu++, and CUB.
Then there's nvbench, which is maintained and developed by the CUDA C++ Core Libraries team,
but it's really, it's a benchmarking framework.
I think of it more of like a developer tool.
And then there's the CUDA runtime, which is what everything here is built on top of.
Then some of these libraries are device side libraries.
Some of these libraries are libraries where it's a thing that you call from your CPU that launches work on the GPU.
Some of them are things that you purely use in device code on the GPU, and some of them are a mix of both. And so, like,
there's a couple on here, CUTLASS, Cooperative Groups, and CUB,
where I needed those three to be next to each other.
But also CUB needed to be near all the other CCCL libraries.
And then there were libraries like NVSHMEM and
NCCL, which are communication libraries that I wanted to have together, and then
cuFile, an I/O library, which I wanted to have near the CUDA runtime
and some of the others, because it's not a compute library. One way
of dividing these libraries is: is it a library that does, you know, math or science or physics,
or is it a library that does, you know, runtime management or comms or I/O or stuff
like that.
And then there's the machine learning libraries, cuDNN and TensorRT.
And I wanted those three libraries to kind of be centered in the image.
I decided that it was best for them to either be at the top or the bottom because I have
four rows.
If I had had three rows, I would have put them in the center row.
But I feel like with four rows, that would have been hard. And I actually, the way I made this slide is I started off with just raw boxes and I
figured out how many of these blank boxes did I want to have and then based on that
I tried out different configurations and then I made the list of what things I was going to include.
You know, going off of that,
let me see if I can find the first version that this slide deck went through.
It's on R4 right now. And the way that I version my slide decks is I bump to another R version of them when
either I give the talk or if I delete content.
Like if I change the content plan substantially, then I will make a new version
of the slides because I want to record, I want to save what my old content plan was.
I'm going to share my screen with you very briefly, but it's okay because the visual
will be easy to explain. This is what the slide originally looked like: it was the four-by-six grid in a slightly different configuration, and it is just boxes,
square boxes, that say the library names, and some of them have different colors
to indicate the grouping, and there's a giant, like, text box on the slide that says, "to do this, but do better."
And I think I did better than the, you know, colored text boxes, one hopes.
All right, I mean, I'll have to think on this, but like, I'm coming to realize the reason I have lost
sleep and we're discussing this at length.
And also I put the cart in front of the horse.
I didn't even explain Bryce's talk, which I'm going to take a 10 second digression.
The title of Bryce's talk was The CUDA C++ Developer's Toolbox.
It is, at a high level, showing some examples of Thrust, CUB, and, what was the other one, libcu++?
Did you show that at all?
libcu++, yeah.
Yes.
And then also it covers the universal vector, which I have a question about, which we'll
get to.
But anyways, high level, that's what it's about.
I was supposed to say that at the beginning, but I got too excited about this slide.
And the reason I've lost sleep: I think this is the best slide you've ever made, but it's
going to be so much better.
Like I think honestly, this is your best work, Bryce. It's your best work maybe of your career.
And because CUDA, let's be honest, has a lot of work to do when it comes to its onboarding and
education experience. And I think that this is the start of something beautiful, and so my
recommendation, I already had this in my mind
before you showed me what I'm looking at right now,
which is I'm staring at six different colors.
We've got yellow, red, blue, purple, pink, and green.
When you were describing that the two left columns
of the six columns on this slide are the CUDA math libraries,
I was thinking, man, what you should have added
to that talk was like having a rounded, not pointy, but a rounded rectangle
that kind of briefly goes over and says,
these are the math libraries.
And then that one disappears.
And then the next one pops up and that one
has a red coloring.
So basically have like six slides that transition and fade
from like one rounded rectangle to another rounded rectangle
that basically, visually, and then you would vocally describe, because
I think to folks at NVIDIA, you know, we see Thrust next to libcu++ next to CUB, and we know that, like, okay,
you know, Thrust is kind of a layer above CUB, and, you know, it interacts with... anyways,
we see the connections. But for folks that are maybe even experienced, or not experienced,
those connections are invisible.
They have no idea that this implicit grouping exists there.
And also, they may not know what these libraries are.
And it's interesting that you say that, because we talked a little bit earlier about
how this is a 35-minute talk,
and this is a talk I plan to give a lot this year and for the most part speaking engagements
are hour long talks.
So the reality is the GTC version of this talk was really just a preview.
There is an hour-long version of this talk, and there was content that had to be cut.
And of the content that had to be cut,
the parts that are probably the most interesting to you are these. One, the reason I put so much effort
into this slide is because something just like what you described will happen with
this slide. I haven't built that part of the deck yet, but I'm going to spend five to ten minutes
going through this overview of the libraries and giving people the, not going to go into details,
but giving people the pointers about what are these libraries,
when should you use them?
Because it's The CUDA C++ Developer's Toolbox,
the name of the talk.
We want to teach people what are the tools in the toolbox.
And the other thing that got sadly
cut from this version of the talk
is I spent some time talking about the by-key algorithms.
And I was very sad to cut that from this talk, because I think if you're looking at Thrust
versus the standard library algorithms, the part of Thrust that's most compelling for
people is that there are all these very useful extended algorithms.
Like you could use C++ standard parallelism, which we have an implementation that supports
GPU acceleration, but if you wanted to do a segmented reduction, that's not in standard
C++ yet.
And so I had a couple different examples of doing some nice segmented reductions that I had to trim.
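For reference, here is a minimal sketch of the kind of segmented reduction being described, using Thrust's reduce_by_key. This example is illustrative only; it is not one of the examples that were cut from the talk, and the specific values are made up.

// Illustrative sketch (not from the talk): a segmented sum with
// thrust::reduce_by_key. Each run of equal keys defines one segment,
// and the values inside that segment are summed.
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <cstdio>
#include <vector>

int main() {
  std::vector<int>   h_keys{0, 0, 0, 1, 1, 2};   // segment ids
  std::vector<float> h_vals{1, 2, 3, 4, 5, 6};   // values to reduce

  thrust::device_vector<int>   keys(h_keys.begin(), h_keys.end());
  thrust::device_vector<float> vals(h_vals.begin(), h_vals.end());
  thrust::device_vector<int>   out_keys(3);
  thrust::device_vector<float> out_sums(3);

  // One sum per segment: keys {0, 1, 2}, sums {6, 9, 6}.
  thrust::reduce_by_key(keys.begin(), keys.end(), vals.begin(),
                        out_keys.begin(), out_sums.begin());

  for (int i = 0; i < 3; ++i)
    std::printf("segment %d -> %g\n", int(out_keys[i]), float(out_sums[i]));
}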
Those will be in the next version of this talk, and also the deep dive,
or not the deep dive, the overview of some of these libraries will be in
the next version of this talk. And I suspect that I will do something like
what you described with the boxes. It'll probably just be, like, drawing a
circle around these.
And I actually like the idea that you just had about rapids, to have it expand out into
all of the various rapids libraries.
I should say this is the first time I've given a talk where I have used transitions and I used the morph words transition that
Conor taught me about.
But then I think that I maybe do this more than you because when I've seen you use the
morph words transition, you usually do it on a pure code slide.
But I often have code and diagrams side by side on a slide, and
I found that when I used morph words, it would fade the diagram to black. And the
reason is because, in a lot of cases, even if I had the same
diagram on one slide versus another slide, PowerPoint would give the group of objects a different name from slide to slide. So PowerPoint would get confused.
But I learned that if you go into this magical thing called the selection pane in PowerPoint,
you can give your group of objects a specific name. And if they have the same name from
slide to slide, then the PowerPoint transitions know
that they're the same thing.
And so that's how I was able to use that trick to build some really cool morph transitions.
And I really love the morph transitions.
For so long I resisted having transitions in my slides, and part of it is that transitions
take time and they can throw off my timing.
Part of it is that if you rely heavily on transitions and animations in your slides,
then the PDF version of your slides, like if somebody's just reading through your slides,
can be a little bit harder to follow.
But I am a convert.
I loved the morph words transition and to explain for people who don't know, that what the morph words transition does is,
if you've got like two slides of text
that are very similar,
but there's some changes between the two,
the morph words transition will,
it will show the words,
the text between the two slides,
like evolving into each other.
Maybe you have a better way of explaining it than me.
Yes, sure.
I will explain it, better than I just did,
although only briefly.
First thing I have to say is we need to wind down
this part two of our conversation
because I don't have a super hard stop,
but I do have a hard stop-ish,
and I definitely wanna get to the "stay tuned to next week," listener.
You know what my hard stop is? My hard stop is that after a certain hour
I will be unable to park anywhere in downtown San Jose.
Well, we'll put our hard cap at 30 minutes from now. So if we want a 25-minute part three,
we've got to wrap this up in five minutes. So
I have, like, four questions.
We're gonna have to rapid fire,
and then you get like 15 seconds per answer
and maybe we'll revisit this topic.
So, my third most popular YouTube video
is how to use, basically, morph,
because it's the number one question I got.
Welcome to the club.
It's been here since 2019.
You're only half a decade late
and it's the most beautiful thing ever.
There's three different versions.
Objects is the default, words is the second setting, and then there's characters. So most people don't know how to use it effectively for code because
they leave it set to the default option of objects and that does nothing for your code.
You have to change it to words. And yes, it basically just automatically finds the closest
number of words. It's magical. And also, that little hack that you sent me: I've watched
basically this morph, not keynote, but presentation, where you can do crazy things
with it, even more than what you showed me.
So I've been a morph pro.
In fact, I don't want to name the person at Microsoft, but a very senior person at Microsoft
one time emailed me and asked, how are you doing that in PowerPoint?
And I was like, brah, you work for Microsoft, like email the PowerPoint team.
But wait, take 15 seconds, 15 seconds for whatever you're going to say, because we've got
rapid-fire questions after this. Do I know what? Do you know anything about the algorithm that they
use to do the matching? No, but I would love to talk to the PowerPoint folks. I imagine it's just
kind of some like closest thing, because it does some weird stuff every once in a while. But if
you work for Microsoft and you know or happen to work on that algorithm, get in touch with us. I
usually block anyone that emails me asking to be a guest, but we're reaching out this time.
So I'll briefly say the thing that I love the most about the morph transition is nothing about the visual itself.
It's about the structure of the code. If I had a progression of code evolutions,
where there were different things on different slides, it used to be that I would add additional new lines and
additional spacing so that when I went from one slide to another slide, elements would
stay in the same place. And that meant that if you looked at the first code slide in one
of these evolutions, the spacing would be all weird and unnatural.
And the thing that I love about Morph is that with Morph, I don't have to do that because
with Morph, if I need to insert a line at the top of the code in the next slide, what
Morph will do is it will gracefully shift all the code down.
I don't like it if it's jarring.
If like one slide it's nine lines of code,
and then the next slide those same nine lines of code
are there, but they've moved down one line
because I've inserted a tenth line of code in front.
I don't like that.
So what I would do is on the nine lines of code slide,
I would put a blank line at the top.
And now I don't need to do that, because now with morph it will just
beautifully shift everything down, and that, for me, is the killer feature.
All right, it's very graceful. And we've used up three of our five minutes left in this episode, folks.
So maybe I'll limit myself to one question, and you have 30 seconds to answer. The number one thing
that I was confused about in your talk:
how come nvc++
and stdpar didn't get mentioned as CUDA C++? I hope this isn't breaking news that
that technology has been backburnered or something. What's the reason for not mentioning it?
Not that it's been backburnered, just that we wanted to give a talk about how to do CUDA programming.
Stdpar is great. It's not in any way something that we don't support.
We certainly want people to be using it, but we wanted to give a talk about if you want to write CUDA code.
Like how should you write idiomatic CUDA C++ code?
And I think part of the problem is that people that consider
themselves CUDA programmers, they
don't think that a talk about C++ standard
parallelism is for them, right?
So we wanted to reach a particular audience.
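For context, here is a minimal sketch of what the stdpar (C++ standard parallelism) style mentioned above looks like; it is not code from the talk, and the dot product is just an illustrative example. As I understand it, NVIDIA's nvc++ compiler can offload this kind of code to the GPU via its -stdpar option, while other conforming compilers run it in parallel on the CPU.

// Hedged sketch (not from the talk): ISO C++ "stdpar" style.
// std::transform_reduce with a parallel execution policy computes a
// dot product; nvc++ -stdpar can offload this, other compilers run it
// in parallel on the CPU.
#include <execution>
#include <numeric>
#include <vector>
#include <cstdio>

int main() {
  std::vector<float> x(1 << 20, 1.0f);
  std::vector<float> y(1 << 20, 2.0f);

  // Parallel dot product: sum of x[i] * y[i].
  float dot = std::transform_reduce(std::execution::par,
                                    x.begin(), x.end(), y.begin(), 0.0f);

  std::printf("dot = %g\n", dot);  // 2097152
}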
And so this talk, or really the whole curriculum, comes from our colleague Georgii Evtushenko; much
of this talk is based on material that Georgii made for the tutorial.
So I can't take too much credit for it.
But Georgii proposed that we shouldn't start by teaching people the low-level things.
The way that you typically teach CUDA is you start off by teaching people about kernels and device code and separate host and device memory and warps and blocks.
This talk doesn't mention warps or blocks once.
In this talk, I only talk about distinct host and device memory in the last part of the
talk, right?
I introduce the idea that there's different CPU and GPU memory as the last thing.
And I don't even say the word kernel anywhere
in here, really. I just talk about launching work on the GPU. And Georgii came up with
this idea that we should teach people to use libraries first, to use abstractions first.
Use libraries first, try that out,
use things like Thrust first, and then, you know, only as a more advanced topic would you go and write your own kernel. You should only do that if you've tried to use the libraries
and the libraries haven't worked for you because honestly you're going to get better performance
if you use the abstractions. And so if you think about how do you just teach like modern C++ versus C. Well the way
that people teach modern C++ is like you don't start with teaching them pointers, right?
You don't start teaching them pointers and C-style arrays.
The way that I've seen people teach modern C++ to beginners these days is you start off
by teaching them the standard library, by teaching them containers.
You only introduce the notion of pointers much later on.
And that's what we're trying to do with CUDA C++ here is we are teaching the content from
the high level to the low level instead of from the low level to the high level.
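To make that concrete, here is a sketch of the libraries-first style being described: a SAXPY written as a single Thrust algorithm call, with no kernels, blocks, or warps in user code. This is an illustrative example, not a slide from the talk.

// Hedged sketch (not from the talk) of the "libraries first" style:
// SAXPY as a single thrust::transform call. Thrust owns the device
// memory and the launch configuration. Compile as a .cu file with nvcc.
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <cstdio>

struct saxpy {
  float a;
  __host__ __device__ float operator()(float x, float y) const {
    return a * x + y;
  }
};

int main() {
  thrust::device_vector<float> x(1 << 20, 1.0f);
  thrust::device_vector<float> y(1 << 20, 3.0f);

  // y = a * x + y, computed on the GPU.
  thrust::transform(x.begin(), x.end(), y.begin(), y.begin(), saxpy{2.0f});

  std::printf("y[0] = %g\n", float(y[0]));  // 5
}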
And I had two instructors during GTC come up to me and they were like, wow, this is
really surprising.
This is a very different way of teaching.
And one of them, he sort of like asked this to me and then just paused as if he expected
me to say something like, oh no, this isn't how you should actually teach it.
And I was just like, yeah, no, like that's what we want you to do.
And he just paused and he just processed.
And he was clearly, he was expecting me to say something
to make him feel reassured that he didn't have
to completely rethink how he was teaching this material.
And then he realized that, no, wait,
you really are telling me to completely rethink
how I've been teaching this material.
And that's exactly what we want.
We want to teach from the high level to the low level.
An amazing answer, if too long.
We have now exceeded our threshold and I have follow up stuff but we're going to table this.
Remember I asked about nvc++.
We're going to revisit nvc++, which, the short version of what Bryce said, is: this was a
CUDA C++ talk, and nvc++ gives you ISO C++ with, you know,
inherent or automatic parallelism.
A topic for another day, and I did have questions about
the universal vector and unified memory,
but we will table those questions for a follow-up discussion.
We are now transitioning to part three because at this point,
if we stop at,
you know, five minutes past the half hour mark it's only going to be a 22 minute episode folks.
So part three of this recording, which I believe puts us on, like, episode 228, potentially. Be sure
to check these show notes, either in your podcast app or at adspthepodcast.com, for links to anything
we mentioned in today's episode, as well as a link to a GitHub discussion where you can leave
thoughts, comments, and questions.
Thanks for listening, we hope you enjoyed, and have a great day!