Algorithms + Data Structures = Programs - Episode 82: GPUs - Responding To Reddit Comments
Episode Date: June 17, 2022
In this episode, Bryce and Conor respond to some reddit comments on Episode 80.
Twitter
ADSP: The Podcast
Conor Hoekstra
Bryce Adelstein Lelbach
Show Notes
Date Recorded: 2022-06-04
Date Released: 2022-06-17
ADSP Episode 80: C++ Multidimensional Arrays and GPUs
ADSP Episode 80 Reddit Post & Comments
James Berrow
cpp.chat Episode: Colour Is Not Black and White
SIMT
SIMD
C++ std::mdspan
C++ std::mdarray
NVIDIA cuBLAS
NVIDIA CUTLASS
NVIDIA cuNumeric
NVIDIA MatX
C++ std::transform
C++ std::reduce
C++ count_if
J ; (raze)
APL , (ravel)
GPU SM (Streaming Multiprocessor)
ARRAY 2022
PLDI 2022
Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8
Transcript
We're gonna now call Bryce Dr. Doolittle here, because I legitimately would like to own a horse now and go riding on said horse.
Welcome to ADSP: The Podcast, episode 82, recorded on June 4th, 2022.
My name is Conor, and today with my co-host Bryce, we respond to Reddit comments on episode 80,
where we talked about GPUs and multidimensional iterators.
So my girlfriend's dog got spayed yesterday.
Thoughts and prayers go out to the dog. But the upside is that we got her a dog bed for my apartment,
which I'm very excited about because the spot she likes in my apartment
is right under my desk.
And that dog bed is just very comfortable for my feet.
Like I'm very, very happy with this setup.
You know, having a warm dog there to keep my feet warm is also a benefit.
Yeah, I didn't think that was where that was going.
But where did you think that was going?
I thought you were going to say that you were unhappy, because, and this is where we could lose a couple listeners here, I am not the biggest dog fan.
I do know this, based upon some past conversations you and I have had.
And to redeem myself a little bit, I actually do love dogs.
I just don't like untrained dogs, which happens to be the majority of dogs.
And if you're a dog owner and you're like, well, my dog's trained.
If, for instance, I'm in an elevator and you are also in that elevator with your dog (and you think that we're losing a couple listeners here),
and you think that, oh, everyone loves dogs,
I'll just let my dog jump all over this person and start licking his pants and blah, blah, blah.
To me, that is an untrained dog or maybe an untrained owner that thinks that like their dog is the world's greatest gift, et cetera, et cetera.
I love all dogs and I am completely fine with your dogs jumping on me in elevators and other places.
I'm also very good with dogs.
Yeah, like Olivier Giroux's dogs love me.
Those dogs love me because I give the best belly rubs.
Like we go, we go, we go months without seeing each other, but then I see his dogs again
and they remember me because they remember how much I love them.
This is extremely true, because one time the both of us went over to Olivier's house for dinner, and yeah, those dogs absolutely... they were like, oh, our kindred spirit has arrived. And, like, the dogs were just laying their heads in your lap.
Yeah, yeah. So yeah, definitely Bryce is the dog person of the two of us.
I'm good, I'm great with cats, too. Don't tell the girlfriend; she does not like cats. But yeah, I'm just an animal whisperer. I'm good with horses, too, apparently, because when we were in Utah, we did horseback riding, which I've never done before. And I thought it was going to be scary. But no, apparently I am a natural with horses, too. And I now love horses.
We're going to now call Bryce Dr. Doolittle here, because I legitimately would like to own a horse now and go riding on said horse.
I mean, I could see that for you. I could see you having, like, a ranch that you visit on weekends, or, like, live in a house on the ranch, have people to look after the horses, and then visit that part of your property on the weekends.
Yeah. Did you see those pictures of the dog mugging that I tweeted?
Oh yeah, yeah, those were great. Those were great. I thought that really added a lot of color to the story.
There were a few that were even better, which were of my girlfriend, like, trying to shoo them out, but I could not tweet those, because she was in her pajamas,
and I determined that she would murder me
if I tweeted those pictures.
Well, maybe if we ever start
like a Patreon podcast model
where we charge our listeners
for the juiciest content,
which don't worry, listener,
we're never going to do that.
You mean an OnlyFans. That's what you're describing.
I mean, that's not what all the other podcasts call it, but sure. I mean, it is kind of the same transactional model, yeah, where you're paying money for a certain type of content.
But anyways, to bring it back to where I thought your story was going,
because that's where this all started.
I'm not a big fan of dogs. So when you were talking about the dog being under your desk,
my first thought was obviously Bryce does not want that dog there. And so that he's happy that
he's getting this bed so that he can put the bed somewhere else. So the dog will no longer want to
go under the desk. You know, win-win. And then when you said, and it's fantastic because it's very fluffy and it makes my feet comfortable, I was like, damn, I got that one wrong.
Regardless of whether or not you own a dog, I highly recommend that everybody go get a dog bed to put your feet on when you're sitting at your work desk, because it's amazing. And then once you've got the dog bed, you may as well get a dog too.
So yeah. But if you thought that I was unhappy with anything that my girlfriend's dog was doing, do you think that I would mention it here, in a place that she could theoretically have access to?
I mean, you did say last episode, or was it two episodes ago, that neither she nor your mother listens.
Yes, yes, those are both true points. But, well, I guess we're on Apple Podcasts, so my mother could get access. Yeah, she doesn't... my mother does not have the attention span to put up with all of the tech nonsense and random weird tangents that we have, to get far enough to hear any of the things that she may take offense at.
It's true, it's true. Even when there's, like, non-technical episodes, sometimes I'll send them to my family and no one listens, because they just don't care.
They're like, Conor, we hear your voice enough when you come by, like, one week out of the year.
That's way too much already.
That's not actually entirely true about my mother.
Throughout my career, my mother has tried very hard to understand what it is exactly that I do.
So she might give this a listen. But I just don't... I think that if she did, she would start at episode zero. I don't think she would start at the latest episodes. Or, like, even if she did, she's in Italy right now, so I wouldn't need to worry about her listening to this particular episode. But she might get through some of them.
I do not think she would get through any episodes.
That's true.
Prove me wrong, mom.
Prove me wrong.
Maybe one day we'll release
like the most technically titled episode
and then we'll just talk about
like the juiciest gossip,
like family gossip
and hope that none of our families listen to it.
Anyways, we need to get back to these. So, you've been waiting for a week for us to talk about these Reddit comments, and I'm going to read the first one. Conor edited this together in a way that did not...
No, like, I did not, I just cut it straight in half. I read some other Reddit comments where people were talking about what podcasts would replace CppCast.
And there were some people who suggested ours.
And there were some people who were like, yeah, but, like, half the time they just go off on tangents.
That's true.
And Bryce is too.
Oh, God.
What was the word?
It wasn't demure.
It was another word.
Nonchalant.
Nonchalant. Bryce is too nonchalant on the podcast.
Nonchalant?
Yes, which is a very true assessment, I think.
We're supposed to be – I don't know what the opposite of nonchalant – I know what nonchalant means, but I don't think chalant is a word.
But like this is – listen, listen.
I'm all about constructive criticism, whoever made that comment.
But this is our podcast.
No, chalant is a word.
Is chalant a word?
But, well, no, no, no, I don't think it's a real word.
Wiktionary defines it as "not nonchalant," which is about the dumbest definition I've ever heard.
I mean, I think that's the beauty of our podcast. I mean, that's the thing: I kind of thought that it's a real shame that CppCast disappeared, because they're kind of, in my mind, like this voice, or like, the core, where, you know, a couple of news articles, that's a good C++ podcast, you know?
The one that's, like, the actual quality podcast.
Yeah, yeah. They have, like, a format, as opposed to whatever this is.
Yeah, we are not that. And that's the thing, I was like, should we start doing a couple news articles? And then my first thought was just, no, I don't want to do that. If we did, we'd have to start another podcast for that purpose.
Not this one.
I like the fact that we got three minutes into our last episode 81 and we were like,
all right, we're going to talk about Topic X.
And then I was like, oh, wait a second.
We got to do some corrections.
And now that I've said that.
Yeah, I was going to stop you.
I was like, what, what are you doing?
Like.
Technically, I won't go read the other one, but that was only one thing I needed to mention.
There's a list of them, but we'll do that on another episode.
And Bryce still has to go to the gym.
And eat food.
But anyways, go on.
Listen, the nonchalance is here to stay, even though I don't think that's the right word.
Nonchalant is kind of like adjacent to...
They didn't say you were nonchalant, they said I was nonchalant.
But doesn't nonchalant... I think it's adjacent to, like, apathetic. It's like, you know, you're not, you know...
Yeah. Don't you think you're nonchalant on this? I think you get overwhelmingly excited and blow people's eardrums out with your...
I am sufficiently seasoned that I have learned to not care about a number of things that I deem not sufficiently important.
Whereas you – maybe it's not seasoned.
I'm traumatized enough.
No, that's not the right word.
I've seen enough things.
I've seen enough things that I have learned to be like, yeah, you know, I don't really care about that particular thing. Whereas you, you're like, uh, a young schoolgirl. You're just always so excited by the world and the singing of the birds, the blossoming flowers. You've got to stop to smell every rose. And I'm just like, yeah, whatever, let's get to the important things here.
So yes, all right, it's a fair criticism. I don't know how to take being called a young schoolgirl, but sure, I'll take it as a compliment. Anyways, we're like 20 minutes into our 30-minute episode, talking about comments on episode 80 in episode 82. Here's our first
comment from James Berrow. Hopefully I'm pronouncing that correctly. The discussion around GPU iteration order feels a little like something's being lost in translation. If you have a 2D array in row-major order and you want to sum column-wise, each thread will be assigned to one element in a row and then iterate down columns. So if you consider the thread index to be your loop variable, it's not all that different. The mismatch here, I suspect, is actually in the definition of a thread. GPU threads are essentially one lane in a SIMD architecture. That's S-I-M-D.
So I think the mix-up here is essentially definitional in nature. In a CPU architecture, you'd give each thread a chunk of data to process, loading rightwards slash contiguously slash row-wise with SIMD, and then summing downwards, which isn't different to a GPU.
It's just terminological at that point.
In a language like OpenCL, running on a CPU, each thread maps to one SIMD lane, and it's very much like programming a GPU.
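Here is a minimal sketch of the iteration pattern the comment describes, assuming a row-major matrix stored contiguously in a std::vector; the function and parameter names are purely illustrative, not anything from the episode.

```cpp
// A minimal sketch of the pattern described above, assuming a row-major
// rows x cols matrix stored contiguously. On a GPU, each value of j would be
// a separate thread (a SIMT lane); on a CPU, j is just the loop variable.
#include <cstddef>
#include <vector>

std::vector<double> column_sums(const std::vector<double>& a,
                                std::size_t rows, std::size_t cols) {
    std::vector<double> sums(cols, 0.0);
    for (std::size_t j = 0; j < cols; ++j)      // one "thread" per column
        for (std::size_t i = 0; i < rows; ++i)  // walk down that column
            sums[j] += a[i * cols + j];         // row-major: element (i, j)
    return sums;
}
```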
He's absolutely right, but on a GPU, every lane is an independent thread. And now with modern NVIDIA GPUs, truly an independent thread. But even with this sort of joined, vector-style execution for operations
where all the threads are at the same place and doing the same thing,
the way that you write the code is different
from how you'd write the code on the CPU.
And as Olivier used to describe it,
you know, the GPU compiler,
in the GPU compiler, we vectorize so late,
we vectorize in the hardware.
And, you know, the difference in the programming model
does change how you write the code.
Like, under the hood, the way that you optimally process things may be the same, but the way that you write the code, and the way that you logically move through the data and the code in each thread, is a bit different.
Awesome. Well, James Berrow, you heard it here: you're 100 percent correct, but that doesn't mean programming to each of these,
a GPU and a CPU, is the same.
Moving on to comment number two.
I think James was the best comment.
But yeah, we will move on to comment number two.
Really enjoyed.
This is from Kemanachi.
Don't know who that is because it's Reddit.
Really enjoyed this episode.
A lot of the multidimensional reordering just sounded like flattening a matrix, quote unquote, to me.
I believe APL calls it razing a matrix.
I think razing is J terminology.
APL calls it raveling, but same difference.
Also, the way Bryce described how GPUs work sounded pretty intuitive.
I think a great diagram or animated visual might help clearing that up within the array
language community.
This episode has me even more interested in seeing how MD-SPAN and multidimensional iterators
will work on GPUs in the future.
Comments?
I'm not sure I understand the flattening part. I mean, it is certainly true that I think almost all ways of performing operations on multidimensional structures do tend to lead to some flattening, you know, in some way, shape, or form.
And by that, I mean that you move through the data in some sequence in time, and that is a linear thing that you do. But I'm not sure that I understand the flattening, what he's talking about, the flattening in this context.
So flattening, I think what he's probably referring to is when you described the two different ways you can index when dealing with the 2D matrix. I can't remember if it was the first or the second, but one of them was just, you know, looping from zero to your dimensions times your dimensions and then doing some modulus arithmetic. That, in a sense, is basically just looping, if you're storing your array contiguously in memory, from the beginning to the end, and then behind the scenes you're doing some modulus arithmetic to get I and J.
Yeah, exactly.
Yeah.
And like, as we discussed in that episode, that approach is usually not the right way to do things.
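A minimal sketch of that flattened indexing, assuming a row-major rows-by-cols matrix stored contiguously; the function name and the scaling operation are just illustrative. The division and modulus that recover i and j are exactly the reconstruction cost discussed next.

```cpp
// A minimal sketch of the "flattened" loop described above, assuming a
// row-major rows x cols matrix stored contiguously. The single linear index k
// runs from the beginning of storage to the end, and (i, j) is reconstructed
// with division and modulus behind the scenes.
#include <cstddef>
#include <vector>

void scale_all(std::vector<double>& a, std::size_t rows, std::size_t cols,
               double factor) {
    for (std::size_t k = 0; k < rows * cols; ++k) {
        std::size_t i = k / cols;  // row index recovered from the linear index
        std::size_t j = k % cols;  // column index recovered the same way
        a[i * cols + j] *= factor; // (i, j) maps straight back to k here
    }
}
```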
Yeah, because there's information loss when you go from a multidimensional position to a linear index into storage. You can reconstruct the multidimensional index from the linear location in storage, but that's often an expensive operation. And that's maybe something that we didn't get into last time.
And this is sort of something that's core to the mdspan model. This is core to a lot of matrix models, but mdspan illustrates it particularly well, because an mdspan is a non-owning view into something. So when you construct an mdspan, you give it a pointer to some data somewhere. In the mdspan model, that data is just linear data; it's not inherently multidimensional. The mdspan's layout is what maps multidimensional indices, from some multidimensional space, into single-dimensional indices into that storage. And that storage array, it may be contiguous, it may not be contiguous. In a lot of the simple cases you can think of it as being contiguous. Or, let me rephrase that: the backing storage is contiguous, but not every element in that contiguous block of storage may be mapped to from one of the multidimensional indices.
But so you have this operation in mdspan that's defined by the layout,
which is the operation where you give it a multidimensional index
and it gives you back a single-dimensional index,
which is the location in storage.
And so when you're iterating, your iterator can choose to store one of those two things,
either the multidimensional index, the thing that you would give to this layout function,
or the storage index. And the problem is that the layout function that does this mapping is usually pretty quick. There are some cases like sparse where it may be more expensive,
but let's just assume that it's pretty quick. Especially in dense cases, it's almost always
pretty quick. But the inverse operation of that going from storage index to multidimensional index,
that's almost always slow. And at the very least, it's almost always slower than the other way
around. And so it's almost always wrong to store the storage index and discard the multidimensional index if you ever think that you're going to need to reconstruct and reuse that multidimensional index.
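A minimal sketch of that layout mapping, using C++23 std::mdspan and assuming a toolchain that ships the <mdspan> header (the reference implementation lives under std::experimental instead). The mapping call is the cheap, forward direction; the inverse is the expensive direction being discussed.

```cpp
// A minimal sketch of the layout mapping described above, using C++23
// std::mdspan; assumes a toolchain that provides <mdspan>.
#include <cstddef>
#include <mdspan>
#include <vector>

int main() {
    std::vector<double> storage(6);
    // A 2 x 3 view over the linear storage; the default layout is
    // layout_right, i.e. row-major.
    auto m = std::mdspan(storage.data(), 2, 3);

    // Forward direction: multidimensional index -> storage index. Cheap.
    std::size_t where = m.mapping()(1, 2);  // 1 * 3 + 2 == 5 for layout_right

    // The inverse, recovering (1, 2) from 5, is the expensive direction,
    // and for some layouts it is not even uniquely defined.
    m[1, 2] = 42.0;
    return static_cast<int>(where);
}
```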
Yeah, we're probably going to end up having a bunch of conversations around this topic
based on where my career is going.
Because I've been thinking about this ever since starting to work at NVIDIA, which was now back in 2019.
We have, like... and this is getting very NVIDIA specific, so, you know, I apologize to the listener if you care less about this topic. But there's a bunch of different things here, and some of it is C++. So in C++, std::mdspan and std::mdarray are coming, and in terms of the libraries that NVIDIA offers, there's a bunch of different ones: we've got cuBLAS, we've got MatX, we've got, you know, Python front ends to C++ backends that are sort of NumPy copies. And I've always wondered what's actually the most efficient backend to a kind of array language. Because all the indexing that you're talking about, it doesn't actually matter all the time. For scalar operations, as they're called in array languages, like when you are just adding a scalar to your matrix...
Exactly. Yeah.
It makes no difference whether you're storing it linearly in memory or structurally in memory, because if it's stored linearly, you're just basically doing a transform, a C++ std::transform, on every single element.
And so there's another aspect here, which is quite significant.
When you're iterating through multidimensional data, you're usually either doing something element-wise, or the second case.
And I'm just going to call it the second case for right now.
Let's focus on that element-wise case.
That's the case of, like, you're adding a scalar to every element, or, you know, you wanted to do something like a std::reduce on every element.
That's something we talked about the last time.
Now...
Wait, pause.
What do you mean by a std::reduce?
Because assuming every element is like an integer, what does a reduction on a single integer do?
I'm like, if I have a multidimensional... if I have a 2D matrix and I want to sum up, you know, all of the elements of the matrix.
Oh, so you mean, regardless of the rank of your matrix, if you want to perform a reduction across every single rank down into a single scalar, or a count_if, if you want to count all the non-zeros in your matrix. Okay, so basically any operation where the structure, the rank, or the shape doesn't impact it, whether you're reducing down to a single element or you're doing an element-wise operation on every element. Okay, gotcha.
And importantly, I think these operations almost only make sense if they're all independent of each other. Like, you can imagine a std::reduce being a useful thing: I want to sum up all of the values in this matrix, or I want to count all the non-zeros in this matrix.
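A minimal sketch of those shape-agnostic operations, assuming the matrix is stored contiguously; the names and sizes are just for illustration. Neither the sum, the count_if, nor the element-wise scalar add ever needs to know the rank or the shape, only the flat range.

```cpp
// Shape-agnostic operations over a 2 x 3 matrix stored as a flat range.
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

int main() {
    std::size_t rows = 2, cols = 3;
    std::vector<double> a(rows * cols, 1.0);  // a 2 x 3 matrix, stored flat

    double sum = std::reduce(a.begin(), a.end(), 0.0);  // sum every element
    auto nonzeros = std::count_if(a.begin(), a.end(),
                                  [](double x) { return x != 0.0; });

    // Element-wise work is the same story: add a scalar to every element
    // without ever computing an (i, j) index.
    std::transform(a.begin(), a.end(), a.begin(),
                   [](double x) { return x + 5.0; });

    return static_cast<int>(sum) + static_cast<int>(nonzeros);
}
```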
How about adjacent difference?
Yeah, basically you're saying you need a monoidal binary operation where you can do it in any order.
Now, the reason that adjacent difference isn't a particularly useful operation to apply here is because you're taking the difference between, you know, two adjacent elements, but what does adjacent mean in this context?
And so this first class of iterations: you want to do something to every one of the elements, and that thing is kind of independent of the other elements, or at the very least it doesn't care about the position of one element relative to another. The key thing here, the key insight, is that these are operations on elements.
The other class of things that you want to do to a matrix, I will posit, is where you really do need to consider multiple elements that have some relation to each other, and then you're going to care about indexes, not about the particular elements.
And so the example of this: adjacent difference, for the 1D case, is basically a simple stencil operator. But anybody who's done any amount of numerical programming knows that if I'm writing some sort of stencil loop that's doing some numeric computation, I want to be able to do, like, the value at (i, j) plus the value at (i+1, j) plus the value at (i-1, j), something like that. Now, the distinction here is, in the second case of things, you care about iterating indexes. So I will posit that the two different categories of iteration that you might want to do over a multidimensional structure are: one, element-wise, by which I mean iterating over all of the elements; or two, something where you're iterating through an index space, not particular elements.
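A minimal sketch of that second category, assuming a row-major rows-by-cols matrix and an output buffer already sized to rows * cols; the names are illustrative. The loop walks an index space, and each output needs neighbours at (i-1, j) and (i+1, j), so the indices, not just the elements, drive the computation.

```cpp
// A minimal sketch of an index-space iteration: a 1D-in-j, 3-point-in-i
// stencil over a row-major rows x cols matrix.
#include <cstddef>
#include <vector>

void stencil(const std::vector<double>& in, std::vector<double>& out,
             std::size_t rows, std::size_t cols) {
    auto at = [&](std::size_t i, std::size_t j) { return in[i * cols + j]; };
    // Interior rows only, so the (i-1, j) and (i+1, j) neighbours always exist.
    for (std::size_t i = 1; i + 1 < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            out[i * cols + j] = at(i, j) + at(i + 1, j) + at(i - 1, j);
}
```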
So the couple of translations that I would give for the array language listener: your element-wise operations, we would separate into two different categories, the shape-preserving ones and then basically the reduction version. For the reduction version in an array language, what you would do is, well, we heard the word raze earlier, which is the J term, or ravel, which is the APL term. When you ravel, say, a matrix that has two dimensions, so a shape of length two, you know, a two-by-three matrix, it actually, to my knowledge, doesn't change anything in memory. It just destroys the shape, because everything's stored linearly in a raveled form anyway. So a ravel, which you might think is an expensive operation, given the way these arrays are stored, you basically just change the shape to the product of all the elements of your shape, your dimensions, and then you don't actually have to change anything in memory. And then you perform a reduction. And then I think when you say indexes, I typically think more of sub-shapes, but it's the same idea, like where I'm performing something on a certain rank, which equates to just, you know, an (i, j) index.
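A minimal sketch of that point about ravel, assuming a contiguously stored, row-major array; the Array struct and the ravel function here are purely illustrative, not any particular library's API. Ravelling just replaces the shape with the product of the extents and never touches the data.

```cpp
// Ravel as a pure shape change over contiguous storage.
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

struct Array {
    std::vector<double> data;        // contiguous, row-major storage
    std::vector<std::size_t> shape;  // e.g. {2, 3} for a 2 x 3 matrix
};

Array ravel(Array a) {
    std::size_t n = std::accumulate(a.shape.begin(), a.shape.end(),
                                    std::size_t{1}, std::multiplies<>{});
    a.shape = {n};  // one axis whose length is the product of the old extents
    return a;       // a.data is untouched
}
```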
But so, was there a follow-up point to that?
The point I was getting at is that in a very concrete C++ space, I think there are two categories of useful iterators that you would have to a multidimensional structure.
The first category of iterator is an iterator that gives you values and does not tell you what
index the things live at. And there is no way to get the index out from it. But that doesn't matter because the way that you want to use this is to do something element-wise to each one of the elements of it.
And then there is a second class of iterator, which is not an iterator over elements, returning you the elements of the matrix, but an iterator that produces indexes that you would then use to index into the matrix.
And there is both a programming model difference here and also a significant performance difference, because if we only have to build efficient index iterators, that's a much more tractable problem than building efficient element iterators from which we can recover the multidimensional index, for a variety of reasons that have to do with how compiler vectorizers work, that I do not remember all the details of, and even if I could, would not get into in depth on this podcast, because that would be hairy even for us.
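A minimal sketch of what such an index iterator could look like, using C++23 std::views::cartesian_product (compiler support may vary); the function name and the diagonal-sum logic are just illustrative. The range yields (i, j) pairs, and the caller uses them to index into the matrix.

```cpp
// A range of (i, j) index pairs that drives the iteration, rather than an
// iterator over the matrix elements themselves.
#include <cstddef>
#include <ranges>
#include <vector>

double diagonal_sum(const std::vector<double>& a,
                    std::size_t rows, std::size_t cols) {
    auto indices = std::views::cartesian_product(
        std::views::iota(std::size_t{0}, rows),
        std::views::iota(std::size_t{0}, cols));
    double sum = 0.0;
    for (auto [i, j] : indices)    // the index, not the element, drives this
        if (i == j)
            sum += a[i * cols + j];
    return sum;
}
```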
Well, I have more questions, but we're going to let Bryce go exercise and eat food.
Oh, there were two more comments. There were two more comments. Oh, there was only one more comment. We were at comment number two, and very quickly, the last comment was a response to the previous one by Kemanachi. Greg CPP was responding to a sentence in the last comment, which was: I think a great diagram or animated visual might help clearing that up with the array language community.
And Greg CPP responded to that.
Yes, exclamation mark.
I'm not a GPU programmer, but I would really appreciate a good reference for how GPU programming
works at a conceptual level, not how to program in CUDA or with any other tools, but how they
work generally.
For example, I believe that most GPUs have a CPU user space,
quote unquote, driver, a CPU kernel space driver,
and code on the GPU proper,
but I have no idea what the responsibilities of each are.
For example, when you dynamically allocate GPU memory,
where does the allocation algorithm run?
Or when the vendors talk about GPU cores,
what does that really mean?
Is that answerable in a couple minutes?
Well, I'm going to work backwards. So GPU cores, that's often a marketing term. On NVIDIA GPUs, we have an actual execution unit called an SM, and I think of that as being the closest analogy to a core. On GPUs, we virtualize a whole lot of threads, and SMs can be heavily oversubscribed with threads. And in fact, that's how they like to operate, because the way that GPU SMs are designed to work, while they are waiting for data for one group of threads, they can go and execute the next group of threads. And, you know, this is very heavily pipelined. So,
you know, you want to do what's called latency hiding, where you never actually have to wait
for something because there's always more work available. So, you know, you maybe have one group
of active threads, and then you have, you know, 30 or 60, and those may be slightly excessive numbers, say 4 or 16 other groups of threads which you've initiated loads on or stores on, or some operation that needs to wait on memory traffic. And, you know, those are sort of pipelined. And so as one of those loads comes in, then that group of threads becomes available, and then you go and you execute the compute code until it hits the next memory load. And then by that time, somebody else has become available to start computing. So they are sort of designed for this barrel threading, these SM cores.
I don't remember what the rest of that question was. It seemed like it was more, you know, this plea, a desire to understand the basics of the GPU compute model, which is something that I too have a desire for. As we talked about last time, I'm by no means a GPU expert. I will say, time permitting,
it is likely that I will scrap my original plan for my Array 2022 keynote, which was just to
recycle a good chunk of my existing slide deck on C++ standard parallelism.
But I'm thinking about just scrapping that plan and instead just solely talking about
MD-SPAN and talking in greater detail about the problems related to multidimensional iteration
in C++ and in compilers.
However, that will require...
The time that I have between now and
the start of that keynote is quite limited and made more limited by the fact that I have lots
of other things going on. And I have a friend coming into town to visit. Well, I can say it: Han is coming to visit next week. And yeah, we'll see whether I have time to do that.
I do also technically have a day job where I have to do things.
I mean, you could make a strong argument that giving a talk about multidimensional iterators
and MD-SPAN to the academic community of array language and multi-dimensional array libraries could be of immense value,
especially if you're focusing on how it impacts programming to a GPU, to NVIDIA. So, you know, it could be a part of your day job.
Yeah, yeah, yeah. But it requires more work than I may have time to do. So we'll see. We'll see. If you get a bunch of meeting cancellation notices from me in the next 24 hours, it's because I've decided to execute on this and have decided that we are postponing all meetings until post...
I assume you're talking to me and not the listener.
No, I mean, some of the listeners are on, you know, meetings that I would send those cancellation notices for.
Gotcha, gotcha. And if I send those meeting cancellation notices, I'm going to put in the note: listen to ADSP for details.
I mean, this won't be out until after.
Right.
People will have to wait a week or two to find out why they didn't get to know.
Listen to episode 82, which will be released on June 17th or something like that.
All right.
We'll wrap it up here.
Thanks for listening.
We hope you enjoyed and have a great day.