Algorithms + Data Structures = Programs - Episode 170: VIN & HPX
Episode Date: February 23, 2024
In this episode, Conor and Bryce chat about VIN numbers and the HPX parallel execution model.
Link to Episode 170 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)
Twitter
ADSP: The Podcast
Conor Hoekstra
Bryce Adelstein Lelbach
Show Notes
Date Recorded: 2024-02-22
Date Released: 2024-02-23
Top Market Cap Companies
Top 10 Market Cap Companies as of Feb 22, 2024
VIN (Vehicle Identification Number) Numbers
ISO 3779
ISO 4030
HPX
Active Global Address Space
Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8
Transcript
No, for those of you that have listened either from the beginning or you tuned in partway and you went back to the backlog and you've listened to every episode, you are our most important listeners.
Not your mom who tunes in every once in a while.
Absolutely not, Bryce.
She is our legal counsel.
Don't get me wrong.
But our most important listeners are the ones that listen to all our episodes. Welcome to ADSP the podcast
episode 170, recorded on February 22nd, 2024. My name is Conor, and today my co-host Bryce explains the relationship between VIN numbers and the HPX parallel execution model.
I don't know. What problem don't we have? Well, should I share my screen briefly?
Yeah
Should I share my screen briefly?
You should
Let's go
Oh, yes
We certainly do not have a problem
With how our company is doing
Man, that was a crazy year
Largest market cap
I think we're number four. We're number four, baby! But at this point we're approaching two trillion. We are less than 200 billion away from taking over Saudi Aramco. Can you believe it, folks? We're more valuable than Amazon in fifth, Alphabet in sixth, Meta in seventh,
good old Warren Buffett,
Berkshire Hathaway at number eight.
We're even more valuable than him, folks.
Yeah.
Woo!
Microsoft and Apple, number one and two,
just to fill out the top eight.
And I guess we'll do top 10.
Oh, I didn't know that.
TSMC is number 10?
Yeah.
And Eli Lilly.
Eli Lilly. Here's a quiz, Bryce. What drug is Eli Lilly famous for?
I don't know. Insulin, I want to say?
Yeah, that's correct.
Wow, I'm really, really nailing the quizzes.
Yeah, yeah. Um, you know, I remember when I joined NVIDIA like six years ago,
and it was right after Volta was released,
and it was before the big AI wave.
And some people told me at the time,
like, oh, you don't want to go work for one of the hardware companies
because like NVIDIA and Intel, they don't pay as well.
You should go work for like a Google or a Microsoft or an Apple.
But, you know, I just thought I came to work here because I was so impressed with the people from NVIDIA
that I interacted with on the C++ committee, Olivier Giroux, Jared Hoberock, and Michael Garland.
I was just so impressed by them that I just knew that this is the place I wanted to go work.
And my background was in parallel programming.
And it seemed to me like they were really the parallel chip company.
And I was like, this is the place for me.
And boy, boy, has it worked out.
That's up to you.
But, you know, here's something you can leave in. So in 2011, when I was just a wee little kid, my parents encouraged me to, like, start saving.
And, like, as part of that, they were like, you should take a couple hundred dollars of your money and just play around with some stocks.
Just to, you know, not – don't put all your savings in your own investments of
your own choice, but just so that you like learn a little bit about the market and just as a,
um, I don't know, learning experience. And, uh, I don't remember all the stocks that I bought.
I bought some that were like some solar cell companies and one like EV company,
one of which was a Chinese startup that I'm sure is no longer around.
But then I also bought AMD and NVIDIA because I was at LSU at the time working on HPX,
and I thought that they would be big winners in the future.
And that was in 2011.
And I couldn't have bought more than $100 or $200 of each.
But a year or two later, I just sold all that and just moved
everything into index funds with my financial advisor. And for a few years after that, I would look back at how my portfolio would have done if I had not sold it. And I haven't done that in a number of years. And I don't even want to know, if I had held that $200 of NVIDIA and the $200 of AMD stock from 2011, I don't even want to know how much it would be now.
Yeah, it's always hard looking back at that stuff. And just for context, because I guess this is, I mean, it's pretty relevant for those that are listening on Friday when this gets released, but we are recording this the day prior to Friday, February 23rd.
I'm actually going to my first, like, in-person company all-hands meeting today, because I'm in Hillsboro, Oregon, because my girlfriend's here for work,
and so i figured i'd tag along and go visit my colleagues who are here,
who work on the NVIDIA HPC compiler. And they happen to have like a, you know, the CEO's not
doing the all hands here in Oregon, which is one of our satellite offices. But they have like a
viewing party for everybody in the local office to tune into the all hands. And so I'm going to my first in-person all-hands.
I've been here six years.
I've never been to an all-hands in person.
I remember when I was in headquarters,
like a few times walking past the ongoing all-hands
of being like, I can't, I don't have time for this.
Like I got too much stuff to do right now.
I think I've like maybe called into one or two, but I haven't really ever attended an all-hands meeting. So now I'm going to attend an all-hands meeting, and I'm bringing, I've got my dog here with me, who's traveling with us, and so she's going to come to her first NVIDIA all-hands meeting. Or her first NVIDIA all-paws meeting.
All right, so I've got one comment to finish, and then I've got a question.
My first comment to finish was I was, I was given some context by the day that we were recording
this because we were talking about market cap. NVIDIA, at least at this point in time, it is 12:15
PM EST. So we're mid-market. Market's been open for three hours and 15 minutes. We're up 15%
right now. So our market
cap, we were ranked fourth at the beginning of the day. We'll probably be ranked fourth at the
end of the day, but it's been a good day. But it's all ephemeral. It'll go down, it'll go up.
My question now is, you realize that the All Hands is at 1 p.m. Eastern time and therefore
10 a.m. your local time, which means you have 45 minutes to get there, right?
Well, yeah. I mean, I imagine I'll probably miss a little bit of it.
And I do have to, you know, make slides during it because I'm giving a talk later in the day.
Yeah. I mean, this is the beautiful thing about attending these things remotely, which is what I've done. I've never been to an all-hands in person, and an all-hands doesn't require a hundred percent of your attention. And, you know, if we're being honest, especially if you're doing things that require compiling or running tests that have a little five-minute or ten-minute loop, it's the perfect time to do that kind of stuff, because you can pay attention, hit the button, and poof.
Yeah. And I mean, I have all the content for my slides. I just need to
take the code and put it into slide form and add a few notes. Yeah. So I actually, I have two topics
for us today. All right. The first topic is I got a question from our most important listener,
my mother, who apparently has told some of her friends about the podcast
and they have listened.
She's listened to a couple episodes, I think, too.
Well, she's clearly not our most important listener
if she doesn't listen to every episode.
She's my mother.
She's the most important listener.
She's the most important listener.
For those of you that have listened either from the beginning
or you tuned in partway and you went back to the
backlog and you've listened to every episode, you are our most important listeners. Not your mom
who tunes in every once in a while. Absolutely not, Bryce. She is our legal counsel. Don't get
me wrong. But our most important listeners are the ones that listen to all our episodes.
So she sent me a text and she said, Dear ADSP, how are VIN numbers assigned to
cars? And this is, I presume, a follow-up to, she must have seen our episode about
how credit card numbers are issued and I also mentioned it to her. So for those who don't know,
every car has a VIN number, which is an identifying number
that gets issued to it.
So, I did a little bit of research on this.
And it's not, I think, as interesting an algorithmic question as the credit card numbers.
Because one of the key things with the credit card numbers is that with the credit card
numbers, you don't want them to be issued sequentially because you don't want them to
be guessable.
Because if they're guessable, then they're attackable, right? Then you can find out somebody's credit card number
and maybe, you know, and if you can find out their credit card number and if maybe you have some way
of telling from this, if they were issued sequentially, then maybe you could figure out
roughly when the card was issued. And then if you knew what the typical expiration date length is for
that card issuer, then you could figure out, you know, the credit card's expiration date,
yada, yada. Maybe you could figure out the billing zip code and then you're able to, you know,
to use that card number. So, obviously, with credit cards, you need to make sure that they're
not issued in a predictable way. But not so much a problem with vehicle identifier numbers. Now, VIN numbers tend to differ from
country to country, how they're issued. There are some, unsurprisingly, ISO standards for VIN
numbers. I don't think we're going to purchase and look at those today. There's ISO...
Ooh, the listeners want another PDF purchased for an obscene amount of money for 12 pages.
So there are two standards. There's ISO 3779, which defines the structure and the content of a VIN number, and then ISO 4030, which it says is about location and attachment, which I think is like where it should go on the car.
And they are not just for cars, but also for towed vehicles, motorcycles, and things like scooters and mopeds, et cetera. And there's a couple of interesting things about VIN numbers that I think
lead to some lessons about programming in general, and in particular, designing address systems.
So, a VIN number isn't just solely a unique identifying number, just as a credit card number actually isn't solely a unique identifying number. You know, the credit card number has some information embedded in it, like, you know, who the credit card issuer is. In the case of a VIN number, you have
the manufacturer identifier as part of it. And then there's a vehicle descriptor section which tells you like the general characteristics of the
vehicle. And then there's a vehicle identifier section and that can be the just solely like a
unique serial number. But a lot of manufacturers will include, you know, some things like the make, model, what engines in the car, some other details that
are maybe specific to them will be embedded in that vehicle identifier section. The stated purpose
of the vehicle identifier section is that it's supposed to be an indication that provides clear
identification of a particular vehicle, whereas the vehicle descriptor section is supposed to provide an
indication of the general characteristics of the vehicle. And so, the thing that I think is
interesting for us to talk about in the context of programming is this notion of information
embedded in address spaces. And this is actually something that I'm quite familiar with because back in the day, I worked on HPX, this parallel runtime system, at LSU. And the thing that, like, my primary contribution there was redesigning a system that we called the active global address space, which was HPX's addressing
space. And at the time it was a 128-bit address space. And if you think about your like local
memory address space on most systems, you typically think about that as like, oh, well,
that's, you know, it's just some arbitrary, you know, 64-bit number that points to some memory location that the operating system issues.
But actually, depending on the operating system, some operating systems don't just – it's not just like solely an identifier.
There's usually some prefix there that's maybe associated with your particular process. Although these days with address space randomization,
it tends to be less predictable what the actual structure of a local memory address is.
But in HPX, we did sort of the same sort of thing that the credit card system does and that the
the VIN system does, which is that we embedded useful information into the
address itself. And in particular, in HPX, we had this 128-bit address. And the lower 64 bits,
we would use to embed a local memory address. So, in HPX's model, HPX was a distributed model. And
so, these 128-bit addresses were global addresses to objects
that would be somewhere in the system. And what we did with those lower 64 bits is we would embed
the local address of, hey, on this, you know, this is the memory address on whatever particular
system this thing lives on. And this was quite useful because this meant that when you were
routing a request to a particular distributed object, you didn't have to do a local lookup of
what's the address of this object in local memory. So you could take this 128-bit address,
you could look at the top 64 bits, which had the information about, like,
what node is this thing on? And then you could dispatch a message to that node.
And that message could use the local pointer to that object. You didn't have to go to that node
and then say, hey, I'm on this node. Let me, you know, let me go and access this object.
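As a rough illustration of the packing Bryce describes, here's a small Python sketch. The field widths and function names are invented for the example (HPX's actual global ID layout differs in its details); the point is just that the node and the local pointer can be recovered from the address itself, with no lookup:

```python
# Hypothetical sketch of a "global address" whose upper bits identify
# the node an object lives on and whose lower 64 bits hold the object's
# local memory address on that node. Layout is illustrative only.

LOCAL_BITS = 64
LOCAL_MASK = (1 << LOCAL_BITS) - 1

def make_global_address(node_id: int, local_addr: int) -> int:
    """Pack a node id and a 64-bit local address into one integer."""
    assert 0 <= local_addr <= LOCAL_MASK
    return (node_id << LOCAL_BITS) | local_addr

def node_of(gaddr: int) -> int:
    """Which node does this object live on? No server round trip needed."""
    return gaddr >> LOCAL_BITS

def local_address_of(gaddr: int) -> int:
    """The object's address in that node's local memory."""
    return gaddr & LOCAL_MASK

def is_local(gaddr: int, my_node: int) -> bool:
    """Fast path: if the object is on this node, dereference directly."""
    return node_of(gaddr) == my_node

gaddr = make_global_address(node_id=7, local_addr=0xDEADBEEF)
assert node_of(gaddr) == 7
assert local_address_of(gaddr) == 0xDEADBEEF
assert is_local(gaddr, my_node=7)
```

The same decode also gives the local fast path mentioned later in the conversation: if `is_local` is true, you can skip the global addressing server entirely.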
And we actually embedded some other
information in this address too. One of the most interesting things, I think, from a programming
language perspective is we embedded typing information into the address. So, in HPX's model,
all of the objects in this global address space were objects.
They were classes.
It wasn't just like raw memory, but everything was an object.
And so every object would have a particular type,
and those types would be registered globally.
And so every one of these global addresses would have the type embedded in there,
which was useful because if you wanted to do like a remote procedure call, you could do some type checking to make sure that you've got an object
of the correct address of the correct type before you even dispatch the message. So it was like this
like fat pointer that embedded the type in it. Now, there was one challenge with HPX, which is that it was an
active global address space. And the active part was that objects could move from one node to
another node for load balancing purposes. And so the way that we handled that is that essentially we would, if an object moved from one node to another node, we couldn't change the address because the address itself has embedded in it information about what node it lives on and the address of the object on that node. And we wanted our system to have a property that we call referential integrity,
which means that if you have a reference, a pointer to an object, that's always good.
Even if the thing moves from one place to another place in this distributed address space,
the original address remains good. And so what we did was we simply, if a node received a message
to a particular address and that address had moved, the node would just redirect
to wherever the new place is. And then eventually through caching, everybody would hopefully get
the update. And when it does the redirection, it would also send a message back
to whoever communicated with it saying,
hey, this thing's moved. Put this into your address cache
so that you stop calling me and asking me to redirect this thing
to somewhere else because otherwise I could end up with this node
which was probably already experiencing high load,
which is why stuff was moved away from it. And then if it
has to be spending all this time redirecting things to other nodes, well, that's just going
to make the load balance issue worse. And one of the other useful things about having the local
address embedded in the global address was that it optimized one of the fast cases, which is if I'm accessing an object using its
global address and the object lives on the node that I'm on, which is the common case
that I'm doing work on a local object, I can completely avoid any communication with the
addressing server because I can see, hey, I've got this address. I can see that this
address, the node that this object lives on is my current node. And there's a fast way that we can
check that in our system. We don't have to go round trip to the global addressing server.
So I know that this object is local and I know that it hasn't moved and I can check that quickly.
And then, hey, because I have the local address here, I don't have to go look up where this object is.
I can just take the local address out of this global address and dereference it and boom, I'll have an actual object in memory. And I think a lot of address spaces these days tend to have this sort of embedded,
useful information in some way, shape, or form. You know, a lot of people maybe think like,
oh, we'll never need more than 64-bit addresses for memory. But actually, we almost certainly will in the future move to
wider addresses because as processors become more complex and we see more specialized
hardware and specialized memory and types of memories, I think that we'll start to see more and more of a
need for clever addressing systems where we have all sorts of embedded information in the
addressing systems. And that will require more bits than we currently have. Oh, another interesting
thing that we would embed in the HPX addresses is that in HPX, all of the objects had their lifetimes managed by reference counting.
And the addresses themselves would have a number of reference counts in it.
And what you would do is every time you send the address to another node, so every time you serialize the
address and it gets sent over the wire to somewhere else, you split the reference count.
So the sender keeps half of the reference count and the receiver gets half of the reference count.
And this way, when you send a message to another node, you don't have to go and call the global addressing system and say, hey, increment the reference count.
You avoid having additional reference counting traffic back to the original owner of the object when you're communicating.
And so that was a pretty clever little trick. And essentially, that meant that these addresses, while they had this property of referential
integrity, when you were comparing these addresses, you had to mask away the reference counting
bits because those would always change.
Because I could have the address in one place where it has, you know, 128 reference counts left. And then on
some other place, maybe I have an address where the same address, but where I only have 32 reference
counts left on it. And those two should compare equal because they're the same address. I've just
got this couple bits in the address where I'm storing this reference counting information.
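The credit-splitting and the masked comparison can be sketched like this. The field width and helper names here are made up for illustration (this is the general "split reference count" idea, not HPX's exact encoding):

```python
# Illustrative sketch: a few low bits of the address store a
# reference-count "credit" that is split in half whenever the address
# is sent to another node, and address comparisons mask those bits
# away so two copies of the same address still compare equal.

COUNT_BITS = 8                      # hypothetical width of the credit field
COUNT_MASK = (1 << COUNT_BITS) - 1

def credit_of(addr: int) -> int:
    return addr & COUNT_MASK

def with_credit(addr: int, credit: int) -> int:
    assert 0 <= credit <= COUNT_MASK
    return (addr & ~COUNT_MASK) | credit

def split_for_send(addr: int) -> tuple[int, int]:
    """Sender keeps half the credit, receiver gets the other half,
    so no round trip to the owner is needed to bump a count."""
    credit = credit_of(addr)
    sent = credit // 2
    return with_credit(addr, credit - sent), with_credit(addr, sent)

def same_object(a: int, b: int) -> bool:
    """Equality must ignore the credit bits, which differ per copy."""
    return (a & ~COUNT_MASK) == (b & ~COUNT_MASK)

addr = with_credit(0xABCD00, 128)
kept, sent = split_for_send(addr)
assert credit_of(kept) == 64 and credit_of(sent) == 64
assert same_object(kept, sent)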
So it was a pretty cool system.
That was some of the best code I wrote, I would say.
So this all started with a comparison to VIN numbers?
Yeah, to VIN numbers.
How much of this complicated HPX fanciness ties back to the VIN number?
Well, all of the fanciness in the HPX address system was all about this idea of we're going to embed useful information into an address.
So the address is, yes, it's a unique identifier, but it also has useful information.
And that's applicable to credit cards and VIN numbers have, you know, this check digit thing where given a VIN number, you can, some of
the digits in the VIN number and some of the digits in a credit card number are check digits
that are computed from the other digits. So, you can do this quick check to immediately determine
whether or not the address that you have is complete garbage or not. Like, that's a clever use of embedded information. And also,
given a credit card number, you can tell who the card issuer is. You know, like you see some
websites out there where they ask you to select like Visa, MasterCard, or Amex. Well, they don't
have to do that. They can actually infer the type of the entity from the number, just in the same way that in HPX, given an address, you can determine what the type of the object that it points to is.
So what is it about VIN numbers? They're just storing certain pieces of information about the vehicle or the seller or something?
Yeah, yeah. Certain information about like the make and model of the vehicle and the manufacturer.
There's a few other things that can be in there.
Like you can tell like the model year of the car.
You could tell the, you know, what type of engine,
like what are some of the features of the car?
Like is it, you know, is it a Mazda 3 with, you know, this engine or that engine? Which trim is it? Sometimes even things like the color,
et cetera, which is useful and important, for example, for law enforcement to be able to,
oh, you're just sending around the VIN number and I can tell, oh, I'm looking for a, you know,
this VIN number is a white Toyota Corolla. And that can just be inferred immediately
from the VIN number without having to go and query some database. Or rather, I should say,
like, one of the key properties of HPX, one of the reasons why we embedded all this information
was to avoid having to send queries to a global server or to a server that lived somewhere else,
that we wanted to localize the lookup of
information. And VIN numbers let you do that too. You know, every police car can have a computer in
it that has a little database of all of the decodings of VIN numbers. And then, you know,
you can just type the VIN number in there. You don't have to go and call some server and look up,
you know, what's the make and model of this car. The information can all be in a little local store of decoding info.
Yeah, that is useful.
I've, for the record, never owned a vehicle, which is why I know so little about all this stuff. But to answer the original question of my mother, which is how are they issued?
So, one, they are assigned when the car is actually manufactured. And most manufacturers
just issue them sequentially. And so the number typically would indicate what...
The actual part of it that's solely a number indicates what number of this car rolled off
the factory line. Because again, there's no security need to
randomize that part of it there. And, you know, in certain countries and places, you have to register the VIN number with some government agency when the car gets created, and the manufacturers all take care of that.
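As an aside, the check digit mentioned earlier can be sketched. This follows the commonly described North American scheme for position 9 of a 17-character VIN; the tables below are reproduced from common public descriptions of the scheme, not from the ISO standard itself, so treat this as an illustration of the idea:

```python
# Sketch of the North American VIN check-digit scheme (position 9).
# Letters I, O, and Q are not used in VINs (too easily confused with 1/0).
TRANSLITERATION = {
    **{str(d): d for d in range(10)},
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}
# One weight per VIN position; position 9 (the check digit) weighs 0.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]

def check_digit(vin: str) -> str:
    """Compute the expected check digit for a 17-character VIN."""
    assert len(vin) == 17
    total = sum(TRANSLITERATION[c] * w for c, w in zip(vin, WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)

def looks_valid(vin: str) -> bool:
    """Quick local sanity check, no database query needed."""
    return check_digit(vin) == vin[8]

# Widely cited example VIN whose check digit is 'X':
assert looks_valid("1M8GDM9AXKP042788")
```

This is exactly the "is this address complete garbage?" check from earlier: a purely local computation over the digits, with no call to any server.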
And there has been some effort in recent years to update and modernize the VIN systems, at least in the U.S., to accommodate future growth.
Interesting.
Yeah.
Be sure to check these show notes, either in your podcast app or at adspthepodcast.com,
for links to anything we mentioned in today's episode, as well as a link to a GitHub discussion where you can leave thoughts, comments, and questions.
Thanks for listening.
We hope you enjoyed and have a great day.
Low quality, high quantity.
That is the tagline of our podcast.
It's not the tagline.
Our tagline is chaos with sprinkles of information.