Algorithms + Data Structures = Programs - Episode 170: VIN & HPX

Starting point is 00:00:00 No, for those of you that have listened either from the beginning or you tuned in partway and you went back to the backlog and you've listened to every episode, you are our most important listeners. Not your mom who tunes in every once in a while. Absolutely not, Bryce. She is our legal counsel. Don't get me wrong. But our most important listeners are the ones that listen to all our episodes. Welcome to ADSP the podcast, episode 170, recorded on February 22nd, 2024. My name is Connor, and today my co-host Bryce explains the relationship between VIN numbers and the HPX parallel execution model. I don't know. What problem don't we have? Well, should I share my screen briefly?

Starting point is 00:01:06 Yeah. Should I share my screen briefly? You should. Let's go. Oh, yes. We certainly do not have a problem with how our company is doing. Man, that was a crazy year.

Starting point is 00:01:23 Largest market cap. I think we're number four. We're number four, baby. But at this point we're approaching two trillion. We are less than 200 billion away from taking over Saudi Aramco. Can you believe it, folks? We're more valuable than Amazon in fifth, Alphabet in sixth, Meta in seventh, good old Warren Buffett, Berkshire Hathaway at number eight. We're even more valuable than him, folks. Yeah.

Starting point is 00:01:55 Woo! Microsoft and Apple, number one and two, just to fill out the top eight. And I guess we'll do top 10. Oh, I didn't know that. TSMC is number 10? Yeah. And Eli Lilly.

Starting point is 00:02:05 Eli Lilly. Here's a quiz, Bryce. What drug is Eli Lilly famous for? I don't know. Insulin, I want to say. Yeah, that's correct. Yeah. Wow, I'm really, really nailing the quizzes. Yeah, yeah. Um, you know, I remember when I joined NVIDIA like six years ago, and it was right after Volta was released, and it was before the big AI wave. And some people told me at the time, like, oh, you don't want to go work for one of the hardware companies because, like, NVIDIA and Intel, they don't pay as well.

Starting point is 00:02:39 You should go work for like a Google or a Microsoft or an Apple. But, you know, I came to work here because I was so impressed with the people from NVIDIA that I interacted with on the C++ committee: Olivier Giroux, Jared Hoberock, and Michael Garland. I was just so impressed by them that I just knew that this is the place I wanted to go work. And my background was in parallel programming. And it seemed to me like they were really the parallel chip company. And I was like, this is the place for me. And boy, boy, has it worked out.

Starting point is 00:03:19 That's up to you. But, you know, here's something you can leave in. So in 2011, when I was just a wee little kid, my parents encouraged me to, like, start saving. And, like, as part of that, they were like, you should take a couple hundred dollars of your money and just play around with some stocks. Just to, you know, don't put all your savings in your own investments of your own choice, but just so that you, like, learn a little bit about the market, and just as a, I don't know, learning experience. And I don't remember all the stocks that I bought. I bought some that were, like, solar cell companies and one, like, EV company, one of which was a Chinese startup that I'm sure is no longer around.

Starting point is 00:04:08 But then I also bought AMD and NVIDIA because I was at LSU at the time working on HPX, and I thought that they would be big winners in the future. And that was in 2011. And I couldn't have bought more than $100 or $200 of each. But a year or two later, I just sold all that and just moved everything into index funds with my financial advisor. And for a few years after that, I would look back at how my portfolio would have done if I had not sold it. And I haven't done that in a number of years. And I don't even want to know how much, if I had held that $200 of NVIDIA and

Starting point is 00:04:46 the $200 of AMD stock from 2011, I don't even want to know how much it would be now. Yeah, it's always hard looking back at that stuff. Yeah. And just for context, because I guess this is, I mean, it's pretty relevant for those that are listening on Friday when this gets released, but we are recording this the day prior to Friday, February 23rd. I'm actually going to my first, like, in-person company all-hands meeting today because I'm in Hillsboro, Oregon, because my girlfriend's here for work, and so I figured I'd tag along and go visit my colleagues who are here, who work on the NVIDIA HPC compiler. And they happen to have like a, you know, the CEO's not

Starting point is 00:05:32 doing the all-hands here in Oregon, which is one of our satellite offices. But they have like a viewing party for everybody in the local office to tune into the all-hands. And so I'm going to my first in-person all-hands. I've been here six years. I've never been to an all-hands in person. I remember when I was in headquarters, like a few times walking past the ongoing all-hands and being like, I can't, I don't have time for this. Like, I got too much stuff to do right now.

Starting point is 00:06:01 I think I've like maybe called into one or two, but I haven't really ever, like, attended an all-hands meeting. So now I'm gonna attend an all-hands meeting, and I'm bringing, I got my dog here with me who's traveling with us, and so she's gonna come to her first NVIDIA all-hands meeting, or her first NVIDIA all-paws meeting. All right, so I got one comment to finish and then I've got a question. My first comment to finish was, I was giving some context about the day that we were recording this because we were talking about market cap. NVIDIA, at least at this point in time, it is 12:15 PM EST. So we're mid-market. Market's been open for three hours and 15 minutes. We're up 15%

Starting point is 00:06:43 right now. So our market cap, we were ranked fourth at the beginning of the day. We'll probably be ranked fourth at the end of the day, but it's been a good day. But it's all ephemeral. It'll go down, it'll go up. My question now is, you realize that the All Hands is at 1 p.m. Eastern time and therefore 10 a.m. your local time, which means you have 45 minutes to get there, right? Well, yeah. I mean, I imagine I'll probably miss a little bit of it. And I do have to, you know, make slides during it because I'm giving a talk later in the day. Yeah. I mean, this is the beautiful thing about attending these things remotely, which is what I have. I've never been to an all hands in person and all hands doesn't need,

Starting point is 00:07:29 doesn't require a hundred percent of the attention. And you know, if we're, if we're being honest, you can, especially if you're doing things that require compiling or running tests that, you know, have a little five minute or 10 minute loop. Perfect, perfect time to do that kind of stuff. Cause you can pay attention, hit the button and poof. Yeah. And I mean, I have all the content for my slides. I just need to take the code and put it into slide form and add a few notes. Yeah. So I actually, I have two topics for us today. All right. The first topic is I got a question from our most important listener, my mother, who apparently has told some of her friends about the podcast and they have listened.

Starting point is 00:08:08 She's listened to a couple episodes, I think, too. Well, she's clearly not our most important listener if she doesn't listen to every episode. She's my mother. She's the most important listener. She's the most important listener. For those of you that have listened either from the beginning or you tuned in partway and you went back to the

Starting point is 00:08:26 backlog and you've listened to every episode, you are our most important listeners. Not your mom who tunes in every once in a while. Absolutely not, Bryce. She is our legal counsel. Don't get me wrong. But our most important listeners are the ones that listen to all our episodes. So she sent me a text and she said, Dear ADSP, how are VIN numbers assigned to cars? And this is, I presume, a follow-up to, she must have seen our episode about how credit card numbers are issued and I also mentioned it to her. So for those who don't know, every car has a VIN number, which is an identifying number that gets issued to it.

Starting point is 00:09:08 So, I did a little bit of research on this. And it's not, I think, as interesting an algorithmic question as the credit card numbers. Because one of the key things with the credit card numbers is that with the credit card numbers, you don't want them to be issued sequentially because you don't want them to be guessable. Because if they're guessable, then they're attackable, right? Then you can find out somebody's credit card number and maybe, you know, and if you can find out their credit card number and if maybe you have some way of telling from this, if they were issued sequentially, then maybe you could figure out

Starting point is 00:09:39 roughly when the card was issued. And then if you knew what the typical expiration date length is for that card issuer, then you could figure out, you know, the credit card's expiration date, yada, yada. Maybe you could figure out the billing zip code and then you're able to, you know, to use that card number. So, obviously, with credit cards, you need to make sure that they're not issued in a predictable way. But not so much a problem with vehicle identifier numbers. Now, VIN numbers tend to differ from country to country, how they're issued. There are some, unsurprisingly, ISO standards for VIN numbers. I don't think we're going to purchase and look at those today. There's ISO... Ooh, the listeners want another PDF purchase for an obscene amount of money for

Starting point is 00:10:26 12 pages. So there's two standards. There's ISO 3779, which defines the structure and the content of a VIN number, and then ISO 4030, which it says is about location and attachment, which I think is like where it should go on the car. And they are not just for cars, but also for towed vehicles, motorcycles, and things like scooters and mopeds, et cetera. And there's a couple of interesting things about VIN numbers that I think lead to some lessons about programming in general, and in particular, designing address systems. So, a VIN number isn't just solely a unique identifying number, just as a credit card number actually isn't solely a uniquely identifying number. You know, the credit card number has some information embedded in it, like, you know, who the credit card issuer is. In the case of a VIN number, you have

Starting point is 00:11:35 the manufacturer identifier as part of it. And then there's a vehicle descriptor section, which tells you, like, the general characteristics of the vehicle. And then there's a vehicle identifier section, and that can be just solely a unique serial number. But a lot of manufacturers will include, you know, some things like the make, model, what engine's in the car, some other details that are maybe specific to them will be embedded in that vehicle identifier section. The stated purpose of the vehicle identifier section is that it's supposed to be an indication that provides clear identification of a particular vehicle, whereas the vehicle descriptor section is supposed to provide an indication of the general characteristics of the vehicle. And so, the thing that I think is interesting for us to talk about in the context of programming is this notion of information

Starting point is 00:12:38 embedded in address spaces. And this is actually something that I'm quite familiar with because back in the day, I worked on HPX, this parallel runtime system at LSU. And the thing that I, like, my primary contribution there was redesigning a system that we called the active global address space, which was HPX's addressing space. And at the time it was a 128-bit address space. And if you think about your, like, local memory address space on most systems, you typically think about that as like, oh, well, that's, you know, it's just some arbitrary, you know, 64-bit number that points to some memory location that the operating system issues. But actually, depending on the operating system, some operating systems don't just, it's not just like solely an identifier. There's usually some prefix there that's maybe associated with your particular process. Although these days with address space randomization,

Starting point is 00:13:48 it tends to be less predictable what the actual structure of a local memory address is. But in HPX, we did sort of the same sort of thing that the credit card system does and that the VIN system does, which is that we embedded useful information into the address itself. And in particular, in HPX, we had this 128-bit address. And the lower 64 bits, we would use to embed a local memory address. So, in HPX's model, HPX was a distributed model. And so, these 128-bit addresses were global addresses to objects that would be somewhere in the system. And what we did with those lower 64 bits is we would embed the local address of, hey, on this, you know, this is the memory address on whatever particular

Starting point is 00:14:40 system this thing lives on. And this was quite useful because this meant that when you were routing a request to a particular distributed object, you didn't have to do a local lookup of what's the address of this object in local memory. So you could take this 128-bit address, you could look at the top 64 bits, which had the information about, like, what node is this thing on? And then you could dispatch a message to that node. And that message could use the local pointer to that object. You didn't have to go to that node and then say, hey, I'm on this node. Let me, you know, let me go and access this object. And we actually embedded some other

Starting point is 00:15:25 information in this address too. One of the most interesting things, I think, from a programming language perspective is we embedded typing information into the address. So, in HPX's model, all of the objects in this global address space were objects. They were classes. It wasn't just like raw memory, but everything was an object. And so every object would have a particular type, and those types would be registered globally. And so every one of these global addresses would have the type embedded in there,

Starting point is 00:16:01 which was useful because if you wanted to do like a remote procedure call, you could do some type checking to make sure that you've got an object of the correct type before you even dispatch the message. So it was like this, like, fat pointer that embedded the type in it. Now, there was one challenge with HPX, which is that it was an active global address space. And the active part was that objects could move from one node to another node for load balancing purposes. And so the way that we handled that is that essentially, if an object moved from one node to another node, we couldn't change the address because the address itself has embedded in it information about what node it lives on and the address of the object on that node. And we wanted our system to have a property that we call referential integrity, which means that if you have a reference, a pointer to an object, that's always good. Even if the thing moves from one place to another place in this distributed address space, the original address remains good. And so what we did was we simply, if a node received a message

Starting point is 00:17:25 to a particular address and the object at that address had moved, the node would just redirect to wherever the new place is. And then eventually through caching, everybody would hopefully get the update. And when it does the redirection, it would also send a message back to whoever communicated with it saying, hey, this thing's moved. Put this into your address cache so that you stop calling me and asking me to redirect this thing to somewhere else, because otherwise I could end up with this node which was probably already experiencing high load,

Starting point is 00:18:03 which is why stuff was moved away from it. And then if it has to be spending all this time redirecting things to other nodes, well, that's just going to make the load balance issue worse. And one of the other useful things about having the local address embedded in the global address was that it optimized one of the fast cases, which is if I'm accessing an object using its global address and the object lives on the node that I'm on, which is the common case that I'm doing work on a local object, I can completely avoid any communication with the addressing server because I can see, hey, I've got this address. I can see that this address, the node that this object lives on is my current node. And there's a fast way that we can

Starting point is 00:18:55 check that in our system. We don't have to go round trip to the global addressing server. So I know that this object is local and I know that it hasn't moved and I can check that quickly. And then, hey, because I have the local address here, I don't have to go look up where this object is. I can just take the local address out of this global address and dereference it and boom, I'll have an actual object in memory. And I think a lot of address spaces these days tend to have this sort of embedded, useful information in some way, shape, or form. You know, a lot of people maybe think like, oh, we'll never need more than 64-bit addresses for memory. But actually, we almost certainly will in the future move to wider addresses because as processors become more complex and we see more specialized hardware and specialized memory and types of memories, I think that we'll start to see more and more of a

Starting point is 00:20:07 need for clever addressing systems where we have all sorts of embedded information in the addressing systems. And that will require more bits than we currently have. Oh, another interesting thing that we would embed in the HPX addresses is that in HPX, all of the objects had their lifetimes managed by reference counting. And the addresses themselves would have a number of reference counts in it. And what you would do is every time you send the address to another node, so every time you serialize the address and it gets sent over the wire to somewhere else, you split the reference count. So the sender keeps half of the reference count and the receiver gets half of the reference count. And this way, when you send a message to another node, you don't have to go and call the global addressing system and say, hey, increment the reference count.
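A minimal sketch of the credit-splitting idea just described, written in C++ for illustration. The `global_address` struct, its field widths, and the `split_for_send` name are assumptions made for this note, not HPX's actual types or API, and the episode does not say what happens when a copy runs out of credits, so the sketch simply asserts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical global address: `id` names the object, `credits` is the share
// of the object's reference count carried by this particular copy.
struct global_address {
    std::uint64_t id;
    std::uint32_t credits;
};

// Called when an address is about to be serialized and sent to another node.
// The local copy keeps half of its credits and the outgoing copy takes the
// other half, so no message to the object's owner is needed.
inline global_address split_for_send(global_address& local) {
    // Assumption: a real system would need a fallback (for example, asking
    // the owner for more credits) once a copy is down to a single credit.
    assert(local.credits >= 2 && "not enough credits left to split");
    std::uint32_t const sent = local.credits / 2;
    local.credits -= sent;
    return global_address{local.id, sent};
}
```

The total number of credits across all copies stays constant under this operation, which is what lets the owner reason about outstanding references without seeing every send.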

Starting point is 00:21:11 You avoid having additional reference counting traffic back to the original owner of the object when you're communicating. And so that was a pretty clever little trick. And essentially, that meant that these addresses, while they had this property of referential integrity, when you were comparing these addresses, you had to mask away the reference counting bits because those would always change. Because I could have the address in one place where it has, you know, 128 reference counts left. And then on some other place, maybe I have an address where the same address, but where I only have 32 reference counts left on it. And those two should compare equal because they're the same address. I've just got this couple bits in the address where I'm storing this reference counting information.
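To make the "mask away the reference counting bits" point concrete, here is a small C++ sketch of a 128-bit address with embedded fields. The bit layout (32-bit node, 16-bit type, 16-bit credits in the upper word, local pointer in the lower word) and all function names are invented for illustration; the episode only says that the upper 64 bits carried node information, the lower 64 bits carried the local address, and that type and reference count information was embedded as well.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout, not HPX's actual bit assignment:
//   hi: [ node : 32 | type : 16 | credits : 16 ]
//   lo: local memory address of the object on its node
struct global_address {
    std::uint64_t hi;
    std::uint64_t lo;
};

constexpr std::uint64_t credit_mask = 0xFFFFull;

constexpr global_address make_address(std::uint32_t node, std::uint16_t type,
                                      std::uint16_t credits, std::uint64_t local) {
    return {(std::uint64_t{node} << 32) | (std::uint64_t{type} << 16) | credits,
            local};
}

// Field accessors: everything is decoded from the address itself, with no
// lookup against a central addressing service.
constexpr std::uint32_t node_of(global_address a) {
    return static_cast<std::uint32_t>(a.hi >> 32);
}
constexpr std::uint16_t type_of(global_address a) {
    return static_cast<std::uint16_t>((a.hi >> 16) & 0xFFFF);
}
constexpr std::uint16_t credits_of(global_address a) {
    return static_cast<std::uint16_t>(a.hi & credit_mask);
}

// Identity comparison ignores the credit bits, since two copies of the same
// address can legitimately hold different numbers of credits.
constexpr bool same_object(global_address a, global_address b) {
    return (a.hi & ~credit_mask) == (b.hi & ~credit_mask) && a.lo == b.lo;
}
```

With this layout, a copy holding 128 credits and a copy holding 32 credits for the same node, type, and local address compare equal under `same_object`, matching the example given in the discussion.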

Starting point is 00:22:03 So, it was a pretty cool system. That was some of the best code I wrote, I would say. So this all started with a comparison to VIN numbers? Yeah, to VIN numbers. How much of this complicated HPX fanciness ties back to the VIN number? Well, all of the fanciness in the HPX address system was all about this idea of we're going to embed useful information into an address. So the address is, yes, it's a unique identifier, but it also has useful information. And that's applicable to credit cards and VIN numbers. They have, you know, this check digit thing where, given a VIN number, you can, some of

Starting point is 00:22:46 the digits in the VIN number and some of the digits in a credit card number are check digits that are computed from the other digits. So, you can do this quick check to immediately determine whether or not the address that you have is complete garbage or not. Like, that's a clever use of embedded information. And also, given a credit card number, you can tell who the card issuer is. You know, like you see some websites out there where they ask you to select like Visa, MasterCard, or Amex. Well, they don't have to do that. They can actually infer that, the type of the entity, from the number, just in the same way that in HPX, given an address, you can determine what the type of the object that it points to is. So what is it about VIN numbers? They're just storing certain pieces of information about the

Starting point is 00:23:36 vehicle or the seller or something? Yeah, yeah. Certain information about, like, the make and model of the vehicle and the manufacturer. There's a few other things that can be in there. Like you can tell, like, the model year of the car. You could tell the, you know, what type of engine, like, what are some of the features of the car? Like, is it, you know, is it a Mazda 3 with, you know, this engine or that engine? Which trim is it? Sometimes even things like the color, et cetera, which is useful and important, for example, for law enforcement to be able to,

Starting point is 00:24:17 oh, you're just sending around the VIN number and I can tell, oh, I'm looking for a, you know, this VIN number is a white Toyota Corolla. And that can just be inferred immediately from the VIN number without having to go and query some database. Or rather, I should say, like, one of the key properties of HPX, one of the reasons why we embedded all this information was to avoid having to send queries to a global server or to a server that lived somewhere else, that we wanted to localize the lookup of information. And VIN numbers let you do that too. You know, every police car can have a computer in it that has a little database of all of the decodings of VIN numbers. And then, you know,

Starting point is 00:24:57 you can just type the VIN number in there. You don't have to go and call some server and look up, you know, what's the make and model of this car. That information can all be in a little local store of decoding info. Yeah, that is useful. I've never, for the record, never owned a vehicle, which is why I know so little about all this stuff. But to answer the original question of my mother, which is, how are they issued? So, one, they are assigned when the car is actually manufactured. And most manufacturers just issue them sequentially. And so the number typically would indicate what... The actual part of it that's solely a number indicates what number of this car rolled off the factory line. Because again, there's no security need to

Starting point is 00:25:46 randomize that part of it there. And, you know, in certain countries and places, you have to, like, register the VIN number with some government agency when the car gets created, and the manufacturers all take care of that. And there has been some effort in recent years to update and modernize the VIN systems, at least in the U.S., to accommodate sort of future growth. Interesting. Yeah. Be sure to check these show notes either in your podcast app or at ADSPthePodcast.com

Starting point is 00:26:24 for links to anything we mentioned in today's episode, as well as a link to a GitHub discussion where you can leave thoughts, comments, and questions. Thanks for listening. We hope you enjoyed and have a great day. Low quality, high quantity. That is the tagline of our podcast. It's not the tagline. Our tagline is chaos with sprinkles of information.
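For reference, the VIN check digit mentioned during the episode can be sketched in C++ as follows. The episode does not spell out the algorithm; what is shown here is the scheme used for North American VINs (letters mapped to numbers, a fixed weight per position, sum modulo 11, with position 9 holding the result and `X` standing for 10). The function names are made up for this note, and the sketch assumes an uppercase 17-character VIN.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Numeric value of a VIN character, or -1 for characters that are not
// allowed (I, O, and Q are never used in a VIN).
inline int vin_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    switch (c) {
        case 'A': case 'J': return 1;
        case 'B': case 'K': case 'S': return 2;
        case 'C': case 'L': case 'T': return 3;
        case 'D': case 'M': case 'U': return 4;
        case 'E': case 'N': case 'V': return 5;
        case 'F': case 'W': return 6;
        case 'G': case 'P': case 'X': return 7;
        case 'H': case 'Y': return 8;
        case 'R': case 'Z': return 9;
        default: return -1;
    }
}

// True if the 9th character matches the check digit computed from all 17
// positions (the 9th position itself has weight 0).
inline bool vin_check_digit_ok(const std::string& vin) {
    static const int weights[17] = {8, 7, 6, 5, 4, 3, 2, 10, 0,
                                    9, 8, 7, 6, 5, 4, 3, 2};
    if (vin.size() != 17) return false;
    int sum = 0;
    for (std::size_t i = 0; i < 17; ++i) {
        int const v = vin_value(vin[i]);
        if (v < 0) return false;  // illegal character
        sum += v * weights[i];
    }
    int const r = sum % 11;
    char const expected = (r == 10) ? 'X' : static_cast<char>('0' + r);
    return vin[8] == expected;
}
```

This is the "is this address complete garbage" test from the discussion: it needs no database, only the number itself.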
