CoRecursive: Coding Stories - Story: To The Assembly

Episode Date: October 1, 2020

How do CPUs work? How do compilers work? How does high-level code get translated into machine code? Today's guest is Matt Godbolt and he knows the answers to these questions. How he became an expert i...n bare metal programming is an interesting story. Matt shares his origin story and the creation of compiler explorer in today's interview. Episode Page Episode Transcript Links: Compiler Explorer Matt's Github Matt's Blog Matt's YouTube

Transcript
Discussion (0)
Starting point is 00:00:00 I think that's probably the most thing that I want to sort of get out of this conversation with you is that I want to sort of engender, hopefully, the excitement in other people that I feel when I start taking the lid off and pulling out bits further and further down the stack until you get to the CPU and then go like, oh my gosh, I've got all the way to the bottom. And then someone taps you on the shoulder and says, no, you can go deeper than this. You're like, deeper? This is amazing. And keep on going. That's Matt Godbolt. He's an expert in low latency computing. He's going to teach us some lessons
Starting point is 00:00:27 about why it's important to understand what happens between, you know, your high level language and when things get executed down at the level of the CPU. For instance, what is a for loop in your high level language actually turned into
Starting point is 00:00:40 when it's executed by the CPU? And does that have performance implications? Matt's an important speaker because he has all this knowledge and how he came by it is a pretty interesting story. And it all came out of a lucky break where, you know, an afternoon's work with a friend and a bit of JavaScript
Starting point is 00:00:58 and a memorable last name. And then here I am. So I carry a certain amount of guilt for that, but that's also just being British. Hello and welcome to Co-Recursive, where I bring you interesting stories about software development. Before we get into the episode with Matt,
Starting point is 00:01:17 I have a brief plug. I started a new job. I now work at Earthly. We're building an open source build tool. You can find it on GitHub. Search for Earthly. Builds are always a big mess and we're trying to make that situation a bit nicer but back to matt matt's fame is not all luck as he describes it we're going to walk through his story a little bit and i think you'll see that he's been on a trajectory to be the expert in
Starting point is 00:01:42 kind of what happens when you get down to the metal um for some time i was about seven and i went around to a friend's house and they had like a very very very primitive flight simulator that he was flying around and i was apparently uninterested in that but when he then showed that you could type in stuff and make stuff scroll up the screen that apparently that was what enthralled me and so i was very very lucky to get one of them for myself for my my eighth birthday so that was yeah 1984 so i've been uh hacking on computers for a long time now 36 years that computer was the zx spectrum an 8-bit computer with 48k memory and a 16k rom also it had a tape drive because uh we were in the era before discs were very common what did this computer look like did it look like
Starting point is 00:02:34 picture computers as we picture them today or so it's if you if you have any kind of um reference point maybe in this in the u.s in particular things like the apple 2 2e that kind of era that kind of thing so the sync the spectrum was um a relatively small so the keyboard was probably smaller than an average laptop sort of keyboard area and the keyboard was the computer it was the keys themselves were like rubber they were like a rubbery mat with indentations in them, which underneath there was a very simple, literally just a thin, oh gosh, what they call membrane, which made contact or not. So it was a very horrible feeling thing. It was very woolly, very sort of there.
Starting point is 00:03:19 And the whole thing was the computer. You plugged it into a television and you tuned the television into the right channel. And it was blurry, of course, because it wasn't very good. And yeah, and it was tiny, really. You could hold it in your hand and sort of wave it around type of thing. And it was notoriously prone to overheating in some cases. And there was an expansion port in the back.
Starting point is 00:03:38 It was a fun little computer to play around with. You would buy games from the local newsagent for £2.99, which is about five bucks-ish, I guess. And you'd put them in and it was just like listening to an old modem sound, you know, the screeching, whatever. You'd wait four or five minutes to get the 40-odd K of data in and then you'd play your game. And you'd have to hope as well that there wasn't a corruption on the tape.
Starting point is 00:04:01 It would then crash and you'd have to start again. So it was a lot of fun about that. And you were very much exposed to how the computer worked when you just turned it on because you'd get a blank screen with a like, okay, what do you want to do? And obviously as a game player, you would just say, unload the thing I'm about to put it on the cassette tape. But if you don't have many games, another alternative would be to buy magazines from the same newsagent you were getting the games, and they would have type-in listings at the back. And that's kind of how I got started in programming,
Starting point is 00:04:29 would be the enthralling picture on the cover of the magazine, of course, would never actually meet the quality of the type-in listing that you spent four or five hours typing in. This is kind of crazy, right? Instead of buying the game on a cassette tape you could buy a magazine and then in the back 20 pages was just all the source code printed out in basic and you would type it in and hopefully you didn't make any mistakes and then of course you once you've typed this thing in you have to save it to the same unreliable cassette tape that you were loading the games off of so you had to hope that you a right. B, you were able to save it so you could recover it. But of course you would make typos. You'd make typos all over the place.
Starting point is 00:05:09 And that would introduce one sort of to the process of what a program was. So even if you weren't a programmer and even didn't understand what you were typing in, you would get the gist of what it was. And certainly if you were then curious about it, it was a definite leg into discovering how one might make one's own. So that was kind of how I started, was starting to write little basic programs, literally as in basically the programming language. And then laterally, you would start typing in these giant, instead of basics that you would obviously want, that you could understand, there would start to be the more and more of these like just data statements after data statement statements and
Starting point is 00:05:48 of course if you then look into it you realize that what you're typing in is the machine code of an assembly based game so unfortunately this computer the the spectrum did not have a built-in assembler you actually had to go and buy an assembler but you could of course do it by hand or if the person who had authored it and sent off their program to the news agent had like essentially compiled compiled uh assembled their their their code and then dumped it out as hex and then just said we'll type that all in and then you get a game obviously that's much more intractable to a user all right i'm gonna get some of this wrong but bear with me so basic is basic is a basic programming language, like 10, print hello, 20, go to 10.
Starting point is 00:06:31 It just keeps printing hello, right? But to do more advanced things in these games that were in the back of the book, they would drop down to machine code. Machine code is a level below assembly code. It's really just raw binary or hexadecimal. It's the actual instructions the CPU can execute. So Matt couldn't really understand this machine code, but he wanted to
Starting point is 00:06:52 understand it. And he wanted to understand what assembly was. So assembly is one level above machine code, where machine code might just have a binary representation for a jump instruction assembly has mnemonics so it would have like jmp for jump assembly also has labels just like in basic so that you can jump to certain locations like go to 10 in my example matt didn't have any of this but he did want to program his spectrum in machine code so he persisted and so my very first working assembly program, I remember vividly, I was at a swimming gala waiting for my turn to swim. And so I must have been 13-ish, 12, no, maybe younger than that. Cause I, yeah, there's a part of the story we'll get to, but around 11, I would say 11 or 12. And I'd written out very carefully by hand,
Starting point is 00:07:44 all of the assembly instructions for like just something which scrolled a piece of text at the bottom of the screen. Yeah. And then I had to obviously hand assemble it. That is take the fact that this one is a, an LDA comma six, two with the equivalent bytes. And then when I got home, I would type in the sequence of bytes, run it with my fingers crossed. And of course no debuggers, no, no sort of feedback other than it either worked or it with my fingers crossed. And of course, no debuggers, no sort of feedback, other than it either worked or it went horribly wrong. And I was very, very fortunate that probably for the first and only time in my career,
Starting point is 00:08:12 a program I wrote on paper worked first time. And so it's almost like just like a simple lookup table, right? You're like, I want to add, but then for that's it's this hexadecimal code or something exactly right the the assembler's job is pretty straightforward it gets a little bit more complicated because there are often opcodes or the like the primitive instructions that you want the computer to run for you have different ways of phrasing them like the the word is the same that the human writes, but depending on the context it's used in,
Starting point is 00:08:50 you want to use a different actual number. It's like 80s Sudoku. Instead of sitting by the pool figuring out numbers, you're looking at these codes. You're looking at the charts, right, right, right. I mean, probably the most important thing that an assembler is able to do is to allow the programmer to define labels and essentially go-to statements. So I know everyone hates go-to and no one should be using go-to
Starting point is 00:09:14 in any modern programming language, but when you get right down to the metal, that's pretty much all you actually have. So all the CPU has is arithmetic instructions, comparison instructions, multiplies, divides, loads and stores and things, compares and then go-tos or jumps or branches. They're all basically the same thing.
Starting point is 00:09:31 Now, the branch location has to go to essentially a label. So you want to say like, hey, come back to this location. Like your average loop would be, you know, here's the top of the loop, do some stuff, decrement the counter. If the counter is not yet zero, then go back to the top of the loop, do some stuff, decrement the counter. If the counter is not yet zero, then go back to the top of the loop. Those addresses obviously change because depending on the amount of work that you've done between the start of the loop and the end of the loop, the number of bytes
Starting point is 00:09:53 of actual instructions may change. The address of where the label is may change. And so there's this huge rippling effect where if the assembler wasn't tracking this for you, you would be on your poor hand written stuff, scratching out everything that was an address and adding one to it because you just had to insert an instruction right at the very whole top of the program. And of course, that's a huge pain. Yeah, no, definitely.
Starting point is 00:10:17 So, you know, when you were writing out all these programs, hand assembling it and stuff, like, I don't know, like what, what did your, did your family think or your friends, did they, did they think it was cool or that you'd gone bonkers or? I think probably the latter. So in fairness, the hand assembling stage didn't last very long, but yeah, my family, I think always thought I was a little, uh, unusual in this, but I was very fortunate that, that on the first day of my secondary
Starting point is 00:10:46 school, so that would have been, you know, high school, just before high school age, I bumped into a good friend, someone who became a very good friend, who was also very much into computing in the same level that I'm in. I'm still in touch with him, still great mates. And, you know, that, that allowed us to form a nucleation point for a bunch of similarly minded geeky kids when nerdiness and geekiness was not cool. I'm not sure it is cool now, but maybe it's different than it was, especially in late 80s, early 90s Britain.
Starting point is 00:11:16 Matt's school had a computer lab full of these computers, the BBC Micro, and he ends up getting a BBC Micro at home as well. And so at lunchtimes, people were allowed to go in and obviously use the computers to play games, of course, which was what everyone was going to do once you have a room full. So from that point of view, the computers were cool. And if you knew how they worked and could get like the latest games or whatever, then you were only cool to the people who wouldn't otherwise have thought you were cool in that room, in that context. So, you know, you've got a little bit of cachet there, but it didn't necessarily flow outside into, you know, the PE lesson. Not quite the same level of cool
Starting point is 00:11:58 there for your average scrawny British nerdy kid. So Matt and his friends start making little demo applications for the BBC Micro. We were writing little demos, like little examples of how cool you could make the computer run. And at the time, the same magazines that I used to buy in the newsagents were always looking for new submissions. And so he and I would send our submissions off to Acorn User, BBC Acorn User.
Starting point is 00:12:24 And if we were very lucky, they would print them and they would send us £10, £20, £50 for like a star rating. And in, oh gosh, where are we? In like 1989 or 1990 era, 50 quid for two 14-year-olds was a lot of money. Thank you very much. So the pair of us did very well out of those. You know, the kind of things that we were doing there would be like Mandelbrot generators,
Starting point is 00:12:49 Julia set generators, just funny little programs that made nice pictures happen on the screen. Around the time that I was talking about those mid-teen years when we were writing our stuff and sending it to Acorn User, the Atari ST was out, the Amiga was out, and they had vastly, vastly superior sound and graphics and everything, but, you know, we're doggedly hanging onto our old ways. But when I did make the leap, I made a leap to the 32-bit era. So an interesting point, actually. So the Acorn, that was the company behind the BBC Micro, they knew also that the writing was on the wall for their 8-bit era. And they thought to leapfrog 16-bit too. And so some of the engineers that were working at Acorn at the time
Starting point is 00:13:33 looked around for a CPU that could take them to the 32-bit era. They couldn't find something that they liked. They designed something and they rather hubristically called it the Acorn RISC machine. This was the name for the CPU. The first revision of that chip was never actually placed into a real computer, but the computer that I had had the second revision of that chip. And I've been talking about that chip all that time because the big reveal at the end of this
Starting point is 00:13:58 is that the Acorn RISC machine became the advanced RISC machine, which became the ARM chip. So the ARM chip was designed by the team that built the BBC Micro based off of their experiences of working with the 6502 in the late 80s, or mid-80s, I should say. So nowadays, they're ubiquitous. There's about probably half a dozen of those damn things in my cell phone here. They're everywhere, but they had their roots back in sort of like the era that I grew up in. So Matt heads off to university and he spends most of his time there, not going to
Starting point is 00:14:31 classes, but writing games for his new ARM machine. So mostly this was before like virtual memory, before like process separation, really. It was an interesting operating system. I mostly wrote little games. I tried to write some games. There's a game called Blurb that there's a video around somewhere that Richard, who I was still in contact with, and I wrote together. Probably the most important thing that I wrote on the Acorn Archimedes, as it was called, was an internet relay client, internet relay chat client. So if people remember back in the days of before instant messaging and things like that, there was IRC. IRC was essentially a federated network of messaging between hubs and you could
Starting point is 00:15:13 send messages to each other. And it was sort of interactive. There were channels a bit like sort of Slack groups these days. And at the time the, the, um, there wasn't a client for the, the, the Archimedes, so I wrote my own. But being the person that I was and the experience that I had had and not really having a C compiler, there wasn't C compilers. There were C compilers, but again, like the assemblers of yesteryear,
Starting point is 00:15:38 you had to go and buy them. There was no GCC that was available. So I took the very sensible approach of writing an internet relay chat client in straight assembly. You know, this is a fully windowed system, right? You know, you've got drag and drop windows. There's an operating system you're interacting with. There's clipboards. There's, you know, you're doing TCP IP conversations and stuff.
Starting point is 00:15:57 And it's, you know, of course, assembly is the right thing. I know how this goes. Get on my pad of paper and start writing. I guess you had an assembler. Thankfully, the Archimedes also had a built-in assembler, so that was not a big deal. But yeah, the whole thing was written in assembly. And in the middle of that,
Starting point is 00:16:15 the thing that was sort of de rigueur for the day was that your IRC client, so at the sort of computer lab, we were using whatever IRIX or, you know, SunOS systems that the computer department had available to us. So they were all Unix-y based, and their Unix things all had,
Starting point is 00:16:35 sorry, all their Unix IRC clients all had scripting languages built into them so that you could, you know, have like things that greeted people when you joined the channel. You could like protect the channel. You could set the topic and all, you topic and all the kind of rubbishy things that people love to do in those kinds of environments.
Starting point is 00:16:50 And so I decided I also wanted to have a scripting language in my IRC client. And so I added an object-oriented BASIC into my IRC client, which taught me two things one the people who'd written basic the first time around on the on the archimedes were amazing at what they were doing so sophie will wilson is the person i know was most involved in it it's just amazing the the speed of it was phenomenal my interpreter was staggeringly slow in comparison so i had to you know i learned how bad I was at doing it. I was like, well, even though I'm writing it in straight assembly, it's not a patch on what they had done.
Starting point is 00:17:32 And the second thing was that once you do this, it was a shareware program, so people were paying me 10 quid to register it, although it was freely available on download sites. And that kept me in beers in university. So I've been very fortunate that I've had a couple of things along the way freely available on like, um, download sites. And that kept me in beers in university. So I've been very fortunate that, that I've had a couple of things along the way that have kept me in, in a decent, um, uh, yeah, decent, um, uh, I can't think what I'm the right word is, but yeah, it kept me going. Um, I had financial, I had, I did, I did. Yes, I did, exactly.
Starting point is 00:18:07 But once it's released, somebody smart realized that the scripting language, which I had started to write more and more of the system, like sort of bootstrappy wise, I'm like, well, why would I write this thing in my horrific assembly code? And now I've got this language I can write. Oh, and so I would write it in that. And then things got out of hand.
Starting point is 00:18:21 And before I knew it, I got a patch sent to me or an email saying hey you know you you can take your scripting language and do this and someone had written a web browser primitive web browser a news reader and an email client in ir basic in ir client and i'm like oh my god once again so that there's i can't remember which rule it is now there's like all programs expand until they can either produce or consume email, I think is the rule. And it was definitely the case for me.
Starting point is 00:18:49 So that was my first sort of interaction with somebody who I didn't know well coming out of the blue and saying, your code's cool, but it can do this as well and sort of show me the way. It was a really interesting moment. So yeah, that was what I did with that the source code to that is actually on github now i found it on an old hard drive i was like oh gosh now everyone
Starting point is 00:19:09 can see how dreadful this stuff was because you know it's it's three in the morning you're writing you've got to think of another label name for your assembly loop that's the same as all the other loops that you've written except slightly different and so it's called you know womble loop jedi three yeah that makes sense and it makes no sense no sense unless you're high on caffeine matt spends his university writing games and on irc and eventually he gets to the last year of school in about the last year of university i'd gotten chatting over irc pleasingly enough with with a somebody who worked for a games company. And when I was starting to look for a job, he suggested, um, applying to them and I did and
Starting point is 00:19:51 went along and, uh, they, they would, I, the interview went very, very well. They said, when can you start? I'm like, well, I, you realize I'm still at university and they're like, oh, uh, well, do you have to finish it and i thought my parents would crucify me if i didn't actually complete my degree so they they agreed to let me sort of come back as an intern during the summer holidays and things um and but yeah so i ended up working for a games uh company called argonaut games and they're this amazing people i met and things i learned there were just just quite something something, but, and I had a great time, but it was also a lot of long days into nights, weekends, all the sort of bad things
Starting point is 00:20:34 you've heard about the games industry crunch. Uh, it was too. And, you know, looking back now with my, my more open eyes, I could also see that it was not a very good environment. It was a lot more toxic than it had any right to be. And so there were a lot of things that weren't good about it, but it was of its time is probably the most charitable thing I can say about it. So by this point, I'd learned C begrudgingly and then sort of graduated onto C++. Let's not skip over that. So there must've been some point where you were like, hey, I can use a high-level language like C instead of Assembler. That's true.
Starting point is 00:21:13 So I was definitely put off for the longest time because of the lack of a C compiler. I came across a hooky copy of the C compiler for the Acorn Archimedes. And at that time, and I'm not proud to say it, but I was very snooty about the code generator. You know, I'd spent my life writing assembly like it was the fluid language. And I mean, I look back and I can look back, thankfully,
Starting point is 00:21:38 because of the hard disk image I found. The code's terrible. I can't believe that I thought I was good at it. And the compiler code probably was about the same level of quality. But of course, I didn't understand C very well. So I sneered down at it as like, you know, this is a macro assembler gone bad. And now I look back and go,
Starting point is 00:21:55 that's actually probably a great way in, you know, for an assembly program who was using macros to just say, well, I'll call this a function instead. And now I've got a thing or I could use, you know, hash define something. And now I actually have got a macro and a real macro that works better. I feel like this is a big moment for Matt. He finally admitted that a high level language might be useful, that it might be able to write assembly better than he does. He's not no longer interested in assembly. Like he's still going to look at the generated assembly, but you know, he's willing to let the compiler write it i think this foreshadows where he ends up so matt leaves the
Starting point is 00:22:30 game industry he moves over to the united states to chicago and he gets a job at a trading firm doing market making the best analogy i have for a market maker which is not a very flattering one is is like a used car salesman like i will buy your car for this price and then the guy who walks in immediately after you wants to buy the car from me and i'm going to sell it to him for a lot more money than you sold it to me because my job is to warehouse the cars yeah and provide like ballast for both sellers and buyers and that's what a market maker is doing there is someone who's in the market and that's how you can buy google shares for at any price because somebody somewhere has got a big stock of them and is prepared to sell them, but also to buy more of them at a seemingly fairish price. And so that's what I
Starting point is 00:23:13 spent the first few years doing was doing market making for particular types of US stocks. And of course, everybody's playing a game where they're looking at the world and going, well, actually, maybe I'll prepare to buy a penny more or a penny less. And so the thing is changing at a staggering rate. You know, like talking, saturating a 10 gigabit Ethernet line with the changes, just the change information about one exchange. And there were like 14 in mainland US. So it's a lot of data you're processing at essentially line rate or getting on towards line rate.
Starting point is 00:23:45 The barrier to entry is you have to be able to consume this amount of data, make an intelligent decision about what you're going to do in amongst all of that. So obviously what you need to be able to do is react very quickly. And whenever you want to react to stuff quickly and you want to deal with floods of packets, you turn to a compiled language. And in our case, we turned to C and C++.
Starting point is 00:24:02 And so, yes, I inherited a code base that was predominantly C++ 98, so that's the original-ish C++ era. And a couple of years in, we were looking at whether or not it would be okay to start adopting some of the new features that were coming in C++ 11. All right, so here you have Matt, who was begrudgingly dragged into C and C++ from assembly. And he works at a place where they struggle just to keep up with the market flow. And there are new convenience features coming to his language. Matt is now primed to become an expert on how the sausage is made. What happens inside of a giant optimizing compiler like GCC? And how does that affect his ability to write programs that can keep up with
Starting point is 00:24:45 the market flow? And so C++11 gave us two things and many other things, huge amounts of other things. I'm glossing over those, but it gave us the auto keyword, which says, yeah, just the type of the variable is whatever I'm equaling it to. So if you did like auto I equals zero, you're getting an in. Type inference. Exactly. Yeah. I mean, I know other languages have much more sophisticated systems for doing this. Rust in particular is even more sophisticated. But like C++ is pretty much like whatever's on the other side of the equals determines what you are with some, of course, wonderful C++ strange caveats and asterisks and footnotes about the weird edge cases.
Starting point is 00:25:22 But, you know, it wouldn't be fun if it was easy. So we got auto and we also got the range for. And so now I could just do for auto i colon vec. And now I'm getting all the integers i in the vector. And that's great. But obviously, well, not obviously at all. We had been burned previously in other languages. So there was a mixture of languages that were used in the desk.
Starting point is 00:25:43 Some of it was written in Java. And Java is an excellent language. But we had been bitten in Java with iterators because if you iterate over and arrange things in Java, a new object is created. Every time you do that, you get a new iterator that goes over the object.
Starting point is 00:25:59 And the Achilles heel of the systems that we had before was that we just basically couldn't afford to let them garbage collect. We had so much junk in there that it couldn't be done quick enough for our systems to remain on. And in fact, the GC thing, the first thing it would do is try and turn off the system so that we could spend hundreds of milliseconds churning through
Starting point is 00:26:19 and then deal with the fallout of everything going wrong because we'd missed maybe some packets off of the wire, all those things. So we were avoiding it by trying to not create garbage. And so this iteration idiom that we would have liked to have used in Java was off limits. We had to just, you know, for in i equals naught, i is less than vector.size equivalent in Java.
Starting point is 00:26:37 And so when we said, let's use the new cool features in C++, quite rightly, the lead program was like, are you sure that's the same? And so Apal and I sat down and we, this has gone into law slightly now, so obviously the tale has grown somewhat in the telling. But my memory is that we wrote a very simple function in C++ and compiled it and just dumped the output with obj dump and the disassembler and the demangler and whatever. We did that a couple of times as we fiddled with
Starting point is 00:27:08 stuff. And then I had the idea to use the Unix watch command. And what watch does is it runs the rest of the command over and over again, highlighting the differences. And so I was able to take the compiler and, sorry, I was going to say like watch, run the compiler on temp-test.c, pipe it through all these things to demangle it, get rid of some of the stuff that the compiler generates that I don't care about, and then show me that, please. And then I split the terminal in half, and I had VI on one side, and I had the results of this on the other.
Starting point is 00:27:37 And so I was able to make changes, and every two seconds, it would immediately show me the assembly output. Can you picture it? Basically, we have a split screen. On one side, you have your text editor with a single file of code. And then on the other side, you have your generated assembly,
Starting point is 00:27:50 which the compiler emits. It's basically nonsense to me, but not to Matt. But even to me, like if I change the idiom, it shouldn't produce a whole much more assembly. It shouldn't have new extra allocations. If something like that
Starting point is 00:28:05 happens, then I know something's up and that I should look into it. And of course, naturally, that was a very valuable and useful thing to be able to do, to just experiment interactively. I think even I at the time, C++ compilation is such a heavyweight activity that until my friend Jordan had sort of showed me how quickly he could just knock up a thing in a temp directory that was all of like three lines and run the compiler. And I was like, oh yeah, actually it's not too bad, is it? It doesn't take too long. Of course, it's quite fast to compile four lines of code.
Starting point is 00:28:35 But until that had happened, I just would never have done it. I'd never have experimented in this way. Once you have this tool, one performance question you might want to ask it is when are bit shifts a worthwhile optimization? Bit shifting is, instead of multiplying a number, you shift it.
Starting point is 00:28:52 So you shift an integer left is equivalent to multiplying it by two, but it can be faster in certain circumstances. If you look at like the Doom source code or the Wolfenstein 3D source code, you'll see that it's covered with, you know, things like things like oh a equals a shifted up by eight plus a shifted up by two you're like what and a shifts up by that's 256 a shifts up by two that's four oh oh you're multiplying it by 260 i see what you're doing there but you're using shifts because shifts are faster all right and then you you can sort of like say well okay that's that's cool but obviously the compiler knows this trick too. And the thing about the compiler, it's much more consistent about applying that trick than you are. And so anytime you're multiplying by 260, it says, I got you.
Starting point is 00:29:32 I know what you're doing here. And so the risk of revealing one of the big sort of spoilers in one of my talks about where I talk about this particular thing, what you can do is you can take a particular number. There's a number, I forget which one it is. It's got like a certain number of set bits and you like throw it in compiler explorer and instead of using these shifts and adds that you see if you like do multiply by 10, it just goes back to using a multiplier. You're like, oh, compiler, you gave up.
Starting point is 00:29:55 You gave up, didn't you? You decided that my number wasn't worth it. You're just going to use a multiply instruction. And so if you sit and pick it apart and go, well, this is, you know, A shifted up by 10 plus A shifted up by four plus A shifted up by 2 minus A because it's one less than all of that. Okay, right.
Starting point is 00:30:09 And you put that into the compiler and you see, oh, and it still does a multiply. So I wrote it out as shifts and adds because that would be faster and you turn it back into the multiply again. And it's like, no, you're stupid because those shifts and adds are no longer cheaper than the multiply. The multiply is five cycles and you've just generated seven cycles worth of shifts and ads. And so even though you phrased a multiply as a bunch of shifts and ads, it was able to,
Starting point is 00:30:34 again, unpick it and say, what are you actually doing? You're multiplying by one, six, nine, seven, nine, seven, two. I'm just going to multiply by one, six, nine, seven, two. And the cool thing about that is that you can then go and say, well, I read this out of some Doom source code, which of course is 386 or 286 era. If I tell it to target a 32-bit system and say the architecture is that CPU, it does indeed do the shifts and adds. It goes, no, I got you.
Starting point is 00:31:01 I know that the multiplier was like far too slow. I will use the shifts and adds. And now you look at the shifts and adds, it's actually been cleverer than my example. So it was able to unpick my shifts and adds, determine it was a multiply by some high constant, and then it had a much better way of doing that that was still shifts and adds than my original way.
Starting point is 00:31:17 And it's just like, this is why you trust the compiler. Another optimization that Matt can see in action using his tool is vectorization. So vectorization specifically is an interesting technique where the compiler is able to see that it might be worthwhile doing multiple iterations of a loop at the same time. CPUs have a number of instructions that treat a register, which is maybe 64, 128, 256 bits wide, instead of just treating it as a giant, giant number, it treats it as a structure containing a number of 32-bit values or 16-bit values or 8-bit values or a number of double precision numbers or single precision numbers.
Starting point is 00:32:01 That's crazy. I write a for loop and I say add up all these things, and it's like, yeah, I'll just add loop and I say, add up to all these things. And it's like, yeah, I'll just add them up four at a time because I can do that. It's so cool. And yeah, and exactly. And it does this for every loop, you know, every loop that it thinks is profitable. Whereas as an engineer, if you were like actually having to write this out longhand, like old Matt would have done, you'd have to be really quite devoted to, to this, to always use the magic instructions that do this this way and deal with all the edge cases. But the compiler will happily you'd have to be really quite devoted to this to always use the magic instructions
Starting point is 00:32:25 that do this this way and deal with all the edge cases. But the compiler will happily spit that out every time it sees a loop that it thinks is worthwhile to, which is just another reason, right? You know, rather than being smart individually every time,
Starting point is 00:32:35 be smart once and teach the compiler to do it. And then everyone benefits from it all the time. In other words, if there's some trick that you think can make your code faster, probably the compiler authors have already put that trick into the optimization part of the compiler. I have all the time, by the way.
Starting point is 00:32:51 I know you sort of scheduled it up to now, but I can keep talking until you're bored of listening to me, which is already you've shown a remarkable fortitude. No, no, it's super interesting. So my favorite example is counting the number of set bits so if you have a number a 32-bit integer unsigned 32-bit integer and you want to know how many one bits there are in that 32-bit value yeah it sounds like a pointless thing it sounds like an interview question which i think it probably is but there are some genuine reasons for doing it they're
Starting point is 00:33:21 used in like um uh packed uh matrices packed, uh, matrices that are, you know, very, very sparse matrices that have lots of zeros in them. Instead of storing all those zeros, you store a mask that says, well, which of the following values are populated? And then it just has the populated values immediately after it. So you need to sometimes say, well, how many are there to get to the next row? Yeah. I don't need to justify it really. It's just a thing that you might want to do reasonably. So there's a bunch of ways you might solve this. The easiest way is you have a 32-bit integer.
Starting point is 00:33:50 You're just going to check if the first bit is one and then drop it off and keep checking 32 times. Then there's some optimizations you can make on that. If the whole thing's zero, you know, you're done. You can exit early. You could do some sort of bit twiddling hacks, et cetera. But anyway, you write any of those ways on a modern computer, a modern compiler,
Starting point is 00:34:10 and you turn the optimizer on to full and you tell it that the architecture is like a modern PC as opposed to the default, which is like the oldest thing it possibly supports. And it doesn't matter, pretty much any way you wrote that, it will take the whole thing and replace it with the pop count instruction, which is the how many bits are set in this register instruction.
Starting point is 00:34:30 And that's just mind-boggling to think of what's gone on there. You've taken essentially an order n or an order set bits of n, any number of ways you could write it, and the compiler authors are like, we got you. We know what you're doing, even if you phrased it in all these different ways. There's a normalization pass inside the compiler authors are like, we got you. We know what you're doing. Even if you phrased it in all these different ways, there's a normalization pass inside the compiler. There's then tricks for noticing what you're doing, idiom detection. And then it's like, no, this is counting the number of set bits. We're going to replace it with that one instruction. And that's amazing.
Starting point is 00:34:58 I totally agree. It is amazing. It means that this interview question is easier to answer in assembly than in a high level language, because in assembly, it's just a single instruction. What does all this knowledge about compiler optimizations tell us about writing high-level code, though? If the one thing that I want people to take away from all of this conversation is that compilers have moved on to the point now where, even though it's so useful for you to understand what's going on right at the bottom and understand what the CPU is doing, what the RAM is doing, what the caches are doing, what the branch predictor is doing. That's great. It's wonderful. It's exciting. It's interesting.
Starting point is 00:35:33 But don't necessarily think that you can't trust the compiler to know those things, or some of them at least, and take them into account. Trust the compiler. Always trust the compiler always trust the compiler write you write code that's easy to read because the computer the human is more important than the compiler now the compiler has your back whatever you wrote so don't trick yourself and write some natty little thing because you think it'll be a cycle faster or not it won't be the compiler honestly will beat you on almost everything that you can care to do write the code so that your colleague or you tomorrow morning when you haven't got coffee can understand and trust the compiler to just generate the right code. So in a sense, you know, you're admitting defeat from your old days.
Starting point is 00:36:17 In a sense, yes. But in another sense, it's as if the scales have fallen from my eyes now. And there are just so many more smart people that work on compilers that have the time and the energy and the will and the clever ideas to add in these heuristics, these tricks, these knowledge of the ISA that's deeper than anything I would have these days. Am I saying that you can't beat the compiler anymore? No, of course not. There's always a case where you know more information than the compiler. But the compiler is very, very good under almost all circumstances that you care to do. So Matt's a changed man, right? I mean, he had already moved on to using a high level language. But now with his tool, you know, he can really see the assembly that's being generated and all the optimizations that are happening. And, you know, this is clearly something valuable. So he
Starting point is 00:37:10 puts this up on a website, on his own personal website, godbolt.org, basically unchanged, you put your C++ code on one side, and you see the result on the other. And this starts to spread inside of the C++ community. Until my favorite story for the whole of this is that we were lucky enough to have Andrej Aleksandrescu, who was like the father of some of the template metaprogramming tricks that you'll see. There's a number of books he's written. He came and gave some talks at DRW about performance.
Starting point is 00:37:41 He was then working for Facebook and they were dealing with large data at scale and there was all these things. And it was fun for me because I spent the whole time heckling him on like the assembly code he was showing just generally. And so I kept pulling him up and that, and he, we had a good,
Starting point is 00:37:53 it was good banter back and forth. And eventually at the end, he said something along the lines of, ah, I think somebody told me there's a website that you can just put your code into. Uh, and it shows you what the assembly looks like.
Starting point is 00:38:04 And at that point I blushed bright red as everyone else in the room looked at me and like pointed and said yeah it's his i'm like oh now i feel like an absolute dreadful person for i feel not mentioning earlier yeah he came to give a talk and you were like poking at him and then he's like i think i have a solution like yeah that's mine um yeah so he and i've become friends now, so I think he's forgiven me for being quite so dreadful. So I think we're all right on that front. But yeah, so it grew and grew, and it probably took a couple of years before it became more well-known.
Starting point is 00:38:38 So godbolt.org now supports many, many languages, 21 as I look right now, maybe more by the time this comes out. I first came across it on Hacker News when somebody was posting a link about the ZIG language, like Z-I-G, and how well it optimizes down to x86 assembly. If you can read assembly and you have small examples, godbolt.org has become sort of a Rosetta Stone for code performance. One thing
Starting point is 00:39:06 Matt doesn't like about the tool though, is its name. Yeah. I mean, I think that like people say to like, to godbolt something, right? Yes. Yes, they do. It's something that, yeah, it's, I've been in C++ committee meetings where somebody has said, why don't we just put this in Godbot and see what happens? And I'm like, I'm in the room. You can't say that with me here. It sounds weird. I don't know.
Starting point is 00:39:35 It's fitting though, right? Because like, here's your story of always like, it seems like the punchline to each of your anecdotes is like, to the assembly, let's see what it's doing. Yes. the punchline to each of your anecdotes is like to the assembly let's see what it's doing yes i think you know there is a reason why it was me or rather like i was in the right place at the right time with all the right like background for this to happen right so it you're right it is the punchline so i guess in that way yeah but i mean it's it's luck it's luck i mean that's that's the thing. People, I get invited to things like this now, this
Starting point is 00:40:07 enjoyable time talking to you, and I get, I've been able to talk at various conferences that I would never have dreamt to as just some rando who writes C++ for a living. And it all came out of a lucky break where, you know, an afternoon's work with a friend and a bit of JavaScript
Starting point is 00:40:22 and a memorable last name. So that was the show. Matt, he recommends that nobody write assembly now, but that everybody should be able to read it. And because of his interest in assembly and reading the assembly generated by compilers, he's become famous in this world of very performance-critical code. Check out godbolt.org where you can play around with the tool that he built that he likes to call Compiler Explorer,
Starting point is 00:40:49 but everybody else calls Godbolt. As I mentioned at the beginning, I got a new job at Earthly. If you go to earthly.dev or check it out on GitHub, you can see what we're building. We just want to make builds better and builds faster and builds more reproducible. It's just kind of an ugly, complicated area.
Starting point is 00:41:08 So we have an open source tool and we're trying to make the build experience better. Until next time, thank you so much for listening.

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.