CoRecursive: Coding Stories - Story: To The Assembly
Episode Date: October 1, 2020

How do CPUs work? How do compilers work? How does high-level code get translated into machine code? Today's guest is Matt Godbolt, and he knows the answers to these questions. How he became an expert in bare-metal programming is an interesting story. Matt shares his origin story and the creation of Compiler Explorer in today's interview.

Links: Compiler Explorer, Matt's GitHub, Matt's Blog, Matt's YouTube
Transcript
I think that's probably the main thing that I want to sort of get out of this conversation with you,
is that I want to sort of engender, hopefully, the excitement in other people that I feel
when I start taking the lid off and pulling out bits further and further down the stack
until you get to the CPU and then go like, oh my gosh, I've got all the way to the bottom.
And then someone taps you on the shoulder and says, no, you can go deeper than this.
You're like, deeper? This is amazing. And keep on going.
That's Matt Godbolt. He's an expert in low latency computing.
He's going to teach us some lessons
about why it's important to understand
what happens between,
you know, your high level language
and when things get executed down
at the level of the CPU.
For instance, what does a for loop
in your high-level language
actually turn into
when it's executed by the CPU?
And does that have performance implications?
Matt's an important speaker
because he has all this knowledge
and how he came by it is a pretty interesting story.
And it all came out of a lucky break
where, you know, an afternoon's work with a friend
and a bit of JavaScript
and a memorable last name.
And then here I am.
So I carry a certain amount of guilt for that,
but that's also just being British.
Hello and welcome to Co-Recursive,
where I bring you interesting stories
about software development.
Before we get into the episode with Matt,
I have a brief plug.
I started a new job.
I now work at Earthly.
We're building an open source build tool.
You can find it on GitHub.
Search for Earthly. Builds are always a big mess, and we're trying to make that situation a bit nicer. But back to Matt. Matt's fame is not all luck, as he describes it. We're going to walk through his story a little bit, and I think you'll see that he's been on a trajectory to be the expert in kind of what happens when you get down to the metal for some time.

I was about seven, and I went around to a friend's house, and they had a very, very primitive flight simulator that he was flying around. And I was apparently uninterested in that. But when he then showed that you could type in stuff and make stuff scroll up the screen, apparently that was what enthralled me. And so I was very, very lucky to get one of them for myself for my eighth birthday. So that was, yeah, 1984. So I've been hacking on computers for a long time now, 36 years.

That computer was the ZX Spectrum, an 8-bit computer with 48K of memory and a 16K ROM. It also had a tape drive, because we were in the era before disks were very common.

What did this computer look like? Did it look like computers as we picture them today?

So if you have any kind of reference point, maybe in the US in particular, things like the Apple II, IIe, that kind of era, that kind of thing. So the Spectrum was relatively small, so
the keyboard was probably smaller than an average laptop's keyboard area, and the keyboard was
the computer. The keys themselves were rubber, like a rubbery mat with indentations in them,
underneath which there was a very simple, literally just a thin,
oh gosh, what do they call it, membrane, which made contact or not.
So it was a very horrible feeling thing.
It was very woolly, very sort of there.
And the whole thing was the computer.
You plugged it into a television and you tuned the television
into the right channel.
And it was blurry, of course, because it wasn't very good.
And yeah, and it was tiny, really.
You could hold it in your hand and sort of wave it around type of thing.
And it was notoriously prone to overheating in some cases.
And there was an expansion port in the back.
It was a fun little computer to play around with.
You would buy games from the local newsagent for £2.99, which is about five bucks-ish, I guess.
And you'd put them in and it was just like listening
to an old modem sound, you know, the screeching, whatever.
You'd wait four or five minutes to get the 40-odd K
of data in and then you'd play your game.
And you'd have to hope as well that there wasn't
a corruption on the tape.
It would then crash and you'd have to start again.
So there was a lot of fun about that. And you were very much exposed to how the computer
worked when you just turned it on, because you'd get a blank screen with a prompt, like, okay,
what do you want to do? And obviously, as a game player, you would just say,
load the thing I'm about to put in on the cassette tape. But if you don't have many games,
another alternative would be to buy magazines from the same newsagent you were getting the games,
and they would have type-in listings at the back.
And that's kind of how I got started in programming.
The enthralling picture on the cover of the magazine, of course,
would never actually match the quality of the type-in listing that you spent four or five hours typing in.
This is kind of crazy, right?
Instead of buying the game on a cassette tape, you could buy
a magazine, and then in the back, 20 pages was just all the source code printed out in BASIC, and you
would type it in and hopefully you didn't make any mistakes. And then, of course, once you've
typed this thing in, you have to save it to the same unreliable cassette tape that you were loading
the games off of. So you had to hope that, A, you typed it right, and B, you were able to save it so you could recover it. But of course you would make typos. You'd make typos all over the place.
And that would introduce one sort of to the process of what a program was. So even if you
weren't a programmer and even didn't understand what you were typing in, you would get the gist
of what it was. And certainly if you were then curious about it, it was a definite leg into
discovering how one might make one's own.
So that was kind of how I started, was starting to write little BASIC programs, literally as in BASIC the programming language. And then latterly, you would start typing in these giant listings. Instead of the BASIC that you could understand, there would start to be more and more of these DATA statements, DATA statement after DATA statement. And of course, if you then look into it, you realize that what you're typing in is the machine code of an assembly-based game. Unfortunately, this computer, the Spectrum, did not have a built-in assembler. You actually had to go and buy an assembler. But you could, of course, do it by hand. Or the person who had authored it and sent off their program to the newsagent had essentially assembled their code and then dumped it out as hex, and just said, we'll type that all in, and then you get a game. Obviously, that's much more intractable to a user.

All right, I'm going to get some of this wrong, but bear with me. So BASIC is a basic programming language, like 10 PRINT "HELLO", 20 GOTO 10. It just keeps printing HELLO, right?
But to do more advanced things in these games
that were in the back of the book,
they would drop down to machine code.
Machine code is a level below assembly code.
It's really just raw binary or hexadecimal.
It's the actual instructions
the CPU can execute. So Matt couldn't really understand this machine code, but he wanted to
understand it. And he wanted to understand what assembly was. So assembly is one level above
machine code, where machine code might just have a binary representation for a jump instruction, assembly has mnemonics. So it would have JMP
for jump. Assembly also has labels, just like in BASIC, so that you can jump to certain locations,
like GOTO 10 in my example. Matt didn't have any of this, but he did want to program his Spectrum
in machine code. So he persisted.

And so my very first working assembly program,
I remember vividly, I was at a swimming gala waiting for my turn to swim. And so I must have
been 13-ish, 12, no, maybe younger than that. Cause I, yeah, there's a part of the story we'll
get to, but around 11, I would say 11 or 12. And I'd written out very carefully by hand,
all of the assembly
instructions for like just something which scrolled a piece of text at the bottom of the screen.
Yeah. And then I had to obviously hand-assemble it. That is, take the fact that this one is an LD A, 62 and replace it with the equivalent bytes. And then when I got home, I would type in the sequence of bytes and run it with my fingers crossed. And of course, no debuggers, no sort of feedback, other than it either worked or it went horribly wrong.
And I was very, very fortunate that probably for the first and only time in my career,
a program I wrote on paper worked first time.
And so it's almost like a simple lookup table, right?
You're like, I want to add, and for that, it's this hexadecimal code or something?

Exactly right. The assembler's job is pretty straightforward. It gets a little bit more complicated because the opcodes, the primitive instructions that you want the computer to run for you, have different ways of phrasing them. The word is the same that the human writes,
but depending on the context it's used in,
you want to use a different actual number.
It's like 80s Sudoku.
Instead of sitting by the pool figuring out numbers,
you're looking at these codes. You're looking at the charts, right, right, right.
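The chart lookup Matt describes can be sketched as a tiny table in C++. This is an illustrative fragment, not a working assembler, though the opcode bytes shown are genuine Z80 encodings:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// A toy version of the "chart lookup": mnemonic -> opcode byte. The opcode
// values are real Z80 encodings (0x3E is LD A,n; 0xC9 is RET), but this
// four-entry table is an illustration, not a full instruction set.
const std::map<std::string, uint8_t> kOpcodeChart = {
    {"NOP", 0x00}, {"LD A,n", 0x3E}, {"JP nn", 0xC3}, {"RET", 0xC9},
};

// Hand-assembling "LD A, 62" means finding the opcode in the chart and
// writing the operand byte right after it -- the poolside chore Matt did.
std::vector<uint8_t> assembleLdA(uint8_t value) {
    return {kOpcodeChart.at("LD A,n"), value};
}
```

The real work, as he says next, is everything beyond this lookup: choosing the right encoding for the context, and tracking addresses.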
I mean, probably the most important thing that an assembler is able to do is to allow the programmer to define labels
and essentially go-to statements.
So I know everyone hates go-to
and no one should be using go-to
in any modern programming language,
but when you get right down to the metal,
that's pretty much all you actually have.
So all the CPU has is arithmetic instructions,
comparison instructions,
multiplies, divides, loads and stores and things,
compares and then go-tos or jumps or branches.
They're all basically the same thing.
Now, the branch location has to go to essentially a label.
So you want to say like, hey, come back to this location.
Like your average loop would be, you know,
here's the top of the loop, do some stuff,
decrement the counter.
If the counter is not yet zero,
then go back to the top of the loop. Those addresses obviously change because, depending on the amount
of work that you've done between the start of the loop and the end of the loop, the number of bytes
of actual instructions may change. The address of where the label is may change. And so there's this
huge rippling effect where if the assembler wasn't tracking this for you, you would be on your poor hand written stuff,
scratching out everything that was an address
and adding one to it
because you just had to insert an instruction
right at the very top of the program.
And of course, that's a huge pain.
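The label bookkeeping described above is essentially the first pass of a two-pass assembler. A minimal sketch, using a made-up program representation (the `Line` struct and function name are inventions for illustration):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Each line of the toy program optionally defines a label and emits some
// number of bytes of machine code. Pass one walks the program counting
// bytes and records each label's address; a second pass could then patch
// every jump with the right target.
struct Line {
    std::string label;  // empty if this line defines no label
    int size;           // bytes of machine code this line emits
};

std::map<std::string, int> assignLabelAddresses(const std::vector<Line>& prog) {
    std::map<std::string, int> addresses;
    int pc = 0;  // running byte offset: the program counter at assembly time
    for (const auto& line : prog) {
        if (!line.label.empty()) addresses[line.label] = pc;
        pc += line.size;
    }
    return addresses;
}
```

Insert an instruction at the top and every later label's address shifts, but rerunning this pass fixes them all automatically. That rippling edit is exactly what hand assembly forces you to do with pencil and eraser.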
Yeah, no, definitely.
So, you know, when you were writing out all these programs,
hand assembling it and stuff,
like, I don't know,
what did your family think, or your friends? Did they think it was cool,
or that you'd gone bonkers?

I think probably the latter. So in fairness, the hand-assembling
stage didn't last very long, but yeah, my family, I think always thought I was a little, uh, unusual
in this, but I was very fortunate that, that on the first day of my secondary
school, so that would have been, you know, high school, just before high school age,
I bumped into a good friend, someone who became a very good friend, who was also very much into
computing at the same level that I was. I'm still in touch with him, still great mates. And,
you know, that, that allowed us to form a nucleation point for a bunch of similarly
minded geeky kids
when nerdiness and geekiness was not cool.
I'm not sure it is cool now, but maybe it's different than it was,
especially in late 80s, early 90s Britain.
Matt's school had a computer lab full of these computers, the BBC Micro,
and he ends up getting a BBC Micro at home as well.
And so at lunchtimes, people were allowed to go in and
obviously use the computers to play games, of course, which was what everyone was going to do
once you have a room full. So from that point of view, the computers were cool. And if you knew
how they worked and could get like the latest games or whatever, then you were only cool to
the people who wouldn't otherwise have thought you were cool in that room, in that context. So, you know, you've got a little bit of cachet there, but it
didn't necessarily flow outside into, you know, the PE lesson. Not quite the same level of cool
there for your average scrawny British nerdy kid. So Matt and his friends start making little demo applications
for the BBC Micro.
We were writing little demos,
like little examples of how cool you could make the computer run.
And at the time, the same magazines that I used to buy
in the newsagents were always looking for new submissions.
And so he and I would send our submissions off to Acorn User,
BBC Acorn User.
And if we were very lucky, they would print them
and they would send us £10, £20, £50 for like a star rating.
And in, oh gosh, where are we?
In like 1989 or 1990 era, 50 quid for two 14-year-olds
was a lot of money.
Thank you very much.
So the pair of us did very well out of those.
You know, the kind of things that we were doing there would be like Mandelbrot generators,
Julia set generators, just funny little programs that made nice pictures happen on the screen.
Around the time that I was talking about those mid-teen years when we were writing our stuff
and sending it to Acorn User, the Atari ST was out, the Amiga was out,
and they had vastly, vastly superior sound and graphics and everything. But, you know, we were doggedly hanging onto our old ways. But when I did make the leap, I made a leap to the 32-bit era. An interesting point, actually: so Acorn, that was the company behind the BBC Micro, they knew also that the writing was on the wall for their 8-bit era. And they sought to leapfrog 16-bit entirely.
And so some of the engineers that were working at Acorn at the time
looked around for a CPU that could take them to the 32-bit era.
They couldn't find something that they liked.
They designed something and they rather hubristically called it
the Acorn RISC machine.
This was the name for the CPU.
The first revision of that chip was never actually placed into a real computer,
but the computer that I had had the second revision of that chip.
And I've been talking about that chip all that time because the big reveal at the end of this
is that the Acorn RISC machine became the advanced RISC machine, which became the ARM chip.
So the ARM chip was designed by the team that built the BBC Micro
based off of their experiences of working with the 6502
in the late 80s, or mid-80s, I should say.
So nowadays, they're ubiquitous.
There's about probably half a dozen of those damn things in my cell phone here.
They're everywhere, but they had their roots back in sort of like the era that
I grew up in. So Matt heads off to university and he spends most of his time there, not going to
classes, but writing games for his new ARM machine. So mostly this was before like virtual memory,
before like process separation, really. It was an interesting operating system. I mostly wrote little games.
I tried to write some games. There's a game called Blurb that there's a video around somewhere that
Richard, who I was still in contact with, and I wrote together. Probably the most important thing
that I wrote on the Acorn Archimedes, as it was called, was an Internet Relay Chat client.
So if people remember back in the days before instant messaging and things
like that, there was IRC.
IRC was essentially a federated network of messaging between hubs and you could
send messages to each other. And it was sort of interactive.
There were channels a bit like sort of Slack groups these days.
And at the time, there wasn't a client for the Archimedes,
so I wrote my own.
But being the person that I was and the experience that I had had
and not really having a C compiler.
Well, there were C compilers, but again, like the assemblers of yesteryear,
you had to go and buy them.
There was no GCC that was available.
So I took the very sensible approach of writing an internet relay chat client in straight assembly.
You know, this is a fully windowed system, right?
You know, you've got drag and drop windows.
There's an operating system you're interacting with.
There's clipboards.
There's, you know, you're doing TCP/IP conversations and stuff.
And it's, you know, of course, assembly is the right thing.
I know how this goes.
Get on my pad of paper and start writing.
I guess you had an assembler.
Thankfully, the Archimedes also had a built-in assembler,
so that was not a big deal.
But yeah, the whole thing was written in assembly.
And in the middle of that,
the thing that was sort of de rigueur for the day
was that your IRC client,
so at the sort of computer lab,
we were using whatever IRIX or, you know,
SunOS systems that the computer department
had available to us.
So they were all Unix-y based,
and their Unix things all had,
sorry, all their Unix IRC clients
all had scripting languages built into them
so that you could, you know,
have like things that greeted people
when you joined the channel.
You could like protect the channel.
You could set the topic, and all the kind of rubbishy things
that people love to do in those kinds of environments.
And so I decided I also wanted to have a scripting language in my IRC client.
And so I added an object-oriented BASIC into my IRC client,
which taught me two things. One, the people who'd written BASIC the first time around on the Archimedes were amazing at what they were doing. Sophie Wilson is the person I know was most involved in it. It's just amazing. The speed of it was phenomenal. My interpreter was staggeringly slow in comparison. So, you know, I learned how bad I was at doing it.
I was like, well, even though I'm writing it in straight assembly,
it's not a patch on what they had done.
And the second thing was, once you do this,
it was a shareware program, so people were paying me 10 quid to register it,
although it was freely available on download sites.
And that kept me in beers in university.
So I've been very fortunate that I've had a couple of things along the way that have kept me going.
But once it's released,
somebody smart realized what the scripting language could do.
I had started to write more and more of the system in it,
sort of bootstrappy-wise.
I'm like, well, why would I write this thing in my horrific assembly code
when I've got this language I can write in?
And so I would write it in that.
And then things got out of hand.
And before I knew it, I got a patch sent to me
or an email
saying, hey, you know, you can take your scripting language and do this. And someone had written a
primitive web browser, a newsreader, and an email client in the BASIC in my IRC client. And I'm like,
oh my God. Once again, there's, I can't remember which rule it is now, there's a rule that all
programs expand until they can either produce or consume email,
I think is the rule.
And it was definitely the case for me.
So that was my first sort of interaction
with somebody who I didn't know well
coming out of the blue and saying,
your code's cool, but it can do this as well
and sort of show me the way.
It was a really interesting moment.
So yeah, that was what I did with that. The source code to
that is actually on GitHub now. I found it on an old hard drive. I was like, oh gosh, now everyone
can see how dreadful this stuff was. Because, you know, it's three in the morning, you're writing,
you've got to think of another label name for your assembly loop that's the same as all the other
loops that you've written, except slightly different. And so it's called, you know, womble
loop jedi three. Yeah, and it makes sense and it makes no sense, no sense unless you're high on caffeine.
Matt spends his university years writing games and on IRC, and eventually he gets to the last year of school.

In about the last year of university, I'd gotten chatting over IRC, pleasingly enough, with somebody who worked for a games company. And when I was starting to look for a job, he suggested applying to them. And I did, and went along, and the interview went very, very well. They said, when can you start? I'm like, well, you realize I'm still at university? And they're like, oh, well, do you have to finish it? And I thought my parents would crucify me if I didn't actually complete my degree. So they agreed to let me come back as an intern during the summer holidays and things. But yeah, I ended up working for a games company called Argonaut Games. The amazing people I met and things I learned there were just quite something, and I had a great time. But it was also a lot of long days into nights, weekends, all the sort of bad things you've heard about the games industry. Crunch, it was too. And, you know, looking back now with my more open eyes, I could also see that it was not a very good environment. It was a lot more toxic than it had any right to be. And so there were a lot of things that weren't good about it, but "it was of its time" is probably the most charitable thing I can say about it. So by this point, I'd learned C begrudgingly and then sort of graduated onto C++.

Let's not skip over that.
So there must've been some point where you were like,
hey, I can use a high-level language like C instead of Assembler.
That's true.
So I was definitely put off for the longest time
because of the lack of a C compiler.
I came across a hooky copy of the C compiler for the Acorn Archimedes.
And at that time, and I'm not proud to say it,
but I was very snooty about the code generator.
You know, I'd spent my life writing assembly
like it was the fluid language.
And I mean, I look back and I can look back, thankfully,
because of the hard disk image I found.
The code's terrible.
I can't believe that I thought I was good at it.
And the compiler code probably was about the same level of quality.
But of course, I didn't understand C very well.
So I sneered down at it as like, you know,
this is a macro assembler gone bad.
And now I look back and go,
that's actually probably a great way in, you know,
for an assembly programmer who was using macros, to just say,
well, I'll call this a function instead.
And now I've got a thing. Or I could use, you know, hash-define something, and now I've actually got a real macro that works better.

I feel like this is a big moment for Matt. He finally admitted that a high-level language might be useful, that a compiler might be able to write assembly better than he does. It's not that he's no longer interested in assembly. He's still going to look at the generated assembly, but, you know, he's willing to let the compiler write it. I think this foreshadows where he ends up. So Matt leaves the
game industry. He moves over to the United States, to Chicago, and he gets a job at a trading firm doing market making.

The best analogy I have for a market maker, which is not a very flattering one, is like a used car salesman. Like, I will buy your car for this price, and then the guy who walks in immediately after you wants to buy the car from me, and I'm going to sell it to him for a lot more money than you sold it to me, because my job is to warehouse the cars and provide, like, ballast for both sellers and buyers. And that's what a market maker is doing. They're someone who's in the market, and that's how you can buy Google shares at any price, because somebody somewhere has got a big stock of them and is prepared to sell them, but also to buy more of them, at a seemingly fairish price. And so that's what I spent the first few years doing: market making for particular types of US stocks. And of course, everybody's playing a game where they're looking at the world and going, well, actually, maybe I'm prepared to buy at a penny more or a penny less. And so the thing is changing at a staggering rate. You know, we're talking saturating a 10-gigabit Ethernet line with the changes, just the change information about one exchange. And there were like 14 in mainland US. So it's a lot of data you're processing at essentially line rate, or getting on towards line rate.
The barrier to entry is you have to be able to consume this amount of data,
make an intelligent decision about what you're going to do
in amongst all of that.
So obviously what you need to be able to do is react very quickly.
And whenever you want to react to stuff quickly
and you want to deal with floods of packets,
you turn to a compiled language.
And in our case, we turned to C and C++.
And so, yes, I inherited a code base that was predominantly
C++98, so that's the original-ish C++ era. And a couple of years in, we were looking at whether
or not it would be okay to start adopting some of the new features that were coming in C++ 11.
All right, so here you have Matt, who was begrudgingly dragged into C and C++ from assembly.
And he works at a place where they struggle just to keep up with the market flow. And there are
new convenience features coming to his language. Matt is now primed to become an expert on how the
sausage is made. What happens inside of a giant optimizing compiler like GCC? And how does that
affect his ability to write programs that can keep up with
the market flow? And so C++11 gave us two things and many other things, huge amounts of other
things. I'm glossing over those, but it gave us the auto keyword, which says, yeah, just the type
of the variable is whatever I'm equaling it to. So if you did, like, auto i = 0, you're getting an int.

Type inference?

Exactly. Yeah. I mean, I know other languages have much more sophisticated systems for doing this.
Rust in particular is even more sophisticated.
But like C++ is pretty much like whatever's on the other side of the equals
determines what you are with some, of course, wonderful C++ strange caveats
and asterisks and footnotes about the weird edge cases.
But, you know, it wouldn't be fun if it was easy.
So we got auto, and we also got the range-for.
And so now I could just do for (auto i : vec),
and now I'm getting all the integers i in the vector.
And that's great.
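As a sketch of the two idioms under discussion (the function names here are invented for illustration), both loops sum the same vector, and the question on the desk was whether the new form compiles down to the same thing as the old:

```cpp
#include <cassert>
#include <vector>

// The C++98 idiom: an explicit index loop over the vector.
int sumOldStyle(const std::vector<int>& vec) {
    int total = 0;
    for (std::vector<int>::size_type i = 0; i < vec.size(); ++i)
        total += vec[i];
    return total;
}

// The C++11 replacement: auto deduces int, and the range-for visits
// every element without any visible index or iterator.
int sumNewStyle(const std::vector<int>& vec) {
    int total = 0;
    for (auto i : vec)
        total += i;
    return total;
}
```

The behavior is identical; whether the generated assembly is identical is exactly what Matt's watch-the-compiler setup was built to check.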
But obviously, well, not obviously at all.
We had been burned previously in other languages.
So there was a mixture of languages that were used in the desk.
Some of it was written in Java.
And Java is an excellent language.
But we had been bitten in Java with iterators, because if you iterate over
a range of things in Java,
a new object is created.
Every time you do that,
you get a new iterator
that goes over the object.
And the Achilles heel of the systems
that we had before
was that we just basically couldn't afford to let them garbage collect.
We had so much junk in there that it couldn't be done quick enough
for our systems to remain on.
And in fact, the GC thing, the first thing it would do
is try and turn off the system so that we could spend
hundreds of milliseconds churning through
and then deal with the fallout of everything going wrong
because we'd missed maybe some packets off of the wire,
all those things.
So we were avoiding it by trying to not create garbage.
And so this iteration idiom that we would have liked to have used in Java
was off limits.
We had to just, you know, write the for (int i = 0;
i < vector.size()) equivalent in Java.
And so when we said, let's use the new cool features in C++,
quite rightly, the lead program was like,
are you sure that's the same?
And so a pal and I sat down and we, well, this has gone into lore slightly now,
so obviously the tale has grown somewhat in the telling.
But my memory is that we wrote a very simple function in C++
and compiled it and just dumped the output with obj dump
and the disassembler and the demangler and whatever. We did that a couple of times as we fiddled with
stuff. And then I had the idea to use the Unix watch command. And what watch does is it runs
the rest of the command over and over again, highlighting the differences. And so I was able
to say: watch, run the compiler on temp-test.c,
pipe it through all these things to demangle it,
get rid of some of the stuff that the compiler generates
that I don't care about, and then show me that, please.
And then I split the terminal in half, and I had VI on one side,
and I had the results of this on the other.
And so I was able to make changes, and every two seconds,
it would immediately show me the assembly output.
Can you picture it?
Basically, we have a split screen.
On one side, you have your text editor
with a single file of code.
And then on the other side,
you have your generated assembly,
which the compiler emits.
It's basically nonsense to me,
but not to Matt.
But even to me,
like if I change the idiom,
it shouldn't produce a whole lot more assembly.
It shouldn't have new extra allocations.
If something like that
happens, then I know something's up and that I should look into it. And of course, naturally,
that was a very valuable and useful thing to be able to do, to just experiment interactively. I
think even I at the time assumed C++ compilation was such a heavyweight activity, until my friend Jordan sort of showed me
how quickly he could just knock up a thing in a temp directory
that was all of like three lines and run the compiler.
And I was like, oh yeah, actually it's not too bad, is it?
It doesn't take too long.
Of course, it's quite fast to compile four lines of code.
But until that had happened,
I just would never have done it.
I'd never have experimented in this way.
Once you have this tool,
one performance question you might want to ask it
is when are bit
shifts a worthwhile optimization?
Bit shifting is, instead of multiplying a number, you shift its bits.
Shifting an integer left by one is equivalent to multiplying it by two, but it can be faster
in certain circumstances.
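The equivalence is easy to check with ordinary arithmetic. A minimal sketch of the two phrasings (function names invented for illustration):

```cpp
#include <cassert>
#include <cstdint>

// The Doom-era trick in one line: x * 260 phrased as shifts and adds.
// (x << 8) is x * 256 and (x << 2) is x * 4, so their sum is x * 260.
uint32_t times260Shifts(uint32_t x) { return (x << 8) + (x << 2); }

// The straightforward phrasing of the same computation.
uint32_t times260Multiply(uint32_t x) { return x * 260u; }
```

A modern compiler recognizes both forms as the same multiplication and emits whichever sequence is actually cheaper on the target CPU, which is the point Matt makes next.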
If you look at like the Doom source code or the Wolfenstein 3D source code, you'll see
that it's covered with, you know, things like: oh, a equals a shifted up by eight plus a shifted up by two. You're like, what? A shifted up by eight, that's times 256. A shifted up by two, that's times four. Oh, you're multiplying it by 260. I see what you're doing there. But you're using shifts because shifts are faster. All right. And then you can sort of say, well, okay, that's cool, but obviously the compiler knows this trick too. And the thing about the compiler is it's much more consistent about applying that trick than you are. And so anytime you're multiplying by 260, it says, I got you.
I know what you're doing here. And so, at the risk of revealing one of the big sort of spoilers in one
of my talks where I talk about this particular thing: what you can do is you can
take a particular number. There's a number, I forget which one it is. It's got a certain number of set bits,
and you like throw it in compiler explorer
and instead of using these shifts and adds
that you see if you like do multiply by 10,
it just goes back to using a multiplier.
You're like, oh, compiler, you gave up.
You gave up, didn't you?
You decided that my number wasn't worth it.
You're just going to use a multiply instruction.
And so if you sit and pick it apart and go,
well, this is, you know, A shifted up by 10
plus A shifted up by four plus A shifted up by 2 minus A
because it's one less than all of that.
Okay, right.
And you put that into the compiler and you see, oh,
and it still does a multiply.
So I wrote it out as shifts and adds because that would be faster
and you turn it back into the multiply again.
And it's like, no, you're stupid, because those shifts
and adds are no longer cheaper than the
multiply. The multiply is five cycles and you've just generated seven cycles' worth of shifts and
adds. And so even though you phrased a multiply as a bunch of shifts and adds, it was able to,
again, unpick it and say, what are you actually doing? You're multiplying by one, six, nine,
seven, nine, seven, two. I'm just going to multiply by one, six, nine, seven, nine, seven, two.
And the cool thing about that is that you can then go and say, well, I read this out of some Doom source code,
which of course is 386 or 286 era.
If I tell it to target a 32-bit system
and say the architecture is that CPU,
it does indeed do the shifts and adds.
It goes, no, I got you.
I know that the multiplier was like far too slow.
I will use the shifts and adds.
And now you look at the shifts and adds,
it's actually been cleverer than my example.
So it was able to unpick my shifts and adds,
determine it was a multiply by some high constant,
and then it had a much better way of doing that
that was still shifts and adds than my original way.
And it's just like, this is why you trust the compiler.
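The Doom-era idiom from this part of the conversation can be sketched in a few lines of C++ (the function names are mine; at -O2, a modern GCC or Clang will typically compile both functions to the same code, which is easy to confirm in Compiler Explorer):

```cpp
#include <cassert>
#include <cstdint>

// Multiply by 260 written the Doom/Wolfenstein way: as shifts and adds.
// A modern optimizer recognizes this pattern and canonicalizes it, so both
// functions below usually end up as identical machine code.
uint32_t times260Shifts(uint32_t a) {
    return (a << 8) + (a << 2);  // 256*a + 4*a == 260*a
}

uint32_t times260Multiply(uint32_t a) {
    return a * 260;
}
```

Whether the final code is a multiply instruction or shifts and adds is the compiler's call, based on the target architecture's instruction costs — which is exactly the point of the anecdote.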
Another optimization that Matt can see in action
using his tool is vectorization.
So vectorization specifically is an interesting technique where the compiler is able to see that it might be worthwhile doing multiple iterations of a loop at the same time.
CPUs have a number of instructions that treat a register, which is maybe 64, 128, or 256 bits wide,
not as just one giant number, but as a structure containing a number of 32-bit values
or 16-bit values or 8-bit values
or a number of double precision numbers or single precision numbers.
That's crazy.
I write a for loop and I say, add up all these things.
And it's like, yeah, I'll just add them up four at a time because I can do that. It's so cool. And yeah,
and exactly. And it does this for every loop, you know, every loop that it thinks is profitable.
Whereas as an engineer, if you were, like, actually having to write this out longhand, like old Matt
would have done, you'd have to be really quite devoted to this, to always use the magic
instructions that do it this way
and deal with all the edge cases.
But the compiler will happily spit that out
every time it sees a loop
that it thinks is worthwhile,
which is just another reason, right?
You know, rather than being smart
individually every time,
be smart once and teach the compiler to do it.
And then everyone benefits from it all the time.
In other words,
if there's some trick that you think
can make your code faster,
probably the compiler authors have already put that trick
into the optimization part of the compiler.
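The kind of loop being described is nothing exotic; a minimal sketch (the function name is mine, and whether it actually vectorizes depends on your compiler, optimization level, and target architecture):

```cpp
#include <cstddef>
#include <vector>

// A plain "add up all these things" loop. With -O2/-O3 and a target that has
// SIMD instructions, GCC and Clang will typically rewrite this to process
// four or eight elements per iteration, plus scalar code for the leftover
// tail — all without the source changing at all.
int sumAll(const std::vector<int>& xs) {
    int total = 0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        total += xs[i];  // the compiler, not the programmer, handles the SIMD and the edge cases
    }
    return total;
}
```

Pasting this into Compiler Explorer with and without `-O3` is a quick way to see the vectorized version appear.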
I have all the time, by the way.
I know you sort of scheduled it up to now,
but I can keep talking until you're bored of listening to me,
which is already you've shown a remarkable fortitude.
No, no, it's super interesting.
So my favorite example is counting the number of set bits.
So if you have a number, a 32-bit integer, an unsigned 32-bit integer, and you want to know how many one
bits there are in that 32-bit value. Yeah, it sounds like a pointless thing. It sounds like an interview
question, which I think it probably is, but there are some genuine reasons for doing it. They're
used in, like, packed matrices that are,
you know, very, very sparse matrices that have lots of zeros in them. Instead of storing all
those zeros, you store a mask that says, well, which of the following values are populated?
And then it just has the populated values immediately after it. So you need to sometimes
say, well, how many are there to get to the next row? Yeah, I don't need to justify it really.
It's just a thing that you might want to do reasonably.
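That packed-sparse-row trick might look something like this in C++ (the struct and function names are hypothetical; `std::bitset::count()` is a portable way to spell the set-bit count, and compilers typically lower it to a popcount instruction):

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A sparse row stored as an occupancy mask plus only the populated values.
struct SparseRow {
    uint32_t mask;               // bit i set => column i is populated
    std::vector<double> values;  // populated values, in column order
};

// How many values are stored in this row? Just count the set bits.
std::size_t rowSize(const SparseRow& row) {
    return std::bitset<32>(row.mask).count();
}

// Fetch column `col`, or 0.0 if that slot is empty. The index into `values`
// is the number of set bits *below* bit `col` in the mask.
double at(const SparseRow& row, unsigned col) {
    if (!(row.mask & (1u << col))) return 0.0;
    uint32_t below = row.mask & ((1u << col) - 1);
    return row.values[std::bitset<32>(below).count()];
}
```

The `rowSize` call is exactly the "how many are there to get to the next row" question from the transcript.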
So there's a bunch of ways you might solve this.
The easiest way is you have a 32-bit integer.
You're just going to check if the first bit is one
and then drop it off and keep checking 32 times.
Then there's some optimizations you can make on that.
If the whole thing's zero, you know, you're done.
You can exit early.
You could do some sort of bit twiddling hacks, et cetera.
But anyway, you write any of those ways
on a modern computer, a modern compiler,
and you turn the optimizer on to full
and you tell it that the architecture
is like a modern PC as opposed to the default,
which is like the oldest thing it possibly supports.
And it doesn't matter, pretty much any way you wrote that,
it will take the whole thing
and replace it with the pop count instruction,
which is the how many bits are set in this register instruction.
And that's just mind-boggling to think of what's gone on there.
You've taken essentially an order n or an order set bits of n,
any number of ways you could write it,
and the compiler authors are like, we got you.
We know what you're doing,
even if you phrased it in all these different ways. There's a normalization pass inside the compiler. There's then tricks for
noticing what you're doing, idiom detection. And then it's like, no, this is counting the number
of set bits. We're going to replace it with that one instruction. And that's amazing.
I totally agree. It is amazing. It means that this interview question is easier to answer in
assembly than in a high level language, because in assembly, it's just a single instruction.
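For the curious, here's a sketch of a few of those "many ways you might write it" (the function names are mine; with optimizations on and a modern target, GCC and Clang will typically collapse each of these into a single popcnt instruction, which you can watch happen in Compiler Explorer):

```cpp
#include <cassert>
#include <cstdint>

// The easy way: check the low bit, shift it off, 32 times.
int popcountNaive(uint32_t x) {
    int n = 0;
    for (int i = 0; i < 32; ++i) {
        n += x & 1;
        x >>= 1;
    }
    return n;
}

// With the early exit: stop as soon as nothing is left.
int popcountEarlyExit(uint32_t x) {
    int n = 0;
    while (x) {
        n += x & 1;
        x >>= 1;
    }
    return n;
}

// A classic bit-twiddling hack: clearing the lowest set bit each iteration,
// so the loop runs once per set bit instead of once per bit position.
int popcountTwiddle(uint32_t x) {
    int n = 0;
    while (x) {
        x &= x - 1;  // drops the lowest set bit
        ++n;
    }
    return n;
}
```

All three are semantically the same function, which is what the compiler's idiom detection is recognizing.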
What does all this knowledge about compiler optimizations tell us about writing high-level code, though?
The one thing that I want people to take away from all of this conversation is that
compilers have moved on to the point now where, even though it's so useful for you to understand
what's going on right at the bottom and understand what the CPU is doing, what the RAM is doing,
what the caches are doing, what the branch predictor is doing.
That's great. It's wonderful. It's exciting. It's interesting.
But don't necessarily think that you can't trust the compiler to know those things,
or some of them at least, and take them into account.
Trust the compiler. Always trust the compiler. Write code that's easy to read,
because the human is more important than the compiler now. The compiler has your back,
whatever you wrote. So don't trick yourself and write some natty little thing because you think
it'll be a cycle faster. It won't be. The compiler honestly will beat you on almost
everything that you can care to do. Write the code so that your colleague, or you tomorrow morning when you haven't got coffee, can understand, and trust the compiler to just generate
the right code. So in a sense, you know, you're admitting defeat from your old days.
In a sense, yes. But in another sense, it's as if the scales have fallen from my eyes now.
And there are just so many more smart people that work on compilers that have the time and the energy and the will
and the clever ideas to add in these heuristics, these tricks, these knowledge of the ISA that's
deeper than anything I would have these days. Am I saying that you can't beat the compiler anymore? No, of course not.
There's always a case where you know more information than the compiler. But the compiler
is very, very good under almost all circumstances that you care to do. So Matt's a changed man,
right? I mean, he had already moved on to using a high level language. But now with his tool,
you know, he can really see the assembly that's being generated and all the optimizations that are happening. And, you know, this is clearly something valuable. So he
puts this up on a website, on his own personal website, godbolt.org, basically unchanged,
you put your C++ code on one side, and you see the result on the other. And this starts to spread
inside of the C++ community.
My favorite story for the whole of this is that we were lucky enough to have Andrei Alexandrescu,
who was like the father of some of the template metaprogramming tricks
that you'll see.
There's a number of books he's written.
He came and gave some talks at DRW about performance.
He was then working for Facebook
and they were dealing with large data at scale
and there was all these things.
And it was fun for me because I spent the whole time heckling him on, like,
the assembly code he was showing, just generally.
And so I kept pulling him up on that,
and we had good banter back and forth.
And eventually at the end,
he said something along the lines of,
ah, I think somebody told me there's a website that you can just put your code into,
and it shows you what the assembly looks like.
And at that point I blushed bright red as everyone else in the room looked at me and, like, pointed and said,
yeah, it's his. I'm like, oh, now I feel like an absolutely dreadful person for not mentioning it
earlier. Yeah, he came to give a talk and you were, like, poking at him, and then he's like, I think I
have a solution. Like, yeah, that's mine. Yeah. So he and I have become friends now, so I think he's forgiven me for being quite so dreadful.
So I think we're all right on that front.
But yeah, so it grew and grew,
and it probably took a couple of years
before it became more well-known.
So godbolt.org now supports many, many languages,
21 as I look right now,
maybe more by the time this comes out.
I first came across it on Hacker News
when somebody was posting a link about the ZIG language,
like Z-I-G, and how well it optimizes down to x86 assembly.
If you can read assembly and you have small examples,
godbolt.org has become sort of a Rosetta Stone for code performance. One thing
Matt doesn't like about the tool, though, is its name. Yeah, I mean, I think that, like, people say,
to, like, godbolt something, right? Yes. Yes, they do. It's something that, yeah,
I've been in C++ committee meetings where somebody has said,
why don't we just put this in Godbolt and see what happens?
And I'm like, I'm in the room.
You can't say that with me here.
It sounds weird.
I don't know.
It's fitting though, right?
Because like, here's your story of always like,
it seems like the punchline to each of your anecdotes is like,
to the assembly, let's see what it's doing.
Yes. I think, you know, there is a reason why it was me, or rather, like, I was in the right place at the right time with all the right background for this to happen, right? So
you're right, it is the punchline, so I guess in that way, yeah. But I mean, it's luck.
It's luck. I mean, that's the thing. People... I get invited to
things like this now, this
enjoyable time talking to you, and I get,
I've been able to talk at various conferences
that I would never have dreamt to as just some
rando who writes C++ for a living.
And it all came out of a lucky
break where, you know, an afternoon's
work with a friend and a bit
of JavaScript
and a memorable last name.
So that was the show.
Matt, he recommends that nobody write assembly now,
but that everybody should be able to read it.
And because of his interest in assembly and reading the assembly generated by compilers,
he's become famous in this world of very performance-critical code.
Check out godbolt.org where you can play around with the tool that he built
that he likes to call Compiler Explorer,
but everybody else calls Godbolt.
As I mentioned at the beginning,
I got a new job at Earthly.
If you go to earthly.dev or check it out on GitHub,
you can see what we're building.
We just want to make builds better
and builds faster and builds more reproducible.
It's just kind of an ugly, complicated area.
So we have an open source tool and we're trying to make the build experience better.
Until next time, thank you so much for listening.