Microarch Club - 101: Matt Godbolt
Episode Date: April 10, 2024

Matt Godbolt joins to talk about early microprocessors, working in the games industry, performance optimization on modern x86 CPUs, and the compute infrastructure that powers the financial trading industry. We also discuss Matt's work on bringing YouTube to early mobile phones, and the origin story of Compiler Explorer, Matt's well-known open source project and website.

Matt's Site: https://xania.org/
Matt on LinkedIn: https://www.linkedin.com/in/godbolt/
Matt on X: https://twitter.com/mattgodbolt
Matt on Mastodon: https://hachyderm.io/@mattgodbolt
Matt on Bluesky: https://bsky.app/profile/mattgodbolt.bsky.social
Detailed Show Notes: https://microarch.club/episodes/101
Transcript
Hey folks, Dan here. Today on the MicroArch Club podcast, I am joined by Matt Godbolt. Matt is well known as the creator of Compiler Explorer. We begin by discussing early microprocessors, namely the Zilog Z80 and
MOS Technology 6502, including a discussion of undocumented opcodes and their creative uses.
We then talk about Matt's time in the gaming industry and what went into building games for
early consoles, before discussing his experience working on YouTube for cell phones at Google.
The last part of our conversation focuses primarily on the past 15
years of Matt's career, which has been in financial trading. Matt explains why trading requires a deep
understanding of hardware and software and shares how technologies such as FPGAs allow firms to gain
a competitive advantage. A common theme throughout our conversation is the ever-rising complexity of
processors and the systems built on top of them.
While abstraction of hardware and low-level software has allowed us to build new applications faster than ever before, Matt and I both assert that there is value in understanding what is
going on behind the scenes. Or, as Matt states succinctly in our conversation, you should always
understand the abstraction level directly above and directly beneath you, and there is always at
least one level beneath you.
I have followed Matt's work for quite some time, but I want to extend a thank
you to Jonathan Yu for suggesting that I ask Matt to join for an episode of the MicroArch Club.
With that, let's welcome Matt to the show.
Thank you for having me. It's great to be here.
Absolutely. I've followed some of your work, and might I say tooling, for a period of time. And I think, you know, when we were chatting before this show,
I think we may have crossed paths on social media or GitHub at some point in time.
I think so. Literally, just before the show, when I was searching for your name to find the outline
you'd sent me, I found some GitHub repos we'd obviously both been looking at at the same time,
where you'd committed to. So I think we've been broadly following in the same footsteps for a long while.
Absolutely.
And I will say, I think that you are the first listener requested guest
because Jonathan Yu on Mastodon kind of pinged you
and I think John Masters as well and said,
y'all would be great candidates.
And I said, that sounds like a great idea.
So I'm super glad to have you here.
I'm very pleased he did.
Although I feel slightly fraudulent looking at the folks that have already been on the podcast
and also, you know, the sort of general belief behind the podcast of like discovering things.
I'm like, I'm also on the same journey I think you are, to discover how this world got put together that we're so enamored of.
Right, absolutely. Well, you know, I think that one of the common themes with at least
all the guests I've had so far, which some of them have been released and I've got a few that
I've recorded that haven't released yet, is just kind of an interest in computing history. And so
in some ways, right, we're all like on this journey of understanding
about how the industry is evolving and all the things that have come before. And one of
the things that's really neat, I think, is looking back at computing history and seeing that what's
new is kind of old, and we're really new on the scene.
It's all been done before. We go around in circles.
Yeah, quite. Exactly. And so maybe that's kind of a good place for us to get started,
just kind of talking about your introduction to computing
and maybe when you were growing up,
how you first were exposed to computers
and what that environment was like.
Absolutely.
There's a sort of family story about the first time I ever saw a computer.
I was at a friend's house, and he had a Sinclair Spectrum.
I think I was seven.
I must've been seven at the time.
So I'm going to age myself here.
This was 1983.
And apparently there was like one of the really,
really simple flight simulators
where it was literally a line where the horizon was
and then four lines for where the runway was.
And then most of the screen was the instrument panel
because that didn't change very often.
And so the poor thing only had to draw
the tiny little like window at the top.
And my parents said I was interested in watching this
at my friend's house,
but then he apparently reset the machine,
which, in those days, meant
you pull the power cable out
and plug it back in again, right?
And then of course it drops into basic
and he started typing in a simple program
and the numbers were scrolling up the screen. So as my mom tells me, and then of course it drops into basic and he started typing in a simple program and or the
numbers were scrolling up the screen so as my mum tells me and apparently I was wrapped with that
that was so interesting to me I don't really remember this but that was the story and then
as a result of that on my eighth birthday I was very lucky to get my own spectrum and that's where
my journey started typing in the programs from the book that came
with the computer back when you know manuals were actually pretty full and had like the data sheet
in the back and had the circuit diagram of it even and um you know there were like the two or
three programs that would print out a British Union Jack flag. And I remember, Christmas time, my mum reading it out
and me typing it in. And, you know, that was where the journey began. So the Spectrum,
I mean, it was the Sinclair Timex over here. I say over
here; I'm in the States now, despite my accent. So it was a Z80 ("zee-80," or "zed-80,"
depending on where you come from)
processor.
And it was kind of relatively cheap for the time.
It was a very compact computer
and it had a very terrible rubber keyboard,
which felt awful and was pretty nasty.
But it was a gateway into a whole new world. And of course, what 10-year-old or
whatever, by the time I'd sort of got to grips with it, didn't want to play computer games? And so
we would, you know... back in the day, the games would come on audio cassettes.
They would be encoded, so, you know, if you think of modem screech, but lower and more rubbish, that's
the kind of sound that we grew up with. And even now, I get the hairs on the back of my neck go up when I hear that noise,
because it reminds me of those days back when, right? And so you'd load up games and whatever.
And, you know, they were reasonably easy to duplicate, legally or otherwise, and so there was quite a
circuit around the playground of folks sharing. But eventually you'd reach the point where you couldn't
get more games, and then you're like, well, maybe I could make my own game. Maybe that would be more fun. And so you learn BASIC. You know,
you probably already had learned BASIC, just because of the way that, you know, you turn the computer on,
you have to type in commands to even get it to load from the cassette tape, right? But very quickly
you realize, especially with the Spectrum, its implementation of BASIC was, pun intended, rather basic. And it wasn't very fast. It was
incredibly slow; it was not a fast interpreter. And so any game that was more than, like, the number-
guessing game, where you say, is it higher than seven? Yes. Or is your number seven? You know, eight,
whatever. Anything more than that was a little bit too much for it. So I remember I wrote a couple of
strategy games, and I wrote a little adventure game, and I
even got as far as selling one of these games in the back of a, you know, magazine, where you could,
like the classified ads at the back, you know, send 10 pounds to this address and we'll send you a
cassette in the post. I sold one copy. Not very much, but it was better than nothing, right? Right. But
then you get to the point where you're like, well, I really want a game where I can shoot things,
because, you know, that's more exciting than typing stuff in. And at that
point, the only way to get any kind of performance is to write this thing called assembly. And you
didn't really understand what it was, but you knew you had to do it to get the performance. It
was the thing; you knew that stuff was written either in assembly or BASIC.
The Spectrum didn't have an assembler, which was a shame.
You know, you had to go and buy one and I couldn't afford one.
And so I remember the very first assembly program I ever got working was a scroll text, a very simple scroll text at the bottom of the screen.
And it was written during a very boring swimming gala that I had to attend because my sister was in it. And I hand-assembled it on the back of the program for the gala, you know, the schedule for the swimming gala.
Oh, right, right. And "program" is an incredibly overloaded word in this context, right? I was going to ask, did you bring the Spectrum with you to this?
No, it was written in pencil, right on the back, and then hand-assembled when I got home.
And it worked first time.
And I was really hooked then.
Then I managed to find someone who had an assembler.
And I wrote a sort of simple block-based game where you ran around.
But it was a lot faster than you could ever achieve with BASIC.
And there was, like, a little tiny bit of programmable logic in it.
And the door was open.
But I consider it an absolute blessing
that I was born when I was,
and computers were as simple, air quotes, simple,
as they were back then,
because a 10-year-old, 12-year-old
could fit the whole thing reasonably in their head
and understand enough of it,
especially with the right amount of will
and motivation to
make a game, to, yeah, understand the whole thing and make a game themselves. Nowadays, of
course, you know, I've got kids that are older than that now, and the pair of them are like, well,
I want to make Minecraft, and I want to make, you know, some new FPS. Like, well, that's a long way away from, like, one block,
a star, moving around inside a maze of pluses and minuses.
But yeah, so then I moved
from the ZX Spectrum.
A very good friend of mine had a BBC.
So back in the 80s,
the British government decided
that in order to kind of get ahead,
they should teach all their citizens about this newfangled thing called a microcomputer.
And they commissioned the BBC to make a programme, a TV programme now.
And in UK English, that's "programme," with an M-M-E at the end, which is a strange thing.
But yeah, you can't hear that on a podcast.
Anyway, a programme which would then be distributed,
you know, broadcast.
That's what we called it back then, isn't it?
Distributed, broadcast to everyone
to teach them what a computer was.
And the makers of this TV show,
we'll go with a slightly less ambiguous name here,
decided that they should have an official computer
to go alongside of it
so that they could teach the ideas they were presenting with an actual physical computer that you could also get yourself.
And obviously there were other ones around, but they wanted to make it sort of mostly affordable.
And critically, it was going to go into schools at the time, so that schools would have this sort
of backup as well. And so there was a generation my age that grew up with
a particular computer in their school, and that computer was made by a
company called Acorn, who very famously, at the last minute, sort of outbid and outmaneuvered
Sir Clive Sinclair of the ZX Spectrum, or Timex Sinclair, and got the contract to make this
computer, even though the computer had been made in, like, three days, with them soldering it together
and writing the software in all-nighters.
And literally as the person from the BBC was due to come in to see this
apparent demonstration machine,
it wasn't working.
And for the whole time,
someone was having to hold like a wire that they discovered was loose with
their hand or otherwise earthed or grounded in some way,
you know,
one of those amazing stories.
It's probably more apocryphal than real,
but it's great to think about.
But they won the contract,
and this machine was pretty prevalent in the UK,
and that was the computer I moved to, actually.
I moved to the sort of 128-kilobyte version of it,
which, you know, woo-hoo, a whole 128K.
And that had a 6502 in it,
which was the movement of, you know,
a different CPU from the Z80, obviously.
And sort of curiously was like really a RISC processor.
If the Z80 is a CISC processor, you know,
like there's 768 odd opcodes that it has.
The 6502 has less than 256.
Not every single opcode in the one byte that is an op code actually encodes for something that they they meant to happen right which is a whole other
story we can get into in a minute. And so that was what I really focused on. So during my
late teens, I was working with a good friend of mine, and we were making games, and we were
sending them to... there was a magazine, back when that was a thing.
Perhaps some folks will remember what magazines were.
They're like thin, flimsy books that you could buy once a month from a special shop that sold them.
It's like if you, you know, printed a PDF or something like that, you know?
That's right.
Yeah.
Or like a blog post or something like that.
A sequence of blog posts are all printed out.
Yeah.
But there was an appetite for type-in programs, where you would buy the magazine, and at the back of the magazine, in like yellow pages and really cheap-quality print, there were, you know, 800-line programs for various different things.
And there'd be articles in a magazine explaining why you might want to type the program in with little screenshots and things
and, like, lots of artist drawings that made it look a lot more impressive than it actually was.
But we were hooked, right? That was a way that you could learn more about how to
program the computer. Essentially, they were like the blog posts and the Stack Overflow of their day, you know?
You would type it in, and inevitably you'd type it in wrong, and it wouldn't work. And then you'd
scratch your head, and you'd stare at the thing, and you'd kind of go, well, I think I understand enough
of the flow of it to now work out where I must have typed it in wrong. And so you learned debugging
skills before you even knew what debugging was. And that was great. And, yeah, later in my teens, I was writing articles with my friend and sending
them to this magazine, and it was a great way of, you know, keeping us in a few 10 pounds here, 20 pounds
there, for buying more games, as it happened, right? You know, but it meant that,
again, the chips were simple enough, the computer was simple enough, and the BBC had a built-in assembler just out of the gate.
You turn the computer on, you open a square bracket,
and you're typing assembly.
It was fantastic.
But the BASIC was also incredibly fast.
It was a very good put-together version of BASIC.
The person who wrote that BASIC went on to write the BASIC
for the Archimedes, which we'll probably talk
about. Well, maybe we'll talk about it in a second. I know, some foreshadowing here. I'm getting all
excited as well, because it's such a great story, set of stories. But yeah, so it was
wonderful that we were able to learn so much about this system. And everyone had the
same system under their desk, not like PCs these days, where everything's different. So if you found
some clever trick about your computer, it would work on everyone else's computer, too, even if it wasn't in the manual.
Right. And so enterprising folks would realize that, of those 256 opcodes in the 6502, some were not specified.
But like somewhere there's a network of transistors doing something.
And it's not like they threw an exception, a hardware level exception.
It's just like, no, different parts of the chip turned on because these bits were high and these bits were low.
And so very famously, there was like a store instruction.
Store A was one opcode.
Store X, the X register, was the next opcode.
Store Y was the next opcode.
And then "doesn't do anything, undefined" was the next one.
You're like,
well, is there a fourth secret register, or what? And so you try it out, and through a bit of working out,
you realize that, no, what it's doing is: the bottom two bits of
the opcode are selecting. If both are clear, then it's the accumulator that's put onto
some internal bus inside the 6502. And then if
the low bit is set, the X register is put onto the bus, and if the high bit is set, the Y register is
put onto the bus. But if you set both of them, it puts both the X register
and the Y register onto the bus. And because it was an NMOS design, we discovered later on, that meant that
essentially the zero bits would win. So it was actually an AND: it was X AND Y that was put onto
the bus. Which meant that when the store circuitry then went to push this out to
memory, you got the X register ANDed with the Y register written out to memory. And maybe that's
useful to you if you're writing a sprite routine and, for example, you need to mask the bits that you don't want to change
with a sort of "don't change these bits" mask.
So, you know, these were clever things that people would discover and determine.
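The undocumented store behavior described here can be modeled in a few lines of Python. This is a toy sketch of the behavior exactly as described in the conversation, not code from any real emulator; the function name and structure are my own.

```python
def undocumented_store_value(a, x, y, select_bits):
    """Toy model of the register-select behavior described above.

    select_bits are the bottom two bits of the store opcode:
    0b00 -> accumulator, 0b01 -> X, 0b10 -> Y,
    0b11 -> both X and Y drive the internal bus at once.
    On an NMOS bus, simultaneous drivers resolve as a wired-AND,
    so the zero bits win.
    """
    if select_bits == 0b00:
        drivers = [a]
    else:
        drivers = []
        if select_bits & 0b01:
            drivers.append(x)
        if select_bits & 0b10:
            drivers.append(y)
    value = 0xFF
    for d in drivers:
        value &= d  # wired-AND: any driver pulling a bit low wins
    return value
```

So with X = 0xF0 and Y = 0x3C, the "fourth" store opcode would write 0x30 to memory, which is exactly the kind of masking a sprite routine might exploit.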
And even like the video circuitry was clever.
You could do some tricks to change and lie to the system
that it had slightly more lines or slightly fewer lines
and cause it to generate the H-sync or the V-sync at different times, which would be
interpreted then by the monitor, or more likely the television that it was plugged into,
as moving up and down slightly. So you could get it to wiggle around, and then, with careful other
things timed, you could get the screen to scroll around in a very smooth way that was otherwise
totally impossible for something that underpowered, right? So there were a lot of really cool things that you
would learn back then about how to make the most of it. But by the end of this process...
Yeah, go ahead. Sorry.
No, I was just going to jump in while we're kind of like on the 6502. So you mentioned
two very important 8-bit microprocessors, right? The Z80 and the 6502.
And there's kind of a couple of other contemporaries around there. But in some of my
own experience and some of the research I was doing for this show, the BBC Micro, and then I
believe from my pre-show stalking that we talked about before we jumped on here, you had a BBC Master, is that right?
That's correct, yes.
That was the posh 128K version, yeah.
Right.
And so I've heard in talking with a lot of folks
that these computers are really impactful,
and also the 6502 was in a number of other very notable systems,
so the Apple I, the Apple II, the NES. I'm leaving
off a number here. I don't know if you have any off the top of your head that I haven't named.
Bender from Futurama has a 6502. And The Terminator: okay, if you freeze-frame The Terminator
when he's got the stuff scrolling down the screen at the very beginning of the movie, it's 6502
opcodes, and it's like a bootloader, boot ROM thing, copying memory down low. Right, okay, perfect.
So both real and fictional impactful computers. Exactly.
But yeah, so you mentioned that it was somewhat of a reduced instruction set computer, and that there was a
space of 256 opcodes. I think there were 151 used, for 56 instructions, which, you know, is
drastically smaller than some of the contemporaries, which I think was a contributor
to it being cheap, which I think was, like, the big driver of, you know, the systems that
it was put into being able to be cheap. And then it's also probably why, when I talk to so many
folks, they had exposure to it, right? Because it was more accessible, and it kind of drove this
revolution, you know, of having personal computers and that sort of thing. I was curious,
you know, in moving from the Spectrum to the BBC Master,
was there a significant price difference between those two machines?
Because I know the Z80 was also on the cheaper side, but more expensive than the 6502.
The BBC was actually very expensive for what it was.
Okay.
The 6502 may have been cheap, but I think the expensive part was the RAM they put in. And this
is probably the one and only time in the history of the universe that this has been true: the
RAM was twice as fast as the CPU, which meant that the CPU and the video circuitry shared it
on alternating cycles. So the RAM was running at four megahertz, and the TV output system was running at
two megahertz, as was the CPU. And they were out of phase by, you know, half a clock, or
however that works. And so that meant that you never had contended RAM, which we had got on the
Z80. There were banks of RAM which were slower to access, because that's also where the screen was,
and so you were sharing it
with the screen. Every time the TV needed to serialize out more colors, the Spectrum,
the ULA in the Spectrum, would grab the bus, essentially steal it away from the CPU,
and go, no, this is mine now, take the information it needed, and then the CPU would run slower.
Whereas on the BBC, it was just shared, time-sliced style,
which is pretty bonkers to even think about, right?
The RAM is twice as fast as the CPU.
What would we give for that these days, right?
Right, right.
We definitely have the opposite situation now.
Right.
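The alternating-cycle arrangement described here can be sketched as a toy model (the names are invented for illustration, not taken from any schematic): the 4 MHz RAM offers a slot every cycle, and the 2 MHz CPU and 2 MHz video circuitry, half a clock out of phase, each take every other slot, so neither ever waits on the other.

```python
def ram_slot_owner(ram_cycle):
    """Toy model of the BBC Micro scheme described above:
    RAM runs at 4 MHz; the 2 MHz CPU and 2 MHz video circuitry
    are half a clock out of phase, so they claim alternate RAM
    cycles and never contend for the bus."""
    return "cpu" if ram_cycle % 2 == 0 else "video"

# Over any window, each device gets exactly half the RAM cycles,
# i.e. a full, uncontended 2 MHz of memory bandwidth each.
schedule = [ram_slot_owner(c) for c in range(8)]
```

Contrast this with the Spectrum's scheme, where the ULA stalls the CPU whenever the video needs the bus.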
And one of the other things that I kind of observed about the 6502 is
that, despite it having, you know, less functionality,
if you will, or seemingly less functionality, it was much more performant in a lot of cases
than some of the competitors. And, you know, I think there was a variety of reasons for that.
You mentioned that there was only a handful of registers: the accumulator,
the X and Y registers,
and I think there was a stack pointer and a program counter.
That's right, although those weren't really registers in the same sense,
just like they aren't on most architectures.
Right.
Yeah, the Z80, on the other hand, had all these sort of paired, pseudo 16-bit registers, which sort of presage the 8080
that it was around contemporaneously with.
And there was a lot of cross-pollination
and some strange IP-related nonsense
that then sort of bled.
So there's a little bit of x86 smell to it, right?
Even back then.
But the Spectrum was really,
oh, sorry, the Z80 was very interesting
because to save money in the Z80,
they only had a
four-bit ALU that they just pumped twice to get eight-bit answers, or four times to get the 16-bit
answers. Which meant that even, like, really simple things... although it was clocked at a higher speed,
at least in the Spectrum, I think 3.5 megahertz (someone's going to correct me in the comments, I'm
sure, but somewhere in that range), it took more cycles to do anything. And
there were, like, very complicated M states and T states and other things that were to do with, like,
am I accessing RAM or not accessing RAM. The 6502, on the other hand, accessed RAM every single cycle,
unconditionally. There wasn't even a memory-enable pin on it. It was like, nope, if the clock's happening,
I'm reading from RAM or I'm writing to RAM. Those are the only two things I'm doing.
Right.
Which, you know, reduced the pin count on the actual chip itself, simplified the design of everything.
You just plugged it into RAM and went, there you go. And simple instructions, like load the accumulator, were one cycle to read the load-the-accumulator opcode, and one cycle to read the value after the opcode
and put it into the accumulator.
And I think there was another cycle always, because everything took three.
Is that right?
Oh, no, now I'm doubting myself.
This is awful.
I've got a huge table of them somewhere.
But, you know, it was pretty straightforward,
although I've just demonstrated it's a bit more complicated.
Because it was just,
it was how many memory accesses did you need to do the work?
And that was it.
Whereas the Z80 had this, like,
it may take four cycles to do an add, even,
because there's, you know, four-bit things to do.
But that led to some really interesting side effects,
actually, on the 6502, now that we're here,
that were kind of unobservable as a programmer.
And yet.
So one of the opcodes is a rotate instruction.
So it reads a value and rotates it, as in shifts it up one and takes the top bit
and puts it back down where the bottom bit was.
And then it writes it back.
So this is a read-modify-write instruction. The first cycle would be: read the ROL
opcode. The next two cycles would be: read the address that I'm going to be doing this from.
The fourth cycle would be: read from that address; now I know where it is.
The fifth cycle, well, I'm doing the rotate, dot, dot, dot.
And the sixth cycle is when I'm writing it back.
But as I've said, there is no memory enable or disable pin.
So what's it doing on that fifth cycle?
It's accessing something.
It's doing something with the RAM.
So what is it doing?
And again, it wouldn't matter, right?
As long as it's not destroying anything, presumably whatever it's going to do, it's going to write the correct
piece of information at the end. But it could reasonably just read the same value twice. Maybe,
you know, it could write to some dummy location, or it could read some dummy location, or
whatever. But it turns out it actually writes back the unmodified value. Effectively, the little table,
not in the ALU, in the... what do they call it?
Not the ULA.
There's like a little array of,
like, it's not quite microcode.
It's just like on step three of instruction five, then-
Oh, the PLA and the-
PLA, thank you.
Yes, yes, thank you.
I knew that it was...
it's one of those three-letter acronyms
that I can't remember.
But on that fifth cycle,
they just said,
well, we might as well start the write operation,
even though it doesn't do anything,
because we're going to write something,
and then on the sixth cycle,
we're going to write the correct value anyway.
So on the fifth cycle,
it redundantly wrote the value it just read,
and on the sixth cycle,
it wrote the correct value.
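The cycle-by-cycle sequence described here might look something like this in an emulator core. This is a simplified sketch under assumptions of my own (a one-bit carry flag, a bus object that logs every write); it is not code from Matt's emulator. The point is the cycle-5 write of the unmodified value, which is only observable when the address has side effects.

```python
class Bus:
    """Minimal memory bus that logs every write, dummy or not."""
    def __init__(self):
        self.mem = bytearray(65536)
        self.writes = []           # (addr, value) for every bus write
    def read(self, addr):
        return self.mem[addr]
    def write(self, addr, value):
        self.writes.append((addr, value))
        self.mem[addr] = value

def rol_absolute(pc, carry_in, bus):
    """Six-cycle ROL abs as described above. Returns (pc, carry_out)."""
    pc += 1                        # cycle 1: fetch the ROL opcode
    lo = bus.read(pc); pc += 1     # cycle 2: operand address, low byte
    hi = bus.read(pc); pc += 1     # cycle 3: operand address, high byte
    addr = (hi << 8) | lo
    value = bus.read(addr)         # cycle 4: read the operand
    bus.write(addr, value)         # cycle 5: dummy write of the UNMODIFIED value
    result = ((value << 1) | carry_in) & 0xFF
    bus.write(addr, result)        # cycle 6: write the rotated value
    return pc, value >> 7          # old top bit becomes the new carry
```

Against plain RAM the dummy write is invisible, but if the address is a memory-mapped hardware register, that extra write is exactly the observable quirk discussed next.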
And you think, again,
totally unobservable.
Why would you care?
Except lots of hardware was memory-mapped in
those days, as it is now, in fact, right? But that meant that reading and writing to memory sometimes
had a side effect, right? And so nobody would choose to do this, really. But if you are, for example,
making a game and you want to make sure no one can copy your game or no one can at least,
you know, hack it to put extra lives
or cheats or whatever into it.
What you might reasonably do
is encrypt your game
and then write the decryption routine
and have the decryption routine,
like decrypt the code
that's immediately after it
and then run into it.
As in the last instruction
of the decryption routine,
the next thing after that is the first byte of the thing it decoded, right? There's no breakpoints on these
machines; there's nothing like that. There are, like, registers you can set that say, if we get
interrupted, reset the machine and wipe RAM. So once it's
got to that point, it's like a one-way street. The only thing I can do is reset the computer after that.
But I can play the game, right?
But it means I can't get into it and look at it and hack it or anything like that.
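The simplest version of the scheme described here, before any timer tricks, amounts to an exclusive-OR pass over the bytes that follow the loader. A minimal sketch, with a made-up key stream and payload; XOR being its own inverse is exactly why reading the loader's instructions was enough to defeat it.

```python
def xor_pass(payload, keys):
    """XOR each byte with a repeating key stream. Because XOR is its
    own inverse, the exact same routine both encrypts and decrypts."""
    return bytes(b ^ keys[i % len(keys)] for i, b in enumerate(payload))

game = b"LDA #$00 ..."                 # stand-in for the game's machine code
keys = [0x5A, 0xC3, 0x17]              # "random keys that I made up"
encrypted = xor_pass(game, keys)       # what ships on the cassette or disk
decrypted = xor_pass(encrypted, keys)  # what the loader reconstructs, then falls into
```

A hacker who can run those same instructions on the loaded bytes, then save the result instead of jumping into it, has a decrypted copy; hence the escalation described next.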
But obviously, if you can see the instructions that do the decoding,
because you can load it off the disk yourself,
you can just do yourself what those instructions did,
either by copying them somewhere else and running them
and then running the decoding, and then maybe saving it before you run into the decrypted game. And now you've got a decrypted
version of the game. And so there was a cat-and-mouse game in the early 90s about this kind of
stuff. And the sort of cat-and-mouse game escalated from just a simple exclusive-OR with
some random keys that I made up, through to, well, what if we,
as the encryption writer, use random bytes we read off the disk, in places that you
wouldn't expect? Okay, fair enough, that stops you from copying the disk. What if we start doing
things like: there are these hardware timers, I can read from a hardware timer, and the value is always
changing. Now, if I as a hacker copy the code down
low and try to do this, the timer is changing, and because I'm manipulating it myself externally, the timer
is changing differently than if it was running free. So now the decryption key
isn't the same, and so I don't decode the game. But there are ways and means of stopping the timers,
and then rewinding them back exactly the right amount, and then carrying on, and stopping them
again, and rewinding them, and all this kind of nonsense. And then,
so eventually somebody came up with a protection system where they threw the kitchen sink of everything they could possibly think of.
That was, like, essentially...
it was deterministic,
but unspecified.
One of the things was things like rotating some of these timers.
If you rotate the timer,
then obviously reading and writing to a timer has a side effect
of setting and resetting it. And this ROL was one of the many things that was done that would
cause this weird behavior that no one would have known about. And in fact, many years later, we tracked down
the person who wrote this protection system and said, how did you know all this stuff? Because, you
know, all these things fed into the key. And, you know, things like enabling interrupts, and then
having these timers make the interrupts go off, and then the interrupt deliberately corrupting registers,
so the decryption routine would actually return
in a specified place with a different accumulator value
than when it started. It's like, who would do such a thing?
And we're like, well, how did you know what it was going to do?
And he said, I didn't.
I just knew it was deterministic.
But then how did you encrypt this?
How did you have this depth of
knowledge and whatever he said well i desoldered the chips off of the board i disabled the
functionality that wipes the memory when it breaks you know when when it gets when it when it um
right uh when it hits the end and through some clever tricks which i won't go into now as we're
already been talking about this for 10 minutes but um he found a series of um decryption or rather sort of um yeah i suppose it is decryption things which formed a
ring of cycle 255 and so he painstakingly did this 255 times and then saved the penultimate one
and that was the one that went to the fabrication factory and he still doesn't understand how it was
now the i shall tell you now why i know this which perhaps will segue in or we can go back and this is because many years later well
first of all, I tried to hack that game as a kid and I failed. My friend Richard and
I wrote a 6502 simulator in 6502 to try and simulate it perfectly, in order to decode the stupid thing, and we failed. But
fast forward 20 years and i wanted to write an emulator for my bbc my beloved bbc micro
and in order to just run the game not try to decrypt it just to run it normally i had to solve
all of those problems and really understand at the lowest level what's going on so i can tell
you that the fifth cycle of a ROL writes the unmodified value back, and I know why,
because I simulate it in the emulator in order to have this work. And in fact, the protection
system is now one of the unit tests of my emulator. It's like, does it decode? Yes?
Good, right, there you go. So anyway, that was a huge derailment. That was great.
i i've looked through your your emulator a little bit and actually was uh poking at some of your
unit tests because um i was curious about um one of the the other attributes of the 6502
which is documented behavior, it turns out. And that is the zero-page addressing mode,
which I thought was, I don't know if this was common at that time,
but it was an interesting thing.
Yeah.
I think it was, basically, it was the only way for it to have pointers
because, as we've discussed, we had an A register, an X register,
and a Y register, and those were all 8-bit registers,
unlike the Z80 with its paired HL, DE, BC, AF registers.
The 6502 didn't have 16-bit registers,
but you could indirect through a pair of memory locations
in the zero page, as you say.
So the first 256 bytes of RAM were just normal RAM.
It wasn't cached, it wasn't special, it was
still out on the board. But the opcodes that accessed it, first of all, they only
needed one byte if the opcode said hey i'm a zero page opcode then there was only one byte for the
address and the second there were several indirect instructions that would operate through a pair of
zero page addresses and treat it as a 16-bit address to then read
from somewhere else so it's almost like you had 128 16-bit registers available to you which was
really quite a powerful concept. And some of the more exotic architectures these days that
have, like, belt computers or register files that spill, there's a sort of flavor
of that there isn't any way of like offsetting the zero page you could use the x register to
actually offset into the zero page but that was very uncommon you know it was essentially like
you had to very carefully allocate your zero page if you're writing a game and you're like well the
operating system such as it is still writes to this in an nmi routine so i have to like leave
those ones alone. But I can,
if I page the ROMs out from the BASIC and then disable interrupts, then I can use &40 through &4F or whatever it is,
you know,
and somehow you could get some memory in the zero page.
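The indirect mode Matt describes, the 6502's `LDA (zp),Y`, can be sketched like this: a pair of zero-page bytes holds a 16-bit pointer, indexed by Y. The helper name is mine, and the model ignores cycle timing and page-crossing penalties.

```python
# Toy model of the 6502's "(zp),Y" indirect addressing mode.
# A zero-page byte pair acts as one of the "128 16-bit registers" in RAM.

def lda_indirect_y(mem, zp_addr, y):
    """Load a byte via a 16-bit pointer stored in two zero-page bytes."""
    lo = mem[zp_addr & 0xFF]            # pointer low byte, in page zero
    hi = mem[(zp_addr + 1) & 0xFF]      # pointer high byte (wraps within page zero)
    base = (hi << 8) | lo               # the 16-bit address held in RAM
    return mem[(base + y) & 0xFFFF]     # indexed by Y

mem = [0] * 0x10000
mem[0x70], mem[0x71] = 0x00, 0x30      # zero-page pair $70/$71 points at $3000
mem[0x3005] = 0xAB
assert lda_indirect_y(mem, 0x70, 5) == 0xAB
```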
And yeah,
I think it was a really interesting and innovative way.
And again,
it's very simplistic,
right?
And it's not traditionally RISC-y, because it's not, like, load-store.
I mean, like, you know,
there were instructions that would do these
read, modify, writes and things like that.
But it was a really simple set
of very straightforward concepts
that were used to build
all of the rest of the instructions.
And I think that's sort of
what I think of as being RISC-y,
even though that, yeah, as I say,
it's not strictly load-store.
Right. Yeah, it was interesting.
I feel like the, I haven't encountered many other instruction sets with the,
I think there were 13 addressing modes maybe for the 6502.
Sounds about right.
There'd be a file somewhere in the emulator with them all listed.
Right, right, yeah.
They've shrunk down the number of instructions,
but you can execute them all in a variety of ways.
Right.
Yeah. And it wasn't as beautiful as some of the later things that were inspired by it, in terms of the way it's laid out, so you couldn't mix and match them arbitrarily. But you could do
most things in most ways you know that was kind of a nice nice thing but yeah yeah absolutely the um
the kind of mention of having more, essentially more registers available.
For some reason, almost every episode of the podcast thus far, register windows have come up.
Right. That's probably why it's top of mind for me.
Right, right, right. I forgot that you had listened to some of them.
I will say that I can give you
a preview of the next episode that is going to come out which is very relevant to register windows
because my interview is with Robert Garner, who designed the SPARC instruction set. Oh, cool. So,
he goes very deep on register windows. I look forward to that. Folks
listening to this now will be like, well, this is a real window into when things are recorded, right? Right, the
curtain is fully open now. Exactly right. They're getting a window into my need for a backlog to
to keep going here but um well you know that's that's uh quite the experience you had you know
while you were still kind of growing up uh being exposed to all these different things you mentioned um that it was kind of a blessing uh to i i might say like have to be
exposed to computers at that level right because it was a choice right but there was there was right
i was receptive i know right but but it was there you know if you had the need to to make a game
that was how you were going to do it yeah right i mean one of the things that i felt uh kind of like growing up and you know earlier on in my career and that sort of
thing where i was being introduced to computing at uh with machines that were much more complex
and also uh tooling that was much higher level and more capable um is that you know investing
in kind of like learning uh lower level concepts and that sort of thing could be viewed,
I would push back on this notion, but could be viewed as kind of unproductive, right?
Not doing the most productive thing there.
Why would you learn how this stuff works when really it should be hidden from you?
If you're learning to drive a car, you don't need to understand how an ignition coil works, right?
Right.
But it's kind of, it is useful to
know somehow. Absolutely. And apparently, you know, there's other people like us who think
that as well. One of my favorite quotes was from Tom Lyon, who has been on a number of
different podcasts. He was an early Sun employee, and, I always butcher the quote, but it was something like, abstractions are
meant to create boundaries for machines, not people. Or, people are meant to pierce abstraction
layers, even though machines are not. So it's kind of like, yes, we should use abstraction to
enable us to do things uh faster and you know with more certainty but that doesn't mean that we
are resigned to not look.
No, I think that's it. Yeah, abstractions are a tool and we can use them to help, and they can
be used in all sorts of things you know like they can be used in an organization to say well you
know this isn't really how that part of the organization works but what we have to do
is fill in this form and then a bit later on a computer arrives and i don't need to know anything
about how that happened but you know that's how I purchased things or whatever.
Or we can use it as like, well,
I typed this thing into the computer and then I get linear algebra solutions.
And that's great.
But as long as you can keep going down the levels of abstraction,
as long as there's no barriers to you, you know,
I think you should always be aware of
the layer below you and a couple of layers above you,
if such a thing exists, you know. And it doesn't matter how low you are.
There's always at least one layer below you.
As I'm sure you're learning in this journey, too.
You know, you think things you take as read and then you're like, oh, wait, someone had to.
Oh, yeah, that doesn't work the way I thought at all. I just assumed that, like RAM, it just works, right? You're like, no, there's a whole set of things to think about,
how does that work right right absolutely well okay so moving um after you know maybe like
going through high school i imagine um was uh some of that storyline there and then uh you
eventually go to university. What's kind of your, like, most folks when they're going to university,
they're thinking, what do I want to, you know, do and learn about and that sort of thing.
What was kind of your motivation at that time?
Obviously, lots of exposure to computing.
But did you see that as a career path?
No, that was it.
I think it never ever crossed my mind.
That's not true.
I think it probably did cross my mind.
But I had always been interested in physics and science in general.
And I sort of designed a route in my head that was like, I'm going to go to university.
I'm going to get a master's in physics.
And I'm going to do my PhD.
And then I'm going to do quantum physics or astrophysics or something like that.
And this computing thing was just my
almost life-defining hobby right even then and i never really thought about it as anything more
than that. My journey for physics started from, so this is a strange non sequitur story, but
like in the UK in the 80s, I used to wake up really, really early and there would be nothing on the television.
The TV stations would shut down.
There were only four of them, or probably three or two of them at the time even then.
And so overnight it was just a test picture of like, you know,
with the little like nothing here.
But there was one channel where a distance learning university
used to transmit its lectures that you would set your VCR for like 3 a.m or 4 a.m and you
would record an hour-long lecture and i used to wake up and watch this because it was the only
thing that was on and i have these vivid memories of these bearded 70s men dropping marbles into
like bowls, and then, through the lens of extremely primitive camera technology, superimposing
all of the various frames to show the pattern that
the marble was rolling in and then writing out equations on boards about it and i was like again
i think the common theme here is like weird sigils on a screen gets me interested right
and so that was started my interest in physics and then yeah i went to university i studied physics
and I "studied", I'm gonna have to do air quotes
here that your listeners will not see uh because really as soon as i've discovered the internet
such as it was back then and computers where they were more they were bigger and more powerful than
i was used to so by this time i'd graduated on from the bbc master so like i think i was 17 so it's like last or
penultimate year of of uh high school that i got an archimedes which was made by acorn who were the
same company made the bbc micro it was a natural progression from that but they had decided to
jump this 8-bit era all the way to 32-bit and forget this 16-bit era so like all my contemporaries
so I hung on to the BBC three years past its best-before date, right? It was way overdue. Everyone else was already on
their ataris and their amigas and learning about blitter chips and things that were really cool
and interesting but i was like no i can do this on my 8-bit machine it's fine and then eventually
when i gave in i thought well i'm going to go with acorn still and by this point acorn had designed their own 32-bit microprocessor and this microprocessor
was inspired heavily by the 6502 that they'd cut their teeth on the team and knew all about it they
went out to western digital or whoever was the designer at the time of the 6502 and said can you
tell us about how you make a chip and it turns out it's like three people in by this point three people in
like a bungalow in texas going like sure this is how we made it like what you this is so it's
possible for like mortal humans like a small number of them to design a chip and they're like
yeah, of course it is. I think, you know, the original 6502, Bill Mensch and all that kind of
stuff, was, you know, bearded men again, unfortunately, as is the way
in our industry at the moment although we're trying to change that right um right with sharpies
on a big acetate sheet drawing out the 6502 but it was you know the the later versions of it were
done uh similarly and so anyway the folks from acorn came away and said well we can do this too
how hard can it be nobody told them how hard it was to make a chip so they you know they
were like we can do this and they designed this really beautiful 32-bit machine and they'd learned
from the 6502 where it's like this almost nice separation of addressing modes and flag setting
and all this thing, and they thought, well, if I've got 32-bit fixed-size opcodes, I can fit
them in nice places and so it's really kind of a nicely designed system
and they called it the Acorn RISC Machine, because it was very much a load-store architecture
with 15 registers or 16 if you include the program counter and of course we all know i'm doing the
whole long reveal for you here as you're smiling and you know what i'm talking about here as well
almost all of your listeners but this was the arm chip the very first arm chip and so the very first 32-bit machine i ever got my hands on was an arm and just like the acorn before it uh sorry the bbc
before it, straight into assembly, because it was the same BASIC. You could open squiggly braces and
start typing ARM assembly. And it was, you know, it was beautiful, it was so simplistic.
uh it was super fast for the clock speed.
I think mine was like an 8 megahertz or 12 megahertz.
And the multiple load and store instructions that it had,
which was a lovely way of reading and writing multiple registers
from an ascending or descending memory location,
which was perfect for pushing and popping,
going in and out of functions.
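The multiple load/store Matt describes, ARM's LDMIA/STMIA, can be modeled roughly like this: one "instruction" moves a whole register list to or from ascending memory addresses, which is exactly what makes a fast copy (or blit) loop. The Python helpers are illustrative, not a real ARM model.

```python
# Sketch of ARM-style multiple load/store: one operation, many registers.
# Function names mirror the ARM mnemonics but the model is invented here.

def ldmia(mem, addr, count):
    """Load `count` consecutive 32-bit words into a 'register list'."""
    return [mem[addr + 4 * i] for i in range(count)], addr + 4 * count

def stmia(mem, addr, regs):
    """Store the register list back out to ascending addresses."""
    for i, value in enumerate(regs):
        mem[addr + 4 * i] = value
    return addr + 4 * len(regs)

# Copy 8 words per load/store pair: "read from here, put that over here".
mem = {0x1000 + 4 * i: i * i for i in range(8)}
regs, src_ptr = ldmia(mem, 0x1000, 8)
stmia(mem, 0x2000, regs)
assert mem[0x2000] == 0 and mem[0x201C] == 49
```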
But also it was amazing because you could point it at the screen
and blit sprites as fast as you could. So although it didn't have sprite hardware, to write
games you could do pretty well with these with clever use of these multiple load and store
instructions you know read from here put that over here um and so i had learned arm assembly and i'd
thrown everything out the wall and so i was writing everything still in arm assembly so i got to university that's where we were but before we started this
i discovered the internet and the internet was amazing and uh one of the first things i did was
write an internet relay chat client for my acorn because they were still niche even in the uk you
know nobody had them right and so if you wanted to join in irc you either went to the the lab and you used irc
like the command line client in unix or if you had as a client on your your your local machine
and you had like a serial cable to connect to the network then you could you know actually uh
do it from a gui i decided to write my own and because i only knew assembly i wrote the whole
thing in arm assembly and it's i don't know how many thousands and thousands of lines it's on github
if you want to go and laugh at it all but it was well link in the show notes for sure
if people want to torture themselves but it was a fascinating experience of learning so
while i was supposedly doing my physics degree i was writing this irc client um the irc ended up, because all IRC clients at the time had like scripting languages built in them,
so you could like do auto greeters and things like that.
I ended up writing a scripting language in it, which looks remarkably like BBC Basic,
except it was object orientated.
And then I was doing managed memory.
And so I invented this way of cleaning up the memory after you'd finished with it
without having to free it manually, which I later discovered is mark-and-sweep garbage collection, and I'm like, oh, right. And at some point along this
path, it should have dawned on me that I should ask my roommates, who were doing an
actual computer science degree what the heck it was i was really building um but towards the end
of this it became obvious that it was absurd to be writing large GUI applications in pure assembly.
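The memory-cleanup scheme Matt describes reinventing, mark(-and-sweep) collection, works roughly like this: trace everything reachable from the roots, then discard the rest. The object model below is invented for the sketch.

```python
# Rough sketch of mark-and-sweep garbage collection.
# The Obj class and heap layout are made up for illustration.

class Obj:
    def __init__(self, *children):
        self.children = list(children)
        self.marked = False

def mark(obj):
    if obj.marked:
        return
    obj.marked = True
    for child in obj.children:       # follow references recursively
        mark(child)

def collect(roots, heap):
    for obj in heap:                 # reset marks from any previous pass
        obj.marked = False
    for root in roots:               # phase 1: mark everything reachable
        mark(root)
    return [o for o in heap if o.marked]   # phase 2: sweep the rest

a, b = Obj(), Obj()
c = Obj(a)                           # c -> a; b is unreachable garbage
heap = [a, b, c]
assert collect([c], heap) == [a, c]  # b would be freed
```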
And so begrudgingly, and because I wanted to have my programs run on the computers at the university lab, I learned C.
But C back then was the kind of C where the compilers weren't sophisticated enough.
The kind of thing where you could see, pun intended again, what assembly was
going to come out the other side. You know, int x equals zero, oh, I know that's going to be an LDR,
you know, R0, comma, or whatever. MOV, sorry, see, I've forgotten all these opcodes now.
um and i think you know this is a setup for where i ended up with you know seeing the way that the
compiler takes your code and puts it out into uh uh, uh, right into the output. But, uh, yeah, so that was, that was how I learned, um,
C. Um, and where did you get your compiler, uh, from at that time? So for, uh, at, uh,
university, it was GCC or the CC that was on the SPARCstation, the IRIX workstation, or whatever
it was i could
get a hold of around this time as well we inherited between me and my roommates we
inherited a multi-user dungeon source code which was kind of how i learned c really was was having
to hack on it and extend it and add new stuff to it um so that was that was fun um and um yeah so
that would compile on whatever machine we could steal time
on to run our mud and have other people connect to which obviously was not very many people didn't
they didn't like the idea of us running long-lived services so right um yeah you could imagine um
And, oh, I've just lost my train of thought, sorry. What came after? So, you're kind of, you know, learning C,
you're experimenting with uh various machines and uh you know running some of these services and
that sort of thing at that point did you start to think okay maybe maybe i'm spending a lot of time
on this maybe this could be more related to my profession as well i don't know that I did explicitly you know I was I was I I was scraping by my degree I got like a
mid-tier degree a 2-2 in the uh by the end of it all and in the last few weeks I started looking
for a job somewhat half-heartedly and then somebody on IRC in the hash acorn channel
said well you could try applying to my company. We make computer games.
And I went, well, I've always made computer games.
I've got them around.
You know, this MUD is kind of a computer game.
It's a different kind of computer game.
But, you know, I've still got my eye in, as it were.
So I messaged him and he said, gave me the details.
I applied and that was my route into the games industry,
which was basically my career for a decade.
It was based on a random conversation with an internet stranger on an IRC channel using my own handwritten IRC client from a computer that nobody knew about.
Right.
And so did you start working there pretty much immediately and also uh was
this like was there it seems like i'm not super familiar with what the culture was like um at that
time obviously computing you know was a a big part of the university and you know you mentioned the
government kind of like commissioning computers right so it wasn't like this was a uh you know
an unheard of thing but was there any sort of notion of like you you were going to get a phd
right now you're gonna go work on games or was it pretty much not really i mean i mean it probably
took 15 years for my mom to stop asking me when i was going to get a proper job right so you know
from her point of view i never had a proper job but then it was a games job anyway so i mean you
probably could have walked into some mortgage company writing admin systems or whatever and that
would have been seen as like a real good real job but but um no it was so yeah i i got the job
actually it was the the end of the penultimate year i i know that's um yeah i don't know how
common that is over here but like you know there's not kind of an internship but I went for it anyway ahead of time and they said we don't need you to have a physics degree
you should just quit and come and work for us but yeah I thought I better at least finish my
degree and have something to have my name on which actually turned out to be a very good decision
later on when I tried to move to the US and it was very helpful to have a degree in order to help the process there but that's a whole other story right but no so I actually went to so the company
was called Argonaut Games it was one of the biggest independent games companies in the UK
in fact probably in Europe at the time it ultimately floated on the stock exchange so it
was a big enough company to go on to the the uk stock exchange although that was kind of the beginning of the end unfortunately like so many dot com style booms
although a lot earlier than that um the argonaut is probably noted because uh it was the sort of
silent partner in the Super FX chip, which powered Star Fox on the Super Nintendo. So if
people have ever played Star Fox, you know. I came in at the tail end of that,
Star Fox had been out and there were some sort of secondary and even tertiary games that
were using the Super FX chip. But Jez, the CEO, had sort of basically lied to Nintendo,
telling them that he could easily generate you know 3d graphics it can't be that hard kind of
thing again so there's kind of a theme here going right you know like how hard can it be he said and then he sort of came back from
a meeting with japan um there's a long convoluted story but this is an extremely short version and
probably equally inaccurate version, and basically said to people, who knows how to
make ASICs? And, maybe, I don't know. And so they designed this chip, which was essentially a 3D co-processor, well
before its time although insanely convoluted to wedge it into a cartridge as a sort of secondary
on a system which wasn't expecting to have a secondary chip other than like ram and ppu and
maybe some other sort of addressable stuff so it kind of involved a lot of dancing between the CPU that was running instructions
where essentially like read from RAM, read from RAM, read from RAM, write to RAM,
read from RAM, write, you know, to copy the data that was being created by the 3D accelerator.
The main CPU was just like hot passing plates.
And there was some DMA behind the scenes.
I know it was very, very complicated. But they got 3D graphics out of it. And, you know, I was lucky enough to work
with the folks that designed the chip. And Argonaut itself separated into ARC, which became
a chip manufacturer, although subsequently bought out by various folks. They had their own soft-core
CPU, which is kind of interesting. And then the technology group, which is what I was actually working with. So I got to work with
some of the tech folks from there and you know there's some fascinating things that they stories
that they had um but yeah so that was that was actual silicon that was designed and implemented
um and you know around that time as well was like the of the consoles. And so we were starting to see these really strange beasts
that Sony and Sega and Nintendo were putting together.
So I was exposed pretty quickly to these very esoteric, to me,
my lovely, beautiful ARM instruction set notwithstanding,
these strange processors, the Hitachi SH-4 in the Dreamcast, which is probably my
favorite, with its 16-bit fixed-width instructions and its strange addressing modes and things like
this and you're like well yeah this is this is cool um and you know starting to learn um and
have very simple tooling about how multiple issue stuff was going to happen like cpus that could do more
than one thing at a time the arm was pipelined and very beautifully so like everything was done
it's like extremely easy to predict what was going on but um with things like the sh4 they were like
well there were pairs of instructions that you would go together provided there were no
detected hazards between the two instructions and they were of you know sort of appropriate types like you couldn't do two
multipliers at the same time of that kind of thing then they would pair
together and so you would see these you know rather you would write this is
still at that time when the compiler was good but it was still pretty worthwhile
spending the time to write the assembly yourself right and so you would sit there and pair them
together and uh that was that was a really interesting learning experience and i that's
so my github is a mind of like nonsense that i've left behind from from the years i've got before i
thankfully got the permission from jez to to publish the source code to this so you can go
and actually have a laugh at the source code and it's not just mine obviously but the the renderer is mine you can go look at some comments from like 2001 i think
that i was writing where i'm swearing and cursing at various things that don't actually work the way
they are and you can sort of see this strange format that i picked up where i was pairing
instructions in the assembly, and where there were unpairable instructions I would put a NOP,
so that I could show it. But it was not a real NOP. It was a NOP that I could #define in or out.
And so I'd assemble it once with the NOP in place
and then run it, measure how fast it was.
And then I would disable, sorry,
get rid of the NOP completely and then compile it
and then prove that it was the same speed,
give or take the fact that the code
was a tiny bit more compact, right?
It was a little bit more.
And that would prove to me that I'd done it right and i was still pairing the instructions that i thought
i was i was pairing right so that was it yeah this was all this was all um like explicit instruction
level parallelism it wasn't doing the machine itself wasn't doing any of this for it was yeah
the machine would would very simply pick up like four bytes at a time and if you could see the two instructions were like okay based on it's like the the registers didn't overlap and there were instruction types
that were compatible with each other then it could issue them together but yeah it wasn't doing any
out of order it was like just two at a time and around the same time actually the the x86 was in
the same kind of world this was like um intel had the u-pipe and the v-pipe
they were the two issue stations and you know there was everything i never really did much of
this. But around me in the ATG group were the folks who were writing BRender, which was a
so-called blazing renderer. Of course, these terrible names that we come up with.
But BRender was used in a number of games, it was like a middleware, for a number of games including things like Carmageddon, and Croc PC, which was a game
i actually worked on and but the the interesting thing was um that yeah they were still writing all
this stuff in assembly for because it was software rendering pre the you know the beginning of like
um uh graphics cards you know they were they were
around, but a lot of people, we couldn't afford them. Or it was like the 3dfx, which was a
secondary graphics card you would plug in, and then you would have to put a pass-through cable from
your VGA card, that did your 2D graphics, up through into the 3dfx, and then you'd have another
cable that went to your monitor. And you could essentially hear the relay click
as it went into 3D mode and took over, all this kind of nonsense.
But yeah, so there was a lot of concentration on, like, how do we lay out the code so that the U and the V pipes are fed, so that certain instructions could go in the U pipe and certain other instructions could go in the V pipe.
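That kind of dual-issue pairing rule, in the spirit of the SH-4 pairs or the Pentium's U and V pipes, can be modeled very roughly: two adjacent instructions issue together only if their types are compatible and their registers don't conflict. The instruction encoding and rules below are simplified assumptions, not any real CPU's.

```python
# Illustrative pairing check for a simple two-wide in-order machine.
# The dict-based "instruction" format is invented for this sketch.

def can_pair(a, b):
    # e.g. only one multiply per cycle (type compatibility)
    if a["type"] == "mul" and b["type"] == "mul":
        return False
    # b may not read or write anything a writes (no detected hazards)
    if a["dst"] in b["srcs"] or a["dst"] == b["dst"]:
        return False
    return True

add = {"type": "alu", "dst": "r1", "srcs": ["r2", "r3"]}
mul = {"type": "mul", "dst": "r4", "srcs": ["r5", "r6"]}
dep = {"type": "alu", "dst": "r7", "srcs": ["r1"]}   # reads r1, written by add

assert can_pair(add, mul)        # independent, compatible types: pairs
assert not can_pair(add, dep)    # RAW hazard on r1: issues alone
assert not can_pair(mul, mul)    # two multiplies can't go together
```

Hand-scheduling at that time meant reordering your assembly so that as many adjacent pairs as possible pass a check like this.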
And they would be issued together again, similarly, if they didn't have the right, the wrong kind of hazards and then just as i was getting into the pc stuff myself
um the pentium uh pro was out i think it was a pro and um i had one of the early prototypes
the Klamath. It was a huge thing. And for a long time afterwards, actually, after the, so,
spoiler alert, Argonaut folded and a lot of the stuff went home with the employees. And my Klamath,
this strange prototype, went home with me
and was, for the longest time, my dial-up modem, like, gateway machine,
running on my prototype with "Property of Intel, do not distribute" all over it. Like, fine, right?
but anyway um but at this point was the first time that they were starting to do proper out of order execution and so we had them come into us and say hey you know this unv pipe nonsense that you've been doing
forget it um you just can't predict what it's going to do anymore it's so clever it optimizes
for you everything's magic you know use our compiler uh everything will be fine um just
measure it we have this thing called vtune which kind of tells you after the effect what happened.
And, you know, great, I guess.
And, you know, there were obviously things
that you could see that it was doing,
but it was, we started to consider it,
at least I started to consider it really a black box of like,
I just don't know what magic it's doing.
And so around that, you know,
so I spent some time on PC things.
And so just enough to get that kind of exposure around that time.
And then I moved on to Xbox and PS2, which was similarly painful.
That one, certainly for the VU processors,
there was dual issue, so it's sort of VLIW style,
dual issue, with the U and the V pipe very explicit in this long
VLIW thing. And there were no data hazards, you just had to remember, oh yeah, if you do a multiply,
it'll get written back on cycle five you better be ready for it but that meant you could interlace
things yourself. You could go, well, okay. And so I think Carl Graham, who was one of the Super FX
folks, actually, he came up with this rather novel spreadsheet programming method, with macros in the spreadsheet, so that you would type the instructions and things.
And it would highlight with colors where the result of this instruction comes out down here.
And then you could work out that it would actually fit and all this.
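The latency bookkeeping that spreadsheet made visible can be sketched as a schedule checker: an instruction issued at cycle c doesn't produce its result until c plus its latency, and it's on you to make sure nothing consumes it earlier. The latencies and instruction format below are made up for illustration.

```python
# Sketch of latency-aware scheduling on a VU-style pipeline with no interlocks.
# Latencies here are invented; the point is the bookkeeping, not the numbers.

LATENCY = {"mul": 4, "add": 1}

def check_schedule(program):
    """program: list of (cycle, op, dst, srcs). Returns first hazard or None."""
    ready = {}                                  # register -> cycle its value lands
    for cycle, op, dst, srcs in program:
        for src in srcs:
            if ready.get(src, 0) > cycle:       # used before the result is back
                return (cycle, src)
        ready[dst] = cycle + LATENCY[op]
    return None

bad = [(0, "mul", "acc", ["x", "y"]), (1, "add", "out", ["acc", "z"])]
ok  = [(0, "mul", "acc", ["x", "y"]), (4, "add", "out", ["acc", "z"])]
assert check_schedule(bad) == (1, "acc")        # consumed too early
assert check_schedule(ok) is None               # interleaved far enough apart
```

Interlacing other useful work into the gap, rather than waiting, is what made hand scheduling worth the pain.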
And it was very painful to do, but it was actually necessary. At that time, there wasn't even a C compiler that could target this stuff
because it was so Byzantine, you know, it's so weird
and so very special case for geometry processing.
You know, we call it vertex and pixel shaders,
well, vertex and geometry shaders, I guess, these days.
Later on, they had like a very smart assembler
that let you write the assembly without thinking about the hazards
and it did the interleaving and the VLIW-ing, so it got a little bit better. But that was my sort of
my first um real foray into oh my gosh there's a lot of things that the cpu could do for us that
I've been spoiled into having them done for me, right, and this one doesn't do it. So, I'm not knowledgeable about making games at all,
which I feel is kind of an uncommon thing, but I just have not had any interest in games themselves — I'm very interested in the hardware that goes along with it. So I'm curious: you know, when you're writing software that's going to run in a data center, you don't really think about the underlying hardware that much. And I believe now game engines and things like that allow you to abstract across multiple platforms — well, we might get into when you do think about the hardware sometimes. Were y'all writing games at that time that were targeted to just a single platform? And was it a lot of work to move from one platform to another, or deliver on multiple platforms?
So, yes and no. I think all of the best games of that era were single-platform, and they really played to the strengths of their individual platform. There was enough to discriminate between the platforms. You know, like the PlayStation had an insane fill rate — it could write pixels to the screen so quickly — but it could hardly do anything else: it had hardly any blending modes, though there were some tricks for doing some of the things you would otherwise like to do. Whereas the Xbox's fill rate was not so high, but it had higher vertex throughput and was easier to work with.
But you know, you would trade off the different approaches. But yeah, the games that I worked on were actually multi-platform, though we didn't really have a generalized engine. The engine that me and my friend Nick Hemmings wrote became the de facto engine for two platforms and a few games around the time — it powered SWAT: Global Strike Team, which was one of the SWAT franchise games, for Xbox and PlayStation 2. PlayStation 2 came along late, because at the time
we were Xbox exclusive. And so we kind of went to town: I wrote a shader language, and I wrote a shader compiler that compiled from my little DSL down to a vertex shader program, which could, you know, calculate all the UVs and everything. I'd been enamored by Toy Story, and I'd been reading up about how Pixar did things, and I'd heard about these shader things and was very excited, so I did all this stuff. And I mean, that was fascinating — the way the systems were working under the hood, and how they managed to get the power that they got out of what was a relatively early NVIDIA part. And interestingly, they told us, you know, "We can't tell you how it works, because we have agreements with NVIDIA. It's DirectX as far as you're concerned." And then they would cough politely and say, "But if you look in the header file, maybe you'll learn a thing or two," and then walk away. And you open up the header
file and all this. So DirectX is COM — I don't know if you've ever heard of COM or know about COM — (I have, yeah) — it's this really janky business thing where you query an object for what interfaces it supports, and then you say "get me that interface" and it returns you one. Essentially it's all C++ virtual tables behind the scenes, or C function pointer arrays, or whatever. But you look through and you see that it's actually just a bunch of macros that they defined in a header file to make it look just enough like COM for you to be able to write COM — and then, very clearly, you were being handed back structures that were obviously the actual things being sent to the hardware. Like, thank heaven for that, you know — we're able to talk to the hardware ourselves. Because again, the earlier machines — the PlayStation, the PlayStation 2, the Dreamcast — they essentially just send you the hardware manuals. You know, poorly translated hardware manuals: this register does this thing, good luck, off you go; it's mapped at this memory location, have fun, bye. So you were very much exposed, whereas Microsoft couldn't expose us at that level, because (a) they had an API they wanted, marketing-wise, to say "hey, it uses DirectX" about, and (b) they couldn't breach their contract with NVIDIA. But we got to learn how the NVIDIA chip was working. We understood the various tricks that it was doing: how it was stamping
down multiple pixels at once, and how it was discarding things based on some clever tricks behind the scenes. It was a fun experience to learn that, you know, CPUs don't have to look like "fetch an instruction, run the instruction, get on with the next instruction." It could be like: no, fetch 80 copies of the data and run them on little threads that are running the same bit of code but different data — not in a SIMD-y way, but parallelized across in another way. It's really interesting: how do you hide the latency? Well, you just do another one. Just keep doing more of the same one; you're doing the fetch in the first cycle for loads of them. Oh, that's really clever — I'd never thought of that. So that was an eye-opener. And — there was a reason we were going this way, and I can't remember what it is.
No, just targeting multiple different platforms.
Oh, that's right — it's a different platform. So yeah, we painted ourselves into a corner by putting all these whiz-bang features into the Xbox version and then saying, well, the Xbox isn't doing as well as we'd like, how about we port it to the PlayStation 2? And that was a very painful operation. That's where we grafted someone else's core rendering library onto the bottom of our Xbox 3D engine and kind of pounded it until it worked, and found a number of ridiculous ways of getting the full-screen effects that we had on the Xbox — using the Xbox's beautiful blending modes — to work on a PlayStation 2. They were all variants on the theme of: you've got a 24-bit frame buffer in memory, but you lie and say, no, it's an 8-bit frame buffer, by setting the flag that says it's an 8-bit frame buffer. Well, it's actually planar, and the way the RAM chips on the graphics unit work maps each plane in a particular way, which means that the red pixels are like a 16-by-2 array if you're viewing this 32-bit buffer through an 8-bit lens. And so you can draw a little set of triangles that just picks out those, and then you can use it as a multiply — because it's got an 8-bit multiply, you can do an 8-bit multiply, so you can do the red multiply. If you do this, though, that's only 16 pixels; now you have to move 16 across and grab the next batch of red, and then the next one — it was zigzagging and all this stuff. So you'd end up sending hundreds of thousands of triangles to the system to pluck out the red, the green, the blue independently, to then map a full-screen red triangle, a full-screen green triangle, a blue triangle — to essentially get a 24-bit multiply: red with red, green with green, blue with blue. You're like, why does it have to be so difficult? But right, it makes you appreciate the trade-offs that you make in this design space. My understanding was that the blending modes that were available —
so the blending modes are like, am I replacing the pixel that I'm writing to?
Am I adding to it? Am I subtracting from it? Or am I multiplying with it?
And this gives you different transparency or opacity or other special effects.
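Those four modes are easy to pin down concretely. Here is a minimal sketch of per-channel blending — hypothetical code, not any console's actual blend unit; the mode names and the clamping to the usual 0–255 channel range are the only assumptions:

```python
def blend(dst, src, mode):
    """Combine one 8-bit color channel of a framebuffer pixel (dst)
    with an incoming pixel (src) under the given blending mode."""
    if mode == "replace":
        out = src
    elif mode == "add":          # e.g. glows and light effects
        out = dst + src
    elif mode == "subtract":     # e.g. darkening, shadows
        out = dst - src
    elif mode == "multiply":     # e.g. tinting; treat src as a 0..1 factor
        out = dst * src // 255
    else:
        raise ValueError(mode)
    return max(0, min(255, out))  # clamp to the 8-bit channel range

print(blend(100, 200, "add"))       # 255 (saturates at the top)
print(blend(100, 200, "multiply"))  # 78
```

Each extra mode is another set of adders, subtractors, or multipliers per pixel — which is the silicon cost Matt gets to next.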
But you've got to have a lot of adders and subtractors and multipliers to be able to do that, and I believe the way the PlayStation worked is they pushed the circuitry out to be in and amongst the RAM of the frame buffer, so that the last-stage blending happened with the packet from the GPU going, "Hey, I just want you to do this operation to the RAM." I don't have to read it, modify it, and write it back again — I just send it to you and you do it in place. And that's a really cool trick, but it means it's really limiting, because you can't have all these other blending modes without blowing up and blowing up the amount of silicon you need. At least that's my understanding — again, this is through a lens of, you know, twenty years of hardly remembered things. But yeah, so it was a challenge to do cross-platform development. The machines were significantly different from each other.
I think you said earlier that engines are sort of commoditized these days.
And I have a friend who still develops for Unreal Engine, and another friend who does consultancy work in the games industry.
And I said to them,
oh, you must do all this stuff still.
And he goes, no, not anymore.
You know, there's five people at Epic
that do that kind of stuff.
And then everyone else just uses the engine.
And actually it was a very sad thing
that he said to me.
He said like 90% of the work
that we do in games these days is UI work.
I'm like, what?
He's like, every game is just another 3D game with whatever textures and animations and stuff, which is all solved problems, right — and AI this, and whatever moving that. And he said, but every game is its own unique bespoke shop for you to buy all of the merch; that's how they really make their money. And so you're writing, like, web pages in 3D — drawings and clicks and rebates and stuff. It's been very sad how the industry has changed.
Right. I believe after Argonaut shut down, you started your own company for a period of time. I was curious, first, why you decided to do that and what that experience was like — and two, whether maybe some of those changes you saw in the gaming industry started to lead you away from working in that industry.
Yeah, definitely. So the games industry was — and probably still is — very crunch-heavy. You know, I was fine in my 20s, when I didn't really have anything else to do and my entire life was doing this kind of stuff: I was happy to stay till very late at night, then go get last orders in the pub, then crash and come back the next day to do it all over again. So yeah, as you said, Argonaut ended up folding; it went under. And around towards
the end of this, my friend Nick and I — the guy I'd written the engine with — had been enamored of trying to make build times lower. C++ is notably very slow to build — not as bad as, for those folks listening who are screaming, chip synthesis; yeah, okay, we're not in the same league as that — but it's still frustratingly slow. And there are ways and means of laying out your code in a different structure: unlike most other programming languages, where there's only one way to do something, in C and C++ you've got a choice about "do I put this in the header file, do I make it a template or not," which makes actual structural build-time differences. And so we had this great idea — well, we thought it was a great idea — for how to change the way people program. We were kicking this idea around at the end of Argonaut, and when Argonaut went down, we looked each other in the eye and went, should we give this a go?
And my then girlfriend, now wife,
had just moved in with me.
And so I was like, well, I guess you could help
pay the mortgage while I try out my idea.
So we formed a company called Profactor, and we had this idea for storing code in a different way, so that it was easy to render the code out in a way that was very friendly to the compiler, without the human having to remember, "oh, if I pre-declare this rather than not pre-declare it" — and, you know, all the rules and things that you can do to make your code faster to compile, say, or more incremental to build. And then you can render it out in a different way and say, "hey, the compiler can see everything now" — this is a so-called unity build; it'll take forever, but you'll get a really good build out of it. You know, nowadays compilers are able to do this kind of stuff without you doing so many of those tricks, but they're still sort of relevant. Anyway, we thought it was a great idea, we built a whole bunch of technology, and it didn't work out. We ended up making ends meet by doing consultancy in the only thing we knew how to do, which is video games. So I got to do a tour of duty at a few places, including Rockstar, which was cool — to work with those folks and see some of the code. Yeah, some of the code you're like: wow, you make a lot of money out of this code; I'm very glad you do, because I wouldn't want to work on it myself — it's really complicated-looking and full of bugs and, oh gosh.
But those were fun times and we really enjoyed them.
But yeah, like anything, you get a window into another person's world.
You know, I'd been at essentially a monoculture at Argonaut.
It was a big, big company for the time.
But, you know, it was only one viewpoint about how to do things.
There were teams that differed, but going into a whole other company and going, oh, gosh, you developed very differently was eye-opening.
Right.
And how long did you all run Profactor for?
I'd have to – it's a few years, three or four years, I think.
Yeah, something like that.
This is where I would bring up my LinkedIn and go and look. I've got such a bad memory — I don't have to remember anything anymore, the internet holds it for me.
Right, right.
So it was a few years. And you know, we were doing fairly well. We had two products out that were actually under our own name — essentially small pieces of this big project we were doing. One was a C++ code formatter, which sounds very — you know, "a few regular expressions, surely, is all you need" — but secretly it was our way of actually parsing the entirety of C++ into an intermediate representation that we could then re-render out. We would re-render it for the compiler, or for various different optimization things — except here we could re-render it out and just change the whitespace; that was an easy thing to do. So that was our way of getting in a marketable product — a plug-in for Visual Studio — and some folks bought it. We did all right, but not enough to keep the lights on, really. And then some other stuff came out around following include paths and things — again, thematically correct for our mission, but
not the actual thing we wanted to get out. And then towards the end of that, I had a friend at Google who kept going to the pub with me and saying, "I really wish I could tell you what I was doing, but I can't, because I am not allowed to tell you." And after a few years of this, you know, the interest gets piqued, and you're like, well, all right, maybe. And so I applied to Google. Probably the hardest conversation I ever had was telling Nick — our little partnership — "I'm going to go work for Google, I'm really sorry." Luckily, he still talks to me. He works for DeepMind now, actually; he's doing some really cool things that he can't talk to me about, so he kind of gets me now, yeah. And I went off to Google and immediately got handed a Nokia phone — an early Nokia phone — and told, "YouTube needs to work on this. Can you make YouTube work on this?" And this was before people had data plans; hardly any phones even had Wi-Fi on them. So we're like, who is the target market for this, this 320-by-200-pixel screen? And who even uses YouTube? It wasn't that huge of a deal, at least in my life, back then. And so my life was optimizing and
trying to get essentially game-level trickery — amongst other things, a lot of other things as well — to get the video to decode reasonably well, which mostly meant liaising with the hardware, because these things would have hardware MPEG decoders in them and that kind of stuff. But more notably, it would be more like going out to San Bruno, where the head office of YouTube was, and groveling to the people who ran the service to say: yeah, this is one phone, and it's not powerful enough to do full software decoding, and its hardware MPEG decoder is broken and switches red and green around — can you transcode all the videos into a new format where red and green are mixed up, just so that this stupid phone can play them? Because we wouldn't have the CPU time to switch them back.
Right, right.
And then, sort of eye-rollingly, "all right, fine" — so they actually ended up doing that. There were a couple of things like that, a couple of workarounds — I don't know that they still do them — and they would do it on demand, or when a video had triggered so many views. It was a fascinating experience to see how that stuff was done behind the scenes. I'm sure it's vastly different now, 15 years on, but back then it was like, hey, you could actually log in, and I could sort of ls the directory that the videos were in and look at them, and — wow, they're just files. It blows your mind. It's like, well, of course they're just files, but what were you expecting? But still. So yeah, I spent a couple of years doing various cell-phone-based YouTube things. So
if you used a non-iPhone, non-Android version of YouTube back in the day — we had, like, J2EE, or J2ME, sorry, which was the Java thing that ran on phones — then yeah, you probably used a bit of my code. And then we did latterly pick up on the Android stuff that was developed over in Mountain View, but I was in London still at this point in time. Google was a fantastic company — probably still is; I don't know, I don't want to make too many comments about that kind of thing.
Right. But that's a pretty different environment from your first, though — Argonaut seems like it was a relatively large company, but not at Google scale.
Not Google scale, no — a couple hundred people at its peak, right? You know, I still knew pretty much every single person in the organization, especially having been there eight years. But Google was, you know — hey, there's 20,000 people; even on the floor that you're on, there are more people than you'll ever be able to recognize. You're like, wow. It was mind-blowing.
Was that also kind of an informative experience for what you wanted to do later? Like, did you have the experience of, oh, this is kind of big and I think I prefer something a little smaller? Or was it, you know, this has its own trade-offs and there are pros and cons of each?
I don't think I had that
level of introspection going in. I think latterly, when I rationalized my decision to leave, I think
that some of those things folded into it. But certainly to start with, it was just amazing.
It was, you know — you felt like you'd been given the keys to the chocolate factory. Internally, Google is so open. Back then there wasn't really much information out there about how anything was done, and there weren't so many of the white papers out about how their internal stuff worked. And so to be let loose in there, free to watch all these videos and learn how queries were handled, learn how they were doing locking at scale, learn how they were doing some of their fleet-wide profiling — and the fact that there may be a person somewhere who's shaving one cycle off of a memcpy, and knowing that that's worthwhile. I think one of your earlier guests mentioned that that's something you can only do when you have cloud scale, and Google were early in that. And it's like, wow, how amazing is that, to be doing that kind of work? You know, it's bonkers. But yeah, it was great.
But then, yeah, it was kind of a comedown retrospectively to realize that I couldn't move the needle.
I couldn't move the needle at all.
I mean, I was still relatively junior in my worldviews about how things were, even 15 years ago, because I'd lived in this sort of very cloistered world of games. And then I was like, oh, I don't even know how software is really made in professional big companies. But it's all the same — anyone who's listening to this, it's all the same. But yeah, so I realized that being at a satellite office — being in London, which was a big office, but mostly marketing people, salespeople, and a reasonably large division of programmers who were all
essentially there to cater for the European phones — so it was very mobile-centric, and we were seen as a sort of strange backwater in some ways. It was hard, even within that, to make a difference, and it's certainly harder to make a difference within the company as a whole. And you know, the two or three times a year that I would pop over to Mountain View or San Bruno or Montreal or wherever, you could feel that you were making more of a difference just having a few conversations than you were beavering away sending changelists to people. So yeah, I think again that was a post-hoc rationalization when I decided to leave. And I ended up in finance. I had a friend who had
left Google about a year before. The pair of us had worked on like an open source meetup that
Google sponsored. And so we'd bring in people and we were chatting. And so that's how I knew him. I
didn't work with him directly in Google, although we both worked for Google. We were both organizer-type people who were happy to stand up in front of a bunch of people and talk, so we would do that: we would get people in from the London open source community and we'd have presentations and laughs and drinks and all that kind of good stuff. And then he left, and I didn't really know where he went, but I sort of took over Open Source Jam with some other people — I don't know if anyone who listens to this now is going to be like, "No, I did it!" Sorry, yes. And I didn't really think too much of it until about a year later — presumably around the end of a non-solicit clause in a contract — when he, out of nowhere, reached out to me and said, "Hey Matt, you should come and have lunch with me." I'm like, what are you doing? You went to, like, finance? I don't know about that. And he said, no, just trust me, come for lunch. So I went and met him for lunch, and I went around this office, and I was like, wow, you're solving really interesting performance problems. This is not what I was expecting finance to be at all. I was expecting, you know, huge database-query-type things and all that nonsense — but no, there are people there solving difficult computer-science-y problems. And maybe I am interested in this after all.
And I went for an interview and they said, sounds great, but not in London.
Come to Chicago.
So I did.
And this is where I still am now, 13 years on.
It turns out that that very thing you were saying earlier — why on earth do we need to know how computers work these days, when there are these huge data centers full of machines doing whatever — is true for 99.9% of the world. But for the remaining 0.1% of the world that is the finance industry — or the hyperscalers doing their web serving, probably, as well, in fairness — we care about that stuff. And so suddenly I'd been thrust back into the same joyous position that I started in when I was 10, 15 years old: learning assembly to get more sprites on the screen and coming up with crazy ways of jiggering things around to get another cycle's worth in my loop, for performance reasons.
Except that now, instead of cycle counting
on a two megahertz machine,
I've got the fastest CPU that we can throw money at,
cooled as much as we can possibly cool it
with all of the trimmings turned on and hyperthreading turned off.
Why would we want hyperthreading?
That steals away from the cores that we carefully have crafted to do that thing.
You know, we carefully manage our thermal stuff, you know, like pin to these cores,
isolate them in the operating system.
Don't run anything on those other cores because if you do, it heats it up and then we lose
power from the other one.
You know, that kind of nonsense.
And you're like, wow, this is fun again.
Now we're right back to where we care about what's really happening under the hood.
And obviously that's, even in our world,
that kind of excitement that I'm demonstrating
represents 0.1% of the job, right?
Everything else is just like everyone else's stuff
of like, well, we still have to write the tests.
We still have to write the code
and someone has to write the build system
and we have to kind of deploy it
and we have to make sure that it's right
and all that good stuff.
But yeah, every now and then you're like, okay, how are we going to make this go fast? And knowing how the hardware works at a deep level — even though most of the time you're floating seven or eight abstraction layers above it — is still fun and exciting. And that's when I started looking into microarchitecture. So I picked up the thread that I'd dropped when the Intel engineers had told us to just use VTune, and I was like, no, no, there must be a way of understanding this; it's tractable, surely; surely somebody has worked this out. And by this point people had started seriously reverse-engineering how Intel processors work. And that was an eye-opener for me — the fact that they then published how they did it, and you could learn tricks and techniques for taking the chip inside your computer, running experiments, and going, "well, this must be what this thing is, then." Wow, I'd never really thought of that. And so that was a huge moment in my life, of going: wow, we can understand this, we can rationalize it, we can even measure it — some of the time with Intel's own tools, which they don't really specify very well, for obvious reasons. But yeah, exciting stuff.
And so what are some of those resources? I want to talk about the finance
exciting stuff and so what what are some of those like resources i want to talk about the the finance
uh world because i think that's uh particularly uh opaque especially to folks on the outside
um which there's there's, there's probably,
that's probably going to impact maybe some of the things we can talk about, but to some extent,
yeah. But I mean, yeah, yeah. I've had, I've had some exposure. Um, I went to university in St.
Louis and, um, and so we would go up to Chicago to the high frequency trading firms and they'd
have like these competitions where you, it was basically like algorithmic trading competitions and they do a simulation um so i got a little
bit of exposure but i am interested to dive into that but i would be remiss if i didn't dig in on
you mentioned some of those resources um that you've been able to use to kind of do some of
that reverse engineering and experimentation uh what what are some of those well so the first one is the sort of the bible
by Agner Fog, who is a sort of very interesting person from some Nordic country. I think he's a professor of something unusual — it's not actually computer science or anything like that, it's something else — but he's got a passionate interest in reverse engineering, and he's written these PDFs that fully take apart the pipelines of all the major revisions of the Intel line of chips, you know, starting from the earliest Pentiums all the way through to modern-day Core-type processors. And he explains everything that he's been able to work out in a very accessible way. It's one of those things where — I don't know if you have anything like this in your life — once a year I reread it anyway, and even though I think I know it, there's stuff that I've missed. There's two or three books that fall into this category. I've got Bjarne Stroustrup's A Tour of C++, which is a small book, but every time I read it I go, "oh, I don't think I knew you could do that" — you know, it's a huge language, right. Another one is Agner Fog's performance manuals. And I think the third one — if you go to his website, which is a delightfully 1990s-era white website with the most disgusting background color and animated GIFs and things across the top, you really, honestly, feel like you've fallen into a MySpace from the 90s or early 2000s, and, you know, it's a choice, right; that tells you who he is — so I'll read that. And then there's also Charles Petzold's The Annotated Turing, which is a fantastic book for learning where this whole thing started and how it came out of one person — well, obviously lots of people have contributed over the years, but there's such a defining story of how computers came to be, in a very abstract way. You know, that's about as abstract as you can possibly get, with an actual Turing machine and its infinite piece of tape, and you're like, well, that's very different, actually.
Right, right.
But yeah. So, the resources he gives you —
first of all, you know, he's done the research and he's got the receipts, and he can show you the receipts. But he's also written very accessible prose around how all these things fit together: what the various stages are, how long things take in general, what the various execution ports are on the x86, how many there are, what types of instructions go to which ports, how retirement happens, how the register file is accessed. And a lot of this stuff comes about because Intel want to be able to tell you where the bottleneck is in your code. They won't tell you exactly what's going on, but there's probably a counter somewhere with a name in the manual which just says "reg file stall" or something like that, with, like, the number of register file stalls — and that's all it'll say. And then you can go, well, let's write an experiment: how many instructions can I queue up to access different registers that haven't been renamed — which is another thing — so I'm going to put thousands of NOPs beforehand so everything's out of the rename buffer. Okay, let's try these things and go: oh, I can do four, I can do five, oh, six — okay, this counter started going up. That kind of feel, right?
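That style of experiment — sweep a parameter, watch a counter, and infer a hidden capacity from where the counter starts climbing — is easy to sketch. Here the "hardware" is fake: a simulated structure with a secret capacity stands in for the real chip and its performance counters, so only the inference method is illustrated.

```python
SECRET_CAPACITY = 6  # stands in for an undocumented hardware limit

def run_experiment(n):
    """Pretend to issue n independent operations and read back a
    'stall' counter: zero until the hidden structure overflows."""
    return max(0, n - SECRET_CAPACITY)

def infer_capacity(max_n=32):
    # Sweep n upwards; the capacity is the largest n with no stalls.
    for n in range(1, max_n + 1):
        if run_experiment(n) > 0:
            return n - 1
    return max_n

print(infer_capacity())  # recovers 6 without ever reading the docs
```

On real hardware, `run_experiment` would be a carefully padded assembly loop read back through a performance counter; the sweep-and-look-for-the-knee logic is the same.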
And so he has an open source project as well,
which you can go and fiddle with
and you can use to set up
and tweak little experimental pieces of code.
And so that's one of the main resources.
And yeah, again, that's something
you can reread over and over again
and always learn something new.
Similarly — I don't know the folks behind it — there's uops.info, a website that has essentially an XML or JSON or YAML or whatever description of every single instruction there ever is or was, for every single architecture they could possibly run the code on. And then you get: well, this is how many cycles of delay it is, this is the reciprocal throughput, these are which ports we observed it going through — so this goes through ports 0, 1, and 2, but not 3 or 4. Those kinds of things.
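Port data like that feeds directly into a simple bound on loop throughput: spread each instruction's micro-ops over the ports that can take them, and the busiest port sets a floor on cycles per iteration. This is a deliberately naive sketch of that reasoning — uops split evenly over their eligible ports, one uop per instruction — not uops.info's actual model, and the instruction mix and port assignments are invented.

```python
from collections import defaultdict

def port_bound(instructions):
    """instructions: list of (name, eligible_ports). Assuming one uop per
    instruction, spread evenly over its ports, the busiest port gives a
    lower bound on cycles per loop iteration."""
    load = defaultdict(float)
    for _name, ports in instructions:
        for p in ports:          # 1 uop split evenly across eligible ports
            load[p] += 1.0 / len(ports)
    return max(load.values())

# Hypothetical loop body: two adds (ports 0, 1, 5), one multiply (port 1
# only), one load (ports 2, 3).
loop = [("add", (0, 1, 5)), ("add", (0, 1, 5)), ("mul", (1,)), ("load", (2, 3))]
print(port_bound(loop))  # ~1.67: under this naive split, port 1 is the bottleneck
```

A real scheduler would steer the adds away from the multiply's port, so published tools model port pressure much more carefully — but even this crude version shows why "which ports" matters as much as raw latency.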
And then they have some of their own code as well — which at some point I will integrate into a website to make it available to all — that does a very good job as a Python-based simulator of all of this stuff. And there's a paper out somewhere that describes the process they went through to get to an almost one-to-one mapping with the real hardware, which I'd thought was totally impossible. You know, here I am sweating over getting a 1980s-era computer, which is very, very simple, to be perfectly in sync with reality, and then they're like, no, we can write a Python program that can simulate these tens-of-billions-of-transistors monstrosities that we build these days. So those are some of the resources. And yeah, my own tiny, tiny, tiny contribution to this was trying to reverse-engineer how the branch predictor worked
under some circumstances.
One of those things where I thought,
I'd read this thing on the forum over and over again.
Oh yeah, the branch prediction, blah, blah, blah.
It always assumes that branches backwards are true
because they're presumably a loop
and branches forwards are false.
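The "backwards taken, forwards not taken" heuristic from the forums reduces to a one-line function, sketched here; the addresses in the example are invented, and as Matt points out next, a real front end can't even apply this until the branch has been decoded.

```python
# The BTFNT static heuristic as a toy function: with no history for a
# branch, predict taken if it jumps to a lower address (presumably a loop
# back edge), not taken if it jumps forward.

def static_predict(branch_pc: int, target_pc: int) -> bool:
    """Return True if the branch is predicted taken (BTFNT heuristic)."""
    return target_pc < branch_pc  # backward branch -> assume loop -> taken

# A loop's back edge versus a forward skip over error handling:
print(static_predict(0x400810, 0x400800))  # backward: predict taken -> True
print(static_predict(0x400810, 0x400890))  # forward: predict not taken -> False
```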
And I've got it in my head like,
well, the thing is,
it doesn't even know that there's a branch there
until it's decoded the branch which is actually five or six pipeline stages from the fetch and
so it's already too late at that point so there's all these various different well you know if
there's a branch here um and if it's a conditional branch you've already done all this work maybe you
should just let it fall through right also how do you know if you've seen this conditional branch
before or not? Because most
branch prediction algorithms these days use some kind of hashing function that kind of
hashes the branch, the pattern, the phase of the moon, yesterday's lottery results,
comes up with a number, and then it looks in the table there, and it doesn't know whether this is
really for this branch or not. It doesn't store tag bits, because it's like, well,
if it isn't, what am I going to do? i might as well come up with a guess right um and then so you know you think well if
it doesn't know the branch has been in the table before or not that it's actually for this branch
then how can it predict forward or backwards because either it's too late because it's already
run through the pipeline they might as well carry on or it's got a prediction and the prediction
it doesn't know if it's for this branch anyway. So I wrote a whole bunch of stuff about this, and I had access to, like, a really weird server machine I had in my basement, still in my basement in fact, still my main server. And so I ran all these experiments, and it found some really interesting patterns in the way the branch target buffer works, which is, I think, a thing that one doesn't think about with branch prediction. Certainly when I talk to folks, like in an interview setting, we talk about branch prediction and it's always, is
the branch taken or not right that's what most people think of but like it's like is there a
branch there at all is the question you need to ask before you even start fetching because like
I said, it'd be five cycles on, you've finally decoded the word, and you've got, oh, there's a
branch here you're like well too late the train's already gone down that route ahead of you right so you have to kind of predict if
there's a branch there at all and then where the heck it's going to because decoding the destination
is half of the trouble. And obviously a lot of branches are not conditional, they are jumps or they're calls or they're rets or whatever. And so trying to make that prediction happen early is what the branch target buffer is doing.
and then secondly if and only if it's conditional is it taken or not right but we always think about
the conditional or not conditional thing so um anyway i was doing this whole bunch of analysis
on the branch target buffer. And then my one-time micro claim to fame is, when the paper for Meltdown and Spectre came out, I got a little footnote as a citation, saying, like, these are some of the ways that you can predict where the branches are going to go, or not. And
i was like wow this is my first like proper security paper thing which cites me i mean
it's literally like the bottom of the list of
things but you know it was cool i yeah that's awesome i think that's very cool yeah um it kind
of circling back around to um you know getting into the the finance industry and some of these
performance qualities maybe like i don't i don't think you know i have enough context to even ask the
appropriate questions so maybe even start from the uh like immediate differences in terms of
the infrastructure uh and compute that you're using and how you'll manage that and how that's
set up as maybe in contrast with you know at one extreme maybe you're using like a public cloud
provider but even for folks that are uh using you
know are hosting their own racks and that sort of thing uh where does kind of finance start to
diverge at that highest level so you know obviously we have a ton of normal needs and requirements and
they have their sort of so we have our own internal clouds and things to run like big batch jobs and
there's a lot of, like, you know, data gets shuttled around and it's not latency-sensitive or even particularly performance-sensitive. But finance, or certainly trading, is a huge, huge, huge, wide, diverse pursuit. And, you know, like, in my current company, we have some things where we're trying to predict the future months in advance. And then, you know,
it doesn't really matter how quickly you predict something that's happening in three months time,
because you've still got three months to take advantage of it, right? You know,
so it took 10 seconds. Sure, that's fine. I've written the whole thing in Python, it takes,
you know, 10 seconds to run. And that's absolutely fine. No one's no one's going to bat an eyelid at
that. Obviously, if you're making a prediction that's five minutes in the future, now, if it took you 30 seconds to make that prediction, that's eroded into your prediction. It's now like your prediction is already 30 seconds old by the time you've made it. You're like, okay, I can see that's problematic. So, um, you know, and we might want to make predictions at all these different horizons. You know, canonically, you know, like, real estate folks will buy up large swathes of land and hold that for years and hope that it goes up in value, and that's a perfectly valid thing to do.
On the far other extreme, you've got low latency traders who are more colloquially known as high frequency traders,
which is sort of less true, because you could trade just once a day.
And if it's the right trade, you can make a lot of money if you're very low latency.
But, you know, trading a lot isn't always a good thing although there are strategies that do do that
but at that point you are peering down a microscope at every single packet coming in and
out of your network. So the way that most financial institutions, like exchanges, the places you can buy and sell shares or options or futures or whatever, work is that you have usually a TCP connection to
the server so like a regular a bit like a you know web server style thing but it's a persistent
connection with a relatively simple protocol to say i'd like to send an order and then it would
say congratulations you've now you're now the proud owner of 100 shares of google you know
thank you very much it cost you this much whatever that kind of thing right so that's on the one hand
now the public exchanges that are uh so-called lit and not in the
youth term of like awesome and cool lit but like um not dark if you've heard of dark pools and dark
exchanges that kind of thing that means that they actually advertise and publish the information
about what's going on inside their exchange in real time so every time i place an order
it's a bit like going on ebay and registering that you
would like to buy something which is not actually what you do on ebay i guess you register you want
to sell something and you put a price right and then maybe you've got a buy it now price
and that means that I can then look at it after the fact, after you've placed it, and go, oh, I will
buy that actually and you click the buy button and you get it right so but there are sort of two
stages to that one stage is you register that i would like to buy it or sell it at a particular price.
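That registering-and-matching flow could be sketched as a toy limit order book: an incoming order either crosses a resting order on the other side and trades, or it rests on the book. Everything else (price-time priority, partial fills, the market-data feed itself) is elided, and the prices and quantities are invented.

```python
# A toy order book: an order either matches the best resting order on the
# opposite side, or rests on the book where everyone can see it.
import heapq

class Book:
    def __init__(self):
        self.bids = []  # max-heap via negated prices: (-price, qty)
        self.asks = []  # min-heap: (price, qty)

    def add(self, side, price, qty):
        if side == "buy":
            if self.asks and self.asks[0][0] <= price:
                ask_price, _ = heapq.heappop(self.asks)
                return ("trade", ask_price, qty)   # crossed: a trade happens
            heapq.heappush(self.bids, (-price, qty))
        else:
            if self.bids and -self.bids[0][0] >= price:
                neg_price, _ = heapq.heappop(self.bids)
                return ("trade", -neg_price, qty)  # crossed the best bid
            heapq.heappush(self.asks, (price, qty))
        return ("rest", price, qty)                # no match: rests on book

book = Book()
print(book.add("sell", 101.0, 100))   # rests: nobody buying yet
print(book.add("buy", 101.0, 100))    # crosses the ask: trade at 101.0
```

Every `add`, every trade, every removal on a real exchange's book is what gets broadcast out as market data.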
And then if that happens to match anyone who's currently on the system and they're buying and
you're selling and the prices agree and they're all the better, then there's a match and the trade
happens. But if it doesn't, it goes on to like a bulletin board of like, here's what everybody
wants to buy or sell. And that's what market data is: it's
the information that flows off of the exchange that says here is the interest to buy this
particular share somebody would like to buy a hundred shares of google for a hundred dollars
you're like i bet they would because the current price of google is like a thousand or whatever it
is right you know um right and there's nothing really to stop you know there are certain people
who can place these orders in the market it's not everyone you can't just go on you can't register on your
fidelity account or your you know your robin hood account and do this but you reach certain criteria
and then you get this tcp connection and you get this um data stream which is essentially
um everything that possibly happens if you think of it as a database of orders that the exchange
is holding, every add, every remove, every trade, every modify,
every exogenous event that could possibly happen on the exchange that affects its internal state
is broadcast literally broadcast or in fact multicast to all interested participants and
then you're expected to update your internal idea about what the market looks like from that change
so you know you're trying to keep your internal database up to date with what's really going on in the exchange and then you run your
magical mystical algorithm over it and go oh i think it's mispriced and so i'll go and buy it
or no actually i will join the market and i will also say that i would be prepared to sell google
for a thousand and one dollars or whatever and you know that's where the real magic happens and
then clever maths people work it all out and then they they tell me how they would like it to work and
then then i get involved again right i don't get involved in that bit um and there are there are a
set of things you know like there are certain things that are very much like you can boil that
information down into signals that you can feed to a machine learning system, which then churns out some expected value, and then you can make a decision based on that expected value. And that tends to be somewhat slow
because you're doing some level of post-processing on that data and maybe you're matching it up with
other markets and other symbols and other things that are going on and you're throwing it through
a model that's relatively expensive to to to operate and then you're making a decision and
you're turning it around and then you're sending an order say hey i'd like to buy this and you know at that level you might be talking about
hundreds of microseconds which is you know a long time in our world but also not a very long time
in most other people's world right or it could be milliseconds even or whatever but um and then as
you get down towards trades that require less finesse, less inference, and they're more like, well, if the price of Apple goes up, buy all of the other tech stocks, and then hope that you get in before everyone else does
and you buy while they're still low before they've actually caught up with the price of apple
assuming that's a valid thing to do again this is not financial advice please consult
right um but these are the kinds of things you know and at that point we call those lead lag
trades where there's a very obvious like economic reason for two things to be linked.
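A lead-lag rule like that could be sketched in a few lines: when the leader has moved more than some threshold and the laggard hasn't followed yet, signal a trade in the laggard. The thresholds and returns are invented for illustration, and, echoing Matt, this is not financial advice.

```python
# Toy lead-lag signal: leader (say Apple) has jumped, laggard (say Google)
# hasn't caught up yet, so race everyone else to buy the laggard.

def lead_lag_signal(leader_return, laggard_return, threshold=0.002):
    """Return 'buy_laggard' if the leader moved up but the laggard hasn't."""
    if leader_return > threshold and laggard_return < threshold / 2:
        return "buy_laggard"
    return "no_trade"

print(lead_lag_signal(0.005, 0.0001))  # leader jumped, laggard flat: buy
print(lead_lag_signal(0.005, 0.004))   # laggard already caught up: no trade
```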
And then the only reason they're not linked is either because something idiosyncratic has happened in the world, like, I don't know, Apple have just cancelled their self-driving car thing.
And now, whoops, it's not the tech sector that's going up, it's Apple that's going up.
And now you're left holding all these shares that you didn't want.
And that's a risk you have to take as someone who's trading. Or, um, you know, Apple went up, and then it's a race between you and everyone else who knows that when Apple goes up, Google is going to go up as well, right? And then, you know, now you're back in the video games industry, where you're like, well, everyone's got the same Dreamcast, because everyone's bought the same high-powered computer. Everyone's bought the same high-powered networking card
and they're using the same tricks
to access the network card through kernel bypass.
There's no kernel involved at all.
They've all got the same fast switches.
They've all paid the exchange the same amount of money
to get the same length of fiber optic cable,
I kid you not,
so that you have essentially a level playing field, level amongst all the people who can afford to do all these things, right, but level nonetheless. And so the only thing that remains
between you and the the other person down the road at jump trading as opposed to you know hrt
or whatever the trading company is how smart can you how fast can you make this go right how can i
craft this to be faster and there
was a time when that was all CPU, all the time, and I came in sort of the middle to the end of that part. So, like, a lot of stuff that I was doing was 100% CPUs, it was these exotic network cards, these exotic kernel bypass things. And then during the
time that i was there people started going well you know what's even faster than the cpu well it's not faster than the cpu but if you're only doing if this then this and
you've got network packets coming in we can do this in hardware and we can push it out to the
edge even further and have an fpga do this and then you're into the world of like well
something you could never do on a cpu is like hey by the time you get to the 15th byte of the packet coming in you know
if it's a buy order or a sell order and you can start going oh and you start sending a packet the
other way, so that as the laser beams in one way, you've started turning on the laser the other way, going, well, maybe we'll want to sell something on this. And then you get to the end of the thing and just make a decision as it's flowing through, to say, okay, yeah, now we'll buy, or, actually, no, let's not do that, and put something at the end. I mean, you're not allowed to corrupt packets or anything like that, but there are ways and means of, like, getting to the end and going,
i didn't mean to do that actually you know i jumped the gun a little bit but that's how folks
are able to get down to nanoseconds between an action coming in and their reaction going out is they're actually
pipelining between the incoming and outgoing events which is kind of mind-boggling right yeah
that's, that is fascinating. I, uh, I've had a kind of personal fascination with FPGAs, mostly because, you know, it gives you that window into microarchitecture and that sort of thing.
absolutely yeah without having to fab a chip which
uh, turns out to be, uh, it's getting easier. But it is actually, I was gonna say, there are ways and means these days, you know. Yeah, but it's still not as easy as just, like, plugging a little USB-thumb-drive-like thing into the side of your machine, running some open source software, and having the LED blink, you know, going, oh, that's cool, right, I'm a hardware designer, right?
right and you know in terms of the kind of like uh you have a pretty uh a large distance in your
stack there right you have you have kind of like the interface that i'm sure folks that are doing
trading or perhaps some of the um folks designing models you know need to be able to interact with
all this data that's being maintained. You have, you know, typical networking software and that sort of thing.
You might have, um, some of that kernel bypass side of things, and then you're doing like RTL
on the FPGAs and that sort of thing. And, you know, I'm sure this varies quite a bit in the
size of the organization and, you know, just the organizational style, but is it typical for,
you know, engineers at, um, trading organizations to be kind of working up
and down that entire stack i don't know how typical it is um actually you know certainly
organizations i've worked in have had folks who specialize in in different areas of that you know
you've got the folks who are, you know, usually the FPGA designers are their own breed. Though I've got two noteworthy exceptions to that, who are both software engineers, and I think they are that first and foremost, and then they went into a bit of hardware design. And it's absolutely, you know, fascinating to see through their eyes, because
I think, you know, if you've come from the hardware design standpoint, you're used to certain things, like the aforementioned almost infinite build times, the really very rigorous testing, the extremely process-driven way of doing everything, the very regimented source code. You know, you can spot a dyed-in-the-wool VHDL or Verilog engineer because all their comments line up beautifully and everything is formatted within an inch of its life, because if your compile is going to take 14 years, it may as well be beautiful, right? Or seemingly that seems to be the rationale behind
it. Um, and then if you come in as a software engineer, you're like, immediately, this is terrible, I hate everything about this, and you start going, what can I do to make this better? And then you
start discovering like these python based projects that can do simulations so that you can run your tests
using Python and async stuff in Python
and then interacting with the Verilog simulator.
And it's just a better world.
And these folks go look over,
you're like, what on earth are you doing over there?
Surely you should be writing lots of SystemVerilog
and then writing out thousands of lines
and then going home and coming back two days later
over the weekend and looking at the result. And they're like, no, I've got too much ADHD tendency to be able to have the patience to do that. So it's been fascinating seeing their journey
go through that and they've been very successful and i think you know the folks actually behind the
was it cocotb, I think, is the name of the Python project that I alluded to. I think they also had a
similar like software engineer first mindset and i don't mean to impugn the hardware designers who'll be listening to this
but it's just really interesting to see a different perspective of it and understand you know like
the the trade-offs and also i think for us as software engineers to learn the humility of like
how long and how painful this process is, and, like, how much less cavalier you can be about testing, for example, when it's that expensive to find a mistake and fix it, compared to, oh, I guess we just cut a new build and do it again. You know, like, oh no, we've actually gotta go through another
two-night build process and place and route and then all that kind of stuff. So, and if the, uh, you know, when you
are getting to uh levels where you know you're you're doing things on the nanosecond scale
that i imagine you know when when new hardware is released that it's pretty important to evaluate
that and decide whether to incorporate it right if it's going to give you a competitive advantage. Now, if you are, well, you know, there's the FPGA hardware as well,
which could be a separate conversation, but let's just, you know, focus on maybe like CPUs or
something like that. How often are y'all turning over hardware in that environment? Because I
imagine that, you know, as soon as there's something better, it's, you know, optimal to
move over to that system. In the work that I had done before, and without going into too much detail, like, it became increasingly less important. We had moved to FPGA stuff, and then the speed of the CPUs was more like, how quickly can we reprogram, or at least configure, these FPGAs to do the thing that we want to do. I mean, this
is i think this is fairly common.
Folks gravitate towards an FPGA design where you have like essentially a CPU,
a software-defined CPU that's like extremely tailored
for deep packet inspection
and if-then-else kind of state machine type things.
And then the else is,
here's a block that I need to be sent out but
because you can't really do any huge mathematical things in that; you're really looking for particular key characteristics of the messages coming in. And so behind the scenes there's the clever program, written in C++ or whatever,
that's doing the real thinking and then going like, okay, I need to continually update and resend
these if-then-else rules
because I can see the big picture.
I know that a move in Apple more than two ticks
will mean this kind of message will come through
with the byte three being this and byte seven being that.
That's what I need to get over to the FPGA
because it's too dumb to really understand what's going on.
It can only look for like, you know,
regular expression style things.
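That division of labour could be sketched as simple byte-level rules the smart process pushes down to a dumb, fast matcher. The offsets, values, and canned action bytes below are all invented for illustration; a real FPGA would evaluate these checks in hardware as the packet streams in.

```python
# Sketch of the "if-then-else rules" idea: the clever C++ process keeps
# installing simple byte-pattern rules, and the fast path just checks each
# incoming packet against them and fires a canned response on a match.

def make_rule(checks, action):
    """checks: list of (byte_offset, expected_value); action: bytes to send."""
    def matches(packet: bytes):
        if all(len(packet) > off and packet[off] == val for off, val in checks):
            return action          # fire the pre-built response immediately
        return None
    return matches

# "A move in Apple of more than two ticks means byte 3 will be this and
# byte 7 will be that" becomes a rule the fast path evaluates per packet:
rule = make_rule([(3, 0x42), (7, 0x07)], b"SEND_ORDER")
print(rule(bytes([0, 0, 0, 0x42, 0, 0, 0, 0x07])))  # both bytes match
print(rule(bytes([0, 0, 0, 0x42, 0, 0, 0, 0x00])))  # byte 7 wrong: no fire
```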
I just keep changing the regex to find the thing that i want to find and then hope that it is actually
finding that signal when it comes out of the noise i mean again i'm trying to blur it a little bit
because I'm a bit vague on, like, how much I should be saying about this stuff. And I don't do this anymore, for what it's worth. My current company, I've moved on from the company where I was doing the lower-latency stuff, and it's much more quantitative trading, so it's a bit longer term,
but it's still important to be fast.
Anyway, so to your question about
whether we were always on the cutting edge of CPUs,
we weren't, actually.
It was relatively expensive to make those changes.
These things have to be put in physically co-located data centers
next to the exchange where they're trading
for all the reasons that the cable needs to be the right length
and all that kind of stuff. And you normally need a lot of them you know
you've got like 20 or 30 servers in a rack with these super fast switches and these careful cut
through things, and these companies that make, um, almost like a physics-based switch technology, so you can split a beam and send one off to one machine, one off to another machine, so it's not even really a switch in between them. They both get a copy of
the data or you know one goes off to your packet capture system and one goes to your you know your
trading system so you always have the exact thing that happens so you can do your simulations later
and all that kind of good stuff. And so, like, changing the machines out, where, you know, you've got 20 of them in a rack and they're all like 25 grand each, that's a significant outgoing.
That's not to say that we didn't do it, and, you know, there was definitely experimentation with unusual hardware. So again, without going into too much detail there, but I'll talk about one thing that I thought was an interesting one in terms of what it was.
Sure.
So there was a chip called a Tilera, which was a relatively simple 32-bit RISC CPU, except it was a grid array of them on a single die. And there were like 64 of them, I think, something like that, arranged in an 8x8 grid, and the peripherals hung around the chip on the outside. And so the sort of eight at the top and the eight at the side, however you want to think of it, the peripheral, literally peripheral, CPUs could talk to the pins on the outside. You know, they were all fully functional, right, they all had access to RAM if you wanted to and all that kind of nonsense. But
a way of configuring it would be to say, well, I'm going to run Linux on the top two left-hand-corner ones. The rest of them are uncommitted, and then I'm going to run dedicated
programs on them and some of their registers would be like north south east and west and if you wrote to north
it would block until the processor above you had read from south so maybe with a small fifo in
between them something like this um there was also an on-chip network where you could send messages
through, uh, to a particular CPU cell, and it used, like, a New York taxicab routing algorithm of, like, if no one's reading or writing from north or south, then I'll go north or south; I'll go east and west until I'm lined up, left, right, or whatever.
But anyway, what it allowed you to do was in software, do the kind of things that you do or you have to do naturally on an FPGA or an ASIC based solution.
You know, effectively, each of these things was a software pipeline stage and so you could sit there and be like okay the ethernet chip is up here and it writes 64 bytes or 64 bits of the ethernet frame to the east every time it comes in
and the next the next program is decoding the ethernet frame looking for the ip header and
then once the ip header is good it then starts passing the udp payload to east
and then the udp payload gets to the next guy and he's like adding like looking for the particular
things and decoding, and then going, well, I'll go south if it's this kind of packet, or east if it's
another one or north maybe and then you can kind of actually define a physical route around the
chip to get to a place where you are able to process particular sequences very efficiently because every clock
cycle another 64 bits is going through or every other clock cycle or whatever it was
and that's very similar to how you have to think about the world when you're doing hardware because
everything's parallel you know like every transistor is its own little computer and you
don't really have much choice about that you know and in fact we have to kind of impose our clock
based will upon it rather
heavily to make it look like the kind of thing that we're expecting where everything moves along
one step at a time and that this is an aside but it was always a thing that made me laugh once i
spent some time with our fpga engineers and really started i believe to grok the way that they thought
about the world the way that you have to do things and the way that you can get this amazing speed up
if you do it this particular way on an FPGA,
then we would have people come in and say,
like vendors would come in and say,
take your C++ code and compile it to FPGA,
and you get the huge boost of speed.
And I'm like going, the compilation is not the problem.
Which language you specify in it is not the problem
the problem is you have to think about it in a fundamentally different way and anyone who's
trying to write c++ is not thinking about how to i don't know uh do a 256 way hardware lookup
because you're willing to dedicate 256 comparators or however many you
can multiplex in and just go well this is fine like nine tenths of my chip real estate is this
set of comparators but you know what in one clock cycle i know if it's interesting or not right and
you can't do that in C++, um, or any high-level language, really, other than these HDLs.
Yeah, I feel like the area in doing RTL myself
that really took a while to get used to
is if you chain more logic together,
the propagation delay is going to increase, right?
I know.
You don't really think about that
when you're writing, uh, you know, a sequential program or something like that. I mean, you obviously think about, perhaps, the number of instructions, maybe. I mean, maybe you do. But yeah, yeah, I mean, that's bonkers, you know. Uh, you know,
i think your first guest philip was talking about like the ripple carries and then the kind of look
ahead things and then there's you know if you start going down that wikipedia minefield of like
uh like oh what about this idea what about this i and you think about how do they do multiplies oh my
gosh that's even more complicated and how how do they do divides and that's one of my favorite
things actually to teach you know incoming sort of fresh faces is to sort of say um you know give
me your best guess as to how many cycles these things will take and then you sort of go through
the list of things and then you say integer division and they're like i don't know 20 you're like well maybe 200 uh it
depends actually the latest revision of intel processors are now down to like teens again i
think for even 64-bit divisions and i just i would love to know how they're doing it you know or maybe
somebody's just screaming into their headphones right now that, like, it's obvious. But, like, it has long been, like, the thing that I just
think, you know, because we can do a floating-point multiply or a floating-point division in a don't-really-think-about-it-anymore kind of level of time, as opposed to back in the games industry, where it's like everything was fixed point until floating point became, you know, commonplace. Um, and you think, well, when do I do an
integer division why would I care it's like well every time you use a hash map you're modding with
the size of the hash map most of the time and that's a division with the remainder and that's
actually kind of expensive and you're like oh I hadn't thought that yeah you're like yeah right
right it's like if you know a total aside if you look at the um implementations of really fast
hash maps they usually have a switch statement for they do switch on like the how big is my table
they don't store the size of the table in terms of, like, is it, like, five, you know, 1023, you know, whatever the appropriate nearly-power-of-two-but-prime size is. They switch on the ordinal value of which it is.
Is it 13 or is it 252?
No, that's obviously not prime.
Sorry.
Whatever.
And then they just do return x mod that.
And so the compiler sees it's a constant.
And so you're trading off, and the compiler then can do magical tricks
to make it not actually a divide.
It's modulus with a constant, which is a division with a constant,
and there are tricks to use multiplies
and other things that are much, much cheaper.
So these fast hash maps are going,
trading off on the,
there's a branch predictor mismatch maybe
because I have to jump to the right sequence of instructions,
but that's faster than doing the darn divide
in the first place,
which is just like bonkers.
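The structure of that trick could be sketched in Python: instead of storing the table size and computing `x % size` with a runtime divisor, store an index into a fixed list of prime sizes and dispatch to a function with that divisor baked in. The prime list is invented; and note the real win only exists in a compiled language like C++, where each case's constant modulus lets the compiler replace the divide with multiply/shift tricks. Python gets no such benefit, so this only illustrates the shape.

```python
# Shape of the fast-hash-map trick: dispatch on which prime size the table
# has, so each "case" does a modulus by a constant the compiler could
# strength-reduce (in C++; here it just shows the structure).

PRIMES = [5, 11, 23, 47, 97, 193, 389, 769, 1543]

# One specialised "case" per prime, each with its divisor baked in.
MOD_FNS = [lambda x, p=p: x % p for p in PRIMES]

class Table:
    def __init__(self, size_ordinal):
        self.size_ordinal = size_ordinal           # store which prime, not the prime
        self.slots = [None] * PRIMES[size_ordinal]

    def slot_for(self, hash_value):
        return MOD_FNS[self.size_ordinal](hash_value)  # the "switch"

t = Table(3)                    # a table of size 47
print(t.slot_for(1000))         # 1000 % 47 = 13
```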
But nowadays, maybe it isn't.
Who knows? The number of instructions, you see, and this is, like, getting towards perhaps a destination for where the heck we're going in this conversation, but, like, the number of instructions isn't necessarily a great indicator of how fast things are going to be, right? You know, these things like divides will take longer, or maybe they won't these days. You know, it's, yeah, it's fascinating how complicated these things we've built are,
right i i'm curious you know one of the things that um i've kind of uh in having this experience
of talking to folks uh who you know worked on uh processors in the 70s and 80s and kind of where we started this conversation as well
about talking about the simplicity and the elegance of them and really the determinism,
I feel like is the key thing there. And when you start to see some of the vulnerabilities,
you mentioned Spectre and Meltdown, you kind of at some point start to wonder, are we actually making progress here?
And obviously, there's been lots of improvements due to some of these microarchitecture concepts.
You mentioned branch prediction and pipelining and some of those things.
But I'm curious, in your own experience, do you feel frustration with the increasing level of complexity, and do you think there's perhaps, like, a ceiling where we're actually getting diminishing marginal returns from continuing? So, that's a really interesting question. I mean, I do honestly miss the days when I would have the hardware manuals open in my lap, and then you could make
very strong
guesses as to what would happen you know like i know how many cycles this device is going to take
i know how many cycles it takes to draw a triangle this big so i can do something and then i can go
back to it when it's finished. Those were great times. But that was eroding even towards the end of my time in the games industry, because people wanted, for commercial reasons, actually, in this particular instance, to interpose: well, we want to put in, like, a kind of operating system, so that we can have a pop-up display above your game, and, you know, show that your friend has just logged in, and all this kind of stuff. You're like, oh, wait, I'm not in control anymore? No, no, no, no, you're nearly in control, but we have this thing behind you. So, you know, we'd started to lose that determinism even then, although it was still fairly deterministic for you.
But the sheer gains that we've gotten and every time I think we've reached the point where we couldn't possibly squeeze any more out of it, somebody clever does something else.
And you're like, oh, wait, oh, that's smart.
You know, register renaming.
That's clever.
Now, suddenly it doesn't matter that we have a puny register file because, you know, it's actually as big as we can fit onto the chip or, you know, branch prediction.
Hey, we're so good at guessing where you're going that we can afford to have 100-plus instructions in flight, even though for the vast majority of them we have no strong belief they'll be needed. We lose the determinism, but we go so much faster so much of the time that it does seem to undo the harm. But then, you know, you hit again Spectre and Meltdown, and the difficulty of solving those while also maintaining the performance that we've come to expect is so tricky.
Yeah, I think, you know,
I think it was Thomas you spoke to about, like, VLIW and Itanium, and there were some sort of sensitivities around the failure or not of that.
But, you know, one of the things, you know,
and this is coming from somebody
who's made a sort of side career
about saying how clever compilers are
and how we should trust them to do everything smart, right?
I don't see that there are enough ways for a compiler to be smart enough, given how dynamic the flow of execution is in most cases, at least in my experience, right? And I've seen, I can't think what the heck it's called, the belt computer, with these, it's almost like conditionals built into the instruction, where you can do one or this or that. And obviously the ARM had its beautiful, originally at least, you know, conditional execution stuff, so that you could do some clever things with that. But like,
really, nothing beats the ability for the silicon to just go, well, I can try all the paths. It's almost quantum-like: I will go ahead of you, and I will start looking, and I will make guesses, and as long as the guesses are better than even, we're still better off than me not doing the guesses at all, as long as I can afford the silicon. And obviously, that's where the trick is. It's not really the silicon; it's the heat that it generates when it's running, and the power that it takes, and that kind of stuff, which then limits how much can be on at the same time. But, yeah, I've been remarkably surprised how often the next generation comes out and it's
still faster somehow. You know, we've got however many levels of cache. You're like, how could this be helpful? There's so much going on in between. And then you learn that each level of the cache has its own independent prefetching unit that's also intuiting, from the flow of instructions and the flow of misses, where you're going, and starting to run ahead of you. You're like,
miracle it works as well as it does but there's doesn't seem to be much sign that it's slowing
down, despite, you know, the fact that I don't really like that I can't easily tell what's going to happen. Right. It does feel like, you know, you mentioned kind of the heat issues, which, you know, eventually kept us from continuing to clock processors faster and faster and faster. Where's my 10 gigahertz processor, right? You know, that never happened, right? And, you know, there's other things that pop up, like, I know as we
shrink the process node the issues with leakage and things like that you know start to happen
with transistors we start getting quantum computers even though we don't want them
exactly exactly so there's like the the physical aspect of it you know alluding to you know your
earlier statement about there's always a level beneath your abstraction, no matter how low you are.
The other thing that's kind of been top of mind for me recently, I guess, is, you know,
if your workloads fundamentally change, that's another reason why you might rethink your
architecture.
And I think, you know, I was talking with Thomas about this a little bit, and I don't know if you've seen some of the discourse recently about Groq. I don't know if it's new, but they came out with this, like, language processing unit. It's an interesting combination of, like, highly parallel problems, but also a sequential nature of, you know, processing tokens in order, where you have dependencies between them. And that's kind of, like, driving some of these new architectures, I think, which is interesting. And I think
in some of those they are pushing more onto the compiler, but you have to take into context there
that the compiler might be compiling once for a model
that runs for a very extended period of time
as opposed to compiling a new build every 30 minutes or whatever.
So it seems like there's lots of different vectors to consider.
It's the dynamism of what the user is going to do
in the case of user-based models or whatever,
and the fact that the compiler can't guess, right?
So taking the branch prediction side of things here,
there was all this brouhaha about,
well, maybe we can flag the branches as likely taken,
likely not taken,
or you can have all this branch prediction hinting in there.
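For reference, the static hinting being waved at here looks something like this in C++ (the `[[likely]]`/`[[unlikely]]` attributes are standard as of C++20; `__builtin_expect` is the older GCC/Clang spelling; the function names are made up). It biases code layout once, at compile time, which is exactly why it can't track data that changes at runtime:

```cpp
// C++20 standard attributes: a one-time, static layout hint.
int clamp_negative_to_zero(int x) {
    if (x < 0) [[unlikely]] {
        return 0;   // compiler lays this path out out-of-line
    }
    return x;
}

// Pre-C++20 GCC/Clang equivalent using __builtin_expect:
// the second argument is the value we claim is most likely.
long checked_double(long v) {
    if (__builtin_expect(v == 0, 0)) {
        return -1;  // rare error path
    }
    return v * 2;
}
```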
It's like, well, yeah, but it'll never know that this branch, this loop, is always taken 64 times, until it isn't, and then it's taken 128 times. Or even, you know, so I'm known for C++, but, you know, folks like to compare languages, and Java has both its proponents and detractors, and the last thing I ever want to do is fan flames between the two, because there's
some amazing things that Java can do, because Java takes this sort of predictive thing into software. And so you can, as does, you know, JavaScript in browsers, and anything that has, like, a modern JIT these days, kind of go: I can notice regime changes inline, and kind of, like, oh yeah, well, you know, this happens until this thing stops
happening and then we can adapt and the program can re-optimize around that and you know people
in the c++ community may say oh but we have profile guided optimization we can run our system
we can profile and we feed it back to the compiler and a compiler can make smart things i'm like yeah right can you give me two binaries so that halfway
through the day when we get to midday and everything's like now instead of it being am
it's pm and whatever and that branch is now the other way around or whatever it's been the whole
way through can you flip the binary at that point they're like oh no you're like no you're still
relying on the processor doing this right the processor can do you know you've all seen the
the stack overflow post
about the branch predictor with, you know,
sorting the things means that, you know,
the thing goes faster than not sorted.
It's because like whatever condition you've got
is 100% predictable until it gets to halfway
through the sorted array.
And then it's exactly wrong twice.
And then it's 100% right for the rest of the time.
You know, you can't get that behavior
if you've got a static compiler
because the data is dynamic.
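The Stack Overflow experiment being referenced is easy to reproduce with a sketch like this (the array size, the threshold, and the names are arbitrary choices here): the loop body is identical either way; only the predictability of the data-dependent branch changes.

```cpp
#include <algorithm>
#include <chrono>
#include <random>
#include <utility>
#include <vector>

// Sum only the elements >= 128. The branch outcome depends on the
// data, so sorted input makes the branch predictor nearly perfect.
long long conditional_sum(const std::vector<int>& data) {
    long long sum = 0;
    for (int v : data) {
        if (v >= 128) sum += v;  // data-dependent branch
    }
    return sum;
}

// Time the same loop over unsorted and then sorted data. On typical
// hardware (and with the compiler not vectorizing the branch away)
// the sorted pass runs several times faster.
std::pair<long long, long long> unsorted_vs_sorted_micros() {
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<int> data(1 << 20);
    for (int& v : data) v = dist(rng);

    auto time_once = [&data] {
        auto t0 = std::chrono::steady_clock::now();
        volatile long long s = conditional_sum(data);
        (void)s;
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
    };

    long long unsorted_us = time_once();
    std::sort(data.begin(), data.end());
    long long sorted_us = time_once();
    return {unsorted_us, sorted_us};
}
```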
And so maybe I'm still very skeptical about this. Maybe for certain domains, it makes sense. Maybe, you know, the kind
of things you've described these you know i think transformers or whatever the the these ai type
processes that compile a very different kind of program maybe there's a lot more statistical
knowledge you can have and you can say well this is the inputs they're going to look this way we
don't care that there's going to be that one dreadful input that, if you feed it in, will give you dreadful, dreadful performance. So, back to your sort of determinism thing, that's actually an interesting aspect. In the world of finance, at least, one of the issues that we have is, you know, these markets are huge, and you can come up with these amazingly optimized algorithms which, for the common case, are super fast, but then there is, like, a terrible case. For example, if you use an array to store the list of orders, the things that want to be bought or sold, because they are in a strict priority and it's useful to steal from the front and take from the back or whatever, then a common trick is to actually store it backwards, because most of the action happens at the front of the book,
i.e., the end of the array.
And now you can pop and push from the back of an array.
Everything else stays where it is.
Hooray.
You know,
like this is clever,
right?
But then some Joker does something at the back of the book,
which is now the front of the book.
And now you've got to shuffle the whole thing down one.
And you're like,
well,
that's unfortunate.
And so you've got this.
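A sketch of the reversed-book trick (the types and names here are invented; a real book keys levels by price and holds far more state): keeping the bid side sorted worst-price-first means the busy front of the book lives at the back of the array, where push and pop are O(1), while the rare action at the quiet end triggers the O(n) shuffle being described.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Level {
    std::int64_t price;
    std::int64_t quantity;
};

// Bid side stored worst-to-best: the best (front-of-book) price
// sits at the *back* of the vector.
class BidBook {
public:
    const Level& best() const { return levels_.back(); }

    void push_best(const Level& l) { levels_.push_back(l); }  // common case: O(1)
    void pop_best() { levels_.pop_back(); }                   // common case: O(1)

    // Rare case: someone acts at the worst end of the book, and the
    // whole array shuffles down one slot -- the bad tail latency.
    void pop_worst() { levels_.erase(levels_.begin()); }      // O(n)

    std::size_t depth() const { return levels_.size(); }

private:
    std::vector<Level> levels_;  // sorted worst price first
};
```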
And in our case,
when you're dealing with this fire hose of information that's
coming over this broadcast if you can't keep up with the network data coming in you drop packets
and then you've lost information then you have to go through a very expensive recovery process
which means essentially you you can't do anything for like tens of milliseconds hundreds of
milliseconds it's a very expensive very expensive operation um to do and so you have
to think about your tail latency and so suddenly the predictability is sort of an important thing
and so these clever algorithms that concentrate on like the fast case is really really really fast
but there's a terrible worst case is now bad for you and so a lot of the wisdom um for these kinds
of things gets thrown out so for example one of these data structures i use a linked list and i am unashamed to tell the world that there are occasions when a
linked list is the right choice. Because, you could say, yes, cache misses, and they can be very expensive, but most of the time these things are in the cache, right? And if they're not in the cache, then you've got other problems. And it's order one, right? It doesn't matter what I do: I can put things in the front, I can take things off the back, I can move things out of the middle of it, it's order one. It's not as fast as just tacking 64 bits on the end of an array, of course it isn't, but it's consistently okay, and that's maybe good enough.
Right. And so, coming back to that prediction that you said with the compiler: maybe that is fine, you know, if you don't mind having bad worst cases that are rare, with your statistical model of what is going to go through, which is essentially what, I guess, all compilers are doing at some level. They're having to use a heuristic of some description to kind of go, I'm guessing this is more likely taken than not, so I'm going to lay the code out
this way. So, yeah, maybe it's not as bad; maybe I've just talked myself around to saying that it's fine for some workloads. Well, I think I
think that's, you know, a description or an illustration of kind of the problem space: it's understand your domain, right, and approach it accordingly. So I think that makes sense. I did want to, kind of, as the final sort of part here that we explore in this
conversation, uh, I'm, I'm very proud of us getting, uh, you know, two hours and change
in here and we haven't mentioned compiler explorer yet, which I'm sure is what the majority
of folks who, who clicked on this episode know you for.
I suppose so.
I would, I would love to, to you know just get a little bit of
the um background uh you can also you know for folks who haven't uh used the site before i
explain um what it is but also like the uh the background um on it and you know how you're able
to open source it, and maybe what it takes to run it today as well. Absolutely, yeah. So in, like, 2011, 2012-ish, I was at this trading company, and they had a very old C++ code base, and I was having an argument with the very conservative head programmer, because I wanted to use this new C++ feature called range fors, which, you know, is like what all other languages have for going over a container,
you know, the equivalent of for i in thing.
In C++, it's for auto x colon something, right?
And it should be equivalent to iterating over all of the elements in the thing.
And obviously, the thing is probably, say, a vector,
which is to say a variable length array it's just a pointer and a size is what it really is down
under the hood, and the pointer points to the first element, and the size is how many elements
there are in them and so you know normally you get the size and you start your counter at zero
and you work for through you know pointer bracket zero pointer brackets one and all that kind of
good stuff and the compiler of course rewrites it behind the scenes
to be like a pointer that walks along the memory locations
one after another.
And that's all great and good.
But it's a pain to write that.
It's kind of error prone.
We've all done things where we've used the wrong size,
we've used the wrong kind of iteration or whatever.
And so C++11 came along and said,
we should make this a language facility.
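The equivalence being argued over can be shown directly. The three functions below (names invented) should compile to essentially the same code at -O2; the third is roughly what C++11 specifies the range-for to expand into:

```cpp
#include <cstddef>
#include <vector>

// Hand-written index loop: counter plus operator[].
long long sum_indexed(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// C++11 range-for.
long long sum_range_for(const std::vector<int>& v) {
    long long total = 0;
    for (int x : v) {
        total += x;
    }
    return total;
}

// Roughly what the range-for desugars to: an iterator (for vector,
// effectively a pointer) walking the elements one after another.
long long sum_desugared(const std::vector<int>& v) {
    long long total = 0;
    for (auto it = v.begin(), end = v.end(); it != end; ++it) {
        total += *it;
    }
    return total;
}
```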
But we'd been bitten by this before. We also had some Java code. And in Java, if you loop over, and again, not to bash languages, but this is just a side effect of the way that Java worked at the time, it may have changed since, caveat, caveat: in Java, if you had a container and you looped over it using an index, that was garbage-free, right? You were just making an int on the stack, and you were bumping it forward until you got to the end of the size of the container, and you were accessing the container, and provided you weren't doing anything else, it wasn't generating garbage. You were done, right? Beautiful. But if you did the equivalent of for x
in whatever i can i forget the java syntax right now behind the scenes it created an iterator object
that was then the thing that held where I am in this object.
And you called next on it.
And that's what was happening.
So it was syntactic sugar for rewriting it that way.
And at the time, there was a trading system that was written predominantly in Java.
And they would train themselves into writing garbage-free Java, which is about as horrible as it sounds.
It takes all the benefits of a really useful and easy-to-write like java and throws them away and tries to write c code in java but without any of the benefits of
like memory checkers and things because no one's expecting you to do this kind of thing anyway
that's that's a whole other brand so right um so understandably we were they were a bit reluctant
to just with gay abandon start changing the way we wrote our c++ code because it was
very performant and they wanted to keep it that way so i got stroppy um which is british for angry
uh got upset about it all and then i um said well okay come here and i got jordan to sit next to me
i said right let me show you and so we were experimenting backwards and forwards with like
snippets of code where i was turning this flag on and compiling it one way or the other. And eventually, being the Unix heads that we were,
I wrote the command line of run GCC on a file,
output to dash, as in stdout,
pipe it through c++filt,
which then demangles all the symbols,
pipe it through some sed to get rid of some of the nonsense
that was the assembler outputs
and then i ran that in a watch which means it runs every second and just displays the output
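For the curious, that pipeline reconstructed as a one-liner might have looked something like this (the exact flags and the sed expression are guesses; the file name is made up). `g++ -S -o -` writes the assembly to stdout, `c++filt` demangles the symbol names, the `sed` strips assembler directives, and `watch` re-runs the whole thing every second:

```shell
# Re-run the compile every second and show the cleaned-up assembly.
# square.cpp is a stand-in name for whatever snippet was being edited.
watch -n1 "g++ -O2 -S -o - square.cpp | c++filt | sed '/^\s*\./d'"
```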
and then in the tmux session I split the screen in half, and on the other side I opened up the editor to the file that, you can see where this is going, the other side was editing. So I had the editor on one side, and I had the results of the compiler, once a second, on the right-hand side,
and then we went back and forth between the various things we tried different
compiled settings and we you know we kind of fiddled around and i was able to show him that
actually it was one instruction cheaper to do it the other way for boring reasons that we don't
have to get into and so anyway he was like fine with it and now around the same time that i was
that we were doing this um joe the person who had dragged me across from Google and dragged me to finance, and then ultimately he joined me in Chicago.
He was one of those polymath folks who knows how to do a bit of everything.
And he had been dabbling in Node.js apps.
And so he was forever knocking up node apps and showing you
know, little database CRUD-y things. And we'd done some previously; he showed me how
to do them at Google or whatever. And anyway, so in the back of my head, I'm like, hey, I know how to write web apps, you know, crap little web apps, but web apps nonetheless. I think I can take what I just did and put it in a little web app. And yeah, Compiler Explorer, or GCC Explorer as it was called then,
was born.
And it was a few hundred lines of code
running on a machine that I had set up
in the trading company at that time.
And it proved very useful.
It doesn't take long to pull down
a couple of off-the-shelf widgets for editors.
And then you put a little bit of filtering in
and a Node app that runs a couple of hundred lines,
runs the compiler,
and then just pukes out the output and filters it in some way.
And it sat for a couple of months
and then I thought this is kind of useful actually.
And so at the time we were experimenting
with more and more open source stuff,
the company was still very dodgy
about putting its name to anything.
But they said, okay, you can open source this.
It's not like competitive
advantage or anything like that um but you know you just can't put our name anywhere near it you
know because they were worried about legal comeback or something like that anyway their loss um
because because you know um in 2012 uh i stood up an amazon server running the same code base having open sourced it and um yeah gcc explorer was was born it has it had a couple of compilers it was like four or
five thousand lines of javascript and um very simple docker-based security and again air quotes
around that and there it sat for years uh and no one really used it that i knew of um it was
convenient we still used it internally
um it was dead handy for like trying out stuff so you know it grew so that you can change the
compiler settings you can change which compiler you're using and then as you're typing for
given how much of a bad rap C++ gets for, like, slow compiles, for very small snippets the compilers are blazingly fast. It's just these giant monstrosities we tend to feed them. So if you're just looking at, like, a small loop or a couple of functions that
call each other it takes milliseconds to to build and so we can build and parse and send back to the
the website, on the right-hand side, the sort of annotated, syntax-highlighted output of the compiler. And it becomes a sort of interactive, almost like a REPL, that you can start tweaking, going, like, what if I do i++ or ++i, which of these is faster? And you see that makes no difference whatsoever. And that kind of leads to this sort of journey of discovery and immediacy that makes you really get a deeper understanding of what you're doing. But yeah, fast forward 12 years, and it now is 60,000 lines of TypeScript.
It is three and a half thousand different compilers, which is about three and a half terabytes of compiler.
It is running on somewhere between anywhere between eight and 15 AWS instances at any one time, varying different types.
We've got some that have GPUs in them.
We have some that are running Windows.
We have the majority of them running Linux.
At some point, we'll stand up some ARM ones
so we can do ARM compilers as well.
And we have become a 'we'. I'm not just using a royal plural; I've got a small team now.
It's open source.
And we've got like five or six people
who have the keys to my Amazon account
and can administrate the site.
And it's become kind of the de facto C++ pastebin stroke experimental thing.
So by default, it shows the assembly output.
And so I like to think that my contribution is putting assembly
in front of people who would never have otherwise seen it,
talking about those flaws and abstraction layers.
It's like it really puts it right in the face of people
and go like, hey, this is what really happens.
This is what your compiler does.
You may not think of it doing this.
But then obviously folks use it
just as a general compilation tool.
And we now support that.
We can actually execute the code,
which is security-wise terrifying.
You've, you know, random, you know, what is your website? It's essentially a giant remote code execution service, you know. And, yeah, right, how are you securing it? I don't know! Some people have looked at it and said it looks fine. But, you know, it's become a pretty significant second job. It's
a lot of fun when it's fun it's a lot of toil when it's not um again i'm very very lucky and
blessed to have the number of uh contributors that i have and as again they can help out on
the admin side as well you know it takes a lot of care and feeding to keep a website up especially
one that has you know daily builds of all the major compilers we have our own ci infrastructure
we have our own load balancing stuff we have our own it's it's huge now um yeah and i don't tend
to use it as much as i used to because my job has changed and for a long while i was writing python
all day and it's like what am i doing with myself right but i'm glad to say i'm starting to use it again i'm back writing c++ in my day job again so
awesome but yeah most folks know know me from that is the short answer um and i think you know
you've been very kind by calling it compiler explorer which is what i call it but i hosted
it on my personal domain, and so most people didn't know that that was my name. They just thought it was a cool name, which it is. I'm very blessed and lucky to have the name that I was given.
But a lot of folks, yeah, didn't realize.
And then they were surprised when I turned up
and I said, yeah.
And they're like, hey, wait, like the website?
I'm like, yeah, I guess.
Maybe.
Yeah, I've definitely had plenty of interactions with folks where they've said to just 'godbolt it'. So, yeah. Right, I know, I did. So I have got, you know, you can get to it at compilerexplorer.com as well, because that's my sort of
hedge for the future if I ever need to get my domain name back or whatever.
But yeah, I have now,
I took advice from a friend who took me to one side and said, look, don't. You know, you can call it that, right? You know, this is like Google never calling it 'Googling' something. They call it web searching or whatever, because it kind of sort of devalues it. But not to get in on the joke, the joke as it were, is not wise.
But I was, you know,
I was poised to completely just go to the Compiler Explorer name and get rid of the vanity domain name. And they said, this is a gift horse, don't look it in the mouth, right? You know, people think of it as a verb now, or a noun,
and so you should accept that i'm like so i begrudgingly do now and in fact my linkedin
profile i think says you know programmer and sometime verb or something like that so you know i've kind of i've kind of accepted it now and come
to peace with it absolutely absolutely well uh last thing i wanted to to chat about here was uh
you also have a podcast that uh i think it's been a couple couple years now it is yeah somehow we've
reached two years now. Yeah.
Yeah.
So what was the decision to start Two's Complement?
And what's it kind of about?
So I think many of us during the lockdown went a little bit silly.
You know, you may have heard, unless Dan is extremely good at editing. So, listener, if you don't understand what I'm saying here, you know that Dan has been an excellent editor, but my dog has been barking in the background, and I apologize for that. But my dog is also a pandemic silliness: he's lovely, but he was got in the pandemic. I learned how to bake bread, and I started a podcast. These are all the
things that I think most people did. I think you're late to the party, actually, in this regard. Maybe you started planning, right? So I had it bubbling away in me to start something, as I felt I had something to say, and then I kind of bottled it a little bit. I thought, well, you know, maybe, maybe not. And then I confided in my friend at work, Ben, that I was thinking of doing it, and he said, you know what? I was thinking this too. And so we're like, oh, what if we,
would you do it together?
And so Two's Complement was born.
And he and I have worked together
at a number of companies along the way,
but we never worked directly with each other
until more recently.
So we've been very well aware of each other
and we both like giving presentations.
And so we've seen each other's presentations
at the companies we've worked at before, um we hadn't you know directly worked with each other and in
fact we haven't really worked that much directly together even though we're like in a small company
together now but we have very compatible views and then our little the backstory goes right so i
in 1996 went off to go into the games industry. And Ben's a little younger.
A few years later on, Ben was planning to go into the games industry
and then had a sort of sliding-doors accident of fate,
where something to do with his wife's job or whatever at the time.
He suddenly had to rescind his offer or it was rescinded
and he had to go and get a real job, right?
And so what we've got is like two people.
I never really planned to be in the games industry,
but fell into it through the aforementioned IRC accident. And he meant to go into the
games industry but due to some other exogenous event did not and then we've kind of followed
parallel tracks and then we found how reasonably compatible our views are and then we've gotten
together and we keep discussing things that are interesting to us which is to say two people who've been doing this for 20 and change years um ben is very much
into testing and i'm into obviously the c++ and performance type stuff but it's fun to
play those things off because they're not exclusive they're very compatible
and there's a whole host of things that we do a certain way and
you know having um grown up in the sort of similar circumstances we've got yeah some interesting
things that certainly when we talk to people they're kind of interested in it it seems so
you know we just open up a we open up a web browser we start talking at each other and then
a half-hour episode comes out once a month; that's what we're trying for. You know, it's low effort. Ours is low effort; yours is beautiful and well prepared and researched and everything. And in fairness, when we have a guest on, which is rare, we try to be too, but most of the time it's just, hey, let's talk about make, my favorite program, off you go. Yep, well, I will say that I am definitely a big fan of
it so um i i appreciate y'all y'all putting it together whatever you decide to talk about and
you know there's been a uh a number of kind of podcasts that um i've taken like little bits and
pieces of inspiration from in terms of um putting putting this show together so um i
definitely count that one on the list so well i appreciate the the time that y'all do invest into
it no it's i mean one of the things that i don't know to what extent you've discovered this so far
yourself is that podcasts are very unidirectional where you know you get a few tweets and then you
hear these kind of anecdotes where people say oh i listen to your podcast but it's like you don't get the feedback it's more like radio in that way you know you
could imagine like at one stage my sister was dating some radio dj and like you're sat in a
room talking to yourself for like four hours a day and you don't really know if anyone's listening
to you or not or whether they like it or not right and it feels like that and especially it's so
federated you don't know how many people are listening really you've got all these things that kind of guess but they're
guesses and so it's lovely to hear that feedback and you know i'm glad to say that we've we've
there have even been some folks that we've hired now at our company. It's a very long and protracted hiring mechanism, getting people interested in your podcast so they go, maybe I should work with them, and then they turn up. And on a similar note, actually, Compiler Explorer: I've now hired two people who have been contributors to Compiler Explorer. It's a very long and complicated interview process, that it is, right? It turns out, if you can fit a large JavaScript program in your head and make meaningful contributions across a variety of languages,
and you're a kind person who can hang out on our discord and be nice to people you're probably a good person to work with in the day job too absolutely well i can definitely uh
speak to that coming out of college my uh first uh post-college job i guess i started working on
the open source component of this company um uh while i was in college and they basically just
were like,
you're doing a lot of work for no money.
Would you like to do the same amount of work for some money?
And I astutely realized that was a good deal.
That is a good deal, yeah.
Yeah, but it was kind of,
the interview process after that is kind of funny because you have like a fairly large body of work
of literally like collaborating on
something so um it is kind of funny how open source can can be a conduit for that right right
for certain i mean yeah absolutely interviews are so difficult so yeah anything you can do to stick
out is worthwhile doing, right? But, you know, not everyone has the spare time or will or energy after their day job to do, like, open source work, if they can't do it as well. So, you know, one has to be careful. Anyway,
absolutely that's a whole other topic and i'm just realizing we don't really want to open any more
cans right now that's for uh the episode we're recording next week together okay
no but in all seriousness i i i would love to have you back again in the future.
I definitely appreciate you spending nearly two and a half hours with me and talking through a lot of different things.
I definitely had a great time and learned a bit, and I hope our listeners will as well.
Well, thank you so much for having me.
This is a great podcast.
I've enjoyed the two episodes that I've been able to listen to so far,
and I'm really looking forward to hearing the rest of them.
I only hope this one stands up and that we've not bored to tears the poor listener
by this point two and a half hours in.
I'm sure folks will love it.
But thanks again, Matt, and hope you have a great rest of your week.
Thank you. You too.