Microarch Club - 101: Matt Godbolt
Episode Date: April 10, 2024

Matt Godbolt joins to talk about early microprocessors, working in the games industry, performance optimization on modern x86 CPUs, and the compute infrastructure that powers the financial trading industry. We also discuss Matt's work on bringing YouTube to early mobile phones, and the origin story of Compiler Explorer, Matt's well-known open source project and website.

Matt's Site: https://xania.org/
Matt on LinkedIn: https://www.linkedin.com/in/godbolt/
Matt on X: https://twitter.com/mattgodbolt
Matt on Mastodon: https://hachyderm.io/@mattgodbolt
Matt on Bluesky: https://bsky.app/profile/mattgodbolt.bsky.social
Detailed Show Notes: https://microarch.club/episodes/101
Transcript
Hey folks, Dan here. Today on the MicroArch Club podcast, I am joined by Matt Godbolt. Matt is well known as the creator of Compiler Explorer. We begin by discussing early microprocessors, namely the Zilog Z80 and
MOS Technology 6502, including a discussion of undocumented opcodes and their creative uses.
We then talk about Matt's time in the gaming industry and what went into building games for
early consoles, before discussing his experience working on YouTube for cell phones at Google.
The last part of our conversation focuses primarily on the past 15
years of Matt's career, which has been in financial trading. Matt explains why trading requires a deep
understanding of hardware and software and shares how technologies such as FPGAs allow firms to gain
a competitive advantage. A common theme throughout our conversation is the ever-rising complexity of
processors and the systems built on top of them.
While abstraction of hardware and low-level software has allowed us to build new applications faster than ever before, Matt and I both assert that there is value in understanding what is
going on behind the scenes. Or, as Matt states succinctly in our conversation, you should always
understand the abstraction level directly above and directly beneath you, and there is always at
least one level beneath you.
I have followed Matt's work for quite some time, but I want to extend a thank
you to Jonathan Yu for suggesting that I ask Matt to join for an episode of the MicroArch Club.
With that, let's welcome Matt to the show.
Thank you for having me. It's great to be here.
Absolutely. I've followed some of your work, and might I say tooling, for a period of time. And I think, you know, when we were chatting before this show,
I think we may have crossed paths on social media or GitHub at some point in time.
I think so. Literally, just before the show, when I was searching for your name to find the outline
you'd sent me, I found some GitHub repos we'd obviously both been looking at at the same time,
where you'd committed to. So I think we've been broadly following in the same footsteps for a long while.
Absolutely.
And I will say, I think that you are the first listener requested guest
because Jonathan Yu on Mastodon kind of pinged you
and I think John Masters as well and said,
y'all would be great candidates.
And I said, that sounds like a great idea.
So I'm super glad to have you here.
I'm very pleased he did.
Although I feel slightly fraudulent looking at the folks that have already been on the podcast
and also, you know, the sort of general belief behind the podcast of like discovering things.
I'm like, I'm also on the same journey I think you are, to discover how this world got put together that we're so enamored of.
Right, absolutely. Well, you know, I think that one of the common themes with at least
all the guests I've had so far, which some of them have been released and I've got a few that
I've recorded that haven't released yet, is just kind of an interest in computing history. And so
in some ways, right, we're all like on this journey of understanding
about how the industry is evolving and all the things that have come before. And one of
the things that's really neat, I think, is looking back at computing history and seeing that what's
new is kind of old, and we're really new on the scene.
It's all been done before. We go around in circles.
Yeah, quite. Exactly. And so maybe that's kind of a good place for us to get started,
just kind of talking about your introduction to computing
and maybe when you were growing up,
how you first were exposed to computers
and what that environment was like.
Absolutely.
There's a sort of family story about the first time I ever saw a computer.
I was at a friend's house, and he had a Sinclair Spectrum.
I think I was seven.
I must've been seven at the time.
So I'm going to age myself here.
This was 1983.
And apparently there was like one of the really,
really simple flight simulators
where it was literally a line where the horizon was
and then four lines for where the runway was.
And then most of the screen was the instrument panel
because that didn't change very often.
And so the poor thing only had to draw
the tiny little like window at the top.
And my parents said I was interested in watching this
at my friend's house,
but then he apparently reset the machine,
which, in those days, meant
you pull the power cable out
and plug it back in again, right?
And then of course it drops into basic
and he started typing in a simple program
and the numbers were scrolling up the screen. So as my mom tells me, and then of course it drops into basic and he started typing in a simple program and or the
numbers were scrolling up the screen so as my mum tells me and apparently I was wrapped with that
that was so interesting to me I don't really remember this but that was the story and then
as a result of that on my eighth birthday I was very lucky to get my own spectrum and that's where
my journey started typing in the programs from the book that came
with the computer back when you know manuals were actually pretty full and had like the data sheet
in the back and had the circuit diagram of it even and um you know there were like the two or
three programs that would print out a British Union Jack flag. And I remember, Christmas time, my mum reading it out
and me typing it in. And, you know, that was where the journey began. So the Spectrum,
I mean, it was the Sinclair Timex over here. I say over
here; I'm in the States now, despite my accent. So it was a Z80 ("zee-80," or "zed-80,"
depending on where you come from)
processor.
And it was kind of relatively cheap for the time.
It was a very compact computer
and it had a very terrible rubber keyboard,
which felt awful and was pretty nasty.
But it was a gateway into a whole new world. And of course, what 10-year-old or
whatever, by the time I'd sort of got to grips with it, didn't want to play computer games? And so
we would, you know... back in the day, the games would come on audio cassettes.
They would be encoded, so, you know, if you think of modem screech, but lower and more rubbish, that's
the kind of sound that we grew up with. And even now, I get the hairs on the back of my neck go up when I hear that noise,
because it reminds me of those days back when, right? And so you'd load up games and whatever.
And, you know, they were reasonably easy to duplicate, legally or otherwise, and so there was quite a
circuit around the playground of folks sharing. But eventually you'd reach the point where you couldn't
get more games, and then you're like, well, maybe I could make my own game. Maybe that would be more fun. And so you learn BASIC. You know,
you probably already had learned BASIC, just because of the way that, you know, you turn the computer on,
you have to type in commands to even get it to load from the cassette tape, right? But very quickly
you realize, especially with the Spectrum, its implementation of BASIC was, pun intended, rather basic. And it wasn't very fast. It was
incredibly slow; it was not a fast interpreter. And so any game that was more than, like, the number-
guessing game, where you say, is it higher than seven? Yes. Or is your number seven? You know, eight,
whatever. Anything more than that was a little bit too much for it. So I remember I wrote a couple of
strategy games, and I wrote a little adventure game, and I
even got as far as selling one of these games in the back of a, you know, magazine, where you could,
like the classified ads at the back, you know, send 10 pounds to this address and we'll send you a
cassette in the post. I sold one copy. Not very much, but it was better than nothing, right? Right. But
then you get to the point where you're like, well, I really want a game where I can shoot things,
because, you know, that's more exciting than typing stuff in. And at that
point, the only way to get any kind of performance is to write this thing called assembly. And you
didn't really understand what it was, but you knew you had to do it to get the performance. It
was the thing; you knew that stuff was written either in assembly or BASIC.
The Spectrum didn't have an assembler, which was a shame.
You know, you had to go and buy one and I couldn't afford one.
And so I remember the very first assembly program I ever got working was a scroll text, a very simple scroll text at the bottom of the screen.
And it was written during a very boring swimming gala that I had to attend because my sister was in it. And I hand-assembled it on the back of the program for the gala, you know, the schedule for the swimming gala.
Oh, right, right. And "program" is an incredibly overloaded word in this context, right? I was going to ask, did you bring the Spectrum with you to this?
No, it was written in pencil, right on the back, and then hand-assembled when I got home.
And it worked first time.
And I was really hooked then.
Then I managed to find someone who had an assembler.
And I wrote a sort of simple block-based game where you ran around.
But it was a lot faster than you could ever achieve with BASIC.
And there was, like, a little tiny bit of programmable logic in it.
And the door was open.
But I consider it an absolute blessing
that I was born when I was,
and computers were as simple, air quotes, simple,
as they were back then,
because a 10-year-old, 12-year-old
could fit the whole thing reasonably in their head
and understand enough of it,
especially with the right amount of will
and motivation to
make a game, to, yeah, understand the whole thing and make a game themselves. Nowadays, of
course, you know, I've got kids that are older than that now, and the pair of them are like, well,
I want to make Minecraft, and I want to make, you know, some new FPS. Like, well, that's a long way away from, like, one block,
a star, moving around inside a maze of pluses and minuses.
But yeah, so then I moved
from the ZX Spectrum.
A very good friend of mine had a BBC.
So back in the 80s,
the British government decided
that in order to kind of get ahead,
they should teach all their citizens about this newfangled thing called a microcomputer.
And they commissioned the BBC to make a programme, a TV programme now.
And in UK English, that's "programme," with an M-M-E at the end, which is a strange thing.
But yeah, you can't hear that on a podcast.
Anyway, a programme which would then be distributed,
you know, broadcast.
That's what we called it back then, isn't it?
Distributed, broadcast to everyone
to teach them what a computer was.
And the makers of this TV show,
we'll go with a slightly less ambiguous name here,
decided that they should have an official computer
to go alongside of it
so that they could teach the ideas they were presenting with an actual physical computer that you could also get yourself.
And obviously there were other ones around, but they wanted to make it sort of mostly affordable.
And critically, it was going to go into schools at the time, so that schools would have this sort
of backup as well. And so there was a generation my age that grew up with
a particular computer in their school, and that computer was made by a
company called Acorn, who very famously, at the last minute, sort of outbid and outmaneuvered
Sir Clive Sinclair of the ZX Spectrum, or Timex Sinclair, and got the contract to make this
computer, even though the computer had been made in, like, three days, with them soldering it together
and writing the software in all-nighters.
And literally as the person from the BBC was due to come in to see this
apparent demonstration machine,
it wasn't working.
And for the whole time,
someone was having to hold like a wire that they discovered was loose with
their hand or otherwise earthed or grounded in some way,
you know,
one of those amazing stories.
It's probably more apocryphal than real,
but it's great to think about.
But they won the contract,
and this machine was pretty prevalent in the UK,
and that was the computer I moved to, actually.
I moved to the sort of 128-kilobyte version of it,
which, you know, woo-hoo, a whole 128K.
And that had a 6502 in it,
which was the movement of, you know,
a different CPU from the Z80, obviously.
And sort of curiously was like really a RISC processor.
If the Z80 is a CISC processor, you know,
like there's 768 odd opcodes that it has.
The 6502 has less than 256.
Not every single opcode in the one byte that is an op code actually encodes for something that they they meant to happen right which is a whole other
story we can get into in a minute. And so that was what I really focused on. So during my
late teens, I was working with a good friend of mine, and we were making games, and we were
sending them to... there was a magazine, back when that was a thing.
Perhaps some folks will remember what magazines were.
They're like thin, flimsy books that you could buy once a month from a special shop that sold them.
It's like if you, you know, printed a PDF or something like that, you know?
That's right.
Yeah.
Or like a blog post or something like that.
A sequence of blog posts are all printed out.
Yeah.
But there was an appetite for type-in programs, where you would buy the magazine, and at the back of the magazine, in like yellow pages and really cheap-quality print, there were, you know, 800-line programs for various different things.
And there'd be articles in a magazine explaining why you might want to type the program in with little screenshots and things
and, like, lots of artist drawings that made it look a lot more impressive than it actually was.
But we were hooked, right? That was a way that you could learn more about how to
program the computer. Essentially, they were like the blog posts and the Stack Overflow of their day, you know?
You would type it in, and inevitably you'd type it in wrong, and it wouldn't work. And then you'd
scratch your head, and you'd stare at the thing, and you'd kind of go, well, I think I understand enough
of the flow of it to now work out where I must have typed it in wrong. And so you learned debugging
skills before you even knew what debugging was. And that was great. And, yeah, later in my teens, I was writing articles with my friend and sending
them to this magazine, and it was a great way of, you know, keeping us in a few 10 pounds here, 20 pounds
there, for buying more games, as it happened, right? You know, but it meant that,
again, the chips were simple enough, the computer was simple enough, and the BBC had a built-in assembler just out of the gate.
You turn the computer on, you open a square bracket,
and you're typing assembly.
It was fantastic.
But the BASIC was also incredibly fast.
It was a very good put-together version of BASIC.
The person who wrote that BASIC went on to write the BASIC
for the Archimedes, which we'll probably talk
about. Well, maybe we'll talk about it in a second. I know, some foreshadowing here. I'm getting all
excited as well, because it's such a great story, set of stories. But yeah, so it was
wonderful that we were able to learn so much about this system. And everyone had the
same system under their desk, not like PCs these days, where everything's different. So if you found
some clever trick about your computer, it would work on everyone else's computer, too, even if it wasn't in the manual.
Right. And so enterprising folks would realize that, of those 256 opcodes in the 6502, some were not specified.
But like somewhere there's a network of transistors doing something.
And it's not like they threw an exception, a hardware level exception.
It's just like, no, different parts of the chip turned on because these bits were high and these bits were low.
And so very famously, there was like a store instruction.
Store A was one opcode.
Store X, the X register, was the next opcode.
Store Y was the next opcode.
And then "doesn't do anything, undefined" was the next one.
You're like,
well, is there a fourth secret register, or what? And so you try it out, and through a bit of working out,
you realize that, no, what it's doing is: the bottom two bits of
the opcode are selecting. If both are clear, then it's the accumulator that's put onto
some internal bus inside the 6502. And then if
the low bit is set, the X register is put onto the bus, and if the high bit is set, the Y register is
put onto the bus. But if you set both of them, it puts both the X register
and the Y register onto the bus. And because it was an NMOS design, we discovered later on, that meant that
essentially the zero bits would win. So it was actually an AND: it was X AND Y that was put onto
the bus. Which meant that when the store circuitry then went to push this out to
memory, you got the X register ANDed with the Y register written out to memory. And maybe that's
useful to you if you're writing a sprite routine and, for example, you need to mask the bits that you don't want to change
with a sort of "don't change these bits" mask.
So, you know, these were clever things that people would discover and determine.
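The undocumented store behavior described here can be modeled in a few lines of Python. This is a toy sketch of the behavior exactly as described in the conversation, not code from any real emulator; the function name and structure are my own.

```python
def undocumented_store_value(a, x, y, select_bits):
    """Toy model of the register-select behavior described above.

    select_bits are the bottom two bits of the store opcode:
    0b00 -> accumulator, 0b01 -> X, 0b10 -> Y,
    0b11 -> both X and Y drive the internal bus at once.
    On an NMOS bus, simultaneous drivers resolve as a wired-AND,
    so the zero bits win.
    """
    if select_bits == 0b00:
        drivers = [a]
    else:
        drivers = []
        if select_bits & 0b01:
            drivers.append(x)
        if select_bits & 0b10:
            drivers.append(y)
    value = 0xFF
    for d in drivers:
        value &= d  # wired-AND: any driver pulling a bit low wins
    return value
```

So with X = 0xF0 and Y = 0x3C, the "fourth" store opcode would write 0x30 to memory, which is exactly the kind of masking a sprite routine might exploit.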
And even like the video circuitry was clever.
You could do some tricks to change and lie to the system
that it had slightly more lines or slightly fewer lines
and cause it to generate the H-sync or the V-sync at different times, which would be
interpreted then by the monitor, or more likely the television that it was plugged into,
as moving up and down slightly. So you could get it to wiggle around, and then, with careful other
things timed, you could get the screen to scroll around in a very smooth way that was otherwise
totally impossible for something that underpowered, right? So there were a lot of really cool things that you
would learn back then about how to make the most of it. But by the end of this process...
Yeah, go ahead. Sorry.
No, I was just going to jump in while we're kind of like on the 6502. So you mentioned
two very important 8-bit microprocessors, right? The Z80 and the 6502.
And there's kind of a couple of other contemporaries around there. But in some of my
own experience and some of the research I was doing for this show, the BBC Micro, and then I
believe from my pre-show stalking that we talked about before we jumped on here, you had a BBC Master, is that right?
That's correct, yes.
That was the posh 128K version, yeah.
Right.
And so I've heard in talking with a lot of folks
that these computers are really impactful,
and also the 6502 was in a number of other very notable systems,
so the Apple I, the Apple II, the NES. I'm leaving
off a number here. I don't know if you have any off the top of your head that I haven't named.
Bender from Futurama has a 6502. And The Terminator: okay, if you freeze-frame The Terminator
when he's got the stuff scrolling down the screen at the very beginning of the movie, it's 6502
opcodes, and it's like a bootloader, boot ROM thing, copying memory down low. Right, okay, perfect.
So both real and fictional impactful computers. Exactly.
But yeah, so you mentioned that it was somewhat of a reduced instruction set computer, and that there was a
space of 256 opcodes. I think there were 151 used, for 56 instructions, which, you know, is
drastically smaller than some of the contemporaries, which I think was a contributor
to it being cheap, which I think was, like, the big driver of, you know, the systems that
it was put into being able to be cheap. And then it's also probably why, when I talk to so many
folks, they had exposure to it, right? Because it was more accessible, and it kind of drove this
revolution, you know, of having personal computers and that sort of thing. I was curious,
you know, in moving from the Spectrum to the BBC Master,
was there a significant price difference between those two machines?
Because I know the Z80 was also on the cheaper side, but more expensive than the 6502.
The BBC was actually very expensive for what it was.
Okay.
The 6502 may have been cheap, but I think the expensive part was the RAM they put in. And this
is probably the one and only time in the history of the universe that this has been true: the
RAM was twice as fast as the CPU, which meant that the CPU and the video circuitry shared it
on alternating cycles. So the RAM was running at four megahertz, and the TV output system was running at
two megahertz, as was the CPU. And they were out of phase by, you know, half a clock, or
however that works. And so that meant that you never had contended RAM, which we had got on the
Z80. There were banks of RAM which were slower to access, because that's also where the screen was,
and so you were sharing it
with the screen. Every time the TV needed to serialize out more colors, the Spectrum,
the ULA in the Spectrum, would grab the bus, essentially steal it away from the CPU,
and go, no, this is mine now, take the information it needed, and then the CPU would run slower.
Whereas on the BBC, it was just shared, time-sliced style,
which is pretty bonkers to even think about, right?
The RAM is twice as fast as the CPU.
What would we give for that these days, right?
Right, right.
We definitely have the opposite situation now.
Right.
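The alternating-cycle arrangement described here can be sketched as a toy model (the names are invented for illustration, not taken from any schematic): the 4 MHz RAM offers a slot every cycle, and the 2 MHz CPU and 2 MHz video circuitry, half a clock out of phase, each take every other slot, so neither ever waits on the other.

```python
def ram_slot_owner(ram_cycle):
    """Toy model of the BBC Micro scheme described above:
    RAM runs at 4 MHz; the 2 MHz CPU and 2 MHz video circuitry
    are half a clock out of phase, so they claim alternate RAM
    cycles and never contend for the bus."""
    return "cpu" if ram_cycle % 2 == 0 else "video"

# Over any window, each device gets exactly half the RAM cycles,
# i.e. a full, uncontended 2 MHz of memory bandwidth each.
schedule = [ram_slot_owner(c) for c in range(8)]
```

Contrast this with the Spectrum's scheme, where the ULA stalls the CPU whenever the video needs the bus.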
And one of the other things that I kind of observed about the 6502 is
that, despite it having, you know, less functionality,
if you will, or seemingly less functionality, it was much more performant in a lot of cases
than some of the competitors. And, you know, I think there was a variety of reasons for that.
You mentioned that there was only a handful of registers: the accumulator,
the X and Y registers,
and I think there was a stack pointer and a program counter.
That's right, although those weren't really registers in the same sense,
just like they aren't on most architectures.
Right.
Yeah, the Z80, on the other hand, had all these sort of paired, pseudo 16-bit registers, which sort of presage the 8080
that it was around contemporaneously with.
And there was a lot of cross-pollination
and some strange IP-related nonsense
that then sort of bled.
So there's a little bit of x86 smell to it, right?
Even back then.
But the Spectrum was really,
oh, sorry, the Z80 was very interesting
because to save money in the Z80,
they only had a
four-bit ALU that they just pumped twice to get eight-bit answers, or four times to get the 16-bit
answers. Which meant that even, like, really simple things... although it was clocked at a higher speed,
at least in the Spectrum, I think 3.5 megahertz (someone's going to correct me in the comments, I'm
sure, but somewhere in that range), it took more cycles to do anything. And
there were, like, very complicated M states and T states and other things that were to do with, like,
am I accessing RAM or not accessing RAM. The 6502, on the other hand, accessed RAM every single cycle,
unconditionally. There wasn't even a memory-enable pin on it. It was like, nope, if the clock's happening,
I'm reading from RAM or I'm writing to RAM. Those are the only two things I'm doing.
Right.
Which, you know, reduced the pin count on the actual chip itself, simplified the design of everything.
You just plugged it into RAM and went, there you go. And simple instructions, like load the accumulator, were one cycle to read the load-the-accumulator opcode, and one cycle to read the value after the opcode
and put it into the accumulator.
And I think there was another cycle always, because everything took three.
Is that right?
Oh, no, now I'm doubting myself.
This is awful.
I've got a huge table of them somewhere.
But, you know, it was pretty straightforward,
although I've just demonstrated it's a bit more complicated.
Because it was just,
it was how many memory accesses did you need to do the work?
And that was it.
Whereas the Z80 had this, like,
it may take four cycles to do an add, even,
because there's, you know, four-bit things to do.
But that led to some really interesting side effects,
actually, on the 6502, now that we're here,
that were kind of unobservable as a programmer.
And yet.
So one of the opcodes is a rotate instruction.
So it reads a value and rotates it, as in shifts it up one and takes the top bit
and puts it back down where the bottom bit was.
And then it writes it back.
So this is a read-modify-write instruction. The first cycle would be: read the ROL
opcode. The next two cycles would be: read the address that I'm going to be doing this from.
The fourth cycle would be: read from that address; now I know where it is.
The fifth cycle, well, I'm doing the rotate, dot, dot, dot.
And the sixth cycle is when I'm writing it back.
But as I've said, there is no memory enable or disable pin.
So what's it doing on that fifth cycle?
It's accessing something.
It's doing something with the RAM.
So what is it doing?
And again, it wouldn't matter, right?
As long as it's not destroying anything, presumably whatever it's going to do, it's going to write the correct
piece of information at the end. But it could reasonably just read the same value twice. Maybe,
you know, it could write to some dummy location, or it could read some dummy location, or
whatever. But it turns out it actually writes back the unmodified value. Effectively, the little table,
not in the ALU, in the... what do they call it?
Not the ULA.
There's like a little array of,
like, it's not quite microcode.
It's just like on step three of instruction five, then-
Oh, the PLA and the-
PLA, thank you.
Yes, yes, thank you.
I knew that it was...
it's one of those three-letter acronyms
that I can't remember.
But on that fifth cycle,
they just said,
well, we might as well start the write operation,
even though it doesn't do anything,
because we're going to write something,
and then on the sixth cycle,
we're going to write the correct value anyway.
So on the fifth cycle,
it redundantly wrote the value it just read,
and on the sixth cycle,
it wrote the correct value.
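The cycle-by-cycle sequence described here might look something like this in an emulator core. This is a simplified sketch under assumptions of my own (a one-bit carry flag, a bus object that logs every write); it is not code from Matt's emulator. The point is the cycle-5 write of the unmodified value, which is only observable when the address has side effects.

```python
class Bus:
    """Minimal memory bus that logs every write, dummy or not."""
    def __init__(self):
        self.mem = bytearray(65536)
        self.writes = []           # (addr, value) for every bus write
    def read(self, addr):
        return self.mem[addr]
    def write(self, addr, value):
        self.writes.append((addr, value))
        self.mem[addr] = value

def rol_absolute(pc, carry_in, bus):
    """Six-cycle ROL abs as described above. Returns (pc, carry_out)."""
    pc += 1                        # cycle 1: fetch the ROL opcode
    lo = bus.read(pc); pc += 1     # cycle 2: operand address, low byte
    hi = bus.read(pc); pc += 1     # cycle 3: operand address, high byte
    addr = (hi << 8) | lo
    value = bus.read(addr)         # cycle 4: read the operand
    bus.write(addr, value)         # cycle 5: dummy write of the UNMODIFIED value
    result = ((value << 1) | carry_in) & 0xFF
    bus.write(addr, result)        # cycle 6: write the rotated value
    return pc, value >> 7          # old top bit becomes the new carry
```

Against plain RAM the dummy write is invisible, but if the address is a memory-mapped hardware register, that extra write is exactly the observable quirk discussed next.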
And you think, again,
totally unobservable.
Why would you care?
Except lots of hardware was memory-mapped in
those days, as it is now, in fact, right? But that meant that reading and writing to memory sometimes
had a side effect, right? And so nobody would choose to do this, really. But if you are, for example,
making a game and you want to make sure no one can copy your game or no one can at least,
you know, hack it to put extra lives
or cheats or whatever into it.
What you might reasonably do
is encrypt your game
and then write the decryption routine
and have the decryption routine,
like decrypt the code
that's immediately after it
and then run into it.
As in the last instruction
of the decryption routine,
the next thing after that is the first byte of the thing it decoded, right? There's no breakpoints on these
machines; there's nothing like that. There are, like, registers you can set that say, if we get
interrupted, reset the machine and wipe RAM. So once it's
got to that point, it's like a one-way street. The only thing I can do is reset the computer after that.
But I can play the game, right?
But it means I can't get into it and look at it and hack it or anything like that.
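The simplest version of the scheme described here, before any timer tricks, amounts to an exclusive-OR pass over the bytes that follow the loader. A minimal sketch, with a made-up key stream and payload; XOR being its own inverse is exactly why reading the loader's instructions was enough to defeat it.

```python
def xor_pass(payload, keys):
    """XOR each byte with a repeating key stream. Because XOR is its
    own inverse, the exact same routine both encrypts and decrypts."""
    return bytes(b ^ keys[i % len(keys)] for i, b in enumerate(payload))

game = b"LDA #$00 ..."                 # stand-in for the game's machine code
keys = [0x5A, 0xC3, 0x17]              # "random keys that I made up"
encrypted = xor_pass(game, keys)       # what ships on the cassette or disk
decrypted = xor_pass(encrypted, keys)  # what the loader reconstructs, then falls into
```

A hacker who can run those same instructions on the loaded bytes, then save the result instead of jumping into it, has a decrypted copy; hence the escalation described next.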
But obviously, if you can see the instructions that do the decoding,
because you can load it off the disk yourself,
you can just do yourself what those instructions did,
either by copying them somewhere else and running them
and then running the decoding, and then maybe saving it before you run into the decrypted game. And now you've got a decrypted
version of the game. And so there was a cat-and-mouse game in the early 90s about this kind of
stuff. And the sort of cat-and-mouse game escalated from just a simple exclusive-OR with
some random keys that I made up, through to, well, what if we,
as the encryption writer, use random bytes we read off the disk, in places that you
wouldn't expect? Okay, fair enough, that stops you from copying the disk. What if we start doing
things like: there are these hardware timers, I can read from a hardware timer, and the value is always
changing. Now, if I as a hacker copy the code down
low and try to do this, the timer is changing, and because I'm manipulating it myself externally, the timer
is changing differently than if it was running free. So now the decryption key
isn't the same, and so I don't decode the game. But there are ways and means of stopping the timers,
and then rewinding them back exactly the right amount, and then carrying on, and stopping them
again, and rewinding them, and all this kind of nonsense. And then,
so eventually somebody came up with a protection system where they threw the kitchen sink of everything they could possibly think of.
That was, like, essentially...
it was deterministic,
but unspecified.
One of the things was things like rotating some of these timers.
If you rotate the timer,
then obviously reading and writing to a timer has a side effect
of setting and resetting it. And this ROL was one of the many things that was done that would
cause this weird behavior that no one would have known about. And in fact, many years later, we tracked down
the person who wrote this protection system and said, how did you know all this stuff? Because, you
know, all these things fed into the key. And, you know, things like enabling interrupts, and then
having these timers make the interrupts go off, and then the interrupt deliberately corrupting registers,
so the decryption routine would actually return
in a specified place with a different accumulator value
than when it started. It's like, who would do such a thing?
And we're like, well, how did you know what it was going to do?
And he said, I didn't.
I just knew it was deterministic.
But then how did you encrypt this?
How did you have this depth of
knowledge and whatever he said well i desoldered the chips off of the board i disabled the
functionality that wipes the memory when it breaks you know when when it gets when it when it um
right uh when it hits the end and through some clever tricks which i won't go into now as we're
already been talking about this for 10 minutes but um he found a series of um decryption or rather sort of um yeah i suppose it is decryption things which formed a
ring of cycle 255 and so he painstakingly did this 255 times and then saved the penultimate one
and that was the one that went to the fabrication factory and he still doesn't understand how it was
now the i shall tell you now why i know this which perhaps will segue in or we can go back and this is because many years later well
first of all, I tried to hack that game as a kid and I failed. My friend Richard and
I wrote a 6502 simulator in 6502 to try and simulate it perfectly, in order to decode the stupid thing, and we failed. But
fast forward 20 years and i wanted to write an emulator for my bbc my beloved bbc micro
and in order to just run the game not try to decrypt it just to run it normally i had to solve
all of those problems and really understand at the lowest level what's going on so i can tell
you that the fifth cycle of a ROL writes the unmodified value back, and I know why,
because I simulate it in the emulator in order to have this work. And in fact, the protection
system is now one of the unit tests of my emulator. It's like, does it decode? Yes?
Good, right, there you go. So anyway, that was a huge derailment. That was great.
i i've looked through your your emulator a little bit and actually was uh poking at some of your
unit tests because um i was curious about um one of the the other attributes of the 6502
which is documented behavior, it turns out. And that is the zero-page addressing mode,
which I thought was, I don't know if this was common at that time,
but it was an interesting thing.
Yeah.
I think it was, basically, it was the only way for it to have pointers
because, as we've discussed, we had an A register, an X register,
and a Y register, and those were all 8-bit registers,
unlike the Z80 with its paired HL, DE, BC, AF registers.
The 6502 didn't have 16-bit registers,
but you could indirect through a pair of memory locations
in the zero page, as you say.
So the first 256 bytes of RAM were just normal RAM.
It wasn't cached, it wasn't special, it was
still out on the board. But the opcodes that accessed it, first of all, they only
needed one byte if the opcode said hey i'm a zero page opcode then there was only one byte for the
address and the second there were several indirect instructions that would operate through a pair of
zero page addresses and treat it as a 16-bit address to then read
from somewhere else so it's almost like you had 128 16-bit registers available to you which was
really quite a powerful concept. And some of the more exotic architectures these days that
have, like, belt computers or register files that spill, there's a sort of flavor
of that there isn't any way of like offsetting the zero page you could use the x register to
actually offset into the zero page but that was very uncommon you know it was essentially like
you had to very carefully allocate your zero page if you're writing a game and you're like well the
operating system such as it is still writes to this in an nmi routine so i have to like leave
those ones alone. But I can,
if I page the ROMs out from the BASIC and then disable interrupts, then I can use &40 through &4F or whatever it is,
you know,
and somehow you could get some memory in the zero page.
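The indirect mode Matt describes, the 6502's `LDA (zp),Y`, can be sketched like this: a pair of zero-page bytes holds a 16-bit pointer, indexed by Y. The helper name is mine, and the model ignores cycle timing and page-crossing penalties.

```python
# Toy model of the 6502's "(zp),Y" indirect addressing mode.
# A zero-page byte pair acts as one of the "128 16-bit registers" in RAM.

def lda_indirect_y(mem, zp_addr, y):
    """Load a byte via a 16-bit pointer stored in two zero-page bytes."""
    lo = mem[zp_addr & 0xFF]            # pointer low byte, in page zero
    hi = mem[(zp_addr + 1) & 0xFF]      # pointer high byte (wraps within page zero)
    base = (hi << 8) | lo               # the 16-bit address held in RAM
    return mem[(base + y) & 0xFFFF]     # indexed by Y

mem = [0] * 0x10000
mem[0x70], mem[0x71] = 0x00, 0x30      # zero-page pair $70/$71 points at $3000
mem[0x3005] = 0xAB
assert lda_indirect_y(mem, 0x70, 5) == 0xAB
```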
And yeah,
I think it was a really interesting and innovative way.
And again,
it's very simplistic,
right?
And it's not traditionally RISC-y, because it's not, like, load-store.
I mean, like, you know,
there were instructions that would do these
read, modify, writes and things like that.
But it was a really simple set
of very straightforward concepts
that were used to build
all of the rest of the instructions.
And I think that's sort of
what I think of as being RISC-y,
even though that, yeah, as I say,
it's not strictly load-store.
Right. Yeah, it was interesting.
I feel like the, I haven't encountered many other instruction sets with the,
I think there were 13 addressing modes maybe for the 6502.
Sounds about right.
There'd be a file somewhere in the emulator with them all listed.
Right, right, yeah.
They've shrunk down the number of instructions,
but you can execute them all in a variety of ways.
Right.
Yeah. And it wasn't as beautiful as some of the later things that were inspired by it, in terms of the way it's laid out, so you couldn't mix and match them arbitrarily. But you could do
most things in most ways you know that was kind of a nice nice thing but yeah yeah absolutely the um
the kind of mention of having more, essentially more registers available.
For some reason, almost every episode of the podcast thus far, register windows have come up.
Right. That's probably why it's top of mind for me.
Right, right, right. I forgot that you had listened to some of them.
I will say that I can give you
a preview of the next episode that is going to come out which is very relevant to register windows
because my interview is with Robert Garner, who designed the SPARC instruction set. Oh, cool. So,
he goes very deep on register windows. I look forward to that. Folks
listening to this now will be like, well, this is a real window into when things are recorded, right? Right, the
curtain is fully open now. Exactly right. They're getting a window into my need for a backlog to
to keep going here but um well you know that's that's uh quite the experience you had you know
while you were still kind of growing up uh being exposed to all these different things you mentioned um that it was kind of a blessing uh to i i might say like have to be
exposed to computers at that level right because it was a choice right but there was there was right
i was receptive i know right but but it was there you know if you had the need to to make a game
that was how you were going to do it yeah right i mean one of the things that i felt uh kind of like growing up and you know earlier on in my career and that sort of
thing where i was being introduced to computing at uh with machines that were much more complex
and also uh tooling that was much higher level and more capable um is that you know investing
in kind of like learning uh lower level concepts and that sort of thing could be viewed,
I would push back on this notion, but could be viewed as kind of unproductive, right?
Not doing the most productive thing there.
Why would you learn how this stuff works when really it should be hidden from you?
If you're learning to drive a car, you don't need to understand how an ignition coil works, right?
Right.
But it's kind of, it is useful to
know somehow. Absolutely. And apparently, you know, there's other people like us who think
that as well. One of my favorite quotes was from Tom Lyon, who has been on a number of
different podcasts. He was an early Sun employee, and, I always butcher the quote, but it was something like, abstractions are
meant to create boundaries for machines, not people. Or, people are meant to pierce abstraction
layers, even though machines are not. So it's kind of like, yes, we should use abstraction to
enable us to do things uh faster and you know with more certainty but that doesn't mean that we
are resigned to not look.
No, I think that's it. Yeah, abstractions are a tool and we can use them to help, and they can
be used in all sorts of things you know like they can be used in an organization to say well you
know this isn't really how that part of the organization works but what we have to do
is fill in this form and then a bit later on a computer arrives and i don't need to know anything
about how that happened but you know that's how I purchased things or whatever.
Or we can use it as like, well,
I typed this thing into the computer and then I get linear algebra solutions.
And that's great.
But as long as you can keep going down the levels of abstraction,
as long as there's no barriers to you, you know,
I think you should always be aware of
the layer below you and a couple of layers above you,
if such a thing exists, you know. And it doesn't matter how low you are.
There's always at least one layer below you.
As I'm sure you're learning in this journey, too.
You know, you think things you take as read and then you're like, oh, wait, someone had to.
Oh, yeah, that doesn't work the way I thought at all. I just assumed that, like RAM, it just works, right? You're like, no, there's a whole set of things to think about,
how does that work right right absolutely well okay so moving um after you know maybe like
going through high school i imagine um was uh some of that storyline there and then uh you
eventually go to university. What's kind of your, like, most folks when they're going to university,
they're thinking, what do I want to, you know, do and learn about and that sort of thing.
What was kind of your motivation at that time?
Obviously, lots of exposure to computing.
But did you see that as a career path?
No, that was it.
I think it never ever crossed my mind.
That's not true.
I think it probably did cross my mind.
But I had always been interested in physics and science in general.
And I sort of designed a route in my head that was like, I'm going to go to university.
I'm going to get a master's in physics.
And I'm going to do my PhD.
And then I'm going to do quantum physics or astrophysics or something like that.
And this computing thing was just my
almost life-defining hobby right even then and i never really thought about it as anything more
than that. My journey for physics started from, so this is a strange non sequitur story, but
like in the UK in the 80s, I used to wake up really, really early and there would be nothing on the television.
The TV stations would shut down.
There were only four of them, or probably three or two of them at the time even then.
And so overnight it was just a test picture of like, you know,
with the little like nothing here.
But there was one channel where a distance learning university
used to transmit its lectures that you would set your VCR for like 3 a.m or 4 a.m and you
would record an hour-long lecture and i used to wake up and watch this because it was the only
thing that was on and i have these vivid memories of these bearded 70s men dropping marbles into
like bowls, and then, through the lens of extremely primitive camera technology, superimposing
all of the various frames to show the pattern that
the marble was rolling in and then writing out equations on boards about it and i was like again
i think the common theme here is like weird sigils on a screen gets me interested right
and so that was started my interest in physics and then yeah i went to university i studied physics
and I "studied", I'm gonna have to do air quotes
here that your listeners will not see uh because really as soon as i've discovered the internet
such as it was back then and computers where they were more they were bigger and more powerful than
i was used to so by this time i'd graduated on from the bbc master so like i think i was 17 so it's like last or
penultimate year of of uh high school that i got an archimedes which was made by acorn who were the
same company made the bbc micro it was a natural progression from that but they had decided to
jump this 8-bit era all the way to 32-bit and forget this 16-bit era so like all my contemporaries
so I hung on to the BBC three years past its best-before date, right? It was way overdue. Everyone else was already on
their ataris and their amigas and learning about blitter chips and things that were really cool
and interesting but i was like no i can do this on my 8-bit machine it's fine and then eventually
when i gave in i thought well i'm going to go with acorn still and by this point acorn had designed their own 32-bit microprocessor and this microprocessor
was inspired heavily by the 6502 that they'd cut their teeth on the team and knew all about it they
went out to western digital or whoever was the designer at the time of the 6502 and said can you
tell us about how you make a chip and it turns out it's like three people in by this point three people in
like a bungalow in texas going like sure this is how we made it like what you this is so it's
possible for like mortal humans like a small number of them to design a chip and they're like
yeah, of course it is. I think, you know, the original 6502, Bill Mensch and all that kind of
stuff, was, you know, bearded men again, unfortunately, as is the way
in our industry at the moment although we're trying to change that right um right with sharpies
on a big acetate sheet drawing out the 6502 but it was you know the the later versions of it were
done uh similarly and so anyway the folks from acorn came away and said well we can do this too
how hard can it be nobody told them how hard it was to make a chip so they you know they
were like we can do this and they designed this really beautiful 32-bit machine and they'd learned
from the 6502 where it's like this almost nice separation of addressing modes and flag setting
and all this thing, and they thought, well, if I've got 32-bit fixed-size opcodes, I can fit
them in nice places and so it's really kind of a nicely designed system
and they called it the Acorn RISC Machine, because it was very much a load-store architecture
with 15 registers or 16 if you include the program counter and of course we all know i'm doing the
whole long reveal for you here as you're smiling and you know what i'm talking about here as well
almost all of your listeners but this was the arm chip the very first arm chip and so the very first 32-bit machine i ever got my hands on was an arm and just like the acorn before it uh sorry the bbc
before it, straight into assembly, because it was the same BASIC. You could open squiggly braces and
start typing ARM assembly. And it was, you know, it was beautiful, it was so simplistic.
uh it was super fast for the clock speed.
I think mine was like an 8 megahertz or 12 megahertz.
And the multiple load and store instructions that it had,
which was a lovely way of reading and writing multiple registers
from an ascending or descending memory location,
which was perfect for pushing and popping,
going in and out of functions.
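The multiple load/store Matt describes, ARM's LDMIA/STMIA, can be modeled roughly like this: one "instruction" moves a whole register list to or from ascending memory addresses, which is exactly what makes a fast copy (or blit) loop. The Python helpers are illustrative, not a real ARM model.

```python
# Sketch of ARM-style multiple load/store: one operation, many registers.
# Function names mirror the ARM mnemonics but the model is invented here.

def ldmia(mem, addr, count):
    """Load `count` consecutive 32-bit words into a 'register list'."""
    return [mem[addr + 4 * i] for i in range(count)], addr + 4 * count

def stmia(mem, addr, regs):
    """Store the register list back out to ascending addresses."""
    for i, value in enumerate(regs):
        mem[addr + 4 * i] = value
    return addr + 4 * len(regs)

# Copy 8 words per load/store pair: "read from here, put that over here".
mem = {0x1000 + 4 * i: i * i for i in range(8)}
regs, src_ptr = ldmia(mem, 0x1000, 8)
stmia(mem, 0x2000, regs)
assert mem[0x2000] == 0 and mem[0x201C] == 49
```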
But also it was amazing because you could point it at the screen
and blit sprites as fast as you could. So although it didn't have sprite hardware, to write
games you could do pretty well with these with clever use of these multiple load and store
instructions you know read from here put that over here um and so i had learned arm assembly and i'd
thrown everything out the wall and so i was writing everything still in arm assembly so i got to university that's where we were but before we started this
i discovered the internet and the internet was amazing and uh one of the first things i did was
write an internet relay chat client for my acorn because they were still niche even in the uk you
know nobody had them right and so if you wanted to join in irc you either went to the the lab and you used irc
like the command line client in unix or if you had as a client on your your your local machine
and you had like a serial cable to connect to the network then you could you know actually uh
do it from a gui i decided to write my own and because i only knew assembly i wrote the whole
thing in arm assembly and it's i don't know how many thousands and thousands of lines it's on github
if you want to go and laugh at it all but it was well link in the show notes for sure
if people want to torture themselves but it was a fascinating experience of learning so
while i was supposedly doing my physics degree i was writing this irc client um the irc ended up, because all IRC clients at the time had like scripting languages built in them,
so you could like do auto greeters and things like that.
I ended up writing a scripting language in it, which looks remarkably like BBC Basic,
except it was object orientated.
And then I was doing managed memory.
And so I invented this way of cleaning up the memory after you'd finished with it
without having to free it manually, which I later discovered is mark-and-sweep garbage collection, and I'm like, oh, right. And at some point along this
path, it should have dawned on me that I should ask my roommates, who were doing an
actual computer science degree what the heck it was i was really building um but towards the end
of this it became obvious that it was absurd to be writing large GUI applications in pure assembly.
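The memory-cleanup scheme Matt describes reinventing, mark(-and-sweep) collection, works roughly like this: trace everything reachable from the roots, then discard the rest. The object model below is invented for the sketch.

```python
# Rough sketch of mark-and-sweep garbage collection.
# The Obj class and heap layout are made up for illustration.

class Obj:
    def __init__(self, *children):
        self.children = list(children)
        self.marked = False

def mark(obj):
    if obj.marked:
        return
    obj.marked = True
    for child in obj.children:       # follow references recursively
        mark(child)

def collect(roots, heap):
    for obj in heap:                 # reset marks from any previous pass
        obj.marked = False
    for root in roots:               # phase 1: mark everything reachable
        mark(root)
    return [o for o in heap if o.marked]   # phase 2: sweep the rest

a, b = Obj(), Obj()
c = Obj(a)                           # c -> a; b is unreachable garbage
heap = [a, b, c]
assert collect([c], heap) == [a, c]  # b would be freed
```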
And so begrudgingly, and because I wanted to have my programs run on the computers at the university lab, I learned C.
But C back then was the kind of C where the compilers weren't sophisticated enough.
The kind of thing where you could see, pun intended again, what assembly was
going to come out the other side. You know, int x equals zero, oh, I know that's going to be an LDR,
you know, R0, comma, or whatever. MOV, sorry, see, I've forgotten all these opcodes now.
um and i think you know this is a setup for where i ended up with you know seeing the way that the
compiler takes your code and puts it out into uh uh, uh, right into the output. But, uh, yeah, so that was, that was how I learned, um,
C. Um, and where did you get your compiler, uh, from at that time? So for, uh, at, uh,
university, it was GCC or the CC that was on the SPARCstation, the IRIX workstation, or whatever
it was i could
get a hold of around this time as well we inherited between me and my roommates we
inherited a multi-user dungeon source code which was kind of how i learned c really was was having
to hack on it and extend it and add new stuff to it um so that was that was fun um and um yeah so
that would compile on whatever machine we could steal time
on to run our mud and have other people connect to which obviously was not very many people didn't
they didn't like the idea of us running long-lived services so right um yeah you could imagine um
And, oh, I've just lost my train of thought, sorry. What came after? So, you're kind of, you know, learning C,
you're experimenting with uh various machines and uh you know running some of these services and
that sort of thing at that point did you start to think okay maybe maybe i'm spending a lot of time
on this maybe this could be more related to my profession as well i don't know that I did explicitly you know I was I was I I was scraping by my degree I got like a
mid-tier degree a 2-2 in the uh by the end of it all and in the last few weeks I started looking
for a job somewhat half-heartedly and then somebody on IRC in the hash acorn channel
said well you could try applying to my company. We make computer games.
And I went, well, I've always made computer games.
I've got them around.
You know, this MUD is kind of a computer game.
It's a different kind of computer game.
But, you know, I've still got my eye in, as it were.
So I messaged him and he said, gave me the details.
I applied and that was my route into the games industry,
which was basically my career for a decade.
It was based on a random conversation with an internet stranger on an IRC channel using my own handwritten IRC client from a computer that nobody knew about.
Right.
And so did you start working there pretty much immediately and also uh was
this like was there it seems like i'm not super familiar with what the culture was like um at that
time obviously computing you know was a a big part of the university and you know you mentioned the
government kind of like commissioning computers right so it wasn't like this was a uh you know
an unheard of thing but was there any sort of notion of like you you were going to get a phd
right now you're gonna go work on games or was it pretty much not really i mean i mean it probably
took 15 years for my mom to stop asking me when i was going to get a proper job right so you know
from her point of view i never had a proper job but then it was a games job anyway so i mean you
probably could have walked into some mortgage company writing admin systems or whatever and that
would have been seen as like a real good real job but but um no it was so yeah i i got the job
actually it was the the end of the penultimate year i i know that's um yeah i don't know how
common that is over here but like you know there's not kind of an internship but I went for it anyway ahead of time and they said we don't need you to have a physics degree
you should just quit and come and work for us but yeah I thought I better at least finish my
degree and have something to have my name on which actually turned out to be a very good decision
later on when I tried to move to the US and it was very helpful to have a degree in order to help the process there but that's a whole other story right but no so I actually went to so the company
was called Argonaut Games it was one of the biggest independent games companies in the UK
in fact probably in Europe at the time it ultimately floated on the stock exchange so it
was a big enough company to go on to the the uk stock exchange although that was kind of the beginning of the end unfortunately like so many dot com style booms
although a lot earlier than that um the argonaut is probably noted because uh it was the sort of
silent partner in the Super FX chip, which powered Star Fox on the Super Nintendo. So if
people have ever played Star Fox, you know. I came in at the tail end of that,
Star Fox had been out and there were some sort of secondary and even tertiary games that
were using the Super FX chip. But Jez, the CEO, had sort of basically lied to Nintendo,
telling them that he could easily generate you know 3d graphics it can't be that hard kind of
thing again so there's kind of a theme here going right you know like how hard can it be he said and then he sort of came back from
a meeting with japan um there's a long convoluted story but this is an extremely short version and
probably equally inaccurate version, and basically said to people, who knows how to
make ASICs? And, maybe, I don't know. And so they designed this chip, which was essentially a 3D co-processor, well
before its time although insanely convoluted to wedge it into a cartridge as a sort of secondary
on a system which wasn't expecting to have a secondary chip other than like ram and ppu and
maybe some other sort of addressable stuff so it kind of involved a lot of dancing between the CPU that was running instructions
where essentially like read from RAM, read from RAM, read from RAM, write to RAM,
read from RAM, write, you know, to copy the data that was being created by the 3D accelerator.
The main CPU was just like hot passing plates.
And there was some DMA behind the scenes.
I know it was very, very complicated. But they got 3D graphics out of it. And, you know, I was lucky enough to work
with the folks that designed the chip. And Argonaut itself separated into ARC, which became
a chip manufacturer, although subsequently bought out by various folks. They had their own soft-core
CPU, which is kind of interesting. And then the technology group, which is what I was actually working with. So I got to work with
some of the tech folks from there and you know there's some fascinating things that they stories
that they had um but yeah so that was that was actual silicon that was designed and implemented
um and you know around that time as well was like the of the consoles. And so we were starting to see these really strange beasts
that Sony and Sega and Nintendo were putting together.
So I was exposed pretty quickly to these very esoteric, to me,
my lovely, beautiful ARM instruction set notwithstanding,
these strange processors, the Hitachi SH-4 in the Dreamcast, which is probably my
favorite, with its 16-bit fixed-width instructions and its strange addressing modes and things like
this and you're like well yeah this is this is cool um and you know starting to learn um and
have very simple tooling about how multiple issue stuff was going to happen like cpus that could do more
than one thing at a time the arm was pipelined and very beautifully so like everything was done
it's like extremely easy to predict what was going on but um with things like the sh4 they were like
well there were pairs of instructions that you would go together provided there were no
detected hazards between the two instructions and they were of you know sort of appropriate types like you couldn't do two
multipliers at the same time of that kind of thing then they would pair
together and so you would see these you know rather you would write this is
still at that time when the compiler was good but it was still pretty worthwhile
spending the time to write the assembly yourself right and so you would sit there and pair them
together and uh that was that was a really interesting learning experience and i that's
so my github is a mind of like nonsense that i've left behind from from the years i've got before i
thankfully got the permission from jez to to publish the source code to this so you can go
and actually have a laugh at the source code and it's not just mine obviously but the the renderer is mine you can go look at some comments from like 2001 i think
that i was writing where i'm swearing and cursing at various things that don't actually work the way
they are and you can sort of see this strange format that i picked up where i was pairing
instructions in the assembly, and where there were unpairable instructions I would put a NOP,
so that I could show it. But it was not a real NOP. It was a NOP that I could #define in or out.
And so I'd assemble it once with the NOP in place
and then run it, measure how fast it was.
And then I would disable, sorry,
get rid of the NOP completely and then compile it
and then prove that it was the same speed,
give or take the fact that the code
was a tiny bit more compact, right?
It was a little bit more.
And that would prove to me that I'd done it right and i was still pairing the instructions that i thought
i was i was pairing right so that was it yeah this was all this was all um like explicit instruction
level parallelism it wasn't doing the machine itself wasn't doing any of this for it was yeah
the machine would would very simply pick up like four bytes at a time and if you could see the two instructions were like okay based on it's like the the registers didn't overlap and there were instruction types
that were compatible with each other then it could issue them together but yeah it wasn't doing any
out of order it was like just two at a time and around the same time actually the the x86 was in
the same kind of world this was like um intel had the u-pipe and the v-pipe
they were the two issue stations and you know there was everything i never really did much of
this. But around me in the ATG group were the folks who were writing BRender, which was a
so-called blazing renderer. Of course, these terrible names that we come up with.
But BRender was used in a number of games, it was like a middleware, for a number of games including things like Carmageddon, and Croc PC, which was a game
i actually worked on and but the the interesting thing was um that yeah they were still writing all
this stuff in assembly for because it was software rendering pre the you know the beginning of like
um uh graphics cards you know they were they were
around, but a lot of people, we couldn't afford them. Or it was like the 3dfx, which was a
secondary graphics card you would plug in, and then you would have to put a pass-through cable from
your VGA card, that did your 2D graphics, up through into the 3dfx, and then you'd have another
cable that went to your monitor. And you could essentially hear the relay click
as it went into 3D mode and took over, all this kind of nonsense.
But yeah, so there was a lot of concentration on, like, how do we lay out the code so that the U and the V pipes are fed, so that certain instructions could go in the U pipe and certain other instructions could go in the V pipe.
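That kind of dual-issue pairing rule, in the spirit of the SH-4 pairs or the Pentium's U and V pipes, can be modeled very roughly: two adjacent instructions issue together only if their types are compatible and their registers don't conflict. The instruction encoding and rules below are simplified assumptions, not any real CPU's.

```python
# Illustrative pairing check for a simple two-wide in-order machine.
# The dict-based "instruction" format is invented for this sketch.

def can_pair(a, b):
    # e.g. only one multiply per cycle (type compatibility)
    if a["type"] == "mul" and b["type"] == "mul":
        return False
    # b may not read or write anything a writes (no detected hazards)
    if a["dst"] in b["srcs"] or a["dst"] == b["dst"]:
        return False
    return True

add = {"type": "alu", "dst": "r1", "srcs": ["r2", "r3"]}
mul = {"type": "mul", "dst": "r4", "srcs": ["r5", "r6"]}
dep = {"type": "alu", "dst": "r7", "srcs": ["r1"]}   # reads r1, written by add

assert can_pair(add, mul)        # independent, compatible types: pairs
assert not can_pair(add, dep)    # RAW hazard on r1: issues alone
assert not can_pair(mul, mul)    # two multiplies can't go together
```

Hand-scheduling at that time meant reordering your assembly so that as many adjacent pairs as possible pass a check like this.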
And they would be issued together again, similarly, if they didn't have the right, the wrong kind of hazards and then just as i was getting into the pc stuff myself
um the pentium uh pro was out i think it was a pro and um i had one of the early prototypes
the Klamath. It was a huge thing. And for a long time afterwards, actually, after the, so,
spoiler alert, Argonaut folded and a lot of the stuff went home with the employees. And my Klamath,
this strange prototype, went home with me
and was, for the longest time, my dial-up modem, like, gateway machine,
running on my prototype with "Property of Intel, do not distribute" all over it. Like, fine, right?
but anyway um but at this point was the first time that they were starting to do proper out of order execution and so we had them come into us and say hey you know this unv pipe nonsense that you've been doing
forget it um you just can't predict what it's going to do anymore it's so clever it optimizes
for you everything's magic you know use our compiler uh everything will be fine um just
measure it we have this thing called vtune which kind of tells you after the effect what happened.
And, you know, great, I guess.
And, you know, there were obviously things
that you could see that it was doing,
but it was, we started to consider it,
at least I started to consider it really a black box of like,
I just don't know what magic it's doing.
And so around that, you know,
so I spent some time on PC things.
And so just enough to get that kind of exposure around that time.
And then I moved on to Xbox and PS2, which was similarly painful.
That one, certainly for the VU processors,
there was dual issue, so it's sort of VLIW style,
dual issue, with the U and the V pipe very explicit in this long
VLIW thing. And there were no data hazards, you just had to remember, oh yeah, if you do a multiply,
it'll get written back on cycle five you better be ready for it but that meant you could interlace
things yourself. You could go, well, okay. And so I think Carl Graham, who was one of the Super FX
folks, actually, he came up with this rather novel spreadsheet programming method, with macros in the spreadsheet, so that you would type the instructions and things.
And it would highlight with colors where the result of this instruction comes out down here.
And then you could work out that it would actually fit and all this.
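The latency bookkeeping that spreadsheet made visible can be sketched as a schedule checker: an instruction issued at cycle c doesn't produce its result until c plus its latency, and it's on you to make sure nothing consumes it earlier. The latencies and instruction format below are made up for illustration.

```python
# Sketch of latency-aware scheduling on a VU-style pipeline with no interlocks.
# Latencies here are invented; the point is the bookkeeping, not the numbers.

LATENCY = {"mul": 4, "add": 1}

def check_schedule(program):
    """program: list of (cycle, op, dst, srcs). Returns first hazard or None."""
    ready = {}                                  # register -> cycle its value lands
    for cycle, op, dst, srcs in program:
        for src in srcs:
            if ready.get(src, 0) > cycle:       # used before the result is back
                return (cycle, src)
        ready[dst] = cycle + LATENCY[op]
    return None

bad = [(0, "mul", "acc", ["x", "y"]), (1, "add", "out", ["acc", "z"])]
ok  = [(0, "mul", "acc", ["x", "y"]), (4, "add", "out", ["acc", "z"])]
assert check_schedule(bad) == (1, "acc")        # consumed too early
assert check_schedule(ok) is None               # interleaved far enough apart
```

Interlacing other useful work into the gap, rather than waiting, is what made hand scheduling worth the pain.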
And it was very painful to do, but it was actually necessary. At that time, there wasn't even a C compiler that could target this stuff
because it was so Byzantine, you know, it's so weird
and so very special case for geometry processing.
You know, we call it vertex and pixel shaders,
well, vertex and geometry shaders, I guess, these days.
Later on, they had like a very smart assembler
that let you write the assembly without thinking about the hazards
and it did the interleaving and the VLIW-ing, so it got a little bit better. But that was my sort of
my first um real foray into oh my gosh there's a lot of things that the cpu could do for us that
I've been spoiled into having them done for me, right, and this one doesn't do it. So, I'm not knowledgeable about making games at all,
which I feel is kind of an uncommon thing, but I just have not had any interest in games themselves — I'm very interested in the hardware that goes along with it. So I'm curious: you know, when you're writing software that's going to run in a data center, you don't really think about the underlying hardware that much. And I believe now game engines and things like that allow you to abstract across multiple platforms — well, we might get into when you do think about the hardware sometimes. Were y'all writing games at that time that were targeted to just a single platform? And was it a lot of work to move from one platform to another, or deliver on multiple platforms?
So, yes and no. I think all of the best games of that era were single-platform, and they really played to the strengths of their individual platform. There was enough to discriminate between the platforms. You know, like the PlayStation had an insane fill rate — it could write pixels to the screen so quickly — but it could hardly do anything else: it had hardly any blending modes, though there were some tricks for doing some of the things you would otherwise like to do. Whereas the Xbox's fill rate was not so high, but it had higher vertex throughput and was easier to work with.
But you know, you would trade off the different approaches. But yeah, the games that I worked on were actually multi-platform, though we didn't really have a generalized engine. The engine that me and my friend Nick Hemmings wrote became the de facto engine for two platforms and a few games around the time — it powered SWAT: Global Strike Team, which was one of the SWAT franchise games, for Xbox and PlayStation 2. PlayStation 2 came along late, because at the time
we were Xbox exclusive. And so we kind of went to town: I wrote a shader language, and I wrote a shader compiler that compiled from my little DSL down to a vertex shader program, which could, you know, calculate all the UVs and everything. I'd been enamored by Toy Story, and I'd been reading up about how Pixar did things, and I'd heard about these shader things and was very excited, so I did all this stuff. And I mean, that was fascinating — the way the systems were working under the hood, and how they managed to get the power that they got out of what was a relatively early NVIDIA part. And interestingly, they told us, you know, "We can't tell you how it works, because we have agreements with NVIDIA. It's DirectX as far as you're concerned." And then they would cough politely and say, "But if you look in the header file, maybe you'll learn a thing or two," and then walk away. And you open up the header
file and all this. So DirectX is COM — I don't know if you've ever heard of COM or know about COM — (I have, yeah) — it's this really janky business thing where you query an object for what interfaces it supports, and then you say "get me that interface" and it returns you one. Essentially it's all C++ virtual tables behind the scenes, or C function pointer arrays, or whatever. But you look through and you see that it's actually just a bunch of macros that they defined in a header file to make it look just enough like COM for you to be able to write COM — and then, very clearly, you were being handed back structures that were obviously the actual things being sent to the hardware. Like, thank heaven for that, you know — we're able to talk to the hardware ourselves. Because again, the earlier machines — the PlayStation, the PlayStation 2, the Dreamcast — they essentially just send you the hardware manuals. You know, poorly translated hardware manuals: this register does this thing, good luck, off you go; it's mapped at this memory location, have fun, bye. So you were very much exposed, whereas Microsoft couldn't expose us at that level, because (a) they had an API they wanted, marketing-wise, to say "hey, it uses DirectX" about, and (b) they couldn't breach their contract with NVIDIA. But we got to learn how the NVIDIA chip was working. We understood the various tricks that it was doing: how it was stamping
down multiple pixels at once, and how it was discarding things based on some clever tricks behind the scenes. It was a fun experience to learn that, you know, CPUs don't have to look like "fetch an instruction, run the instruction, get on with the next instruction." It could be like: no, fetch 80 copies of the data and run them on little threads that are running the same bit of code but different data — not in a SIMD-y way, but parallelized across in another way. It's really interesting: how do you hide the latency? Well, you just do another one. Just keep doing more of the same one; you're doing the fetch in the first cycle for loads of them. Oh, that's really clever — I'd never thought of that. So that was an eye-opener. And — there was a reason we were going this way, and I can't remember what it is.
No, just targeting multiple different platforms.
Oh, that's right — it's a different platform. So yeah, we painted ourselves into a corner by putting all these whiz-bang features into the Xbox version and then saying, well, the Xbox isn't doing as well as we'd like, how about we port it to the PlayStation 2? And that was a very painful operation. That's where we grafted someone else's core rendering library onto the bottom of our Xbox 3D engine and kind of pounded it until it worked, and found a number of ridiculous ways of getting the full-screen effects that we had on the Xbox — using the Xbox's beautiful blending modes — to work on a PlayStation 2. They were all variants on the theme of: you've got a 24-bit frame buffer in memory, but you lie and say, no, it's an 8-bit frame buffer, by setting the flag that says it's an 8-bit frame buffer. Well, it's actually planar, and the way the RAM chips on the graphics unit work maps each plane in a particular way, which means that the red pixels are like a 16-by-2 array if you're viewing this 32-bit buffer through an 8-bit lens. And so you can draw a little set of triangles that just picks out those, and then you can use it as a multiply — because it's got an 8-bit multiply, you can do an 8-bit multiply, so you can do the red multiply. If you do this, though, that's only 16 pixels; now you have to move 16 across and grab the next batch of red, and then the next one — it was zigzagging and all this stuff. So you'd end up sending hundreds of thousands of triangles to the system to pluck out the red, the green, the blue independently, to then map a full-screen red triangle, a full-screen green triangle, a blue triangle — to essentially get a 24-bit multiply: red with red, green with green, blue with blue. You're like, why does it have to be so difficult? But right, it makes you appreciate the trade-offs that you make in this design space. My understanding was that the blending modes that were available —
so the blending modes are like, am I replacing the pixel that I'm writing to?
Am I adding to it? Am I subtracting from it? Or am I multiplying with it?
And this gives you different transparency or opacity or other special effects.
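Those four modes are easy to pin down concretely. Here is a minimal sketch of per-channel blending — hypothetical code, not any console's actual blend unit; the mode names and the clamping to the usual 0–255 channel range are the only assumptions:

```python
def blend(dst, src, mode):
    """Combine one 8-bit color channel of a framebuffer pixel (dst)
    with an incoming pixel (src) under the given blending mode."""
    if mode == "replace":
        out = src
    elif mode == "add":          # e.g. glows and light effects
        out = dst + src
    elif mode == "subtract":     # e.g. darkening, shadows
        out = dst - src
    elif mode == "multiply":     # e.g. tinting; treat src as a 0..1 factor
        out = dst * src // 255
    else:
        raise ValueError(mode)
    return max(0, min(255, out))  # clamp to the 8-bit channel range

print(blend(100, 200, "add"))       # 255 (saturates at the top)
print(blend(100, 200, "multiply"))  # 78
```

Each extra mode is another set of adders, subtractors, or multipliers per pixel — which is the silicon cost Matt gets to next.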
But you've got to have a lot of adders and subtractors and multipliers to be able to do that, and I believe the way the PlayStation worked is they pushed the circuitry out to be in and amongst the RAM of the frame buffer, so that the last-stage blending happened with the packet from the GPU going, "Hey, I just want you to do this operation to the RAM." I don't have to read it, modify it, and write it back again — I just send it to you and you do it in place. And that's a really cool trick, but it means it's really limiting, because you can't have all these other blending modes without blowing up and blowing up the amount of silicon you need. At least that's my understanding — again, this is through a lens of, you know, twenty years of hardly remembered things. But yeah, so it was a challenge to do cross-platform development. The machines were significantly different from each other.
I think you said earlier that engines are sort of commoditized these days.
And I have a friend who still develops for Unreal Engine, and another friend who does consultancy work in the games industry.
And I said to them,
oh, you must do all this stuff still.
And he goes, no, not anymore.
You know, there's five people at Epic
that do that kind of stuff.
And then everyone else just uses the engine.
And actually it was a very sad thing
that he said to me.
He said like 90% of the work
that we do in games these days is UI work.
I'm like, what?
He's like, every game is just another 3D game with whatever textures and animations and stuff, which is all solved problems, right — and AI this, and whatever moving that. And he said, but every game is its own unique bespoke shop for you to buy all of the merch; that's how they really make their money. And so you're writing, like, web pages in 3D — drawings and clicks and rebates and stuff. It's been very sad how the industry has changed.
Right. I believe after Argonaut shut down, you started your own company for a period of time. I was curious, first, why you decided to do that and what that experience was like — and two, whether maybe some of those changes you saw in the gaming industry started to lead you away from working in that industry.
Yeah, definitely. So the games industry was — and probably still is — very crunch-heavy. You know, I was fine in my 20s, when I didn't really have anything else to do and my entire life was doing this kind of stuff: I was happy to stay till very late at night, then go get last orders in the pub, then crash and come back the next day to do it all over again. So yeah, as you said, Argonaut ended up folding; it went under. And around towards
the end of this, my friend Nick and I — the guy I'd written the engine with — had been enamored of trying to make build times lower. C++ is notably very slow to build — not as bad as, for those folks listening who are screaming, chip synthesis; yeah, okay, we're not in the same league as that — but it's still frustratingly slow. And there are ways and means of laying out your code in a different structure: unlike most other programming languages, where there's only one way to do something, in C and C++ you've got a choice about "do I put this in the header file, do I make it a template or not," which makes actual structural build-time differences. And so we had this great idea — well, we thought it was a great idea — for how to change the way people program. We were kicking this idea around at the end of Argonaut, and when Argonaut went down, we looked each other in the eye and went, should we give this a go?
And my then girlfriend, now wife,
had just moved in with me.
And so I was like, well, I guess you could help
pay the mortgage while I try out my idea.
So we formed a company called Profactor, and we had this idea for storing code in a different way, so that it was easy to render the code out in a way that was very friendly to the compiler, without the human having to remember, "oh, if I pre-declare this rather than not pre-declare it" — and, you know, all the rules and things that you can do to make your code faster to compile, say, or more incremental to build. And then you can render it out in a different way and say, "hey, the compiler can see everything now" — this is a so-called unity build; it'll take forever, but you'll get a really good build out of it. You know, nowadays compilers are able to do this kind of stuff without you doing so many of those tricks, but they're still sort of relevant. Anyway, we thought it was a great idea, we built a whole bunch of technology, and it didn't work out. We ended up making ends meet by doing consultancy in the only thing we knew how to do, which is video games. So I got to do a tour of duty at a few places, including Rockstar, which was cool — to work with those folks and see some of the code. Yeah, some of the code you're like: wow, you make a lot of money out of this code; I'm very glad you do, because I wouldn't want to work on it myself — it's really complicated-looking and full of bugs and, oh gosh.
But those were fun times and we really enjoyed them.
But yeah, like anything, you get a window into another person's world.
You know, I'd been at essentially a monoculture at Argonaut.
It was a big, big company for the time.
But, you know, it was only one viewpoint about how to do things.
There were teams that differed, but going into a whole other company and going, oh, gosh, you developed very differently was eye-opening.
Right.
And how long did you all run Profactor for?
I'd have to – it's a few years, three or four years, I think.
Yeah, something like that.
This is where I would bring up my LinkedIn and go and look. I've got such a bad memory — I don't have to remember anything anymore, the internet holds it for me.
Right, right.
So it was a few years. And you know, we were doing fairly well. We had two products out that were actually under our own name — essentially small pieces of this big project we were doing. One was a C++ code formatter, which sounds very — you know, "a few regular expressions, surely, is all you need" — but secretly it was our way of actually parsing the entirety of C++ into an intermediate representation that we could then re-render out. We would re-render it for the compiler, or for various different optimization things — except here we could re-render it out and just change the whitespace; that was an easy thing to do. So that was our way of getting in a marketable product — a plug-in for Visual Studio — and some folks bought it. We did all right, but not enough to keep the lights on, really. And then some other stuff came out around following include paths and things — again, thematically correct for our mission, but
not the actual thing we wanted to get out. And then towards the end of that, I had a friend at Google who kept going to the pub with me and saying, "I really wish I could tell you what I was doing, but I can't, because I am not allowed to tell you." And after a few years of this, you know, the interest gets piqued, and you're like, well, all right, maybe. And so I applied to Google. Probably the hardest conversation I ever had was telling Nick — our little partnership — "I'm going to go work for Google, I'm really sorry." Luckily, he still talks to me. He works for DeepMind now, actually; he's doing some really cool things that he can't talk to me about, so he kind of gets me now, yeah. And I went off to Google and immediately got handed a Nokia phone — an early Nokia phone — and told, "YouTube needs to work on this. Can you make YouTube work on this?" And this was before people had data plans; hardly any phones even had Wi-Fi on them. So we're like, who is the target market for this, this 320-by-200-pixel screen? And who even uses YouTube? It wasn't that huge of a deal, at least in my life, back then. And so my life was optimizing and
trying to get essentially game-level trickery — amongst other things, a lot of other things as well — to get the video to decode reasonably well, which mostly meant liaising with the hardware, because these things would have hardware MPEG decoders in them and that kind of stuff. But more notably, it would be more like going out to San Bruno, where the head office of YouTube was, and groveling to the people who ran the service to say: yeah, this is one phone, and it's not powerful enough to do full software decoding, and its hardware MPEG decoder is broken and switches red and green around — can you transcode all the videos into a new format where red and green are mixed up, just so that this stupid phone can play them? Because we wouldn't have the CPU time to switch them back.
Right, right.
And then, sort of eye-rollingly, "all right, fine" — so they actually ended up doing that. There were a couple of things like that, a couple of workarounds — I don't know that they still do them — and they would do it on demand, or when a video had triggered so many views. It was a fascinating experience to see how that stuff was done behind the scenes. I'm sure it's vastly different now, 15 years on, but back then it was like, hey, you could actually log in, and I could sort of ls the directory that the videos were in and look at them, and — wow, they're just files. It blows your mind. It's like, well, of course they're just files, but what were you expecting? But still. So yeah, I spent a couple of years doing various cell-phone-based YouTube things. So
if you used a non-iPhone, non-Android version of YouTube back in the day — we had, like, J2EE, or J2ME, sorry, which was the Java thing that ran on phones — then yeah, you probably used a bit of my code. And then we did latterly pick up on the Android stuff that was developed over in Mountain View, but I was in London still at this point in time. Google was a fantastic company — probably still is; I don't know, I don't want to make too many comments about that kind of thing.
Right. But that's a pretty different environment from your first, though — Argonaut seems like it was a relatively large company, but not at Google scale.
Not Google scale, no — a couple hundred people at its peak, right? You know, I still knew pretty much every single person in the organization, especially having been there eight years. But Google was, you know — hey, there's 20,000 people; even on the floor that you're on, there are more people than you'll ever be able to recognize. You're like, wow. It was mind-blowing.
Was that also kind of an informative experience for what you wanted to do later? Like, did you have the experience of, oh, this is kind of big and I think I prefer something a little smaller? Or was it, you know, this has its own trade-offs and there are pros and cons of each?
I don't think I had that
level of introspection going in. I think latterly, when I rationalized my decision to leave, I think
that some of those things folded into it. But certainly to start with, it was just amazing.
It was, you know — you felt like you'd been given the keys to the chocolate factory. Internally, Google is so open. Back then there wasn't really much information out there about how anything was done, and there weren't so many of the white papers out about how their internal stuff worked. And so to be let loose in there, free to watch all these videos and learn how queries were handled, learn how they were doing locking at scale, learn how they were doing some of their fleet-wide profiling — and the fact that there may be a person somewhere who's shaving one cycle off of a memcpy, and knowing that that's worthwhile. I think one of your earlier guests mentioned that that's something you can only do when you have cloud scale, and Google were early in that. And it's like, wow, how amazing is that, to be doing that kind of work? You know, it's bonkers. But yeah, it was great.
But then, yeah, it was kind of a comedown retrospectively to realize that I couldn't move the needle.
I couldn't move the needle at all.
I mean, I was still relatively junior in my worldviews about how things were, even 15 years ago, because I'd lived in this sort of very cloistered world of games. And then I was like, oh, I don't even know how software is really made in professional big companies. But it's all the same — anyone who's listening to this, it's all the same. But yeah, so I realized that being at a satellite office — being in London, which was a big office, but mostly marketing people, salespeople, and a reasonably large division of programmers who were all
essentially there to cater for the European phones — so it was very mobile-centric, and we were seen as a sort of strange backwater in some ways. It was hard, even within that, to make a difference, and it's certainly harder to make a difference within the company as a whole. And you know, the two or three times a year that I would pop over to Mountain View or San Bruno or Montreal or wherever, you could feel that you were making more of a difference just having a few conversations than you were beavering away sending changelists to people. So yeah, I think again that was a post-hoc rationalization when I decided to leave. And I ended up in finance. I had a friend who had
left Google about a year before. The pair of us had worked on like an open source meetup that
Google sponsored. And so we'd bring in people and we were chatting. And so that's how I knew him. I
didn't work with him directly in Google, although we both worked for Google. We were both organizer-type people who were happy to stand up in front of a bunch of people and talk, so we would do that: we would get people in from the London open source community and we'd have presentations and laughs and drinks and all that kind of good stuff. And then he left, and I didn't really know where he went, but I sort of took over Open Source Jam with some other people — I don't know if anyone who listens to this now is going to be like, "No, I did it!" Sorry, yes. And I didn't really think too much of it until about a year later — presumably around the end of a non-solicit clause in a contract — when he, out of nowhere, reached out to me and said, "Hey Matt, you should come and have lunch with me." I'm like, what are you doing? You went to, like, finance? I don't know about that. And he said, no, just trust me, come for lunch. So I went and met him for lunch, and I went around this office, and I was like, wow, you're solving really interesting performance problems. This is not what I was expecting finance to be at all. I was expecting, you know, huge database-query-type things and all that nonsense — but no, there are people there solving difficult computer-science-y problems. And maybe I am interested in this after all.
And I went for an interview and they said, sounds great, but not in London.
Come to Chicago.
So I did.
And this is where I still am now, 13 years on.
It turns out that that very thing you were saying earlier — why on earth do we need to know how computers work these days, when there are these huge data centers full of machines doing whatever — is true for 99.9% of the world. But for the remaining 0.1% of the world that is the finance industry — or the hyperscalers doing their web serving, probably, as well, in fairness — we care about that stuff. And so suddenly I'd been thrust back into the same joyous position that I started in when I was 10, 15 years old: learning assembly to get more sprites on the screen and coming up with crazy ways of jiggering things around to get another cycle's worth in my loop, for performance reasons.
Except that now, instead of cycle counting
on a two megahertz machine,
I've got the fastest CPU that we can throw money at,
cooled as much as we can possibly cool it
with all of the trimmings turned on and hyperthreading turned off.
Why would we want hyperthreading?
That steals away from the cores that we carefully have crafted to do that thing.
You know, we carefully manage our thermal stuff, you know, like pin to these cores,
isolate them in the operating system.
Don't run anything on those other cores because if you do, it heats it up and then we lose
power from the other one.
You know, that kind of nonsense.
And you're like, wow, this is fun again.
Now we're right back to where we care about what's really happening under the hood.
And obviously that's, even in our world,
that kind of excitement that I'm demonstrating
represents 0.1% of the job, right?
Everything else is just like everyone else's stuff
of like, well, we still have to write the tests.
We still have to write the code
and someone has to write the build system
and we have to kind of deploy it
and we have to make sure that it's right
and all that good stuff.
But yeah, every now and then you're like, okay, how are we going to make this go fast? And knowing how the hardware works at a deep level — even though most of the time you're floating seven or eight abstraction layers above it — is still fun and exciting. And that's when I started looking into microarchitecture. So I picked up the thread that I'd dropped when the Intel engineers had told us to just use VTune, and I was like, no, no, there must be a way of understanding this; it's tractable, surely; surely somebody has worked this out. And by this point people had started seriously reverse-engineering how Intel processors work. And that was an eye-opener for me — the fact that they then published how they did it, and you could learn tricks and techniques for taking the chip inside your computer, running experiments, and going, "well, this must be what this thing is, then." Wow, I'd never really thought of that. And so that was a huge moment in my life, of going: wow, we can understand this, we can rationalize it, we can even measure it — some of the time with Intel's own tools, which they don't really specify very well, for obvious reasons. But yeah, exciting stuff.
And so what are some of those resources? I want to talk about the finance
exciting stuff and so what what are some of those like resources i want to talk about the the finance
uh world because i think that's uh particularly uh opaque especially to folks on the outside
um which there's there's, there's probably,
that's probably going to impact maybe some of the things we can talk about, but to some extent,
yeah. But I mean, yeah, yeah. I've had, I've had some exposure. Um, I went to university in St.
Louis and, um, and so we would go up to Chicago to the high frequency trading firms and they'd
have like these competitions where you, it was basically like algorithmic trading competitions and they do a simulation um so i got a little
bit of exposure but i am interested to dive into that but i would be remiss if i didn't dig in on
you mentioned some of those resources um that you've been able to use to kind of do some of
that reverse engineering and experimentation uh what what are some of those well so the first one is the sort of the bible
by Agner Fog, who is a sort of very interesting person from some Nordic country. I think he's a professor of something unusual — it's not actually computer science or anything like that, it's something else — but he's got a passionate interest in reverse engineering, and he's written these PDFs that fully take apart the pipelines of all the major revisions of the Intel line of chips, you know, starting from the earliest Pentiums all the way through to modern-day Core-type processors. And he explains everything that he's been able to work out in a very accessible way. It's one of those things where — I don't know if you have anything like this in your life — once a year I reread it anyway, and even though I think I know it, there's stuff that I've missed. There's two or three books that fall into this category. I've got Bjarne Stroustrup's A Tour of C++, which is a small book, but every time I read it I go, "oh, I don't think I knew you could do that" — you know, it's a huge language, right. Another one is Agner Fog's performance manuals. And I think the third one — if you go to his website, which is a delightfully 1990s-era white website with the most disgusting background color and animated GIFs and things across the top, you really, honestly, feel like you've fallen into a MySpace from the 90s or early 2000s, and, you know, it's a choice, right; that tells you who he is — so I'll read that. And then there's also Charles Petzold's The Annotated Turing, which is a fantastic book for learning where this whole thing started and how it came out of one person — well, obviously lots of people have contributed over the years, but there's such a defining story of how computers came to be, in a very abstract way. You know, that's about as abstract as you can possibly get, with an actual Turing machine and its infinite piece of tape, and you're like, well, that's very different, actually.
Right, right.
But yeah. So, the resources he gives you —
first of all, you know, he's done the research and he's got the receipts, and he can show you the receipts. But he's also written very accessible prose around how all these things fit together: what the various stages are, how long things take in general, what the various execution ports are on the x86, how many there are, what types of instructions go to which ports, how retirement happens, how the register file is accessed. And a lot of this stuff comes about because Intel want to be able to tell you where the bottleneck is in your code. They won't tell you exactly what's going on, but there's probably a counter somewhere with a name in the manual which just says "reg file stall" or something like that, with, like, the number of register file stalls — and that's all it'll say. And then you can go, well, let's write an experiment: how many instructions can I queue up to access different registers that haven't been renamed — which is another thing — so I'm going to put thousands of NOPs beforehand so everything's out of the rename buffer. Okay, let's try these things and go: oh, I can do four, I can do five, oh, six — okay, this counter started going up. That kind of feel, right?
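That style of experiment — sweep a parameter, watch a counter, and infer a hidden capacity from where the counter starts climbing — is easy to sketch. Here the "hardware" is fake: a simulated structure with a secret capacity stands in for the real chip and its performance counters, so only the inference method is illustrated.

```python
SECRET_CAPACITY = 6  # stands in for an undocumented hardware limit

def run_experiment(n):
    """Pretend to issue n independent operations and read back a
    'stall' counter: zero until the hidden structure overflows."""
    return max(0, n - SECRET_CAPACITY)

def infer_capacity(max_n=32):
    # Sweep n upwards; the capacity is the largest n with no stalls.
    for n in range(1, max_n + 1):
        if run_experiment(n) > 0:
            return n - 1
    return max_n

print(infer_capacity())  # recovers 6 without ever reading the docs
```

On real hardware, `run_experiment` would be a carefully padded assembly loop read back through a performance counter; the sweep-and-look-for-the-knee logic is the same.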
And so he has an open source project as well,
which you can go and fiddle with
and you can use to set up
and tweak little experimental pieces of code.
And so that's one of the main resources.
And yeah, again, that's something
you can reread over and over again
and always learn something new.
Similarly — I don't know the folks behind it — there's uops.info, a website that has essentially an XML or JSON or YAML or whatever description of every single instruction there ever is or was, for every single architecture they could possibly run the code on. And then you get: well, this is how many cycles of delay it is, this is the reciprocal throughput, these are which ports we observed it going through — so this goes through ports 0, 1, and 2, but not 3 or 4. Those kinds of things.
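Port data like that feeds directly into a simple bound on loop throughput: spread each instruction's micro-ops over the ports that can take them, and the busiest port sets a floor on cycles per iteration. This is a deliberately naive sketch of that reasoning — uops split evenly over their eligible ports, one uop per instruction — not uops.info's actual model, and the instruction mix and port assignments are invented.

```python
from collections import defaultdict

def port_bound(instructions):
    """instructions: list of (name, eligible_ports). Assuming one uop per
    instruction, spread evenly over its ports, the busiest port gives a
    lower bound on cycles per loop iteration."""
    load = defaultdict(float)
    for _name, ports in instructions:
        for p in ports:          # 1 uop split evenly across eligible ports
            load[p] += 1.0 / len(ports)
    return max(load.values())

# Hypothetical loop body: two adds (ports 0, 1, 5), one multiply (port 1
# only), one load (ports 2, 3).
loop = [("add", (0, 1, 5)), ("add", (0, 1, 5)), ("mul", (1,)), ("load", (2, 3))]
print(port_bound(loop))  # ~1.67: under this naive split, port 1 is the bottleneck
```

A real scheduler would steer the adds away from the multiply's port, so published tools model port pressure much more carefully — but even this crude version shows why "which ports" matters as much as raw latency.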
And then they have some of their own code as well — which at some point I will integrate into a website to make it available to all — that does a very good job as a Python-based simulator of all of this stuff. And there's a paper out somewhere that describes the process they went through to get to an almost one-to-one mapping with the real hardware, which I'd thought was totally impossible. You know, here I am sweating over getting a 1980s-era computer, which is very, very simple, to be perfectly in sync with reality, and then they're like, no, we can write a Python program that can simulate these tens-of-billions-of-transistors monstrosities that we build these days. So those are some of the resources. And yeah, my own tiny, tiny, tiny contribution to this was trying to reverse-engineer how the branch predictor worked
under some circumstances.
One of those things where I thought,
I'd read this thing on the forum over and over again.
Oh yeah, the branch prediction, blah, blah, blah.
It always assumes that branches backwards are true
because they're presumably a loop
and branches forwards are false.
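The "backwards taken, forwards not taken" heuristic from the forums reduces to a one-line function, sketched here; the addresses in the example are invented, and as Matt points out next, a real front end can't even apply this until the branch has been decoded.

```python
# The BTFNT static heuristic as a toy function: with no history for a
# branch, predict taken if it jumps to a lower address (presumably a loop
# back edge), not taken if it jumps forward.

def static_predict(branch_pc: int, target_pc: int) -> bool:
    """Return True if the branch is predicted taken (BTFNT heuristic)."""
    return target_pc < branch_pc  # backward branch -> assume loop -> taken

# A loop's back edge versus a forward skip over error handling:
print(static_predict(0x400810, 0x400800))  # backward: predict taken -> True
print(static_predict(0x400810, 0x400890))  # forward: predict not taken -> False
```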
And I've got it in my head like,
well, the thing is,
it doesn't even know that there's a branch there
until it's decoded the branch which is actually five or six pipeline stages from the fetch and
so it's already too late at that point so there's all these various different well you know if
there's a branch here um and if it's a conditional branch you've already done all this work maybe you
should just let it fall through right also how do you know if you've seen this conditional branch
before or not? Because most
branch prediction algorithms these days use some kind of hashing function that kind of
hashes the branch, the pattern, the phase of the moon, yesterday's lottery results,
comes up with a number, and then it looks in the table there, and it doesn't know whether this is
really for this branch or not. It doesn't store tag bits, because it's like, well,
if it isn't, what am I going to do? i might as well come up with a guess right um and then so you know you think well if
it doesn't know the branch has been in the table before or not that it's actually for this branch
then how can it predict forward or backwards because either it's too late because it's already
run through the pipeline they might as well carry on or it's got a prediction and the prediction
it doesn't know if it's for this branch anyway. So I wrote a whole bunch of stuff about this, and I had access to, like, a really weird server machine I had in my basement, still in my basement in fact, still my main server. And so I ran all these experiments, and it found some really interesting patterns in the way the branch target buffer works, which is, I think, a thing that one doesn't think about with branch prediction. Certainly when I talk to folks, like in an interview setting, we talk about branch prediction and it's always, is
the branch taken or not right that's what most people think of but like it's like is there a
branch there at all is the question you need to ask before you even start fetching because like
I said, it'd be five cycles on, you've finally decoded the word, and you've got, oh, there's a
branch here you're like well too late the train's already gone down that route ahead of you right so you have to kind of predict if
there's a branch there at all and then where the heck it's going to because decoding the destination
is half of the trouble. And obviously a lot of branches are not conditional, they are jumps or they're calls or they're rets or whatever. And so trying to make that prediction happen early is what the branch target buffer is doing.
and then secondly if and only if it's conditional is it taken or not right but we always think about
the conditional or not conditional thing so um anyway i was doing this whole bunch of analysis
on the branch target buffer. And then my one-time micro claim to fame is, when the paper for Meltdown and Spectre came out, I got a little footnote as a citation, saying, like, these are some of the ways that you can predict where the branches are going to go, or not. And
i was like wow this is my first like proper security paper thing which cites me i mean
it's literally like the bottom of the list of
things but you know it was cool i yeah that's awesome i think that's very cool yeah um it kind
of circling back around to um you know getting into the the finance industry and some of these
performance qualities maybe like i don't i don't think you know i have enough context to even ask the
appropriate questions so maybe even start from the uh like immediate differences in terms of
the infrastructure uh and compute that you're using and how you'll manage that and how that's
set up as maybe in contrast with you know at one extreme maybe you're using like a public cloud
provider but even for folks that are uh using you
know are hosting their own racks and that sort of thing uh where does kind of finance start to
diverge at that highest level so you know obviously we have a ton of normal needs and requirements and
they have their sort of so we have our own internal clouds and things to run like big batch jobs and
there's a lot of, like, you know, data gets shuttled around and it's not latency-sensitive or even particularly performance-sensitive. But finance, or certainly trading, is a huge, huge, huge, wide, diverse pursuit. And, you know, like, in my current company, we have some things where we're trying to predict the future months in advance. And then, you know,
it doesn't really matter how quickly you predict something that's happening in three months time,
because you've still got three months to take advantage of it, right? You know,
so it took 10 seconds. Sure, that's fine. I've written the whole thing in Python, it takes,
you know, 10 seconds to run. And that's absolutely fine. No one's no one's going to bat an eyelid at
that. Obviously, if you're making a prediction that's five minutes in the future, now, if it took you 30 seconds to make that prediction, that's eroded into your prediction. It's now like your prediction is already 30 seconds old by the time you've made it. You're like, okay, I can see that's problematic. So, um, you know, and we might want to make predictions at all these different horizons. You know, canonically, you know, like, real estate folks will buy up large swathes of land and hold that for years and hope that it goes up in value, and that's a perfectly valid thing to do.
On the far other extreme, you've got low latency traders who are more colloquially known as high frequency traders,
which is sort of less true, because you could trade just once a day.
And if it's the right trade, you can make a lot of money if you're very low latency.
But, you know, trading a lot isn't always a good thing although there are strategies that do do that
but at that point you are peering down a microscope at every single packet coming in and
out of your network. So the way that most financial institutions, like exchanges, the places you can buy and sell shares or options or futures or whatever, work is that you have usually a TCP connection to
the server so like a regular a bit like a you know web server style thing but it's a persistent
connection with a relatively simple protocol to say i'd like to send an order and then it would
say congratulations you've now you're now the proud owner of 100 shares of google you know
thank you very much it cost you this much whatever that kind of thing right so that's on the one hand
now the public exchanges that are uh so-called lit and not in the
youth term of like awesome and cool lit but like um not dark if you've heard of dark pools and dark
exchanges that kind of thing that means that they actually advertise and publish the information
about what's going on inside their exchange in real time so every time i place an order
it's a bit like going on ebay and registering that you
would like to buy something which is not actually what you do on ebay i guess you register you want
to sell something and you put a price right and then maybe you've got a buy it now price
and that means that I can then look at it after the fact, after you've placed it, and go, oh, I will
buy that actually and you click the buy button and you get it right so but there are sort of two
stages to that one stage is you register that i would like to buy it or sell it at a particular price.
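That registering-and-matching flow could be sketched as a toy limit order book: an incoming order either crosses a resting order on the other side and trades, or it rests on the book. Everything else (price-time priority, partial fills, the market-data feed itself) is elided, and the prices and quantities are invented.

```python
# A toy order book: an order either matches the best resting order on the
# opposite side, or rests on the book where everyone can see it.
import heapq

class Book:
    def __init__(self):
        self.bids = []  # max-heap via negated prices: (-price, qty)
        self.asks = []  # min-heap: (price, qty)

    def add(self, side, price, qty):
        if side == "buy":
            if self.asks and self.asks[0][0] <= price:
                ask_price, _ = heapq.heappop(self.asks)
                return ("trade", ask_price, qty)   # crossed: a trade happens
            heapq.heappush(self.bids, (-price, qty))
        else:
            if self.bids and -self.bids[0][0] >= price:
                neg_price, _ = heapq.heappop(self.bids)
                return ("trade", -neg_price, qty)  # crossed the best bid
            heapq.heappush(self.asks, (price, qty))
        return ("rest", price, qty)                # no match: rests on book

book = Book()
print(book.add("sell", 101.0, 100))   # rests: nobody buying yet
print(book.add("buy", 101.0, 100))    # crosses the ask: trade at 101.0
```

Every `add`, every trade, every removal on a real exchange's book is what gets broadcast out as market data.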
And then if that happens to match anyone who's currently on the system and they're buying and
you're selling and the prices agree and they're all the better, then there's a match and the trade
happens. But if it doesn't, it goes on to like a bulletin board of like, here's what everybody
wants to buy or sell. And that's what market data is: it's
the information that flows off of the exchange that says here is the interest to buy this
particular share somebody would like to buy a hundred shares of google for a hundred dollars
you're like i bet they would because the current price of google is like a thousand or whatever it
is right you know um right and there's nothing really to stop you know there are certain people
who can place these orders in the market it's not everyone you can't just go on you can't register on your
fidelity account or your you know your robin hood account and do this but you reach certain criteria
and then you get this tcp connection and you get this um data stream which is essentially
um everything that possibly happens if you think of it as a database of orders that the exchange
is holding, every add, every remove, every trade, every modify,
every exogenous event that could possibly happen on the exchange that affects its internal state
is broadcast literally broadcast or in fact multicast to all interested participants and
then you're expected to update your internal idea about what the market looks like from that change
so you know you're trying to keep your internal database up to date with what's really going on in the exchange and then you run your
magical mystical algorithm over it and go oh i think it's mispriced and so i'll go and buy it
or no actually i will join the market and i will also say that i would be prepared to sell google
for a thousand and one dollars or whatever and you know that's where the real magic happens and
then clever maths people work it all out and then they they tell me how they would like it to work and
then then i get involved again right i don't get involved in that bit um and there are there are a
set of things you know like there are certain things that are very much like you can boil that
information down into signals that you can feed to a machine learning system, which then churns out some expected value, and then you can make a decision based on that expected value. And that tends to be somewhat slow
because you're doing some level of post-processing on that data and maybe you're matching it up with
other markets and other symbols and other things that are going on and you're throwing it through
a model that's relatively expensive to to to operate and then you're making a decision and
you're turning it around and then you're sending an order say hey i'd like to buy this and you know at that level you might be talking about
hundreds of microseconds which is you know a long time in our world but also not a very long time
in most other people's world right or it could be milliseconds even or whatever but um and then as
you get down towards trades that require less finesse, less inference, and they're more like, well, if the price of Apple goes up, buy all of the other tech stocks, and then hope that you get in before everyone else does
and you buy while they're still low before they've actually caught up with the price of apple
assuming that's a valid thing to do again this is not financial advice please consult
right um but these are the kinds of things you know and at that point we call those lead lag
trades where there's a very obvious like economic reason for two things to be linked.
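A lead-lag rule like that could be sketched in a few lines: when the leader has moved more than some threshold and the laggard hasn't followed yet, signal a trade in the laggard. The thresholds and returns are invented for illustration, and, echoing Matt, this is not financial advice.

```python
# Toy lead-lag signal: leader (say Apple) has jumped, laggard (say Google)
# hasn't caught up yet, so race everyone else to buy the laggard.

def lead_lag_signal(leader_return, laggard_return, threshold=0.002):
    """Return 'buy_laggard' if the leader moved up but the laggard hasn't."""
    if leader_return > threshold and laggard_return < threshold / 2:
        return "buy_laggard"
    return "no_trade"

print(lead_lag_signal(0.005, 0.0001))  # leader jumped, laggard flat: buy
print(lead_lag_signal(0.005, 0.004))   # laggard already caught up: no trade
```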
And then the only reason they're not linked is either because something idiosyncratic has happened in the world, like, I don't know, Apple have just cancelled their self-driving car thing.
And now, whoops, it's not the tech sector that's going up, it's Apple that's going up.
And now you're left holding all these shares that you didn't want.
And that's a risk you have to take as someone who's trading. Or, um, you know, Apple went up, and then it's a race between you and everyone else who knows that when Apple goes up, Google is going to go up as well, right? And then, you know, now you're back in the video games industry, where you're like, well, everyone's got the same Dreamcast, because everyone's bought the same high-powered computer. Everyone's bought the same high-powered networking card
and they're using the same tricks
to access the network card through kernel bypass.
There's no kernel involved at all.
They've all got the same fast switches.
They've all paid the exchange the same amount of money
to get the same length of fiber optic cable,
I kid you not,
so that you have essentially a level playing field, level amongst all the people who can afford to do all these things, right, but level nonetheless. And so the only thing that remains
between you and the the other person down the road at jump trading as opposed to you know hrt
or whatever the trading company is how smart can you how fast can you make this go right how can i
craft this to be faster and there
was a time when that was all CPU, all the time, and I came in sort of the middle to the end of that part. So, like, a lot of stuff that I was doing was 100% CPUs, it was these exotic network cards, these exotic kernel bypass things. And then during the
time that i was there people started going well you know what's even faster than the cpu well it's not faster than the cpu but if you're only doing if this then this and
you've got network packets coming in we can do this in hardware and we can push it out to the
edge even further and have an fpga do this and then you're into the world of like well
something you could never do on a cpu is like hey by the time you get to the 15th byte of the packet coming in you know
if it's a buy order or a sell order and you can start going oh and you start sending a packet the
other way, so that as the laser beams in one way, you've started turning on the laser the other way, going, well, maybe we'll want to sell something on this. And then you get to the end of the thing and just make a decision as it's flowing through, to say, okay, yeah, now we'll buy, or, actually, no, let's not do that, and put something at the end. I mean, you're not allowed to corrupt packets or anything like that, but there are ways and means of, like, getting to the end and going,
i didn't mean to do that actually you know i jumped the gun a little bit but that's how folks
are able to get down to nanoseconds between an action coming in and their reaction going out is they're actually
pipelining between the incoming and outgoing events which is kind of mind-boggling right yeah
that's, that is fascinating. I, uh, I've had a kind of personal fascination with FPGAs, mostly because, you know, it gives you that window into microarchitecture and that sort of thing.
absolutely yeah without having to fab a chip which
uh, turns out to be, uh, it's getting easier. But it is actually, I was gonna say, there are ways and means these days, you know. Yeah, but it's still not as easy as just, like, plugging a little USB-thumb-drive-like thing into the side of your machine, running some open source software, and having the LED blink, you know, going, oh, that's cool, right, I'm a hardware designer, right?
right and you know in terms of the kind of like uh you have a pretty uh a large distance in your
stack there right you have you have kind of like the interface that i'm sure folks that are doing
trading or perhaps some of the um folks designing models you know need to be able to interact with
all this data that's being maintained. You have, you know, typical networking software and that sort of thing.
You might have, um, some of that kernel bypass side of things, and then you're doing like RTL
on the FPGAs and that sort of thing. And, you know, I'm sure this varies quite a bit in the
size of the organization and, you know, just the organizational style, but is it typical for,
you know, engineers at, um, trading organizations to be kind of working up
and down that entire stack i don't know how typical it is um actually you know certainly
organizations i've worked in have had folks who specialize in in different areas of that you know
you've got the folks who are, you know, usually the FPGA designers are their own breed. Though I've got two noteworthy exceptions to that, who are both software engineers, and I think they are that first and foremost, and then they went into a bit of hardware design. And it's absolutely, you know, fascinating to see through their eyes, because
I think, you know, if you've come from the hardware design standpoint, you're used to certain things, like the aforementioned almost infinite build times, the really very rigorous testing, the extremely process-driven way of doing everything, the very regimented source code. You know, you can spot a dyed-in-the-wool VHDL or Verilog engineer because all their comments line up beautifully and everything is formatted within an inch of its life, because if your compile is going to take 14 years, it may as well be beautiful, right? Or seemingly that seems to be the rationale behind
it. Um, and then if you come in as a software engineer, you're like, immediately, this is terrible, I hate everything about this, and you start going, what can I do to make this better? And then you
start discovering like these python based projects that can do simulations so that you can run your tests
using Python and async stuff in Python
and then interacting with the Verilog simulator.
And it's just a better world.
And these folks go look over,
you're like, what on earth are you doing over there?
Surely you should be writing lots of SystemVerilog
and then writing out thousands of lines
and then going home and coming back two days later
over the weekend and looking at the result. And they're like, no, I've got too much ADHD tendency to be able to have the patience to do that. So it's been fascinating seeing their journey
go through that and they've been very successful and i think you know the folks actually behind the
was it cocotb, I think, is the name of the Python project that I alluded to. I think they also had a
similar like software engineer first mindset and i don't mean to impugn the hardware designers who'll be listening to this
but it's just really interesting to see a different perspective of it and understand you know like
the the trade-offs and also i think for us as software engineers to learn the humility of like
how long and how painful this process is, and, like, how much less cavalier you can be about testing, for example, when it's that expensive to find a mistake and fix it, compared to, oh, I guess we just cut a new build and do it again. You know, like, oh no, we've actually gotta go through another
two-night build process and place and route and then all that kind of stuff. So, and if the, uh, you know, when you
are getting to uh levels where you know you're you're doing things on the nanosecond scale
that i imagine you know when when new hardware is released that it's pretty important to evaluate
that and decide whether to incorporate it right if it's going to give you a competitive advantage. Now, if you are, well, you know, there's the FPGA hardware as well,
which could be a separate conversation, but let's just, you know, focus on maybe like CPUs or
something like that. How often are y'all turning over hardware in that environment? Because I
imagine that, you know, as soon as there's something better, it's, you know, optimal to
move over to that system. In the work that I had done before, and without going into too much detail, like, it became increasingly less important. We had moved to FPGA stuff, and then the speed of the CPUs was more like, how quickly can we reprogram, or at least configure, these FPGAs to do the thing that we want to do. I mean, this
is i think this is fairly common.
Folks gravitate towards an FPGA design where you have like essentially a CPU,
a software-defined CPU that's like extremely tailored
for deep packet inspection
and if-then-else kind of state machine type things.
And then the else is,
here's a block that I need to be sent out but
because you can't really do any huge mathematical things in that; you're really looking for particular key characteristics of the messages coming in. And so behind the scenes there's the clever program, written in C++ or whatever,
that's doing the real thinking and then going like, okay, I need to continually update and resend
these if-then-else rules
because I can see the big picture.
I know that a move in Apple more than two ticks
will mean this kind of message will come through
with the byte three being this and byte seven being that.
That's what I need to get over to the FPGA
because it's too dumb to really understand what's going on.
It can only look for like, you know,
regular expression style things.
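That division of labour could be sketched as simple byte-level rules the smart process pushes down to a dumb, fast matcher. The offsets, values, and canned action bytes below are all invented for illustration; a real FPGA would evaluate these checks in hardware as the packet streams in.

```python
# Sketch of the "if-then-else rules" idea: the clever C++ process keeps
# installing simple byte-pattern rules, and the fast path just checks each
# incoming packet against them and fires a canned response on a match.

def make_rule(checks, action):
    """checks: list of (byte_offset, expected_value); action: bytes to send."""
    def matches(packet: bytes):
        if all(len(packet) > off and packet[off] == val for off, val in checks):
            return action          # fire the pre-built response immediately
        return None
    return matches

# "A move in Apple of more than two ticks means byte 3 will be this and
# byte 7 will be that" becomes a rule the fast path evaluates per packet:
rule = make_rule([(3, 0x42), (7, 0x07)], b"SEND_ORDER")
print(rule(bytes([0, 0, 0, 0x42, 0, 0, 0, 0x07])))  # both bytes match
print(rule(bytes([0, 0, 0, 0x42, 0, 0, 0, 0x00])))  # byte 7 wrong: no fire
```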
I just keep changing the regex to find the thing that i want to find and then hope that it is actually
finding that signal when it comes out of the noise i mean again i'm trying to blur it a little bit
because I'm a bit vague on, like, how much I should be saying about this stuff. And I don't do this anymore, for what it's worth. My current company, I've moved on from the company where I was doing the lower-latency stuff, and it's much more quantitative trading, so it's a bit longer term,
but it's still important to be fast.
Anyway, so to your question about
whether we were always on the cutting edge of CPUs,
we weren't, actually.
It was relatively expensive to make those changes.
These things have to be put in physically co-located data centers
next to the exchange where they're trading
for all the reasons that the cable needs to be the right length
and all that kind of stuff. And you normally need a lot of them you know
you've got like 20 or 30 servers in a rack with these super fast switches and these careful cut
through things, and these companies that make, um, almost like a physics-based switch technology, so you can split a beam and send one off to one machine, one off to another machine, so it's not even really a switch in between them. They both get a copy of
the data or you know one goes off to your packet capture system and one goes to your you know your
trading system so you always have the exact thing that happens so you can do your simulations later
and all that kind of good stuff. And so, like, changing the machines out, where, you know, you've got 20 of them in a rack and they're all like 25 grand each, that's a significant outgoing.
That's not to say that we didn't do it, and, you know, there was definitely experimentation with unusual hardware. So again, without going into too much detail there, but I'll talk about one thing that I thought was an interesting one in terms of what it was.
Sure.
So there was a chip called a Tilera, which was a relatively simple 32-bit RISC CPU, except it was a grid array of them on a single die. And there were like 64 of them, I think, something like that, arranged in an 8x8 grid, and the peripherals hung around the chip on the outside. And so the sort of eight at the top and the eight at the side, however you want to think of it, the peripheral, literally peripheral, CPUs could talk to the pins on the outside. You know, they were all fully functional, right, they all had access to RAM if you wanted to and all that kind of nonsense. But
a way of configuring it would be to say, well, I'm going to run Linux on the top two left-hand-corner ones. The rest of them are uncommitted, and then I'm going to run dedicated
programs on them and some of their registers would be like north south east and west and if you wrote to north
it would block until the processor above you had read from south so maybe with a small fifo in
between them something like this um there was also an on-chip network where you could send messages
through, uh, to a particular CPU cell, and it used, like, a New York taxicab routing algorithm of, like, if no one's reading or writing from north or south, then I'll go north or south; I'll go east and west until I'm lined up, left, right, or whatever.
But anyway, what it allowed you to do was in software, do the kind of things that you do or you have to do naturally on an FPGA or an ASIC based solution.
You know, effectively, each of these things was a software pipeline stage and so you could sit there and be like okay the ethernet chip is up here and it writes 64 bytes or 64 bits of the ethernet frame to the east every time it comes in
and the next the next program is decoding the ethernet frame looking for the ip header and
then once the ip header is good it then starts passing the udp payload to east
and then the udp payload gets to the next guy and he's like adding like looking for the particular
things and decoding, and then going, well, I'll go south if it's this kind of packet, or east if it's
another one or north maybe and then you can kind of actually define a physical route around the
chip to get to a place where you are able to process particular sequences very efficiently because every clock
cycle another 64 bits is going through or every other clock cycle or whatever it was
and that's very similar to how you have to think about the world when you're doing hardware because
everything's parallel you know like every transistor is its own little computer and you
don't really have much choice about that you know and in fact we have to kind of impose our clock
based will upon it rather
heavily to make it look like the kind of thing that we're expecting where everything moves along
one step at a time and that this is an aside but it was always a thing that made me laugh once i
spent some time with our fpga engineers and really started i believe to grok the way that they thought
about the world the way that you have to do things and the way that you can get this amazing speed up
if you do it this particular way on an FPGA,
then we would have people come in and say,
like vendors would come in and say,
take your C++ code and compile it to FPGA,
and you get the huge boost of speed.
And I'm like going, the compilation is not the problem.
Which language you specify in it is not the problem
the problem is you have to think about it in a fundamentally different way and anyone who's
trying to write c++ is not thinking about how to i don't know uh do a 256 way hardware lookup
because you're willing to dedicate 256 comparators or however many you
can multiplex in and just go well this is fine like nine tenths of my chip real estate is this
set of comparators but you know what in one clock cycle i know if it's interesting or not right and
you can't do that in C++, um, or any high-level language, really, other than these HDLs.
Yeah, I feel like the area in doing RTL myself
that really took a while to get used to
is if you chain more logic together,
the propagation delay is going to increase, right?
I know.
You don't really think about that
when you're writing, uh, you know, a sequential program or something like that. I mean, you obviously think about, perhaps, the number of instructions, maybe. I mean, maybe you do. But yeah, yeah, I mean, that's bonkers, you know. Uh, you know,
i think your first guest philip was talking about like the ripple carries and then the kind of look
ahead things and then there's you know if you start going down that wikipedia minefield of like
uh like oh what about this idea what about this i and you think about how do they do multiplies oh my
gosh that's even more complicated and how how do they do divides and that's one of my favorite
things actually to teach you know incoming sort of fresh faces is to sort of say um you know give
me your best guess as to how many cycles these things will take and then you sort of go through
the list of things and then you say integer division and they're like i don't know 20 you're like well maybe 200 uh it
depends actually the latest revision of intel processors are now down to like teens again i
think for even 64-bit divisions and i just i would love to know how they're doing it you know or maybe
somebody's just screaming into their headphones right now that, like, it's obvious. But, like, it has long been, like, the thing that I just
think, you know, because we can do a floating-point multiply or a floating-point division in a don't-really-think-about-it-anymore kind of level of time, as opposed to back in the games industry, where it's like everything was fixed point until floating point became, you know, commonplace. Um, and you think, well, when do I do an
integer division why would I care it's like well every time you use a hash map you're modding with
the size of the hash map most of the time and that's a division with the remainder and that's
actually kind of expensive and you're like oh I hadn't thought that yeah you're like yeah right
right it's like if you know a total aside if you look at the um implementations of really fast
hash maps they usually have a switch statement for they do switch on like the how big is my table
they don't store the size of the table in terms of, like, is it, like, five, you know, 1023, you know, whatever the appropriate nearly-power-of-two-but-prime size is. They switch on the ordinal value of which it is.
Is it 13 or is it 252?
No, that's obviously not prime.
Sorry.
Whatever.
And then they just do return x mod that.
And so the compiler sees it's a constant.
And so you're trading off, and the compiler then can do magical tricks
to make it not actually a divide.
It's modulus with a constant, which is a division with a constant,
and there are tricks to use multiplies
and other things that are much, much cheaper.
So these fast hash maps are going,
trading off on the,
there's a branch predictor mismatch maybe
because I have to jump to the right sequence of instructions,
but that's faster than doing the darn divide
in the first place,
which is just like bonkers.
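The structure of that trick could be sketched in Python: instead of storing the table size and computing `x % size` with a runtime divisor, store an index into a fixed list of prime sizes and dispatch to a function with that divisor baked in. The prime list is invented; and note the real win only exists in a compiled language like C++, where each case's constant modulus lets the compiler replace the divide with multiply/shift tricks. Python gets no such benefit, so this only illustrates the shape.

```python
# Shape of the fast-hash-map trick: dispatch on which prime size the table
# has, so each "case" does a modulus by a constant the compiler could
# strength-reduce (in C++; here it just shows the structure).

PRIMES = [5, 11, 23, 47, 97, 193, 389, 769, 1543]

# One specialised "case" per prime, each with its divisor baked in.
MOD_FNS = [lambda x, p=p: x % p for p in PRIMES]

class Table:
    def __init__(self, size_ordinal):
        self.size_ordinal = size_ordinal           # store which prime, not the prime
        self.slots = [None] * PRIMES[size_ordinal]

    def slot_for(self, hash_value):
        return MOD_FNS[self.size_ordinal](hash_value)  # the "switch"

t = Table(3)                    # a table of size 47
print(t.slot_for(1000))         # 1000 % 47 = 13
```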
But nowadays, maybe it isn't.
Who knows? The number of instructions, you see, and this is, like, getting towards perhaps a destination for where the heck we're going in this conversation, but, like, the number of instructions isn't necessarily a great indicator of how fast things are going to be, right? You know, these things like divides will take longer, or maybe they won't these days. You know, it's, yeah, it's fascinating how complicated these things we've built are,
right i i'm curious you know one of the things that um i've kind of uh in having this experience
of talking to folks uh who you know worked on uh processors in the 70s and 80s and kind of where we started this conversation as well
about talking about the simplicity and the elegance of them and really the determinism,
I feel like is the key thing there. And when you start to see some of the vulnerabilities,
you mentioned Spectre and Meltdown, you kind of at some point start to wonder, are we actually making progress here?
And obviously, there's been lots of improvements due to some of these microarchitecture concepts.
You mentioned branch prediction and pipelining and some of those things.
But I'm curious, in your own experience, do you feel frustration with the increasing level of complexity, and do you think there's perhaps, like, a ceiling where we're actually getting diminishing marginal returns from continuing? So, that's a really interesting question. I mean, I do honestly miss the days when I would have the hardware manuals open in my lap, and then you could make
very strong
guesses as to what would happen you know like i know how many cycles this device is going to take
i know how many cycles it takes to draw a triangle this big so i can do something and then i can go
back to it when it's finished. Those were great times. But that was eroding even towards the end of my time in the games industry, because people wanted, for commercial reasons, actually, in this particular instance, to interpose: well, we want to put in, like, a kind of operating system, so that we can have a pop-up display above your game, and, you know, show that your friend has just logged in, and all this kind of stuff. You're like, oh, wait, I'm not in control anymore? No, no, no, no, you're nearly in control, but we have this thing behind you. So, you know, we'd started to lose that determinism even then, although it was still fairly deterministic for you.
But the sheer gains that we've gotten and every time I think we've reached the point where we couldn't possibly squeeze any more out of it, somebody clever does something else.
And you're like, oh, wait, oh, that's smart.
You know, register renaming.
That's clever.
Now, suddenly it doesn't matter that we have a puny register file because, you know, it's actually as big as we can fit onto the chip or, you know, branch prediction.
Hey, we're so good at guessing where you're going that we can afford to have 100-plus instructions in flight, even though for the vast majority of them we have no strong belief they'll be needed. We lose the determinism, but we go so much faster so much of the time that it does seem to undo the harm. But then, you know, you hit again Spectre and Meltdown, and the difficulty of solving those while also maintaining the performance that we've come to expect is so tricky.
Yeah, I think, you know,
I think it was Thomas you spoke to about, like, VLIW and Itanium, and there were some sort of sensitivities around the failure or not of that.
But, you know, one of the things, you know,
and this is coming from somebody
who's made a sort of side career
about saying how clever compilers are
and how we should trust them to do everything smart, right?
I don't see that there are enough ways for a compiler to be smart enough, given how dynamic the flow of execution is in most cases, at least in my experience, right? And I've seen, I can't think what the heck it's called, the belt computer, with these, it's almost like conditionals built into the instruction, where you can do one or this or that. And obviously the ARM had its beautiful, originally at least, you know, conditional execution stuff, so that you could do some clever things with that. But like,
really, nothing beats the ability for the silicon to just go, well, I can try all the paths. It's almost quantum-like: I will go ahead of you, and I will start looking, and I will make guesses, and as long as the guesses are better than even, we're still better off than me not doing the guesses at all, as long as I can afford the silicon. And obviously, that's where the trick is. It's not really the silicon; it's the heat that it generates when it's running, and the power that it takes, and that kind of stuff, which then limits how much can be on at the same time. But, yeah, I've been remarkably surprised how often the next generation comes out and it's
still faster somehow. You know, we've got however many levels of cache. You're like, how could this be helpful? There's so much going on in between. And then you learn that each level of the cache has its own independent prefetching unit that's also intuiting, from the flow of instructions and the flow of misses, where you're going, and starting to run ahead of you. You're like,
miracle it works as well as it does but there's doesn't seem to be much sign that it's slowing
down, despite, you know, the fact that I don't really like that I can't easily tell what's going to happen. Right. It does feel like, you know, you mentioned kind of the heat issues, which, you know, eventually kept us from continuing to clock processors faster and faster and faster. Where's my 10 gigahertz processor, right? You know, that never happened, right? And, you know, there's other things that pop up, like, I know as we
shrink the process node the issues with leakage and things like that you know start to happen
with transistors we start getting quantum computers even though we don't want them
exactly exactly so there's like the the physical aspect of it you know alluding to you know your
earlier statement about there's always a level beneath your abstraction, no matter how low you are.
The other thing that's kind of been top of mind for me recently, I guess, is, you know,
if your workloads fundamentally change, that's another reason why you might rethink your
architecture.
And I think, you know, I was talking with Thomas about this a little bit, and I don't know if you've seen some of the discourse recently about Groq. I don't know if it's new, but they came out with this, like, language processing unit. It's an interesting combination of, like, highly parallel problems, but also a sequential nature of, you know, processing tokens in order, where you have dependencies between them. And that's kind of, like, driving some of these new architectures, I think, which is interesting. And I think
in some of those they are pushing more onto the compiler, but you have to take into context there
that the compiler might be compiling once for a model
that runs for a very extended period of time
as opposed to compiling a new build every 30 minutes or whatever.
So it seems like there's lots of different vectors to consider.
It's the dynamism of what the user is going to do
in the case of user-based models or whatever,
and the fact that the compiler can't guess, right?
So taking the branch prediction side of things here,
there was all this brouhaha about,
well, maybe we can flag the branches as likely taken,
likely not taken,
or you can have all this branch prediction hinting in there.
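For reference, the static hinting being waved at here looks something like this in C++ (the `[[likely]]`/`[[unlikely]]` attributes are standard as of C++20; `__builtin_expect` is the older GCC/Clang spelling; the function names are made up). It biases code layout once, at compile time, which is exactly why it can't track data that changes at runtime:

```cpp
// C++20 standard attributes: a one-time, static layout hint.
int clamp_negative_to_zero(int x) {
    if (x < 0) [[unlikely]] {
        return 0;   // compiler lays this path out out-of-line
    }
    return x;
}

// Pre-C++20 GCC/Clang equivalent using __builtin_expect:
// the second argument is the value we claim is most likely.
long checked_double(long v) {
    if (__builtin_expect(v == 0, 0)) {
        return -1;  // rare error path
    }
    return v * 2;
}
```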
It's like, well, yeah, but it'll never know that this branch, this loop, is always taken 64 times, until it isn't, and then it's taken 128 times. Or even, you know, so I'm known for C++, but, you know, folks like to compare languages, and Java has both its proponents and detractors, and the last thing I ever want to do is fan flames between the two, because there's
some amazing things that Java can do, because Java takes this sort of predictive thing into software. And so you can, as does, you know, JavaScript in browsers, and anything that has, like, a modern JIT these days, kind of go: I can notice regime changes inline, and kind of, like, oh yeah, well, you know, this happens until this thing stops
happening and then we can adapt and the program can re-optimize around that and you know people
in the c++ community may say oh but we have profile guided optimization we can run our system
we can profile and we feed it back to the compiler and a compiler can make smart things i'm like yeah right can you give me two binaries so that halfway
through the day when we get to midday and everything's like now instead of it being am
it's pm and whatever and that branch is now the other way around or whatever it's been the whole
way through can you flip the binary at that point they're like oh no you're like no you're still
relying on the processor doing this right the processor can do you know you've all seen the
the stack overflow post
about the branch predictor with, you know,
sorting the things means that, you know,
the thing goes faster than not sorted.
It's because like whatever condition you've got
is 100% predictable until it gets to halfway
through the sorted array.
And then it's exactly wrong twice.
And then it's 100% right for the rest of the time.
You know, you can't get that behavior
if you've got a static compiler
because the data is dynamic.
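The Stack Overflow experiment being referenced is easy to reproduce with a sketch like this (the array size, the threshold, and the names are arbitrary choices here): the loop body is identical either way; only the predictability of the data-dependent branch changes.

```cpp
#include <algorithm>
#include <chrono>
#include <random>
#include <utility>
#include <vector>

// Sum only the elements >= 128. The branch outcome depends on the
// data, so sorted input makes the branch predictor nearly perfect.
long long conditional_sum(const std::vector<int>& data) {
    long long sum = 0;
    for (int v : data) {
        if (v >= 128) sum += v;  // data-dependent branch
    }
    return sum;
}

// Time the same loop over unsorted and then sorted data. On typical
// hardware (and with the compiler not vectorizing the branch away)
// the sorted pass runs several times faster.
std::pair<long long, long long> unsorted_vs_sorted_micros() {
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<int> data(1 << 20);
    for (int& v : data) v = dist(rng);

    auto time_once = [&data] {
        auto t0 = std::chrono::steady_clock::now();
        volatile long long s = conditional_sum(data);
        (void)s;
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
    };

    long long unsorted_us = time_once();
    std::sort(data.begin(), data.end());
    long long sorted_us = time_once();
    return {unsorted_us, sorted_us};
}
```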
And so maybe I'm still very skeptical about this. Maybe for certain domains, it makes sense. Maybe, you know, the kind
of things you've described these you know i think transformers or whatever the the these ai type
processes that compile a very different kind of program maybe there's a lot more statistical
knowledge you can have and you can say well this is the inputs they're going to look this way we
don't care that there's going to be that one dreadful input that, if you feed it in, will give you dreadful, dreadful performance. So, back to your sort of determinism thing, that's actually an interesting aspect. In the world of finance, at least, one of the issues that we have is, you know, these markets are huge, and you can come up with these amazingly optimized algorithms which, for the common case, are super fast, but then there is, like, a terrible case. For example, if you use an array to store the list of orders, the things that want to be bought or sold, because they are in a strict priority and it's useful to steal from the front and take from the back or whatever, then a common trick is to actually store it backwards, because most of the action happens at the front of the book,
i.e., the end of the array.
And now you can pop and push from the back of an array.
Everything else stays where it is.
Hooray.
You know,
like this is clever,
right?
But then some Joker does something at the back of the book,
which is now the front of the book.
And now you've got to shuffle the whole thing down one.
And you're like,
well,
that's unfortunate.
And so you've got this.
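A sketch of the reversed-book trick (the types and names here are invented; a real book keys levels by price and holds far more state): keeping the bid side sorted worst-price-first means the busy front of the book lives at the back of the array, where push and pop are O(1), while the rare action at the quiet end triggers the O(n) shuffle being described.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Level {
    std::int64_t price;
    std::int64_t quantity;
};

// Bid side stored worst-to-best: the best (front-of-book) price
// sits at the *back* of the vector.
class BidBook {
public:
    const Level& best() const { return levels_.back(); }

    void push_best(const Level& l) { levels_.push_back(l); }  // common case: O(1)
    void pop_best() { levels_.pop_back(); }                   // common case: O(1)

    // Rare case: someone acts at the worst end of the book, and the
    // whole array shuffles down one slot -- the bad tail latency.
    void pop_worst() { levels_.erase(levels_.begin()); }      // O(n)

    std::size_t depth() const { return levels_.size(); }

private:
    std::vector<Level> levels_;  // sorted worst price first
};
```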
And in our case,
when you're dealing with this fire hose of information that's
coming over this broadcast if you can't keep up with the network data coming in you drop packets
and then you've lost information then you have to go through a very expensive recovery process
which means essentially you you can't do anything for like tens of milliseconds hundreds of
milliseconds it's a very expensive very expensive operation um to do and so you have
to think about your tail latency and so suddenly the predictability is sort of an important thing
and so these clever algorithms that concentrate on like the fast case is really really really fast
but there's a terrible worst case is now bad for you and so a lot of the wisdom um for these kinds
of things gets thrown out so for example one of these data structures i use a linked list and i am unashamed to tell the world that there are occasions when a
linked list is the right choice. Because, you could say, yes, cache misses, and they can be very expensive, but most of the time these things are in the cache, right? And if they're not in the cache, then you've got other problems. And it's order one, right? It doesn't matter what I do: I can put things in the front, I can take things off the back, I can move things out of the middle of it, it's order one. It's not as fast as just tacking 64 bits on the end of an array, of course it isn't, but it's consistently okay, and that's maybe good enough.
Right. And so, coming back to that prediction that you said with the compiler: maybe that is fine, you know, if you don't mind having bad worst cases that are rare, with your statistical model of what is going to go through, which is essentially what, I guess, all compilers are doing at some level. They're having to use a heuristic of some description to kind of go, I'm guessing this is more likely taken than not, so I'm going to lay the code out
this way. So, yeah, maybe it's not as bad; maybe I've just talked myself around to saying that it's fine for some workloads. Well, I think I
think that's, you know, a description or an illustration of kind of the problem space: it's understand your domain, right, and approach it accordingly. So I think that makes sense. I did want to, kind of, as the final sort of part here that we explore in this
conversation, uh, I'm, I'm very proud of us getting, uh, you know, two hours and change
in here and we haven't mentioned compiler explorer yet, which I'm sure is what the majority
of folks who, who clicked on this episode know you for.
I suppose so.
I would, I would love to, to you know just get a little bit of
the um background uh you can also you know for folks who haven't uh used the site before i
explain um what it is but also like the uh the background um on it and you know how you're able
to open source it, and maybe what it takes to run it today as well. Absolutely, yeah. So in, like, 2011, 2012-ish, I was at this trading company, and they had a very old C++ code base, and I was having an argument with the very conservative head programmer, because I wanted to use this new C++ feature called range fors, which, you know, is like what all other languages have for going over a container,
you know, the equivalent of for i in thing.
In C++, it's for auto x colon something, right?
And it should be equivalent to iterating over all of the elements in the thing.
And obviously, the thing is probably, say, a vector,
which is to say a variable length array it's just a pointer and a size is what it really is down
under the hood, and the pointer points to the first element, and the size is how many elements
there are in them and so you know normally you get the size and you start your counter at zero
and you work for through you know pointer bracket zero pointer brackets one and all that kind of
good stuff and the compiler of course rewrites it behind the scenes
to be like a pointer that walks along the memory locations
one after another.
And that's all great and good.
But it's a pain to write that.
It's kind of error prone.
We've all done things where we've used the wrong size,
we've used the wrong kind of iteration or whatever.
And so C++11 came along and said,
we should make this a language facility.
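The equivalence being argued over can be shown directly. The three functions below (names invented) should compile to essentially the same code at -O2; the third is roughly what C++11 specifies the range-for to expand into:

```cpp
#include <cstddef>
#include <vector>

// Hand-written index loop: counter plus operator[].
long long sum_indexed(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// C++11 range-for.
long long sum_range_for(const std::vector<int>& v) {
    long long total = 0;
    for (int x : v) {
        total += x;
    }
    return total;
}

// Roughly what the range-for desugars to: an iterator (for vector,
// effectively a pointer) walking the elements one after another.
long long sum_desugared(const std::vector<int>& v) {
    long long total = 0;
    for (auto it = v.begin(), end = v.end(); it != end; ++it) {
        total += *it;
    }
    return total;
}
```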
But we'd been bitten by this before. We also had some Java code. And in Java, if you loop over, and again, not to bash languages, but this is just a side effect of the way that Java worked at the time, it may have changed since, caveat, caveat: in Java, if you had a container and you looped over it using an index, that was garbage-free, right? You were just making an int on the stack, and you were bumping it forward until you got to the end of the size of the container, and you were accessing the container, and provided you weren't doing anything else, it wasn't generating garbage. You were done, right? Beautiful. But if you did the equivalent of for x
in whatever i can i forget the java syntax right now behind the scenes it created an iterator object
that was then the thing that held where I am in this object.
And you called next on it.
And that's what was happening.
So it was syntactic sugar for rewriting it that way.
And at the time, there was a trading system that was written predominantly in Java.
And they would train themselves into writing garbage-free Java, which is about as horrible as it sounds.
It takes all the benefits of a really useful and easy-to-write like java and throws them away and tries to write c code in java but without any of the benefits of
like memory checkers and things because no one's expecting you to do this kind of thing anyway
that's that's a whole other brand so right um so understandably we were they were a bit reluctant
to just with gay abandon start changing the way we wrote our c++ code because it was
very performant and they wanted to keep it that way so i got stroppy um which is british for angry
uh got upset about it all and then i um said well okay come here and i got jordan to sit next to me
i said right let me show you and so we were experimenting backwards and forwards with like
snippets of code where i was turning this flag on and compiling it one way or the other. And eventually, being the Unix heads that we were,
I wrote the command line of run GCC on a file,
output to dash, as in stdout,
pipe it through c++filt,
which then demangles all the symbols,
pipe it through some sed to get rid of some of the nonsense
that was the assembler outputs
and then i ran that in a watch which means it runs every second and just displays the output
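For the curious, that pipeline reconstructed as a one-liner might have looked something like this (the exact flags and the sed expression are guesses; the file name is made up). `g++ -S -o -` writes the assembly to stdout, `c++filt` demangles the symbol names, the `sed` strips assembler directives, and `watch` re-runs the whole thing every second:

```shell
# Re-run the compile every second and show the cleaned-up assembly.
# square.cpp is a stand-in name for whatever snippet was being edited.
watch -n1 "g++ -O2 -S -o - square.cpp | c++filt | sed '/^\s*\./d'"
```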
and then in the tmux session I split the screen in half, and on the other side I opened up the editor to the file that, you can see where this is going, the other side was editing. So I had the editor on one side, and I had the results of the compiler, once a second, on the right-hand side,
and then we went back and forth between the various things we tried different
compiled settings and we you know we kind of fiddled around and i was able to show him that
actually it was one instruction cheaper to do it the other way for boring reasons that we don't
have to get into and so anyway he was like fine with it and now around the same time that i was
that we were doing this um joe the person who had dragged me across from Google and dragged me to finance, and then ultimately he joined me in Chicago.
He was one of those polymath folks who knows how to do a bit of everything.
And he had been dabbling in Node.js apps.
And so he was forever knocking up node apps and showing you
know, little database CRUD-y things. And we'd done some previously; he showed me how
to do them at Google or whatever. And anyway, so in the back of my head, I'm like, hey, I know how to write web apps, you know, crap little web apps, but web apps nonetheless. I think I can take what I just did and put it in a little web app. And yeah, Compiler Explorer, or GCC Explorer as it was called then,
was born.
And it was a few hundred lines of code
running on a machine that I had set up
in the trading company at that time.
And it proved very useful.
It doesn't take long to pull down
a couple of off-the-shelf widgets for editors.
And then you put a little bit of filtering in
and a Node app that runs a couple of hundred lines,
runs the compiler,
and then just pukes out the output and filters it in some way.
And it sat for a couple of months
and then I thought this is kind of useful actually.
And so at the time we were experimenting
with more and more open source stuff,
the company was still very dodgy
about putting its name to anything.
But they said, okay, you can open source this.
It's not like competitive
advantage or anything like that um but you know you just can't put our name anywhere near it you
know because they were worried about legal comeback or something like that anyway their loss um
because because you know um in 2012 uh i stood up an amazon server running the same code base having open sourced it and um yeah gcc explorer was was born it has it had a couple of compilers it was like four or
five thousand lines of javascript and um very simple docker-based security and again air quotes
around that and there it sat for years uh and no one really used it that i knew of um it was
convenient we still used it internally
um it was dead handy for like trying out stuff so you know it grew so that you can change the
compiler settings you can change which compiler you're using and then as you're typing for
given how much of a bad rap C++ gets for, like, slow compiles, for very small snippets the compilers are blazingly fast. It's just these giant monstrosities we tend to feed them. So if you're just looking at, like, a small loop or a couple of functions that
call each other it takes milliseconds to to build and so we can build and parse and send back to the
the website, on the right-hand side, the sort of annotated, syntax-highlighted output of the compiler. And it becomes a sort of interactive, almost like a REPL, that you can start tweaking, going, like, what if I do i++ or ++i, which of these is faster? And you see that makes no difference whatsoever. And that kind of leads to this sort of journey of discovery and immediacy that makes you really get a deeper understanding of what you're doing. But yeah, fast forward 12 years, and it now is 60,000 lines of TypeScript.
It is three and a half thousand different compilers, which is about three and a half terabytes of compiler.
It is running on somewhere between anywhere between eight and 15 AWS instances at any one time, varying different types.
We've got some that have GPUs in them.
We have some that are running Windows.
We have the majority of them running Linux.
At some point, we'll stand up some ARM ones
so we can do ARM compilers as well.
And we have become a 'we'. I'm not just using a royal plural; I've got a small team now.
It's open source.
And we've got like five or six people
who have the keys to my Amazon account
and can administrate the site.
And it's become kind of the de facto C++ pastebin stroke experimental thing.
So by default, it shows the assembly output.
And so I like to think that my contribution is putting assembly
in front of people who would never have otherwise seen it,
talking about those flaws and abstraction layers.
It's like it really puts it right in the face of people
and go like, hey, this is what really happens.
This is what your compiler does.
You may not think of it doing this.
But then obviously folks use it
just as a general compilation tool.
And we now support that.
We can actually execute the code,
which is security-wise terrifying.
You've, you know, random, you know, what is your website? It's essentially a giant remote code execution service, you know. And, yeah, right, how are you securing it? I don't know! Some people have looked at it and said it looks fine. But, you know, it's become a pretty significant second job. It's
a lot of fun when it's fun it's a lot of toil when it's not um again i'm very very lucky and
blessed to have the number of uh contributors that i have and as again they can help out on
the admin side as well you know it takes a lot of care and feeding to keep a website up especially
one that has you know daily builds of all the major compilers we have our own ci infrastructure
we have our own load balancing stuff we have our own it's it's huge now um yeah and i don't tend
to use it as much as i used to because my job has changed and for a long while i was writing python
all day and it's like what am i doing with myself right but i'm glad to say i'm starting to use it again i'm back writing c++ in my day job again so
awesome but yeah most folks know know me from that is the short answer um and i think you know
you've been very kind by calling it compiler explorer which is what i call it but i hosted
it on my personal domain, and so most people didn't know that that was my name. They just thought it was a cool name, which it is. I'm very blessed and lucky to have the name that I was given.
But a lot of folks, yeah, didn't realize.
And then they were surprised when I turned up
and I said, yeah.
And they're like, hey, wait, like the website?
I'm like, yeah, I guess.
Maybe.
Yeah, I've definitely had plenty of interactions with folks where they've said to just 'godbolt it'. So, yeah. Right, I know, I did. So I have got, you know, you can get to it at compilerexplorer.com as well, because that's my sort of
hedge for the future if I ever need to get my domain name back or whatever.
But yeah, I have now,
I took advice from a friend who took me to one side and said, look, don't. You know, you can call it that, right? You know, this is like Google never calling it 'Googling' something. They call it web searching or whatever, because it kind of sort of devalues it. But not to get in on the joke, the joke as it were, is not wise.
But I was, you know,
I was poised to completely just go to the Compiler Explorer name and get rid of the vanity domain name. And they said, this is a gift horse, don't look it in the mouth, right? You know, people think of it as a verb now, or a noun,
and so you should accept that i'm like so i begrudgingly do now and in fact my linkedin
profile i think says you know programmer and sometime verb or something like that so you know i've kind of i've kind of accepted it now and come
to peace with it absolutely absolutely well uh last thing i wanted to to chat about here was uh
you also have a podcast that uh i think it's been a couple couple years now it is yeah somehow we've
reached two years now. Yeah.
Yeah.
So what was the decision to start Two's Complement?
And what's it kind of about?
So I think many of us during the lockdown went a little bit silly.
You know, you may have heard, unless Dan is extremely good at editing. So, listener, if you don't understand what I'm saying here, you know that Dan has been an excellent editor, but my dog has been barking in the background, and I apologize for that. But my dog is also a pandemic silliness: he's lovely, but he was got in the pandemic. I learned how to bake bread, and I started a podcast. These are all the
things that I think most people did. I think you're late to the party, actually, in this regard. Maybe you started planning, right? So I had it bubbling away in me to start something, as I felt I had something to say, and then I kind of bottled it a little bit. I thought, well, you know, maybe, maybe not. And then I confided in my friend at work, Ben, that I was thinking of doing it, and he said, you know what? I was thinking this too. And so we're like, oh, what if we,
would you do it together?
And so Two's Complement was born.
And he and I have worked together
at a number of companies along the way,
but we never worked directly with each other
until more recently.
So we've been very well aware of each other
and we both like giving presentations.
And so we've seen each other's presentations
at the companies we've worked at before, um we hadn't you know directly worked with each other and in
fact we haven't really worked that much directly together even though we're like in a small company
together now but we have very compatible views and then our little the backstory goes right so i
in 1996 went off to go into the games industry. And Ben's a little younger.
A few years later on, Ben was planning to go into the games industry
and then had a sort of sliding-doors accident of fate,
where something to do with his wife's job or whatever at the time.
He suddenly had to rescind his offer or it was rescinded
and he had to go and get a real job, right?
And so what we've got is like two people.
I never really planned to be in the games industry,
but fell into it through the aforementioned IRC accident. And he meant to go into the
games industry but due to some other exogenous event did not and then we've kind of followed
parallel tracks and then we found how reasonably compatible our views are and then we've gotten
together and we keep discussing things that are interesting to us which is to say two people who've been doing this for 20 and change years um ben is very much
into testing and i'm into obviously the c++ and performance type stuff but it's fun to
play those things off because they're not exclusive they're very compatible
and there's a whole host of things that we do a certain way and
you know having um grown up in the sort of similar circumstances we've got yeah some interesting
things that certainly when we talk to people they're kind of interested in it it seems so
you know we just open up a we open up a web browser we start talking at each other and then
a half-hour episode comes out once a month; that's what we're trying for. You know, it's low effort. Ours is low effort; yours is beautiful and well prepared and researched and everything. And in fairness, when we have a guest on, which is rare, we try to be too, but most of the time it's just, hey, let's talk about make, my favorite program, off you go. Yep, well, I will say that I am definitely a big fan of
it so um i i appreciate y'all y'all putting it together whatever you decide to talk about and
you know there's been a uh a number of kind of podcasts that um i've taken like little bits and
pieces of inspiration from in terms of um putting putting this show together so um i
definitely count that one on the list so well i appreciate the the time that y'all do invest into
it no it's i mean one of the things that i don't know to what extent you've discovered this so far
yourself is that podcasts are very unidirectional where you know you get a few tweets and then you
hear these kind of anecdotes where people say oh i listen to your podcast but it's like you don't get the feedback it's more like radio in that way you know you
could imagine like at one stage my sister was dating some radio dj and like you're sat in a
room talking to yourself for like four hours a day and you don't really know if anyone's listening
to you or not or whether they like it or not right and it feels like that and especially it's so
federated you don't know how many people are listening really you've got all these things that kind of guess but they're
guesses and so it's lovely to hear that feedback and you know i'm glad to say that we've we've
there have even been some folks that we've hired now at our company. It's a very long and protracted hiring mechanism, getting people interested in your podcast so they go, maybe I should work with them, and then they turn up. And on a similar note, actually, Compiler Explorer: I've now hired two people who have been contributors to Compiler Explorer. It's a very long and complicated interview process, that it is, right? It turns out, if you can fit a large JavaScript program in your head and make meaningful contributions across a variety of languages,
and you're a kind person who can hang out on our discord and be nice to people you're probably a good person to work with in the day job too absolutely well i can definitely uh
speak to that coming out of college my uh first uh post-college job i guess i started working on
the open source component of this company um uh while i was in college and they basically just
were like,
you're doing a lot of work for no money.
Would you like to do the same amount of work for some money?
And I astutely realized that was a good deal.
That is a good deal, yeah.
Yeah, but it was kind of,
the interview process after that is kind of funny because you have like a fairly large body of work
of literally like collaborating on
something so um it is kind of funny how open source can can be a conduit for that right right
for certain i mean yeah absolutely interviews are so difficult so yeah anything you can do to stick
out is worthwhile doing, right? But, you know, not everyone has the spare time or will or energy after their day job to do, like, open source work, if they can't do it as well. So, you know, one has to be careful. Anyway,
absolutely that's a whole other topic and i'm just realizing we don't really want to open any more
cans right now that's for uh the episode we're recording next week together okay
no but in all seriousness i i i would love to have you back again in the future.
I definitely appreciate you spending nearly two and a half hours with me and talking through a lot of different things.
I definitely had a great time and learned a bit, and I hope our listeners will as well.
Well, thank you so much for having me.
This is a great podcast.
I've enjoyed the two episodes that I've been able to listen to so far,
and I'm really looking forward to hearing the rest of them.
I only hope this one stands up and that we've not bored to tears the poor listener
by this point two and a half hours in.
I'm sure folks will love it.
But thanks again, Matt, and hope you have a great rest of your week.
Thank you. You too.