Two's Complement - Time For Computers

Episode Date: December 18, 2022

Ben and Matt examine how fast computers are by comparing them to humans. Turns out they're mind-bogglingly fast. Or maybe humans are just slow? I don't know, let's not make the humans feel bad. They're trying their best with those adorable squishy meat brains.

Transcript
Starting point is 00:00:00 I'm Matt Godbolt. And I'm Ben Rady. And this is Two's Complement, a programming podcast. Hey, Ben. Hey, Matt. So I was giving a presentation at work the other day to a bunch of new hires, and one of the things that I asked them during the presentation is, I show a bit of code which is deliberately awful C++ code, just to sort of prove a point, and it's just a bit of string formatting. But I asked
Starting point is 00:00:40 them how long do you think this function will take to run? Just ballpark it, you know, order of magnitude. In fact, two orders of magnitude. Where do you think it will be? And it's amazing that a room full of people, and there were some folks who are much more experienced in the room too. The range was absolutely astronomical from people saying hundreds of microseconds to people saying 10, 15, 20 milliseconds. You know, it was kind of, you know, we were sort of playing
Starting point is 00:01:09 Price is Right rules, you know, who was nearest without going over. And, you know, one person got it about right. This particular bit of code was tens, low tens of microseconds, right, even though it was absolutely awful. And it struck me that we don't really have internalized very well how fast computers are, and what they're good at and what they're not good at. Even if we spend most of our waking lives thinking about it, it still hits us with surprise, right? I have to think about it. I have to, like, do the math on it. I have no intuition, right?
Starting point is 00:01:45 Right. Looking at a piece of code, like it's just, I have to like sit down and be like, okay, that's going to be this and it's going to be that. And I got to probably look some things up on the internet. It's not intuitive at all to me. Absolutely. So, you know, trying to get a handle of how fast a computer is, is tough. I mean, I'm just, I'm in fact, just thinking about it right now.
Starting point is 00:02:03 One of the things that amuses me very much is when I plug my laptop into a docking station, like at work. Yeah. And there are these two giant monitors plugged in with, you know, 32-bit color per pixel, however many, you know, 4K displays, and I plug the tiny, thin little USB-C thing into my laptop, and somehow, miraculously, that amount of data is flowing out of my computer continuously to drive these screens. And again, it boggles my mind how much it is. But if I sat down and did the math, it's probably fairly reasonable, right? Well, it has to be, because it works, right? We're looking at each other right now, and there's not a problem.
Starting point is 00:02:44 But developing intuition about these things is tricky, especially when computers have surprising edge cases. Yes, yes. And it's really easy to be off by many orders of magnitude. Exactly, exactly. I mean, how fast is a modern PC? Like, on the computer I'm on now, let's just say three gigahertz, which means every tick of the clock is a third of a nanosecond. And just putting that... Yeah, stupid fast. My golly, a third of a nanosecond. Now, you brought this up when we were talking about this before, but there's a nice way of thinking about what a nanosecond is. Yeah, yeah. So Grace Hopper, who is obviously a famous woman in computing, has this great talk that she did many years ago, explaining, I think, to generals why certain satellite communication wasn't going to be possible in the way that they were thinking
Starting point is 00:03:42 about it. And she had a piece of wire that was, you know, what she sort of called a nanosecond long, right? And that was the amount of distance that light could travel in a nanosecond. And it was a little under a foot, I think like 11.8 inches, if I'm remembering correctly. She sort of held it up and was like, this is a nanosecond, right? So your satellite is way up in space, many, many, many, many nanoseconds away, right?
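(A quick aside, not from the conversation: Grace Hopper's figure is easy to check with a couple of lines of Python; the speed-of-light constant is the only input.)

```python
# How far does light travel in one nanosecond? (Grace Hopper's "wire")
SPEED_OF_LIGHT_M_PER_S = 299_792_458

metres_per_ns = SPEED_OF_LIGHT_M_PER_S * 1e-9   # distance covered in 1 ns
inches_per_ns = metres_per_ns / 0.0254           # metres -> inches

print(f"{inches_per_ns:.1f} inches")             # ~11.8 inches, just under a foot
```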
Starting point is 00:04:11 And that, she was obviously talking about like communication, but it's like so useful to, when I am trying to intuit, when I'm trying to sort of break out of the, like, I don't have an intuitive sense for how this works. I think back to that all the time of like, you know, about a foot is a nanosecond. Right. And that is how fast
Starting point is 00:04:26 light is moving. So if you're doing something... The cosmic speed limit; nothing can go faster than that. Exactly. It gives you the baseline, at least: this is if everything was perfect, that's as fast as you could go. Exactly. And I will many times sort of envision that wire that's a foot long, and sort of twist it around into a small shape and project it onto the chip of, like, a CPU, and be like, there's a wire in there that is, as you say, a third of the length of a foot, right? So only a few inches. And that's how long it takes for the light, essentially, to move around in there. Now, it's not actually exactly like that, but my intuition sort of clicks in a little better when I think about it. When you
Starting point is 00:05:08 think... And, I mean, that obviously doesn't take into account the fact that electrons in wires go slower than light in a vacuum, and they're not just moving in a sensible way, although there are arguments about how the propagation of charge and voltage really works. And there's a transistor that takes some time to flip state, and all that kind of stuff. So those are all the physical reasons why. But as you say, it gives you at least something you can stare at: you look at your arm from your elbow to your wrist and go, there's a nanosecond there, roughly speaking, right? But first of all, what's a nanosecond, right? I don't know. You know, as engineers, we are probably more used to talking
Starting point is 00:05:51 about time, or things that are nano-scale, nano-something-or-other. But to put it into perspective: a millisecond, which is the sort of standard human "wow, that's fast", is a thousandth of a second. And, you know, you'll be hard pushed to do anything on a human scale that isn't in the tens of milliseconds. So the way that I think about this is, when I used to work in video games, we always wanted to try and get things in under a frame, like the refresh of the TV screen that we were projecting to, which in the UK is 50 times a second; over here it's 60 times a second. So you need to have completely recalculated the next viewpoint, taking into account all of the AI, the user's movement on the controls, anything else that's going on, you're playing music, and all
Starting point is 00:06:44 that kind of stuff, and you have to be repainting the screen completely from scratch every 60th of a second, which is roughly 16 and a bit milliseconds. So that is sort of human-scale time, right? And so when people were guessing that this string formatting routine would take 20 or 30 milliseconds... even though it was egregiously bad, there's no way... well, never say no way, but it wasn't as bad as an entire game repainting the screen. That's not a reasonable comparison, but again, computers are surprising, so I'm not trying to call people out for this. This is just the way it is. So that's the millisecond.
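(Another aside: the frame-budget arithmetic above, as a minimal sketch; the 60 Hz refresh and the roughly 3 GHz clock are the figures assumed in the conversation.)

```python
# One frame at 60 Hz: how long is it, and how many 3 GHz clock ticks fit inside?
REFRESH_HZ = 60      # US TV refresh rate; 50 in the UK
CLOCK_HZ = 3e9       # the ~3 GHz CPU assumed earlier

frame_ms = 1000 / REFRESH_HZ              # ~16.7 ms per frame
cycles_per_frame = CLOCK_HZ / REFRESH_HZ  # ~50 million ticks to redraw everything

print(f"{frame_ms:.1f} ms per frame, about {cycles_per_frame:,.0f} clock cycles")
```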
Starting point is 00:07:29 So then if you go a thousand times smaller than a millisecond, you get microseconds, which is kind of a more typical performance level that you might measure things at. So when you're running non-trivial slices of code, you might be measuring things in terms of microseconds, because, again, it's a sort of sensible domain for most things. But we're way below what humans can perceive at that point. And then a thousand times smaller than a microsecond is a nanosecond, and now we're into the domain that the computer itself is actually ticking along at. So that's just to kind of get our head around how many
Starting point is 00:08:03 thousand times smaller than things that we know a nanosecond is. And, you know, the idea that we have something which is reliably running at three gigahertz, and therefore a third of a nanosecond for every clock tick. And not only is it doing one thing at a time: each CPU on your die is doing probably a half dozen things in parallel, plus the chip itself has got many CPUs on it. And you just kind of get the idea of how much mind-boggling performance there is on a tiny piece of sand in a plastic wrapper. Right, right. Which is just what it is, right? Yeah. So when we were talking about this earlier, it reminded me that I had totally and utterly stolen an amazing article, a very short article, by Jeff Dean, I believe, from Google, one of the original engineering folks at Google, where he had put out a list of
Starting point is 00:09:01 these are just the things that take computers time, and here's how many nanoseconds or milliseconds they take. And I've got a spin on that where I try and equate it to human time, so that you can develop a bit of an intuition about, well, what does this mean? So that when you go, oh yeah, the information wasn't in the cache, I had to go and get it from main memory, what does that really mean? Because I don't think any of us have a decent intuition about, you know, oh, it's 230 nanoseconds. You're like, that's still so small I can't get my head around it, right? And so I was just going to go through some of these with you, and we can sort of talk about them as we go. Oh yeah. So the first thing: computers can add numbers and exclusive-or
Starting point is 00:09:44 numbers and do other sorts of elementary arithmetic, not multiplication or division, but adds, subtracts, XORs and compares, usually in one cycle. Now, there's a bit of pipelining going on here; I'm going to draw a line over that. But that means that it takes one CPU cycle to add two numbers together, which is a third of a nanosecond, and that's already bonkers. But let's use that as a baseline, right? And then, rather charitably, I'm going to say that a human adding, say, a few-digit number together takes one second. Like, who's good at it, right? If you're good at mental arithmetic and I say, what's 392 plus 4964, perhaps you can do that in your head in under a second, right?
Starting point is 00:10:25 So we're going to use that as our scale for the rest of this conversation. So a third of a nanosecond of computer time equals one second of human time. Of human time. Exactly right.
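(As a minimal sketch of the scale being set up here; the helper name is made up for the example, and nothing below comes from the episode beyond the one-tick-equals-one-second convention.)

```python
# The scale used for the rest of the conversation: one 3 GHz clock tick
# (about a third of a nanosecond, roughly one integer add) = one human second.
NS_PER_CYCLE = 1 / 3

def human_seconds(computer_ns: float) -> float:
    """Convert a computer-time duration in nanoseconds to scaled human seconds."""
    return computer_ns / NS_PER_CYCLE

print(human_seconds(1 / 3))   # an add: ~1 second
print(human_seconds(100))     # a main-memory access: ~300 seconds, five-ish minutes
```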
Starting point is 00:10:52 So then let's sort of move up the hierarchy of the kind of elementary operations that a computer is doing under the hood all the time. The next obvious thing is multiplying, right? And this is hand-waving completely, of course, because it's different for every revision of computer, it's different for different architectures and stuff, but on an x86 such as I'm on right now, it's anywhere between four and six cycles to do a multiply of two numbers together. These are integers, for what it's worth, for anyone who's really counting. And so that works out as 1.2 nanoseconds or thereabouts, 0.3 times 4, right? And in human time, that's four seconds. And again, we're still in roughly the ballpark of "that makes sense to me": probably faster than I could multiply numbers together, but it's not a bad approximation. I'm sure there are folks who can do three or four digit multiplications in their head like that. So, four seconds. So what about division?
Starting point is 00:11:27 So what would be your instinct, your guesstimate, of how long it would take a human, and we'll work backwards, to do a long division of two numbers? A long division of two longs might take a while. But, you know, assuming that you don't have the lookup table... Yes, right, you haven't got it basically memorized, where it's like 100 divided by 10... Right. Then, yeah, many seconds. Like, you're going to maybe bust out some pencil and paper and... Exactly, write it down. Yeah. And it turns out that intuition is about right. So computers can't divide integers that much better than humans can, actually, with a big fat caveat that the latest round of Intel machines have somehow made it go a lot faster.
Starting point is 00:12:14 But at least when I first wrote these slides, and for most CPUs that I'm used to dealing with on a day-to-day basis, integer division is anywhere between 30 and 100 cycles, which is a lot longer. So that's 10 to 33 nanoseconds, which in human time of this scale is 30 seconds to a minute and a half. So sounds about right. You know, you're sketching a bit of paper and that's how long it's going to take you to divide things into. And again, the reason it's such a wide range actually on the computer side is that it does depend on the numbers you gave it.
Starting point is 00:12:40 Unlike multiplication and addition and stuff like that, where it just does 32 bits' worth, it's worth noting that, just like long division, once you kind of get to the end, you're like, well, there's nothing else to divide, let's stop now; we've got the answer, there's no need to go through the other bits. So it varies. And again, one thing I like to point out to people who are new to computers is that, unlike most of the other things in the chip, because the divider takes up a load of space, there is usually only one divider per CPU, and it's not pipelined. So for some of the other things, although I'm saying it takes four or five cycles, there can still be multiple multiplications going on at once. You can have, you know, three or four multiplications going on at once,
Starting point is 00:13:25 each at a different stage of the multiplication; it's just that there are four or five stages that the multiplier goes through before it comes out the other end. So it takes four cycles to get the answer. But with a divide, if you're doing a division, it's taking you that minute and a half. You can't be doing anything else at the same time,
Starting point is 00:13:41 at least in that divider. You can't get another divide started. There's no way of breaking the work up. So the reason I bring that up is because everyone's favorite data structure is a hash map. And almost every hash map, at least in naive implementations, uses a modulus with how big the hash table is to find which slot. You know, you do your hash, you get some giant number, and you go, mod 257. There are 257 entries in my hash table, which slot am I
Starting point is 00:14:07 going to look in? I will just mod. And that's a division, and it's long, it takes a while, and you can only do one at a time. So there are actually some trade-offs. You'll see some hash tables will actually not use a modulus: they deliberately use a power-of-two-sized table, even though it doesn't give the best distribution of hashes. It's probably worth spending more cycles with a better hash function and then using an AND to get you into your table than it is to rely on a divide, because it takes a minute. A minute in human time, yeah. Right, yeah.
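(A sketch of the two indexing strategies being contrasted; the function names and table sizes are made up for the example.)

```python
# Two ways a hash table can turn a hash value into a slot index.
def slot_by_modulus(hash_value: int, table_size: int = 257) -> int:
    """Works for any table size, but % means an integer divide."""
    return hash_value % table_size

def slot_by_mask(hash_value: int, table_size: int = 256) -> int:
    """Needs a power-of-two table size, but the AND is a one-cycle operation."""
    assert table_size & (table_size - 1) == 0, "table_size must be a power of two"
    return hash_value & (table_size - 1)

h = hash("some key")
print(slot_by_modulus(h), slot_by_mask(h))
```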
Starting point is 00:14:50 So then one of the other things that CPUs do, that we've talked about on this podcast before, or at least I love to talk about, so I know that we've talked about it more than once, is that they try and get ahead of themselves. They look along the stream of instructions and they try and do more than one thing at once. So they're trying to unlock parallelism by finding sets of instructions that can run together in parallel,
Starting point is 00:15:14 even though they weren't necessarily written explicitly to be parallel. So this is not like threads; this is just a single stream of instructions. It's like, well, there's an add and then there's a multiply, and the add and the multiply are distinct, so let's do them together. But in order to do that, it needs to go beyond branches. There will be some conditional branches in the flow of instructions, and that would normally stop you if you were trying to get ahead of yourself, because, well, I don't know which way it's going to go until I get to it, so I guess I can't do any more work. But thankfully, hardware engineers have gone, well, why don't we just make a guess, and then if we guess wrong, we'll undo the work that we did speculatively and chalk it up to experience. But if we get it right, then, hey, we're unlocking more parallelism.
Starting point is 00:15:50 So branch misprediction is the name for when that guess is wrong. So that means the pipeline has to flush, it has to refill, and it has to do a bunch of extra work. Now, the average branch misprediction, depending on when it was noticed, can be anywhere between, I've got it in nanoseconds here for reasons I can't remember, but anywhere between 5 and 30 nanoseconds, which is between 15 seconds and a minute and a half. So it's almost as bad as one of those divides
Starting point is 00:16:22 that we were talking about, which is really interesting. What's that, sorry? I said, you can multiply quickly too. Yes, right, right. So, I mean, it's amazing. There's no sort of human analog to this, I guess. I'm trying to think what it could be. You know, if you're running down the list of instructions in a recipe, and one of the instructions was, if the previous instruction is blah, then go to step five. And you're like, well, that hasn't finished yet, right?
Starting point is 00:16:53 The egg is still boiling or whatever. So I'm just going to do the next few things. And then, you know, you eventually go back to step two and the egg is now finished. You look at it like, oh, actually the egg has set. It's a terrible analogy. I haven't thought this through at all. And you're like, oh crap. Now I have to go and redo all that stuff and i have to throw away the stuff
Starting point is 00:17:08 that I was doing, and you lose a minute and a half clearing up the desk and kind of going back to it, like, okay, step three. I mean, how much human effort is wasted by trying to half-ass two things at the same time? Well, there's that. Probably a lot. Exactly. So, yeah, maybe it's a false comparison. Although, actually, no, I think it's a good comparison. I was trying to cook, too. So we found a local place that sells pre-ground and pre-made little packs of spices for Indian food, which is my favorite thing to cook, and I love cooking it myself, but I'll always support somebody who's got a new little business. So we went and saw her and spoke to her and whatever, and we bought every single one that she had. And then
Starting point is 00:17:49 I was like, well, what are we going to do about this? Well, I'm going to cook two recipes in parallel, which definitely had exactly, as you said, the sort of symptom of: I was half-assing two things rather than whole-assing one thing. Yeah, right. It came out just fine, actually, and it was delicious, but that's more like multi-threading, for what it's worth, although there is only one CPU in this instance. Yeah, good point. So what else do computers do that takes time? Reading from cache. Exactly, yes. Well, reading from memory at all, right? You access variables all the time, right? Right.
Starting point is 00:18:25 A variable is either on the stack or it's on the heap or it's wherever, but it's in memory. And so we know that memory is slow, we're told. Well, that's why we have these caches that are supposed to make it go faster. The average access to, well, average, an access to level one cache is about the fastest thing you can get. So this is the tiny, tiny cache that's nearest the CPU. It's usually about 32K, which is absolutely ridiculously small. Like, although, you know,
Starting point is 00:18:54 my first computer only had 32K of memory, so it's still quite big in that respect. But it takes three cycles to read from L1, which is like three seconds in human terms, right, we've sort of said. So that's a bit like the piece of notepaper that you're currently working on. Maybe this is a little bit slower than that. Yeah, Ben's holding up a sheet of paper in front of him, right? If you just had to find an arbitrary bit in your flip notebook in front of you, two or three seconds sounds about right. So that's L1, and it's tiny, it's as small as a tiny notebook. Then there's L2, which is a bigger,
Starting point is 00:19:32 further away cache. Now, if we were thinking L1 is the size of a notebook, this is like a ring binder, or a set of ring binders, that you've got on your shelving behind you. And so typical L2s are hundreds of K, you know, 512K, maybe a meg-ish. Actually, I should check that. In fact, I'm going to check that by running the command on my computer now, because I've actually forgotten, which is super embarrassing, and I'd hate to get it wrong. So my L2 cache is, oh, 18 meg, it reckons, but I don't think that's right. All right.
Starting point is 00:20:05 Well, anyway, it's megabytes of information local to this individual CPU. I bet you that was a sum of them all. So yeah, I should probably... Anyway, so hundreds of K to low megabytes. That's 10 cycles away, which is 10 seconds away. That seems a bit quick if it's ring binders on your shelf for me, but maybe the analogy still stands. Level three is the final layer of the cache.
Starting point is 00:20:29 That's the furthest away, and it's shared with all of the other CPUs. So this is a bit like, I guess, a bunch of folks sitting at desks and having a centralized library of commonly used books in between them all. That's going to take you around about 40 cycles to get information. It varies there because it depends on whether it's in the part of the library that's actually physically close to you or not. That actually matters now. And so that's 40 cycles, so 40 seconds. Again, now that seems a bit fast for a library in my analogy. But really what we're coming and getting to now is like, what happens if the cache system fails and you actually have
Starting point is 00:21:04 to go out to the real memory, you know, the things you literally slot into the motherboard when you're building your computer, and you have to actually get the data off of that. Right, we're talking like 100 nanoseconds then, which is six minutes. So that's a trip, that's a drive, that's a trip down on the elevator to the archives, to then find that one book you want to get out, and then get back on the elevator, back up, right, and then put it in the shared library. Again, it's 120 nanoseconds, which sounds tiny. And it is tiny, because we can't comprehend time scales that small. But in the working life of a computer that's adding numbers together as
Starting point is 00:21:45 its primary job, that's like twiddling your thumbs for a tea break, right? I can make a cup of tea, a decent cup of tea, in six and a half minutes, and that's what you're taking every time you miss your cache. Which is why that kind of thing becomes really important when you are talking about really performant code. Just to sort of finish this off then, before we go into more general stuff: what if you're talking about reading from peripherals, like the real genuine outside world, as opposed to things that are literally soldered onto your motherboard, or very close to it?
Starting point is 00:22:19 Reading something from an SSD, at least when I wrote these slides, which was a while ago, was, you know, 50 microseconds, which is like two days. Right, that's jumping orders of magnitude there. Yeah, we're off now to ordering from Amazon and waiting for it to come through, right? This is that book we didn't have, and Amazon, you know, it's got Prime delivery and it'll be through tomorrow. So that's, you know, when we need to read something from disk. Obviously
Starting point is 00:22:45 you try to make sure you have lots of things to read from disk, so you don't wait for any one particular piece. But if you're using an old-school spinning disk, and it's not in the right place, and it has to seek to wherever your information is stored on the disk drive, we're talking milliseconds now. So that's another three orders of magnitude different from the microseconds. So one to ten milliseconds on a good day, which is one to twelve months on this scale. So this is sending away to somewhere, you know, some obscure company that has to custom-make the thing, and then it comes through and it says, you know, 60 business days for it to come through, that kind of feel to it, right? That is the kind of level we're talking about when you're reading from a regular old-school hard disk. Right. I was going to say, it's like, oh, I need this book, but the problem is, it hasn't been written yet, right? Yeah, that's more of that, I guess. So I
Starting point is 00:23:40 mean, yeah, because even nowadays they can print stuff on demand, right? Like, I've got a couple of books on my shelf here that are print-on-demand, which surprised me. I flicked through and looked at the end and went, wait a second, it says it was printed, what, three days ago? Yeah. It's crazy how far things have come. So my analogy with books maybe is not exactly right. If we're going to go from disk drives, which are physically inside the chassis of your computer, and we start talking about networking more generally, like the internet:
Starting point is 00:24:12 if you ping your switch neighbor, that is, the computer that is plugged in adjacent to you in the switch that you're both plugged into, we're talking hundreds of microseconds, which is about a week. So it takes a week to get to the closest thing that's not your actual computer. Now, obviously networking has gotten better since I wrote these things, and if you're using cool techniques, I'm sure you can go faster than that. But just as an order-of-magnitude thing, a week of time to go and get that thing from the network. Yeah, yeah. So all of these things are so much faster than the spinning disk, right? Isn't that funny? Yeah. It's actually faster, as long as the computer you're talking to has the response
Starting point is 00:24:50 in its own RAM; then maybe you can get the answer back quicker, which is, you know, used by things like memcached and Redis for exactly that reason. When I ping google.com from my computer, it takes me just over a millisecond, which is a month and a half. So Googling it is not the answer. Pinging the other side of the Atlantic, back to my home country: if I ping bbc.co.uk, it takes 90 milliseconds, so it's nine years. That's like going to Mars, right?
Starting point is 00:25:20 If you want to get the information from the other side of the world. And then obviously the far end of this scale is, well, it takes me five minutes to reboot my computer. So when someone says, can you not just turn it off and on again... If I have to turn it on and off again, especially as this particular machine takes so darn long to go through its POST, I've conservatively put that at five minutes, which is probably a bit high. But five minutes to turn your computer off, and then booting back up again, and then you remembering to come back and log in,
Starting point is 00:25:49 and all the kind of things that you have to do because you've wandered away to make the cup of tea: in human time, that's 32 millennia. So that is a civilization-ending event in CPU time. So just think of the computer, man. Destroying civilizations every time you turn it off and turn it back on again.
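(Pulling the rough numbers quoted in this conversation into one place, scaled by the same one-tick-equals-one-second convention; these are the episode's hand-wavy figures, not measurements.)

```python
# The episode's rough latency figures, rescaled so one clock tick = one second.
SECONDS_PER_NS = 3  # because a third of a nanosecond maps to one human second

approx_latency_ns = {
    "integer add (1 cycle)":          1 / 3,
    "integer multiply (~4 cycles)":   1.2,
    "integer divide (30-100 cycles)": 33,
    "branch misprediction":           30,
    "L1 cache read (3 cycles)":       1,
    "L3 cache read (~40 cycles)":     13.3,
    "main memory read":               100,
    "SSD read":                       50_000,
    "spinning disk seek":             10_000_000,
    "ping your switch neighbour":     200_000,
    "ping google.com":                1_200_000,
    "ping across the Atlantic":       90_000_000,
    "reboot the machine (~5 min)":    3e11,
}

for what, ns in approx_latency_ns.items():
    print(f"{what:32s} ~{ns * SECONDS_PER_NS:>15,.0f} human-seconds")
```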
Starting point is 00:26:08 Right. I mean, in a way you are. So my first book where I learned to program from, which I'm glad to say that the Usborne book company still exists, and they have recently sent out PDFs or made PDFs of those original books from the late 80s, early, yeah, no, mid 80s available. They have little cartoony robot characters that are clambering around inside the CPU. And there's like pigeonholes everywhere. And they're climbing in and putting numbers in pigeonholes to sort of represent writing numbers from memory. So a reboot in that world really is killing them all and starting again.
Starting point is 00:26:41 So, I mean, maybe it is. Maybe it isn't such a wrong thing. I feel bad for them now. Don't kill all the robots inside your computer that make it work. So I was going to say, like, I want to think a little more about those sort of timescales when you were talking about, like, network access
Starting point is 00:26:59 and disk access. So going across the pond over to the UK, you were saying, is 90 milliseconds, which is, did you say nine years, if we're scaling this up in time? Is that right? 90 milliseconds is, yeah, nine years, apparently. Again, someone will probably check my maths and find that I'm completely wrong here, but I'm sure it's order-of-magnitude correct, right? And then just pinging Google, which is probably hitting some edge server that's geographically located...
Starting point is 00:27:31 Almost certainly in the same place that my provider is plugged into. There's going to be a pop there. Right. And how long was that again? That was a month. 1.2 milliseconds. A month and a half.
Starting point is 00:27:41 Yeah, six weeks. Okay. Okay. And 1.2 milliseconds. And I mean, actually, let's think about it. So the UK is 4,000 miles away. Ping is the round trip time. So it's 8,000 miles.
Starting point is 00:27:51 So that's 4.2... yeah, or 4.2 million. I'm going to do that, so that's the number of seconds, so I'm going to divide it by... that's that many minutes, that's then that many hours. Okay, and then that many days: 335... oops, that's not right, 488. So I don't know what I got. Nine... well, I mean, I know the number was right from an actual measured point of view.
Starting point is 00:28:38 But according to my appalling math here, just the light going across takes... 1.3... oh no, sorry, I've completely ballsed this up, haven't I? Yes. I've got several orders of magnitude that I need to work out first of all. And... yeah, what's that, 1.33... no, I've lost myself, this is daft. All right, I made a mistake in terms of that. So it's 4.2 times 10 to the 7 feet, therefore it's that many nanoseconds away. Okay, right. And then let me do that. Right, there we are. Gosh. So that's 1.9... so that's... yeah, okay, that was a much easier thing. I don't know where I was doing the years and whatever. According to this, it would take 42 milliseconds
Starting point is 00:29:31 with this stupid approximation of one foot is one nanosecond: 42 milliseconds to get there and back. So that's the absolute maximum time... right, no, the minimum time, the absolute minimum time it could possibly take. Gosh, we got there in the end. I'd like to pretend that I will edit this to make me sound intelligent, but I won't. So the world will be exposed to me being a fool and being unable to do elementary maths at the start.
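(For the record, the sum being attempted there, written out as a minimal sketch using the foot-per-nanosecond approximation from earlier.)

```python
# Round-trip light-travel time to the UK, with "a foot per nanosecond".
MILES_ROUND_TRIP = 8_000   # roughly 4,000 miles each way
FEET_PER_MILE = 5_280

feet = MILES_ROUND_TRIP * FEET_PER_MILE   # ~4.2e7 feet
nanoseconds = feet                        # ~1 ns per foot of travel
milliseconds = nanoseconds / 1e6

print(f"~{milliseconds:.0f} ms")          # ~42 ms floor; the measured ping was 90 ms
```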
Starting point is 00:29:59 I mean, in fairness, there was some division in there, I think. It was. It did take me a minute and a half to get the right answer. Yeah, quite. But this is interesting, because I think one of the sort of more surprising things for me in the last few years has been, and we have run into this at work, actually, the sort of emergence of the network as an incredibly fast device for data access, on par with sort of local storage mechanisms, especially when you design your network to facilitate that kind of thing. Because certainly, when I was generally thinking about these sort of order-of-magnitude times many years ago, naturally you'd be, oh, we want to avoid network access
Starting point is 00:30:46 because it's going to be much slower. So, you know, we're going to cache this locally on disk. And even back then it was like probably a spinning disk, but that was still faster, right? And I feel like the tables have flipped a little bit. If you sort of, you know, take that into account with your network storage and the design of the network storage, where that network storage can be on par, if not in some cases, maybe even faster, than what you would be able to purchase for the same price stored locally. And then when you couple that with the fact that network storage has the benefit of being able to be accessed by many computers at the same time, then things
Starting point is 00:31:26 get also very interesting. So this is obviously the sort of very micro-optimization benchmark area is one place where your preconceptions and your intuition can be wrong. Absolutely. But I think it can also be true at these sort of more macro levels
Starting point is 00:31:41 where you're thinking about the design of whole systems and how they interact, where your intuition about what's fast and what's slow is maybe off by many orders of magnitude. That's a really good point. Yeah, I mean, it is definitely true that accessing your neighbor's RAM is faster than, or at least the same speed as, reading your own SSD; sort of in the same ballpark. And that could make a big difference, as you say, rather than filling all your servers with terabytes and terabytes of local storage, and instead being smart about using shared storage. I mean, one thing that we glossed over in this, of course, is that that was reading from disk where it wasn't in the file system cache.
Starting point is 00:32:24 So that's obviously a system-level, operating-system-level cache. Everything's a cache, right? There are just caches everywhere to make everything go faster. And so very often, if you've just written a file to disk and then you're just reading from it again, then you're not actually touching the spinning disk. So you get it from memory. And so there's that. But obviously, that blows up at some point if you run out of memory, or the cache gets flushed to the disk and new other things come in.
Starting point is 00:32:48 Whereas in a shared storage environment, the cache could be on the shared node as well, in RAM. And I think, you know, certain devices that we've got access to have layers in themselves of: well, this stuff is all in fast RAM, this stuff is all in SSD, and it's all backed finally with actual big old fat spinning disks that can write the sort of journal of record for forever storage. And so the layers go through that, but it means that most of the time, in a shared environment, it's probably faster to just keep asking for it off the network and have it streamed to you than it is to try and store it on your local disk and then get it back later. So that, as you say, affects the way you think about your systems design, which I'd not thought of. That's great. Yeah, especially when you start getting into the concerns around having to manage that local storage much more carefully, and it's like, oh yeah, well, this is
Starting point is 00:33:38 going to be faster if we get it off a disk, but we only have like two terabytes of disk, and then when that fills up we've got to swap it out, then we've got to fetch it, and then all these other sort of trade-offs you make. It's like, well, if you're just going to degenerate into reading stuff from the network all the time because you don't have enough local storage to actually make it worthwhile, just cut the local storage out of the equation and read it from the network all the time. Yeah, it's interesting.
Starting point is 00:34:01 And then maybe concentrate any caching inside your application and cache stuff in from the network in memory if you need that as well. This is a super interesting exercise, though. I love this sort of metaphor of going to get something from your desk or going to the shared library or waiting for you to go on your computer. Yeah, in my head I've got this sort of mental plan to come up with a really good set of analogies that work this way and then when i dream about retirement or whatever i've got like maybe we can make get someone to do like an animation of this
Starting point is 00:34:33 and, you know, I've got... like, elves and goblins is the real way I'd like to express this. I want to make it interesting, like the little robots; that's the reason I brought it up. Those books were so important to me as to how I internalized how computers "really" work that I think there's a new world where we can show this is now how computers really work. And you can start at the top and say, well, this is just what computers do, adding and subtracting, whatever, haha. And you can have your little goblin with his parchment paper writing out the answers to whatever, and you're like, oh, there you are. And then now we've got two goblins, because we've got two CPUs or whatever, and how do they
Starting point is 00:35:07 agree on this, and whatever. And then you can start going further and further in. You're like, well, you know, the goblin's instructions come from the forest, and depending on where you are in the forest, it might take you longer to go and get the thing. And, you know, again, it's not fully thought through, not fully formed, but, you know, out-of-order execution can be done this way, and cache misses and all that stuff, I'm sure, in an interesting and entertaining way, rather than the boring library analogy we used earlier. But bringing it back onto the subject,
Starting point is 00:35:36 one of the things that we were discussing recently at work was how surprised some people were about other operations. For example, we were looking at whether or not raising something to the power of two was the same as multiplying it with itself. And obviously, you know, we work in finance, we have a lot of folks who are very mathematically focused, and of course those two things are absolutely equivalent, right? Raising something to the power of two is the same as multiplying it with itself, which is to say it's squaring the value. And
Starting point is 00:36:12 these values sometimes are matrices or giant arrays of numbers and things like that, so it's not as straightforward as literally a number. But if we were to just think about it in terms of a single number: like I described before, the computer can do a multiply in a single cycle, or sorry, four cycles, wasn't it, four cycles. But raising to a power is not a primitive operation that it knows how to do. It has to be built out of code to do that operation, just like, you know, taking the inverse tangent of something is like a
Starting point is 00:36:50 procedure; there's a program that runs. And so there's a vast, vast difference in the number of instructions that need to be executed to raise something to a power than there is to just multiply it with, like, the circuit that does multiplication. But we can see they're equivalent, right? You can look at them and say, well, this is a power. But you've asked the computer to do two different things. Now, in some very optimized compiled languages there might be scope for the compiler to go, well, these things are equivalent. But typically something like that is so high level, in terms of what the compiler sees, like, it's a call to a function to raise something to a power, and unless it has special knowledge of what a power really is, it doesn't know that it could be replaced
Starting point is 00:37:27 with multiplying by itself. And so that was surprising to some folks, but I guess not to me. I'm looking at it going like, gosh, that's a very different operation that you're asking it to do. They're functionally equivalent. Yeah.
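(If you want to see the difference yourself, here's a minimal sketch of the comparison; the NumPy array and its size are illustrative assumptions, and the gap, if any, depends on your interpreter and library versions.)

```python
# Hypothetical micro-benchmark: squaring with ** versus an explicit multiply.
import timeit
import numpy as np

a = np.random.rand(1_000_000)   # an illustrative array of a million floats

t_pow = timeit.timeit(lambda: a ** 2, number=100)
t_mul = timeit.timeit(lambda: a * a, number=100)

print(f"a ** 2: {t_pow:.3f}s    a * a: {t_mul:.3f}s")
```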
Starting point is 00:37:42 I mean, there's a lot of things with the intersection of computers and mathematics where the computer and and the pure math don't really agree at all i mean you know obviously floating point operations are an example of that well there's that too yeah um you know uh and we can go into that but i feel like there's a lot of these kinds of areas where it's like if you're looking at things from a purely mathematical standpoint, it's different when you map it into computation, right?
Starting point is 00:38:15 And it's like sometimes a limit of like the way that computers are actually implemented now in the technology that we have. But I think there's also some, I could be wrong about this, but I feel like there's some things that are just like, no, no, no. This has nothing to do with the way that we tricked
Starting point is 00:38:30 little bits of sand into thinking. This is just like a fundamental limitation of computation, right? Like, you just can't do this in the same way that you can do it mathematically, because the physical world just doesn't work that way. I'm trying to think of a good example of that, though. I was going to ask you, but I was thinking myself now what else there could be. I don't know, maybe I'm making stuff up. But yeah, I feel like I want to think about that now. We've gone quiet, thinking about stuff. Yeah, not good. This is not good podcast material; we're
Starting point is 00:39:01 both staring at each other with our fingers on our chins going, hmm. Yeah, this is like we're nerd-sniping each other. But yeah, I mean, those things can be very surprising. They can be very surprising. Right. And I think in our case as well... I mean, my sort of speaking career is based on the website that bears my name and the kind of cool things that compilers can do. And that's what really gets me excited. And so my intuition is based around what compilers are able to do. What compilers often can do is exactly the kind of transformations that you might
Starting point is 00:39:34 reasonably think, like seeing, maybe not specifically power, but things like that. Hey, look, you're doing this sequence of operations; there is a faster way of expressing them that has the exact same semantics and meaning to the CPU, so I'm going to do some work to change it for you. And so often you don't have to worry about these things as a programmer, which is great, right? I mean, ideally you want to be able to express your intent at the highest level where you can achieve your goals; that seems to be a reasonable thing to want. But in an interpreted language like Python, like this particular thing was in Python, there isn't an opportunity to have that sort of high-level view
Starting point is 00:40:13 and kind of go, oh, I noticed that you're always doing these two things together, or I can prove that this is a constant on this side, and therefore I can do something different from what you said. That transformation is not really part of it. Now, that's not to say that there isn't some clever code somewhere deep, deep, deep down in in numpy that's kind of doing a comparison to go oh this is raising to the power of two let's just multiply it but if it is doing
Starting point is 00:40:32 in NumPy that's kind of doing a comparison to go, oh, this is raising to the power of two, let's just multiply it. But if it is doing it, it doesn't seem to be helping in the particular case I was looking at. Similarly, you know, square roots, again: that's raising to the power of a half. And at what point do you say, well, there are 12 or 13 different possibilities maybe for this? Maybe if you're raising to the power of, you know, eight, that's the same as multiplying by itself and then squaring the square and then squaring the square, or whatever. You know, there are algorithms, there are known algorithms, for raising to high powers, for things like encryption actually, because a lot of the stuff that you do there is raising to high powers and then taking the modulus with a big prime and all that kind of nonsense. And so to raise to a high power, you actually devolve the number that you're raising it to into the binary
Starting point is 00:41:12 representation, then you keep squaring and adding... sorry, squaring and multiplying, with either the thing you just squared or with the original value again, to kind of get you up in log-two steps, which is all cool.
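(A minimal sketch of that square-and-multiply idea for plain Python integers; Python's built-in pow already does something similar, so this is purely illustrative.)

```python
def power_by_squaring(base: int, exponent: int) -> int:
    """Raise base to a non-negative integer power in O(log exponent) multiplies."""
    result = 1
    while exponent > 0:
        if exponent & 1:      # low bit of the exponent set: fold in this square
            result *= base
        base *= base          # square once per binary digit of the exponent
        exponent >>= 1
    return result

print(power_by_squaring(3, 8))   # 6561, with a handful of multiplies instead of seven
```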
Starting point is 00:41:43 But if you were to generally apply that all the time, it takes probably more time to work out the best way to do it than it would to just have done it the long way. So it's only when you have a compiler that can say, well, look, I'm going to do this the once, now. You, poor program writer, you get to suffer this time while I compile, but then everyone at runtime benefits from the fact that I saw that this was actually this particular kind of multiplication, and in fact I can replace your multiplication with shifts and adds or whatever. You know, there is all that kind of stuff. So maybe there is a bit of bias in my recent experience because of it being an interpreted language, which obviously has its own trade-offs, and one of them is that you can write it really quickly, and another one is that maybe you can make a lot of cups of tea while it's doing its work. Right, right. Saving on the writing time at the expense of the running time, and the coffee and tea making time. Often that's the right trade-off, right? You know, I mean, certainly
Starting point is 00:42:23 if you want to write a little command line tool, then you want to be writing something which is quick and easy and not necessarily hard to write. So there's always trade-offs. Yeah. All right. Well, this has been a really fun adventure
Starting point is 00:42:36 from like the, you know, third of a Grace Hopper wire. Yes. All the way up to rebooting your computer, taking 32 millennia and all the various up to rebooting your computer, taking 32 millennia, and all the various effects of that vast difference in time. So I feel like I have a much better understanding of how time works and how computers work
Starting point is 00:42:59 just from going through this. Absolutely. And I'm going to leave on a tweet that I saw from a friend recently. Well, not a tweet. It was a conference talk that he did. And then halfway through, he said, the first rule of profiling is that you are wrong.
Starting point is 00:43:15 And I think that's the intuition that everyone should take away from this is that you're always wrong. Start with that and you'll probably wind up in a better place. Exactly. All right, my friend. All right.
Starting point is 00:43:30 Until next time. Until next time. You've been listening to Two's Complement, a programming podcast by Ben Rady and Matt Godbolt. Find the show transcript and notes at twoscompliment.org. Contact us on Twitter at twoscp. That's at T-W-O-S-C-P. Theme music by Inverse Phase.
