Two's Complement - Slow Builds and Fast Feedback

Episode Date: April 6, 2021

Ben and Matt talk about builds and build systems, mostly in C++. Matt talks about lots of different ways to speed up builds for C++, and is very helpful. Ben questions whether you want a build that never fails, which is moderately helpful.

Transcript
Starting point is 00:00:00 I'm Matt Godbolt. And I'm Ben Rady. And this is Two's Complement, a programming podcast. Hey, Ben. We've spoken a lot about how important it is to have a fast feedback cycle. That's easier said than done in some languages. I spend most of my time in compiled languages like C++ and they're well known for having really slow builds. Now I'd like to talk a bit about how it might be possible to
Starting point is 00:00:40 make our life easier, and whether there are things we can learn from other languages or other approaches that you might know, or if there are some tricky tricks that I know from C++ that we can talk about, to just basically try and give people an idea about how one might test or run or deploy your compiled language project, or indeed any other language project, as quickly as possible and get that feedback, that rule of eights thing that we've talked about a number of times. Yeah, I think that would be cool. It might be kind of important to talk about, when we say feedback cycle, what do we mean by that? What are some examples of feedback cycles in software development, and how do you speed them up? So when I'm
Starting point is 00:01:25 developing, I'm a dyed-in-the-wool IDE person, as you know, having worked with me, and I love being able to make a small change to my code and then very quickly find out where I've gone wrong. Very often that is either the squiggly lines in the IDE itself, where it's worked out that, hey, you mistyped something, or maybe the green squiggly lines where the linter has run and said, you know what, that's probably bad practice. Or mostly it's: I hit build and very quickly I get the result back from the compiler that says, you're a fool, you're missing a semicolon, that kind of thing. So that's the first level of feedback that I'm looking for: am I on the right track as a developer, just on the nuts and bolts stuff about have I got my syntax right?
Starting point is 00:02:12 But then once I've got something working and building, I want to know whether it's right. And, you know, we've talked a bunch about testing, and that basically means running automated tests, or indeed just running it and kicking the tires myself and saying, is this what I wanted? Yeah. So that's what I'm thinking about when I'm thinking about feedback. I think I'd like to go from the point where I've made a change in my code to seeing if it works, however we define that, as quickly as possible.
Starting point is 00:02:40 Yeah. In fact, there's kind of a model of software development where it's really nothing more than just a number of feedback cycles all increasing in length, right? Right. to starting up the system and maybe doing some manual testing or some integration, automated integration testing to see if you have a system level there to, you know, giving it to a user and making sure that the user got what they expected, deploying it. You know, it can kind of go further and further out. I guess.
Starting point is 00:03:16 Yeah. They're all part of the same thing. Each is a little bit further down the line to the finished product than the last step. But if you think about it holistically, then each of those is important to be able to get to quickly. Like deployment, for example: that's often something which is grafted on at the very, very end and is a massive
Starting point is 00:03:35 pain if you're especially with binary deploys like in a compiled language sometimes if you've got to deploy a particular build and it has certain dependencies, if you think about that at the end, it can be a nightmare. But if you start from the beginning, then you're always deploying and it's not such a problem. Yeah. I mean, you can think of all of these things, I think, as sort of spending a little bit of time to get confidence that you are ready to spend a little bit more time, right? Like, you know, from a very naive point of view, as a, you know, almost not a person not involved in programming at all, you could just say, well, why don't you just sit down and type out all the code from beginning to end exactly correct and then give it to me, right? Like,
Starting point is 00:04:22 in the final finished form. And, you know, we obviously know as programmers, that's not how it works. But, you know, it's maybe useful to think about, okay, well, specifically, why doesn't that work, right? Like, what do you actually need to do to, you know, deliver working software? And I think a lot of it is the recognition that it's, you know, I would say it's basically impossible to build large things correctly. You can only build large things out of small things and try to build the small things correctly. And I mean, you know, maybe they're the lines where you draw between big
Starting point is 00:04:56 and small can vary from person to person or team to team or problem to problem. But I think that's kind of mostly true. And so the feedback cycles are just a way to define small things, and to try to buy a little bit of time by saying, well, I'm going to do some very, very small things and check that they're correct, and then do some more small things, and slightly bigger things, and slightly bigger things. So builds are obviously a big part of that, and how you set up those feedback cycles is a big part of that. Yes, particularly in compiled languages. And let's face it, even JavaScript these days: when I'm hacking on my hobby project, I spend most of my time waiting for webpack to do its magic and, you know,
Starting point is 00:05:36 convert my TypeScript to JavaScript or do whatever. So, like, everything is compiled these days. So there's definitely something to be said for getting that fast feedback cycle. And in C++ in particular, I've worked on very large projects. Back a decade and a half ago when I was working in the games industry, there were giant, giant megalithic projects that took a very long time to build. And that made testing hard. It was a big deal. You know, you would build the code,
Starting point is 00:06:08 you'd go and make a cup of tea, you'd come back, and hopefully it actually built and linked, and then you'd deploy it through a serial cable to like a Dreamcast on the other end of it, and then you'd run it and you'd be like, oh, crap, yeah, I got that the wrong way around, didn't I? And that was bad.
Starting point is 00:06:24 Nobody really wanted that situation. Yeah. And so, actually, inspired along with a colleague of mine at the time, we started taking some approaches that a guy called John Lakos, who now works at Bloomberg, had come up with as, like, these are principles by which I lay out, structurally, physically lay out, my software so that the build time is minimized. And that's a really interesting thing for me. For Nick and me, it was a real eye-opener that as well as designing software for testability, not that we really were at the time, but modularity certainly, and ease of understanding and separation of concerns, if only because then we knew that we could give this to Tim and that to the other guy, and then we know who's doing what and they aren't treading on each other's toes.
Starting point is 00:07:14 But there was this idea of structurally designing your code so that you laid out things to minimize rebuilds particularly. And some of these are very specific to C++. For those listening who aren't as familiar with C++, you kind of have to repeat yourself twice in C++ in many, many ways, where you declare that something exists with a particular interface, be it just a function or an entire class and all of its contents. And then you can define what those things are somewhere else. And that's the header file and the CPP file. And there's a huge blur between the two of them because you can put things in the header file that you could otherwise have put in the CPP file and vice versa. But the real trick is that anything that's calling your code needs to see the declaration, i.e. it exists and it has this form, but doesn't
Starting point is 00:08:06 care necessarily about the definition. It doesn't actually care what you're doing with it. It just says there exists a function called printf. Hey, it takes a variable number of arguments. Good luck. And that's all you care about as a caller. But anytime you change that contract, anytime you modify it, well, now it takes three parameters or two parameters, or it takes an integer instead of a double, all of the callers of your function need to be rebuilt in order to know that they have to change their calling convention or things like that. And so the trick is to minimize the chance that you're going to be changing something that's a declaration in a header file, so that you don't force everything to rebuild. And so there are a number of tricks that you can do to do this, and
Starting point is 00:08:50 Lakos's book, Large Scale C++ Design, I think is what it's called, was a huge eye-opener for us. And actually we realized that a lot of these things could be automated, and when our then games company went down, we founded a little startup to try and make this a machinable thing, an automatic set of, I guess what they would be called nowadays, refactorings of an existing code base. We had a very different idea and it didn't work out, but that's a whole other episode, I think, about the failures of my startup business. But what it means is that there are ways of changing your code that sometimes are just free. Don't put stuff in header files if you can avoid it.
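As a rough illustration of the declaration and definition split described here, the sketch below squeezes three hypothetical files into one listing. The file names, the `to_fahrenheit` function and the check at the end are all assumptions made up for the example, not something from the episode.

```cpp
#include <cassert>

// ---- would live in temperature.h (hypothetical name) ----
// Declaration only: callers learn that the function exists and what it takes.
double to_fahrenheit(double celsius);

// ---- would live in caller.cpp ----
// Compiles against the declaration alone; it never sees the body.
double boiling_point_fahrenheit() {
    return to_fahrenheit(100.0);
}

// ---- would live in temperature.cpp ----
// Definition: editing this body rebuilds only temperature.cpp, then relinks.
// Changing the signature in the header would force every caller to rebuild.
double to_fahrenheit(double celsius) {
    return celsius * 9.0 / 5.0 + 32.0;
}

// ---- would live in a small test file ----
void check_temperature() {
    assert(to_fahrenheit(0.0) == 32.0);
    assert(boiling_point_fahrenheit() == 212.0);
}
```

The point of the layout is that the caller depends only on the one-line declaration, so the arithmetic inside the function can change freely without touching anything that includes the header.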
Starting point is 00:09:33 Right? Don't expose your inner workings to your consumers if they don't need to be up in your business. Right? They don't need to know that I'm calling some other function. And of course there's a sort of transitivity issue here, which is that if you are exposing your innards to another person, and your innards use another library, now the person who's including you is transitively including another library. And so C++ gets a bad rap for build times, understandably. Yeah. Well, I think it is one of the few languages where...
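One common way to keep innards, and the libraries they pull in, out of a header is a forward-declared implementation type held behind a pointer (often called pimpl). The episode does not name this technique, so treat the sketch below as one possible approach; `NameRegistry` and its members are hypothetical, and `<vector>` merely stands in for a heavier dependency.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// ---- would live in name_registry.h (hypothetical) ----
// The header names the implementation type but does not define it, so it
// does not have to include whatever the implementation depends on.
class NameRegistry {
public:
    NameRegistry();
    ~NameRegistry();  // defined in the .cpp, where Impl is a complete type
    void add(const std::string& name);
    std::size_t size() const;

private:
    struct Impl;                  // forward declaration only
    std::unique_ptr<Impl> impl_;  // callers never see what is inside
};

// ---- would live in name_registry.cpp ----
// Ideally only this file would include the heavy dependency.
struct NameRegistry::Impl {
    std::vector<std::string> names;
};

NameRegistry::NameRegistry() : impl_(new Impl()) {}
NameRegistry::~NameRegistry() = default;
void NameRegistry::add(const std::string& name) { impl_->names.push_back(name); }
std::size_t NameRegistry::size() const { return impl_->names.size(); }

// ---- would live in a small test file ----
void check_registry() {
    NameRegistry registry;
    registry.add("ben");
    registry.add("matt");
    assert(registry.size() == 2);
}
```

The trade-off is an extra allocation and indirection per object, in exchange for includers no longer rebuilding when the private members or their dependencies change.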
Starting point is 00:10:07 Well, actually, I don't know about the few. It is uncommon among popular programming languages, I think it's a fair characterization to make, that the structure of your code has such a dramatic impact on your build. I don't even think that you see that in other compiled languages. You definitely don't see that in, you know, non-compiled languages. I mean, the load time of, you know, a Ruby or a Python file is negligible compared to
Starting point is 00:10:33 the execution time a lot of the times. And so even with something like Java, like, you don't really think about those things, right? No, there's some magic behind the scenes and some, like, very quick pass over the Java files that generates enough understanding of, like, hey, this is what the module contains, that you just don't think about it. It's hidden away from you. But it's exposed, warts and all, to you in C++. And in many ways it's a feature, because in some cases I do want to expose my deepest implementation to the outside world, because I'm a generic algorithm,
Starting point is 00:11:07 and if I'm a generic algorithm, genericized on some type that I don't even know yet, C++ decided to go down the route of effectively code generating with my algorithm and your type. You inject your type into my algorithm, and the compiler sees it and is able to do all sorts of amazing optimizations given that. And so it's a feature that the innermost part of my binary search or whatever I've written generically
Starting point is 00:11:34 is exposed to you so that when you put in your type that has a particular less than operator, the compiler can make a really good decision about how to optimize that. That's brilliant. That's brilliant. That's great. And of course, if I do change the algorithm inside my binary search, of course you have to rebuild. I've changed how it works and you will be able to optimize in a
Starting point is 00:11:56 different way. So that's a good thing. I think a lot of, I mean, just to take one simple example, it's really convenient as a C++ developer to put trivial functions into the header files of, say, a class. So if you're writing a class, you have a choice. When you declare a function, you can also define it in place and just actually open the squiggly braces and say, well, okay, this is return my member thingamajig, right? So then the implementation is in the header file too, and that's great, and it's super convenient because, as a programmer, right, you don't want
Starting point is 00:12:35 to be opening two files and keeping these things in track and whatever. So you might do that. But now, of course, you have put your implementation in the header, and so you've exposed it to other people, which is fine. Now, there are two reasons why you might do that. One, you're lazy, and I'm very lazy all of the time, so mea culpa. And two, because certainly in older generation compilers, it was the case that the only time that things could be inlined is if the compiler could see the body of a function. Which, again, makes sense, right? The compiler needs to be able to see what the function is doing, i.e. looking into its innards and having a look
Starting point is 00:13:12 at what actual operations are being done in order to make the determination about whether it should inline something and then ultimately to inline it. And as we know, inlining is like the uber-optimization. Everything runs off inlining. You inline stuff, and then you notice that things are constant, and then you can delete huge areas of code, and then you can inline some more stuff, and then before you know it a huge complicated chain of things becomes like a simple operation. So inlining is something you definitely, definitely want to have available. So obviously you want to put some things in the header file in order to say to the world, yeah, okay, you can inline my function, but now you've got this sort of decision to make.
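To make the inlining chain concrete, here is a small sketch with accessors defined in the class body, the way they would be in a header. The `Rect` class and `area_of_fixed_rect` are hypothetical names; whether a given compiler actually folds the function down to a constant depends on the compiler and its flags.

```cpp
#include <cassert>

class Rect {
public:
    Rect(int w, int h) : width_(w), height_(h) {}
    // Defined in the class body, so every includer can see and inline them.
    int width() const { return width_; }
    int height() const { return height_; }

private:
    int width_;
    int height_;
};

// With the accessor bodies visible, an optimizer is free to inline them,
// notice that both values are constants, and reduce this to `return 12;`.
int area_of_fixed_rect() {
    Rect r(3, 4);
    return r.width() * r.height();
}

// Small self-check of the sketch.
void check_rect() {
    assert(area_of_fixed_rect() == 12);
}
```

The cost is the one discussed above: any edit to those accessor bodies now forces a rebuild of everything that includes the class.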
Starting point is 00:13:51 Is the performance important to me? Is it really important if it's a trivial accessor, like, you know, I get something, then return something? You're like, well, it doesn't hurt me to do this, so I'm going to put it in the header file. Something I've seen that people haven't realized is that compilers have moved on, and now with link time code generation, or, yeah, LTO, link time optimization, compilers are able to do the inlining process even with things that they didn't, at the point of compilation, have access to. And so what I have started doing now is putting very, very few things in headers, very few things indeed, and relying on link time optimization when I'm doing my release builds, which does take longer, but I'm not caring about that, right? For my testable builds
Starting point is 00:14:46 I really want it to build fast while remaining honest to this is what I'm going to ship. Yeah. So there's a bit of a dichotomy there, because obviously everything you do that's different between shipping and debugging is maybe problematic, or could be problematic down the road. But I've found that having a very tight turnaround in debug mode, where I don't turn on link time code generation and I move everything out of the header files, usually comes back as a boon for me. So that means I can make changes. You know, hey, I need to – who all is accessing this thing?
Starting point is 00:15:30 I want to print out who's doing it. Or actually, no, I'm going to get rid of this field entirely and I'm going to replace it with a calculation. Well, actually, that maybe is a problem. But anyway, now I can make those changes. And then the only thing that rebuilds is my test, the C file that it's implemented in, and then the linker has to churn and do some magic, and that's great. Yeah. And then when I come to do my release build, I turn on link time code generation, and it's as if I had written it the old-fashioned way, the traditional way.
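A minimal sketch of that workflow, assuming a GCC or Clang toolchain where `-flto` enables link time optimization. The `Counter` class, the file names and the exact command lines in the comments are assumptions for illustration; the code itself is ordinary C++ and builds the same with or without LTO.

```cpp
#include <cassert>

// ---- would live in counter.h (hypothetical) ----
class Counter {
public:
    void increment();
    int value() const;  // declared only: the body is not in the header

private:
    int count_ = 0;
};

// ---- would live in counter.cpp ----
// Changing these bodies rebuilds only this file and then relinks.
void Counter::increment() { ++count_; }
int Counter::value() const { return count_; }

// ---- would live in user.cpp ----
// Without LTO these calls cross a translation unit boundary and are not
// inlined; with LTO the optimizer gets to see the bodies at link time.
int count_to(int n) {
    Counter c;
    for (int i = 0; i < n; ++i) {
        c.increment();
    }
    return c.value();
}

// Build sketch (file names are assumptions):
//   debug:   g++ -O0 -g -c counter.cpp user.cpp
//   release: g++ -O2 -flto -c counter.cpp user.cpp
//            g++ -O2 -flto counter.o user.o main.o -o app

// Small self-check, meant to be run in both configurations.
void check_counter() {
    assert(count_to(0) == 0);
    assert(count_to(5) == 5);
}
```

Running the same check in both the debug and the LTO release configuration is the belt-and-braces step described in the next part of the conversation.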
Starting point is 00:16:00 Anything that could be inlined will be inlined if it's profitable to do so. Right. And so it's kind of the best of all worlds. But what I definitely do when I do this is I make sure that I build and run my tests in debug and release in my CI. Right? That is anathema to a lot of people. They're like, why on earth would you do this? Just build it in debug. And I'm like, well, I am sort of voiding the warranty a bit by saying I'm trusting that the compiler's optimizations don't change the meaning of my code, which I think at some point you have to do. I think we've talked about this before: programming is a faith-based
Starting point is 00:16:36 activity. At some level one has to trust the compiler. Yeah, most of the time, right? You're going to spend a lot of time checking things that are already true if you don't have some level of trust somewhere. Exactly. So assuming that the compiler works, you can get a lot of coverage by running the debug mode day in, day out. But just to be sure, just to be sure, and it's relatively low cost when you're doing a release candidate or something like that,
Starting point is 00:17:04 you should run the tests in release mode as well. And mainly because it's not actually the compiler that will be at fault. 99 times out of 100, if you have a difference in the release versus the debug, it's almost certainly your own fault. C++ has plenty of traps for the unwary. If you're invoking undefined behavior, which is sort of the dreaded case where you've gone off script from what the language says you're allowed to do, then in some cases the compiler is completely able to do something very, very different from what you intended, because you did it wrong. And, yeah, that typically turns up in release builds when the optimizer sort of comes out. So I guess that's just
Starting point is 00:17:51 an example of one of the things that one can do with a more modern compiler. And I mean, anything greater than, like, GCC 5, I think, has supported this, and actually Microsoft's compilers have supported this forever. And the magic here is really in the linker, right? Correct, yeah. The linker, effectively: when you're doing this, your builds are only built to, like, an intermediate language, and then when the linker is invoked, as it's connecting the dots of this function calls this function calls this function, it's like, well, they're very abstract, it's just a DAG almost of what calls what. And then it goes, oh, I need to reify this function now, and I have the whole program visible to me. Now, it does make for a slow link, of course, so that's another reason why you try to do this only in your release builds. Yeah. But it opens the door to making
Starting point is 00:18:39 your development cycle faster. Yeah. And I mean, certainly, you know, this fits very well with the model that we were just talking about, in terms of doing those debug builds as a way to spend a little bit of time now to give yourself confidence that you could spend more time later, right? Right. You're intentionally shifting work into the release builds and making them slower because it enables you to make some of those other feedback loops, the more frequent feedback loops, faster, right? Yeah. And, you know, it takes maybe a little bit of in-your-head math to make sure that that makes sense, but it sounds like from your experience that makes sense quite often. I have to ask,
Starting point is 00:19:21 though: is the whole running the tests on the release build, because you don't trust that it's exactly the same, something that's from hard-won experience? Have you seen that? I'm afraid so. I'm afraid so, yeah. I have found compiler bugs before now in this instance, but as I say, 99 times out of 100 I've found cases where I was inadvertently relying on undefined behavior. And I think when one has spent a couple of decades doing this, you kind of build a mental model of what the compiler is allowed to do and the kind of optimizations it's allowed to do. And so you might, on purpose or otherwise, write the body of a function in a separate file and know that calling it from another
Starting point is 00:20:05 cpp file won't be able to see, in inverted commas, the sort of nasty trick that you're pulling in some other place. So the compiler, having compiled these two things separately, will do the right thing. But when it can see the whole program, it's like, oh, wait a second, you do this thing. Oh, I can throw that away then. You're like, no, no, no, no, no, don't do that. You know, I was wrong as a programmer. I was wrong, definitely. I was relying on what was undefined behavior, but I was getting away with it, which is a dangerous, dangerous world to live in. And I definitely don't want anyone listening to this to think that it's okay to rely on undefined behavior. It is not. You're definitely outside of the warranty of the compiler, such as there is one. But yeah, it is hard-won, unfortunately, and it's just
Starting point is 00:20:47 worth doing. And I mean, oftentimes if you're going to build a release version to test, you can do, like, performance analysis as well, and that seems like a worthwhile thing to have as a side effect. Yeah. And I mean, you know, you're kind of getting into a little bit of a whole release pipeline with that, right? Where, you know, again, it's this set of successive feedback cycles: I'm going to run my tests, I'm going to do my debug build, I'm going to run my tests, I'm going to do my release build and run the tests again, I'm going to do some performance testing, I'm going to deploy. I'm going to deploy to one server. I'm going to deploy to 10 servers. I'm going to deploy to 100 servers. I'm going to turn the feature flag on, whatever it is, right, to sort of go through those progressions of more and more confidence that things are working.
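As one illustration of why the release-mode test run in that pipeline earns its place, here is a classic case of undefined behavior that can pass in a debug build and change in an optimized one. It is a different example from the cross-file trick described earlier, chosen because it is short; the function names are hypothetical, and the risky version is shown only in a comment so that the listing itself stays well defined.

```cpp
#include <cassert>
#include <climits>

// Tempting but wrong: when x == INT_MAX, `x + 1` overflows a signed int,
// which is undefined behavior. An optimizer may assume that never happens
// and fold the comparison to `false`, so debug and release can disagree.
//
//   bool increment_would_overflow(int x) { return x + 1 < x; }

// Well-defined alternative: compare against the limit before any arithmetic.
bool increment_would_overflow(int x) {
    return x == INT_MAX;
}

// Writes x + 1 to `out` only when the result is representable.
bool checked_increment(int x, int& out) {
    if (increment_would_overflow(x)) {
        return false;
    }
    out = x + 1;
    return true;
}

// Small self-check, worth running in both debug and release builds.
void check_increment() {
    int out = 0;
    assert(checked_increment(41, out));
    assert(out == 42);
    assert(!checked_increment(INT_MAX, out));
    assert(out == 42);  // untouched when the increment is refused
}
```

Sanitizer builds can also catch this class of bug, but that is a separate tool from the debug and release split discussed in the episode.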
Starting point is 00:21:45 But, you know, you don't want to get to that point before you discover that you've, you know, flipped a sign operator somewhere or, you know, used a double equal instead of a single equal or whatever the thing might be, right? Like, it's an intentional moving of that cost to later only because the chances of it failing are lower, right? There is like an optimal amount of time for a build to fail. It's not never, right? If your build literally never fails, why do you even have it, right? It's not giving you any new information. So you do want it to fail sometimes. You just want it to fail the right amount of time for the length of the feedback cycle, right? Yeah. The longer feedback cycles should fail less frequently and the shorter feedback
Starting point is 00:22:26 cycles should fail more often. It's worth saying that not every compiler supports this. I know I said GCC 5 and up do support this, but the more unusual compilers for embedded systems won't support this feature, the link time code generation stuff. So depending on your exact situation, you might not be able to apply this all the time. And in fact, another thing that I have sort of internalized as a server developer, as I am, is that my servers are so fast that even a debug version running on my computer runs tractably fast. And, you know, it can be 10, 20, 30 times slower than the release build, maybe even more, and that's still okay for me to run all my tests in, certainly the tests that I care about before I'll check code
Starting point is 00:23:11 in. And that isn't always the case. If you're deploying or you're running tests on a system that is considerably slower or has timings to meet, then maybe you can't do that. Maybe you can't rely on the compiler working fast enough in release. But I think a lot of people fall into the category of being able to use these kinds of tricks. Oh, for sure. And I mean, one of the actually great things about working in C++ is that, as you've said, you're probably choosing C++ for a reason. And that reason is probably performance. Well, guess what? That means your tests run super fast if you write them correctly, right? Like, especially in comparison to a language like, you know, Ruby or Python or JavaScript or
Starting point is 00:23:53 even Java, in some cases, the tests can be extremely fast. So, you know, I sort of have this benchmark in my head, which I forget if I've mentioned before, actually, of like, you know, your test, you should be able to run hundreds of tests per second. Well, in C++, it's hundreds of tests per second, averaged, including your build time, is kind of how I would say that. Because the actual test run should be like thousands of tests per second. Right. Right.
Starting point is 00:24:19 I mean, test systems are, especially in debug, are a bit slower than that for all the reasons we discussed. And test frameworks like to try and capture as much information. So when they ultimately fail. So there's a bit of, I'd say it's okay for them to only run hundreds of tests a second. I'd be surprised if my tests actually run that fast. I'll be honest with you. I should check. I haven't really thought to do it.
Starting point is 00:24:42 Yeah. I mean, I think it definitely depends on the style of test that you write. And I think that you can actually even sometimes get into a little bit of a broken windows thing. Well, maybe this isn't exactly broken windows, but you sort of fall into the trap where the speed of the code itself actually hides some not great testing techniques that you wind up doing. Like there's this sort of subtle interaction between tests that are brittle and tests that are slow, right? Right. Tests that are brittle tend to do things like access databases and, you know, read and write
Starting point is 00:25:18 files and communicate with services and do all these other things. And tests that are slow also tend to do those things, right? And so if you sort of listen to the speed of your tests, I don't know if that's a workable metaphor, but if you pay attention to the speed of your tests, they can sometimes tell you when you've done some of that stuff inadvertently. Like I have definitely been in the middle of writing tests
Starting point is 00:25:42 and all of a sudden the test got really slow and I'm like, oh, what's going on here? I'm like, oh, I'm actually reading data from this service. That's why it's so slow. I need to knock that out. Yeah. One of the other things I kind of wanted to ask you is, you know, you were talking earlier about, in C++, having to sort of say things twice, right? Like there's this choice of, do I put this in a cpp file or do I put it in a header file? And, you know, I think the case that you gave, it was more obvious that you shouldn't be putting certain things in a header file. But I can imagine there are lots of situations in which that decision almost becomes arbitrary,
Starting point is 00:26:18 right? Like maybe, you know, whether or not I should be using templates or something like that is a decision that you might make. Are there ways to sort of frame those decisions in terms of faster feedback and faster builds, where it's sort of like, well, normally from one perspective, these two solutions might be equal in dignity, but from a build time perspective, actually one of them is much better than the other? That's a really good question. It's definitely the case that certain C++ features mandate you putting code in the header, or almost mandate it. So you mentioned templates. That is a great example where the generic programming almost always has to go into the header file, because anyone who might reasonably use that functionality needs the implementation of it in order to
Starting point is 00:27:10 optimize, like we talked about before, or even instantiate it, as we said before. Another thing is the compile time programming of constexpr functions, which is a sort of new-money style way of doing programming. Well, to take a small diversion: you know, templates were initially designed to be a generic programming tool, and then very quickly it was discovered that the way that the instantiations were done and the way that the compiler resolved certain features was itself Turing complete. So you can actually write a program purely using templates, which is a metaprogram, a program about a program. And that's a useful characteristic, it turns out. It's essentially like an in-build code generator of sorts. And you can start doing all sorts of tricks with
Starting point is 00:28:06 like, well, okay, I want to do this if the template type that I was passed is unsigned, because I don't have to check now if it's negative, because it can't be negative. And therefore, I can actually write my algorithm to take advantage of that, potentially. And then eventually you re-implement Lisp.
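The unsigned trick can be sketched with tag dispatch on a standard type trait, which works as far back as C++11. The function names below are made up for the example; `std::is_unsigned` and `std::is_integral` are the real traits from `<type_traits>`.

```cpp
#include <cassert>
#include <type_traits>

// Signed types: the index might be negative, so that check is needed.
template <typename T>
bool is_valid_index_impl(T index, T size, std::false_type /*is_unsigned*/) {
    return index >= 0 && index < size;
}

// Unsigned types: a negative value cannot exist, so the check is dropped.
template <typename T>
bool is_valid_index_impl(T index, T size, std::true_type /*is_unsigned*/) {
    return index < size;
}

// The overload is selected per type at compile time, not at run time.
template <typename T>
bool is_valid_index(T index, T size) {
    static_assert(std::is_integral<T>::value, "integer types only");
    return is_valid_index_impl(index, size, std::is_unsigned<T>{});
}

// Small self-check of the sketch.
void check_index() {
    assert(is_valid_index(2, 5));
    assert(!is_valid_index(-1, 5));
    assert(is_valid_index(2u, 5u));
    assert(!is_valid_index(5u, 5u));
}
```

Because all of this is template code, it has to sit in a header, which is exactly the build time tension being discussed.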
Starting point is 00:28:22 I mean, all languages re-implement Lisp, eventually. And in fact, yeah, a lot of template metaprogramming looks like that: all the consing and cdr-ing and all the weird Lispy-type terms come up. So constexpr is another way of writing a much more imperative, programmer-based metaprogram of a sort. It can be used in many other contexts too, but very often, again, that means that the functions that are constexpr have to be put into a header file, because the compiler has to be able to see their body in order to evaluate them appropriately. Which is great, you know. But those two techniques, if you start out as
Starting point is 00:29:03 that being, like, I'm going to do everything that way, you basically don't have the option of pulling things into a cpp file without extreme tricks. And there are some tricky tricks for doing some of those things. And I have definitely seen a modern style of C++ which I don't write, just because I'm the way that I am. I think, you know, my journey has meant that I'm much more of an imperative, one line after another kind of programmer. And I use templates when they are profitable, obviously profitable to me
Starting point is 00:29:32 based on my sensibilities, and constexpr I use as much as I can. But usually it's not by default, which means a lot of my stuff is pushed into CPP files just because that's how I go. So I can make these statements about, like, oh yeah, just put it into the cpp file and your builds go faster. And it's very hard to be in the situation where you have a template heavy piece of code
Starting point is 00:29:53 and try and reduce the build time of it, just because there isn't anything to reduce, in a way. So I guess the design of your system, and I suppose this comes back to Lakos' thing, you know, like the actual physical design of your system, has to be factored in when you're making the high level design of how you're going to fit your components together. If you want to select for build time, then that's something you need to consider earlier on. Yeah. Although in terms of some of the things we've been talking about, it occurs to me now that another sort of dimension is: if I am in fact changing a template heavy piece of code, let's say it's the binary search that I made up earlier, so I'm editing the binary search and I want to make sure that I haven't broken it, right? I mean, no one's ever written a faulty binary search, I know. It's a trivial thing to get right, especially in interviews,
Starting point is 00:30:50 perfect every time, every time without fail. But you will have a test file for that, almost certainly. And so now the way that your build system is set up contributes to how quickly you can iterate on: did I break my binary search? Right? Because if your build system is like the sort of default, go-to Make or CMake or whatever, where you just sort of say make test, it's going to run all the tests. And if you've just touched binary search.h, sure as heck 97% of your code base needs to rebuild. Yeah. For you to run the one test that tests it with, you know, ints and floats and doubles or whatever your binary search test case is.
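As a rough sketch of the situation being described, here is what such a header and its one test might look like. Everything named here (`binary_search.h`, `binary_search_index`, the test function) is hypothetical and not from the episode, and it assumes C++17 for `std::optional`.

```cpp
// binary_search.h (hypothetical name): a template has to live in the header,
// so every file that includes it is rebuilt whenever it changes.
#include <cstddef>
#include <optional>
#include <vector>

template <typename T>
std::optional<std::size_t> binary_search_index(const std::vector<T>& sorted,
                                               const T& value) {
    std::size_t lo = 0;
    std::size_t hi = sorted.size();  // search the half-open range [lo, hi)
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;  // avoids overflowing lo + hi
        if (sorted[mid] < value) {
            lo = mid + 1;
        } else if (value < sorted[mid]) {
            hi = mid;
        } else {
            return mid;
        }
    }
    return std::nullopt;
}

// binary_search_test.cpp (hypothetical name): ideally the only translation
// unit that has to compile to answer "did I break my binary search?".
template <typename T>
bool finds_every_element_and_rejects_missing() {
    const std::vector<T> sorted{T(1), T(3), T(5), T(7), T(9)};
    for (std::size_t i = 0; i < sorted.size(); ++i) {
        if (binary_search_index(sorted, sorted[i]) != i) return false;
    }
    return !binary_search_index(sorted, T(4)).has_value() &&
           !binary_search_index(std::vector<T>{}, T(1)).has_value();
}
```

A test runner would call `finds_every_element_and_rejects_missing` for `int`, `float`, and `double`. The build-system question is whether that one small file can be built and run without recompiling everything else that includes the header.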
Starting point is 00:31:34 And what you really, really want to do is build just that one test. Yeah. And run that one test. Right. And that's hard to do in C++ because there's a lot of complexity in the way the build systems work, and almost necessarily you group things in modules and then sort of treat them as an atom and build and link that one library together. And so in my own projects I tend to make one library per subdirectory, and then I try and put minimal
Starting point is 00:32:03 stuff in each subdirectory, and I have a test for that library, and then branch out. But it's not always possible to do that. And I mean, it's certainly the case in some of the bigger code bases, where everything's in everything else, that it may be hard to extract sensible modules. Which I think comes back to the design stuff we've talked about before, where if you design for, say, testability, this is another metric that you measure testability by: how isolated can I make this object and its tests at the build level? Which again is like a physical property, not a conceptual thing. It's like, no, can I just type make binary search test, or go into my IDE and hit just that, and know that I'm not even compiling the rest
Starting point is 00:32:46 of the code? Yeah, that could be something worth thinking about. I don't have an answer for that, but, you know, one tries to organize one's code as best as one can. Actually, I do know, I have an ex-colleague who works for another financial company, and I think he's trying to open source a build tool, a C++ build tool that essentially starts from a test. You point at the test and say, can you run that test, please? And it kind of backs out the build tree by following all the include files and working backwards, and goes, well, this is the minimum set of things I need to have built to be up to date in order to run
Starting point is 00:33:27 that one test. Which, if he can open source it and if it works as well as he described it to me, would be a huge boon, to be able to say, no, those tests, please. And there are some subtleties because, of course, everything's more complicated in C++. You've got global objects and all sorts of nonsense like that. But that kind of
Starting point is 00:33:43 thing is maybe a sea change in the way, pardon the pun, in the way that we want to test one's code. You can literally point it to that. And in fact, do refactorings. How many times in C++ have you gone, I really want to change this interface and I want to test
Starting point is 00:33:59 it with a subset of my code. But everything's broken now because I broke the interface and 97%, again, 97% of my code now fails to build. And I don't want to spend the time updating it until I'm sure that this is a right step for me to take.
Starting point is 00:34:14 And so I want to make sure my tests pass first so that I haven't broken the intent of what I'm changing. And then I want to test a subset of my code. Does it smell right? Does it feel right? Can I run the test in that part of the code before I then commit to rolling up my sleeves and dealing with the 3000 compiler errors in the rest of the code base? And that sometimes is hard to do.
Starting point is 00:34:33 Yeah. So a build system maybe contributes with that. Yeah, definitely. I mean, absolutely. That's a scary prospect of having to make a change to an interface and not really having confidence that it's correct until you've done an hour's worth of work. Right? Right. Very often, unfortunately, that is the way of C++, where you make the change and then you type make and start fixing every single error you've got one after another. And then you go, I hope that was worthwhile.
Starting point is 00:35:09 Which, of course, leads to that kind of false dichotomy as well, where having made that change... Yeah, if you're on the fence about whether it was worthwhile or not, you're definitely going to say, I'm keeping that, that was an hour of toil. Right, exactly. I'm not undoing it. To an extent, you know. And again, as an IDE-friendly person, I'm starting to trust IDEs more and more with some of the refactorings which I would never have done in C++. And that can be a superpower with some of these changes, because you can say, hey, add a new parameter, and you worry about it. There's a danger that it gets it wrong, of course. And then, in which case, undoing that change doesn't feel quite as personal. It doesn't feel like a failure as much when you discover, actually, no, it was not the right thing to do.
Starting point is 00:35:48 But that's probably a whole other conversation for another time. True, true enough. But yeah, those automated refactorings are very powerful. But yeah, I mean, I don't know, I think we've talked a lot about some of these things sort of being more about philosophy than they are about technique, right? Like, if you start with the philosophy, you know, there's a lot of smart programmers out there doing clever things. If you start with the philosophy, the techniques will come, right? You'll either find them, because you sort of have that desire for them, or you'll invent them, or you'll borrow them. But whatever it is, like, if you don't have that buy-in of like, no, we're not gonna have the situation where every time I change, you know, my binary search implementation or my binary search
Starting point is 00:36:38 you know algorithm that i'm gonna just go get a coffee for 30 minutes and then come back later and figure out if anything is broken. Right. You got to design it from the beginning. I mean, this is the whole – there's an XKCD for this, which tells you how endemic and problematic it is in the industry of the sword fighting. That's exactly right. On chairs. And I just – for the record, no one should really be writing their own binary search. There's an STL implementation that is perfectly good.
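For reference, a minimal sketch of leaning on the standard library instead. The "confusingly named" remark presumably refers to `std::binary_search` only returning a yes/no answer, while `std::lower_bound` is the call that gives a position; that reading is an assumption. The wrapper names `contains` and `index_of` are hypothetical, and the sketch assumes C++17.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// std::binary_search only answers "is the value present?" in a sorted range.
inline bool contains(const std::vector<int>& sorted, int value) {
    return std::binary_search(sorted.begin(), sorted.end(), value);
}

// std::lower_bound returns an iterator to the first element that is not less
// than value, so a position needs one extra equality check.
inline std::optional<std::size_t> index_of(const std::vector<int>& sorted,
                                           int value) {
    const auto it = std::lower_bound(sorted.begin(), sorted.end(), value);
    if (it == sorted.end() || *it != value) return std::nullopt;
    return static_cast<std::size_t>(it - sorted.begin());
}
```

Both calls require the input to already be sorted; with duplicates, `index_of` reports the first matching position.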
Starting point is 00:37:04 So use that. Even if it's confusingly named, just use that. Yeah, please. Yeah, all the more reason, right? And I mean, while we're there, there are other tricks, of course. But some of the other tricks that one might use to reduce one's coupling in C++ do have a runtime impact. Now, I've sort of said about pulling stuff out of headers and then relying on link-time code generation to kind of undo that. That is one thing, because that essentially nets out these days, I think. But sometimes you can extract an interface
Starting point is 00:37:36 and declare it and define it literally as an interface, like an actual virtual interface thing, knowing that everyone who's ever calling you is now going to be cursed to go through a virtual function call to get to your implementation. But now that acts as this beautiful disconnect, both in terms of a pure interface, like from a design standpoint, and from the, well, now I can substitute any old object I like, and I've built that separately to you. It may not have even been from the same build process. It could be from some other thing completely. And so I've insulated you from changes to my implementation at the very highest level. And that can be a
Starting point is 00:38:14 powerful technique. And then just because I love this stuff so much, and I'm sorry to get excited about more compilation trickery, even that... That's why we're doing the podcast, I guess, to get excited about this stuff. That is a very fair point, mate. Yeah. But even that, which seems like it's an irredeemable change to the way you've written your code, and you've put this massive doorstop in between, I don't know if doorstop's the right thing, you've put this massive barrier for the optimizer between my implementation and your calling through an interface, compilers are just starting to get clever enough to see through that. Now in managed languages like Java, this kind of
Starting point is 00:38:58 trick has been able to be done for a while. At runtime, as you're calling a virtual method, it kind of goes, hey, you know what? This is always concrete file system. I wonder if I should just call concrete file system directly and then inline it and then put in a check, and if it's not a concrete file system, then immediately go, oh no, we're done here, we have to de-optimize and do something else. But most of the time, as long as it is just a concrete file system, then it's as fast as if I'd written a direct call. C++ is starting to pick up on this. There is some devirtualization going into Clang and GCC, and a lot of that stuff is getting more and more sophisticated. It's not a total panacea yet, but I would imagine that as time ticks on, it will become more and more possible to rely on the compiler doing magic for you. And that way
Starting point is 00:39:47 I can write my code with an interface between things to separate it, both from a testability point of view, so I can have my file system and I can have my mock file system and my concrete file system and whatever, and from a build time standpoint, because I very rarely change the interface to my file system, but if I'm working on the file system, I'm probably hacking around inside the implementation all the time, and all I'm doing is building and then the linker just has to run. And so there's a lot of good things going on there. And one doesn't have to use virtual methods to divorce your implementation from your interface. There's a technique called pimpl, which insulates your callers from the structural layout of your object. So if you have like an int and a float as member
Starting point is 00:40:29 variables and you decide to add a char later on, of course you've changed the size of your object, which means anyone who has one of your objects in there has to be rebuilt, and all these kinds of things. So there are ways and means around those too, but that does come with a runtime cost, and that's something you might just have to take on the chin for certain things. And, you know, again, if you iterate fast, you can probably find quickly the areas that need the performance. So I'm throwing out ideas, really, for build time insulation techniques, which is less about the build time itself and more about reducing the coupling between components, so that a small change to an implementation doesn't
Starting point is 00:41:12 necessitate a giant rebuild of your program. Right, right. And, you know, the sort of add-on benefits of doing that beyond just the build time, right? Like the decoupling that you get otherwise. Right, exactly. I was going to say, what about other things like distributed builds and other kind of external tricks for speeding up builds? What's your take on those in general? So I think they're a necessary evil once you get to some level of complexity. Certainly at the companies I've worked at before, we've had good success with commercial offerings that allow you to distribute your build in a relatively straightforward way. But most of the time, the time and effort in getting that to work reliably, and not to have misbuilds, and not to have issues with distributing the particular version of the compiler that you've got or anything like that, is outweighed by just, and I say just with little air quotes here,
Starting point is 00:42:12 just putting as fast a computer you can possibly afford in the hands of your developers for their day-to-day activities. So I'm lucky enough to have like a 16-core machine here. And so building locally with 16 cores, 16 threads of build is fast enough. And obviously that's great. Not everyone can afford that. Not every company is going to shell out for a machine with that amount of power. But given the cost to engineers of them twiddling their thumbs, waiting, that is probably the best bang for buck. Otherwise, you're talking about buying in an external distributed build system or trying
Starting point is 00:42:52 to get distcc or equivalent to work, which is a very good product. But as soon as you're having to worry about which version of the compiler is installed on some other person's machine, and making sure that they have the same header files that you do, the effort of doing that, plus the effort of debugging it when it goes wrong, is very, very high. Nobody wants to be in the situation where you do the build, you run your tests, they fail in a surprising way, and so you blow away everything and rebuild, only to discover that it was actually a genuine problem and you just wasted the time because you didn't trust your build. Yeah. And I know you know this, but I'm very passionate about being able to trust the reliability of your builds. Yeah. And yeah, so I've heard you
Starting point is 00:43:36 rail before about how you should never, maybe never is a strong word, but you should be careful about using make clean. Yeah, you were right the first time. If you type make clean, then yeah, all bets are off. No, my particular case in that is, I have seen it when people have builds that are not reproducible, when people have builds that are flaky in some way, often to do with parallel builds, because they haven't either specified their build properly or they have a step that is not parallelizable and doesn't say that it isn't, and so it's non-deterministic. Then what tends to happen is that with any sort of weird unexpected behavior in the program, either like a weird crash or a test that fails unusually, you can get into the situation where the knee-jerk response is, well, maybe it was a bad build.
Starting point is 00:44:30 And then you do make clean, and then you run it again, and maybe it works. And then there are two reasons for this. The first reason is that your build was indeed buggy and broken, and you were in an indeterminate state. And my strong belief is that you should fix your build. And the only way really to fix your build is to look at the carcass, pick over it, and try and understand
Starting point is 00:44:50 from all the files that you have what the heck happened. And if you just type make clean, you got rid of that. Yeah, it's like trying to solve a crime by cleaning up the crime scene. Like, I'll get rid of this blood, these fingerprints are all messy, I can fix that. All right, no more murders, right? Murder solved, nothing to see here, folks. That's kind of the good case in a way, because make clean, make, and it starts working again, you're like, oh, okay, I had a bad build. But you won't ever be able to fix it if you take that approach. The second thing is that maybe you do actually have a genuine one-in-a-million bug in your code, like it's a race condition or it's a strange case that happened, like the network drive was down temporarily, whatever unusual circumstances, in which case you have chalked it up to a bad build and maybe missed the one opportunity to debug and dig into
Starting point is 00:45:40 a freak occurrence. Yeah. And so now you just do make clean, make, and it goes away, and then you don't trust your build and you've lost the opportunity. And so this is why I feel so strongly about this kind of stuff, as you can tell. And to bring it back onto the subject, because I know this is a thing you can wind me up on for hours, obviously adding distribution into your build system adds yet another reason for your build system to be non-deterministic or to be broken in some way or to have issues. And so if you give people another excuse to kind of go, oh, I don't really know what that problem was. I fancy another cup of coffee. Make clean, make, and then walk away from their computer for a bit.
Starting point is 00:46:20 Then I think that's a bad direction to go in. But I understand that I have strong opinions about this. No, I mean, it's funny because I share your opinions in a few other forms. One of them, you know, staying on brand here, obviously: testing. Of course, of course. You knew I was going to say that. Distributed test systems, right? Like, whole industries were built trying to figure out how to get Ruby on Rails apps to run their tests in a performant manner once they had thousands and tens of thousands of them, because each one took a second because Active Record. So, oh, we'll run them in parallel. And it's like you're solving the wrong problem when you do that.
Starting point is 00:46:58 You're introducing a whole other form of unreliability into your tests in the name of speeding up your feedback loop, which is, you know, like they're feeling the pain and they should try to speed up those feedback loops. But the way to do that is, you know, there are certain situations in which the best that you can do is try to run it in parallel, but you have to recognize that when you do that, you're creating a whole other kind of unreliable failure that you're then also just going to have to deal with. And certainly in the world of make clean, I feel like there are two kinds of developers in the world. The kind where when you turn it off or turn it back on again or reset it or do whatever the thing is to make the bug go away, they're happy. And then the other kind where they're disappointed, right? Like what I wanted to see is when you reloaded that page that the bug happened again.
Starting point is 00:47:46 How many times have you had to tell a family member to turn off a system and turn it back on again and then hated yourself so much because it's the only solution available? But yeah, you're so right. You're so right. Yeah. Like those opportunities are sometimes the next time you're going to get that is in production when there's like serious stuff on the line. Yeah. And so all those opportunities to try to figure out what's going on with these things. Like the worst thing is bugs that only happen one in a thousand times, right?
Starting point is 00:48:16 Because they're so hard to fix. Yeah. If you're not interested in fixing them, it's great because, okay, I got 999 more times left before I need to worry about this again. But if you actually want to solve a problem, you really want it to be reproducible. So the only way to fix that sort of intermittent failure is to really just kill it with fire the first opportunity you get. Exactly. And I think you and I've talked about this kind of stuff: failing early, failing fast, and not doing things occasionally if you can avoid it. You know, in our world of finance, there are some sort of things that one does when one is receiving market data from the outside world, and there is a good case and there's a bad case. And if you can engineer
Starting point is 00:48:55 it so you always start off with the bad case then you'll never be surprised when the peak of market activity you have to do the same thing and And similarly, like I remember that when the Linux kernel, they fixed a bug in the way the rollover worked in a timer. Like there was a – Oh, yeah. And so nowadays, they fudge the timer so it starts up with like 10 minutes before it overflows just to force it to happen like early at the time and not like four years, 55 days into the uptime of a machine.
Starting point is 00:49:26 Which I think is a pragmatic solution. Just like, okay, you boot it up. Yeah. Get to see whether it works every time now rather than it happening once literally in, well, not even a blue moon. I think it's every three months or something like that. Either it's going to work or it's going to not. If you can put it in a situation where if it's broken, it will fail, that is always better. Right.
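A small sketch of that "start near the overflow" idea, loosely modelled on the Linux kernel's `INITIAL_JIFFIES` and `time_after` as far as I understand them. The tick rate, the five-minute offset, and all names below are illustrative assumptions, not the kernel's actual code.

```cpp
#include <cstdint>

constexpr std::uint32_t kTicksPerSecond = 1000;  // assumed tick rate

// Start the counter five minutes' worth of ticks short of the 32-bit wrap,
// so rollover handling is exercised shortly after every start-up.
constexpr std::uint32_t kInitialTicks =
    static_cast<std::uint32_t>(0u - 300u * kTicksPerSecond);

// Wrap-safe "is a later than b?": the unsigned subtraction wraps modulo 2^32
// and the result is reinterpreted as signed. The narrowing conversion is
// implementation-defined before C++20 and wraps on mainstream compilers.
constexpr bool tick_after(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::int32_t>(b - a) < 0;
}

// The buggy comparison that only misbehaves once the counter has wrapped.
constexpr bool naive_after(std::uint32_t a, std::uint32_t b) { return a > b; }
```

With the counter starting at `kInitialTicks`, a deadline ten minutes ahead lands on the far side of the wrap: `tick_after` still orders the two correctly, while `naive_after` reports the deadline as already in the past. The bug shows up minutes after start-up instead of weeks into a machine's uptime.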
Starting point is 00:49:43 Fail early. If you're going to fail at all. Yeah. Well, I guess that's...
Starting point is 00:49:49 I mean, there's tons more to talk about this, but this is a really good spot for us to stop. We can only go for so long. I know, that's right. Our poor listeners can only tolerate us blabbering on for so long, too. But next time, we can talk about some more C++ stuff, or we can go into other languages, or maybe a whole other topic, who knows. But it's been great fun talking about it. I'm glad
Starting point is 00:50:12 you riled me up on the make clean stuff. I'm fully ready to go out and make, you know, some strong statements about things, which is very uncharacteristic for me. But so, yeah. All right, it's good. I guess I'll see you next time. Next time. You've been listening to Two's Complement, a programming podcast by Ben Rady and Matt Godbolt. Find the show transcript and notes at twoscomplement.org. Contact us on Twitter at twoscp. That's at T-W-O-S-C-P. Theme music by Inverse Phase.
