Two's Complement - Is Optimization Refactoring?
Episode Date: May 15, 2024In flagrant violation of Betteridge's Law, Ben and Matt consider the question 'Is Optimization Refactoring?' and conclude that the answer is 'probably'. Ben warns our listener about overspecifying in ...tests. Matt is horrified by his own assumption that other people's code works.
 Transcript
 Discussion  (0)
    
                                         I'm Matt Godbolt.
                                         
                                         And I'm Ben Rady.
                                         
                                         And this is Two's Compliment, a programming podcast.
                                         
                                         Hey, Ben.
                                         
                                         Hey, Matt.
                                         
                                         One of these days, we're going to have to come up with a different intro to this podcast.
                                         
                                         I'll be like, good afternoon, Benjamin.
                                         
                                         Hello, Mr. Godbolt.
                                         
    
                                         There we are. That's better.
                                         
                                         A little variety there for our listener.
                                         
                                         Yeah.
                                         
                                         What are we talking about today?
                                         
                                         Today I want to talk about an idea that entered my brain a few weeks ago at work,
                                         
                                         talking to one of our colleagues.
                                         
                                         And he had asked me a question about this. And it was one of those times where it's like, I'm now going to reflect on what I
                                         
                                         have done for almost my entire career and realize something that I have been doing this whole time
                                         
    
                                         that I never actually put into words until just now when you asked me a question about this.
                                         
                                         Isn't mentoring such a rewarding experience?
                                         
                                         Being forced to use the talky bits of your brain, which normally lie dormant in the process
                                         
                                         of programming.
                                         
                                         So true.
                                         
                                         There's a whole extra CPU over there that you can use for thinking that you just normally
                                         
                                         don't right it's like uh it's
                                         
                                         like like a graphics card in your brain right like you know you can do all kinds of interesting
                                         
    
                                         things with so he was yeah what did he ask for yeah so this was we were talking about uh automated
                                         
                                         tests of course because what the hell else do i talk about? And he was saying, how do you write
                                         
                                         automated tests for when you're doing performance optimization? Do you write a unit test that
                                         
                                         shows that the performance is bad and then go and improve the performance and then you see your test
                                         
                                         pass? And I say, no, I don't really do that. I kind of treat performance optimization like refactoring.
                                         
                                         And he got this look on his face like, wait, what? And I had never really thought about it until that moment. But yes, there is actually, I think, a whole category of things that you can
                                         
                                         write automated tests for, perhaps. I think it's difficult to write good tests for them,
                                         
                                         but you can,
                                         
    
                                         and I sometimes do. But for the most part, there is a category of things of which performance optimization is one, where my explicit intention is to keep the sort of observable behavior,
                                         
                                         and I can define what that means very specifically later, but keep the observable behavior constant
                                         
                                         while changing the code in some other dimension.
                                         
                                         And obviously, like the most common thing that you do with this is to change the design of the software so that it's easier to understand, easier to change. That's the very traditional refactoring
                                         
                                         like definition. Yeah, that's like the real, you know, changing the structure of code without
                                         
                                         changing its behavior as defined by a set of automated tests, right, is like online seven
                                         
                                         of Martin Fowler's refactoring book,
                                         
                                         and I know that because I've said it so many damn times.
                                         
    
                                         Of course, it's probably not on page seven,
                                         
                                         and I've just been wrong this whole time.
                                         
                                         But the general reason why one might want to change the structure of code
                                         
                                         without changing its behavior as specified by a set of tests
                                         
                                         is for the purposes of reducing
                                         
                                         complexity or making the code generally easier to change or easier to understand or removing
                                         
                                         duplication, things like that, which actually are all kind of complexity. But another reason
                                         
                                         that you might want to do this is performance optimization. And I really do view it as something that is, for some definition of behavior,
                                         
    
                                         almost, but not quite orthogonal to behavior. And I think a great example of this, I was recalling
                                         
                                         an earlier podcast where we were talking about comments and documentation. And you were saying
                                         
                                         how you have had many situations, and I have also had this, I have
                                         
                                         had the same experience, where you're changing code and you're adding comments to it because
                                         
                                         the reason that you're changing the code is very not obvious.
                                         
                                         Right.
                                         
                                         And you want to leave some hint to the later reader, who may be you, as to why this code
                                         
                                         that looks so complicated and weird has to be this way.
                                         
    
                                         Right.
                                         
                                         Right?
                                         
                                         And in those moments, quite often what I am doing is I have my unit tests that are running that are telling me that I haven't changed the behavior of the software.
                                         
                                         Yes.
                                         
                                         I have a separate thing, which might be like a flame graph or some other sort of tool that's
                                         
                                         telling me, okay, is the performance of this code that I'm changing,
                                         
                                         you know, what is it?
                                         
                                         Is it getting better?
                                         
    
                                         Is it getting worse?
                                         
                                         You know, and I'm changing the code
                                         
                                         trying to optimize that performance metric
                                         
                                         while making sure that the sort of observable behavior
                                         
                                         does not, I haven't broken anything
                                         
                                         or introduced any bugs as a result of doing that.
                                         
                                         And I realized in that moment,
                                         
                                         answering this question for this person, that it's like, oh no, that is just purely refactoring.
                                         
    
                                         It's not like refactoring. It is refactoring. It is refactoring. They said the dimension that
                                         
                                         you're measuring is not complexity or ease of change, like you said, although they can be
                                         
                                         flip sides, it is the performance of the code. Now, splitting hairs here, and you did leave
                                         
                                         enough wiggle room to have this definition there as well obviously how far something goes is an observable
                                         
                                         property of that system you can measure how long it takes to do something and but for the vast
                                         
                                         majority of tests like vanishingly rarely do you care to the level to write tests that say i have a i have a part of
                                         
                                         my specification is that i must be able to process this thing in under some amount of time right that
                                         
                                         is possible you do sometimes get that it is part of the specification sometimes but very very often
                                         
    
                                         it's just like as fast as you can go um and we'll let you know if it's not fast enough kind of feel
                                         
                                         right so you one tends not to do this.
                                         
                                         No, it tends not to write automated tests, but you could.
                                         
                                         So, yeah, absolutely, this is refactoring.
                                         
                                         And you mentioned other dimensions.
                                         
                                         So, like, I have just spent the entire day moving code around,
                                         
                                         not because it makes it less complicated, although it does.
                                         
                                         That's not my reasoning behind this,
                                         
    
                                         but it's to change the compile time of my code
                                         
                                         and the sensitivity to change the compile time of my code and the sensitivity
                                         
                                         to change for my code which is c++ and at the moment therefore means you know if there's
                                         
                                         loads of stuff that's uninteresting to most people in header files then every time I make a change to
                                         
                                         something that is an internal part of my algorithm anyone who's including my header has to rebuild
                                         
                                         and I don't like that I don't like to expose it so they shouldn't be able to observe that in the first place it's one of those rare
                                         
                                         things that c++ is weird about is that like the physical layout of my code is actually part of my
                                         
                                         not really api but it's part of the the the problems that i give to my um my my users part
                                         
    
                                         of the sort of sensitivity to change.
                                         
                                         And so I'm refactoring, and it is refactoring,
                                         
                                         to take stuff out ahead of files and hide them away from people.
                                         
                                         And they should not be able to observe that other than hopefully their compile times go down.
                                         
                                         Or when I make a change, they don't have to compile half their program.
                                         
                                         So yeah, I totally get it.
                                         
                                         And performance is another one of those.
                                         
                                         Yeah, absolutely.
                                         
    
                                         Absolutely.
                                         
                                         Now, I will say, again, here comes
                                         
                                         all the asterisks and caveats to this bold statement that I make at the head of the thing,
                                         
                                         because the real world is a little messy. There are many situations in which the way that you
                                         
                                         achieve a performance optimization is by changing the behavior. Yes. Right? The actual observable
                                         
                                         by unit test behavior, right? Like, oh, we're going to just filter out some things some things from this collection before we process it for example would be an example of how you would
                                         
                                         speed something up or or an api changes so that you know like well hey if i hand you each individual
                                         
                                         event one by one it's a lot slower than me giving you here's just a big old chunk of events off you
                                         
    
                                         go and then we lose you know that kind of feel to it, which is like, well, batching or otherwise changes the API.
                                         
                                         You could probably write an adapter to make the test still fit that hole.
                                         
                                         But, you know, you have still changed the way things look.
                                         
                                         Yeah, yeah, absolutely.
                                         
                                         So I'm not trying to say here that all performance optimization is refactoring because there's many types of performance optimization where you do actually change the behavior on purpose in order to make things faster, right?
                                         
                                         Yes.
                                         
                                         But there is definitely a category of performance optimization where it's like,
                                         
                                         I've got this function.
                                         
    
                                         I know what my inputs are and I know what my outputs are.
                                         
                                         And the question is, how quickly can I do this?
                                         
                                         And I do all kinds of crazy little things in that code to make it go faster. And the
                                         
                                         observable inputs and the observable outputs are not going to change at all. And the only thing
                                         
                                         that changes is like, you know, the time or maybe like the clock cycles that it takes to execute
                                         
                                         that code. And I think that's part of the reason why I tend not to write tests for these kinds of
                                         
                                         things, because those tests are very expensive to make reliable, right? If you tend not to write tests for these kinds of things, because those tests are very
                                         
                                         expensive to make reliable, right? If you're going to write automated tests for performance,
                                         
    
                                         you need to have control over your hardware. You need to have control over your execution
                                         
                                         environment. You need to know what architecture you're running. There's all these variables that
                                         
                                         you need to control in order to write. It's very easy to write unreliable tests for this, right?
                                         
                                         You just say, you know, start time equals blah,
                                         
                                         end time equals blah,
                                         
                                         if all the time less than whatever, right?
                                         
                                         You can make unreliable tests for this very easily.
                                         
                                         But if you want to make reliable performance tests,
                                         
    
                                         they're very expensive to get right.
                                         
                                         They're very difficult to design testable.
                                         
                                         It's very difficult to design code
                                         
                                         that is both performant and designed to be
                                         
                                         tested in that way. And you want to save that effort for like, you know, basically like the
                                         
                                         hot loop of your program, right? The thing that really makes all the difference, right? Or the,
                                         
                                         maybe you have a few hot loops, but you want to, you want to save that because they're real
                                         
                                         expensive. So most of the time, if I've just got sort of a general performance or throughput problem of some kind, I want to have some other tool that is telling me what the performance is, that is measuring it in some other way using like application level metrics or tracing or some profiling tool or, you know, there's a million.
                                         
    
                                         That's probably a whole episode of itself.
                                         
                                         I mean, we've done some observability.
                                         
                                         We've talked about observability before.
                                         
                                         It's another dimension of observability is, you know,
                                         
                                         like how many things am I processing per second?
                                         
                                         What's the average latencies of X, Y, and Z?
                                         
                                         And, you know, you can look at them.
                                         
                                         And to your point about like not writing tests for this,
                                         
    
                                         what I tend to do is I do spend a bit of time to have an environment
                                         
                                         where I can at like 2 a.m. when I know no one else is kind of on the network
                                         
                                         on one of our machines that otherwise is dedicated
                                         
                                         to running daily jobs for the trading day in my case,
                                         
                                         I copy code over to it and I run it
                                         
                                         and then I don't assert anything about it.
                                         
                                         I just record it and then I can graph it over time.
                                         
                                         And then that gives me the sort of observability
                                         
    
                                         and dimensionality of like a repeatable benchmark
                                         
                                         rather than just the day-to-day watching of grafana and seeing things change around or whatever is
                                         
                                         or maybe it's just a busy day today or maybe it's a quiet day why are my metrics thing but
                                         
                                         i then have a time series where i can look back over something and it doesn't matter that occasionally
                                         
                                         it does blip up if for example i don't get the isolation exactly right i'm not going to get a
                                         
                                         false positive um like your test failure false negative test failure here but what
                                         
                                         i can do is see trends really really easily and i can say what on earth happened three weeks ago
                                         
                                         what commit came in and suddenly made everything just a little bit worse and that's really valuable
                                         
    
                                         so but anyway that's orthogonal to the idea that the changes themselves especially when you're in in a sort of the loop of like making those modifications, those optimizations, that is a refactoring operation where you're like, hey, I wonder if we should use shell sort rather than insertion sort.
                                         
                                         In my sort method, nobody exogenously can see what kind of sort I'm doing because as long as the numbers come out in the right order, nobody cares.
                                         
                                         Right, right, right.
                                         
                                         Yeah, and the tests are there in that case only to make sure that if you like accidentally switch
                                         
                                         from like a stable sort to an unstable sort,
                                         
                                         that that doesn't matter.
                                         
                                         Yes.
                                         
                                         Right?
                                         
    
                                         And that sort of gets at the heart of like,
                                         
                                         you know, what it means to have, you know,
                                         
                                         changing code where the behavior doesn't matter as verified by a set of
                                         
                                         tests. The behavior that matters and the behavior that doesn't matter is specified by the tests.
                                         
                                         And that's one of the reasons why it's important not to over-specify the behavior in your tests,
                                         
                                         because not only does it make it harder to refactor, but if performance optimization in
                                         
                                         this way is also refactoring, it makes it harder to optimize your performance. Right. Because you're asserting behavior that you don't actually care about.
                                         
                                         That stable sort versus unstable sort is a perfect example of something that's so easy to write
                                         
    
                                         a test that will fail for the wrong reasons, right? If you accidentally assert stability of
                                         
                                         order. So like just for those who aren't familiar. Yeah, I guess we should define what stable and unstable sorts are.
                                         
                                         Exactly. So like when you're sorting things, you sort by a key and that may be only one of
                                         
                                         the properties of the objects that you have in an array or container. What do you do when there's a
                                         
                                         tie? Well, you know, one thing is like, it doesn't matter. Everything that has the same priority or whatever the key is,
                                         
                                         it doesn't matter which sequence they appear in.
                                         
                                         They will have the same values to you that you're interested in
                                         
                                         or it doesn't matter.
                                         
    
                                         Whereas, and so an unstable sort, which is often more optimized,
                                         
                                         when you're doing the sort, it doesn't matter.
                                         
                                         If you have two things that are the same,
                                         
                                         if you end up switching them or not switching them,
                                         
                                         it doesn't matter, right? And that's great. And that's often what you want. Certainly if you're doing the sort it doesn't matter if you have two things are the same if you end up switching them or not switching them doesn't matter right and that's great and that's
                                         
                                         often what you want um certainly if you're writing the sort but sometimes if there's a tie you still
                                         
                                         want to keep them in the same sequence that is objects that have the same value need to be kept
                                         
                                         in the same sequence as they were originally so you know maybe that's uh yeah anyway so that
                                         
    
                                         that is much harder to achieve
                                         
                                         in a sort algorithm and more expensive.
                                         
                                         And it could be so easy to write a test
                                         
                                         where you have things that do have the same values in them
                                         
                                         and then you assert them to be exactly some output form.
                                         
                                         And that exact output form is stable.
                                         
                                         And then later when you change your sort
                                         
                                         and they get switched around and you're like,
                                         
    
                                         why is my test failing?
                                         
                                         Yeah, right.
                                         
                                         Exactly, exactly. And, you know, people, we do this with like things like sets. and they get switched around and you're like why is my test failing yeah right exactly exactly and
                                         
                                         you know people we do this with like things like sets you know if you if you uh things that return
                                         
                                         sets of objects sometimes it's you you forget that maybe that set is not ordered or not ordered or
                                         
                                         not ordered and then again changing the internals to make the set um uh deterministically ordered
                                         
                                         or not could be an optimization it's like hey i don't have to keep them in sorted order. As long as it's still a set of all the things that you said in any order, that's fine.
                                         
                                         Yeah, over-specifying is bad. Anyway, I think that's...
                                         
    
                                         Yeah, yeah. No, that's exactly right. And switching, for example, from a list to a set
                                         
                                         in order to achieve some performance optimization and having a test fail because the test cared
                                         
                                         about the order, but really you don't actually care about the order, you just care that it's a collection, is a perfect example of this, right? And so that's one of the
                                         
                                         many reasons why over-specifying in tests and not being very intentional about what it is that you
                                         
                                         assert and what it is that you don't assert can make your test less useful because it, it, it doesn't let you do this. Uh, another area that I, I think is, um,
                                         
                                         that is like this is thread safety. Oh, when I'm, wow, I have never found a way to write.
                                         
                                         Well, that's not true. There's a few very limited situations in which I've been able to do this,
                                         
                                         but it is very rare that I can get confidence that multi-threaded
                                         
    
                                         code works as I intend it through automated unit tests. Right. You can sometimes do things that
                                         
                                         will allow you to disprove that confidence with like fuzz testing and, you know, some kind of
                                         
                                         like performance testing where you're just putting it under lots of load and then you get, you know, some error that you don't expect or some state
                                         
                                         that you don't expect. And you're like, well, clearly I screwed this up, but just because it
                                         
                                         works, doesn't really tell you much of anything. You might've just got lucky, right? Exactly.
                                         
                                         Right. So thread safety is one of those things that sort of, unfortunately, you know, I have
                                         
                                         said this thing before where it's like, I don't really think of myself as a great programmer. I'm just a good programmer with great habits. Thread safety
                                         
                                         is one of those things where it's like, well, I hope you're the best programmer that you can be
                                         
    
                                         today because you just got to get this right. And so it comes down to really just reading through
                                         
                                         the code and understanding the synchronization primitives and the way that
                                         
                                         you're using the threads and just making sure that you get it right. And hopefully as you're
                                         
                                         doing that, you can use your test to make sure that while you spent a hundred percent of your
                                         
                                         brain power on this very tricky problem, you didn't also have an off by one error, right?
                                         
                                         Yes. Yeah. It reduces the dimensionality of, of the kind of failures you're going to get. But as you say, I'm sure we've all written the sort of small loop of like,
                                         
                                         let's make 10 threads, hammer it for a thousand times,
                                         
                                         and just assert that everything seems to make sense afterwards.
                                         
    
                                         And then you write that test, and that test is, if it passes,
                                         
                                         well, all you've done is shown that one roll of the dice didn't come up with
                                         
                                         an obvious flaw now there is one exception to that particular approach as in like it's sorry
                                         
                                         let me let me clarify that um that is kind of a useful tool to just prove that there isn't an
                                         
                                         egregious issue straight up but it's maybe not something you'd even want to check in because
                                         
                                         it's so if there is a problem you're just going to have a flaky test now it
                                         
                                         doesn't mean to say that you know that's not valuable in some way because if it's flaky
                                         
                                         maybe you have got a bug in there but if you can have automated thread checking and some of the
                                         
    
                                         systems these days will allow you to do like thread sanitizing checking where the code is
                                         
                                         instrumented and every read and write to memory is checked to be sequenced with other threads, reads and writes.
                                         
                                         There is some potential that having a test that exercises the behavior you want, this threading behavior,
                                         
                                         might, even if it doesn't actually happen in that particular test pass, if the thread checker is on, it might go, wait a second.
                                         
                                         Although it didn't bite you this time this thread wrote to this memory
                                         
                                         location and another thread read from it and there was no synchronization between those two so
                                         
                                         good luck um so that might be valid but that's that actually is you mentioned fuzzing you know
                                         
                                         fuzzing is another place where having the in combination with the kind of um checking that
                                         
    
                                         you can do with thread sanitizers also things like address sanitizers and memory sanitizers and things like that that do these checks it becomes part of a an approach to
                                         
                                         finding those issues but again these aren't really tests in the same way that you and i think of
                                         
                                         tests so to your original point you lean on the test for the correctness part and then you think
                                         
                                         really really really hard about the threading part and if you think hard enough usually you can
                                         
                                         design a threading out of it which then means that you don't have to worry about it at all
                                         
                                         yes exactly think really hard about it and then go explain it to someone else on your team
                                         
                                         and if they still think it's a good idea yeah right a to use that sidecar gpu to think about
                                         
                                         it again and also to make sure that they agree that we're not
                                         
    
                                         about uh to introduce a foot gun into the system and and break a whole bunch of things for real
                                         
                                         yeah it's interesting yeah it's interesting how how how much yeah on my team even recently i've
                                         
                                         been discussing about the introduction or otherwise of threading and the opinions diverge as to how much trouble that's going to cause and my healthy
                                         
                                         skepticism is if we can avoid it as much as possible then great and if we do have to do it
                                         
                                         let's use like more like message passing style than locked locking and threading and then i've
                                         
                                         kind of essentially outsourced the correctness of my thread library to something which is a single
                                         
                                         producer single consumer queue or a multi and then you know like hey the best kind of my thread library to something which is a single producer single consumer queue
                                         
                                         or multiple and then you know like hey the best kind of shared state is the one that you don't
                                         
    
                                         share with anyone at all mutable shared state is like one the one you copy and give to someone
                                         
                                         else and then it's not shared yeah yeah but this was not meant to be about threading no well i mean
                                         
                                         it's kind of tangentially so i'm trying to think of like other situations. So this has got my brain going, of course, about like, are there other
                                         
                                         things that I have sort of leaned on my tests to do for me that I haven't really thought about
                                         
                                         before? Because it's sort of just like, it's just sort of never come up, right? Yep. And, you know, like, so, you know,
                                         
                                         if performance optimization is refactoring
                                         
                                         and sort of thread correctness
                                         
                                         or sort of scalability or, you know,
                                         
    
                                         that sort of dimension of code changing
                                         
                                         is also refactoring.
                                         
                                         Yep, yep.
                                         
                                         Then, you know, like then what other dimensions of code change do we have in the
                                         
                                         various systems that we work on where the tests should not change as a result of you doing it?
                                         
                                         Right? Obviously, there's the most fundamental one, which is sort of removing complexity and changing things. Yep.
                                         
                                         Another one might be, and I think this is maybe a little bit more controversial.
                                         
                                         Oh.
                                         
    
                                         But another one might be, or controversial in the sense of like, I'm going to say this and I'm not sure that I actually agree with it 100% of the time, but maybe.
                                         
                                         I'm exploring some ideas here, right?
                                         
                                         But maybe one dimension of this is sort of like dependency and platform upgrades, right?
                                         
                                         So one of the things that I expect to be able to do, for example, is like working in a Java
                                         
                                         project, upgrade the version of the JDK, rerun all of my tests.
                                         
                                         And if any of those tests fail, I know that that upgrade has caused a
                                         
                                         problem, right? I don't know that it necessarily follows, but maybe that if I were to do that and
                                         
                                         all the tests pass, that I know that the upgrade is okay. Right. So that's really interesting because this is something i do a reasonable amount um
                                         
    
                                         our system has like some kind of essentially like major version number which is like here's
                                         
                                         just a suite of packages that we've solved they all work together and from time to time we have
                                         
                                         to upgrade that it's essentially the equivalent of upgrading your like linux distribution from
                                         
                                         ubuntu 20 to ubuntu 22 or whatever, but we do it far more often.
                                         
                                         And I typically bump that number in my code
                                         
                                         and then type make test and watch it sit there
                                         
                                         and pull down all the new packages that come with the latest version,
                                         
                                         build against them, and then all the tests run.
                                         
    
                                         And if everything passes,
                                         
                                         then I just assume that everything's fine with that, right?
                                         
                                         But the thing is, I have never written my tests
                                         
                                         to test the assumptions in third-party packages.
                                         
                                         Right.
                                         
                                         I assume that they are right.
                                         
                                         So if it compiles at all,
                                         
                                         well, that's probably most of the battle won in my case.
                                         
    
                                         If the tests are passed, then one of two things may be true.
                                         
                                         One, I exercise some functionality,
                                         
                                         and the part that I exercise, maybe not even on purpose,
                                         
                                         but it does at least do the same thing, or at least enough of the thing that I care about
                                         
                                         that my test passed.
                                         
                                         Or two, I didn't care about that functionality enough to test it directly because I have
                                         
                                         a fake or a mock or whatever in front of it.
                                         
                                         So like I'm not actually going off to Amazon and checking that the S3 library still works.
                                         
    
                                         I have my own fake S3 thing and everything's phrased in terms of that.
                                         
                                         So I will never find out that it broke until i go to production so and i mean i've seen people talk
                                         
                                         about writing you know if you depend on behavior you should write a test for external behavior and
                                         
                                         i i'm sort of sympathetic to it but i don't have enough hours in my day to test other people's
                                         
                                         code i kind of just assume it all works which is is actually, now I say it out loud, horrifyingly naive. don't have a better process than upgrading some dependency or the or the you know standard library
                                         
                                         or whatever running make test maybe like deploying to the you know test environment and starting the
                                         
                                         system up and poking it with a stick a bit yeah exactly yeah well i mean i do this i do this once
                                         
                                         a week every monday night i upgrade all of the compiler explorer's dependencies and i run all
                                         
    
                                         the tests again and then i deploy to our staging instance and i literally do that i poke around i do a few compiles i have my little list of things
                                         
                                         i do and then i go i guess that works then and then you know yeah sometimes it hasn't someone
                                         
                                         goes oh yeah the this this syntax highlighting isn't working anymore i'm like oh shucks i don't
                                         
                                         have the bandwidth to write full ui tests for third-party components, for example, you know, and so sometimes things break. Right, right. So it's like that, you know, I think, I don't think that I can
                                         
                                         justifiably call that kind of change refactoring because I have no expectations that problems
                                         
                                         would be caught. I don't know. I just don't know how to think about this right now.
                                         
                                         I see what you mean.
                                         
                                         So you started this by sort of positing that what other things have the smell of refactoring,
                                         
    
                                         even if we couldn't call them refactoring, where we are leaning into the test as like,
                                         
                                         well, this is my function to determine whether or not this was successful.
                                         
                                         The other thing that was successful measured by an objective objective intersubjective metric of the tests so yeah i mentioned you know moving things around for
                                         
                                         compilation which is a bit like just more traditional um refactoring uh i wonder yeah
                                         
                                         does this i think now again a lot of these other things are just code changes that have a more
                                         
                                         sense more of the sense of genuine refactoring but i was i was it called
                                         
                                         to mind when i took my largely c multi-user dungeon code from like the 1990s and was slowly
                                         
                                         like line by line turning it into c++ with all the trimmings i didn't really have that many
                                         
    
                                         automated tests we had a few and i wrote a few um and you know in a way you could argue that that is
                                         
                                         reducing complexity although you could also argue in quite a very real way i was introducing
                                         
                                         complexity by turning it from like 90s era c into template c++ but but it was it felt like um you
                                         
                                         know it was a useful thing to have and a useful thing to be able to do is to be able to run those
                                         
                                         tests and say like hey yeah i've no more am i manually allocating and freeing all these pieces of
                                         
                                         memory they're now you know standard strings uh the c++ runtime is doing this all for me
                                         
                                         and it all still works all of the string manipulation stuff still work um again it's
                                         
                                         more like a traditional feel but it's like language upgrade is what i suppose that would
                                         
    
                                         fall under you know hey i'm moving from javascript to typescript and most of my tests are still
                                         
                                         written in javascript and now they all still pass okay that's cool i can be reasonably sure
                                         
                                         that i haven't broken anything just by renaming the files and running them through a processor
                                         
                                         what else could there be i mean you've mentioned operating system kind of upgrades that's
                                         
                                         another thing um yeah i suppose if you were doing anything with sort of like generated code
                                         
                                         and you changed the way that you did the code generation that would be another kind of thing
                                         
                                         where you would expect your tests to continue to pass.
                                         
                                         And it's very possible that they would catch some problems. I think it's not quite as
                                         
    
                                         unjustifiable confidence as the, you know, I upgraded the libraries and all my tests pass
                                         
                                         because we explicitly don't, right? We explicitly don't test that behavior because it's sort of
                                         
                                         like, okay, well, where would you stop testing that behavior, right?
                                         
                                         Do you trust the CPU?
                                         
                                         Do you trust the electrons, right?
                                         
                                         Like you got to draw the line somewhere.
                                         
                                         But with this sort of code generation, there might be some situations where you say, you know, we do actually at least sort of transitively can say, if this code generation is broken,
                                         
                                         something upstream of it is almost certainly going to break, right?
                                         
    
                                         So you could rely on the tests in that way.
                                         
                                         It's an interesting point you bring up.
                                         
                                         Where do you draw the line?
                                         
                                         I was actually having a discussion this morning
                                         
                                         with a friend over this kind of thing.
                                         
                                         And this is a complete subject change.
                                         
                                         Well, maybe we can bring it back round.
                                         
                                         But I was sort of saying that pragmatically,
                                         
    
                                         all of us have like a ceiling and a floor of our knowledge you know as high level as you're
                                         
                                         prepared to go like i fall down at some of the more complicated type theory in the world but
                                         
                                         i'm sure there have been really clever things that can and higher level functional stuff is
                                         
                                         above me and i can't get and then there's a ceiling sorry a floor below you by which you
                                         
                                         kind of go i assume i have some knowledge of this level i'm confident to hear and i know that there's a ceiling, sorry, a floor below you by which you kind of go, I assume I have some knowledge of this level.
                                         
                                         I'm confident to hear.
                                         
                                         And I know that there's a couple of floors down.
                                         
                                         And I kind of, I assume that anything below my understanding is just works, right?
                                         
    
                                         And so, you know, that might be the API you're calling.
                                         
                                         It might be the operating system you're running on.
                                         
                                         It might be the CPU.
                                         
                                         It might be the physics of the, you like some level you've got a a flaw
                                         
                                         and um i think we have to as humans you know you could always get you know you say oh my god is
                                         
                                         there a bug in the compiler it's like well probably not probably there isn't a bug in the compiler you
                                         
                                         know some every now and then it is but otherwise you have to hold it right in order to keep the
                                         
                                         abstractions in your head or the sort of like the boundaries in your head finite it's sensible to have some simplifications where you go i just
                                         
    
                                         assume this is right and i think you know operating systems we tend to believe in um third party apis
                                         
                                         which is the thing i'm most shaky on again now saying this out loud random github repositories
                                         
                                         we're pulling from um but you still tend to assume that they're right i mean
                                         
                                         maybe you treat them a little bit more hostilely hostilely something like that um you know like
                                         
                                         it's not yeah yeah it's not beyond believability that you'll find a bug in a third-party piece of
                                         
                                         software and you know that's what github is great for you go and find an issue you ask on discord or whatever um but yeah but if you were writing tests where would you stop
                                         
                                         yeah right right yeah i mean i i the analogy that i use for this is i feel like there was
                                         
                                         some physics professor who is many decades ago who was doing research into quantum physics
                                         
    
                                         and then started like walking
                                         
                                         around with like uh like risers on the soles of his feet because he was scared that he was going
                                         
                                         to fall through the floor right and it's like and you know you just sort of do the math and you're
                                         
                                         like wait a second this is possible and it's sort of like you can't live your life worried that all
                                         
                                         of the electrons in the ground beneath you are going to suddenly align in such a way that your legs just sort of pass through the floor.
                                         
                                         And that's just how it's going to happen, right?
                                         
                                         Like, maybe there's some infinitesimal probability of that, but you just can't do that.
                                         
                                         You have to have – actually, I did a talk that was sort of adjacent to this once, which is like sort of tongue-in-cheek saying all engineering is based on faith.
                                         
    
                                         It is.
                                         
                                         Faith-based engineering. And it's sort of
                                         
                                         like at a certain point, you just have to have faith that the CPU is going to be able to add
                                         
                                         one and one and get two. Because if you do everything, constantly questioning all of the
                                         
                                         abstractions beneath you, you won't get anything done, right? I think what makes a senior software
                                         
                                         engineer a senior engineer is that they've developed a sense for when that
                                         
                                         can be trusted yep i i think if i add one and one i'm going to get two and when it can't be trusted
                                         
                                         where it's sort of like what do you mean i can't treat this tcp socket like a file right um because
                                         
    
                                         you know you just got to learn all those situations in which the abstractions are there
                                         
                                         and that's a really fascinating and sometimes they of what factors into the seniority of someone is like understanding when to draw the
                                         
                                         line and also when there's just enough of a whiff the other side of the abstraction you're like i
                                         
                                         i'm actually gonna single step into this third party library and I'm going to have a look through it because I
                                         
                                         now smell a rat I think there's something going
                                         
                                         on here that I don't understand maybe there is
                                         
                                         a bug in it or you know
                                         
                                         sometimes
                                         
    
                                         I have to crack out you know
                                         
                                         the Linux source
                                         
                                         code and go oh that doesn't
                                         
                                         look right oh this is a
                                         
                                         bug right it doesn't happen very often but it has
                                         
                                         happened and it will happen.
                                         
                                         But yeah, knowing when to stop.
                                         
                                         And also, this is when I look for folks to employ.
                                         
    
                                         One of the things I like to look for
                                         
                                         is the sort of tenacity of like knowing,
                                         
                                         being able to follow that thread all the way down
                                         
                                         if you need to.
                                         
                                         You know, if you hold your breath,
                                         
                                         big, deep breath,
                                         
                                         down you go through the levels
                                         
                                         all the way to the very, very, very bottom. and then maybe you come out and you've learned something and
                                         
    
                                         then you come back up for air and then hopefully you don't ever have to go beneath that level for
                                         
                                         you know again or for at least for a very long while and that's a really important thing to be
                                         
                                         able to do but then similarly being able to say i'm not even going to go down beneath my floor
                                         
                                         right now i've got a very well-defined floor of where we think the problems
                                         
                                         are buried and we don't need to go beneath that because otherwise, as you say, you'd never get
                                         
                                         anything done. You're right. You're going to fall down a rabbit hole and never come back, right?
                                         
                                         And I think this is directly related to kind of what we were talking about in terms of being
                                         
                                         able to draw this line and have these dimensions of refactoring that is beyond just redesigning.
                                         
    
                                         Because again, this ties directly into the problem of over-specifying in tests, right?
                                         
                                         Or under-specifying, right? If you go the other way, right? If you're mistrustful of abstractions
                                         
                                         that are beneath you that you should be trustful of, or it maybe makes sense to... It doesn't make
                                         
                                         economic time sense to distrust them, then you're going
                                         
                                         to over-specify in your tests. And that has many problems, one of which is you're wasting time
                                         
                                         writing tests for things that are probably not going to break. But also you're limiting the
                                         
                                         dimensions of refactorability that you will have in the future because of an unwarranted fear,
                                         
                                         right? And also on the other side, right? Like you don't test things enough because you know maybe
                                         
    
                                         maybe we should have some situations you and i where we write some tests specifically for when
                                         
                                         we do that next upgrade yeah yeah just to make sure that this thing really still works the way
                                         
                                         that it should well those things get informed a little bit by experience there you know when
                                         
                                         you've been bitten a couple of times by stupid whatever file handling in they do library bob
                                         
                                         you know you go well i tell
                                         
                                         you what while i was even trying to work out that this was a problem i wrote some tests just to sort
                                         
                                         of pave the way or to explore the space and yes i tarted them up a little bit and then we checked
                                         
                                         them in because they didn't hurt us any and they might just solve it the next one but it does that
                                         
    
                                         always has a sense of a feel of like bolting the stable door after the horse has bolted, you know, like whatever.
                                         
                                         Yeah.
                                         
                                         I forget shutting.
                                         
                                         Yeah.
                                         
                                         But I mean, I think it makes sense to sort of wait to feel some pain on those fronts.
                                         
                                         Because if you did that speculatively, again, where's the boundary?
                                         
                                         You'd be testing everything, right?
                                         
                                         So there's, but that's, I think what this is, is, you know, getting that level of specification right in the Goldilocks zone,
                                         
    
                                         where not too much and not too little takes experience. It takes some hard-fought experience
                                         
                                         sometimes. But when you get it right, or when you get it close, you're able to do all of these
                                         
                                         things. You're able to optimize your code with a set of running tests to tell you that it's not broken. You're able to change your threading model, again, with tests that tell you that it's not broken.
                                         
                                         You're able to do all of these other things that we can use our tests for.
                                         
                                         But if you either over-specify or under-specify, your ability to do that will be greatly limited.
                                         
                                         Yeah, that's awesome.
                                         
                                         And you've somehow managed to tie
                                         
                                         all my complete tangent
                                         
    
                                         back around to the original topic.
                                         
                                         So, yeah.
                                         
                                         Well, it turns out these things
                                         
                                         are usually related.
                                         
                                         You know, it's like there's subtle
                                         
                                         or sometimes not so subtle
                                         
                                         dependencies on these things.
                                         
                                         And you just, you can see them.
                                         
    
                                         They're there.
                                         
                                         Well, fantastic.
                                         
                                         I think that's a great place to pause
                                         
                                         and go refactor some code see them they're there well fantastic i think i think that's a great place to uh pause and uh
                                         
                                         yeah yeah go go refactor some code or go back to my compilation yeah or optimize it yeah i mean
                                         
                                         yeah go optimize some code that's what we're talking about
                                         
                                         you've been listening to two's compliment a programming podcast by Ben Rady and Matt Godbold. Find the show transcript and notes at www.twoscompliment.org.
                                         
                                         Contact us on Mastodon. We are at twoscompliment at hackyderm.io.
                                         
    
                                         Our theme music is by Inverse Phase. Find out more at inversephase.com.
                                         
