Two's Complement - Special Guest: Clare Macrae

Episode Date: February 4, 2021

Ben and Matt trick another live human being into joining them on the podcast. Clare Macrae joins to talk about her work with approval testing, her experiences dealing with legacy Fortran and C++ code, and an upcoming webinar she's doing on refactoring-to-testability using CLion.

Transcript
Starting point is 00:00:00 Hi, I'm Matt Godbolt. And I'm Ben Rady. And this is Two's Complement, a programming podcast. Today, we're delighted to have a special guest. Clare Macrae is joining us. Hi, Clare. Hi, Matt. Hi, Ben. How are you?
Starting point is 00:00:26 It's amazing to have another guest on our podcast, so we're really excited to talk with you about all sorts of things. But do you want to give yourself a little bit of an introduction? Tell us about yourself. Sure. So, yeah, I've been programming for a living for more than 30 years now. But in my childhood, in the early days, the first language I programmed with was BASIC. And my dad got into computing in the very early years of industrial computing in the UK. And so when I started learning BASIC at school, he got really excited, and he got an amazing book, Donald Alcock's Illustrating BASIC, a spiral-bound book with the most amazing pictorial explanations of how to program. It was absolutely brilliant. And dad even bought an early home computer, a Transam Tuscan, which I've seen at the National Museum of Computing at Bletchley Park.
Starting point is 00:01:24 My hobby is retro computing and I have never heard of that computer. That's amazing. What was it called? The Transam Tuscan. That sounds like a car. Yeah, yeah. It's an amazing, amazing thing. Or a Star Wars character.
Starting point is 00:01:39 Yeah. So I started with BASIC at home, and about the same sort of time, at the high school or upper school I was at, they had some kind of setup with some, I don't know, remote mainframe. It didn't matter to me. We had these sort of squares of paper, and we could write down our BASIC programs and we'd send them off, and a few days later we'd get back a printout with some information or an error message, and then rinse and repeat. And then just before I left school, the first BBC Micro arrived, which I know is something quite close to Matt's heart. Right, absolutely. And so, yeah, I have this weird memory of the lunchroom, the room, the classroom
Starting point is 00:02:19 where it actually was, and where we could go at lunchtimes, but mostly my memories involve standing behind other people in a long queue to get to use it for a few minutes. So I don't actually know how much I used it, but it's a really major memory from my childhood. And then I went on to do a chemistry degree, but I was just rubbish at practical chemistry. Two left thumbs, always poured things away at the very last moment, right at the end.

Starting point is 00:02:45 And I had an opportunity to do a year-long computing project in chemistry, in Fortran, and that in turn stood me in good stead for my career in the next 30 years. So I think it came probably originally from my dad, but it's just sort of... I liked the programming, I liked the logic, it seemed to make sense. I didn't always succeed, but, you know, you could always learn to get better. So that's my early years, really. Gosh. And so when you say computational chemistry, what... I don't even understand what that means. Like, I have a very basic, GCSE-level chemistry understanding, of, like, well, water is two hydrogens and one oxygen kind of level.
Starting point is 00:03:27 But what can you do with a computer program and chemistry? So I kind of have slightly weird imposter syndrome over this because really I'm a computer programmer at heart rather than a chemist. But chemists have said, yeah, but look at the software you've written. You've obviously figured out quite a lot. So you kind of learn over time. Computational chemistry is really divided into sort of theoretical side of things where people are trying to predict new facts or trying to predict the results of experiments. And for years and years and years, the impossible example that was given of that was predicting the 3D crystal structures of proteins, which is always like you can't even predict the crystal structures of small molecules. So a crystal, you know, where a molecule aggregates
Starting point is 00:04:19 in three dimensions, you know, like in your salt shaker, salt crystals, that kind of thing. Right, again, vague memories of seeing, like, a lattice for, like, sodium, whatever, salty, salt crystal things. Yeah. So it's hard enough to predict, to calculate theoretically, the crystal structures of small molecules. And so for proteins, it was like, you know, maybe in a few decades they'll solve that. But in the last few months, AlphaFold, I think is the name of the project, has just blown any expectations out of the water.
Starting point is 00:04:56 What they've done is absolutely- This is Alpha, like the Google Alphabet DeepMind AI company. Yeah. So like AlphaGo a few years ago that shocked the Go world. So that's the theoretical side of things. And that's not really something I've had any involvement in. But then there's software based on experimental data. And that's where my main experience is of working for a nonprofit organization for more than 30 years that exists to collect the results of crystallography experiments.
Starting point is 00:05:29 Essentially, to collect the results of, to collect information about the shapes of molecules in three dimensions and to collect, crucially, the symmetry operations that say how the molecules aggregate together across the whole crystal structure. And that turns out to be really valuable information. And there's a lot of software involved at all stages of the process. Got it. And so what language were you writing all this stuff in? So you were saying you're collecting and aggregating across lots of experiments.
Starting point is 00:05:58 So the company, well, non-profit organization, that I worked for for more than 30 years: when I joined, it was Fortran 77. I joined in 1987. Oh gosh. But the organization is well over 50 years old now. It was formed in 1965, and they really were at the forefront of electronic publishing in the late 60s and early 70s, publishing books. They'd written Fortran IV software to plot the chemical diagrams of the structures that were in the database and do the typesetting, with Greek characters and subscripts and superscripts, which was unheard of at that time. And by the time I joined, that Fortran IV had been ported to Fortran 77. And over time, we evolved to other languages.
Starting point is 00:06:50 And the majority of my later years there was all C++ and Qt. So I think it was 1999, somebody had the idea of porting all our Fortran to C++ and getting rid of the old Fortran, and it took quite a while, but we eventually achieved that. I was going to say, because Fortran is deeply embedded in the science community, and from my own, like, very limited dalliance with it, Fortran has some interesting guarantees about arrays and aliasing that C and C++ don't provide, which means that naively converting large Fortran programs to C can actually be a pessimization if performance is important. I mean, maybe performance isn't as big a deal there. So that's an interesting thing to have to sell people on, to actually convert over to a more modern language. But then I suppose the benefits are it's more easy to work on, or easy to test, or easy to

Starting point is 00:07:48 extend. What kind of benefits, what was the reasoning behind moving to, I say more modern, but obviously Fortran's still going, a more modern language? I think better design. We were really... so I don't have experience of modern Fortran, but the code we had, there was lots of global data. We were really dependent on common blocks and things like that. And I mean, it's hard, as someone who's been programming for a long time, to try and convey to newer colleagues that, yes, this might look ancient to you and not how you do it.
Starting point is 00:08:20 But the people who did it at the time, they were, you know, what they were doing was really powerful with what they knew at the time and providing capabilities that weren't otherwise there. It was very object-oriented C++ that we ended up with. And there were lots of things that it would never have been possible to implement in the Fortran days. My career went through a really kind of weird evolution of responsibilities. So when I started in the late 80s, early 90s, it was a nonprofit organization that evolved out of Cambridge University. We didn't have sales and marketing teams. We didn't even have user support.
Starting point is 00:09:04 So the developers did all of those things, and internal systems administration as well. So it meant there wasn't a huge amount of time. You certainly didn't have 100% of your time on programming, but you were speaking to users directly. So you had a really good sense of what people needed, and this really painful feeling of: yeah, I understand what you need to do, but I have no
Starting point is 00:09:27 concept at all of how to do that in Fortran. And with the code that we had, things like: you started a search of the database, and when it got to the end, people wanted to do a new search, but our software just halted, and we couldn't see how to rework it to go back to the start. Really basic things. Gosh. And then over time we ended up with, you know, powerful user interfaces and really maintainable code, but by that point there were multiple layers in the organization meeting the customers, meeting the users and dealing with them. So by then we got all our requests second-hand or third-hand, and it was much harder to get that sort of, you know, empathy and

Starting point is 00:10:07 direct connection of what do users really want to do. I think, Ben, you were talking in an earlier episode about, before you're implementing a feature, you want to actually run it from a user's perspective and really understand what the user needed to do. Yeah. And I saw that for a long time get harder and harder. And then later on, with agile, we ended up sort of getting more and more direct contact as well, so it kind of got better. Yeah, not only do you lose that empathy, but at least I find that you also lose a, I don't know a better way to describe this, but almost problem negotiation, right? Yeah. Where, you know, somebody's trying to accomplish a goal and they might think
Starting point is 00:10:46 of it in certain terms. It's sort of like the old Henry Ford aphorism about, if I asked my customers what they wanted, they would say faster horses, right? And so as a technologist, a lot of times, because of your knowledge of technology, you have different ways of looking at a problem. And if you can't have a face-to-face conversation with the person who's trying to solve that problem, if that is through a second party that may be technical or non-technical, it's tricky, right? You wind up building things that either aren't the best solution or, in the worst case, don't really solve the problem at all, because it's sort of a preconceived notion of what the problem should be. Yeah. Is that... so, I know, I
Starting point is 00:11:28 understand that you are now, you've sort of moved on in your career and you're now starting to consult. Is that one of the things that sort of motivated you to do that? Well, the real motivation for that was, for the last two or so years, I've been volunteering on some open source software for testing hard-to-test code, legacy code that you think there's no way we can add tests to that, there's no way we could break it down. And that's an approach called approval tests, which was invented by someone called Llewellyn Falco. And it reached the point where I was just learning so much. And I was in this fantastic kind of virtuous circle where I would speak at a local meetup group or a conference, which meant I had to learn more about the software in order to be able to talk about it. But then people would ask questions and I would learn more.
Starting point is 00:12:20 And that was just taking up more and more of my time. And I kind of reached the stage where I'd had a fantastic 30 years, but I was learning so much that I kind of felt I could speak to a different group of people and a wider group of people, to share some stuff that I'd learned and worked out, but also a lot of stuff I was learning from other people. And it wasn't really being talked about at C++ conferences. And so I thought it was a way that I could, you know, in the time left I've got in my career, a way that I could really try and help other people out. Yeah. So the motivation wasn't primarily the consulting company. It was: I want to carry
Starting point is 00:13:01 on traveling and going to conferences and that's worked out really well this year. Oh, yeah. Year of the plague. Can't have everything. So you mentioned approval testing. What are approval tests? Approval tests, it's this really strange thing. It's a small body of code that turns out to be fantastically powerful.
Starting point is 00:13:21 And it's easiest to explain by describing something that I think a lot of people end up inventing themselves and rolling their own, which is: you've got some code that already exists, and you want to add a test for it, and you can't break it down into small chunks. So you call some function that generates a lump of data, and then you write it out to a text file, and you save that as your master version, your anointed version. And then you make your test repeat that and do some kind of diff. And if it changes in future, then you get this great big wall of output that says these 5,000 characters differ from these 5,000 characters, and then you cut and paste it into a diff tool and you try and work it out. So that's the kind of the
Starting point is 00:14:12 fundamental that it's based upon and lots of people invent their sort of homegrown versions of that i know i certainly have done likewise yeah and what approval tests is, is I guess you might call it an abstraction built on top of that, but it has a lot of sensible behavior built in by default. So, for example, if there is a failure, it pops up a differencing tool and it plugs into differencing tools. And so if your differencing tool shows you these five characters on the left and your new file differ from these 10 characters on the right it makes it much easier to decide whether oh i've made a mistake i need to fix my code or oh that's good the new feature
Starting point is 00:14:59 I just implemented has worked. I'll use my favourite differencing tool to copy what we just got over and anoint that as the new version. It's funny, the language we use for that, because we always call that blessing, and you say anoint. There are very religious overtones to this kind of thing, this holy sacred text,
Starting point is 00:15:21 which is like what I was expecting it to be. You have to have a special hat. I wonder what it is in our psyche that makes us sort of go towards those things. The other thing is like what i was expecting it to be you have to have a special hat i wonder what it is at all so it's like yeah it makes us sort of go towards those things the other thing is like golden is the other sort of term i've heard for these golden tests and stuff and certainly for giant i know like gcc's internals have like all thousands and thousands of test cases they've collected along the along the years and like well this is what it should come out to be you know we found this weird situation where it's broken and rather than well as maybe as well as writing individual tests that test those components we're like well this is the
Starting point is 00:15:52 piece of code the user had and then this is what it should have generated and so it's a really powerful thing but to plug it into a diffing tool presumably this is something that you can configure so my ci build can just go no if it doesn't match. Yes. Yeah. So it recognizes through environment variables, it recognizes if it's running on a bunch of well-known CI systems. And in that case, it writes out a text diff rather than spinning up a GUI tool,
Starting point is 00:16:22 because obviously you don't want to block your CI system if you happen to have a graphical installer on it. So, yeah, Llewellyn is a big fan of convention over configuration. So this library, this approach is implemented in Python and C Sharp and many different layers of.NET and loads and loads of different languages. With the same vocabulary describing the steps and the options and the configuration on each of them. So I've actually made quite heavy use of the Python version myself for home projects. But it turns out that even though it's a really simple approach, and for the C++ version,
Starting point is 00:17:07 we have no... It's written in vanilla C++ 11, so we have no kind of process control in it or anything like that. So it's got a list of 20 or 30 diffing tools that it looks for in standard locations and on the path, and you can always tell it to use other ones if you want but it turns out that although it's a really simple idea and it's incredibly powerful and convenient it's also really possible to write code write tests that generate walls of output that hide the detail that you're actually testing and become fragile to maintain. And the worst possible thing is to have it set up so that it writes out too much output
Starting point is 00:17:54 and not all of the developers understand the purpose of the test. Because then if you get a test failure and actually it showed a bug, somebody comes along, sees the differences, says, I want to make that test pass. I'll just approve the new output. And they've lost the signal that actually there was something wrong. So there turns out to be lots of nice patterns, one through experience of what information you choose to write and only focus it on the information that's relevant to the particular test case. Include the inputs and a description of what's happening in the output as well as what the actual outputs are.
Starting point is 00:18:32 Make columns line up so it's easy to glance at and make sure developers understand the difference between seeing a bug and improving the output. And then with that comes incredible power and incredible convenience for testing legacy code and hard to test new code. Right. Yeah, I was going to ask, it sounds like this is a tool that you would normally use on a legacy code base, right?
Starting point is 00:18:55 Like something that was never designed to be tested. But you also said you were using it in some of your personal projects. So is this also something that you use for for other things that are not legacy code yeah so i guess there's at least three broad areas for it so certainly um you've got a lump of code that you can't yet divide up and write unit tests and so if you can find um a place to hook into it um to to run the code that you need to run. And again, that comes with practice. That's a big use of it. It's also really useful. So an example I give is on my PC at home, I have some Python scripts I use. I download bank statements and munch the data so I can import it into the finance software I use. Okay, no big deal. And when I wrote it, I kind of had this little personal nagging doubt that I ought to be writing
Starting point is 00:19:54 tests for this, but I was never going to. And then I learned about approval tests and I added an old downloaded set of transactions into my, because of course it's version controlled, because why wouldn't you? And then I run approval tests on it. And whenever I rework the software, which isn't that often, I can rerun the approval tests and see if the behavior has changed. So I'm not sure whether I would do that sort of thing in a commercial environment, but for small projects that you want the convenience of knowing if you've broken it and you don't want to have to write small sort of user-centered tests that's that's really good but there's it's got other things built in like it's fantastic if you've got large
Starting point is 00:20:36 numbers of inputs so if you've got some function that takes for now take six or eight arguments it's got a thing called verify all combinations where you can pass in a container with a set of values for each of the input parameters and again there's skill about formatting the output and so on so but you know in a few lines of code if you're looking to get good test coverage even of new new code, it's very, very quick to keep running it through a test coverage and add a new data point, add a new data point, alter that existing array. And so what if you end up with a file with 20,000 lines of output? Because your diffing tool is only going to show you the differences. And you'll get, you often, by seeing the patterns in the failure, oh, look, it's all of the values where parameter three is negative. That's where it's gone wrong.
Starting point is 00:21:29 Okay, that tells us where to look. So, yeah, it's fun and it's exciting. And it answers some of the questions that you, between you, have touched on in some of the earlier episodes of yeah how would you even begin testing here yeah there's a lot of when it comes to legacy systems in general there's a lot of chicken and egg problems that you run into where you know in order to make the code testable you have to be able to change it confidently and in order to change it confidently you have to test it and it's sort of what do you do there yeah This whole approach actually kind of reminds me of a technique that I've used for a while when writing simple bash scripts, where I will, if you're familiar with the watch tool in Linux, it will run a command over
Starting point is 00:22:15 and over again. And there's an option for it to show diffs. So it will highlight the differences in the output from one run to the next. And so if I've got a bash script that is fairly simple, right, there's not really any branches or anything, and it's just process a bunch of stuff and split it out on the screen, I'll just run the bash script over and over again and watch, you know, in like a two second or three second interval, and then edit the script and watch for those differences. Now, you know, I have to be quick. But for simple things, it actually works really well. And I almost wonder if being able to, for more complicated things, being able to pipe
Starting point is 00:22:53 the output through a tool like this and run it using something else, probably like with ENTR or some other sort of file system event-based tool where it's not just running every three seconds. It's running only when I make a change. Could be another way to sort of test, because bash scripts are notoriously difficult to test. Like there are ways and they're all terrible. And so I wonder if this could be yet another tool in the toolbox of I have a 500 line bash
Starting point is 00:23:19 script that's like incredibly important to my company and if it ever breaks, we're all screwed, but I still need to change it today. So what do I do? Right? Right. Yeah. Yes. Yeah.
Starting point is 00:23:31 That's really interesting. So I guess one of the things I like about Bash is you can turn on all sorts of warnings and make it fail if you try to use an unset variable. Set UOPipeValue. Yeah, yeah. Don't you do it often enough to have memorized the rooms, but I can always find it really, really quickly.
Starting point is 00:23:49 And they are rooms. Yeah. One of the things I'm quite envious of in some other languages than C++ is tools like ncrunch. So tools that are built into IDEs, and they're always running your tests as you're typing so you know it comes back to your your rule of eights and you need your test to be really really fast or to be able to select only the tests that you're working for the area that you're working on
Starting point is 00:24:19 um so that um what was that tool that you mentioned that runs? Did you say it was Watch? Oh, yeah, Watch. So when I'm working with Bash scripts, I'll generally – I mean, I actually do this. I use Watch for lots of things. It's a super useful tool. But it just runs a command over and over again. And there are options to show, like, differences from one set of output to the next. And it can be really useful for all kinds of things.
Starting point is 00:24:47 But yeah, running bash script is a great use of it. And just seeing those diffs. More generally as well, you can use it for like, I think it's an option that says exit when the output changes. So it runs the command once and then caches it. And then it keeps running the command over it every two seconds until the output changes. So if you've got like a directory,
Starting point is 00:25:03 you're waiting for a program to write something into, do like watch and i forget the command line you know dash dash quit if change ls and then you walk away from your computer it comes back and it's like oh it completed then when someone else dropped the file in the directory you were expecting which is like super poor man's uh watching what was a file system monitoring tool. But there's a number of things you can do that with. That makes me want to play around with, if I've got small C++ projects I'm building and as I'm typing I want to rerun the tests, if I had watch building the code and running the test, because modern IDEs, you can make them save as you type, then you don't even have to remember to run the test in obviously as i because you know modern ides you can make them save as you type
Starting point is 00:25:45 then you don't even have to remember to run the test in the ide you uh set watch up running and you can keep an eye and you'll see when your tests pass or fail that's that's interesting yeah i mean you do something like this i think ben when it comes to like python um yeah the times when i've worked in c++ too i actually do this with another tool which i think i mentioned called entr where i have entr watch my um like you know resulting test binary and then whenever it changes i run it so it's like i'll just count on the ide to do the compilation for me um but then whenever it successfully compiles a new test binary, ENTR will run the tests. So even if my IDE doesn't support automatically running tests, I can kind of make it do it
Starting point is 00:26:30 in a terminal. Um, and it just, it's just one of those things where it's like, you know, why, you know, why do two steps when I can only do one, right? Especially if I'm doing it all the time. Um, so if I can turn, you know, control S into the feedback loop that tells me everything that I need to know about my code, whether it's, you know, did compile successfully. Okay, great. Yeah. You know, did the test run? Okay, great. It's just, it just makes life easier. And yeah, those, those tools are super easy to use and install because it's just, you know, apt install
Starting point is 00:27:01 ENTR and pointed at a file and tell it what command you want to run and Bob's your uncle. Really exciting. The very first version of Compiler Explorer was in fact just a watch and GCC and a bunch of pipes. So it's a very valuable prototyping, web development prototyping tool, it turns out.
Starting point is 00:27:27 Yeah, it's a very valuable prototyping, web development prototyping tool, it turns out. Yeah, totally. So I've got a question about how you'd start approval testing. How does one go? You mentioned, for example, earlier, like making a procedural set of like run all these six different parameters and here's three different values for the A parameter and true and false for B or whatever it is. Obviously, that's going to generate your 200, 200 000 line output which you correctly say you know like if you if you catch a mistake hopefully if you've done it right it's a subset of those lines and your diff tool
Starting point is 00:27:53 only shows you the lines that make sense but when you go from nothing to that how do you do you just kind of go i hope there's no bugs to begin with Let's just bless whatever comes out the first time we run it. And then we're using it more to see if I changed it. Yes, yes. So built into approval tests is the idea that the first time you run a test, of course, you don't have the approved output. It doesn't know what it is. And a lot of diffing tools don't like it if you give it a non-existing file on the command line.
Starting point is 00:28:26 So the first time you run a test and it derives the file names from the names of your tests. So it comes up with sensible names. You don't have to think about any of this. You can change the name if you want, but by default, it does sensible things. So I see you haven't got an approved file for this. I'll create an empty file and then show you the empty file and the current output side by side. Right. And when you're working with legacy code, code that exists and you're trying to sort of lock down the behavior by writing tests, the right answer is always the current behavior.
Starting point is 00:28:59 Yeah. because usually at that point you're wanting to, well, you're either wanting to add a feature or fix a bug and maybe you need to do some refactoring first, but you want to make sure that you don't accidentally change the behavior. So you lock in the existing behavior and say, even if it doesn't make sense, even if it's not exactly what the customer wants, at the moment I'm in the do no harm stage of the sort of cycle. I make my test, I lock in that behavior, then I maybe go do my refactoring, and then I run my test again. And if it all still passes, then I know I haven't changed anything that I had done before. That makes perfect sense. I mean, it's a standard sort of cycle, except that instead of me having to invest the time thinking about what is it I must test about my code, in order to be confident that the changes I'm about to make don't make it. I'm kind of using a sort of global
Starting point is 00:29:49 view of like, well, just as long as we execute the code path and as long as the representation of the output is, to your point, captures all of the things that are interesting about what I want to test, then I just need to run it and output that. And that's my starting point. That's exactly right. And it wouldn't be unusual at that point for you to see something in the output and go, oh, that looks wrong. And so don't mix up changing the behavior with refactoring the code. Those actions need to be kept separate. And I've heard some fun war stories about, and if you think there is a bug,
Starting point is 00:30:36 don't go and change it until you've spoken to somebody who knows the product. People have been relying on that bug for years. You can't go changing it now. Exactly. Iron's law. Yes. Yeah. Yeah. How do you deal with the the sort of like i mean i can think of a cat certain categories of things where you might get tripped up like time you know things
Starting point is 00:30:53 that are things that vary by time uh unique identifiers uh guids you know potentially depending on what the inputs values are like how do you how do you deal with those kinds of things in the textual output? Yeah, that's a good question. So the approval tests approach has a vocabulary for code to deal with that situation called scrubbers. And so the kind of, so the way you do it is you, you know, you run a test, you approve the output, then you run the test again, and it fails. And you see there's a date and timestamp that's different, for example. Well, we have various helpers that you can use to say, well, if it's a GUID, we have a thing which says convert anything that matches a GUID regular expression to some kind of
Starting point is 00:31:37 placeholder text. And it even says it keeps track of the GUIDs it's matched. So if the same GUID appears three times, it will say, I think it's in square brackets or something like that, GUID 1. And then the next different, if it comes up with a different GUID, it says GUID 2. Date and time is a bit harder because of locales and things like that. It's out of the scope of our project to deal with every locale worldwide. So we've also
Starting point is 00:32:05 got a regular expression functions that you can call and you pass in a regular expression and then some replacement text. And so you can say convert all, everything that matches this date and time regular expression to something perhaps in square brackets that says date and timestamp um so and so we we focus on having helper functions to make it easy to do that and you program in the logic that you want for the pattern you want to replace and so the kind of conventional answer with date times is well you create a policy object that reports what the current date and time is, and you inject that all the way down your code until the actual point. And then who knows what you've broken on the way and how much time you've spent.
Starting point is 00:32:53 Whereas the approval tests approach is, now just write out the file and then reread it afterwards and munch it in any way you want to sort of beat it into submission to serve your needs and save your time. It's a fantastic different way of looking at things. But I love it. anything you can represent in a text, but say like podcast audio or a video or a picture or something like that, where the textual representation isn't particularly informative. Yeah. So when I was working on a 3D visualization program for visualizing crystal structures, I had to add a new style of visualization that users users been asking for for decades and we finally made the time to to implement that and i was on a quite a tight loop of i don't even know the maths at this point so i'll have a first stab and then run a few hundred or a few thousand
Starting point is 00:33:57 crystal structures through it and see which ones crash and um so i used approval tests for that. So my approved and received images were PNG files, I think. And at that point, I was never going to be able to commit those to our version control system, which was already several gigabytes in size. But I was able to run it repeatedly on my machine and it made for really good conversations with the product owner. He was a fantastic product owner, really helpful, really responsive. And sometimes he would say, yeah, that matters. We need to fix that. And other times he would say, people using that kind of structure are not going to be using this display style.
Starting point is 00:34:41 So it doesn't matter and if i hadn't had that conversation i would have tried to make it look pretty rather than saying well at least stop it crashing but don't try and make it look nice the other thing i've seen it used a lot in is audio and there seems to be a pattern of people asking for help um with testing audio outputs and it turns out there are some nice approaches around, say that what you're generating is an audio wave. You'd save that as a bunch of numbers and that would be the right answer, but that's really hard to understand when there's a failure. Well, it's easy to build in extra code very quickly that says, and also generate a picture of this, some kind of visual representation of it.
Starting point is 00:35:28 Is this like ASCII art? Well, I mean, if you've got some software that can generate the, like the wave, the audio wave. Oh yeah. Okay. And you probably wouldn't version control those. You wouldn't need to. The text output is the master thing but
Starting point is 00:35:46 if you get a failure then approval test has a concept called reporters and you can say as well as popping up a difference of the numbers also convert them both to something that you can visualize and then open them up in a diffing tool that visualizes differences in images really well and you can show it to a domain expert who says yes that matters or actually no human is going to be able to distinguish between those two just accept the new answer so anything that um you can create any kind of visual representation of is really amenable to this. Interesting. Yeah, I can imagine that the combination of factors there could definitely get challenging where it's like, okay, I have date and time sensitive information inside of an SVG file that I've generated.
Starting point is 00:36:40 And the only way to see that the differences are material is to open them up in two SVG editors and compare them and say like, yep, no, that makes sense. Or versus like, no, this isn't really important. And then building all the tools to filter all that stuff out so you don't get false positives. Yes. Yeah. There's another implementation of approval test that's in Python called, I want to say TextDiff, but I'll make sure the right link is in the show notes in case I've got that wrong. And it allows you to do things like approving web pages and PDF files and things
Starting point is 00:37:16 like that. And it does that by building on the power of Python to convert those files to text representation. So for example, you've got a PDF and yeah, someone might want to inspect the styling, but mostly you want to make sure you haven't changed the content of it, the text content. So run a Python tool to extract and nicely format the text content, displayed text content, and then use that for approving um so there's all sorts of once you sort of free your mind from well there's this exact output and i have to make sure it doesn't change more to well what do users need to um perhaps uh domain experts need to see in order to understand whether there's a meaningful change or not it becomes super powerful and depending on the time that you've got you could leave the output
Starting point is 00:38:14 the testing being done via approval tests you could leave a few tests in as an integration test down the line or maybe you want to use it as a placeholder on the way and use it to learn about the behavior of the code by feeding in lots of different inputs and then write down kind of business logic-based user expressed tests. So it's, yeah, depending on your scenario, approval test might be the end point or it might be a step on the way yeah i mean i would argue in most cases that it's like if you can make that
Starting point is 00:38:53 additional transition and go from something where you know you're using a sort of capture-based testing solution whether it's approval tests or, you know, there's other tools for this as well to something where you can have more targeted tests. Like in general, that's good. But I certainly wouldn't want to recommend that everybody do that because given the state of certain legacy code bases, that just is not economically feasible. Yeah. Yeah.
Starting point is 00:39:23 But yeah, it's an interesting line. And it's sometimes hard to find those lines of it's it's an interesting line it's sometimes it's hard to to find those lines of like you know yeah you know the suite of approval tests that we have like okay i guess one question would be in your experience how long do these tests take to run i would it's kind of like highly dependent on the system right yeah yeah so the if you were running incredibly fast calculations through them then i think the majority of the time would be the file IO because it writes out the current output and then it finds a matching file and then it reads it and it says it's equivalent. And if you've put in a regular expression, it's applying that and then it launches a diffing tool if it's different so there is a certain amount of overhead with that and i do know the theory of uh if it touches the file system it's
Starting point is 00:40:12 not a unit test but the people i'm talking to have no tests at all and no confidence in changing their code and so at that point any test is better than no test. Yeah, absolutely. And so we haven't talked about what if you can't pull out a usable chunk of code to throw data in and run through approval tests. So there's a whole extra skill that comes there with learning enough about safe refactoring to be able to, and how much you trust the refactoring tools in your IDE. And that you aren't using some obscure corner of a language where you've, I don't know, got inconsistent assignment operators or something. Who would write in such a language anyway? They would allow that to be a thing.
Starting point is 00:41:04 I mean, yeah. So it turns out the more that you look at this stuff, the more experience there is out there and the more there is to be learned about using IDEs to separate out chunks of code into functions that you can call and reuse. I haven't yet attended it yet, chunks of code into functions that you can call and reuse. Mm-hmm. I haven't yet attended it yet, but Llewellyn Falco and Jay Bazzuzzi in the States have been running some workshops on how to safely get code under test that has no tests by almost
Starting point is 00:41:41 entirely doing completely automated refactorings. Yeah. almost entirely doing completely automated refactorings. So you don't make, you, at each stage, you see what the next tiny change you want to make is, and they do amazing transformations of code. It sounds really exciting, and they're running a series of those right now, and I'm really looking forward to attending.
Starting point is 00:42:06 Happens to be in C Sharp, but the requirement requirement because you're using the ide to do the work all you need to do is be able to read c sharp which is that's no problem right um yeah yeah certainly when i was working at object mentor with james uh that our our preferred technique or at least my preferred technique for doing this in java um and part of this is just the java ids are so technique for doing this in Java, and part of this is just that Java IDs are so powerful, but doing this in Java was you just rely on the automated refactorings and, you know, read the code that was produced, like make sure that it also makes sense to you. Don't just blindly hit buttons, but rely on the automated refactorings that come from you know intelliJ or Eclipse to be semantically equivalent in most cases and try to find the minimal number of automated refactorings
Starting point is 00:42:54 it takes to get the tests in place and then from there you can start refactoring the code that is covered by the tests manually um have you I mean I would imagine that that kind of technique would not only be useful for unit tests, but also for approval tests. Have you done that, where you're using automated refactoring techniques to create a hook for these kinds of things? Yes, yeah.
Starting point is 00:43:19 The main IDs I use these days are CLion for C++ and pycharm for python and the refactoring tools works so much better in python than in c++ and they tend to be quite a lot quicker as well so it becomes a case of learning what tools work well and um perhaps occasionally you need to move a variable around before you divide the code up so it's got less scope to search. So yeah, that is something that I have been enjoying learning about and by coincidence, I'm going to be putting my money or my time where my mouth is uh coming up soon so on the 16th of february i'm doing a webinar for jet brains a live webinar um talking about using refactoring in sea lion to add tests
Starting point is 00:44:17 from no tests that builds on a that builds on an earlier webinar that umnie Mertz did a few years ago, where he was showing sea lion refactoring tools. And I learned a bunch from watching his video. But at the end, somebody said, what if you don't have tests? And I thought, well, that's a good thing to explain. Right. That's where your experiences can come in and show, well, this is how you can get tests from essentially from nothing almost, right? You can use the existing behavior to build your tests so if you're listening to this podcast sometime soon after the 16th of february the video of that will be available and the main focus of that is going
Starting point is 00:44:53 from no tests to having tests that you can trust and have confidence in and then in march i'm doing a talk at the accu conference online conference and that's much more specifically about refactoring and getting the ide to do your work for you so there'll be a lot of a lot of practicing and a lot of experimenting in different ide's and comparing the refactoring tools and so on so you mentioned actually just going back a little bit here that um the approval test library that you've been working on is open source so uh presumably we can go and find out remind me what it's called and where we might find it so the approach is called approval tests and there's a couple of good urls to go to uh approvaltest.com has got a bunch of links from there. And many of the implementations are in the GitHub user approvals.
Starting point is 00:45:48 So GitHub slash approvals is the other possibility. And you'll see all the different languages that are supported there. Right. So there's more than just C++ there. You said Python before now and many other languages as well. So this sounds like something which is, as you say, it's a generally applicable approach. And given the same sort of set of, was it scrubbers was one of the sort of terms that you used. And then what was the thing that does the transformation from the text output to something else for diffing at the end?
Starting point is 00:46:19 Like you mentioned with the audio example, what was the name of that? So the vocabulary it uses for the diffing tools is reporters so that's how it shows and you can write your own reporter that converts to a different file format and then does the diffs and things like that cool um looking at approvaltest.com it's got about the the logos of about 12 or more different languages java c- c plus plus php python swift javascript uh lure objective c ruby and um pearl pearl yeah i mean i guess it makes sense it's still going that's fantastic so how can people find out more about you and how can they contact you online? So on Twitter, I'm ClaireMcRaeUK and my website is ClaireMcRae.co.uk. And I would say I love testing challenges. If you've got small testing challenges, just contact me and we can chat and perhaps share code online or something like that.
Starting point is 00:47:23 Also available for training and consulting share code online or something like that also available for training and consulting on code as well but um i'm in the lucky position of this is this is my fun and my hobby um and uh i love sharing what i've learned so help me learn more yeah sounds fantastic well thank you so much for being with us today and yeah it's been absolutely brilliant to have you thank you it's been huge fun and great to meet you too ben yeah great to meet you you've been listening to two's compliment a programming podcast by ben rady and matt godbolt find the show transcript and notes at twoscompliment.org
Starting point is 00:48:03 contact us on twitter at twoscp. Theme music by Inverse Phase, inversephase.com.
