CppCast - Reducing Binary Sizes

Episode Date: August 9, 2024

Sándor Dargó joins Phil and Anastasia Kazakova. Sándor talks to us about why and how to reduce the final binary sizes your code produces, as well as the importance of clean code.

News
- "cppfront: Midsummer update"
- Reddit Thread
- cpp2 episode from last year
- AutoConfig: C++ Code Analysis Redefined (Sonar)
- "noexcept Can (Sometimes) Help (or Hurt) Performance" - Ben Summerton

Links
- Binary Sizes posts on Sándor's blog
- Sándor's books
- "Parameterized testing with GTest" - Sándor Dargó
- "How to keep your binaries small?" - Sándor's C++ on Sea talk(s) (will add video links when available)

Transcript
Starting point is 00:00:00 Episode 388 of CppCast with guest Sándor Dargó, recorded August 5th, 2024. In this episode we talk about the latest updates in CPP2, AutoConfig for SonarQube, and whether noexcept improves performance. Then we're joined by Sándor Dargó. Sándor talks to us about binary sizes and clean code. Welcome to episode 388 of CppCast, the first podcast for C++ developers by C++ developers. I'm your host, Phil Nash, joined by my co-host for today, Anastasia Kazakova. Anastasia, how are you doing today?
Starting point is 00:01:05 Good, good. Thank you, Phil, for having me here. Hi, very welcome. Welcome back. Good to have you here. Timur's still away, of course, until September. I'm actually going to be away for a couple of weeks as well, because Timur is not back until September. I'm going to have a little bit of a break, so it's going to be about four or five weeks before the next episode as this one is released. So we may still have one more guest co-host yet. We're not quite sure yet. But you may have the honour of being the last guest co-host for this season, so welcome to the show.
Starting point is 00:01:39 Sounds good. It's actually great to be back. Yeah, generally happy just to join you here. You haven't been here for quite a long while? Yes, it's been a while. What have you been up to in that time? Many, many things, mostly busy at my work. But also a very exciting thing, which I'm super excited about, is the C++ Under the Sea. You probably might know where the name is coming from, right?
Starting point is 00:02:03 I can have a guess. Yeah, we've mentioned it a couple of times on the show. Yeah, so C++ Under the Sea, for those who don't know, it's a new conference in the Netherlands, and it's a C++ conference, for sure. And it's Under the Sea, you might guess why, because it's Netherlands. But it's actually in Breda. And as far as I know, Breda is like three meters elevation.
Starting point is 00:02:25 So it's not actually under the sea, but we'll do our best. Right. Well, don't let accuracy get in the way of a good name. That's what I say. Yeah. Yeah, we'll try. Actually, I would invite everyone to join us. So the program is not yet published, but we've already announced Jason Turner as a keynote speaker, and we
Starting point is 00:02:45 announced workshops, so we have you, Phil, for the workshop, and Mateusz and also Jason Turner as well for the workshops. I can't really tease you all of the program, but I have to say that we've already accepted a talk about Boost.Geometry, which I find personally very exciting,
Starting point is 00:03:01 and also about the spaceship operator, which I also quite like. And the program is coming soon, so believe me, I'm nagging the program and media chairs, so I'm promising you the program quite soon, actually. Take it as a commitment. Well, we'll announce it on the show when that arrives, so looking forward to that.
Starting point is 00:03:18 Yeah, please do. Good to see another new conference. Okay, well, at the top of every episode we like to read a piece of feedback. Nick Stone sent us an email which reads, in part: many thanks for all the work you do in producing this podcast, I enjoy it and look forward to it, but I have to say the episode on std::execution was way too advanced. What's the point of a talk that only the best C++ gurus can understand? I've been working in C++ for 30 years,
Starting point is 00:03:46 but almost all of that talk was over my head, and I've implemented concurrency. So Nick goes on to suggest that for more advanced topics like that, maybe we should insert a couple of minutes of extra explanation in simple terms. Well, thanks for the feedback, Nick. I mean, we do try to vary the topics and sort of the level that we cover them at, while at the same time, you know, trying to keep them as accessible to everyone as much as we can. And maybe we missed the mark on that particular one. But it is an inherently complex topic, which I think is quite hard to actually try to explain in simple terms, especially in a couple of minutes.
Starting point is 00:04:25 I think in that episode, Timur actually said that he didn't understand it even after a couple of hours of explanation from an expert. So maybe that's just going to take a little bit longer to take that one in. There are some topics like that. And I expect within the next year or two, there's going to be loads of conference talks about it. It's going to be just like co-routines were the last couple of years.
Starting point is 00:04:44 Everyone wants to do a talk on it, try and explain it in their own way, and you have to watch more than one before it clicks. So it's probably going to be a bit like that. So if you didn't follow that episode, don't worry, we've got more and we're going to cover some more topics. So we do like to hear your thoughts about the show. You can always reach out to us on X, Mastodon, LinkedIn, or you can email us at feedback at cppcast.com. Now joining us today is Sándor Dargó. Sándor is a passionate software craftsman
Starting point is 00:05:15 focusing on reducing the maintenance costs by developing, applying, and enforcing clean code standards. His other core activity is knowledge sharing, both oral and written, within and outside his employer. When not reading or writing, he spends most of his free time with his two children and his wife, baking at home and traveling to new places. Sándor, welcome to the show. Hi Phil, hi Anastasia, hi everyone. Thanks a lot, thanks a lot for having me here. Yeah, great to have you here, Sándor. So I was just wondering, I saw recently that you were presenting at C++ on Sea, like, another conference with a fantastic name, I would say. And I saw that your talk was actually scheduled
Starting point is 00:05:55 in the East const room. So my first very important question: is it East const or West const for you? And the second question is how much you enjoyed the conference. Well, let me start with the East const and West const then. You know, I'm all for consistency, and this could mean that I would go with East const, right? Because they say that's the most consistent solution,
Starting point is 00:06:19 but that's not really what I meant. I try to follow the existing guidelines and conventions of a project, so I will just go with whatever that is already used there. Well, I don't think that consistency, the so-called
Starting point is 00:06:38 foolish consistency, should stop us from making things better. But the question of const placement is not the hill that I will die on. And yeah, but C++ on Sea, I absolutely love the conference. I told
Starting point is 00:06:54 the organizers at the end that it kind of feels like going home after a few years. That was the first conference where I presented during COVID. So that time it was virtual. And I think I've been there actually in Folkestone three times by now.
Starting point is 00:07:20 And, you know, people are kind. They want to learn and they want to share. And they also help out. So, for example, during my presentation, I was asked a question. I don't remember what it was, but I couldn't give a proper answer. And, you know, someone just helped out from the audience and explained that part. And they managed to do that in a way that I didn't feel bad about it. So, yeah, people are really kind.
Starting point is 00:07:56 I also think that it's just about the right size, C++ on Sea. It's enough to have people with all different backgrounds, industries, seniority levels, but it's still human. You can easily make human connections. So if there's only one conference in a year that I apply to,
Starting point is 00:08:18 it's definitely C++ on Sea. Thank you. I think it's the mark of a good conference, and hopefully you do think C++ on Sea is a good conference, that you go for the technical talks but you come back for the community. And I think that's definitely what I hear a lot from people. Exactly, yeah. So that's worked out for you too. All right, well, we'll get more into what you have been working on and talking about at C++ on Sea and other places in a few minutes.
Starting point is 00:08:47 But before we do that, I've got a couple of news articles to talk about. So feel free to comment on any of these. So the first one is cppfront, or CPP2, the sort of new language from Herb Sutter. He was on a year or so ago talking about that. So if you haven't heard of it, he's calling it syntax 2 for C++, so it's really C++ under the hood. In fact, cppfront is a play on Cfront, the original C++ compiler, which was actually a transpiler down to C; so cppfront is the transpiler down to C++. And he's using it as a bit of a,
Starting point is 00:09:26 partly as a bit of a playground for some of his ideas for proposals for C++. Some of them have actually already gone in. Some will probably never go in. Some are already still in the running. But I think, you know, you can take it seriously as a language in its own right as well. We'll see where that goes. It's good to see that he is continually developing it despite a busy first half of the year he's got
Starting point is 00:09:49 actually some quite big features in. Um, just a couple of highlights from here: so there was what he calls the tersest function syntax. So you may have heard of the proposed terse lambda syntax for C++, which doesn't seem to have gone anywhere yet. But this one is even terser, so, the tersest function syntax. It's just colon, then arguments in parentheses, and then the actual function body with no curly brackets.
Starting point is 00:10:17 So it seems to be a little bit too terse for some people, if you read the Reddit comments, but interesting that he's trying to experiment on that front. There's quite a bit of C++23 and C++26 catch-up, because he does want to keep the language current, the language that he's generating, but also the features in CPP2 itself. One of the big ones is UFCS, unified function call syntax, which it's had in from the start, I believe, so you can call functions as if they're methods, or methods as if they're functions, interchangeably. But because it's so seamless, there's not really any way to sort of tell what the difference is, you know, if there's any performance difference or otherwise between that and a more direct syntax. So he's
Starting point is 00:11:03 actually introduced this sort of opt-out syntax with two dots. So instead of just doing dot and then a member name, you do two dots and a member name, and it always has to be a member, just so you can compare. So I thought that was interesting. What makes it particularly interesting is he's also introduced another piece of syntax, which is three dots. And that's for, well, what is originally for the half open range
Starting point is 00:11:25 So, similar to... actually, many other languages have a syntax like this, where you have, you know, a start and an end, and use something in between to indicate, in particular, whether the last element is part of the range or not. So the half-open range starts at the first element and finishes one before the end, so just like iterators, I mean, normal C++. And then the closed range is a dot dot equals, which means it includes the last element of the range. So if you're counting from one to 10, for example, it's natural to include the number 10 in that. It's useful to have both. What's interesting there as well, apart from just the potential
Starting point is 00:12:07 confusion with the, um, unified function call syntax, is that there's a whole Reddit thread about this, and they actually convinced him to change this: rather than having dot dot dot be the half-open range, like a default range, make it more explicit, with, um, dot dot less than for the half-open range. So the idea is that you finish before, you know, less than, the last element; that's the idea of the syntax. So you've got dot dot equals, dot dot less than, and no dot dot dot, and then that clears up all of the ambiguity and everybody's happy. So it was quite interesting to see that actually unrolled during the Reddit thread. Yeah, the Reddit thread was actually much more impressive than the original post, I have to say, because I'm not
Starting point is 00:12:54 following the cppfront updates, like, on a regular basis, mostly at conferences when I listen to Herb. But here, the Reddit thread has these fantastic opinions from many people. And moreover, they have this table which they built from different languages, showing how in different languages the half-open and the closed range are implemented. And you can compare Swift and, say, other languages, Kotlin. It was interesting to see, because sometimes I'm like, oh yeah, they're just all the same, they're using very similar syntax, but then these very, very small, tiny details differ, and you look at them and you're like, oh wow, really? And it was quite impressive that they actually convinced him to make a change. So I was, yeah, really impressed with how the Reddit thread actually
Starting point is 00:13:41 worked. And, you know, I'm just talking, yeah. I think it was quite persuasive seeing it all laid out there. You can see, as you say, there's a lot of similarity, but in some cases different languages chose the same operator to mean different things. So if you move between those languages, that's really hard. But if you've got really unambiguous operator names, then, um, I think somebody said that now CPP2 has the least ambiguous of all of them. That's the whole idea of CPP2, to be a playground for these sorts of things. So I think that's a big plus. Okay, so the next news item is AutoConfig: C++ Code Analysis Redefined.
Starting point is 00:14:24 So you may remember that last year we had Abbas Sabra on. He was my colleague at the time at Sonar, before I left Sonar, and he was talking about automatic analysis, which is a type of auto-config, or zero-config, way to run the Sonar analyzers, but only on SonarCloud at the time, which means hosted on Sonar's servers. It only really works with, you know, cloud-based build systems in general; there are a few limitations, so not everybody could use it. Now that's come to SonarQube, which you can run on premises on your own hardware. So it's actually quite a big step up to make that run in that way. I don't know, it sounds like a little thing, oh, now it's come to this new way of doing it, but behind the
Starting point is 00:15:11 scenes there's a lot of work that's gone on to make that happen. So I'll put the link to the article in there if that's something that interests you. And the third article: so we talked before about an article from Ben Summerton, where he measured the impact of the final keyword, because it's often mentioned that it improves performance, but nobody seemed to be showing any benchmarks, and it turned out the truth is actually a little bit more subtle than that. Now he's done another one, this time for the noexcept keyword. So the full title is "noexcept Can (Sometimes) Help (or Hurt) Performance". So you can probably guess that, again, the reality is a little bit different to what we might have expected. In fact, at some point, he asked a question:
Starting point is 00:15:59 So did it have an impact? The short answer is yes, but also no. It's complicated and silly. So if that gives you a taster of what the article might be about: it's packed full of charts and numbers and other little stories around how he had to run things an extra time to actually get useful results out, and that sort of thing. Quite a long article, but if that's something that interests you, very well worth having in our canon. So when we do talk about using noexcept for performance reasons, although he does go into some of the reasoning behind why we say that,
Starting point is 00:16:32 here's some actual hard numbers that show that perhaps more often than not, it actually decreases performance. But I'll let you decide. Yeah, there were quite plenty of numbers, I would say, that I read through them. And my biggest question was indeed, like, why this is happening? Like, why some specific combination actually wins on top of others? And there are not that many, like, explanations or discussions about that.
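One commonly cited mechanism behind the "noexcept can help performance" claim, which (per the discussion above) the article also goes into, is the std::vector reallocation case. This is only a hedged sketch of that textbook reasoning, not code from Ben's benchmarks, and the class name is made up:

```cpp
#include <string>
#include <utility>
#include <vector>

// std::vector only moves elements during reallocation if the move constructor
// cannot throw; otherwise it falls back to copying to keep the strong
// exception guarantee.
struct Widget {
    std::string data;
    explicit Widget(std::string d) : data(std::move(d)) {}
    Widget(const Widget&) = default;
    // Try removing `noexcept` here and benchmark: growing a large
    // std::vector<Widget> will then copy every element instead of moving it.
    Widget(Widget&& other) noexcept : data(std::move(other.data)) {}
};

int main() {
    std::vector<Widget> v;
    for (int i = 0; i < 1'000'000; ++i) {
        v.emplace_back(std::string(100, 'x'));  // reallocations move, not copy
    }
}
```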
Starting point is 00:16:59 But maybe someone else could write a follow-up in the blog post actually explaining why this happens. There are some discrepancies in terms of when the Microsoft compiler is used and some specific combination with GCC. It's just interesting to understand what's happening under the hood. Yeah. It's a long article, but I really loved his approach.
Starting point is 00:17:20 Once again, he went for the benchmark. Whenever we talk about optimizations, we should always measure. And, you know, it reminded me, because when I tried to see how noexcept influences the binary size, normally everyone will say, yeah, it's good for the binary size, it will decrease it, and in most cases it's true. But I found some cases where the binary size actually went up, and it turned out that it's a compiler bug. Ah yeah, that's always a possibility, which may even go some way to explaining some of Ben's results.
Starting point is 00:18:17 Yeah. And in particular, measure in the environment that you are actually developing for. So don't necessarily go to all the work that Ben did to try to compare it across many different environments and toolchains. But of course, that will also change over time as well. So you got a mention of binary sizes in there, which is a great segue into our interview, because you did a couple of talks (actually we sort of did them back to back, so we made it like a mini workshop, but, uh, two 90-minute sessions, or was it 90 and 60, I forget, but two back-to-back sessions at C++ on Sea) just on the subject of reducing binary sizes, which is not something we often talk about; we usually just talk about performance and things like that. But why are binary sizes so important that you had to do two talks on it?
Starting point is 00:19:11 Yeah, it's actually, yeah, it was 150 minutes. It was long and tiring. But the good news for me was that the vast majority of people came back for the second part. So that was a good message that people actually found the topic important and useful. Well, I have to be honest, probably it's not important for everyone. In my previous job, binary size meant something completely different from what it means now. There, our biggest struggle, let's say, was that we had about two dozen different backend services on the server. And these services, the executables, shared the same shared library. Well, actually quite a few shared libraries.
Starting point is 00:20:09 And sometimes we ran into storage issues. There, the solution was simple enough: we just had to limit the number of versions of these shared libraries that we kept. But now, well, binary size means something completely different: we actually want to limit the size of our C++ core library
Starting point is 00:20:34 that we share among the different environments, the different languages, like Android, iOS. Right. So actually now, binary size is important for us for three different reasons.
Starting point is 00:20:56 Right. For one, you know, there are markets where the bandwidth or even the data plans are limited. Two, it's a matter of your device, it's a matter of your mobile contract. And third, a smaller binary size can
Starting point is 00:21:18 even reduce the CO2 emissions. That's what our data scientists found, but I have no details on that. At least I heard some data, but I don't remember. But I was really surprised. But from time to time, we mentioned certain studies about different programming languages where you find that C++ and C and Rust, they are much more efficient when it comes to energy consumption than other languages like Python, for example. So I think it's completely possible. But there were many others at these talks
Starting point is 00:22:09 for whom it means something different. The binary size for them is actually really a storage limitation. Because on your mobile, most probably it's not really important in most cases whether it's 100 or 105 megabytes. For some, yes, but in most cases, no. But in the embedded world, well, you will have some real limitations. And there, we are not reaching the size of 100 megabytes.
Starting point is 00:22:40 Sometimes not even a megabyte, right? Yeah, I was developing for iPhone back in the early days of the iPhone App Store, and I don't remember exactly when it changed, but back then, at least, if your binary size was greater than 50 megabytes it wouldn't download over mobile data; you had to wait until you got on a Wi-Fi connection. So that was definitely, I mean, that's quite a small size for a binary these days, particularly something like a game with images and audio files and that sort of thing. So I think there is possibly still a limit,
Starting point is 00:23:15 but it's just much bigger now, so we don't really think about it. But certainly that used to be the case. Yeah, I remember that, you know, I had a data plan of 2 gigabytes maybe, and I paid more than I pay now for 80 gigabytes. So here it's less of an issue, but there are certainly markets where it's a problem. Yeah, I guess the embedded market is the one with these limitations. They are the people who do care. They don't have the capabilities of an iPhone, as we know.
Starting point is 00:23:53 What's more, you even have a book on Leanpub on that, right? So I guess the topic is actually quite popular, that you've written a whole book on that. Usually what I do is that if I write a lot of blog posts about a certain topic, then I try to collect them, I go deeper, and I publish those books on Leanpub. That probably means that you have quite many solutions to the problem, right? Quite many techniques.
Starting point is 00:24:26 Yes, yes, actually. Well, there are endless techniques to limit the binary size. Oh, wow. They're quite different. You know, there are many different approaches. And I would say what you want to do with your code, it really depends on how desperate you are in limiting your binary size. Because there are certain techniques that are as simple as just activating a new compiler or linker flag. And funny enough, when I started my quest to limit the binary size, I completely neglected those.
Starting point is 00:25:14 But we'll probably talk more about it later. And then there are certain techniques that are best practices anyway, and they will make your code cleaner. That's another topic I care about. So, for example, you shouldn't have virtual destructors
Starting point is 00:25:36 if you're not going to use them, if you're not going to inherit from that class, and it will also help your binary size. You should follow the rule of five, which will, I don't say it will help your binary size, but it might help your binary size.
Starting point is 00:25:54 And it's a best practice anyway. And, you know, to make constexpr as many things as you can, that's also quite in trend, and we consider it useful. It will help your binary size as well, in most cases.
Starting point is 00:26:15 And using templates minimally, well, you can use code extraction, which, if used wisely, will in most cases make your code better, plus it will decrease your binary size. So there are these techniques that are best practices anyway.
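A minimal sketch of the "clean code that also happens to be small" practices Sándor lists above: no virtual destructor on a class that isn't a polymorphic base, the rule of five via = default, and constexpr where possible. The class names are made up for illustration.

```cpp
// Not designed to be inherited from: no virtual destructor, so no vtable and
// the destructor stays trivial.
class Point final {
public:
    constexpr Point(int x, int y) noexcept : x_{x}, y_{y} {}
    constexpr int x() const noexcept { return x_; }
    constexpr int y() const noexcept { return y_; }
private:
    int x_{0};
    int y_{0};
};

// A real polymorphic base does get a virtual destructor; the remaining special
// member functions follow the rule of five with = default rather than
// hand-written bodies.
class Shape {
public:
    virtual ~Shape() = default;
    Shape(const Shape&) = default;
    Shape& operator=(const Shape&) = default;
    Shape(Shape&&) = default;
    Shape& operator=(Shape&&) = default;
protected:
    Shape() = default;
};
```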
Starting point is 00:26:31 But when it comes to any kind of optimization, then you will run into techniques which are clearly compromises. Like, there's an interesting one: defaulting
Starting point is 00:26:51 your special member functions in the implementation file, so in the CPP file. And I remember this question came up in my head a few years ago. I didn't care about binary size at that time. I was like, okay, in most cases, we want to provide the implementation for a certain function in the CPP file.
Starting point is 00:27:19 So when we default special member functions, where should we do that? Does it make sense to do it in the header file? Or just like for any other function, should we do it in the implementation file? So this question came up to me a few years ago. And I posted a question on X, and I mentioned a few C++ trainers. And I remember that quite a long discussion came out of it.
Starting point is 00:27:56 And the conclusion was like, why on earth would you do anything like that? Because if you want to default your special member function, well, basically what you are saying is that there's nothing special going on here. For some reason, I have to provide these special member functions
Starting point is 00:28:18 by myself. But there's nothing special. But if you do it in the implementation file, well, you lie. Because if it's not in the header, it implies that, well, actually there is something special ongoing. But then you just go to the implementation file. Well, actually the compiler will go to the implementation file and just realize that, okay, there's nothing special going on there. And with that, you lose the, well, the compiler loses the ability to
Starting point is 00:28:49 perform certain kind of optimizations. But at the same time, it limits inlining. And if it's a widely used class, it might help you on the front of the binary size.
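A sketch of the trade-off being described, using a hypothetical Widget class: the special member functions are only declared in the header and defaulted in the implementation file, so their code is emitted once rather than inlined everywhere, at the cost of readability and of what the compiler can see at call sites.

```cpp
// --- widget.h ---------------------------------------------------------
#include <memory>

class Widget {
public:
    Widget();
    ~Widget();                        // only declared; readers can't tell it's trivial
    Widget(Widget&&) noexcept;
    Widget& operator=(Widget&&) noexcept;
private:
    std::unique_ptr<int> impl_;
};

// --- widget.cpp -------------------------------------------------------
// Defaulted out of line: one emitted copy instead of being inlined into every
// translation unit that uses Widget, which can help in a large binary.
Widget::Widget() = default;
Widget::~Widget() = default;
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;
```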
Starting point is 00:29:09 So it's clearly a compromise to do that, not only because of possible runtime performance drawbacks, but because of readability. The first moment, you see that you think, what on earth is going on here? Why someone did that? So if you decide to do that, I highly
Starting point is 00:29:39 recommend that you document it at a central place. Otherwise, someone will come and start removing them until someone else reminds the person not to do that. So I was there. I did that. Well, I didn't lose a lot of time, but still. Now it's well documented.
Starting point is 00:30:05 Yeah, that is a good use of comments: to explain why you did it this way instead of some different way, so that somebody doesn't go and change it thinking they know better. Yeah. So that's definitely a compromise to me. And there are other techniques, like using extern templates, or thinking about the initial values of
Starting point is 00:30:25 class members, which might be good for your binary size, and you might want to do that. You might want to use these techniques, but sometimes they will decrease the maintainability of your code or the expressiveness.
Starting point is 00:30:46 So if, okay, talking about the initial values of class members, if you use basically zero default values everywhere for members, then the compiler can just zero fill all the bytes necessary. It doesn't have to fill the bytes, well, it doesn't have to fill the memory with certain specific values, and it might help you. But it might also limit the expressiveness of your code if you can't use whatever initial value that you want. So probably I would say that if you have certain classes that are instantiated, I don't want to say any numbers you instantiate
Starting point is 00:31:48 them a lot of times, then think about these. But if that's not the case, probably that's not so important for you. And, you know, there are also some techniques that are almost baked into a project
Starting point is 00:32:03 from the very beginning. Do you rely on runtime type information or exceptions? You can turn these off and you can gain quite some space. Not necessarily,
Starting point is 00:32:20 but often. But if you don't think about it from the very beginning, it might become very difficult. So, for example, if you overuse dynamic casts, then removing all those later, well, it can be kind of a challenge. I would say that in most cases it's probably a good thing to do, but I know this could be a hot topic; not everyone would agree on that. But I do think that without dynamic_cast you will end up with cleaner code. I can agree with that. And also a smaller binary. So I think that's useful.
Starting point is 00:33:06 And with the compiler flag -fno-rtti, you can just disable using certain keywords like dynamic_cast. So I think that's a good thing. Well, exceptions, that's another topic. That's a big topic. Yes, that's a big topic. Maybe it's like back to the noexcept discussion. Yeah.
Starting point is 00:33:36 Looping. But, you know, we looked into turning exceptions off maybe a year ago, and it's just not possible for us. Right. Well, it would cost too much. But probably we could gain some megabytes there, but it's way too much work.
Starting point is 00:33:59 I imagine a lot of embedded projects have exceptions switched off anyway, so maybe they're already getting that benefit. Yeah, maybe that's also why they turned exceptions off. It can be a factor, but I think usually it's other reasons, particularly how they're going to handle exceptions. It's quite a discouraged thing, I guess, exceptions in embedded, by everyone. It's true.
Starting point is 00:34:26 Some of the other techniques you mentioned there, particularly about sort of inlining defaulted constructors or special member functions, or initializing member variables, sound like the sort of things that, at least when I first think of it, I think surely it's not going to make a huge difference, is it? But have you actually seen big reductions in binary size by using those techniques specifically? Well, they do add up.
Starting point is 00:34:56 But that's a very good question. And that's a discussion I had maybe a year ago with some of my colleagues, and I clearly didn't communicate clearly enough in the beginning. Well, if you think about all these techniques, they are fun. We are developers, and we try to code different ways and see the effects. And we said, wow, OK. I removed all the unused virtual destructors. And we gained, I don't know, 8 kilobytes.
Starting point is 00:35:38 How cool is that? And those kinds of things. And usually when you do these small things, you are on the scale of kilobytes, depending of course on the size of the project. But when you tune your compiler
Starting point is 00:35:57 and linker settings, you play on the scale of megabytes. So that's a bit different. So yeah, what should I say? I think both are important. So writing code that is good, that is not bad for your binary,
Starting point is 00:36:24 is a good thing. It's good if you know how to do that. I don't like it if your code only satisfies certain criteria because the compiler can do something for you. But you cannot ignore the compiler and linker settings. Those are the first things you should look at. Because, well, if you are not so desperate, as I mentioned earlier, then probably you should just start with these settings and think about using -Os,
Starting point is 00:37:09 where O is for optimization and s is for size. And you should just try it first. And you don't give up runtime performance, because if I'm not mistaken, -Os is based on -O2, so it's already quite optimized, but it doesn't include certain techniques that would make your binaries bigger, such as loop unrolling. It won't replace your for loop with I don't know how many identical instructions, because it's
Starting point is 00:37:46 not good for your binary. So that's obviously the first thing you should try. So you mentioned earlier about some techniques to reduce inlining explicitly, but maybe if it's optimizing for size, it will do less inlining
Starting point is 00:38:01 to start with and you may not need to do that. I think so, yeah. That's really the thing. And, well, I found compiler setting dedicated for some inlining threshold. I found it a bit exotic. I don't remember exactly its name. It works in LLVM. It was LLVM inline threshold,
Starting point is 00:38:31 something like that. And you can set numbers somewhere between zero and a thousand. And it's a big range. And I didn't find it very well documented. Sometimes if you decrease it too much or increase it too much, you get a result completely different from what you would expect. But it's something you can experiment with
Starting point is 00:39:02 because with certain settings, you might gain quite a lot. But you have to measure. There, you really have to measure. Yeah, I wonder if it may inline it past the point that the inlined code would actually be smaller than generating the function call. I think that, yeah, that could actually happen.
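To make the flag side of this concrete, here is a hedged sketch of the kind of invocation being discussed. -Os, -fno-rtti and -fno-exceptions are standard GCC/Clang options; the inline-threshold knob is LLVM-specific and its exact spelling below is an assumption based on Clang's -mllvm pass-through, so check your own toolchain's documentation and, as Sándor says, measure.

```cpp
// Build sketch (in comments; the C++ below is just a stand-in translation unit):
//
//   clang++ -Os -fno-rtti -fno-exceptions -c engine.cpp -o engine.o
//   clang++ -Os -mllvm -inline-threshold=50 -c engine.cpp   // experiment, then measure
//
// -Os : optimise for size; built on -O2 but skips size-increasing transforms
//       such as aggressive loop unrolling.
#include <cstdio>

int main() { std::puts("compare the resulting binary sizes in CI"); }
```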
Starting point is 00:39:28 A technique that I... Well, it's not a technique for reducing the binary size or inlining, but a useful technique if you want to benefit from this, measure the binary size in your CI pipeline and post it back to the main page of your pull request. Yeah. I'm talking about the CI pipeline actually, wondering so how the process looks like for you. I mean, you know quite a lot of techniques,
Starting point is 00:40:04 You shared some of them; they're all quite interesting. And probably when you look at the code you're not just applying random techniques, but you see some criteria and you're like, okay, this technique will probably work better here. So is it some kind of, I don't know, abstract tooling or something, where you apply techniques and measure and see what fits best? I can't stop thinking about that as a possibility for tooling here, because every time I hear about this kind of framework of techniques, things you can apply step by step, it brings this, you know, lightning to my head that it could be nice tooling, or at least a nice approach, a nice process,
Starting point is 00:40:42 so i'm just wondering how you specifically do that. So do you have some process? Well, I would say, yeah, no, not really. We don't really have a process for trying these things, but we do measure the effects of a pull request on your binaries. I say, again, at the end of the build, you see exactly how many bytes you shaved off or actually you gained by applying a change. And if you're really curious, it goes down even to a section level.
Starting point is 00:41:21 But usually, that's just way too much information. Okay. And, you know, we try to do our best to reduce the size when you think about reducing the size, but applications also, you know, gain new and new features. If you don't do that, you die. Some die. So that will obviously increase the binary size. So sometimes we have these discussions that, okay, for a certain feature, we have three different ways to implement it.
Starting point is 00:42:28 And on a small scale, we tried all three, and we saw how much binary size each of them would have added. And we chose the one that added the least amount of binary size. Yeah, sounds like a nice requirement for the code: you have to implement this functionality, and also you have no more than this many megabytes of binary size. That's good. Yeah, well, we don't have that exactly. We don't have it
Starting point is 00:42:28 as a non-functional requirement. Actually, we could, but we developers are kind of in this mindset and even without the named requirement,
Starting point is 00:42:44 we think about it. Yeah, I guess in some systems, the requirement is just natural. But it just simply doesn't fit. So there is a natural requirement for that. Especially if, you know, at each pull request, you see these numbers. And you see a smiley next to the number that grows a lot. Are you often rejecting people's requests because of that? I'd say no.
Starting point is 00:43:47 No, but I do comment on it sometimes. Okay. But I also trust people that they have already tried different things to limit the effects. And in some cases, there are just no other options. Okay. So you mentioned the linker a few times. Now, I know you've been talking about reducing the binary size, so presumably the final executable. But are you concerned with, like, object and library sizes that the linker has to work with as well?
Starting point is 00:44:14 Or is that a completely different topic? And for context, the reason I'm asking this is because I know some of the things you mentioned, like, um, you know, heavy use of templates, can often lead to very big object file sizes. And then the linker has a lot of work to do to de-duplicate them. And it won't necessarily impact the final executable size so much, but it can have its own problems with these large object files. Okay. Now I understand your question.
Starting point is 00:44:14 Thanks for detailing it. No, not really. We mostly care about the final size. Okay. Interesting. Because you did mention templates there, and maybe some of the same techniques you're talking about can help that as well. Now you mentioned extern template, that was something that we were trying to use a lot on a project I work on to reduce
Starting point is 00:45:15 object file size, give less work to the linker. We never quite got it to work as well as we had hoped, but... You know, all these different techniques, I haven't tried everything yet at work. So some of these things I tried at work, or we experimented with those things at work, but not with everything. And actually, yeah, extern templates is something that I would like to try one day. So, you know, we have this notion, this idea of tech weeks, when, you know, you just work on anything
Starting point is 00:46:05 that you think might be useful for your team, for the company, for the users at the end. And maybe next time I will try to use extern templates, right, at work, not on some personal projects and research. Right. So it's an evolving situation; we need to follow your blog to keep up to date. Yeah. And, you know, there's also this thing, I don't know exactly what I am supposed to share. So usually I say, okay, this might work in a big code base and you might shave off quite some kilobytes, even megabytes of binary size,
Starting point is 00:46:05 but usually I don't share anything exact. Yeah. Well, like any of these sort of optimizations, sometimes a thing you think may have a big effect doesn't, and then the thing you think may have no effect at all has the biggest effect. It can be very variable. Yeah.
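For readers who haven't used the extern template feature mentioned a few times above, a minimal sketch with made-up names: the extern declaration tells every including translation unit not to instantiate the template, because one designated .cpp file provides the single explicit instantiation.

```cpp
// --- matrix.h ---------------------------------------------------------
template <typename T>
class Matrix {
public:
    T trace() const { return T{}; }   // imagine something big here
};

// Suppress implicit instantiation of Matrix<double> in every includer...
extern template class Matrix<double>;

// --- matrix.cpp -------------------------------------------------------
// ...because this one translation unit explicitly instantiates it.
template class Matrix<double>;
```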
Starting point is 00:46:22 And at the end of the day, I think when you read the blog you're looking for ideas and the fact that uh you know we managed to get rid of one megabyte it doesn't really matter to you you're just looking for ideas what could I try in my project in different circumstances? And, you know, you give it a go and it might help you. So if you're looking for ideas, we would recommend to check my blog, search for binary sizes and try the different techniques.
Starting point is 00:47:39 And if they work, leave a comment. If they didn't work, also leave a comment. It's always interesting. Yeah. Just leave a comment. Just leave a comment about what happened in your code base. It's interesting. Like, for example, we apply noexcept quite a lot, because we know that it will be good for the binary size as well. Plus, it's a good communication technique as well towards other developers.
Starting point is 00:48:00 And still, there might be some situations where it doesn't work and where it would actually increase the binary size. So yeah, it's not a silver bullet either. Yeah, I think we need to get Ben to do some more benchmarks, for binary sizes instead.
Starting point is 00:48:33 So we'll put a link to your blog in the show notes, as well as to your two C++ on Sea talks when they come out. Thank you. As we speak, I don't think they've been released just yet, but I know they're going up (like, all the videos for C++ on Sea are starting to go up now), so that shouldn't be too long. So I'll try to remember to come back and add them in so you can see that. But you mentioned that you've done talks at C++ on Sea for the last three or four years. And I remember last year you were doing a talk on clean code, which is something we don't hear a lot about these days.
Starting point is 00:49:42 So interestingly, you brought that up. So do you want to tell us, first of all, what clean code is and why we should care about it? Yeah. Yeah, so first of all, I find it important to make sure what we talk about, because if it's written capitalized, Clean Code, that's the book that was talked about at the conference, and it's not what I generally think about when I mention clean code. So it's hard to define, right, what clean code is. I remember your talk about software quality. Oh yeah. And what was the book? Zen and the Art of Motorcycle Maintenance. And then the talk was Zen and the Art of Code Lifecycle Maintenance. I love your names, Phil. You know, just to define quality, well, the guy wrote at least two books. Yeah, Robert Pirsig.
Starting point is 00:49:58 Yeah, he wrote two books, at least two books, just trying to define what quality is. So I didn't spend so much time trying to find the best definition for clean code. But the one I like is: code that is easy to understand
Starting point is 00:49:58 and easy to change. And in other words, to me, it means that clean code is an optimization for maintainability. Right, yeah. And I think that in most cases, that's the most important aspect.
Starting point is 00:51:09 I know we enjoy optimizing for binary size or for runtime performance. You go to a conference, you go to the lightning talks, and usually there is a guy there who will speak for five minutes about how they decreased the time it takes to print something by 100 times. It's magical. In most cases you don't need it, you don't care. Yeah. So, as I mentioned earlier, I was working at Sonar until earlier this year, and they have their own definition of clean code as well, which is a little bit more expansive, and useful in its own right. But as you say, everyone has their own definition. And it's useful to define what you actually mean when you say that.
Starting point is 00:51:28 And it usually says something about your own values as a developer, I think. To see what they mean by it. What was your definition at Sonar? I think we need another whole episode for that. We'll get through it. Yeah, it's such a long definition. That's why I like this, this short one.
Starting point is 00:51:40 No, we can extend on it, but... Let's start a new podcast on clean code. But believe me, we tried. Okay. So yeah,
Starting point is 00:52:02 in any case, I think that, you know, unless other aspects matter, like you really have to increase runtime performance for some reason, nothing is as important as maintainability. Because if it takes so much time to add a new feature, if it takes so much time to understand and track down a bug,
Starting point is 00:52:45 and ship a new version, then you will be out of competition pretty soon. And writing cleaner code, more maintainable code, I think, is a good way to make sure that you're... well, it doesn't make sure that your business will be in competition for a long time, but at least as an engineer you do your best to ensure that. And I also remembered that... I'm pretty sure it was you, Phil, who mentioned the alignment trap. Oh yes, which I stole from somebody else, so I can't claim credit for it.
Starting point is 00:53:48 Allan Kelly, I think. Allan Kelly, yeah. And he tried to understand what's more important for business: doing the things right, or doing the right things. And it turned out that in a mid- or long-term perspective, it's more important to do the things right than to do the right things. And the reason behind it is that it's very difficult to change your behavior as a human being, not just as an engineer, but as a human being.
Starting point is 00:53:48 So in that sense as well, clean code is very important for programmers to follow, I think. I'm wondering how the clean code actually works with the modern C++, because you probably know there is this opinion that the modern C++ is too complex and maintaining the code base, it's really a challenge. And there are quite many developers, there are different opinions in C++ on that, for sure. But I mean, does the more complex language
Starting point is 00:54:20 actually make it harder to maintain a clean code? Or on the contrary so what do you think in terms of the c++ current evolution does it help this is a very interesting question i think that uh i think that it helps well i could also say it depends it depends but i think in general it helps because there are many modern C++ features that can make your code more understandable, can make your code more expressive and more bug-free, I would say.
Starting point is 00:55:25 actually make it harder to maintain clean code? Or, on the contrary... So what do you think in terms of the current C++ evolution, does it help? This is a very interesting question. I think that... I think that it helps. Well, I could also say it depends. It depends. But I think in general it helps, because there are many modern C++ features that can make your code more understandable, can make your code more expressive and more bug-free, I would say.
Starting point is 00:55:25 Again, we go back to the question, what is clean code? What is clean code in C++? And that probably changes with almost every version. So in a way, it makes things more difficult because you have to maintain your code base. But in other ways, it makes things easier, better, because you have the option to make your code
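A short illustration of the override point just made (the classes are hypothetical): the keyword costs a little typing, but it turns a silent mismatch into a compile error.

```cpp
#include <string>

struct Logger {
    virtual void log(const std::string& message) = 0;
    virtual ~Logger() = default;
};

struct FileLogger : Logger {
    // If the base signature ever changes, `override` makes the compiler
    // report the mismatch instead of silently leaving an unrelated function.
    void log(const std::string& message) override {}
};
```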
Starting point is 00:55:49 better. So, for example, recently I could eliminate all this to enable if from our code base and replace them with concepts. Yeah, in a certain way it added them with concepts. And yeah, in a certain way, it added complexity, having concepts.
Starting point is 00:56:52 better. So, for example, recently I could eliminate all the std::enable_ifs from our code base and replace them with concepts. And yeah, in a certain way, it added complexity, having concepts.
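A hedged before-and-after of the std::enable_if-to-concepts cleanup Sándor describes; the function is invented purely for illustration.

```cpp
#include <concepts>
#include <type_traits>

// Before: the constraint hides inside SFINAE machinery.
template <typename T,
          typename = std::enable_if_t<std::is_integral_v<T>>>
T twice_legacy(T value) { return value * 2; }

// After: the constraint is part of the signature and reads as documentation.
template <std::integral T>
T twice(T value) { return value * 2; }
```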
Starting point is 00:56:52 because if different people just prefer different styles, it's absolutely non-maintainable. I mean, it's really hard. It's a hard task. So that's what I was probably meaning when I was asking about, like the modern language helps helps because there are too many opportunities sometimes. Yeah, but I think even from
Starting point is 00:57:26 at this question, I'd say. Yeah, I mean, I agree, whatever clears the old macros from my code would definitely help, for sure. Yeah. But, like, when I was asking, I was thinking mostly about the way that modern C++ sometimes allows us to do many things in many different ways. And so you can do different styles now in C++. And that definitely doesn't help in terms of maintainability unless you maintain a common style for the whole team,
Starting point is 00:57:41 because if different people just prefer different styles, it's absolutely non-maintainable. I mean, it's really hard. It's a hard task. So that's what I was probably meaning when I was asking about whether the modern language helps, because there are too many opportunities sometimes. Yeah, but I think even from
Starting point is 00:57:57 the very beginning, C++ offers solutions in different paradigms. You can write your code following different paradigms from the very beginning. So you already had this problem, probably on a different scale,
Starting point is 00:57:57 particular time, it's generally going to be cleaner, I think, but the language itself is more complex. You also cannot just go in and change everything. No. After every compiler
Starting point is 00:58:12 upgrade. Unless you're Herb Sutter and you're developing CPP2. You're doing your own compiler. Tying it back to another user. How about that? For you, I know that I just said I removed all the enable ifs in the whole code base. it back to another user item. How about that? I know that I just said I
Starting point is 00:58:25 removed all the enable ifs in the whole code base. But there were not a lot. So then you can do that. Otherwise I think you should follow more like more granular approach. You
Starting point is 00:58:43 watch a certain piece of code and you just clean up there and, and, but you don't change everything at the same time. Unless again, you have a handful of cases or a few dozen cases and then it's easy to change all at once. Yeah. I had to go for a similar thing with,
Starting point is 00:59:52 um, I started working on Catch 23 last year, which I haven't had a chance to get back to, but what I did do was something where lots of different tricks we were using, mostly SFINAE but some other things as well, all got replaced with concepts, and it was a fraction of the size, so much more understandable, so much more reliable. So concepts are definitely a really big win when it comes to simplifying code, but it's not a simple feature in its own right; it's just simpler than what we were doing before. So I think we are running out of time, so we probably do need to wrap up there, come to a clean end for the episode, I think. So before we do let you go, Sándor, is there anything else you want to
Starting point is 01:00:29 tell us about, or let us know where people can reach you if they want to follow up? Yeah, so if you're interested in binary sizes, for example, or in new C++ features, then please check out my blog. It's sandordargo.com. And you're more than welcome to leave any comments. So one thing that I want to emphasize about my blog is that it's not
Starting point is 01:00:29 necessarily about the newest, coolest things. When I started to write, I think about seven years ago, I said that, okay, that's a tool to document my learning process.
Starting point is 01:00:46 So if I learn about something new, by new I mean new to me, I will write about it because it will help me to understand it better if I have to write about it. And if others read it, it will also enforce me to be more punctual about what I learn and what I write. So it started out as a learning tool for me. And I've been posting on a weekly basis for the last seven years or so yeah i think they're often the most valuable types of um of articles because you're you're at that point where you're you
Starting point is 01:01:34 still don't quite have the curse of knowledge, which makes it very difficult to explain things to other people. But if you've just been through that transition yourself and you're still fresh, then you've got the right perspective to be able to say, this is what made it click for me; maybe it'll make it click for you. You know, that's an excellent point, Phil, because, well, I usually don't care about these page view numbers, but I do check them from time to time, like, you know, once in two, three months. And interestingly, maybe like 50% of all views comes from one single article.
Starting point is 01:02:11 It's about how you use parameterized tests with Google Tests. And I wrote that article because we tried to use it one day in a coding dojo at one of my previous teams. And we spent at least an hour trying to understand it from documentation and make it work. So did I write it down? No.
Starting point is 01:02:38 A few months later, we did the same thing again. And we didn't remember. And that time, I wrote it down. And that's the most viewed article on my blog, interestingly. And it was really helping myself and my team. Excellent. So I'll put a link to that in the show notes, just to give you a few more page views so that you can read them.
Starting point is 01:03:08 Interestingly, since then, the documentation, the official documentation, also got much better. Right, right. Well, they probably read your article and incorporated it. Maybe.
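For anyone looking up the topic of that article, here is a minimal value-parameterized GoogleTest sketch (the test and suite names are invented, not taken from the post):

```cpp
#include <gtest/gtest.h>

class IsEvenTest : public ::testing::TestWithParam<int> {};

TEST_P(IsEvenTest, AcceptsEvenNumbers) {
    EXPECT_EQ(GetParam() % 2, 0);
}

// Runs AcceptsEvenNumbers once for each value below.
INSTANTIATE_TEST_SUITE_P(SmallEvens, IsEvenTest, ::testing::Values(0, 2, 4, 8));
```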
Starting point is 01:03:20 Anyway, that's a nice wrap-up for the show. So thank you, Sándor, for coming on and telling us all about how to reduce binary sizes, how to make your code clean, and maybe even how to do parameterized tests in Google Test. And thank you, Anastasia, for coming on and being an excellent co-host again. Yeah, thank you for inviting me. Thank you for talking to me. It was great. Thanks a lot. Thanks a lot for having me here.
Starting point is 01:03:48 It's been a great experience. Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested in
Starting point is 01:03:59 or if you have a suggestion for a guest or a topic, we'd love to hear about that too. You can email all your thoughts to feedback at cppcast.com. We'd also appreciate it if you can follow CppCast on Twitter or Mastodon. You can also follow me and Phil individually on Twitter or Mastodon. All those links, as well as the show notes, can be found on the podcast website at cppcast.com. The theme music for this episode was provided by podcastthemes.com.
