CppCast - Reducing Binary Sizes
Episode Date: August 9, 2024

Sándor Dargó joins Phil and Anastasia Kazakova. Sándor talks to us about why and how to reduce the final binary sizes your code produces, as well as the importance of clean code.

News
- "cppfront: Midsummer update" and the Reddit thread
- cpp2 episode from last year
- AutoConfig: C++ Code Analysis Redefined (Sonar)
- "noexcept Can (Sometimes) Help (or Hurt) Performance" - Ben Summerton

Links
- Binary Sizes posts on Sándor's blog
- Sándor's books
- "Parameterized testing with GTest" - Sándor Dargó
- "How to keep your binaries small?" - Sándor's C++ on Sea talk(s) (will add video links when available)
Transcript
Episode 388 of CppCast with guest Sándor Dargó, recorded August 5th, 2024.
In this episode we talk about the latest updates in CPP2,
AutoConfig for SonarQube,
and whether noexcept improves performance.
Then we're joined by Sándor Dargó.
Sándor talks to us about binary sizes and clean code. Welcome to episode 388 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Phil Nash, joined by my co-host for today, Anastasia Kazakova.
Anastasia, how are you doing today?
Good, good. Thank you, Phil, for having me here.
Hi, very welcome. Welcome back. Good to have you here. Timur's still away, of course, until
September. I'm actually going to be away for a couple of weeks as well, because Timur is not
back until September. I'm going to have a little bit of a break, so it's going to be about four or five weeks before the next episode as this one is released.
So we may still have one more guest co-host yet.
We're not quite sure yet.
But you may have the honour of being the last guest co-host
for this season, so welcome to the show.
Sounds good.
It's actually great to be back.
Yeah, generally happy just to join you here.
You haven't been here for quite a long while?
Yes, it's been a while. What have you been up to in that time?
Many, many things, mostly busy at my work.
But also a very exciting thing, which I'm super excited about, is the C++ Under the Sea.
You probably might know where the name is coming from, right?
I can have a guess. Yeah, we've mentioned it a couple of times on the show.
Yeah, so C++ Under the Sea, for those who don't know,
it's a new conference in the Netherlands,
and it's a C++ conference, for sure.
And it's Under the Sea, you might guess why,
because it's Netherlands.
But it's actually in Breda.
And as far as I know, Breda is like three meters elevation.
So it's not actually under the sea, but we'll do our best.
Right.
Well, don't let accuracy get in the way of a good name.
That's what I say.
Yeah.
Yeah, we'll try.
Actually, I would invite everyone to join us.
So the program is not yet published, but we've already announced Jason Turner as a keynote speaker, and we
announced workshops, so we have you, Phil,
for the workshop, and Mateusz
and also Jason Turner as well
for the workshops. I
can't really tease you all
the program, but I have to say that we've already
accepted a talk about Boost.Geometry,
which I find personally very exciting,
and also one about the spaceship operator,
which I also quite like.
And the program is coming soon, so believe me,
I'm promising you the program quite soon,
actually. Take it as a commitment.
We'll announce it on the show when that
arrives, so looking forward to that.
Yeah, please do. Good to see another new conference.
Okay, well at the
top of every episode we like to read
a piece of feedback. Nick Stone sent us an email which reads, in part: "Many thanks for all the work you do in producing this podcast. I enjoy it and look forward to it, but I have to say the episode on std::execution was way too advanced. What's the point of a talk that only the best C++ gurus can understand? I've been working in C++ for 30 years, but almost all of that talk was over my head, and I've implemented concurrency."
So Nick goes on to suggest that for more advanced topics like that, maybe we should insert
a couple of minutes of extra explanation in simple terms. Well, thanks for the feedback,
Nick. I mean, we do try to vary the topics and sort of
the level that we cover them at, while at the same time, you know, trying to keep them as accessible
to everyone as much as we can. And maybe we missed the mark on that particular one. But
it is an inherently complex topic, which I think is quite hard to actually try to explain in simple
terms, especially in a couple of minutes.
I think in that episode,
Timur actually said that he didn't understand it
even after a couple of hours of explanation from an expert.
So maybe that's just going to take a little bit longer to take that one in.
There are some topics like that.
And I expect within the next year or two,
there's going to be loads of conference talks about it.
It's going to be just like coroutines were the last couple of years:
everyone wants to do a talk on it, try and explain it in their own way. You have to watch more than one before it clicks. So it's probably going to be a bit like that. So if you didn't follow
that episode, don't worry, we've got more and we're going to cover some more topics.
So we do like to hear your thoughts about the show. You can always reach out to us on
X, Mastodon, LinkedIn,
or you can email us at feedback at cppcast.com.
Now joining us today is Sándor Dargó.
Sándor is a passionate software craftsman
focusing on reducing the maintenance costs
by developing, applying, and enforcing clean code standards.
His other core activity is knowledge sharing,
both oral and written, within and outside his employer. When not reading or writing, he spends most of his
free time with his two children and his wife baking at home and traveling to new places.
Sándor, welcome to the show.
Hi Phil, hi Anastasia, hi everyone. Thanks a lot for having me here.
Yeah, great to have you here, Sándor. So I was just wondering, I saw recently that you were presenting at C++ on Sea, like another conference with a fantastic name, I would say. And I saw that your talk was actually scheduled in the east const room. So my first very important question: is it east const or west const for you? And the second question is how much you enjoyed the conference.
Well, let me start with the east const
and west const then.
You know, I'm all for consistency
and this could mean that
I would go with the east const, right?
Because they say that's the most
consistent solution,
but that's not really what I meant.
I try to follow the existing guidelines
and conventions of
a project, so I will just go
with whatever that
is already used there.
Well, I don't think that
consistency, the so-called
foolish consistency, should
stop us from making things better.
But
the question of const is not the hill
that I will die on.
And yeah, but
C++ on Sea, I absolutely
love the conference. I told
the organizers
at the end that
it
kind of feels like going
home after a few
years. That was the first conference where I presented during COVID.
So that time it was virtual.
And I think I've been there actually in Folkestone three times by now.
And, you know, people are kind.
They want to learn and they want to share.
And they also help out.
So, for example, during my presentation, I was asked a question.
I don't remember what it was, but I couldn't give a proper answer.
And, you know, someone just helped out from the audience and explained that part.
And they managed to do that in a way that I didn't feel bad about it.
So, yeah, people are really kind.
I also think that it's just about the right size, C++ on Sea.
It's enough to have people with all different backgrounds,
industries, seniority levels,
but it's still human. You can
easily make
human connections.
So if there's only one
conference in a year where I apply to,
it's definitely C++ on Sea.
Thank you.
I think it's the mark of a good conference, and hopefully you do think C++ on Sea is a good conference, that you go for the technical talks but you come back for the community. And I think that's definitely what I hear a lot from people.
Exactly, yeah.
Sounds like that's worked out for you too. All right, well, we'll get more into what you have been working on and talking about at C++ on Sea and other places in a few minutes.
But before we do that, I've got a couple of news articles to talk about.
So feel free to comment on any of these.
So the first one is cppfront, or cpp2, the sort of new language from Herb Sutter.
He was on a year or so ago talking about that.
So if you haven't heard of it, he's calling it "syntax 2" for C++, so it's really C++ under the hood. In fact, cppfront is a play on Cfront, the original C++ compiler, which was actually a transpiler down to C; so cppfront is the transpiler down to C++. And he's using it partly as a bit of a playground for some of his ideas for proposals for C++.
Some of them have actually already gone in.
Some will probably never go in.
Some are still in the running.
But I think, you know, you can take it seriously as a language in its own right as well.
We'll see where that goes.
It's good to see that he is continually developing it despite a busy first half of the year. He's got actually some quite big features in. Just a couple of highlights from here.
So there was what he calls the tersest function syntax. You may have heard of the proposed terse lambda syntax for C++, which doesn't seem to have gone anywhere yet.
But this one is even terser, so: the tersest function syntax.
It's just a colon, then arguments in parentheses, and then the actual function body with no curly brackets.
So it seems to be a little bit too terse for some people, if you read the Reddit comments, but interesting that he's trying to experiment on that front.
There's quite a bit of C++23 and C++26 catch-up, because he does want to keep the language current, the language that he's generating, but also the features in cpp2 itself.
One of the big ones is UFCS, unified function call syntax, which it has had in from the start, I believe. So you can call functions as if they're methods, or methods as if they're functions, interchangeably. But because it's so seamless, there's not really any way to tell what the difference is, you know, if there's any performance difference or otherwise between that and a more direct syntax.
So he's actually introduced this sort of opt-out syntax with two dots.
So instead of just doing dot and then a member name, you do two dots and a member name, and it always has to be a member, just so you can compare.
So I thought that was interesting.
What makes it particularly interesting is he's also introduced another piece of syntax, which is three dots.
And that was originally for the half-open range. Many other languages have a syntax like this: you have a start and an end, and use something in between to indicate whether the last element is part of the range or not. The half-open range starts at the first element and finishes one before the end, just like iterators in normal C++. And then the closed range is a dot-dot-equals, which means it includes the last element of the range. So if you're counting from one to 10, for example, it's natural to include the number 10 in that. It's useful to have both.
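As a rough illustration of the half-open versus closed distinction being discussed, here is a small C++ sketch (the helper names `half_open` and `closed` are purely illustrative, not cpp2 syntax):

```cpp
#include <numeric>
#include <vector>

// Half-open [first, last): includes first, excludes last -- the same
// convention as C++ iterator pairs.
std::vector<int> half_open(int first, int last) {
    std::vector<int> v(last - first);
    std::iota(v.begin(), v.end(), first);
    return v;
}

// Closed [first, last]: includes both endpoints, like cpp2's ..= range.
std::vector<int> closed(int first, int last) {
    return half_open(first, last + 1);
}
```

So `closed(1, 10)` really does contain the number 10, while `half_open(1, 10)` stops at 9; having an explicit spelling for each avoids the ambiguity the Reddit thread was worried about.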
What's interesting there as well, apart from just the potential confusion with the unified function call syntax, is that there's a whole Reddit thread about this, and they actually convinced him to change this: rather than having dot-dot-dot be the half-open range, like a default range, make it more explicit with dot-dot-less-than for the half-open range. The idea is that you finish before, you know, less than the last element; that's the idea of the syntax. So you've got dot-dot-equals, dot-dot-less-than, and no dot-dot-dot, and that clears up all of the ambiguity and everybody's happy.
So it was quite interesting to see that actually unfold during the Reddit thread.
Yeah, the Reddit thread was actually much more impressive than the original post, I have to say, because I'm not following the cppfront updates on a regular basis, mostly at conferences when I listen to Herb. But here the Reddit thread has these fantastic opinions from many people.
And moreover, they have this table which they built from different languages, and how in different languages the half-open and the closed range are implemented. And you can compare Swift and, say, other languages, Kotlin. It was interesting to see, because sometimes I'm like, oh yeah, they're just all the same, they're using very similar syntax. But then these very, very small, tiny details differ, and you look at them and you're like, oh wow, really? And it was quite impressive that they actually convinced him to make a change. So I was, yeah, really impressed with how the Reddit thread actually worked.
Yeah, I think it was quite persuasive seeing it all laid out there. You can see, as you say, there's a lot of similarity, but in some cases different languages chose the same operator to mean different things. So if you move between those languages, that's really hard. But if you've got really unambiguous operator names, then, I think somebody said, now cpp2 has the least ambiguous of all of them.
That's the whole idea of CPP2, to be a playground for these sort of things.
So I think that's a big plus.
Okay, so the next news item is AutoConfig: C++ Code Analysis Redefined.
So you may remember that last year we had Abbas Sabra on. He was my colleague at the time at Sonar, before I left Sonar, and he was talking about automatic analysis, which is a type of auto-config, or zero-config, way to run the Sonar analyzers. But at the time that was only on SonarCloud, which means hosted on Sonar servers, and it only really works with, you know, cloud-based build systems in general. There are a few limitations; not everybody could use it. Now that's come to SonarQube, which you can run on premises, on your own hardware. So it's actually quite a big step up to make that run in that way. I don't know, it sounds like a little thing: oh, now it's come to this new way of doing it. But behind the scenes there's a lot of work that's gone on to make that happen. So I'll put the link to the article in there if that's something that interests you.
And the third article: so we talked before about an article from Ben Summerton, where he measured the impact of the final keyword, because it's often mentioned that it improves performance, but nobody seemed to be showing any benchmarks.
And it turned out that the truth is actually a little bit more subtle than that.
He has now done another one, but for the noexcept keyword.
So the full title: "noexcept Can (Sometimes) Help (or Hurt) Performance".
So you can probably guess that, again, the reality is a little bit different from what we might have expected.
In fact, at some point, he asked a question.
So did it have an impact?
The short answer is yes, but also no.
It's complicated and silly.
So if that gives you a taster of what the article might be about: it's packed full of charts and numbers, and other little stories around how he had to run things an extra time to actually get useful results out, and that sort of thing.
Quite a long article, but if that's something that interests you, it's well worth a read.
So when we do talk about using noexcept for performance reasons, although he does go into some of the reasoning behind why we say that, here are some actual hard numbers that show that, perhaps more often than not, it actually decreases performance.
But I'll let you decide.
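One concrete, well-known way noexcept interacts with performance (not necessarily the effect Ben measured, but a standard example): std::vector only moves its elements during reallocation when the element's move constructor is noexcept; otherwise it copies, to preserve the strong exception guarantee. A minimal sketch, with illustrative type names:

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Move constructor is noexcept: vector reallocation is free to move these.
struct Fast {
    std::string s;
    Fast() = default;
    Fast(const Fast&) = default;
    Fast(Fast&&) noexcept = default;
};

// Move constructor is NOT noexcept: to keep the strong exception guarantee,
// vector reallocation (via std::move_if_noexcept) falls back to copying.
struct Slow {
    std::string s;
    Slow() = default;
    Slow(const Slow&) = default;
    Slow(Slow&& other) : s(std::move(other.s)) {}  // may throw, in principle
};

static_assert(std::is_nothrow_move_constructible_v<Fast>);
static_assert(!std::is_nothrow_move_constructible_v<Slow>);
```

So a single missing noexcept on a move constructor can silently turn moves into copies in containers, which is one reason the keyword can genuinely matter, and also why the effect is so workload-dependent.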
Yeah, there were quite plenty of numbers, I would say; I read through them.
And my biggest question was indeed, like, why is this happening?
Like, why do some specific combinations actually win over others?
And there are not that many explanations or discussions about that.
But maybe someone else could write a follow-up to the blog post actually explaining why this happens. There are some discrepancies in terms of when the Microsoft compiler is used, and some specific combinations with GCC.
It's just interesting to understand what's happening under the hood.
Yeah.
It's a long article,
but I really loved his approach.
Once again, he went for the benchmark.
Whenever we talk about optimizations, we should always measure.
And, you know, it resonated with me, because when I tried to see how noexcept influences the binary size, normally everyone will say, yeah, it's good for the binary size, it will decrease it.
And in most cases it's true, but I found some cases where actually the binary size went up, and it turned out that it's a compiler bug.
Ah, yeah, that's always a possibility, which may even go some way to explaining some of Ben's results.
Yeah, I think, yeah, to me, his main message is that don't forget to measure if you hear something that this will help you.
And that's the most important.
Yeah. And in particular, measure in the environment that you are actually developing for. So don't necessarily go to all the work that Ben has to try to compare it across many different environments for different toolchains.
But of course, that will also change over time as well.
So you got in a mention of binary sizes there, which is a great segue into our interview, because you did a couple of talks. Actually, we sort of did them back to back, so we made it like a mini workshop. Two 90-minute sessions, or was it 90 and 60? I forget, but two back-to-back sessions at C++ on Sea, just on the subject of reducing binary sizes, which is not something we often talk about. We usually do just talk about performance and things like that.
But why are binary sizes so important that you had to do two talks on it?
Yeah, actually it was 150 minutes.
It was long and tiring.
But the good news for me was that the vast majority of people came back for the second part. So that was a good sign that people actually found the topic important and useful.
Well, I have to be honest, probably it's not important for everyone.
In my previous job, binary size meant something completely different from what it means now.
There, our biggest struggle, let's say, was that we had about two dozen different backend services on the server.
And these services, the executables, shared the same shared library. Well, actually quite a few shared libraries.
And sometimes we ran into storage issues.
There, the solution was simple enough: we just had to limit the number of versions of these shared libraries that we use.
But now, well, binary size means something completely different. We actually want to limit the size of our C++ core library that we share among the different environments, the different languages, like Android, iOS.
Right. So actually now, binary size is important for us for three different reasons.
For one, you know, there are markets where the bandwidth or even the data plans are limited. You know, it's a matter of device, it's a matter of your mobile contract.
And third, a smaller binary size can even reduce CO2 emissions.
That's what our data scientists found, but I have no details on that. At least I heard some data, but I don't remember. But I was really surprised.
But from time to time we mention certain studies about different programming languages, where you find that C++ and C and Rust are much more efficient when it comes to energy consumption than other languages, like Python, for example.
So I think it's completely possible. But there were many others at these talks for whom it means something different.
The binary size for them is actually really a storage limitation.
Because on your mobile, most probably it's not really important
in most cases, whether it's 100 or 105 megabytes.
For some, yes, but in most cases, no.
But in the embedded world,
well, you will have some real limitations.
And there, we are not reaching the size of 100 megabytes.
Sometimes not even a megabyte, right?
Yeah, I was developing for iPhone back in the early days of the iPhone App Store, and I don't remember exactly when it changed, but back then at least, if your binary size was greater than 50 megabytes, it wouldn't download over mobile.
You'd have to wait until you got on a Wi-Fi connection. So that was definitely, I mean, that's quite a small size for a binary these days, particularly something like a game with images and audio files and that sort of thing.
So I think there is possibly still a limit, but it's just much bigger now, so we don't really think about it.
But certainly that used to be the case.
Yeah, I remember that, you know, I had a data plan of 2 gigabytes maybe,
and I paid more than I pay now for 80 gigabytes.
So here it's less of an issue, but there are certainly markets where it's a problem.
Yeah, I guess the embedded market is where these limitations really bite.
They are the people who do care.
They don't have the capabilities of an iPhone, as we know.
What's more, you even have a book on LeanPub on that, right?
So I guess the topic is actually quite popular that you've written a whole book on that.
Usually what I do is that if I write a lot of blog posts
about a certain topic, then I try to collect them,
I go deeper, and I publish those books on LeanPub.
That probably means that you have quite many solutions
to the problem, right?
Quite many techniques.
Yes, yes, actually.
Well, there are endless techniques to limit the binary size.
Oh, wow.
They're quite different.
You know, there are many different approaches. And I would say what you want to do with your code,
it really depends on how desperate you are in limiting your binary size.
Because there are certain techniques that are as simple as just activating a new compiler or linker flag. And funny enough, when I started my quest to limit the binary size,
I completely neglected those.
But we'll probably talk more about it later.
And then there are certain techniques that are best practices anyway,
and they will make your code cleaner.
That's another topic I care about.
So, for example, you shouldn't have virtual destructors if you're not going to use them, if you're not going to inherit from that class, and it will also help your binary size.
You should follow the rule of five, which, I don't say it will help your binary size, but it might help your binary size. And it's a best practice anyway.
And, you know, make constexpr as many things as you can; that's also quite in the trend, and we consider it useful. It will help your binary size as well, in most cases.
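The virtual-destructor point can be made concrete: a `virtual` that nothing needs adds a vptr to every object and pulls vtable and type-information data into the binary. A minimal sketch:

```cpp
// A plain value type: no vtable, no per-object vptr, no RTTI data emitted.
struct Point {
    double x = 0.0;
    double y = 0.0;
};

// The same data with an unnecessary virtual destructor: every object now
// carries a vptr, and the compiler emits a vtable and type information
// for the class into the binary.
struct VirtualPoint {
    virtual ~VirtualPoint() = default;
    double x = 0.0;
    double y = 0.0;
};

static_assert(sizeof(VirtualPoint) > sizeof(Point),
              "the vptr makes each object bigger");
```

The per-object cost is visible in `sizeof`; the per-class vtable and type-info cost is what shows up in the binary size Sándor is talking about.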
And using minimal templates; well, you can use this code extraction, which, in most cases, if used wisely, will make your code better, plus it will decrease your binary size. So there are these techniques that are best practices anyway.
But when it comes to any kind of optimization, then you will run into techniques which are clearly compromises.
Like, there's an interesting one: defaulting your special member functions in the implementation file, so in the CPP file.
And I remember this question came up in my head
a few years ago.
I didn't care about binary size at that time.
I was like, okay, in most cases,
we want to provide the implementation
for a certain function in the CPP file.
So when we default special member functions,
where should we do that?
Does it make sense to do it in the header file?
Or just like for any other function,
should we do it in the implementation file?
So this question came up to me a few years ago.
And I posted a question on X, and I mentioned a few C++ trainers.
And I remember that quite a long discussion came out of it.
And the conclusion was like, why on earth would you do anything like that?
Because if you want to default your special member function, well, basically what you are saying is that there's nothing special going on here. For some reason, I have to provide these special member functions by myself, but there's nothing special.
But if you do it in the implementation file, well, you lie.
Because if it's not in the header, it implies that, well, actually there is something special going on.
But then the compiler will go to the implementation file and just realize that, okay, there's nothing special going on there. And with that, well, the compiler loses the ability to perform certain kinds of optimizations. At the same time, it limits inlining, and if it's a widely used class, that might help you on the front of the binary size.
So it's clearly a compromise to do that, not only because of possible runtime performance drawbacks, but because of readability.
The first moment you see that, you think, what on earth is going on here? Why did someone do that?
So if you decide to do that, I highly recommend that you document it in a central place. Otherwise, someone will come and start removing them,
until someone else reminds the person not to do that.
So I was there.
I did that.
Well, I didn't lose a lot of time, but still.
Now it's well documented.
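A sketch of the technique being described here, defaulting the special member functions in the .cpp file rather than the header (Widget is a hypothetical class; the comment mirrors the kind of documentation Sándor recommends keeping):

```cpp
#include <string>
#include <utility>

// --- widget.h (sketch) ---
// The special member functions are only declared here, so every including
// translation unit sees an "ordinary" out-of-line declaration and cannot
// inline or optimize through them.
class Widget {
public:
    Widget();
    ~Widget();
    Widget(Widget&&) noexcept;
    Widget& operator=(Widget&&) noexcept;
private:
    std::string name_;
};

// --- widget.cpp (sketch) ---
// NOTE: these are defaulted here on purpose, so the generated code lives in
// one translation unit instead of being inlined everywhere the header is
// included. This trades inlining (and possibly runtime performance) for
// binary size -- do not move them back to the header without measuring.
Widget::Widget() = default;
Widget::~Widget() = default;
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;
```

The class still behaves exactly as if the members were defaulted in the header; only where the compiler emits (and can inline) the code changes.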
Yeah, that is a good use of comments to explain
why you did it this way instead of some different way
so that somebody doesn't go and change it
thinking they know better.
Yeah.
So that's definitely a compromise to me.
And there are other techniques, like using extern templates, or thinking about the initial values of class members, which might be good for your binary size, and you might want to use these techniques, but sometimes they will decrease the maintainability of your code or the expressiveness.
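For the extern-template technique, a sketch (Buffer is a made-up class template): the header tells every includer not to instantiate the specialization itself, and exactly one .cpp file provides it, so the generated code appears once in the binary instead of once per translation unit.

```cpp
#include <cstddef>

// --- header (sketch) ---
template <typename T>
class Buffer {
public:
    void push(T v) { data_[size_++] = v; }
    std::size_t size() const { return size_; }
private:
    T data_[16] = {};
    std::size_t size_ = 0;
};

// Suppress implicit instantiation of Buffer<int> in every including TU.
extern template class Buffer<int>;

// --- exactly one .cpp file (sketch) ---
// The single explicit instantiation the whole program links against.
template class Buffer<int>;
```

The maintainability cost Sándor mentions is real: every specialization you want deduplicated has to be listed explicitly, and a forgotten one turns into a linker error.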
So, OK, talking about the initial values of class members: if you use basically zero default values everywhere for members, then the compiler can just zero-fill all the bytes necessary. It doesn't have to fill the memory with certain specific values, and it might help you. But it might also limit the expressiveness of your code, if you can't use whatever initial value you want.
that are instantiated, I don't want to say any numbers
you instantiate
them a lot of times then
think about these but
if that's not the case probably
that's not so
important for you
And, you know, there are also some techniques that are almost baked into a project from the very beginning.
Do you rely on runtime type information, or exceptions?
You can turn these off and gain quite some space. Not necessarily, but often.
But if you don't think about it from the very beginning, it might become very difficult. So, for example, if you overuse dynamic_cast, then removing all those later can be kind of a challenge. I would say that in most cases it's probably a good thing to do, but I know this could be a hot topic; not everyone would agree on that. But I do think that without dynamic_cast you will end up with cleaner code, I can agree with that, and also a smaller binary. So I think that's useful.
And with the compiler flag -fno-rtti, you can just disable using certain keywords like dynamic_cast.
So I think that's a good thing.
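One common way to make dynamic_cast unnecessary, so the code still builds under -fno-rtti, is to push the type-dependent behavior into a virtual function instead of probing the concrete type. A sketch with illustrative types:

```cpp
#include <string>

// Instead of dynamic_cast-ing a Shape* to find out what it really is,
// ask the object through a virtual function.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string describe() const { return "shape"; }
};

struct Circle : Shape {
    std::string describe() const override { return "circle"; }
};

// Works polymorphically through the base class, with no RTTI needed.
std::string describe(const Shape& s) { return s.describe(); }
```

This is also why the refactoring tends toward cleaner code: the type-specific logic moves into the class itself rather than being scattered across cast-and-branch call sites.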
Well, exceptions, that's another topic.
That's a big topic.
Yes, that's a big topic.
Maybe it loops back to the noexcept discussion.
Yeah.
But, you know, we looked into turning exceptions off
maybe a year ago,
and it's just not possible for us.
Right.
Well, it would cost too much.
But probably we could gain some megabytes there,
but it's way too much work.
I imagine a lot of embedded projects have exceptions disabled anyway, so maybe they're already getting that benefit.
Yeah, maybe that's also why they turned exceptions off.
It can be a factor, but I think usually it's other reasons, particularly how they're going to handle exceptions.
Exceptions in embedded are quite discouraged, I guess, by everyone.
It's true.
Some of the other techniques you mentioned there, particularly about sort of defaulting constructors or special member functions out of line, or initializing member variables, sound like the sort of things that, at least when I first think of it, I think, surely that's not going to make a huge difference, is it?
But have you actually seen big reductions in binary size by using those techniques specifically?
Well, they do add up.
But that's a very good question.
And that's a discussion I had maybe a year ago with some of my colleagues,
and I clearly didn't communicate clearly enough in the beginning.
Well, if you think about all these techniques, they are fun.
We are developers, and we try to code in different ways and see the effects.
And we said, wow, OK, I removed all the unused virtual destructors, and we gained, I don't know, 8 kilobytes. How cool is that?
And those kinds of things. And usually when you do these small things, you are on the scale of kilobytes, depending of course on the size of the project.
But when you tune your compiler and linker settings, you play on the scale of megabytes.
So that's a bit different.
So yeah, what should I say?
I think both are important.
So writing code that is good, that is not bad for your binary
is a good thing.
It's good if you know how to do that.
I don't like if your code only satisfies certain criteria
because the compiler can do something for you. But you cannot ignore the compiler and linker settings.
Those are the first things you should look at.
Because, well, if you are not so desperate,
as I mentioned earlier, then probably you should just
start with these settings and think about using OS,
where OS, so O is for optimization,
S is for size.
And you should just try it first.
And you don't give up runtime performance
because if I'm not mistaken,
OS is based on O2 so it's already quite optimized
but it doesn't include certain techniques that would make your boundaries bigger such as loop
unrolling it won't replace your for loop with I don't know how many identical instructions because it's
not good for your binary.
So that's obviously the first thing
you should try.
So you mentioned
earlier about some techniques to reduce
inlining explicitly,
but maybe if it's
optimizing for size, it will do less inlining
to start with and you may not need to do that.
I think so, yeah.
That's really the thing.
And, well, I found a compiler setting dedicated to some inlining threshold.
I found it a bit exotic.
I don't remember exactly its name. It works in LLVM.
It was the LLVM inline threshold, something like that.
And you can set numbers somewhere between zero and a thousand.
And it's a big range.
And I didn't find it very well documented.
Sometimes if you decrease it too much or increase it too much,
you get a result completely different from what you would expect.
But it's something you can experiment with
because with certain settings,
you might gain quite a lot.
But you have to measure.
There, you really have to measure. Yeah, I wonder if it may inline it past the point
that the inlined code would actually be smaller
than generating the function call.
I think that, yeah, that could actually happen.
And a technique that I, I think that could actually happen.
A technique that I... Well, it's not a technique for reducing the binary size or inlining,
but a useful technique if you want to benefit from this,
measure the binary size in your CI pipeline
and post it back to the main page of your pull request.
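As a rough sketch of that CI idea (all paths and variable names here are hypothetical; posting the result back to the pull request would go through your CI provider's own comment API):

```shell
#!/bin/sh
# Measure the freshly built binary and report the delta against a stored
# baseline. BINARY is a stand-in for your real build output.
BINARY=${BINARY:-./app}
BASELINE_FILE=${BASELINE_FILE:-size.baseline}

# Create a placeholder binary so the sketch runs anywhere.
[ -f "$BINARY" ] || printf 'placeholder' > "$BINARY"

NEW_SIZE=$(wc -c < "$BINARY")
OLD_SIZE=$( [ -f "$BASELINE_FILE" ] && cat "$BASELINE_FILE" || echo "$NEW_SIZE" )
DELTA=$((NEW_SIZE - OLD_SIZE))

echo "binary size: ${NEW_SIZE} bytes (delta vs baseline: ${DELTA})"
echo "$NEW_SIZE" > "$BASELINE_FILE"   # update the baseline for the next run
```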
Yeah.
I'm talking about the CI pipeline, actually, wondering how the process looks for you.
I mean, you know quite a lot of techniques, you shared some of them, they're all quite interesting.
And probably when you look at the code you're not just applying random techniques, but you see some criteria and you're like, okay, so this technique will probably work better here.
So is it some kind of, I don't know, abstract tooling or something, and you apply techniques and measure and see what fits best?
I can't stop thinking about that as a possibility for tooling here, because every time I hear about these kinds of frameworks and techniques, things you can apply step by step, it switches on a light in my head that it could be nice tooling, or at least a nice approach, a nice process.
So I'm just wondering how you specifically do that.
So do you have some process?
Well, I would say, yeah, no, not really. We don't really have a process for trying these things,
but we do measure the effects of a pull request on your binaries.
As I said, at the end of the build, you see exactly how many bytes you shaved off, or actually gained, by applying a change.
And if you're really curious, it goes down even to a section level.
But usually, that's just way too much information.
Okay.
And, you know, we try our best to reduce the size, but applications also, you know, gain more and more features.
If you don't do that, you die. Well, some die. So that will obviously increase the binary size.
So sometimes we have these discussions that, okay, for a certain feature, we have three different ways to implement it.
And on a small scale, we tried all three, and we saw how much each would have added. And we chose the one that added the least amount of binary size.
Yeah, sounds like a nice requirement for the code:
you have to implement this functionality,
and also add no more than this many megabytes of binary size.
That's good.
Yeah, well, we don't have that
exactly. We don't have it
as a non-functional
requirement. Actually, we could,
but
we developers
are
kind of in this mindset
and even without
the named requirement,
we think about it.
Yeah, I guess in some systems the requirement is just natural,
because otherwise the binary simply doesn't fit.
So there is a natural requirement for that.
Especially if, you know, at each pull request, you see these numbers.
And you see a smiley next to the number that grows a lot.
Are you often rejecting people's pull requests because of that?
I'd say no.
No, but I do comment on it sometimes.
Okay.
But I also trust people that they already tried different things to limit the effects.
And in some cases, there are just no other options.
Okay.
So you mentioned the linker a few times.
Now, I know you've been talking about reducing the binary size, so presumably the final executable.
But are you concerned with like objects and library sizes that the link has to work with as well?
Or is that a completely different topic?
And for context, the reason I'm asking this, because I know some of the things you mentioned,
like, um, you know, heavy use of templates can often lead to very big object file sizes.
And then the link has a lot of work to do to de-duplicate them.
And it won't necessarily impact the final executable size so much,
but it can have its own problems with these large object files.
Okay.
Now I understand your question.
Thanks for detailing it.
No, not really.
We mostly care about the final size.
Okay.
Interesting.
Because you did mention templates there,
and maybe some of the same techniques you're talking about can help that as well. Now you mentioned extern template,
that was something that we were trying to use a lot on a project I work on to reduce
object file size, give less work for the linker. We never quite got it to work as well as we
had hoped, but...
You know, all these different techniques, I haven't tried everything yet at work.
So some of these things we experimented with at work, but not everything.
And actually, yeah, extern templates is something that I would like to try one day.
So, you know, we have this notion, the idea of tech weeks, when, you know, you just work on anything that you think might be useful for your team, for the company, for the users in the end. And maybe next time I will try to use extern templates at work, not on some personal projects and research.
Right, so it's an evolving situation. We need to follow your blog to keep up to date.
Yeah. And, you know, there's also this thing,
I don't know exactly what I'm supposed to share.
So usually I say, okay, this might work in a big code base and you might shave off quite some kilobytes, even megabytes, of binary size,
but usually I don't share anything exact.
Yeah.
Well, like any of these sort of optimizations,
sometimes a thing you think may have a big effect doesn't,
and then the thing you think may have no effect at all
has the biggest effect.
It can be very variable.
Yeah.
And at the end of the day, I think when you read the blog you're looking for ideas, and the fact that, you know, we managed to get rid of one megabyte doesn't really matter to you. You're just looking for ideas: what could I try in my project, in different circumstances?
And, you know, you give it a go and it might help you.
So if you're looking for ideas, I would recommend checking my blog, searching for binary sizes and trying the different techniques.
And if they work, leave a comment. If they didn't work, also leave a comment. It's always interesting.
Yeah. Just leave a comment.
Just leave a comment about what happened in your code base. It's interesting.
Like, for example, we apply noexcept quite a lot,
because we know that it will be good for the binary size as well.
Plus, it's a good communication technique towards other developers.
And still, there might be some situations where it doesn't work,
and where it would actually increase the binary size.
So yeah, it's not a silver bullet either.
Yeah, I think we need to get Ben to do some more benchmarks for binary sizes instead.
So we'll put a link to your blog in the show notes, as well as to your two C++ on Sea talks when they come out. As we speak, I don't think they've been released just yet, but I know they're going up; all the videos for C++ on Sea are starting to go up now, so that shouldn't be too long. So I'll try to remember to come back and add them in so you can see them.
But you mentioned that you've done talks at C++ on Sea for the last three or four years.
And I remember last year you were doing a talk on clean code, which is something we don't hear a lot about these days.
So, interestingly, you brought that up.
So do you want to tell us, first of all, what clean code is and why we should care about it?
Yeah. Yeah, so first of all, I find it important to make sure what we talk about, because if it's written capitalized, that's not what I talked about at the conference, and it's not what I generally think about when I mention clean code. So it's hard to define, right, what clean code is.
I remember your talk about software quality.
Oh yeah. And what was the book? Zen and the Art of Motorcycle Maintenance. And then the talk was Zen and the Art of Code Lifecycle Maintenance.
I love your names, Phil.
You know, just to define quality, well, the guy wrote at least two books. Yeah, Robert Pirsig.
Yeah, he wrote at least two books, just trying to define what quality is.
So I didn't spend so much time trying
to find the best
definition for clean code.
But the one I like is
code that is easy to
understand
and easy to change.
And in other
words, to me,
it means that clean code is an optimization
for maintainability.
Right, yeah.
And I think that in most cases,
that's the most important aspect.
I know we enjoy optimizing for binary size or for runtime performance.
You go to a conference, you go to the lightning talks,
and usually there is a guy there who will speak for five minutes about how they decreased the time it takes to print something by a hundred times.
It's magical. In most cases you don't need it, you don't care.
Yeah. So, as I mentioned earlier, I was working at Sonar until earlier this year, and they have their own definition of clean code as well, which is a little bit more expansive, and useful in its own right. But as you say, everyone has their own definition.
And it's useful to define what you actually mean when you say that.
And it usually says something about your own values as a developer, I think.
To see what they mean by clean code.
So what was your definition at Sonar?
I think we need another whole episode for that.
We'll get through it.
Yeah, it's such a long definition.
That's why I like this,
this short one.
No,
we can extend on it,
but.
Let's start a new podcast on clean code.
But believe me,
we tried.
Okay.
So yeah, in any case, I think that, you know,
unless other aspects really matter,
like you really have to increase runtime performance for some reason,
they're not as important as maintainability.
Because if it takes so much time to add a new feature, if it takes so much time to understand and track down a bug and ship a new version, then you will be out of the competition pretty soon.
And writing cleaner code, more maintainable code, I think, is a good way... well, it doesn't make sure that your business will stay in the competition for a long time, but at least as an engineer you do your best to ensure that.
And I also remembered that, well, I'm sure it was you, Phil, who mentioned the alignment trap.
Oh yes, which I stole from somebody else, so I can't claim credit for it. Allan Kelly, I think.
Allan Kelly, yeah. And he tried to understand what's more important for business: doing things right, or doing the right things. And it turned out that, in a mid- or long-term perspective, it's more important to do things right than to do the right things.
And the reason behind it is that it's very difficult to change your behavior as a human being,
not just as an engineer, but as a human being. So doing the things right is very important
because that's the thing that is difficult to change.
But to change what you do, what exactly you work on,
that's much easier to change, much easier to change a project,
but much more difficult to change how you do things.
So in that sense as well, clean code is very important for programmers to follow, I think.
I'm wondering how clean code actually works with modern C++,
because you probably know there is this opinion that modern C++ is too complex,
and maintaining the code base is really a challenge.
And there are quite many developers... well, there are different opinions on that in C++, for sure.
But I mean, does the more complex language actually make it harder to maintain clean code? Or, on the contrary, what do you think in terms of the current C++ evolution, does it help?
This is a very interesting question. I think that it helps. Well, I could also say it depends.
It depends, but I think in general it helps, because there are many modern C++ features
that can make your code more understandable,
can make your code more expressive,
and more bug-free, I would say.
Just, okay, let's think about modern C++
since C++11.
If you just think about it, you know, the override keyword.
Yeah, you have to maintain your code. You have to add it,
but it really increases the maintainability of your code.
And yeah, when you ask the question,
okay, does clean code help?
Again, we go back to the question, what is clean code?
What is clean code in C++?
And that probably changes with almost every version.
So in a way, it makes things more difficult
because you have to maintain your code base.
But in other ways, it makes things
easier, better, because you have
the option to make your code
better.
So, for example,
recently I could
eliminate all this to enable
if from
our code base and replace them with
concepts.
Yeah, in a certain way it added them with concepts. And yeah, in a certain way, it added complexity, having concepts.
Well, it's another language feature, but in another way, it really helps reducing the
complexity and make the code more understandable.
So it really depends from which point of view you look at this question, I'd say.
Yeah, I mean, I agree. Whatever clears the old macros from my code would definitely help, for sure.
But when I was asking, I was thinking mostly about the way that modern C++ sometimes allows us to do many things in many different ways. So you can follow different styles now in C++.
And that definitely doesn't help in terms of maintainability unless you maintain a common style for the whole team, because if different people just prefer different styles, it's absolutely unmaintainable. I mean, it's really hard. It's a hard task.
So that's probably what I meant when I was asking whether the modern language helps; there are too many opportunities sometimes.
Yeah, but I think even from the very beginning,
C++ offered solutions in different paradigms.
You could write your code following different paradigms from the very beginning.
So you already had this problem, probably on a different scale, but you already had it.
Yeah.
Agreed.
Yeah.
But I would say that C++
as a multi-paradigm language
is different from C++
where you can see sort of layers of archaeology
through the language,
and you've got code bases that have code from
all those different layers. That's
just adding
complexity and noise.
So yeah, I completely agree that
if all the code is
modern C++, whatever that means at any
particular time, it's generally going to be
cleaner, I think, but
the language itself is
more complex.
You also cannot
just go in and change everything.
No.
After every compiler
upgrade.
Unless you're Herb Sutter
and you're developing CPP2.
You're doing your own compiler.
Tying it back to an earlier news item. How about that?
Well, I know that I just said I removed all the enable_ifs in the whole code base.
But there were not a lot.
So then
you can do that. Otherwise
I think you should follow
a more granular approach.
You watch a certain piece of code and you just clean up there,
but you don't change everything at the same time.
Unless again,
you have a handful of cases or a few dozen cases and then it's easy to
change all at once.
Yeah.
I had to go for a similar thing with,
um,
I started working on catch 23 last year, which I haven't had a chance to get back to but what i did do was do something where we
would use lots of different tricks mostly spin a but some other things as well all got replaced
with concepts and it was a fraction of the size so much more understandable so much more reliable so concept
is definitely a really big win when it comes to simplifying code but it's not a simple feature
in its own it's just simpler than what we were doing before so i think we are running out of
time so we probably do need to wrap up there come to a clean end
for the episode i think so before we do let you go sandor is there anything else you want to
tell us about or or let us know where people can reach you if they want to follow up yeah so if
you're interested in binary sizes for example, or in new C++ features,
then please check out my blog.
It's sandor.go.com.
And you're more than welcome to leave any comments.
So one thing that I want to emphasize about
my blog, that
it's not
necessarily about the
newest, coolest
things. When I started
to write, I think
about seven years ago,
I said that, okay,
that's a tool
to document my learning process.
So if I learn about something new, by new I mean new to me, I will write about it, because it will help me to understand it better if I have to write about it.
And if others read it, it will also force me to be more precise about what I learn and what I write.
So it started out as a learning tool for me.
And I've been posting on a weekly basis for the last seven years or so.
Yeah, I think they're often the most valuable types of articles, because you're at that point where you still don't quite have the curse of knowledge, which makes it very difficult to explain things to other people. But if you've just been through that transition yourself, and you're still fresh, then
you've got the right perspective to be able to say, this is what made it click for me.
Maybe it'll make it click for you.
You know, that's an excellent point, Phil, because, well, I usually don't care about these page-view numbers, but I do check them from time to time, like, you know, once every two or three months.
And interestingly, maybe like 50% of all views comes from one single article.
It's about how you use parameterized tests
with Google Test.
And I wrote that article because we tried to use it one day
in a coding dojo at one of my previous teams.
And we spent at least an hour trying to understand it from documentation
and make it work.
So did I write it down?
No.
A few months later, we did the same thing again.
And we didn't remember.
And that time, I wrote it down.
And that's the most viewed article on my blog, interestingly.
And it was really helping myself and my team.
Excellent.
So I'll put a link to that in the show notes,
just to give you a few more page views so that you can read them.
Interestingly, since then,
the documentation,
the official documentation,
also got much better.
Right, right.
Well, they probably read your article
and incorporated it.
Maybe.
Anyway,
that's a nice wrap up for the show.
So thank you, Sándor, for coming on and telling us
all about how to reduce binary sizes, how to make your code clean, and maybe even how to
do parameterized tests in Google Test. And thank you, Anastasia, for coming on and being an excellent
co-host again. Yeah, thank you for inviting me. Thank you for talking to me. It was great.
Thanks a lot.
Thanks a lot for having me here.
It's been a great experience.
Thanks so much for listening in
as we chat about C++.
We'd love to hear
what you think of the podcast.
Please let us know
if we're discussing
the stuff you're interested in
or if you have a suggestion
for a guest or a topic,
we'd love to hear about that too.
You can email all your thoughts to feedback at cppcast.com.
We'd also appreciate it if you can follow CppCast on Twitter or Mastodon.
You can also follow me and Phil individually on Twitter or Mastodon.
All those links, as well as the show notes, can be found on the podcast website at cppcast.com.
The theme music for this episode was provided by podcastthemes.com.