CppCast - GCC Compiler Development
Episode Date: August 3, 2017
Rob and Jason are joined by Krister Walfridsson to talk about some of his contributions to the GCC Compiler. Krister got introduced to low-level programming by the C64/Amiga demo scene in the 80s. This led to an interest in operating systems and compilers, and he has been involved in the NetBSD and GCC projects for more than 20 years. His career has been split between OS-level development on embedded platforms and compiler development, and he most enjoys working with "strange" custom-made architectures.
News: libq; Metaclasses: Thoughts on generative C++; 6 Reasons Why We Distribute C++ Libraries as Source Code; Undefined Behavior in 2017
Krister Walfridsson: @kwalfridsson; Krister Walfridsson's Blog
Links: Why volatile is hard to specify and implement; Branch prediction; Designing a CPU in VHDL, Part 1: Rationale, tools, methods
Sponsors: Backtrace
Hosts: @robwirving; @lefticus
Transcript
This episode of CppCast is sponsored by Backtrace, the turnkey debugging platform that helps you spend less time debugging and more time building.
Get to the root cause quickly with detailed information at your fingertips.
Start your free trial at backtrace.io slash cppcast.
CppCast is also sponsored by CppCon, the annual week-long face-to-face gathering for the entire C++ community.
Get your ticket today.
Episode 112 of CppCast with guest Krister Walfridsson, recorded July 31st, 2017.
In this episode, we talked about Herb Sutter's Metaclass proposal.
And we talked to Krister Walfridsson.
Krister talks to us about the contributions he's made to GCC.
Welcome to episode 112 of CppCast, the only podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
I'm doing pretty well. How about you?
I'm doing okay. Got my laptop back, so that crisis has been averted.
Oh, right.
New USB screen and everything.
Right. You said you could only see the top quarter of it or something.
Yeah, no, the top inch or two. And yeah, they sent me a new one with a new LCD screen, and
they replaced the heat sink in it as well. Not sure what was wrong with the heat sink, but...
Probably the CPU overheated and melted the connection to the screen. Let's go with that.
That's a great explanation. Any news with you?
No, nothing at the moment.
Just thinking about all the conferences I have coming up.
Yeah, it's going to be a busy fall for you with Pacific++ and Meeting C++ and CppCon.
Yes, two days of training and four talks between three conferences.
No, five talks between three conferences. Yeah.
I should probably remember to do all of them, I guess.
Well, at the top of every episode I'd like to read a piece of feedback.
This week we got a tweet,
I guess in reference to some of our recent episodes focusing on concurrency.
This listener wrote in,
Hey guys, I was wondering whether you've heard
of libq.io yet
and may want to make an episode on that.
So we definitely haven't done
an episode of libq.io. I had not
heard of it before this tweet.
Jason, have you heard of this before?
I don't think so.
Yeah, so I'll put the link in the show notes,
but it's
a C++ library that implements continuations called promises.
So it definitely fits in with some of our recent topics on coroutines
and the future of futures or possible lack thereof.
And that's a bold claim right on the front of the website here.
You're programming multithreaded C++?
You're probably doing it wrong.
Yeah, I'm sure lots of people are.
Definitely a library worth checking out,
and maybe we'll see if we can get in touch
with one of the authors and set up an interview.
Okay.
So we'd love to hear your thoughts about the show as well.
You can always reach out to us on Facebook, Twitter,
or email us at feedback at cppcast.com,
and don't forget to leave us a review on iTunes.
Joining us today is Krister Walfridsson.
Krister got introduced to low-level programming by the C64/Amiga demo scene in the 80s.
This led to an interest in operating systems and compilers,
and he has been involved in the NetBSD and GCC projects for more than 20 years.
His career has been split between OS-level development on embedded platforms and compiler development,
and he most enjoys working with strange custom-made architectures.
Krister, welcome to the show.
Thank you.
You know, it had to be an exciting time with DemoScene in the 80s.
In a way, I feel like I was born like a decade too late because I missed out on some of that stuff.
Yeah, I was probably a little bit early.
So my peak was in 1989, and that was when everything started getting big.
But then I started university and didn't have time for the demo scene.
That's unfortunate.
But you were in the right part of the world for it, if I get
that right. Like,
Scandinavia and Europe was
big on the demo scene.
Yeah, definitely.
Okay, well, we got a couple
news articles to talk about, Krister. Feel free to
comment on any of these, and then we'll start talking
more about your work with GCC and other
projects, okay? Yep.
Okay, so this first article is from Herb Sutter's blog,
and it's an article titled
Thoughts on Generative C++ with Metaclasses.
I think we probably mentioned his proposal a while back
when we first heard about it,
but he did do an ACCU talk at the ACCU 2017 conference, and that video is now available on YouTube, which I really need to watch because metaclasses sound super interesting, but I haven't gotten around to it yet.
Jason, what were your thoughts on this?
Well, I also have not yet gotten around to watching the video.
And just to make sure our listeners are following along here. When he gave that keynote at ACCU,
he asked the conference organizers to not release the video immediately.
So that was the one video from ACCU 2017 that had not yet been released.
But after the last standards committee meeting,
the video got released,
he published his paper and it made everything public.
And it seems like he got some great feedback from the committee that they really liked this proposal.
And the whole idea is being able to more effectively generate new types
at compile time in C++.
And his proposal does depend on reflection in C++, right?
So it's kind of going to come after any reflection support that we hopefully will be getting in C++20.
Yeah. And I honestly am a bit confused on that particular detail. I
don't know if this would completely supplant the other reflection proposals because it seems to do
a lot of overlap. I could be wrong. My understanding was the idea of metaclasses was
this is something we could do
once we have reflection.
But maybe I'm wrong on that.
Does Krister have any input?
I also think that
he assumes that the reflection stuff will go in.
So this is more the next level on top of that.
Right, okay. Yeah, well, we'll have the link to the article and the link to the ACCU talk in the show notes,
and it definitely seems like it'd be worth watching. I'll get around to that myself this week, probably.
Right. The next article is Six Reasons Why We Distribute C++ Libraries as Source Code. And this comes from the Buckaroo team,
which is a new C++ package manager
that I think we mentioned a while back.
And yeah, why distribute as source code?
Because it's cross-platform,
and it's not always easy to distribute built libraries
for all the different platforms that are out there.
Kind of makes sense.
Yeah, and their comment that compiler flags need to be properly supported,
like mismatched builds between compiler flags with a library that you're linking to
can cause huge headaches with C++.
So I totally agree with that comment.
Right.
Krister, what were your thoughts on this as a GCC developer?
I think this is a reasonable way to do it.
If you look at NetBSD, we have pkgsrc.
And there we build everything from source.
So I think that's a really good thing in open source software
when you have all the source available.
So why not take advantage of it?
Especially as in NetBSD we have all these old architectures and so on.
So there are not always build servers and so on available
to ensure everything is available for VAX and so on.
But if you have a VAX, you can build it yourself.
You just install the package
and the package manager will compile it
on your system automatically.
Yeah.
And Jason, I think I'm going to let you introduce
this last article on undefined behavior.
Yeah, so John Regehr,
I think that's how you pronounce his name,
does a lot of research that is on undefined behavior
and code correctness.
And he wrote a very extensive article
on status of undefined behavior in 2017.
If you want to know anything about the address sanitizer,
undefined behavior sanitizer,
what they're capable of,
how to defeat them,
read this article.
I'm not sure if I've ever seen
a list of the types of undefined behavior before.
I'm just not sure if I've ever seen it
so well quantified as it has been in this article.
This is an annex in the C standard.
So all of them are listed there.
Yeah, and he points out that there is no comparable list for C++
of all the types of undefined behavior,
but right, the C standard does list it out.
I know we talked with Patrice last week
about the undefined behavior subgroup at the C++ committee. I'm surprised...
I mean, maybe they are working on a similar list for C++. It seems like it'd be worth having. Yeah.
Yeah, that's such a long article. Yeah, really good article. And like we said, it identifies
some of the different types of undefined behavior and how you could debug some of those types
of undefined behavior using some of the tools
like sanitizers and things like Valgrind, right?
Yeah.
And there's, so Chandler has done a talk
on undefined behavior in previous years of CppCon
and I believe that we may have one or more coming
in CppCon 2017 also.
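To make the sanitizer discussion above concrete, here is a minimal sketch that is not from the episode. Signed integer overflow is one of the classic kinds of undefined behavior; the function names are made up for illustration, and `__builtin_add_overflow` is a GCC/Clang builtin rather than standard C++.

```cpp
#include <cassert>
#include <climits>

// Signed integer overflow is undefined behavior in C++, so this function
// is only well-defined when a + b fits in an int.
int add_unchecked(int a, int b) { return a + b; }

// __builtin_add_overflow (GCC/Clang builtin) stores the wrapped result in
// *out and returns true when the mathematical result did not fit.
// This wrapper returns true on success instead.
bool add_checked(int a, int b, int* out) {
    assert(out != nullptr);
    return !__builtin_add_overflow(a, b, out);
}
```

Calling `add_unchecked(INT_MAX, 1)` would be undefined behavior. Building with `-fsanitize=undefined` on GCC or Clang is one way to have such an overflow reported at run time instead of silently miscompiled.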
Okay, so Krister, do you want to tell us a little bit more about how you got involved
in the development of GCC?
Sure.
So as I said, I did some low-level development before I started university.
But I more or less stopped programming at university because I majored in mathematics.
So I spent time there.
And one thing that interested me was, well, proving things about programs.
And Haskell came around that time
and it promised that you could actually
do mathematics and run it.
The reality was not really that great.
But anyway, I got excited by Haskell,
started looking at that.
And my compiler at that time compiled down from Haskell to C and then used GCC to compile
the C code.
And GCC miscompiled my beautiful Haskell programs.
So I needed to start looking at what is happening there.
So that is how I came into contact with GCC.
And well, I have not come back to Haskell after that.
I'm still at the GCC level.
So if you don't mind my asking, approximately when was that at the rise of Haskell and when you started with GCC?
Mid-90s.
Okay.
I'm not completely sure which year, but that's 96, I would guess.
That's about the same year that I was first using GCC also.
95.
So you mentioned in an email that you sent us that you got involved with the EGCS fork initially.
Is that correct?
Yeah.
Now, I'm guessing most of our listeners aren't familiar with that part of the history of GCC.
It was a while ago, and I only vaguely remember it myself. Would you mind giving us some background?
Yeah. So, GCC development was not that extensive at that time. There was
one maintainer doing most of the work
and first time I looked at GCC
the frequently asked questions
was how do I get involved in GCC?
And the answer was we don't
need help more or less.
Not that strong but
the general idea
seems to be like that.
But at that time there were lots of companies starting to be interested in GCC.
You had Cygnus Support doing contracting work for GCC and that kind of thing.
And they had lots of patches that they couldn't get into the mainline.
And after a while, they got annoyed by that. So they forked off a separate project called EGCS, or "eggs".
That's why you have the logo with the egg and the GNU on it on the GCC website now.
Oh, okay.
I remember it being called EGGs, but I don't remember that.
Okay. So the idea was to just fork GCC, do exactly as we hoped that the GCC team should have done.
And after a while, it was clear that the EGCS compiler was much more advanced than GCC.
And it got merged back into GCC, so the projects became one again.
You could say that the EGCS team took over.
Wow.
It's kind of like the promise, if you will,
of open source,
that anyone can take a project, fork it,
and make the changes,
and if the resulting code is better,
then that'll win. But for well-established large projects,
it's not something that you really see happen very often at all.
No. And I think that time, in the mid-90s, was essentially before
the internet took over. So development was still not that open in any project, because many of these projects shipped their source on CDs and that kind of thing, because
the internet was not capable enough to do much. So I think that in some sense it was
easier to do this fork at that time because you more or less had to do it.
What type of contributions did you make on the
EGCS fork before it was merged back?
My
work was mostly
just running the test suite and see when things
broke
and trying to figure out why
and fix that.
I also did some minor things for
NetBSD configuration stuff.
So your initial involvement, you said,
was because you were getting
broken compiles from your Haskell output.
Do you recall what was broken then?
Yeah.
This was actually a NetBSD problem
because I was working on SPARC
and there was some ABI mismatch between what GCC thought the NetBSD ABI was and what it was in reality.
So, I don't remember exactly what it was, but I think it was a calling convention issue, that one extra argument should be passed in a register instead of on the stack, or something like that.
And did you get the issue solved then?
Yeah.
Cool.
I'm kind of curious, like, you know, LLVM kind of is structured in this way to have
this clean distinction between front end and back end.
And I'm curious what the organization for GCC is.
Like, how hard is it to add a new back end to GCC?
Back end is quite easy.
So I think if you look at the commercial support for backends
that some companies provide,
I think they usually say one month of work
to create a stable port of a backend.
So getting something working takes no more than one or two days or something.
And then you can spend as much time as you want making it efficient and so on.
But just making something work is quite easy.
And that's something you have done, right?
Yeah, I have done it for a proprietary CPU a while back.
Is that anything that you can talk about?
Not really.
Okay.
So I can...
But in general, it's not that special
because these days it's quite easy to create a CPU.
So if you look, more or less all silicon
has some kind of special blocks inside
that have a proprietary CPU doing something.
So most of them are simple RISC machines,
but with some secret sauce for special instructions.
But in general, you have simple instructions for normal things,
for load/store, arithmetic and so on.
It's funny that you said anyone can create a CPU today, because I think the average person thinks of a CPU as just being this insurmountably whatever complex thing.
We've got three layers of cache and instruction reordering and branch prediction and all this craziness that's in what we think of as a modern CPU.
Yeah.
So these CPUs are more like CPUs looked in the late 80s, early 90s.
So there was a great blog series two years back or something.
I think Colin Riley wrote it. Can I add a link to
that blog series on this? Okay.
But he designed it from scratch and he wrote the blog posts as he developed the CPU. It's a little bit...
You can see his mistakes
and so on also, but it's really instructive
and it's rather simple.
Do people tend to implement
these as FPGAs or
from scratch with 7400
series logic chips or something?
When you
develop, you do it on FPGAs.
But
in the products they are included in the silicon. So
let's say, for example, the NVIDIA GPUs: they have publicly stated that they have their own
RISC CPU, and I think they have like six or seven of those in each GPU they manufacture.
Okay.
I also think that in the Intel CPU there are some... Isn't there an ARC CPU also doing some management stuff?
I've seen, yes, the management subsystem at Intel,
because there's been discussion about it having its own vulnerabilities
that there's no way for the host CPU to detect,
which sounds kind of scary.
Yeah.
But you end up with these kind of things
all over the place.
If you have a new mobile phone,
they are controlling the radio chipsets
and everything like that.
So there is an enormous number
of these kinds of small,
simple CPUs on the chips.
Taking the conversation maybe full circle, if I can,
if I wanted to go and design my own CPU from scratch
and then implement a GCC backend for it,
this is something you would think the average person could approach?
Average person, maybe a little bit of a stretch,
but a very determined person can definitely do it.
And so is it possible for you to give us an overview
of what's involved in adding this backend to GCC,
like at a high level, perhaps?
Yeah. So what GCC
does is compile down
to its internal
representation. So its low-level
internal representation looks
similar to the
LLVM one,
even though it's usually written in
a Lisp style
with lots of parentheses.
But you still
have an XOR, an add,
and so on.
One thing is that you need to write
a rule for each of them
to map the
XOR
internal representation node to your
XOR instruction.
Okay.
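The "one rule per internal representation node" idea can be pictured with a toy table. This is only a sketch under stated assumptions: the `Op` enum, the mnemonics and the `emit` helper are hypothetical, and GCC's real backends express such rules in machine description files, not in C++ tables like this one.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical IR operations, standing in for the compiler's internal nodes.
enum class Op { Add, Sub, Xor };

// One "rule" per operation: which target mnemonic implements it.
// The mnemonics belong to an imaginary RISC target.
const std::map<Op, std::string>& rules() {
    static const std::map<Op, std::string> table = {
        {Op::Add, "add"}, {Op::Sub, "sub"}, {Op::Xor, "xor"}};
    return table;
}

// Emit "mnemonic rD, rS1, rS2" for a three-register operation.
std::string emit(Op op, int rd, int rs1, int rs2) {
    auto it = rules().find(op);
    assert(it != rules().end());  // every IR operation needs a rule
    return it->second + " r" + std::to_string(rd) + ", r" +
           std::to_string(rs1) + ", r" + std::to_string(rs2);
}
```

With this table, `emit(Op::Xor, 1, 2, 3)` produces the text `xor r1, r2, r3`. An operation without a rule trips the assertion, which mirrors the point that each node needs a mapping.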
The other thing is that you need to tell
GCC about
how
your architecture works, so the number
of registers and that kind of thing.
So there are
a few macros you fill in
with the cost model
for different
options. The most interesting there is the addressing mode.
Because you have one macro telling which addressing modes are valid.
And then you have one telling the relative cost of them.
Because if you look at the special RISC instructions, one is that, let's say, you can index with a base, an index and a constant.
You may have different strange requirements on the size of the constant.
You may also be very interested in the size of your
instructions. So you may want to have the constant version much more expensive and that kind of
thing. So there are a few macros like that. There are also a few technical details that are more annoying to handle
because the compilation process goes in several steps.
So the compiler lowers the internal representation
to get closer and closer to your architecture.
And depending on constraints you have on your architecture
you may need to do some magic there.
So say that on top level you say that for example
all addressing modes for the constant are okay
and then late in the process you say that
oh, they must be divisible by two or something.
So that may be annoying depending on your architecture. But if you look at the simple
RISC architecture, like RISC-V or something like that, it's not that hard to figure out what's
happening. Okay. So that sounds like a good suggestion, starting from an existing target and kind of seeing what they've done there.
Yeah.
So choose one that is as close as possible to your architecture.
But if you start from scratch doing a simple architecture, I think RISC-V is probably roughly what you will end up with.
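The two addressing-mode questions described earlier (which modes are valid, and what each one costs relative to the others) can be sketched as a pair of functions. Everything here is an assumption made for illustration: the `Address` struct, the even-offset rule, the offset range and the cost numbers describe an imaginary target, and these are not GCC's actual macros or hooks.

```cpp
#include <cassert>

// Hypothetical address form: base register, optional index register,
// and a constant offset.
struct Address {
    bool has_index;
    int offset;
};

// Validity: the imaginary target only encodes even offsets in [-128, 126]
// and cannot combine an index register with a nonzero offset.
bool is_valid_address(const Address& a) {
    if (a.offset % 2 != 0) return false;
    if (a.offset < -128 || a.offset > 126) return false;
    if (a.has_index && a.offset != 0) return false;
    return true;
}

// Relative cost: a nonzero offset needs a longer encoding on this target,
// so it is priced higher to steer the compiler toward smaller code.
int address_cost(const Address& a) {
    assert(is_valid_address(a));
    int cost = 1;
    if (a.has_index) cost += 1;
    if (a.offset != 0) cost += 2;
    return cost;
}
```

Raising the cost of the constant form, as in the last branch, is the kind of tuning described above for targets where instruction size matters.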
Okay. So thinking about... you mentioned the Commodore 64 in your bio, and the
6502 is the CPU that it used, and, you know, that's one of the things that I like to play with
sometimes. And I looked at implementing a backend for LLVM, and it seems to be really angry
with you if you want to have 16-bit addressing on an 8-bit CPU that has no 16-bit registers.
And I've also noticed that GCC has no 6502 backend, even though both LLVM and GCC have other 8-bit CPUs. And I'm curious if you're at all familiar with the issues surrounding this
and why we haven't seen a backend for that particular 8-bit processor.
The major problem is the register file
that you only have one accumulator.
Okay.
So if you would have
a few general
registers that can
be used both doing
arithmetic and doing
indexing and so on, then it's
possible.
It's still a bit annoying because if you look at high-level optimizations,
they usually assume you have a number of registers.
Otherwise, loop unrolling and everything like that
raises the register pressure.
So you may need to go through all the other optimizations
and kind of throttle them in different ways.
Okay.
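A hand-written illustration of why unrolling raises register pressure, assuming nothing about what GCC actually generates for this code: the unrolled version keeps four partial sums live at once where the rolled loop needs one. The function names are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Rolled loop: a single accumulator is live across iterations.
long sum_rolled(const int* p, std::size_t n) {
    long s = 0;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}

// Unrolled by four: four accumulators are live at the same time, so a
// machine with very few registers would have to spill some of them.
long sum_unrolled4(const int* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += p[i];
        s1 += p[i + 1];
        s2 += p[i + 2];
        s3 += p[i + 3];
    }
    for (; i < n; ++i) s0 += p[i];  // leftover elements
    return s0 + s1 + s2 + s3;
}
```

On a target with a single accumulator, the second form gives the register allocator four values to juggle, which is the situation where such optimizations would need to be throttled.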
But in general, as long as you have a few general registers, then it's possible to do something.
But without that, it's not worth the effort of trying.
So I guess that's the lesson then for our listeners
who are interested in designing their own CPU
and then implementing a backend for it.
Make sure you have more than one general purpose register.
Yeah, and you should probably have like eight or something at least.
Because otherwise you need to spill for every instruction.
And in general, when you do a backend,
you assume that everything is sane.
And then you handle spilling and that kind of thing as special cases.
Right, not spilling as the normal case.
Right.
Okay.
So then you would need to have different structure on your register allocator and scheduler and everything like that.
And then it's much harder.
Right.
I wanted to interrupt this discussion for just a moment to bring you a word from our sponsors.
Backtrace is a debugging
platform that improves software quality,
reliability, and support by bringing deep
introspection and automation throughout the
software error lifecycle.
Spend less time debugging and reduce your mean time
to resolution by using the first
and only platform to combine symbolic debugging, error aggregation, and state analysis. At the time of error,
Backtrace jumps into action, capturing detailed dumps of application and environmental state.
Backtrace then performs automated analysis on process memory and executable code to classify
errors and highlight important signals such as heap corruption, malware, and much more.
This data is aggregated and archived in a centralized object store,
providing your team a single system to investigate errors across your environments.
Join industry leaders like Fastly, Message Systems, and AppNexus
that use Backtrace to modernize their debugging infrastructure.
It's free to try, minutes to set up, fully featured with no commitment necessary.
Check them out at backtrace.io slash cppcast.
I was wondering if we could maybe talk about
GCC and Clang for a little bit.
A lot of times we'll see like
micro-benchmarks and stuff show
Clang generating better code,
but GCC still seems to produce faster
binaries overall. I was wondering what your thoughts
are on that.
Yeah.
I have not looked in detail
exactly what's happening.
But
it doesn't sound
strange to me if you see that
kind of result.
Because
when you look at
complex code,
then it's lots of trade-offs between different optimizations.
You might have things that make it faster but take more registers, which adds spilling if you do it too aggressively.
And you run into that kind of hardware constraint.
So actually, my experience from doing compilers is mostly that I spend as much time limiting optimizations as I do implementing new optimizations.
Oh, interesting.
Could you give us an example?
For example, if you look at a high-end CPU,
so you have the loop optimizations.
It's one of the important things a compiler does.
But you need to do different choices
if you're
going to vectorize it or not.
Because if you want to vectorize
it, you want as straight line code as possible.
If you're not going to vectorize
it, you may want to move things
around to get as few
computations done in each
iteration of the loop as possible.
And this kind of
optimizations that move things around
is done before the loop optimizations
because, well, it needs to be simplified
as much as possible before
it starts to actually
do something with the loops.
So it's easy to destroy
for either of those cases
when you don't know
which choice the loop optimizer will take.
So one way to do it is obviously make two versions of the loop
and compile one as vectorization, one without,
and then choose the right one after.
But then it adds extra compilation time and memory usage
and everything like that.
So are there
cases where you do that, where you actually generate
both results and then to compare
which one is the better option?
I'm not
sure how
much GCC is doing that with loops
right now, but
GCC is adding more and more of this
kind of optimization.
But
it's not that uncommon,
I would guess.
Other
issues in compilation is also that
many optimizations
destroy information.
The obvious case is that if you have unrolled a loop,
it's very hard to re-roll it if you think it's a good idea later.
Even though that's not a useful example.
But let's say, for example, that you are doing lots of calculations
on 8-bit values.
Most CPUs are faster if you do it on the native size.
And the compiler can see that it can promote
all these 8-bit values to 32-bit values
and do the calculation.
Okay.
But if a later pass then looks at this,
then it cannot see that the sizes are constrained.
Because the earlier passes could see easily
that, okay, it's an 8-bit value,
so it has a value between 0 and 255.
But if you look at the code at later passes,
you see that, oh, it's an integer.
It can be any value.
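The loss of range information can be seen at the source level as well. This is a minimal sketch rather than a view into GCC's passes, and the function names are made up: the language already performs 8-bit arithmetic in `int`, but as long as the operand types are visible, the range of the result is still provable.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Arithmetic on 8-bit operands is performed in int: the language promotes
// the operands before the addition.
static_assert(
    std::is_same<decltype(std::uint8_t{} + std::uint8_t{}), int>::value,
    "uint8_t + uint8_t has type int");

// With the 8-bit types visible, the sum is provably in [0, 510].
int sum_u8(std::uint8_t a, std::uint8_t b) { return a + b; }

// The same computation after the operands have been widened: judging by
// the types alone, a and b could be any int, so the range fact is gone.
int sum_widened(int a, int b) { return a + b; }
```

Both functions compute the same result for inputs between 0 and 255. The difference is only in what a later analysis can conclude from the types it sees.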
So when you have analysis passes,
then you need to be
sure that you do not do this kind of optimization before your analysis passes.
Which also means that you may need to throttle optimizations in the top half of the compiler
until all invariants have been calculated and that information has been used.
You can, for example, if you look at the GCC debug dumps,
it keeps unreachable code for quite a lot of time just because it wants to have all the if statements available
so it can see, oh, it's unreachable if x is larger than zero.
Then it can use that information later.
That's interesting.
So how do you ask GCC for that debug information so we can see?
There are lots of -fdump flags.
So -fdump-tree-all is the most relevant.
I would have to try that.
There is lots of interesting information in that
that actually can be useful when you're optimizing your code.
For example, how important GCC thinks
the different parts of your code are.
So if it is hot code that is used much,
then GCC is more aggressive with inlining and loop unrolling and everything like that.
And if GCC thinks it will rarely be executed, then it does less of that kind of thing.
I think you blogged about this at some point not that long ago, about how a function
that is called from main takes on special meaning to the optimizer?
Yes, I did.
So
I also, so that is one of
the reasons because it knows that main will
only be called once. So if
you are doing stuff within main
then it will not
inline and so on because why bother when it's
called only once?
That's kind of funny, because if your entire
program then exists only in main,
then that would be a case
for inlining 100%
of it.
But also, if you have a loop
and so on in main, then of course it sees
that the loop iterates lots of times.
So the body of the loop will be fully
optimized.
Okay.
So I wrote another blog post just a few weeks back, I guess,
about more details about how GCC is reasoning
about how important code is.
I think I missed that one.
I'll have to check it out.
It's titled something with branch prediction,
which is maybe not that obvious from the title
of what this is about.
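Besides the compiler's own estimates, GCC lets source code state which paths are expected to be hot or cold. The following is a minimal sketch using two GCC extensions, the `cold` function attribute and `__builtin_expect`; the function names and the error convention are made up for illustration, and nothing here claims how the hints change the generated code for a particular target.

```cpp
#include <cassert>

// Hypothetical error path. The GCC "cold" attribute marks the function as
// unlikely to be executed; "noinline" keeps it out of the hot path.
__attribute__((cold, noinline)) int handle_error(int value) {
    (void)value;  // a real handler might log the offending value
    return -1;
}

int process(int value) {
    // __builtin_expect tells GCC which outcome to treat as the common one:
    // here, "value < 0" is expected to be false.
    if (__builtin_expect(value < 0, 0)) {
        return handle_error(value);
    }
    return value * 2;
}
```

Compiling such a file with `-fdump-tree-all`, the flag mentioned earlier, writes out the dump files in which these estimates can be inspected.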
So those of us who don't know a lot
about how the optimizer works,
we have this fantasy that we should just be able
to tell the compiler, you know what?
Just keep trying every optimization you possibly can for the next hour
and then give me the result. But it doesn't seem like anyone is implementing this feature. Do you
have any thoughts on that, comments on that?
That is a feature I also would like. But experience
seems to be that nobody really uses it if they have it anyway.
Everybody's complaining about compilation speed.
But it is not as
simple to do this as
you may think.
I saw some experiments in
LLVM a while back when someone ran
the
optimizations until
they couldn't optimize any longer.
And in benchmarking and so on, they didn't see
a real difference in performance.
But the programs became much larger.
That's disappointing.
But I've not seen any analysis of the data
so I'm not sure what's happening there.
But it's...
It may not be that surprising
because... Again, you have these trade-offs all the time
between size and optimizations, and you need at least to retune your compiler after running the
same pass multiple times, because as you do it multiple times, the profile will change.
So it maybe makes sense to be more aggressive with certain things and less aggressive with
other things when you get it on already optimized code.
So you just mentioned branch prediction a minute ago, and that made me think of profile
guided optimizations, particularly with what we were just talking about.
Is that anything that you can speak about, what that gains?
It, of course, depends very much on your application.
But I usually see about 10% improvement on average
when I run it on the program I have worked with,
which may or may not be representative.
So yeah, profile-guided optimization is one of the things
that I think are used much less than ideally,
because it helps a lot, but it's really annoying to use.
So I understand why it's not used in reality.
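For readers who have not tried it, the basic GCC workflow looks roughly like the comment at the top of this sketch. The flags `-fprofile-generate` and `-fprofile-use` are real GCC options; the file name, binary name and the function itself are placeholders chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Typical GCC profile-guided workflow (hot.cpp and hot are placeholders):
//   g++ -O2 -fprofile-generate hot.cpp -o hot   (instrumented build)
//   ./hot                                       (run a representative workload)
//   g++ -O2 -fprofile-use hot.cpp -o hot        (rebuild using the profile)

// A small hot loop with a data-dependent branch. How often the branch is
// taken depends on the input, which is exactly what a profile records.
long sum_above(const std::vector<int>& data, int threshold) {
    long total = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (data[i] > threshold) {
            total += data[i];
        }
    }
    return total;
}
```

The awkward part in practice is the middle step: the training run has to resemble real usage, which is one reason the technique is described above as annoying to use.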
So in the systems that you work on,
do you tend to make this as part of your automated build environment or anything?
I usually do it to get a feeling for what is possible to improve
and then modify the source code until I get roughly the same.
Interesting.
The project I've used it for has been with a very small hot part.
So you have a relatively small loop that's taking the majority of the time.
But I see that it can get 10% faster by profile guided optimization.
Then I see, try to figure out what the compiler has done
and then either change the code or maybe write it in assembly.
So is there one of those F dump flags that would tell us what the PGO did?
I used a combination of looking at that and actually looking at the
object code,
because there are
essentially two things the compiler does:
one is
different
inlining and unrolling
and the other one is
reordering
to get better flow through
the function.
That depends
also on what
CPU you're using, how important that is.
I have done it mostly on small
embedded platforms where
branches are expensive.
If it changes
a comparison,
so if it decides
that a condition is
the most
important one, it can switch things around
so that it becomes the default.
Okay.
So this
I often see in
the disassembled code.
But looking at inlining and so on, it is also quite easy to look at the symbols
that are called.
But the dumps do contain all this information and percentage numbers what the compiler figured
out.
So it's possible to look at those too. With Clang
in recent years coming out,
have you seen much changes
amongst GCC maintainers?
Are they becoming more competitive or
making any other changes in the way they
develop GCC in response to
Clang?
In some sense, I find it
hard to tell because
if you look from day to day,
everything is the same.
There are some small optimizations,
some small bug fixes and so on.
If you look at the big picture,
I think there is a rather big difference
in feeling that it feels like more is happening.
So when you have...
Before, we didn't have any competitions, really.
In the late 90s,
this was a Sun compiler we compiled against.
But after that,
we have been in a vacuum kind of way.
So it's hard to know if you can do something better.
I've done everything I find reasonable in this optimization,
and now I do something else.
But when it comes to another implementation that does things different,
then you may see that, hmm, maybe I missed something here.
So I think the general feeling is different,
but I have a hard time quantifying it.
So I guess something we haven't explicitly asked you yet is, what is your normal day-to-day involvement with GCC today? What aspects do you work on?
So I'm the NetBSD maintainer.
In that role I'm not doing that much, because, well, the support has bit-rotted over the years, because I've not really had the time to update it.
So what I've done lately is go through all configurations
and update it to the modern way of doing things.
Because GCC aims to be backward compatible in the source code.
So these configurations have not been changed in 15 years.
Wow.
So it works, but they have invented better ways of doing the configurations, and more options and so on.
So I'm going through that and using the modern ways instead of the deprecated mechanisms.
But otherwise I'm mostly compiling random programs, benchmarking them, looking at the assembly to see what's happening, and opening bug reports, and in a few cases fixing them myself.
But mostly, I'm just opening bug reports these days.
Well, that's important, I would imagine.
I think so.
Most people are not that interested in looking at big blobs of assembly and trying to figure out what's happening.
I find this kind of interesting.
It's interesting, I think, too, that you've mentioned NetBSD a couple times now.
And I think maybe our listeners don't appreciate the huge variety of CPUs and operating systems that GCC still supports today.
It has to dwarf any other compiler.
I believe so.
Even though, again, a lot of those architectures have bit-rotted over the years, because configuration has changed and cost models have changed.
So I think many of those generate much worse code these days than if you use a 10-year-old compiler.
Oh, interesting.
Just because it hasn't been maintained as well.
Yeah.
Because when you do, again, let's say better inlining, then you need to have a cost model for how much to inline.
And this cost model has been updated for Intel and ARM and so on, but nobody cares about updating the VAX backend.
Right.
Because it needs to take into account the different sizes of different instructions and so on.
So it is non-trivial work to do it.
So instead, the default is that all instructions cost the same.
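Editor's sketch: a toy model of why a stale "all instructions cost the same" default can lead to poor inlining decisions. This is not how GCC's inliner is implemented. The `Op` categories, the `CostModel` struct, the budget, and every number in `tuned_model` are made up purely to illustrate the reasoning.

```cpp
#include <cstddef>

// Hypothetical instruction categories for a toy target.
enum class Op { Add, Load, Mul, Div };

// Per-category cost. The fields and values are illustrative only.
struct CostModel {
    int add;
    int load;
    int mul;
    int div;
};

constexpr int cost_of(const CostModel& m, Op op) {
    switch (op) {
        case Op::Add:  return m.add;
        case Op::Load: return m.load;
        case Op::Mul:  return m.mul;
        case Op::Div:  return m.div;
    }
    return 1;  // not reached for valid Op values
}

// Estimated cost of a callee body given as a list of operations.
constexpr int body_cost(const CostModel& m, const Op* ops, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += cost_of(m, ops[i]);
    }
    return total;
}

// Inline only if the estimated body cost fits within the budget.
constexpr bool should_inline(const CostModel& m, const Op* ops,
                             std::size_t n, int budget) {
    return body_cost(m, ops, n) <= budget;
}

// Unmaintained target: every instruction costs the same.
constexpr CostModel uniform_model{1, 1, 1, 1};
// Maintained target: made-up numbers where a division is expensive.
constexpr CostModel tuned_model{1, 3, 4, 20};
```

For a callee consisting of a load, a divide, and an add, the uniform model estimates a cost of 3 and the tuned model 24, so with a budget of 10 the two models reach opposite decisions about the same code.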
So if someone wanted to start getting involved in compiler development, do you have a recommendation for maybe, I don't know, going back and trying to update one of these old targets? Would that be a good way for someone to learn, or do you have a different recommendation?
That depends much on what that person is interested in doing, because there is a very big difference between doing front-end work, optimizations, or back-end things.
But if you are interested in doing the detailed back-end work,
then I think there are good options there.
Okay.
What do you think C++ developers should know about writing efficient code that you've learned from working on GCC?
I would say that the most important thing is actually to use the tools correctly.
Because I have seen lots of discussions where people are obsessing over things like, should I pass by value or pass by reference, and so on.
And then they compile the code with -O.
That doesn't do that much optimization.
So one thing is, for GCC, when you are compiling your code, you should use -O2, -O3, or -Os, depending on what platform and so on you are working on.
And also -ffast-math and that kind of thing, if that's relevant for your codebase.
I would guess the same is true for Clang, and also that Microsoft probably has lots of interesting options for their compiler's optimizations.
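Editor's sketch: one way to check from inside the code which of these flags a translation unit was actually built with. It assumes GCC or Clang, which predefine `__OPTIMIZE__`, `__OPTIMIZE_SIZE__`, and `__FAST_MATH__`; on other compilers all three helpers simply return false. The helper names are invented for this example.

```cpp
// Compile-time report of how this translation unit is being built.
// Assumes GCC or Clang, which predefine:
//   __OPTIMIZE__       when optimizing (-O1 and above, including -Os)
//   __OPTIMIZE_SIZE__  when optimizing for size (-Os)
//   __FAST_MATH__      when -ffast-math is in effect
constexpr bool built_with_optimization() {
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}

constexpr bool optimized_for_size() {
#ifdef __OPTIMIZE_SIZE__
    return true;
#else
    return false;
#endif
}

constexpr bool built_with_fast_math() {
#ifdef __FAST_MATH__
    return true;
#else
    return false;
#endif
}
```

A benchmark harness could, for instance, print these three values at startup so that timing numbers are never compared across builds made with different flags.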
One other thing is link-time optimization, because that is easy to add.
You just pass -flto, and then it optimizes with knowledge of the whole program.
For certain ways of writing C++, that is really important.
Because devirtualization and that kind of thing can then see how your classes are used in different files, and change the virtual calls to concrete calls that can then be inlined and everything like that.
So it's just that simple?
Just pass -flto when we're linking the executable?
Yeah.
Also when compiling.
When compiling, okay.
There is also a nice option to make this parallel.
So if you do -flto= with the number of cores you have on your machine, it can parallelize the link-time optimization step.
Okay.
It makes this much faster, and it takes more memory.
So it's a trade-off there.
And again, this is true for both GCC and LLVM.
Both of those projects have spent lots of work making this efficient.
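Editor's sketch: the kind of code where link-time optimization can pay off. The three parts below are shown in one block but are meant to be read as separate files; the file names, the `Shape`/`Square` classes, and `total_area` are all hypothetical. Whether the virtual call is actually devirtualized depends on the compiler version and on what the whole program looks like.

```cpp
#include <cassert>
#include <cstddef>

// ---- shape.h (hypothetical) ----
struct Shape {
    virtual ~Shape() = default;
    virtual int area() const = 0;
};

// ---- square.h / square.cpp (hypothetical) ----
// `final` tells the compiler that nothing derives from Square.
struct Square final : Shape {
    explicit Square(int side) : side_(side) {}
    int area() const override { return side_ * side_; }
private:
    int side_;
};

// ---- total.cpp (hypothetical) ----
// Compiled on its own, this file only sees Shape, so area() stays an
// indirect call. With -flto the optimizer also sees which classes exist
// in the rest of the program and may turn it into a direct, inlinable call.
int total_area(const Shape* const* shapes, std::size_t n) {
    assert(shapes != nullptr || n == 0);
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += shapes[i]->area();
    }
    return total;
}

// Possible GCC build, passing -flto both when compiling and when linking
// (main.cpp is assumed to exist; 4 stands in for the number of cores):
//   g++ -O2 -flto -c square.cpp total.cpp main.cpp
//   g++ -O2 -flto=4 square.o total.o main.o -o demo
```

In a two-file embedded project of the kind described next, the compiler already sees everything, so a split like this mostly matters for larger programs.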
So what kind of performance increases have you seen using LTO?
I have not seen that much in my projects, because I usually do not write code in a way that is helped by LTO.
Okay.
Because this means that you need to inline between different files and that kind of thing.
I usually structure my projects so the compiler already has all the knowledge.
Okay.
But the projects I work on are mostly small embedded systems.
But it's like a two-file project.
Right.
So that doesn't matter.
Or it's big compilers.
There, for various reasons, we are not using LTO.
It probably should help there.
So if you have, like, one .cpp file with a gigantic header-only library, you would not expect to see a bunch of improvement, but if you have several statically linked libraries, then maybe?
Yeah. So, well, everything must be compiled with LTO, so it depends on what you mean by statically linked.
Okay.
But let's say you are compiling Firefox and that kind of thing.
There you will see a big difference, if I understand it correctly, from the benchmarks I've seen.
Okay.
You recently published an article also on volatile,
and that's something that seems to be completely misunderstood.
Do you want to speak about that at all?
I think there could be a whole one-hour episode about that.
So I'm not sure how...
The problem with volatile is that the problem it is trying to solve is accessing hardware, and that is not really how most people want to use it.
And the standard is written in a way that is very hand-wavy, because all hardware works differently and so on.
So there are lots of problems there, in how to interpret the standard.
And compiler writers like to see it like, well, we know if you are touching hardware or not, because if we are storing on the stack, we know that it's not hardware, so why bother with all this mess of handling it all the time?
And normal developers want to see it as: I make a store, so I want a store.
But most compilers do what the users expect.
Although it's very hard to test, because you do not really see a difference unless you check exactly what loads and stores are being done at runtime.
So it's easy for compilers to introduce bugs where they sometimes optimize away volatile loads and stores, especially on the stack.
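Editor's sketch: what "I make a store, so I want a store" looks like in code. The functions below take a pointer to a device register; the names `sum_reads` and `store_twice` and the idea of a 32-bit status register are assumptions made for the example. On real hardware the pointer would come from a platform-specific address.

```cpp
#include <cassert>
#include <cstdint>

// Reads a (hypothetical) device register `count` times and sums the values.
// Because the pointee is volatile, each iteration has to perform its own
// load; the compiler may not fold the loop into count * (*reg).
inline std::uint32_t sum_reads(const volatile std::uint32_t* reg,
                               unsigned count) {
    assert(reg != nullptr);
    std::uint32_t sum = 0;
    for (unsigned i = 0; i < count; ++i) {
        sum += *reg;
    }
    return sum;
}

// Writes the same value twice. Through a non-volatile pointer the first
// store would be dead and could be removed; through a volatile pointer
// both stores are expected to be emitted.
inline void store_twice(volatile std::uint32_t* reg, std::uint32_t value) {
    assert(reg != nullptr);
    *reg = value;
    *reg = value;
}
```

A unit test can pass the address of an ordinary `volatile` stack variable in place of a register, but that only checks the resulting values. It cannot observe how many loads and stores were emitted, which is the testing difficulty described above; for that, the generated assembly has to be inspected.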
Okay.
I don't think I have any more questions. Jason, do you have anything?
No, I don't think so.
Okay. So, Krister, where can people find you online, maybe read some more of your blog posts?
Yeah, my blog is at kristerw.blogspot.com, and otherwise I'm on Twitter as @kwalfridsson.
Okay, it's been great having you on the show today.
It definitely sounds like you have a lot of interesting content on your blog.
I'd encourage listeners to go check it out.
Yeah, it's been great being here.
Thanks for joining us.
Thank you.
Bye.
Okay.
Thanks so much for listening in as we chat about C++.
I'd love to hear what you think of the podcast.
Please let me know if we're discussing the stuff you're interested in.
Or, if you have a suggestion for a topic, I'd love to hear about that too.
You can email all your thoughts to feedback@cppcast.com. I'd also appreciate it if you like CppCast on Facebook and follow CppCast on Twitter.
You can also follow me at @robwirving and Jason at @lefticus on Twitter. And of course, you can find all that info and the show notes on the podcast website at cppcast.com.
Theme music for this episode is provided by podcastthemes.com.