CppCast - GCC Compiler Development

Episode Date: August 3, 2017

Rob and Jason are joined by Krister Walfridsson to talk about some of his contributions to the GCC Compiler. Krister got introduced to low-level programming by the C64/Amiga demo scene in the '80s. This led to an interest in operating systems and compilers, and he has been involved in the NetBSD and GCC projects for more than 20 years. His career has been split between OS-level development on embedded platforms and compiler development, and he most enjoys working with "strange" custom-made architectures.

News: libq; Metaclasses: Thoughts on generative C++; 6 Reasons Why We Distribute C++ Libraries as Source Code; Undefined Behavior in 2017

Krister Walfridsson: @kwalfridsson; Krister Walfridsson's Blog

Links: Why volatile is hard to specify and implement; Branch prediction; Designing a CPU in VHDL, Part 1: Rationale, tools, methods

Sponsors: Backtrace

Hosts: @robwirving @lefticus

Transcript
Starting point is 00:00:00 This episode of CppCast is sponsored by Backtrace, the turnkey debugging platform that helps you spend less time debugging and more time building. Get to the root cause quickly with detailed information at your fingertips. Start your free trial at backtrace.io slash cppcast. CppCast is also sponsored by CppCon, the annual week-long face-to-face gathering for the entire C++ community. Get your ticket today. Episode 112 of CppCast with guest Krister Walfridsson, recorded July 31st, 2017. In this episode, we talk about Herb Sutter's metaclass proposal. And we talk to Krister Walfridsson. Krister talks to us about the contributions he's made to GCC. Welcome to episode 112 of CppCast, the only podcast for C++ developers by C++ developers. I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Starting point is 00:01:31 Jason, how are you doing today? I'm doing pretty well. How about you? I'm doing okay. Got my laptop back, so that crisis has been averted. Oh, right. New LCD screen and everything. Right. You said you could only see the top quarter of it or something. Yeah, no, the top inch or two. And yeah, they sent me a new one with a new LCD screen, and they replaced the heat sink in it as well. Not sure what was wrong with the heat sink, but
Starting point is 00:01:55 maybe the CPU overheated and melted the connection to the screen. Let's go with that. That's a great explanation. Any news with you? No, nothing at the moment. Just thinking about all the conferences I have coming up. Yeah, it's going to be a busy fall for you with Pacific++ and Meeting C++ and CppCon. Yes, two days of training and four talks between three conferences. No, five talks between three conferences. Yeah, five talks between three conferences. I should probably remember to do all of them, I guess.
Starting point is 00:02:34 Well, at the top of our episode, I'd like to read a piece of feedback. This week we got a tweet, I guess in reference to some of our recent episodes focusing on concurrency. This listener wrote in, Hey guys, I was wondering whether you've heard of libq.io yet and may want to make an episode on that. So we definitely haven't done
Starting point is 00:02:52 an episode on libq.io. I had not heard of it before this tweet. Jason, have you heard of this before? I don't think so. Yeah, so I'll put the link in the show notes, but it's a C++ library that implements continuations, called promises. So it definitely fits in with some of our recent topics on coroutines
Starting point is 00:03:14 and the future of futures or possible lack thereof. And that's a bold claim right on the front of the website here. You're programming multithreaded C++? You're probably doing it wrong. Yeah, I'm sure lots of people are. Definitely a library worth checking out, and maybe we'll see if we can get in touch with one of the authors and set up an interview.
Starting point is 00:03:34 Okay. So we'd love to hear your thoughts about the show as well. You can always reach out to us on Facebook, Twitter, or email us at feedback at cppcast.com, and don't forget to leave us a review on iTunes. Joining us today is Krister Walfridsson. Krister got introduced to low-level programming by the C64/Amiga demo scene in the '80s. This led to an interest in operating systems and compilers,
Starting point is 00:03:56 and he has been involved in the NetBSD and GCC projects for more than 20 years. His career has been split between OS-level development on embedded platforms and compiler development, and he most enjoys working with strange custom-made architectures. Christo, welcome to the show. Thank you. You know, it had to be an exciting time with DemoScene in the 80s. In a way, I feel like I was born like a decade too late because I missed out on some of that stuff. Yeah, I was probably a little bit early.
Starting point is 00:04:27 So my peak was in 1989, and that was when everything started being big. But then I started university and didn't have time for the demo scene. That's unfortunate. But you were in the right part of the world for it, if I get that right. Like, Scandinavia and Europe were big on the demo scene. Yeah, definitely.
Starting point is 00:04:54 Okay, well, we got a couple of news articles to talk about, Krister. Feel free to comment on any of these, and then we'll start talking more about your work with GCC and other projects, okay? Yep. Okay, so this first article is from Herb Sutter's blog, and it's an article titled Thoughts on Generative C++ with Metaclasses.
Starting point is 00:05:13 I think we probably mentioned his proposal a while back when we first heard about it, but he did do an ACCU talk at the ACCU 2017 conference, and that video is now available on YouTube, which I really need to watch because metaclasses sound super interesting, but I haven't gotten around to it yet. Jason, what were your thoughts on this? Well, I also have not yet gotten around to watching the video. And just to make sure our listeners are following along here. When he gave that keynote at ACCU, he asked the conference organizers to not release the video immediately. So that was the one video from ACCU 2017 that had not yet been released.
Starting point is 00:05:54 But after the last standards committee meeting, the video got released, he published his paper and it made everything public. And it seems like he got some great feedback from the committee that they really liked this proposal. And the whole idea is being able to more effectively generate new types at compile time in C++. And his proposal does depend on reflection in C++, right? So kind of going to come after any reflection support
Starting point is 00:06:24 that we hopefully will be getting in C++, right? So kind of going to come after any reflection support that we hopefully will be getting in C++ 20. Yeah. And I honestly am a bit confused on that particular detail. I don't know if this would completely supplant the other reflection proposals because it seems to do a lot of overlap. I could be wrong. My understanding was the idea of metaclasses was this is something we could do once we have reflection. But maybe I'm wrong on that. Does Krister have any input?
Starting point is 00:06:55 I also think that he assumes that the reflection stuff will go in. So this is more the next level on top of that. Right. Okay. Yeah, well, we'll have the link to the article and the link to the ACCU talk in the show notes, and it definitely seems like it'd be worth watching. I'll get around to that myself this week, probably. Right. The next article is 6 Reasons Why We Distribute C++ Libraries as Source Code. And this comes from the Buckaroo team, which is a new C++ package manager that I think we mentioned a while back.
Starting point is 00:07:34 And yeah, why distribute as source code? Because it's cross-platform, and it's not always easy to distribute built libraries for all the different platforms that are out there. Kind of makes sense. Yeah, and their comment that compiler flags need to be properly supported, like mismatched compiler flags between your build and a library that you're linking to, can cause huge headaches with C++.
Starting point is 00:07:59 So I totally agree with that comment. Right. Krister, what were your thoughts on this as a GCC developer? I think this is a reasonable way to do it. If you look at NetBSD, we have pkgsrc, and there we build everything from source. So I think that's a really good thing in open source software when you have all the source available.
Starting point is 00:08:28 So why not take advantage of it? Especially as in NetBSD we have all these old architectures and so on. So there are not always build servers available to ensure everything is available for VAX and so on. But if you have a VAX, you can build it yourself. You just install the package and the package manager will compile it on your system automatically.
Starting point is 00:08:55 Yeah. And Jason, I think I'm going to let you introduce this last article on undefined behavior. Yeah, so John Regehr, I think that's how you pronounce his name, does a lot of research on undefined behavior and code correctness. And he wrote a very extensive article
Starting point is 00:09:18 on status of undefined behavior in 2017. If you want to know anything about the address sanitizer, undefined behavior sanitizer, what they're capable of, how to defeat them, read this article. I'm not sure if I've ever seen a list of the types of undefined behavior before.
Starting point is 00:09:37 I'm just not sure if I've ever seen it so well quantified as it has been in this article. This is an annex in the C standard. So all of them are listed there. Yeah, and he points out that there is no comparable list for C++ of all the types of undefined behavior, but right, the C standard does list it out. I know we talked with Patrice last week
Starting point is 00:10:02 about the undefined behavior subgroup in the C++ committee. I'm surprised... I mean, maybe they are working on a similar list for C++. It seems like it'd be worth having. Yeah. That's such a long article. Yeah, really good article. And like we said, it identifies some of the different types of undefined behavior and how you could debug some of those types of undefined behavior using some of the tools like sanitizers and things like Valgrind, right? Yeah. And Chandler has done a talk
Starting point is 00:10:39 on undefined behavior in previous years of CppCon, and I believe that we may have one or more coming at CppCon 2017 also. Okay, so Krister, do you want to tell us a little bit more about how you got involved in the development of GCC? Sure. So as I said, I did some low-level development before I started university. But I more or less stopped programming at university because my major was mathematics.
Starting point is 00:11:12 So I spent time there. And one thing that interested me was, well, proving things about programs. And Haskell came around that time and it promised that you could actually do mathematics and run it. The reality was not really that great. But anyway, I got excited by Haskell, started looking at that.
Starting point is 00:11:42 And my compiler at that time compiled down from Haskell to C and then used GCC to compile the C code. And GCC miscompiled my beautiful Haskell programs. So I needed to start looking at what was happening there. So that is how I came into contact with GCC. And well, I have not come back to Haskell after that. I'm still at the GCC level. So if you don't mind my asking, approximately when was that, at the rise of Haskell and when you started with GCC?
Starting point is 00:12:20 Mid-90s. Okay. I'm not completely sure which year, but 96, I would guess. That's about the same year that I was first using GCC also. 95. So you mentioned in an email that you sent us that you got involved with the EGCS fork initially. Is that correct? Yeah.
Starting point is 00:12:46 Now, I'm guessing most of our listeners aren't familiar with that part of the history of GCC. It was a while ago, and I only vaguely remember it myself. Would you mind giving us some background? Yeah. So, the GCC development was not that extensive at that time. There was one maintainer doing most of the work, and the first time I looked at GCC, the frequently asked question was, how do I get involved in GCC? And the answer was, we don't
Starting point is 00:13:15 need help, more or less. Not stated that strongly, but the general idea seemed to be like that. But at that time there were lots of companies starting to be interested in GCC. You had Cygnus Support doing contracting work for GCC and that kind of thing. And they had lots of patches that they couldn't get into the mainline. And after a while, they got annoyed by that. So they forked off a separate project, EGCS, or "eggs" as it was called.
Starting point is 00:13:50 That's why you have the logo with the egg and the GNU on it on the GCC website now. Oh, okay. I remember it being called eggs, but I don't remember that. Okay. So the idea was to just fork GCC and do exactly what we hoped the GCC team would have done. And after a while, it was clear that the EGCS compiler was much more advanced than GCC. And it got merged back into GCC and those projects merged. You could say that the EGCS team took over. Wow.
Starting point is 00:14:34 It's kind of like the promise, if you will, of open source, that anyone can take a project, fork it, and make the changes, and if the resulting code is better, then that'll win. But for well-established large projects, it's not something that you really see happen very often at all. No, and I think that time, in the mid-90s, was essentially before the internet took over. So it was still not that open a development process in any project, because many of these projects shipped their open source on CDs and that kind of thing, because
Starting point is 00:15:10 the internet was not capable enough to do too much. So I think that in some sense it was easier to do this fork at that time, because you more or less had to do it. What type of contributions did you make on the EGCS fork before it was merged back? My work was mostly just running the test suite and seeing when things broke
Starting point is 00:15:37 and trying to figure out why and fix that. I also did some minor things for NetBSD configuration stuff. So your initial involvement, you said, was because you were getting broken compiles from your Haskell output. Do you recall what was broken then?
Starting point is 00:15:58 Yeah. This was actually a NetBSD problem, because I was working on SPARC, and there was some ABI mismatch between what GCC thought the NetBSD ABI was and what it was in reality. So there were, I don't know exactly what it was, but I think it was a calling convention issue, that one extra argument should be passed in a register instead of on the stack or something like that. And did you get the issue solved then? Yeah. Cool.
Starting point is 00:16:30 I'm kind of curious, like, you know, LLVM kind of is structured in this way to have this clean distinction between front end and back end. And I'm curious what the organization for GCC is. Like, how hard is it to add a new back end to GCC? Back end is quite easy. So I think if you look at the commercial support for backends that some companies provide, I think they usually say one month of work
Starting point is 00:16:55 to create a stable port of a backend. So get something working takes not more than one or two days or something. And then you can spend how much time you want to make it efficient and so on. But just make something working is quite easy. And that's something you have done, right? Yeah, I had done it for a proprietary CPU a while back. Is that anything that you can talk about? Not really.
Starting point is 00:17:26 Okay. So I can... But in general, it's not that special, because these days it's quite easy to create a CPU. If you look at it, more or less all silicon has some kind of special blocks inside that have a proprietary CPU doing something. So most of them are simple RISC machines,
Starting point is 00:17:54 but with some secret sauce for special instructions. But in general, you have simple instructions for normal things: load, store, arithmetic, and so on. It's funny that you said anyone can create a CPU today, because I think the average person thinks of a CPU as just being this insurmountably complex thing. We've got three layers of cache and instruction reordering and branch prediction and all this craziness that's in what we think of as a modern CPU. Yeah. So these CPUs are more like how CPUs looked in the late '80s, early '90s. So there was a great blog series two years back or something.
Starting point is 00:18:42 I think Colin Riley wrote it. We can add a link to that blog series in the show notes. Okay. But he designed it from scratch and he wrote the blogs as he developed the CPU. It's a little bit... You can see his mistakes and so on also, but it's really instructive and it's rather simple. Do people tend to implement these as FPGAs or
Starting point is 00:19:16 from scratch with 7400 series logic chips or something? When you develop, you do it on FPGAs. But in the products, they've been included in silicon. So let's say, for example, the NVIDIA GPUs: they have publicly stated that they have their own RISC CPU, and I think they have like six or seven different ones of those in each GPU they manufacture.
Starting point is 00:19:46 Okay. I also think that in the Intel CPU, that there are some... Isn't there an Arc CPU also doing some management stuff? I've seen, yes, the management subsystem at Intel, because there's been discussion about it having its own vulnerabilities that there's no way for the host CPU to detect, which sounds kind of scary. Yeah. But you end up with these kind of things
Starting point is 00:20:13 all over the place. If you have a new mobile phone, they are controlling the radio chipsets and everything like that. So there is an enormous number of these kinds of small, simple CPUs on the chips. Taking the conversation maybe full circle, if I can,
Starting point is 00:20:34 if I wanted to go and design my own CPU from scratch and then implement a GCC backend for it, this is something you would think the average person could approach? Average person, maybe a little bit of stretch, but a very determined person can definitely do it. And so is it possible for you to give us an overview of what's involved in adding this backend to GCC, like at a high level, perhaps?
Starting point is 00:21:03 Yeah. So what GCC does is, it compiles down to its internal representation. So its low-level internal representation looks similar to the LLVM one, even though it's usually written in
Starting point is 00:21:19 a Lisp style with lots of parentheses. But you still have an XOR, an add, and so on. One thing is that you need to write a rule for each of them to map the
Starting point is 00:21:35 XOR internal representation node to your XOR instruction. Okay. The other thing is that you need to tell GCC about how your architecture works, so the number
Starting point is 00:21:52 of registers and that kind of thing. So there are a few macros you fill in with the cost model for different options. The most interesting there is the addressing modes. Because you have one macro telling which addressing modes are valid. And then you have one telling the relative cost of them.
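To make the instruction-mapping side concrete, here is a hedged sketch of what one of those Lisp-style rules looks like in a GCC machine description (.md) file; the operand constraints and the three-register xor syntax assume a made-up RISC-like target rather than any particular port:

```
;; Hypothetical pattern for an assumed target whose instruction is "xor rd,rs1,rs2".
;; It maps GCC's internal (xor ...) node for 32-bit values (mode SI) to that instruction.
(define_insn "xorsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (xor:SI (match_operand:SI 1 "register_operand" "r")
                (match_operand:SI 2 "register_operand" "r")))]
  ""
  "xor\t%0,%1,%2")
```

Patterns like that sit alongside the macros just mentioned for the register file, the legal addressing modes, and their relative costs.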
Starting point is 00:22:20 Because if you look at the special RISC instructions, one is that, let's say, you can index with a base, an index, and a constant. You may have different strange requirements on the size of the constant. You may also be very interested in the size of your instructions. So you may want to have the constant version much more expensive and that kind of thing. So there are a few macros like that. There are also a few technical details that are more annoying to handle, because the compilation process goes in several steps. So the compiler lowers the internal representation to get closer and closer to your architecture.
Starting point is 00:23:21 And depending on constraints you have on your architecture you may need to do some magic there. So say that on top level you say that for example all addressing modes for the constant are okay and then late in the process you say that oh, they must be divisible by two or something. So that may be annoying depending on your architecture. But if you look at the simple RISC architecture, like RISC-V or something like that, it's not that hard to figure out what's
Starting point is 00:23:57 happening. Okay. So that sounds like a good suggestion, starting from an existing target and kind of seeing what they've done there. Yeah. So choose one that is as close as possible to your architecture. But if you start from scratch doing a simple architecture, I think RISC-V is probably roughly what you will end up with. Okay. So thinking about it, you mentioned the Commodore 64 in your bio, and the 6502 is the CPU that it used. And, you know, that's one of the things that I like to play with sometimes, and I looked at implementing a backend for LLVM, and it seems to be really angry with you if you want to have 16-bit addressing on an 8-bit CPU that has no 16-bit registers.
Starting point is 00:24:51 And I've also noticed that GCC has no 6502 backend, even though both LLVM and GCC have other 8-bit CPUs. And I'm curious if you're at all familiar with the issues surrounding this and why we haven't seen a backend for that particular 8-bit processor. The major problem is the register file, in that you only have one accumulator. Okay. So if you would have a few general registers that can
Starting point is 00:25:15 be used both for doing arithmetic and doing indexing and so on, then it's possible. It's still a bit annoying, because if you look at high-level optimizations, they usually assume you have a number of registers. Otherwise, loop unrolling and everything like that raises the register pressure.
Starting point is 00:25:39 So you may need to go through all the other optimizations and kind of throttle them in different ways. Okay. But in general, as long as you have a few general registers, then it's possible to do something. But without that, it's not worth the effort of trying. So I guess that's the lesson then for our listeners who are interested in designing their own CPU and then implementing a backend for it.
Starting point is 00:26:09 Make sure you have more than one general purpose register. Yeah, and you should probably have like eight or something at least. Because otherwise you need to spill for every instruction. And in general, when you do a backend, you assume that everything is sane. And then you handle spilling and that kind of thing as special cases. Right, not spilling as the normal case. Right.
Starting point is 00:26:32 Okay. So then you would need to have different structure on your register allocator and scheduler and everything like that. And then it's much harder. Right. I wanted to interrupt this discussion for just a moment to bring you a word from our sponsors. Backtrace is a debugging platform that improves software quality, reliability, and support by bringing deep
Starting point is 00:26:56 introspection and automation throughout the software error lifecycle. Spend less time debugging and reduce your mean time to resolution by using the first and only platform to combine symbolic debugging, error aggregation, and state analysis. At the time of error, Backtrace jumps into action, capturing detailed dumps of application and environmental state. Backtrace then performs automated analysis on process memory and executable code to classify errors and highlight important signals such as heap corruption, malware, and much more.
Starting point is 00:27:23 This data is aggregated and archived in a centralized object store, providing your team a single system to investigate errors across your environments. Join industry leaders like Fastly, Message Systems, and AppNexus that use Backtrace to modernize their debugging infrastructure. It's free to try, minutes to set up, fully featured with no commitment necessary. Check them out at backtrace.io slash cppcast. I was wondering if we could maybe talk about GCC and Clang for a little bit.
Starting point is 00:27:51 A lot of times we'll see like micro-benchmarks and stuff show Clang generating better code, but GCC still seems to produce faster binaries overall. I was wondering what your thoughts are on that. Yeah. I have not looked in detail
Starting point is 00:28:08 exactly what's happening. But it doesn't sound strange to me if you see that kind of result. Because when you look at complex code,
Starting point is 00:28:24 then there are lots of trade-offs between different optimizations. You might have things that make it faster, but take more registers, which adds spilling if you do it too aggressively. And you have to fit that kind of thing into the hardware constraints. So actually, my experience from doing compilers is mostly that I spend as much time limiting optimizations as I do implementing new optimizations. Oh, interesting. Could you give us an example? For example, if you look at a high-end CPU, so you have the loop optimizations.
Starting point is 00:29:17 It's one of the important things a compiler does. But you need to make different choices depending on whether you're going to vectorize it or not. Because if you want to vectorize it, you want as straight-line code as possible. If you're not going to vectorize it, you may want to move things
Starting point is 00:29:36 around to get as little computation as possible done in each iteration of the loop. And this kind of optimization that moves things around is done before the loop optimizations, because, well, it needs to be simplified as much as possible before
Starting point is 00:29:52 it starts to actually do something with the loops. So it's easy to destroy for either of those cases when you don't know which choice the loop optimizer will take. So one way to do it is obviously to make two versions of the loop and compile one with vectorization, one without,
Starting point is 00:30:20 and then choose the right one after. But then it adds extra compilation time and memory usage and everything like that. So are there cases where you do that, where you actually generate both results and then compare which one is the better option? I'm not
Starting point is 00:30:38 sure how much GCC is doing that with loops right now, but GCC are adding more and more of this kind of optimization. But it's not that uncommon, I would guess.
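As a rough illustration of the straight-line-loop point (a hedged sketch; the function and the __restrict hints are made up for the example), a loop like this is the easy case for the vectorizer, while putting branches inside the body forces the trade-offs just described:

```cpp
// Hypothetical example: no cross-iteration dependence and no branching in the
// body, so at -O3 GCC's loop vectorizer can typically turn this into SIMD code.
void scale(float* __restrict dst, const float* __restrict src, int n, float k) {
    for (int i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}
```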
Starting point is 00:30:56 Another issue in compilation is that many optimizations destroy information. The obvious case is that if you have unrolled a loop, it's very hard to re-roll it if you think it's a good idea later. Even though that's not a useful example. But let's say, for example, that you are doing lots of calculations
Starting point is 00:31:19 on 8-bit values. Most CPUs are faster if you do it on the native size. And the compiler can see that it can promote all these 8-bit values to 32-bit values and do the calculation. Okay. But if a later pass then looks at this, then it cannot see that the sizes are constrained.
Starting point is 00:31:42 Because the earlier passes could see easily that, okay, it's an 8-bit value, so it has a value between 0 and 255. But if you look at the code in later passes, you see that, oh, it's an integer. It can be any value. So when you have analysis passes, then you need to be
Starting point is 00:32:06 sure that you do not do this kind of optimization before your analysis passes. Which also means that you may need to throttle optimizations in the top half of the compiler until all invariants have been calculated and then use that information. You can, for example, if you look at the GCC debug dumps, see that it keeps unreachable code around for quite a lot of time, just because it wants to have all the if statements available so it can see, oh, it's unreachable if x is larger than zero. Then it can use that information later. That's interesting.
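A minimal sketch of that destroyed-information point (the function is made up for illustration): by the usual promotion rules the uint8_t operands are widened to int before the add, so passes that still see the 8-bit origin know the result fits in [0, 510], while a pass that only sees plain int arithmetic has lost that bound.

```cpp
#include <cstdint>

// Hypothetical example: both operands are promoted to int for the arithmetic.
// Early analysis knows a + b is at most 510; once the IR just says "int",
// a later pass can no longer derive that range on its own.
int sum_bytes(std::uint8_t a, std::uint8_t b) {
    return a + b;
}
```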
Starting point is 00:32:48 So how do you ask GCC for that debug information so we can see? There are lots of fdump flags. So -fdump-tree-all is the most relevant. I would have to try that. There is lots of interesting information in there that actually can be useful when you're optimizing your code. For example, how important GCC thinks the different parts of your code are.
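For reference, a hedged sketch of what that looks like in practice (the file name is hypothetical; -fdump-tree-all is GCC's documented family of per-pass dump options, and variants of the flag can also print the block frequencies GCC estimated):

```cpp
// Compile with the dump flags and inspect the per-pass files, for example:
//   g++ -O2 -c -fdump-tree-all dump_demo.cpp
// GCC then writes files such as dump_demo.cpp.*.optimized showing the
// intermediate representation after each pass.
int clamp_to_byte(int x) {
    if (x < 0)   return 0;     // with the right dump you can see how GCC weights each branch
    if (x > 255) return 255;
    return x;
}
```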
Starting point is 00:33:18 So if it is hot code that is used much, then GCC is more aggressive with inlining and loop unrolling and everything like that. And if GCC thinks it's less likely to be executed, that kind of thing. I think you blogged about this at some point not that long ago, about how a function being called from main takes on special meaning to the optimizer? Yes, I did. So that is one of the reasons, because it knows that main will
Starting point is 00:34:11 only be called once. So if you are doing stuff within main, then it will not inline and so on, because why bother when it's called only once? That's kind of funny, because if your entire program then exists only in main,
Starting point is 00:34:28 then that would be a case for inlining 100% of it. But also, if you have a loop and so on in main, then of course it sees that the loop iterates lots of times. So the body of the loop will be fully optimized.
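On that theme of GCC weighing how hot code is, here is a small hedged sketch of how a programmer can feed the branch-probability machinery by hand (the function is made up; __builtin_expect is GCC's documented builtin for this):

```cpp
#include <cstdlib>

// Hypothetical example: mark the null-pointer path as unlikely so GCC lays it
// out away from the hot path and optimizes the common case.
void bump(int* p) {
    if (__builtin_expect(p == nullptr, 0)) {   // "expect this condition to be false"
        std::abort();
    }
    ++*p;
}
```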
Starting point is 00:34:43 Okay. So I wrote another blog post just a few weeks back, I guess, about more details about how GCC is reasoning about how important code is. I think I missed that one. I'll have to check it out. It's titled "Branch prediction" or something, which is maybe not that obvious from the title
Starting point is 00:35:10 of what this is about. So those of us who don't know a lot about how the optimizer works, we have this fantasy that we should just be able to tell the compiler, you know what? Just keep trying every optimization you possibly can for the next hour and then give me the result. But it doesn't seem like anyone is implementing this feature. Do you have any thoughts on that, comments on that? That is a feature I also would like, but experience
Starting point is 00:35:40 seems to be that nobody really uses it if they have it anyway. Everybody's complaining about compilation speed. But it is not as simple to do this as you may think. I saw some experiments in LLVM a while back when someone ran the
Starting point is 00:36:02 optimizations until they couldn't optimize any longer. And in benchmarking and so on, they didn't see a real difference in performance. But the programs become much larger. That's disappointing. But I'm not seeing any analysis of the data so I'm not sure what's happening there.
Starting point is 00:36:20 But it's... It may not be that surprising because... Again, you have these trade-offs all the time between size and optimizations, and you need at least to retune your compiler after running the same pass multiple times, because as you do it multiple times, the profile will change. So it maybe makes sense to be more aggressive with certain things and less aggressive with other things when you get it on already optimized code. So you just mentioned branch prediction a minute ago, and that made me think of profile
Starting point is 00:37:01 guided optimizations, particularly with what we were just talking about. Is that anything that you can speak about, what that gains? It, of course, depends very much on your application. But I usually see about a 10% improvement on average when I run it on the program I have worked with, which may or may not be representative. So yeah, profile-guided optimization is one of the things that I think is used much less than ideally,
Starting point is 00:37:38 because it helps a lot, but it's really annoying to use. So I understand why it's not used in reality. So in the systems that you work on, do you tend to make this as part of your automated build environment or anything? I usually do it to get a feeling for what is possible to improve and then modify the source code until I get roughly the same. Interesting. The project I've used it for has been with a very small hot part.
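For context, the usual GCC workflow behind this is a three-step cycle (a hedged sketch; -fprofile-generate and -fprofile-use are GCC's documented flags, while the file name and workload are hypothetical):

```cpp
// Build instrumented, run a representative workload, then rebuild with the profile:
//   g++ -O2 -fprofile-generate pgo_demo.cpp -o demo
//   ./demo
//   g++ -O2 -fprofile-use pgo_demo.cpp -o demo
#include <cstdio>

int main() {
    long long sum = 0;
    for (int i = 0; i < 100000000; ++i) {
        if (i % 997 == 0)      // rarely taken: the profile lets GCC keep this
            sum -= i;          // branch out of the fast path
        else
            sum += i;
    }
    std::printf("%lld\n", sum);
    return 0;
}
```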
Starting point is 00:38:13 So you have a relatively small loop that's taking the majority of the time. But I see that it can get 10% faster with profile-guided optimization. Then I try to figure out what the compiler has done and then either change the code or maybe write it in assembly. So is there one of those fdump flags that would tell us what the PGO did? I used a combination of looking at that and actually at the source code, at the object code, because there are
Starting point is 00:38:50 essentially two things the compiler does. One is different inlining and unrolling, and the other one is reordering to get better flow through the function.
Starting point is 00:39:07 That depends also on what CPU you're using, how important that is. I have done it mostly on small embedded platforms where branches are expensive. If it changes a comparison,
Starting point is 00:39:22 if it decides that default condition is en förmåga. Så om det bestämmer att default-konditionen är den mest viktiga, så kan det byta till så att det blir default. Okej. Så det här ser jag ofta på
Starting point is 00:39:39 disassembling the code. But for inlining and so on, it is also quite easy to look at the symbols that are called. But the dumps do contain all this information and percentage numbers for what the compiler figured out. So it's possible to look at those two. With Clang in recent years coming out,
Starting point is 00:40:08 have you seen many changes amongst GCC maintainers? Are they becoming more competitive or making any other changes in the way they develop GCC in response to Clang? In some sense, I find it hard to tell, because
Starting point is 00:40:23 if you look at from day to day, everything is the same. There are some small optimizations, some small bug fixes and so on. If you look at the big picture, I think there is a rather big difference in feeling that it feels like more is happening. So when you have...
Starting point is 00:40:44 Before, we didn't have any competition, really. In the late 90s, it was the Sun compiler we compared against. But after that, we have been in a vacuum, in a kind of way. So it's hard to know if you can do something better. I've done everything I find reasonable in this optimization, and now I do something else.
Starting point is 00:41:06 But when it comes to another implementation that does things differently, then you may see that, hmm, maybe I missed something here. So I think the general feeling is different, but I have a hard time quantifying it. So I guess something we haven't explicitly asked you yet is, what is your day-to-day normal involvement with GCC today? What aspects do you work on? So I'm the NetBSD maintainer,
Starting point is 00:41:32 so in that, I'm not doing that much, because, well, the support has been throttled over the years because I've not really had the time to update it. So what I've done lately is go through all the configurations and update them to the modern way of doing things. Because GCC aims to be backward compatible in the source code. So these configurations have not been changed in 15 years.
Starting point is 00:42:05 Wow. So it works, but they have invented better ways of doing the configurations and more options and so on. So I'm going through that, then, and using the modern ways instead of the deprecated mechanisms.
Starting point is 00:42:23 But otherwise I'm mostly compiling random programs, benchmarking them, looking at the assembly to see what's happening, and opening bug reports, and in some few cases, fixing them myself. But mostly, I'm just opening bug reports these days. Well, that's important, I would imagine. I think so.
Starting point is 00:42:47 Most people are not that interested in looking at big blobs of assembly and trying to figure out what's happening. I find this kind of interesting. It's interesting, I think, too, that you've mentioned NetBSD a couple times now. And I think maybe our listeners don't appreciate the huge variety of CPUs and operating systems that GCC still supports today. It has to dwarf any other compiler. I believe so. Even though, again, a lot of those architectures have been throttled over the years because configuration has changed and cost models have changed. So I think many of those
Starting point is 00:43:27 generate much worse code these days than if you use a 10-year-old compiler. Oh, interesting. Just because it hasn't been maintained as well. Yeah. Because when you do, again, let's say better inlining, then you need to have a cost
Starting point is 00:43:44 model, how much to inline. And this cost model has been updated for Intel and ARM and so on, but nobody cares about updating the VAX backend. Right. Because it needs to take into account the different sizes of different instructions and so on. So it is non-trivial work to do it. So instead, the default is that all instructions
Starting point is 00:44:13 cost the same. So if someone wanted to start getting involved in compiler development, do you have a recommendation for maybe, I don't know, going back and trying to update one of these old targets? Would that be a good way for someone to learn, or do you have a different recommendation? That depends much on what that person is interested in doing, because there is a very big difference between doing front-end optimizations and back-end things. But if you are interested in doing the detailed back-end work, then I think there are good options there.
Starting point is 00:44:54 Okay. What do you think C++ developers should know about writing efficient code that you've learned from working on GCC? I would say that the most important thing is actually to use the tools correctly. Because I have seen lots of discussions where people are obsessing over things like, should I do pass-by-value or pass-by-reference, and so on. And then they compile the code with -O. That doesn't do that much optimization. So one thing is, for GCC,
Starting point is 00:45:32 when you are compiling your code, you should use -O2, -O3, or -Os, depending on what platform and so on you are working on. Or -ffast-math and that kind of thing, if that's relevant for your codebase. I would guess the same is true for Clang, and also that Microsoft probably has lots of interesting options for the compiler optimizations. One other thing is link-time optimization, because that is easy to add. You just pass -flto, and then it optimizes with knowledge of the whole program.
Starting point is 00:46:11 For certain ways of writing C++, that is really important. Because devirtualization and that kind of thing can then see how your classes are used in different files, and change the virtual calls to concrete calls that then can be inlined and everything like that. So it's just that simple? Just pass -flto when we're linking the executable? Yeah. Also when compiling.
Starting point is 00:46:40 When compiling, okay. There is also a nice option to make this parallel. So if you do -flto= with the number of cores you have on your machine, it can parallelize the link-time optimization step. Okay. It makes this much faster and takes more memory. So it's a trade-off there. And again, this is true for both GCC and LLVM.
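A hedged, single-file stand-in for the devirtualization case he describes (file names and the Shape class are hypothetical): imagine the interface and the callers live in different translation units, built roughly as "g++ -O2 -flto -c shapes.cpp main.cpp" followed by "g++ -O2 -flto=8 shapes.o main.o -o prog". With -flto on both steps, the virtual call below can be turned into a direct, inlinable call across the file boundary.

```cpp
#include <memory>
#include <vector>

// Hypothetical interface, assumed to be implemented in another .cpp file.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

// Without LTO the compiler only sees the abstract Shape here; with link-time
// optimization it can also see the concrete implementations, devirtualize the
// call, and inline it.
double total_area(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double sum = 0.0;
    for (const auto& s : shapes)
        sum += s->area();
    return sum;
}
```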
Starting point is 00:47:08 Both of those projects have spent lots of work making this efficient. So what kind of performance increases have you seen using LTO? I have not seen that much in my projects, because I usually do not write code in a way that is helped by LTO. Okay. Because this means that you need to inline between
Starting point is 00:47:35 different files and that kind of thing. I usually structure my projects so the compiler already have all the knowledge. Okay. But the projects I work on are mostly small embedded systems. But it's like a two-file project. Right. So that doesn't matter.
Starting point is 00:47:56 Or it's big compilers, where, for various reasons, we are not using LTO. It probably should help there. So if you have like one file, one .cpp file, with a gigantic header-only library, you would not expect to see a bunch of improvement, but if you have several statically linked libraries, then maybe? Yeah. So, well, everything must be compiled with LTO. So it depends on what you mean by statically linked. Okay. But let's say you are compiling Firefox and that kind of thing.
Starting point is 00:48:35 There you will see a big difference, if I understand it correctly, from the benchmarks I've seen. Okay. You recently published an article also on volatile, and that's something that seems to be completely misunderstood. Do you want to speak about that at all? I think that could be a whole one-hour episode about that. So I'm not sure how... The problem with volatile is that
Starting point is 00:49:06 the problem it is trying to solve is accessing hardware, and that is not really how most people want to use it. And the standard is written in a way that is very hand-wavy,
Starting point is 00:49:25 because all hardware works differently and so on. So there are lots of problems there with how to interpret the standard. And compiler writers like to see it like, well, we know if you are touching hardware or not, because if we are storing on the stack, we know that it's not hardware, so why bother with all this mess in handling it all the time? And
Starting point is 00:49:50 normal developers want to see that I make a store, so I want a store. But most compilers do what the users expect. Although it's very hard to test, because you do not really see a difference unless you check exactly what loads and stores
Starting point is 00:50:15 are being done during runtime. So it's easy for the compilers to introduce bugs where they sometimes optimize away volatile loads and stores, especially on the stack. Okay. I don't think I have any more questions. Jason, do you have anything?
Starting point is 00:50:33 No, I don't think so. Okay. So, Krister, where can people find you online, maybe read some more of your blog posts? Yeah, my blog is on kristerw.blogspot.com, and otherwise
Starting point is 00:50:49 I'm on Twitter as @kwalfridsson. Okay, it's been great having you on the show today. Definitely sounds like you have a lot of interesting content on your blog. I'd encourage listeners to go check it out. Yeah, it's been great being here. Thanks for joining us.
Starting point is 00:51:06 Thank you. Bye. Okay. Thanks so much for listening in as we chat about C++. I'd love to hear what you think of the podcast. Please let me know if we're discussing the stuff you're interested in. Or, if you have a suggestion for a topic, I'd love to hear about that too. You can email all your thoughts to feedback at cppcast.com. I'd also appreciate if you like CppCast on Facebook and follow CppCast on Twitter.
Starting point is 00:51:31 You can also follow me at @robwirving and Jason at @lefticus on Twitter. And of course, you can find all that info and the show notes on the podcast website at cppcast.com. Theme music for this episode is provided by podcastthemes.com.
