CppCast - CppCon 2016

Episode Date: September 25, 2016

Rob and Jason are joined by Chandler Carruth from Google. In this live interview from CppCon 2016, Chandler discusses the topics of his two CppCon talks and using Modules at Google.

Chandler Carruth leads the Clang team at Google, building better diagnostics, tools, and more. Previously, he worked on several pieces of Google's distributed build system. He makes guest appearances helping to maintain a few core C++ libraries across Google's codebase, and is active in the LLVM and Clang open source communities. He received his M.S. and B.S. in Computer Science from Wake Forest University, but disavows all knowledge of the contents of his Master's thesis. He is regularly found drinking Cherry Coke Zero in the daytime and pontificating over a single malt scotch in the evening.

CppCon Lightning Talks: Atila Neves (Mock C functions using the preprocessor), Jens Weller, Ken Sykes, Jon Kalb, Gabor Horvath (CodeCompass)

Links: Chandler Carruth (@chandlerc1024), Chandler Carruth's GitHub, CppCon 2016 Playlist, CppCon 2014: Chandler Carruth "Efficiency with Algorithms, Performance with Data Structures", CppCon 2015: Chandler Carruth "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!"

Sponsor: Backtrace

Transcript
Starting point is 00:00:00 This episode of CppCast is sponsored by Backtrace, the turnkey debugging platform that helps you spend less time debugging and more time building. Get to the root cause quickly with detailed information at your fingertips. Start your free trial at backtrace.io slash cppcast. Episode 71 of CppCast with guest Chandler Carruth recorded September 23rd, 2016. In this episode, we interview Lightning Talk presenters live from CppCon 2016. Then we sit down with Chandler Carruth from Google. Chandler tells us about high performance data structures and using modules at Google. Welcome to a very special episode 71 of CppCast. We are live from CppCon.
Starting point is 00:01:19 I am joined, as always, by my co-host, Jason Turner. Hey, Rob. How are you doing? So, it is the last day of CPPCon. We actually just wrapped up an interview with Chandler Carruth. Yes. And earlier this week, we talked to a bunch of Lightning Talk presenters. Four or five, something like that.
Starting point is 00:01:36 Something like that. So that's going to be the episode for today. It's a bit different from the normal format. We'll do Lightning interviews, and then we'll have Chandler. And that's going to be it. So, Rob, this is your first C++ conference. What do you think? I thought it was great. I really love, one of the things that's special about this conference that I haven't seen in other conferences is all the speakers, including the quote unquote celebrities like Bjarne or Herb, are just milling around the conference,
Starting point is 00:02:05 going to talks themselves. Yeah. So you can go up and have a talk with Herb Sutter or Bjarne Struestrup in between talks, just like any other conference is tempting. Yeah. And everyone tends to be approachable. Very much so. Yeah.
Starting point is 00:02:19 Yeah. So I've really enjoyed my time at the conference. I really enjoyed your keynote. Thank you. Are you prepared to become a celebrity when that hits YouTube? I don't know what to say to that. Yeah. Well, it was really a very fun keynote slash plenary talk.
Starting point is 00:02:37 Thanks. I'm looking forward to everyone seeing that on YouTube. I hope the video comes out well. Yeah, I think it will. And that's about all we have. So please enjoy the rest of the show and we'll be back to our normal programming next week. We've got a bunch
Starting point is 00:02:50 of interviews lined up from people we met at the conference, right? Yeah. Like six or something. Yeah, about six new interviews. We're going to be reaching out to all of those speakers soon and should have a lot of exciting content over the next few months.
Starting point is 00:03:06 Yeah, it should be fun. Yeah. So we are doing a special episode of CPPcast. We just finished up all the CPPCon Wednesday night lightning sessions. Right. And we're going to be talking to a couple of the speakers who just gave their five-minute lightning talks. So welcome to the show. And what's your name?
Starting point is 00:03:25 Atilla. And what was your lightning talk on? I was using C++14 to mock C functions. So, I've personally never really used mocking. Tell us, like, what's the point? What does mocking gain us? In this case, I mean, I was doing it for a legacy C code base, so the whole point was that the code was already written, not by us, and we want to make sure that it works
Starting point is 00:03:45 because we have to change it because there are bugs and feature requests and whatnot. So how can we make sure that the thing still does what it's supposed to do? And in this case, it was networking code, so side effects, not so great of an idea when you're unit testing, and that's why. It's like, I want to prevent this code from calling out
Starting point is 00:04:01 the functions I'd rather it didn't. The other reason is, in our case, the build system is really, really bizarre, and we don't want to use the official build system because it's slow, weird, and you can only use special VMs for it. So the other reason is just don't call this thing. Let's fake all the things so that we can test our core logic. And I wouldn't have written the code that way. It's better to write the code for testability in mind
Starting point is 00:04:25 but when the code's already there, what can you do? So it's better to just make these troublesome functions basically go away and make sure that your functions are calling them in the right way. So that's what mocking gives you then, is a way to make a fake function. Right, because in OOP what happens is you pass an interface
Starting point is 00:04:41 or something like that to a function and this is local, right? It's in the parameter list. But if you have C code, it's global states, basically. The name of the function is a globally known thing. Right. So you're basically trying to get rid of that so that you can reroute it to any implementation you want, and then you don't have packets being sent or right into the database
Starting point is 00:05:01 or any of the other things that would be problematic in a unit testing center. Okay. That's very cool. So what is the library and can people find it online? Yeah. I tell them my GitHub is called Premark because it uses the preprocessor to get rid of these functions and reroute them to different implementation.
Starting point is 00:05:18 So the other macros I provide make it so that you have to write the least amount of boilerplate possible to get this to work. So it writes the code for you. Excellent. Okay. How are you enjoying the conference so far? It's been fantastic. Even better than last year. Not saying a lot. Okay. Thank you very much. Thank you. Jens, we've already had you on the show before,
Starting point is 00:05:38 but you just gave a lightning talk. Why don't you tell us a little bit about what the talk was? Yeah. I wanted to share a little bit of information about how to present code. Of course, I think that this is what, you know, a lot of things happen at the conference, and most people are talking about code.
Starting point is 00:05:53 And I think that if we find better methods or if we get better at this, we all as a community are having a huge improvement. It's probably very thing to do right. Yeah, and as you mentioned, your lighting talk is somewhat based on the talk Scott Myers did before. Yes, Scott Myers gave a keynote in 2014 at MediC++, which also was, I think, his last public talk. And part of that keynote he dedicated to how to present and how to prepare materials for the modern age.
Starting point is 00:06:33 And I was always very interested in that. It got me thinking and I thought it's probably time to have like... I think it's probably time to start to prepare this as like a set of materials that people are able to look it up. And I want to have my speakers for my conference, but also for the other conferences, to improve their presentations. And also that people who start presenting have materials that they can look up what are the best practices. And so make it easier for people to get started with giving talks. So if you wanted to pick one key point as a teaser for our listeners, what would it be?
Starting point is 00:07:15 What's your most important point? Oh, that's a good question. I think the main point of my lightning talk was that it should be clear what and how you want to present your code and that you should highlight what is really important to the viewer of the presentation. Okay. And you said there's probably different techniques you go about doing that. Maybe what are some of those techniques for presenting the code, highlighting the most important parts?
Starting point is 00:07:52 Yes. There is a lot of different ways to prepare a presentation and some of those ways are having a better option to integrate code in your slide deck and some other programs like PowerPoint or OpenOffice are not as well prepared for that.
Starting point is 00:08:12 So you have to go for screenshots or it's a bit more work than if you really want to make this in a good way. And on the other hand, how do you want to solve this problem or how do you tackle this problem there are a lot of different ways to do that and i think this is just a process which is starting i'm interested in having a discussion on this and to get the speaking community inside our community and maybe also from other communities just think start thinking about this and that we find better solutions how to present in the future. Brilliant.
Starting point is 00:08:50 Okay, thank you very much. Tell me a little bit about your thoughts on the conference so far. I think it's, again, a very good conference. It's big. One thing which I think is, like, again, very difficult to choose where to go, which talks to see, as it's a lot of conference and power.
Starting point is 00:09:12 And I, again, see a lot of things like, you know, what do I want to do at my conference and how do they do things here. And so I like the keynote so far. The keynotes are better than they were last year, in my opinion. And on the other hand, the off-track, like in the breaks, meeting people, talking to people, that's also very good. I met a couple of new people which are really interesting. And meeting again a lot of other friends.
Starting point is 00:09:44 Okay, thanks for your time today, Jens. Thanks. Okay. Ken Sykes, you're here from Microsoft, right? Yes, I'm from Microsoft. And you just gave a lightning talk on? On improvements to the Windows debuggers, NatVis, data model, that kind of thing.
Starting point is 00:10:02 Okay. So, NatVis has been around for a little while. I'm familiar with that. But you've made improvements to it? that kind of thing. Okay, so NetVis has been around for a little while. I'm familiar with that. But you've made improvements to it? Well, the big improvement is Visual Studio has its own debugger package. They've had NetVis for a while. We brought it to the Windows debugger,
Starting point is 00:10:17 WinDBG, CDB, and TSD. So now those debuggers work with it as well. I never actually used WinDP or the others myself. So I've opened WinDBG, and I've maybe attempted to debug with WinDBG, but I don't think I've ever been successful.
Starting point is 00:10:35 Well, it's a bit of an acquired taste. Yeah, definitely. So this should make it a lot more palatable to someone who's maybe more familiar with a normal ID debugger. Right, yes. So, like I said, common types, like a std string, it should just show the string. And now it does.
Starting point is 00:10:53 Maybe a question just for those who aren't familiar with WinDBG. What are some of the use cases for why you would open up WinDBG? Right. Well, I work on the Windows OS, so we receive crash dumps from all over the world through Watson. So every time you hit send report, it makes our lives a little bit more difficult. But yeah, so we open those things up. At least internally, we have a bunch of additional extensions and things with our internal symbols to figure out what's wrong with our product. But you can imagine other companies like Adobe, right?
Starting point is 00:11:36 They also receive dump report, dump files that they need to process as well. And so me, I work inside Windows. I use it all the time. I know Visual Studio can open dump files, but I've basically never done that. So just the opposite side. So basically any time you're debugging an application that's actually out, released in the wild,
Starting point is 00:11:59 you might be able to get a crash report that WinDBG could be useful for. Yeah, that's a common case. And a lot of times we have to debug issues where we don't have, it's not our program, it's our program running on, it's another person's program running on Windows, right? And so that's another use case where, I don't know,
Starting point is 00:12:22 maybe it's just something I'm used to, but. So I'm curious about something in your intro and bio. I believe they said you've been with Microsoft since Windows 3.0? Yes, that's right. That is quite the history. Yes. What's that been like? Well, it's been a lot of fun.
Starting point is 00:12:40 This is my second tour of duty with Microsoft. I worked there from 89 to 2000. I came back in 2004 working from D.C. So I work remotely for them. And I've done that for about 12 years now. Worked on lots of different things. I've worked on Paintbrush, PostScript Driver, GDI, back in the Win9X days,
Starting point is 00:13:09 the shell, Windows Runtime, and now the debuggers. Wow, very cool. So, yeah. Just wondering, does Microsoft feel different from your perspective over the last few years with their embracing the open source movement and working with Linux, things like that? Yeah, there's definitely... I see changes there.
Starting point is 00:13:29 I mean, the interop is cool. At least internally, they're more open to us using open source as well as supporting open source development by other external people. And so there's a little less of just inventing everything yourself. So it's nice. It's like I actually get to look at what Boost does
Starting point is 00:13:52 now. Wow. It's this new cool thing. So it's fun. Okay. Well, thank you for joining us today. Alright, thank you. So John, conference organizer, you just finished a lightning talk. I did.
Starting point is 00:14:08 What was the talk called? It was called Unsigned, A Guide to Better Code. And do you want to tell us about the origin of this talk? Well, James McInnes, he was, I know he rarely tweets, but he tweeted something about how awful signed numbers were. And I challenged him on that. I said, no, no, no. You shouldn't use unsigned.
Starting point is 00:14:33 That's the thing. And so he took this contrary position. And I said, well, we're not going to battle this out on Twitter because, for one thing, he's got a huge advantage. And 140 characters isn't really a good way to discuss things. So I wanted to put him on my turf. So I said, let's battle this out with Light Detox at CppCon. And he agreed. I kid you not. I'm sure it's stored in the Twitter history there.
Starting point is 00:14:55 He said yes. And I think he then promptly forgot it. But, of course, that night I had made my slides. Anyway, I made the slides. I presented them at a local user group just to kind of go through it and then i forgot about them until uh last night i was sitting next to i just happened to be sitting next to james at dinner and i don't know what what it was that caused it i suddenly realized hey wait we're supposed to have that duel and uh i hope we can still get lightning
Starting point is 00:15:23 talks in it changes i don't in. And James, I don't have slides. And I said, James, don't worry. I'll write your slides for you. And he didn't buy into that at all. But because I had my laptop right there with me, I pulled it out. And I showed it to him. And he said, and he even tweeted this out. He said,
Starting point is 00:15:39 oh, you're really making me look bad. And I said, well, that was the point. But he said, well, yeah, I can't really disagree with what you've said. And I said, well, that was the point. But, uh, but he said, well, yeah, I can't really disagree with what you've said. And I said, oh, then I really want to write your slides for you. We'll just have you say how much you agree with me. And that was the end of the duel. Right. But, uh, he was a very good sport. He, he said in the audience and took my ribbing. Um, so I had some technical points to make and, I'm sure if James had presented his side of it, there's probably less disagreement than a duel makes it sound like. But it was a lot of fun, and I think it gave some people some things to think about.
Starting point is 00:16:16 Yeah, definitely. How's the conference going so far? I'm having a lot of fun. I'm getting a lot of really positive things from people. The thing about being a conference organizer is that everything that goes wrong, you hear about. A lot of times you have to deal with it. But a lot of times it's dealt with by somebody else. But they just let you know, by the way, this happened and here's what I did.
Starting point is 00:16:37 So I know everything that's gone wrong. I suppose not everything. It's probably other things, right? But I'm hoping that most of those things got taken care of before most, or maybe even all, attendees saw it. So I think the illusion we create for the attendees is that everything's going perfectly smoothly,
Starting point is 00:16:54 and I think we've got most of them fooled for sure, which is the goal, right? I think as long as the snacks show up on time, we're all pretty happy then. Yeah. Yeah. I don't know. I haven't heard the complaint I kind of expected to hear, which is that
Starting point is 00:17:09 the snacks this year, there's a lot less sugar. We're downplaying the chocolate. There's no bagels in the morning. It's fruit in the morning and yogurt. And it's healthy. I think part of it is that the health is part of it,
Starting point is 00:17:26 but also I think that eating sugary things causes you to eat more sugary things. There were some issues last year with people taking way more than they should and then people complaining because they didn't get any and stuff like that. And I just thought, nobody ever takes too many apples, and nobody ever complains when they don't get the apples. And I haven't had those complaints this year. Well, it worked out well from my perspective. I have enjoyed the yogurt
Starting point is 00:17:49 in the morning one day. So you're looking forward to the end of the conference when you can actually watch some of the content? I am. So the goal for the plenaries is to get it up within 24 hours, but that's kind of a stretch goal. We didn't make it this year. I don't think we have any plenaries is to get it up within 24 hours, but that's kind of a stretch goal.
Starting point is 00:18:05 We didn't make it this year. I don't think we have any plenaries up. It's Wednesday night, but I think we're going to have one tomorrow. And they'll get up real quick. The main goal is to have all of the sessions up in one month. And that includes the lightning talks that we've been discussing today. The lightning talks, yes. People should be able to watch the lightning talks that you're talking about.
Starting point is 00:18:27 Okay. In fact, you might want to, I don't know what your plan is, you might delay this for a month and then put it up when people can see it. That's not a bad idea, actually. Okay. Well, thanks for your time today, John. Thank you, guys. Thank you for the great job you're doing.
Starting point is 00:18:41 Thank you for doing a session talking about your experiences behind the microphone. Theoretically, that'll be up. Theoretically, yeah. Okay, thank you. So Gabor, you just gave a lightning talk at CppCon. Can you tell us what the talk was? Sure. So the talk was about a tool which is called Code Compass. There are several tools already which helps us develop software mely a Code Compass-t. Már két képe van, ami segíthetőségünkre készül,
Starting point is 00:19:06 hogy jobban legyünk működők, amikor képeztetjük a kódot. Ez a képe egy kicsit más, mert a fő célja, hogy segíthetjük a kódot érteni. Szóval ez egy képe a kód megfelelődése. És ez a Clang kompilára szükséges. Ez a kódot készít, és sok fejlesztőt készít, mint szolgálat,
Starting point is 00:19:34 és kétszerzés, és ismét különböző grafokat készít. Úgyhogy vizualizációk válik, úgyhogy based on the code. So it uses visualization techniques like generating UML diagrams from the code or code diagrams, component diagrams,
Starting point is 00:19:51 and also it can help us show very relevant information that are from different files in a very concise way. So we do not need to remember which function is at which file, and we do not need to switch between those files all the time. So you actually were using a web browser to browse the code, right? Is that how that worked? Yes, yes. So basically there is a web server running,
Starting point is 00:20:28 and you can connect to it with a browser. And that tool is only for viewing the code, so you cannot use it as an editor. But usually the changes that we do while we are developing is very well contained in a small subset of the codebase and we can still use it to navigate the rest of the code and it proved to be very useful for us. And it was open source recently, and it can be found on GitHub. So I think you should definitely try it. Okay, so just to clarify, this is going to be running on some other machine,
Starting point is 00:21:21 you're going to have your repositories checked out, and you're able to go into your browser and navigate that way. Well, it depends on if you would like to, you can run it locally. You could run it locally. Okay. So it takes some time to parse the code. Usually it is slower than the compilation. So you might want to do the parsing on nightly builds on a server, and then you can connect to that and use that through the browser.
Starting point is 00:21:55 That's interesting. I could see a large organization or something having this run nightly and then just having it available to the whole organization. Yeah, definitely. That's exactly the use case that we are using it for. Cool. Okay. And how are you enjoying the conference so far?
Starting point is 00:22:10 It is a great experience. I just met somebody who was changing emails very frequently on the mailing list, and I just realized that he was that guy. So that kind of experience is great. I think it's first to go to conferences because of that. I've noticed a lot of people on Twitter, for instance, don't even use a real picture of themselves, so trying to recognize them here,
Starting point is 00:22:44 and then even if they do have a picture, it might be 10 years old, and then it's that much harder still to recognize people. Yeah. You also have a case where you feel like you've already known someone, even though you've only interacted with them on Twitter. Oh, right.
Starting point is 00:22:57 You're like, we've met, haven't we? Right, exactly. Okay. Well, thank you for your time today. Thank you. Thanks. I wanted to interrupt this discussion for your time today. Thank you. Thanks. I wanted to interrupt this discussion for just a moment to bring you a word from our sponsors. Backtrace is the debugging platform that improves software quality, reliability, and support
Starting point is 00:23:14 by bringing deep introspection and automation throughout the software error lifecycle. Spend less time debugging and reduce your mean time to resolution by using the first and only platform to combine symbolic debugging, error aggregation, and state analysis. At the time of error, Backtrace jumps into action, capturing detailed dumps of application and environmental state. Backtrace then performs automated analysis on process memory and executable code to classify errors and highlight important signals such as heap corruption, malware, and much more. This data is aggregated and archived in a centralized object store, providing your team
Starting point is 00:23:48 a single system to investigate errors across your environments. Join industry leaders like Fastly, Message Systems, and AppNexus that use Backtrace to modernize their debugging infrastructure. It's free to try, minutes to set up, fully featured, with no commitment necessary. Check them out at backtrace.io slash cppcast. Okay, so we are joined today by Chandler Carruth. It's the last day of CppCon. Chandler, welcome to the show.
Starting point is 00:24:14 Thanks for having me. So you did two talks this week, right? Yes, I did two talks. Do you want to tell us about the first one? Sure. So the very first talk I gave was trying to kind of continue something I started two years ago, which is giving kind of background information for people about how to write really high performance C++ code. I think that there's a lot of interest in writing high performance code, but there's actually not a lot of material about how
Starting point is 00:24:40 to do it effectively and how to do it without making your code really bad. And so a lot of what I'm trying to do is give people kind of patterns they can follow that are going to make their code really fast and that are going to actually be sustainable long term. And this year in particular, I talked a lot about concrete use cases or concrete techniques you can use based on LLVM's code itself. LLVM has a collection of data structures that it uses to make a lot of its algorithms, a lot of its code very fast, very efficient.
Starting point is 00:25:09 I presented an overview of those, how you can incorporate them, some of the unusual and surprising tricks that are used that make them especially effective. But really focus on data structures and how to make those fast. And trying to actually tie it back to real-world use cases.
Starting point is 00:25:26 So it's always difficult to get into technical details on the air, but is there any particular piece that you might want to pull out that you'd say, this is like the tidbit, this is why you should watch my talk? So the key idea is that you can have your data structures and you can cause them to be customized in their behavior as the program dynamics change. And the classic and best example of this are small-size optimizations. And there's just a tremendous amount that you can do with a small-size optimization
Starting point is 00:25:54 to allow a data structure to be very lightweight and fast when it's small and actually still scale very effectively as it grows large. And then there's a lot of tricks you can use to kind of amplify the effect of that kind of thing by packing data into smaller and smaller spaces, causing things to be very, very dense and very, very cache-friendly. And so those are kind of the two techniques that interplay in the talk I gave.
Starting point is 00:26:19 That's particularly interesting because I actually overheard a hallway conversation from people saying how they didn't really understand small object optimizations and they wish they had more information on it. I mean, the idea was to try and show people how that can actually be one of the most effective data structure techniques.
Starting point is 00:26:35 And that was your hybrid data structures course. You did another one on undefined behavior? Yeah, the second one was I think actually a higher level talk in a lot of ways. It wasn't just kind of walking through techniques. It was trying to give people a new set of ideas, a new set of language to use and terminology to use to talk about undefined behavior, to talk about problems in their code. There's been a tremendous amount of frustration and friction in the C++ community and in the
Starting point is 00:27:07 wider programming community around undefined behavior and bugs and security exploits that stem from undefined behavior. And I actually think that all of that really misses the key thing. These bugs and the security exploits, they don't stem from undefined behavior. They stem from the fact that we have incorrect programs because either the programmers didn't realize that there was a bug in their code or because we've designed the language, we've designed an API in a way that makes it brittle in the face of reasonable programs. It's hard to actually use the language of the APIs correctly.
Starting point is 00:27:46 And once we start focusing on whether it's easy or hard to use the language of the API correctly and how you can use it incorrectly and what it means for a language feature to be used incorrectly in the same way that an API can be used incorrectly, we can start realizing where the trade-offs
Starting point is 00:28:02 really lie. We're actually making a conscious trade-off sometimes to provide language features that have very narrow scope. They end up with narrow contracts. They end up with very narrow use cases. And if you go outside of those use cases, you end up with undefined behavior, but that's because you end up with an invalid program,
Starting point is 00:28:21 not because we're just trying to break people's code. And I tried to give some ideas about when this is a reasonable thing to do. For example, when there is no hardware that is truly portable and universal that can implement the behavior in a consistent and reasonable way. When you would have to make performance trade-offs, where you'd have to actually pessimize the performance on one platform in order to define the behavior on another platform. Those kinds of trade-offs really don't fit with the spirit
Starting point is 00:28:49 of C++. And so what we need to do instead is have a reasonable way for people to write software and not run into these issues. So I suggested principles around what we do to kind of make a principle choice to narrow the contract of a language feature
Starting point is 00:29:07 without leaving landmines, right? Without leaving traps for people to fall into. And those center around being able to check for mistakes, at least probabilistically. Also, being able to explain a rational model for how you're supposed to use the language feature rather than there just being strange one-off rules that you have to memorize and recall. And if you get it wrong, you have an incorrect program.
Starting point is 00:29:34 And trying to also respect the existing code. So one thing that I think is always really risky is if we have very widespread programming patterns and we introduce a language feature or, heaven forbid, we change a language feature in a way that works directly against the grain of those widespread programs, we shouldn't be surprised that people dislike that and that they run into problems there. I think we actually need to look very carefully at the existing code when we're doing this. And we even need to look at existing code when we're kind of reconsidering past choices. You know, I think it's seriously a possibility that the C++ language has some mistakes in it.
Starting point is 00:30:14 We might actually need to change things. And we should look at the existing code to inform those decisions. So I might be going a little off topic here, but Claim does have an undefined behavior checker, right? Yes. How does that, does that work well with the things you're talking about? Absolutely. I mean, the reason why I now, I advocate so firmly for, you know, have narrow contracts for language features, which could result in undefined behavior if they are misused, is because
Starting point is 00:30:46 we can very consistently check code to make sure it isn't misusing them. And that comes from the undefined behavior sanitizer, and it's where Client can go in and insert checks that make sure before your code
Starting point is 00:31:02 hits undefined behavior to see if you're actually going to satisfy the contract of the language feature you're about to use. And it's not just Clang. GCC actually has a version of it as well. So it's this idea of checking for bad behavior dynamically when necessary, statically when you can, and using that to kind of test and ensure your code is correct,
Starting point is 00:31:26 that takes a lot of the guesswork out. That makes it much more reliable, and that removes a lot of the risk and uncertainty for me around the existence of these things. That's not to say that we don't still need to have a clear rational basis for having a contract that says you can't pass a null pointer here or that you can't do this operation. We still need to have a good reason for doing it.
Starting point is 00:31:49 But when we have a good reason, we also need to have tools to help programmers out. Okay. One of the things I wanted to ask you about was modules. I know one of your colleagues from Google did a talk on modules yesterday. Gabby did one this morning. Do you want to give us an update on your thoughts and Google's perspective on modules?
Starting point is 00:32:05 So my thoughts around modules, I think, are really well captured by the two talks that we gave, that Google gave at CppCon this year. We really wanted to give kind of a story of modules, right? And it comes in two parts. The first part is that we've actually deployed modules in our code base and it's a very, very large-scale deployment. So about 10% of all of our code is built into modules now.
Starting point is 00:32:34 We talked to Titus Winters recently and he told us you have something like 10 million lines of code. So that's about a million or so lines. No, we have hundreds of millions of lines of C++ code. So we have several tens of millions of lines of C++ code built into modules. Okay. Technically, what Titus said was it's more than 10 million lines and I'm not
Starting point is 00:32:52 allowed to tell you how much. So I can be a little bit more specific. We have hundreds of millions of lines of code. I can't tell you how many, but it's on the order of 100 million lines of code, and substantially more than 100 million lines of code. Okay.
Starting point is 00:33:06 And so we've got tens of millions of lines of code being built into C++ modules, and we've got those modules in turn being used, being imported into all of our code. So all of our primary C++ code base is actually using modules. We're building into modules a very narrow subset of our code base. It's one we control a lot. It happens to be very big. And that's all of the generated code for protocol buffers, which are kind of an interface description language
Starting point is 00:33:39 that lets us build up messages, serializable and deserializable messages, as well as communication protocol APIs. It's really, really nice. We use it very heavily. It's open source. You can find out about protocol buffers. But internally, we have so many of these protocol buffers, they generate just a massive amount of source code.
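For context, a protocol buffer message is declared in a small interface description file, and the `protoc` compiler generates C++ serialization code from it. A minimal sketch (the file and message names here are made-up examples, not Google's actual definitions):

```protobuf
// search_request.proto -- `protoc` generates search_request.pb.h / .cc
// from this, containing a C++ class with accessors plus serialize/parse code.
syntax = "proto3";

message SearchRequest {
  string query = 1;            // field tags identify fields on the wire
  int32 results_per_page = 2;
}
```

Each generated `.pb.h` header is large, which is why modularizing this generated code covered such a big slice of what the compiler parses.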
Starting point is 00:33:59 Probably a fairly large fraction of the source code that the compiler was actually parsing, before we were using modules, was header files from protocol buffers. And so what we did was we focused on this because we could control all of the generated code for protocol buffers in one place. It's generated code, so it's actually centrally controlled, despite being generated throughout Google's code base. And we generate modules now for all of the protocol buffer code and for all of the code that protocol buffers relies on, all of its dependencies. And it ends up totaling about 10% of our code. And this means that we're building these modules,
Starting point is 00:34:37 but we're also importing them everywhere any code uses a protocol buffer library. And so they're getting imported essentially everywhere. And that gave us a lot of experience. It was very hard to do. We did have to make changes to our source code, but what we've been using is a very special form of modules that we've built into Clang.
Starting point is 00:34:58 And that's what Richard Smith talked a lot about. We've actually built a C++ 98, essentially, form of modules into Clang. It's not using substantial language changes. It's really leveraging the existing language and a very particular compilation model. So we look at what header files are actually modular and could be built into a module. And when we see a header file that's one of these, we build it into a module,
Starting point is 00:35:27 and rather than textually including it, we import that module semantically automatically. This means you don't have to write import in your source code. You don't have to use the module syntax in the header file. We can get a lot of the benefits and kind of experiment with what it means to change the compilation model in this way without kind of committing our source code to
Starting point is 00:35:47 one syntax or another, which is really nice. It gives us a lot of experience, but we're not painted into any kind of corner, right? Regardless of what the standards committee ends up choosing to standardize for syntax, we're going to be able to adapt because we haven't actually written anything into our source code
Starting point is 00:36:03 about modules. We did have to make some changes to our source code. They just weren't specific to modules. They were kind of triggered by the modules build. And the changes all centered around making our code either more modular or better factored, or removing bugs from the code that modules allowed us to detect. Interesting.
Starting point is 00:36:25 Right? And so some examples of this are, you know, we want our headers to be standalone, right? We want them to parse cleanly from a clean slate. And we found lots of header files for which that wasn't true. But the changes there are strict improvements, right? We wanted this to be true long before we had modules. We just hadn't managed to actually check that you could parse the header file from scratch. It happened that it always got included in the right order.
Starting point is 00:36:49 Things just happened to work. And we started detecting those issues. We also found ODR violations. And these would be ODR violations that just happened to not occur textually, but when you start doing the semantic import, we were able to detect them. And so we would flag those as issues,
Starting point is 00:37:06 and then we would go and fix them in our source code. But the nice thing about this is that all of these are pretty clearly bugs in the source code. They were never intended. And so even for library code, which is really intended to be C++ 98 or 11 forever, and to use textual inclusion forever, we were actually able to make changes
Starting point is 00:37:23 and adapt them in a way that allows us to kind of import them semantically, which is really nice. And then we wanted to kind of take all of that experience and the implementation technique and see how that would best fit with kind of a standardized feature that actually adds in syntax
Starting point is 00:37:43 and most importantly adds in kind of controls so that now we actually have export controls. We can actually control which APIs leave a particular file and are available to consumers instead of it being everything because that was the model of includes. And in doing that, we actually noticed very particular changes we felt like we needed in the module's proposal, but surprisingly narrow changes. For example, all of the fundamental pieces of it work great. The only really interesting change we pushed for is having some way to
Starting point is 00:38:21 actually bridge between these C++98 or C++11 libraries that are being developed in a textual world. They're not likely to change overnight. They may have users whose compilers are going to be much slower to update. We want to still have those. We also have, and those are leaf users, we also have libraries which are near the bottom of our dependency graph, like very core and fundamental libraries that either also have users that aren't going to update or that just aren't going to be changed anymore because they're legacy,
Starting point is 00:38:56 they're stable, we don't want to touch them. Both of these cases, we have these kind of legacy holdovers, and we want to be able to modularize libraries in the middle because that removes kind of an ordering constraint for modularization. It lets us rapidly provide the modular benefits to users that want them without them being gated on some other team or some other project, having time in their schedule to make those changes. And from our experience doing this rollout, we needed a few changes to make this really work well
Starting point is 00:39:26 because we found that there's a really large prevalence of various pieces of kind of C++11 and textual inclusion tied API design throughout these APIs, and we need some kind of legacy mode that enables those to work well. And Richard's talk, I mean, this is a complicated topic, so
Starting point is 00:39:52 Richard's talk goes into all the details and kind of walks you through examples of like, here is the precise code pattern that it turns out doesn't work well unless you have a very particular kind of legacy mode that allows you to transform a header file into a module in a kind of safe and predictable way. Interesting.
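The implicit mechanism described in this section is driven in Clang by module map files, which tell the compiler which headers form which module. A minimal sketch, with made-up library and header names:

```
// module.modulemap -- tells Clang which headers form which module.
// With -fmodules and -fimplicit-module-maps, an ordinary
// #include "my_proto.pb.h" is translated into a semantic module import,
// so no `import` syntax has to appear in the source code itself.
module MyProtoLib {
  header "my_proto.pb.h"
  export *
}
```

This is what lets the header-to-module mapping live entirely outside the source files, keeping the code uncommitted to any particular standardized syntax.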
Starting point is 00:40:12 So obviously listeners should go and watch Richard's talk once that's available online, but just one more question on that. What types of improvements did you see in compilation speed? So this is actually an interesting thing, and Manuel gave some of the things we've seen. We actually saw pretty good improvements in compilation speed, but one of our constraints is we need to deploy this not to local builds, but to a very large distributed build system.
Starting point is 00:40:36 And one of the challenges with modules there is that we have to send all of the modules that are inputs to a compilation to the actual distributed build worker that's going to do the compilation. And modules are large, right? There's a space trade-off here. The most compressed and smallest representation of a C++ API is probably the header file's text. It's really an efficient representation, even when it's really large.
Starting point is 00:41:03 So the module files end up being a larger encoding of that information. And so we actually saw that our compilation times would in many cases drop by a factor of two. But distributing the actual build, getting everything set up on the remote worker and doing the compile, would end up eating a lot of those gains. And so we actually saw fairly small average case compile time improvements. This is just an initial attempt, right? Only 10% of our code base is modularized. We still have plenty of textual inclusion going on as well.
Starting point is 00:41:35 And it's early days in terms of implementation experience. A lot of the build system overhead we're hoping we can address and kind of get closer to that 2x. And we're hoping that we can make the compiler better and have more of our code modularized to get even better than 2x, but that's definitely something in the future. Right now we're actually not seeing a lot of improvements on the average case. The interesting thing is that that wasn't the primary target. Our primary concern about compile time is not the average case, but the long tail: the 90th percentile or the 99th percentile of the slowest-compiling files.
Starting point is 00:42:16 Because if you do a large build of software, you're compiling hundreds, thousands, maybe tens of thousands of source files. The 99th percentile compile time probably shows up in most of your builds. Even though it's an edge case, it's actually almost always the edge case, the long pole in your build. And there we saw really dramatic improvements. 2x, 5x, more. We've seen really dramatic improvements in some of the long tail compile times.
Starting point is 00:42:39 And that's, I think, the really exciting part of C++ modules. Because you're really in a risky game of long tail latency. There's actually a great paper by Jeff Dean called "The Tail at Scale." And the idea is how long tail latency has a disproportionate effect on large scale distributed systems
Starting point is 00:43:01 because of the aggregation factor. As you fan out across a large scale system, right, you end up having this multiplicative factor on the probability of encountering a long tail step. And so even though you have these like very, very unlikely long tail latencies, when you scale up the system, you end up making them much more likely again. And so there's this really surprising and disproportionate effect. And we're definitely seeing that in compiles inside of large distributed
Starting point is 00:43:33 build systems. And so for us, one of the biggest things is making that tail come in. And then we notice the other thing is that user latencies also tend to look more like a latency game, less like a distributed build, because the user's not building everything, right? They're making one edit, and then they're rebuilding. And for that, we also see really impressive speedups, because it's very localized, it's very specific. Cool.
Starting point is 00:43:57 Do you want to tell us a little bit about your thoughts on the conference in general? Are you enjoying your time here? I always love this conference. I mean, I've loved this conference since the first time we did it. I am very glad that when Jon Kalb started talking about this conference, I pestered him for about
Starting point is 00:44:14 five hours into the very small hours of the morning in Aspen until he actually agreed to consider doing it for real. I'm so thankful that the Standard C++ Foundation was in a good position to kind of step forward and work with Jon to cause this conference
Starting point is 00:44:31 to come into existence. I think having a community is one thing. Having a place for that community to kind of come together and to exchange ideas and to really cement its relationship is important. And I don't think the community would be as strong without it. I think you can see this in the dramatic increase in the quality of information about C++, in the teaching of C++.
Starting point is 00:44:59 People are now prioritizing C++ language features as really important things to have in their workplace, in their projects, to a degree that I don't think was happening before. And I think it's going to help contribute to the kind of resurgence of C++ that we've been enjoying for the last few years. Is this the third year for this conference? It's the third year for the conference. Okay. And it's growing each year, which is just a great sign.
Starting point is 00:45:22 I mean, my big thing is I want to see the conference grow because I feel like there remains this really large body of C++ developers that we're still not reaching and that we can do even more for. I mean, we post all of the videos on YouTube, right? We want to broadcast the information as widely as we can. This conference is not trying to, you know, lay claim to any of the information here.
Starting point is 00:45:44 But I think there's actually a lot of value for having people here physically, for actually getting everyone together. And so I want to see it grow to be as large as it can be. So officially we're right around 900 people or something like that? I think so. I think it's over 900 attendees and speakers. And there's several million C++ users around the world, so it definitely has room to grow. Right.
Starting point is 00:46:09 And I think it's just a challenge that we have to continually kind of push more value and make the conference both more valuable and also more accessible to C++ developers around the world. Right. Yeah. Any favorite talks of your own to mention? Favorite talks? I mean, I had an absolute blast
Starting point is 00:46:30 at all the modules talks this time. I think it's one of the most exciting things going on at the conference. I also had a lot of fun at kind of some of the smaller talks, actually, I think, around the periphery of the conference. They often get neglected,
Starting point is 00:46:43 but there's some really, really great talks here. Gor gave a great talk about coroutines and how they're implemented, going into really deep details about exactly how the nuts and bolts fit together at the bottom, and I think it's great to surface those things so that people have an understanding of that. But I couldn't possibly pick a favorite talk though. It's just not possible. Do you have anything else? I don't think so.
Starting point is 00:47:09 Okay. Thank you so much for your time, Chandler. Absolutely. Thank you so much for having me. Thanks for joining us. Thanks so much for listening as we chat about C++. I'd love to hear what you think of the podcast. Please let me know if we're discussing the stuff you're interested in, or if you have a suggestion for a topic, I'd love to hear that. Also, you can email all your thoughts to feedback at cppcast.com. I'd also appreciate if you can follow CppCast on Twitter and like CppCast on Facebook.
Starting point is 00:47:34 And of course you can find all that info and the show notes on the podcast website at cppcast.com. Theme music for this episode is provided by podcastthemes.com.
