CppCast - CppCon 2016
Episode Date: September 25, 2016. Rob and Jason are joined by Chandler Carruth from Google. In this live interview from CppCon 2016, Chandler discusses the topics of his two CppCon talks and using Modules at Google. Chandler Carruth leads the Clang team at Google, building better diagnostics, tools, and more. Previously, he worked on several pieces of Google's distributed build system. He makes guest appearances helping to maintain a few core C++ libraries across Google's codebase, and is active in the LLVM and Clang open source communities. He received his M.S. and B.S. in Computer Science from Wake Forest University, but disavows all knowledge of the contents of his Master's thesis. He is regularly found drinking Cherry Coke Zero in the daytime and pontificating over a single malt scotch in the evening. CppCon Lightning Talks: Atila Neves (Mock C functions using the preprocessor), Jens Weller, Ken Sykes, Jon Kalb, Gabor Horvath (CodeCompass). Chandler Carruth: @chandlerc1024, Chandler Carruth's GitHub. Links: CppCon 2016 Playlist; CppCon 2014: Chandler Carruth "Efficiency with Algorithms, Performance with Data Structures"; CppCon 2015: Chandler Carruth "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!" Sponsor: Backtrace
Transcript
This episode of CppCast is sponsored by Backtrace, the turnkey debugging platform that helps you spend less time debugging and more time building.
Get to the root cause quickly with detailed information at your fingertips.
Start your free trial at backtrace.io slash cppcast.
Episode 71 of CppCast with guest Chandler Carruth recorded September 23rd, 2016.
In this episode, we interview Lightning Talk presenters live from CppCon 2016.
Then we sit down with Chandler Carruth from Google.
Chandler tells us about high performance data structures and using modules at Google. Welcome to a very special episode 71 of CppCast.
We are live from CppCon.
I am joined, as always, by my co-host, Jason Turner.
Hey, Rob.
How are you doing?
So, it is the last day of CPPCon.
We actually just wrapped up an interview with Chandler Carruth.
Yes.
And earlier this week, we talked to a bunch of Lightning Talk presenters.
Four or five, something like that.
Something like that.
So that's going to be the episode for today.
It's a bit different from the normal format.
We'll do Lightning interviews, and then we'll have Chandler.
And that's going to be it. So, Rob, this is your first C++ conference. What do you think?
I thought it was great. I really love, one of the things that's special about this conference that
I haven't seen in other conferences is all the speakers, including the quote unquote celebrities
like Bjarne or Herb, are just milling around the conference,
going to talks themselves.
Yeah.
So you can go up and have a talk with Herb Sutter or Bjarne Stroustrup in between talks,
just like any other attendee.
Yeah.
And everyone tends to be approachable.
Very much so.
Yeah.
Yeah.
So I've really enjoyed my time at the conference.
I really enjoyed your keynote.
Thank you.
Are you prepared to become a celebrity when that hits YouTube?
I don't know what to say to that.
Yeah.
Well, it was really a very fun keynote slash plenary talk.
Thanks.
I'm looking forward to everyone seeing that on YouTube.
I hope the video comes out well.
Yeah, I think it will.
And that's about all we have.
So please enjoy the rest of the show
and we'll be back to our
normal programming next week. We've got a bunch
of interviews
lined up from people we met
at the conference, right? Yeah.
Like six or something. Yeah, about
six new interviews.
We're going to be reaching out to all of those
speakers soon and
should have a lot of exciting content over the next few months.
Yeah, it should be fun.
Yeah.
So we are doing a special episode of CppCast.
We just finished up all the CppCon Wednesday night lightning sessions.
Right.
And we're going to be talking to a couple of the speakers who just gave their five-minute lightning talks.
So welcome to the show.
And what's your name?
Atila Neves.
And what was your lightning talk on?
I was using C++14 to mock C functions.
So, I've personally never really used mocking.
Tell us, like, what's the point?
What does mocking gain us?
In this case, I mean, I was doing it for a legacy C code base, so the whole point was
that the code was already written, not by us, and we want to make sure that it works
because we have to change it because there are bugs
and feature requests and whatnot.
So how can we make sure that the thing still
does what it's supposed to do?
And in this case, it was networking code,
so side effects, not so great of an idea
when you're unit testing, and that's why.
It's like, I want to prevent this code from calling out
the functions I'd rather it didn't.
The other reason is, in our case, the build system is really, really bizarre,
and we don't want to use the official build system because it's slow, weird,
and you can only use special VMs for it.
So the other reason is just don't call this thing.
Let's fake all the things so that we can test our core logic.
And I wouldn't have written the code that way.
It's better to write the code for testability in mind
but when the code's already there, what can
you do? So it's better to just make these
troublesome functions basically go away
and make sure that your functions
are calling them in the right way. So that's what
mocking gives you then, is a way to make a fake
function. Right, because in OOP
what happens is you pass an interface
or something like that to a function and this is
local, right? It's in the parameter list.
But if you have C code, it's global states, basically.
The name of the function is a globally known thing.
Right.
So you're basically trying to get rid of that
so that you can reroute it to any implementation you want,
and then you don't have packets being sent or writes into the database
or any of the other things that would be problematic
in a unit testing setting.
Okay.
That's very cool.
So what is the library and can people find it online?
Yeah.
It's on my GitHub and it's called premock, because it uses the preprocessor to get rid of these
functions and reroute them to a different implementation.
And the macros I provide make it so that you have to write the least amount of boilerplate
possible to get this to work.
So it writes the code for you. Excellent.
Okay. How are you enjoying the conference
so far? It's been fantastic. Even better
than last year. And that's saying a lot.
Okay. Thank you very much. Thank you.
Jens, we've already had you on the show before,
but you just gave a lightning talk. Why don't you tell us a little bit
about what the talk was?
Yeah. I
wanted to share a little bit of information
about how to present code.
Of course, I think that this is what, you know,
a lot of things happen at the conference,
and most people are talking about code.
And I think that if we find better methods
or if we get better at this,
we all as a community are having a huge improvement.
It's probably a very worthwhile thing to get right.
Yeah, and as you mentioned, your lightning talk is somewhat based on the talk Scott Meyers did before.
Yes, Scott Meyers gave a keynote in 2014 at Meeting C++,
which also was, I think, his last public talk. And part of that keynote he dedicated to how to present
and how to prepare materials for the modern age.
And I was always very interested in that.
It got me thinking and I thought it's probably time to have like...
I think it's probably time to start to prepare this as like a set of materials that people are able to look it up.
And I want to have my speakers for my conference, but also for the other conferences, to improve their presentations.
And also that people who start presenting have materials that they can look up what are the best practices.
And so make it easier for people to get started with giving talks.
So if you wanted to pick one key point as a teaser for our listeners,
what would it be?
What's your most important point?
Oh, that's a good question.
I think the main point of my lightning talk was that it should be clear what and how you want to present your code
and that you should highlight what is really important to the viewer of the presentation.
Okay. And you said there's probably different techniques you go about doing that.
Maybe what are some of those
techniques for presenting the code,
highlighting the most important parts?
Yes.
There are a lot of
different tools
for preparing a presentation,
and some of them offer
better options for
integrating code into your slide deck,
while other programs like PowerPoint or OpenOffice are not as well prepared for that.
So you have to go for screenshots or it's a bit more work than if you really want to
make this in a good way.
And on the other hand, how do you want to solve this problem or how do you tackle this problem
there are a lot of different ways to do that, and I think this is just a process which is starting.
I'm interested in having a discussion on this, and in getting the speaking community inside our community,
and maybe also from other communities, to start thinking about this, so that we find better solutions
for how to present in the future.
Brilliant.
Okay, thank you very much.
Tell me a little bit about your thoughts
on the conference so far.
I think it's, again, a very good conference.
It's big.
One thing which I think is, like,
again, very difficult to choose where to go,
which talks to see, as there's a lot of content in parallel.
And I, again, see a lot of things like, you know,
what do I want to do at my conference and how do they do things here.
And so I like the keynote so far.
The keynotes are better than they were last year, in my opinion.
And on the other hand, the off-track, like in the breaks,
meeting people, talking to people, that's also very good.
I met a couple of new people which are really interesting.
And meeting again a lot of other friends.
Okay, thanks for your time today, Jens.
Thanks.
Okay.
Ken Sykes, you're here from Microsoft, right?
Yes, I'm from Microsoft.
And you just gave a lightning talk on?
On improvements to the Windows debuggers,
NatVis, data model, that kind of thing.
Okay.
So, NatVis has been around for a little while.
I'm familiar with that.
But you've made improvements to it?
Well, the big improvement is Visual Studio has its own debugger package.
They've had NatVis for a while.
We brought it to the Windows debugger,
WinDbg, CDB, and NTSD.
So now those debuggers work with it as well.
I never actually used WinDbg
or the others myself.
So I've opened
WinDBG, and I've maybe
attempted to debug with WinDBG,
but I don't think I've ever been successful.
Well, it's a bit of an acquired taste.
Yeah, definitely.
So this should make it a lot more
palatable to someone who's maybe more
familiar with a normal ID debugger.
Right, yes.
So, like I said, common types, like a std::string, it should just show the string.
And now it does.
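For reference, a NatVis visualizer is a small XML file mapping a type to a friendly display in the debugger. A minimal example for a hypothetical `MyLib::Point` type might look like this:

```xml
<?xml version="1.0" encoding="utf-8"?>
<AutoVisualizer xmlns="http://schemas.microsoft.com/vstudio/debugger/natvis/2010">
  <!-- Show Point as "(x, y)" in the watch window instead of raw members. -->
  <Type Name="MyLib::Point">
    <DisplayString>({x}, {y})</DisplayString>
    <Expand>
      <Item Name="x">x</Item>
      <Item Name="y">y</Item>
    </Expand>
  </Type>
</AutoVisualizer>
```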
Maybe a question just for those who aren't familiar with WinDBG.
What are some of the use cases for why you would open up WinDBG?
Right.
Well, I work on the Windows OS, so we receive crash dumps from all over the world through Watson.
So every time you hit send report, it makes our lives a little bit more difficult.
But yeah, so we open those things up.
At least internally, we have a bunch of additional extensions and things with our internal symbols to figure out what's wrong with our product.
But you can imagine other companies like Adobe, right?
They also receive dump report, dump files that they need to process as well.
And so me, I work inside Windows.
I use it all the time.
I know Visual Studio can open dump files,
but I've basically never done that.
So just the opposite side.
So basically any time you're debugging an application
that's actually out, released in the wild,
you might be able to get a crash report
that WinDBG could be useful for.
Yeah, that's a common case.
And a lot of times we have to debug issues
where it's not our program;
it's another person's program running on Windows, right?
And so that's another use case where, I don't know,
maybe it's just something I'm used to, but.
So I'm curious about something in your intro and bio.
I believe they said you've been with Microsoft since Windows 3.0?
Yes, that's right.
That is quite the history.
Yes.
What's that been like?
Well, it's been a lot of fun.
This is my second tour of duty with Microsoft.
I worked there from 89 to 2000.
I came back in 2004 working from D.C.
So I work remotely for them.
And I've done that for about 12 years now.
Worked on lots of different things.
I've worked on Paintbrush, PostScript Driver, GDI,
back in the Win9X days,
the shell, Windows Runtime,
and now the debuggers.
Wow, very cool.
So, yeah.
Just wondering, does Microsoft feel different from your perspective over the last few years with their embracing the open source movement
and working with Linux, things like that?
Yeah, there's definitely...
I see changes there.
I mean, the interop is cool.
At least internally, they're more open to us using open source
as well as supporting open source development
by other external people.
And so there's a little
less of just inventing everything yourself.
So it's nice.
It's like I actually get to look at what Boost does
now. Wow.
It's this new cool thing.
So it's fun.
Okay. Well, thank you
for joining us today. Alright, thank you.
So John,
conference organizer, you just finished a lightning talk.
I did.
What was the talk called?
It was called Unsigned, A Guide to Better Code.
And do you want to tell us about the origin of this talk?
Well, James McNellis, he was, I know he rarely tweets, but he tweeted something about how
awful signed numbers were.
And I challenged him on that.
I said, no, no, no.
You shouldn't use unsigned.
That's the thing.
And so he took this contrary position.
And I said, well, we're not going to battle this out on Twitter because, for one thing, he's got a huge advantage.
And 140 characters isn't really a good way to discuss things. So I wanted to put him on my turf.
So I said, let's battle this out with lightning talks at CppCon.
And he agreed.
I kid you not.
I'm sure it's stored in the Twitter history there.
He said yes.
And I think he then promptly forgot it.
But, of course, that night I had made my slides.
Anyway, I made the slides.
I presented them at a local user group just to kind of go through it, and then I forgot about them until last night.
I just happened to be sitting next to James at dinner, and I don't know what it was that caused it, but I
suddenly realized, hey, wait, we're supposed to have that duel, and I hoped we could still get lightning
talks in. And James said, I don't
have slides. And I said, James,
don't worry. I'll write your slides for you.
And he didn't buy into that
at all. But because I had my
laptop right there with me, I pulled it out. And I
showed it to him. And he said,
and he even tweeted this out. He said,
oh, you're really making me look bad.
And I said, well, that was the point.
But he said, well, yeah, I can't really disagree with what you've said. And I said, well, that was the point. But, uh, but he said,
well, yeah, I can't really disagree with what you've said. And I said, oh, then I really want
to write your slides for you. We'll just have you say how much you agree with me. And that was the
end of the duel. Right. But, uh, he was a very good sport. He, he said in the audience and took
my ribbing. Um, so I had some technical points to make and, I'm sure if James had presented his side of it, there's probably less disagreement than a duel makes it sound like.
But it was a lot of fun, and I think it gave some people some things to think about.
Yeah, definitely.
How's the conference going so far?
I'm having a lot of fun.
I'm getting a lot of really positive things from people.
The thing about being a conference organizer is that everything that goes wrong, you hear about.
A lot of times you have to deal with it.
But a lot of times it's dealt with by somebody else.
But they just let you know, by the way, this happened and here's what I did.
So I know everything that's gone wrong.
I suppose not everything.
There are probably other things, right?
But I'm hoping that most of those things got taken care of before
most, or maybe even all, attendees
saw it. So I think
the illusion we create for the attendees is
that everything's going perfectly smoothly,
and I think we've got most of them fooled
for sure, which is the goal,
right? I think as long as the snacks
show up on time,
we're all pretty happy then.
Yeah.
Yeah. I don't know. I haven't heard the complaint
I kind of expected to hear, which is that
the snacks this year, there's a lot less sugar.
We're downplaying
the chocolate. There's no
bagels in the morning. It's fruit in the morning
and yogurt.
And it's healthy.
I think part of it is that
the health is part of it,
but also I think that eating sugary things causes you to eat more sugary things.
There were some issues last year with people taking way more than they should
and then people complaining because they didn't get any and stuff like that.
And I just thought, nobody ever takes too many apples,
and nobody ever complains when they don't get the apples.
And I haven't had those complaints this year.
Well, it worked out well from my
perspective. I have enjoyed the yogurt
in the morning one day.
So you're looking forward to the end of the conference
when you can actually watch some of the content?
I am.
So
the goal for the
plenaries is to get them up within 24 hours, but that's kind of a stretch goal.
We didn't make it this year.
I don't think we have any plenaries up.
It's Wednesday night, but I think we're going to have one tomorrow.
And they'll get up real quick.
The main goal is to have all of the sessions up in one month.
And that includes the lightning talks that we've been discussing today.
The lightning talks, yes.
People should be able to watch the lightning talks that you're talking about.
Okay.
In fact, you might want to, I don't know what your plan is,
you might delay this for a month and then put it up when people can see it.
That's not a bad idea, actually.
Okay.
Well, thanks for your time today, John.
Thank you, guys.
Thank you for the great job you're doing.
Thank you for doing a session talking about your experiences behind the microphone.
Theoretically, that'll be up.
Theoretically, yeah.
Okay, thank you.
So Gabor, you just gave a lightning talk at CppCon.
Can you tell us what the talk was?
Sure. So the talk was about a tool which is called CodeCompass.
There are several tools already which help us to be more productive when we are writing code.
This tool is a bit different, because its main goal is to help us understand the code.
So it is a tool for code comprehension.
And it is based on the Clang
compiler.
It parses the code, and it
serves a lot of information
to developers
as a service,
and it generates
different kinds of
graphs,
so visualizations based on the code. So it uses visualization techniques
like generating UML diagrams from the code
or code diagrams, component diagrams,
and also it can help us show very relevant information
that are from different files in a very concise way.
So we do not need to remember which function is at which file,
and we do not need to switch between those files all the time.
So you actually were using a web browser to browse the code, right?
Is that how that worked?
Yes, yes.
So basically there is a web server running,
and you can connect to it with a browser.
And that tool is only for viewing the code,
so you cannot use it as an editor.
But usually the changes that we do while we are developing is very well contained in a small subset of the codebase
and we can still use it to navigate the rest of the code and it proved to be very useful for us.
And it was open-sourced recently, and it can be found on GitHub.
So I think you should definitely try it.
Okay, so just to clarify, this is going to be running on some other machine,
you're going to have your repositories checked out,
and you're able to go into your browser and navigate that way.
Well, it depends on if you would like to, you can run it locally.
You could run it locally.
Okay.
So it takes some time to parse the code.
Usually it is slower than the compilation. So you might want to do the parsing on nightly builds on a server,
and then you can connect to that and use that through the browser.
That's interesting.
I could see a large organization or something having this run nightly
and then just having it available to the whole organization.
Yeah, definitely.
That's exactly the use case that we are using it for.
Cool.
Okay.
And how are you enjoying the conference so far?
It is a great experience.
I just met somebody who was exchanging emails with me very frequently on the mailing list,
and I just realized that he was that guy.
So that kind of experience is great.
I think it's worth going to conferences just because of that.
I've noticed a lot of people on Twitter, for instance,
don't even use a real picture of themselves,
so trying to recognize them here,
and then even if they do have a picture,
it might be 10 years old,
and then it's that much harder still to recognize people.
Yeah.
You also have a case where you feel like
you've already known someone,
even though you've only interacted with them on Twitter.
Oh, right.
You're like, we've met, haven't we?
Right, exactly.
Okay.
Well, thank you for your time today.
Thank you.
Thanks.
I wanted to interrupt this discussion for just a moment to bring you a word from our sponsors.
Backtrace is the debugging platform that improves software quality, reliability, and support
by bringing deep introspection and automation throughout the software error lifecycle.
Spend less time debugging and reduce your mean time to resolution
by using the first and only platform to combine symbolic debugging, error aggregation, and state analysis.
At the time of error, Backtrace jumps into action,
capturing detailed dumps of application and environmental state.
Backtrace then performs automated analysis on process memory and executable code
to classify errors and highlight important signals such as heap corruption, malware, and much more.
This data is aggregated and archived in a centralized object store, providing your team
a single system to investigate errors across your environments.
Join industry leaders like Fastly, Message Systems, and AppNexus that use Backtrace to
modernize their debugging infrastructure.
It's free to try, minutes to set up, fully featured, with no commitment necessary.
Check them out at backtrace.io slash cppcast.
Okay, so we are joined today by Chandler Carruth.
It's the last day of CppCon.
Chandler, welcome to the show.
Thanks for having me.
So you did two talks this week, right?
Yes, I did two talks.
Do you want to tell us about the first one?
Sure.
So the very first talk I gave was trying to kind of continue something I started two years ago, which is giving kind of background information for
people about how to write really high performance C++ code. I think that there's a lot of
interest in writing high performance code, but there's actually not a lot of material about how
to do it effectively and how to do it without making your code really bad. And so a lot of what I'm trying to do is give people kind of patterns they can follow that
are going to make their code really fast and that are going to actually be sustainable
long term.
And this year in particular, I talked a lot about concrete use cases or concrete techniques
you can use based on LLVM's code itself.
LLVM has a collection of data structures that it uses to
make a lot of its algorithms, a lot of its
code very fast, very efficient.
I presented an overview of those,
how you can incorporate them, some of the
unusual and surprising tricks that are used
that make them especially effective.
But really focus on data structures
and how to make those fast.
And
trying to actually tie it back to real-world use cases.
So it's always difficult to get into technical details on the air,
but is there any particular piece that you might want to pull out
that you'd say, this is like the tidbit, this is why you should watch my talk?
So the key idea is that you can have your data structures
and you can cause them to be customized in their behavior
as the program dynamics change.
And the classic and best example of this are small-size optimizations.
And there's just a tremendous amount that you can do with a small-size optimization
to allow a data structure to be very lightweight and fast when it's small
and actually still scale very effectively as it grows large.
And then there's a lot of tricks you can use to kind of amplify the effect of that kind of thing
by packing data into smaller and smaller spaces,
causing things to be very, very dense
and very, very cache-friendly.
And so those are kind of the two techniques
that interplay in the talk I gave.
That's particularly interesting
because I actually overheard a hallway conversation
from people saying how they didn't really understand small object
optimizations and they wish they had more information
on it. I mean, the idea
was to try and show people how
that can actually be one of the most effective
data structure techniques.
And that was your hybrid data structures course.
You did another one on undefined
behavior? Yeah, the second one was
I think actually a higher level
talk in a lot of ways.
It wasn't just kind of walking through techniques.
It was trying to give people a new set of ideas, a new set of language to use and terminology to use to talk about undefined behavior, to talk about problems in their code.
There's been a tremendous amount of frustration and friction in the C++ community and in the
wider programming community around undefined behavior and bugs and security exploits that
stem from undefined behavior.
And I actually think that all of that really misses the key thing.
These bugs and the security exploits, they don't stem from undefined behavior. They stem from the fact that we have incorrect programs
because either the programmers didn't realize that there was a bug in their code
or because we've designed the language, we've designed an API in a way
that makes it brittle in the face of reasonable programs.
It's hard to actually use the language of the APIs correctly.
And once we start focusing on
whether it's easy or hard to use
the language of the API correctly
and how you can use it incorrectly
and what it means for a language feature
to be used incorrectly in the same way
that an API can be used incorrectly,
we can start realizing where the trade-offs
really lie.
We're actually making a conscious trade-off sometimes
to provide language features that have very narrow scope.
They end up with narrow contracts.
They end up with very narrow use cases.
And if you go outside of those use cases,
you end up with undefined behavior,
but that's because you end up with an invalid program,
not because we're just trying to break people's code.
And I tried to give some ideas about when this is a reasonable thing to do.
For example, when there is no hardware that is truly portable and universal that can implement
the behavior in a consistent and reasonable way.
When you would have to make performance trade-offs, where you'd have to actually pessimize the
performance on one platform in order to define
the behavior on another platform.
Those kinds of trade-offs really don't fit with the spirit
of C++. And so what
we need to do instead is have a reasonable way
for people to write software and
not run into these issues.
So I suggested principles around
what we do to kind of
make a principle choice
to narrow the contract of a language feature
without leaving landmines, right?
Without leaving traps for people to fall into.
And those center around being able to check for mistakes,
at least probabilistically.
Also, being able to explain a rational model
for how you're supposed to use the language feature
rather than there just being strange one-off rules that you have to memorize and recall.
And if you get it wrong, you have an incorrect program.
And trying to also respect the existing code.
So one thing that I think is always really risky is if we have very widespread programming patterns
and we introduce a language feature or, heaven forbid, we change a language feature
in a way that works directly against the grain of those widespread programs,
we shouldn't be surprised that people dislike that and that they run into problems there.
I think we actually need to look very carefully at the existing code when we're doing this.
And we even need to look at existing code when we're kind of reconsidering past choices.
You know, I think it's seriously a possibility that the C++ language has some mistakes in it.
We might actually need to change things.
And we should look at the existing code to inform those decisions.
So I might be going a little off topic here, but Clang does have an undefined behavior checker, right?
Yes.
How does that, does that work well with the things you're talking about?
Absolutely.
I mean, the reason why I now, I advocate so firmly for, you know, have narrow contracts for language features,
which could result in undefined behavior if they are misused, is because
we can very consistently
check code
to make sure it isn't misusing them.
And that comes from
the undefined behavior sanitizer,
and it's where Clang can go in and insert checks
that verify, before your code hits undefined behavior,
that you're actually going to satisfy the contract
of the language feature you're about to use.
And it's not just Clang.
GCC actually has a version of it as well.
So it's this idea of checking for bad behavior dynamically when necessary,
statically when you can,
and using that to kind of test and ensure your code is correct,
that takes a lot of the guesswork out.
That makes it much more reliable,
and that removes a lot of the risk and uncertainty for me
around the existence of these things.
That's not to say that we don't still need to have a clear rational basis
for having a contract that says you can't pass a null pointer here
or that you can't do this operation.
We still need to have a good reason for doing it.
But when we have a good reason,
we also need to have tools to help programmers out.
Okay.
One of the things I wanted to ask you about was modules.
I know one of your colleagues from Google did a talk on modules yesterday.
Gabby did one this morning.
Do you want to give us an update on your thoughts
and Google's perspective on modules?
So my thoughts around modules, I think,
are really well captured by the two talks that we gave,
that Google gave at CppCon this year.
We really wanted to give kind of a story of modules, right?
And it comes in two parts.
The first part is that we've actually deployed modules in our code base
and it's a very, very large-scale deployment.
So about 10% of all of our code is built into modules now.
We talked to Titus Winters recently
and he told us you have something like 10 million lines of code.
So that's about a million or so lines.
No, we have hundreds of millions of lines of C++ code.
So we have several tens of millions of lines of C++ code.
Okay.
So we have several tens of millions of lines of C++ code built
into modules. Technically, what Titus said
was it's more than 10 million lines and I'm not
allowed to tell you how much.
So I can be a little bit more specific. We have
hundreds of millions of lines of code.
I can't tell you how many.
It's on the order of 100 million lines of code, and substantially more than 100 million lines of code.
Okay.
And so we've got tens of millions of lines of code
being built into C++ modules,
and we've got those modules in turn being used,
being imported into all of our code.
So all of our primary C++ code base is actually using modules.
We're building into modules a very narrow
subset of our code base. It's one we control a lot, and it happens to be very big: all
of the generated code for protocol buffers, which are kind of an interface description language
that lets us build up serializable and deserializable messages, as well as communication protocol
APIs.
It's really, really nice.
We use it very heavily.
It's open source.
You can find out about protocol buffers.
But internally, we have so many of these protocol buffers, they generate just a massive amount
of source code.
Probably a fairly large fraction of the source code that the compiler was actually parsing,
before we were using modules, was header files from protocol buffers.
And so what we did was we focused on this because we could control all of the generated code for protocol buffers in one place.
It's generated code, so it's actually centrally controlled, despite being generated throughout Google's code base.
And we generate modules now for all of the protocol buffer code and for all of the code that protocol buffers
relies on, all of its dependencies.
And it ends up totaling about 10% of our code.
And this means that we're building these modules,
but we're also importing them everywhere
any code uses a protocol buffer library.
And so they're getting imported essentially everywhere.
And that gave us a lot of experience.
It was very hard to do.
We did have to make changes to our source code,
but what we've been using is a very special form of modules
that we've built into Clang.
And that's what Richard Smith talked a lot about.
We've actually built a C++ 98, essentially,
form of modules into Clang.
It's not using substantial language changes.
It's really leveraging the existing language and a very particular compilation model.
So we look at what header files are actually modular and could be built into a module.
And when we see a header file that's one of these,
we build it into a module,
and rather than textually including it,
we import that module semantically automatically.
This means you don't have to write import in your source code.
You don't have to use the module syntax in the header file.
We can get a lot of the benefits
and kind of experiment with what it means
to change the compilation model in this way
without kind of committing our source code to
one syntax or another, which
is really nice. It gives us a lot of experience,
but we're not painted into any
kind of corner, right? Regardless of
what the standards committee ends up choosing to
standardize for syntax, we're going to be able to
adapt because we haven't
actually written anything into our source code
about modules.
We did have to make some changes to our source code.
They just weren't specific to modules.
They were kind of triggered by the modules built.
And the changes all centered around making our code either more modular
or better factored or removing bugs from the code that modules allowed us to detect.
Interesting.
And so some examples of this are, you know, we want our headers to be standalone, right?
We want them to parse plainly from a clean slate.
And we found lots of header files for which that wasn't true.
But the changes there are strict improvements, right? We wanted this to be true long before we had modules.
We just hadn't managed to actually check that you could parse the header file from scratch.
It happened that it always
got included in the right order.
Things just happened to work. And we started
detecting those issues.
We also found ODR
violations. And these would be ODR violations
that just happened to not occur
textually, but when you start
doing the semantic import, we were able to detect
them. And so we would flag those as issues,
and then we would go and fix them in our source code.
But the nice thing about this is that all of these
are pretty clearly bugs in the source code.
They were never intended.
And so even for library code,
which is really intended to be C++ 98 or 11 forever,
and to use textual inclusion forever,
we were actually able to make changes
and adapt them in a way that allows us
to kind of import them semantically,
which is really nice.
And then we wanted to kind of take all of that experience
and the implementation technique
and see how that would best fit
with kind of a standardized feature
that actually adds in syntax
and most importantly adds in kind of controls
so that now we actually have export controls.
We can actually control which APIs leave a particular file
and are available to consumers instead of it being everything
because that was the model of includes.
And in doing that, we actually noticed very particular changes we felt like we needed
in the module's proposal, but surprisingly narrow changes. For example, all of the fundamental
pieces of it work great. The only really interesting change we pushed for is having some way to
actually bridge between these C++98 or C++11 libraries that are being developed
in a textual world.
They're not likely to change overnight.
They may have users whose compilers are going to be much slower to update.
We want to still have those.
We also have, and those are leaf users, we also have libraries which are near the bottom of
our dependency graph, like very core and fundamental libraries that either also have users that aren't
going to update or that just aren't going to be changed anymore because they're legacy,
they're stable, we don't want to touch them. Both of these cases, we have these kind of
legacy holdovers, and we want to be able to modularize libraries in the middle
because that removes kind of an ordering constraint for modularization.
It lets us rapidly provide the modular benefits to users that want them
without them being gated on some other team or some other project,
having time in their schedule to make those changes.
And from our experience doing this rollout,
we needed a few changes to make this really work well
because we found that there's a really large prevalence
of various pieces of kind of C++11
and textual inclusion tied API design
throughout these APIs,
and we need some kind of legacy mode that enables
those to work well.
And Richard's talk, I mean,
this is a complicated topic, so
Richard's talk goes into all the details
and kind of walks you through examples of like,
here is the precise code pattern
that it turns out doesn't
work well unless you have
a very particular kind of legacy
mode that allows you
to transform a header file into a module in a kind of safe and predictable way. Interesting.
So obviously listeners should go and watch Richard's talk once that's available online,
but just one more question on that. What types of improvements did you see in compilation speed?
So this is actually an interesting thing, and Manuel gave some of the things we've seen. We actually
saw pretty good improvements
in compilation speed, but one of our
constraints is we need to deploy this
not to local builds, but to a
very large distributed build system.
And one of the challenges with modules
there is that we have to send all
of the modules that are inputs to a compilation
to the actual
distributed build worker that's going
to do the compilation. And modules are large, right? There's a space trade-off here. The most
compressed and smallest representation of a C++ API is probably the header file's text.
It's really an efficient representation, even when it's really large.
So the module files end up being a larger encoding of that information. And so we actually saw that our compilation times would in many cases drop by a factor of two.
But distributing the actual build, getting everything set up on the remote worker and doing the compile,
would end up eating a lot of those gains.
And so we actually saw fairly small average case compile time improvements.
This is just an initial attempt, right?
We're only at 10% of our code base is modularized.
We still have plenty of textual inclusion going on as well.
And it's early days in terms of implementation experience.
A lot of the build system overhead we're hoping we can address and kind of get closer to that
2x.
And we're hoping that we can make the compiler better and have more of our code modularized to
get even better than 2x, but that's definitely something in the future. Right now we're actually
not seeing a lot of improvements on the average case. The interesting thing is that that wasn't the
primary target. Our primary concern about compile time is not the average case, but the long tail:
the 90th or 99th percentile of the slowest-compiling files.
Because if you do a large build of software, you're compiling hundreds, thousands, maybe tens of thousands of source files.
The 99th percentile compile time probably shows up in most of your builds.
Even though it's an edge case, it's actually almost always the edge case,
the tall pole in your build.
And there we saw really dramatic improvements.
2x, 5x more.
We've seen really dramatic improvements
in some of the long tail compile times.
And that's, I think, the really exciting part
of C++ modules.
Because you're really in a risky game of long tail latency.
There's actually a great paper by Jeff Dean,
it's called "The Tail at Scale."
And the idea is how long tail latency
has a disproportionate effect
on large scale distributed systems
because of the aggregation factor.
As you fan out across a large scale system, right,
you end up having this multiplicative factor on the probability of
encountering a long tail step.
And so even though you have these like very,
very unlikely long tail latencies, when you scale up the system,
you end up making them much more likely again. And so there's this really surprising and disproportionate effect.
And we're definitely seeing that in compiles inside of large distributed
build systems. And so for us, one of the biggest things is making that tail come
in. And then we notice the other thing is that user latencies also tend to look
more like a latency game, less like a distributed build,
because the user's not building everything, right?
They're making one edit, and then they're rebuilding.
And for that, we also see really impressive speedups,
because it's very localized, it's very specific.
Cool.
Do you want to tell us a little bit about your thoughts on the conference in general?
Are you enjoying your time here?
I always love this conference.
I mean, I've loved this conference since the first time we did it.
I am very glad
that when Jon Kalb started
talking about this conference, I
pestered him for about
five hours into the very small hours
of the morning in Aspen
until he actually agreed to consider
doing it for real.
I'm so thankful that the
Standard C++ Foundation
was in a good position to kind of step forward
and work with Jon to cause this conference
to come into existence.
I think having a community is one thing.
Having a place for that community
to kind of come together and to exchange ideas
and to really cement its relationship is important.
And I don't think the community would be as strong without it.
I think you can see this in the dramatic increase in the quality of information about C++,
in the teaching of C++.
People are now prioritizing C++ language features as really important things to have in their workplace,
in their projects, to a degree that I don't think was happening before.
And I think it's going to help contribute to the kind of resurgence of C++
that we've been enjoying for the last few years.
Is this the third year for this conference?
It's the third year for the conference.
Okay.
And it's growing each year, which is just a great sign.
I mean, my big thing is I want to see the conference grow
because I feel like there remains this really large body
of C++ developers that we're still not reaching
and that we can do even more for.
I mean, we post all of the videos on YouTube, right?
We want to broadcast the information as widely as we can.
This conference is not trying to, you know,
like claim to any of the information here.
But I think there's actually a lot of
value for having people here physically, for actually getting everyone together. And so
I want to see it grow to be as large as it can be.
So officially we're right around 900 people or something like that?
I think so. I think it's over 900 attendees and speakers.
And there's several million C++ users around the world,
so it definitely has room to grow.
Right.
And I think it's just a challenge that we have to continually
kind of push more value and make the conference both more valuable
and also more accessible to C++ developers around the world.
Right.
Yeah.
Any favorite talks you want to mention?
Favorite talks?
I mean, I had an absolute blast
at all the modules talks this time.
I think it's one of the most exciting things
going on at the conference.
I also had a lot of fun
at kind of some of the smaller talks,
actually, I think,
around the periphery of the conference.
They often get neglected,
but there's some really, really great talks here.
Gor gave a great talk about coroutines and how they're implemented,
going into really deep details about exactly how the nuts and bolts fit
together at the bottom. I think it's great to surface
those things so that people have an understanding of that. But I couldn't
possibly pick a favorite talk. It's just not possible.
Do you have anything else?
I don't think so.
Okay. Thank you so much for your time, Chandler.
Absolutely. Thank you so much for having me.
Thanks for joining us.
Thanks so much for listening as we chat about C++. I'd love to hear what you think of the
podcast. Please let me know if we're discussing the stuff you're interested in, or if you
have a suggestion for a topic, I'd love to hear that. Also, you can email all your thoughts to feedback at cppcast.com.
I'd also appreciate it if you can follow CppCast on Twitter and like CppCast on
Facebook.
And of course you can find all that info and the show notes on the podcast
website at cppcast.com.
Theme music for this episode is provided by podcastthemes.com.