CppCast - Standard Library Hardening
Episode Date: April 11, 2025
Louis Dionne joins Phil and Timur. Louis talks to us about his role as code owner of libc++ (clang's standard library implementation) and the standard library hardening proposal that was just accepted into C++26, why this is important, and what you can do even today.
News
GDC 2025: How Build Insights Reduced Call of Duty: Modern Warfare II’s Build Times by 50%
C++ Core Guidelines issue to remove .h recommendation for headers
Reddit discussion
“Note to the C++ standards committee members” - Bjarne Stroustrup
Links
P3471R4 - "Standard Library Hardening"
"Retrofitting spatial safety to hundreds of millions of lines of C++" - Google Blog
Transcript
Episode 396 of CppCast, recorded 28th of March 2025.
["CppCast Theme Song"]
In this episode, we talk about C++ Build Insights,
which file extension we should use for C++ header files,
and a leaked message by Bjarne Stroustrup.
Then we are joined by Louis Dionne. Louis talks to us about standard library hardening.
Welcome to episode 396 of CppCast, the first podcast for C++ developers by C++ developers. I'm your host, Timur Doumler, joined by my co-host, Phil Nash.
Phil, how are you doing today?
I'm all right, Timur.
How are you doing?
I'm all right.
It's a bit hectic right now.
I just
came back from a visit to my hometown and in a couple days I'm going to the UK to where you are
now to attend ACCU and I just have a couple days in between. There's lots of family stuff going on.
So it's a bit hectic and rushed at the moment, but otherwise I'm doing great. Yeah. How are you,
Phil? That sounds like a quiet time by your standards. But yeah, it's pretty hectic for me as well.
Obviously, preparing for ACCU. I haven't even started preparing my talk yet.
Oh, yeah. I haven't started preparing my talk either. And I think mine is on the first day,
isn't it?
Last day, I think. But we'll talk about that next week.
We'll talk about that next week. So, Phil, just to remind our readers, you're actually the
event manager of that conference, right?
That is correct.
Yes.
All right.
So yeah, I can imagine I have organized conferences before, so I can imagine
how much work that is.
So, yeah, thank you very much for doing all of that and, and, um, yeah,
throwing together an amazing conference and juggling it with all the other stuff.
So, yeah, well, I hope to see a few of our listeners there as well.
It's a great conference. If you've never been to ACCU, I can highly recommend it.
And we will all be there.
All right. So at the top of every episode, we'd like to read a piece of feedback.
And this time we actually got quite a lot of feedback.
We got quite a lot of emails from people telling us that they're
really happy that the show is back because we were on hiatus for
something like four months and then we came back two weeks ago.
And now we're back to our usual every-two-weeks schedule.
And we got a lot of feedback that people are happy with that.
So that's really heartwarming.
Thank you so much for this feedback.
It's a good feeling to know that people are listening
to the show and they care about the show.
So I just want to read one of those emails
because obviously we can't read all of them.
There were quite a few more, but this one is by Jörg.
And Jörg says, hi, this is just my part of reducing
the spam ratio in your emails, smiley face.
Thank you, Jörg, I really appreciate that.
I have nothing in particular to say, except that it's great to hear you back and well.
I just realized how out of touch I am or was with the latest developments because your podcast is apparently my primary source of info on C++.
Well, we're very honored to be your primary source of information on C++.
Thank you so much, Jörg.
Yeah, it's a bit scary as well.
It is. I hope we're doing a half decent job, but yeah, this is part of the fun.
So thank you very much, Jörg and everybody else for listening.
And please keep sending us emails.
We'd like to hear your thoughts about the show.
And while you can still reach us on X, Mastodon and LinkedIn, the best way to give us feedback is by email at feedback at cppcast.com.
Joining us today is Louis Dionne. Hi, Louis.
Hey, thanks for having me.
So Louis has been doing C++ for a while. In the past, he's done a lot of metaprogramming
with libraries like Boost.Hana and type erasure with libraries like Dyno.
And he now works at Apple, where he is the code owner of libc++.
And if people don't know what that is, that is the standard library implementation of LLVM,
the one that comes with the Clang compiler.
So that is a pretty big deal.
And Louis is an active C++ standards committee member,
and recently has been focusing on making existing C++ code safer
by leveraging libc++'s position in the stack
to deliver incremental improvements to large amounts of code.
Well, Louis, that sounds amazing. So welcome again to the show. And I'm really looking forward to
talking to you about all of this. Yeah, I'm looking forward to talking about that as well.
But I want to pull out one thing from the bio that we're not going to be talking about. So now's my
only chance. Because you did mention that you worked on Dyno back in the day. I know it's been
a few years, but I've got a bit of an interest in type erasure at the moment myself. So one
of the things I've been thinking about is how reflection, if and when we actually get
it in C++, may impact type erasure libraries because it can be a bit awkward to use.
It is so awkward, right? So I think reflection is actually one of the biggest game changers for type erasure. I remember like even
back in the days when I got involved in reflection a little
bit, I think it might have been at some ISO committee thing. I gave a
presentation where I basically used the reflection facilities
back then, you know, which were completely different from the
ones that we're standardizing right now,
to show how we could re-implement Dyno in just a few lines. It fit on a couple of slides, basically. So it's going to be a game changer. It's going to be a lot simpler to implement. It's going to be
better, easier to use. It is going to be a game changer for this specific area of C++. I think
that's really good news. Actually,
I'm looking forward to that. Nice. Foreshadowing what are you looking forward to in C++? We'll
ask a bit later. All right. Yeah, we'll get more into that later. So Louis, hold on. Because before
we get to you, we have a couple of news articles to talk about, but feel free to comment on any
of these, of course. So today I got three of them. The first one actually has to do with a conference
that just ended. And it's not one of those C++ conferences that we usually
talk about here on the show, but it is one that a lot of people attend. And
that is interesting, which is the Game Developers Conference, GDC 2025. It's
just wrapped up. And in connection with that conference, there is actually a
new blog
post on the Microsoft game dev blog, which is again a blog that, you know, I don't like look
into every day, but like every once in a while, there's stuff in there that I'm like, Oh, yeah,
this could be really interesting to quite a lot of C++ people. So this one is called How
Build Insights Reduced Call of Duty: Modern Warfare II's Build Times by 50%.
So that's a pretty big number.
So this blog post is mostly about C++ Build Insights, which is a tool that ships with
Visual Studio, if I'm not mistaken. And it's about optimizing build times, as the title says,
which is a huge deal for video game people, but also for many other C++ developers.
So the blog reads a little bit like a marketing sales pitch for that product, but it's still
really interesting and I recommend you read it. The approach roughly is that they measure
the performance of the build using vcperf and MSBuild and a bunch of Microsoft tools.
They analyze the resulting profile,
they use Windows Performance Analyzer,
they find inefficiencies in the build process,
and then they implement optimizations
based on what they found and they re-measure
and then keep doing that.
And the article goes into quite a lot of detail
about what they found and what they've done
to cut the compile times on this
really big game by quite a lot. And there were three of them that I thought, that's really
interesting stuff. That's kind of cool. One of them was that they found force inlining,
which I think a lot of game people and probably a lot of other people do, they just slap force
inline all over their code, was actually significantly increasing compilation time. And
that was primarily due to nested calls to force-inlined functions.
One function in particular had 13,700 force-inlined calls, just one function.
Because there was like a combinatorial explosion in call counts.
And that resulted in over a minute of extra compile time.
Just that one function being force-inlined like thousands of times.
Then the other one was, there was a single function with particularly large
dynamic initializers that were somehow triggered by it, and that was adding
millions of unnecessary operations.
And that was slowing down a whole program analysis, which is something
that you do after you compiled your individual like translation units.
And that's quite important to get more optimizations and make the game faster.
And that single function was slowing the whole program
analysis down by ninefold, like, nine times.
So that added another three minutes to the build.
They got rid of that.
And the last one, I thought, was my favorite one.
They actually found issues with Visual Studio itself.
They found that
link time code generation, which is like one of the build steps that takes up a lot of time,
was using a hash table implementation that wasn't very well suited for the tasks and created lots of
cache misses. And so those cache misses were again slowing down the build by quite a lot.
And so then they actually had to go and fix
the build tools as well, not just optimize the code.
So I thought that was a really interesting kind of thing where you, you, you start
optimizing your code, but also optimizing your tools for, for build times.
Yeah.
I actually, um, so here's a fun fact about this.
Several years ago, libc++ was using the always_inline attribute on almost all of its functions.
And we got rid of that now.
The reason was basically to control some ODR aspects of the library.
It was to avoid ODR violations.
It's kind of a complicated story.
We'll talk about that later, I'm sure.
I think we might. Yeah, so we got rid of that now. We saw improved...
Compilers are able to optimize better, better code size, much better debugging experience as
well. So yeah, it's something that we did on all of our functions. It was kind of
surprising to find out about that at first. That's fun. So always_inline in Clang, is that the same
as __forceinline in Microsoft? I guess it is. I guess it is. I actually don't know for sure.
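For readers who have not met these spellings: GCC and Clang accept `__attribute__((always_inline))`, and MSVC accepts `__forceinline`. Both are strong inlining requests, but their exact semantics are not guaranteed to match, which fits the uncertainty voiced above. The following is a minimal sketch of how code bases commonly wrap the two. The macro name `FORCE_INLINE` and the functions are hypothetical and do not come from the episode or the blog post.

```cpp
// FORCE_INLINE is a hypothetical portability macro (an assumption for
// illustration). It maps to the vendor spelling where one is known.
#if defined(_MSC_VER) && !defined(__clang__)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline  // fallback: an ordinary inline hint
#endif

// Nested force-inlined calls: every call to sum_of_squares pulls in two
// copies of square. Deep nesting of this pattern is what multiplies work
// for the compiler.
FORCE_INLINE int square(int x) { return x * x; }
FORCE_INLINE int sum_of_squares(int a, int b) { return square(a) + square(b); }
```

Each additional level of nesting multiplies the number of inlined bodies, which is one plausible reading of the "combinatorial explosion in call counts" mentioned earlier.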
Yeah. Well, cool. So the second news item I have is a GitHub issue
that has been opened against the C++ core guidelines,
which is a very, very important set of coding guidelines
that a lot of the community uses for writing C++.
And it's essentially like a pull request
against those guidelines.
To remove the recommendation that we should use .h
as the file extension for
C++ header files. And there is a recommendation that we should use something else instead,
such as .hpp. There's a bunch of motivation given. .h clashes with C, so .h is supposed
to be for C headers and not for C++ headers. And C++ headers are typically not usable as
C headers. So there is, according to the person behind this, value in saying this is
a C++ header.
This is not a C header in a way that, you know, people and tools can immediately see.
And also use of .h is popular, but it's not actually universal.
There are code bases that do use .hpp or .hxx or something else, which I can confirm.
Like I've worked on such code bases.
I'm sure so have you. And finally, it also says that actually it is in our community interest to
actually differentiate and distinguish C++ from C. Because especially, for example, with this whole
recent safety and security discussion that has been going on for the last couple of years,
security people really like throwing C and C++ into one bucket. They always talk about C/C++. And, you know,
making a clear difference, just by how we choose our file extensions, may help
actually get the message across that those are actually different programming languages. That one is marked as a more minor consideration compared to the others, but yeah, it's interesting.
That suggestion generated actually quite a lot of discussion on Reddit.
I think in the past there was also a big thread about this in Stack Overflow and in other
places.
It is quite contentious.
I don't know what the resolution will be,
but I'm curious what you think about it.
I tend to agree with one of the Reddit comments
that it's a nice idea, but this ship has sailed.
And I think it's too late to bring it back in.
Even if we make the convention now,
that we are going to use .hpp or whatever it is we decide on,
well, there's so many lines of code out there
right now using .h that we can't change. That is just going to be an inconsistency.
And unlike other times when we bring up the inconsistency argument where you can say,
well, that's just going to hold us back from improving. That's the thing. I don't think
it's actually going to improve anything. So probably not a fight worth having. Otherwise, I tend to agree with the underlying sentiments.
Right. So Louis, in your standard library, I guess your headers don't have any extension,
do they?
Well, actually, the public ones don't have any extension, which by the way is really
weird. I don't understand.
Yeah, it's weird. I never understood that. Yeah, that's kind of weird.
What's the deal with that?
I thought the idea was that they're not necessarily actually implemented in header files at all. It's
more like you want to include this sort of pseudo module.
But in practice, they kind of are, right? So you go to the source code for vector and then you're
in a file that's just called vector.
It doesn't have to be implemented that way.
Yeah, yeah, yeah.
I mean, if that's the rationale, it seems like one of these places where the standard
tries to be overly general and give us a lot of, you know, allow for many different ways
to be conforming implementation when in reality we don't really need that much anyway.
But yeah, so we actually have private headers as well, right? To like help
implement the public ones. And we use .h. And frankly, I
just don't really have an opinion. I'm very neutral on
this. I mean, I think I agree, I think with the underlying
statement that like, yes, it's nice to have something that, you
know, differentiates C++ from C. But at the same time, it just seems like such a very
small thing. I don't know. I don't think it's worth spending a lot of time and energy
sort of discussing that when we have other things that are much more important. But that's
really just like my personal position. So I would go with whatever makes people happy essentially is my position.
So yeah, I don't care. It's a pretty good position. I think I can get behind that.
So what do you think then, Timo? Well, it's kind of similar. I do like,
you know, the way my brain is wired, I kind of like things being categorized and labeled
and cleanly differentiated from each other. I mean, you don't
see this, because you're just audio only. But if you do see,
like Phil and Louis see, my room has lots of little
drawers with painstaking labels with
regards to what's in them. So I would actually, you know, say
yeah, .h for C and .hpp for C++ makes a lot of sense. But much like Louis, I don't think this
is something we like should spend time on because kind of the kind of what we win by spending a lot
of time on it and what it costs to spend a lot of time on it that we don't spend on other things,
it's kind of like that ratio doesn't seem to be very favorable. So I wouldn't spend too much
time on discussing this probably. And that's exactly why people like to discuss it. Well,
I guess that's just how the internet works, isn't it? Exactly. All right. So I do have one last
news item, which is a paper by Bjarne Stroustrup, which was in the latest
mailing. So in the latest round of papers that were published a couple of weeks ago,
with the paper number P3651, and it's called Note to the C++ Standards Committee Members.
And it is actually a publication of a message that Bjarne sent earlier on the committee
internal mailing lists.
And then that got leaked to the public.
And there were other articles that were kind of talking about that.
And then Bjarne said, okay, well, now that it's been leaked, I might as well just release
it officially.
So he did.
And it is an interesting read.
So it is a call to urgent action partly in response to
unprecedented serious attacks on C++. So this is all about the safety and
security debate that has been going on for I think, three years now and I guess
more intensity than before or something like that.
At least two.
At least two. There was one sentence that really caught my attention.
Bjarne says, I feel strongly about this. Please don't be fooled by my relatively calm language.
And this was kind of a big flag, like, oh, okay.
Okay.
He is, he is being serious here.
This is something that,
you know, we should pay attention to.
Um, so Bjarne is disappointed that profiles did not make it into C++26.
Of course that's not necessarily because it's a bad idea, but because we just
didn't have a kind of proper spec in time.
But he says, you know, at least the committee didn't do, didn't do nothing.
So C++ 26 will have a hardened standard library.
This is something he explicitly calls out.
So we have Louis here to talk about that in a minute.
That's going to be very interesting.
Great segue.
But Bjarne says that profiles are essential for the future of C++.
He says that they will not break your code.
They will not prevent us from standardizing other stuff later.
And we should move forward with that.
So that's kind of an oversimplified summary of that article.
So yeah, I thought that was significant because that's kind of what's on Bjarne's heart right now. And he says, you know, this is really important.
Please listen to what I'm saying.
So, yeah.
Any thoughts?
But it's definitely worth a read if you haven't seen it already.
And it sort of reminds me of the position you were in back in January with regard to
contracts, a lot of different misinformation or misunderstandings going around.
And like, so one chance to correct
it and put the right information out there. And this is Bjarne's moment in the same thing
with profiles, I think.
I think from my perspective, profiles were kind of pretty vague for a long time, you
know, and I think what's happening and what happened in Austria is partly that,
that, you know, we've just started being like really truly understanding what
profiles are going to, you know, make possible concretely. And so it's, there's also like for
any feature, especially a feature of that size, you need to give people the time and the chance
to
truly consider the different aspects, how it interacts with other parts of the language
and so on.
And I feel like maybe this is a little bit of a...
What's happened is a little bit of a timing, the result of maybe...
I don't want to say bad timing, but maybe being a little bit late with the thing.
So yeah, I feel that we should not, like we
sometimes use the fact that like C++ is being threatened, you know, by various, you know,
in various ways as a motivation to, you know, for change. But I think it shouldn't, we should
not forget basically that, you know, we need to do things right. And sometimes that takes,
you know, time basically to do things right. So sometimes that takes time basically to do things right.
So I think that's more important at the end of the day
than doing a knee-jerk reaction for something
that we perceive as an imminent threat.
Yeah, so actually this whole topic of profiles
is as much part of the safety and security discussion as library
hardening is, which is the topic that we invited you here for, Louis. Of course, library hardening
is a little bit more advanced than profiles. In fact, it has been accepted for C++26 at
the Austrian meeting. It's very exciting. So we want to talk to you about that. You have actually
been on the show before. I looked into the archive, and that was all the way back in 2016, when you
were chatting to Rob and Jason about Boost.Hana, the
metaprogramming library you wrote, which I think at the time was very, very new. Everybody was
very excited about it. It was at the time a very, very new way of programming with
constexpr. But a lot of stuff has happened since then, hasn't it?
So much.
I mean, yeah, it feels like a lifetime.
Yeah.
All right.
So, so library hardening.
So, so what is it?
It's, it's really simple, honestly.
Like, um, you know how the standard has a bunch of these preconditions.
Um, like the, the, in the standard for the library clauses, you have a bunch of
these preconditions that are like written explicitly, or
sometimes they're a little bit implicit, but they're very
obvious. Well, we're just basically turning those from
being undefined behavior, if you don't satisfy the
precondition, into being like a guaranteed, I'm going to say
guaranteed crash, but there's obviously like more to it. It's
actually like a contract violation. But basically, we're giving, you know, we're providing a guarantee that
something not bad is going to happen if you don't satisfy the pre-condition. So it's very
basic.
Could you give us an example of one of these preconditions, just so we've got something
concrete to think about?
Yeah, for sure. So std::vector's operator[], for example,
has a pre-condition that the index you pass is less than the size of the vector, right? And that's to
make sure that you don't index your vector out of bounds. And so under a hardened implementation of
the standard, you would get a contract violation if you fail to provide, like if you provide an index,
that's too large. Whereas today in a normal non-hardened implementation, you get undefined
behavior, which means that in most cases, what's actually going to happen is that you're just going
to be reading or writing outside the bounds of your vector, which is bad, obviously.
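To make that precondition concrete, here is a minimal sketch that writes it out as an ordinary predicate and contrasts it with `at()`, the long-standing member function that throws `std::out_of_range` instead of invoking undefined behavior. The helper names `index_precondition_holds` and `at_throws` are hypothetical and only serve the illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// The precondition of v[i], spelled out: the index must be less than the size.
// (Hypothetical helper, not part of any standard library.)
inline bool index_precondition_holds(const std::vector<int>& v, std::size_t i) {
    return i < v.size();
}

// at() performs the same check itself and throws std::out_of_range when the
// check fails. This helper reports whether that exception was thrown.
inline bool at_throws(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

For a vector of three elements, index 2 satisfies the precondition and index 3 does not. With `v[3]` an unhardened implementation gives undefined behavior, whereas `v.at(3)` throws; hardening, as described in the episode, makes the `operator[]` case a detected violation as well.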
Right. So how does that work concretely?
So let's say I have a code base that uses the vector, which I think most of us do.
And then I have a call to operator[] and I mess it up.
And then I have a weird bug today.
And then I upgrade to a brand new compiler.
Is it like, I guess, I guess, actually, let me rewind.
So now we've accepted this into C++26,
which of course isn't really even standardized yet
and much less available as a compiler flag,
but is it something I can already use today
if I'm on C++17 or C++20? Is that right?
Yes, I mean, it depends on the,
like if you're using libc++,
well, actually all implementations provide something
that is kind of similar, you know, it has, you know,
differences, but yeah, so standard library
hardening actually comes from standardizing something
that we did, you know, we started several years ago,
but we did in libc++ as a vendor
extension, basically, so libc++ hardening.
And you can already enable the same checks, essentially the same sort of out of bounds
check in libc++ today, whatever your standard mode is, and you just do that by using a macro
that we provide.
And it's kind of, you know, less nice than the standard thing, but it's effectively the same because you get the same checks.
So yeah, it is available today already.
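As a rough illustration of the macro mentioned here: in recent libc++ releases it is spelled `_LIBCPP_HARDENING_MODE`, with values such as `_LIBCPP_HARDENING_MODE_FAST`. Older releases used different spellings, so treat the exact names as something to verify against the libc++ documentation for your version. libstdc++ has a comparable switch, `_GLIBCXX_ASSERTIONS`. The sketch below only reports which macro is visible; the function name `hardening_description` is hypothetical.

```cpp
#include <string>
#include <vector>  // including any standard header makes the library's config macros visible

// Example build line (assumption: Clang with a recent libc++):
//   clang++ -stdlib=libc++ \
//     -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST demo.cpp

// Hypothetical helper: describes which hardening-related macro is in effect.
inline std::string hardening_description() {
#if defined(_LIBCPP_VERSION) && defined(_LIBCPP_HARDENING_MODE)
#  if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_NONE
    return "libc++: hardening mode is none";
#  else
    return "libc++: a hardening mode is enabled";
#  endif
#elif defined(_GLIBCXX_ASSERTIONS)
    return "libstdc++: _GLIBCXX_ASSERTIONS is defined";
#else
    return "no known hardening macro detected";
#endif
}
```

The macro has to be defined consistently before any standard header is included, which is why it is normally passed on the command line rather than written in source files.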
And now we have standardized it, which means that when you flip the switch to C++ 26, then it will just be on by default?
Or what does that mean? No, right. So basically, the thing that we're standardizing is that we... So you know how the
standard has basically a notion of two implementations today. It has a normal
hosted implementation and it has a notion of a freestanding implementation, which is what you use when you implement
for bare metal. So basically the
ISO standard is really like describing two standards in a certain way, right? It's like a template for
you know, producing standards and then you have like two standards. And we're adding basically
like another axis to this. So just like an implementation can decide to be hosted or freestanding, you can also
decide to be hardened or, well, not hardened.
And so the idea is that concretely, let's say you have Clang.
Clang provides today a flag called -ffreestanding.
And when you pass -ffreestanding, it says under that flag, I am
a conforming ISO C++ freestanding implementation. We're actually not really conforming,
but that's the, you know, in spirit.
And then you don't have like dynamic memory. You don't have threads and stuff like that,
right?
Well, it disables a bunch of stuff, right? And, uh, like, you know, there's no special
mangling for main. There's like all kinds of things that are different.
Oh, okay, okay, okay, yeah, interesting.
And the idea basically is that your compiler,
when you pass maybe -fhardened,
would now give you a conforming
hardened implementation of the standard.
And what that means, as I said before,
is that when you are a conforming hardened implementation
of the ISO standard,
then you need to not have UB
when you index a vector out of bounds.
So that's kind of the way this goes.
So people who are interested in getting this
would enable the flag that turns their implementation
into a hardened implementation
if they provide such a thing,
which we expect implementations are actually going to provide.
And then they get all the guarantees that comes with that.
All right.
So basically you took the flag that you already have essentially, and you said,
okay, now that's a flavor of the standard that's basically blessed by
the C++ standard itself.
Yeah, exactly.
Yes.
That's the way, yeah, it's basically a way to introduce a
language mode without acknowledging that it's really a language mode because that is not a very
popular idea.
All right. So that's, that's thank you. Thanks. That's that makes a lot of sense. I'm gonna ask a
question that I already know the answer to because I was involved in some of the stuff, but I'll
ask it anyway. So what actually happens now if I put
-fhardened on and then I access my vector out of bounds? Okay, so the way that we specified it
in the proposal is that it is a contract violation to fail to respect one of these preconditions.
And so you get the whole sort of contracts machinery, you get to like override your handler,
you get the contract semantic selection, and so on.
And so that's actually extremely useful because, and that's something that we noticed in Tokyo, right?
Timur and I talked about it
like a whole lot actually over the past year.
There was a lot of back and forth to get the wording right,
to make sure it gels with the main contracts proposal.
And yeah.
Right, and even changes to contracts themselves.
Cause in Tokyo, I remember,
when I first sort of, I don't know,
I woke up from, I don't know what,
and I realized like, hey, contracts, like, is like super important. And it's like,
it's like extremely relevant to what we're doing. And I was like, Hey, but it actually
does not support, like the main use case that I have. And then we know we talked about it.
And we like, you know, improved contracts a little bit here and there. And like, turns
out it's like a perfect fit. It's like a match made in heaven for something like hardening because it gives
you the exact set of like requirements that you need to have in order to really
deploy hardening on a large scale or even on a small scale, at a small scale.
So, for example, the ability to have the observe semantic of contracts, which is to say that when a precondition is not respected,
the handler is going to get called so that you can say,
for example, the vector was accessed out of bounds,
but it does not terminate the program.
The program just keeps going,
and whatever happens, happens.
That's not a very clean thing to do, but it's actually extremely important in practice
because if you have a huge code base, you're going to have benign instances of UB, like instances of
undefined behavior that are actually not causing any harm. But if you decide to actually crash
and abort a program when you encounter that UB, now you've got a real big stability problem.
So that's an actual issue for being able to deploy this
in production.
So what you might want to do is do something like enable the
observe semantic for one or two releases while you're
driving down the amount of bugs.
And as you're fixing those,
and at some point you become comfortable that you're not
going to bring down your whole server or have your desktop app actually crash all the time. So you fix
those bugs. Then you actually enable hardening with a proper trapping semantic, which is
also all provided by contracts. And boom, I mean, now you're in a good place.
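The rollout described here, observe first and enforce later, can be imitated in plain C++ today. What follows is a sketch under stated assumptions, not the C++26 contracts API: `Semantic`, `Checker`, and `checked_read` are hypothetical names, and the sketch deliberately returns a fallback value after an observed violation so that it stays well-defined, whereas a real library under the observe semantic would carry on with the access.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stand-in for the contract evaluation semantics discussed above.
enum class Semantic { ignore, observe, enforce };

struct Checker {
    Semantic semantic;
    std::vector<std::string> log;  // what the stand-in "violation handler" recorded

    // Returns whether the condition held. Real contracts would not even
    // evaluate the condition under 'ignore'; here it arrives already evaluated.
    bool check(bool condition, const char* what) {
        if (condition || semantic == Semantic::ignore) return condition;
        log.push_back(what);                               // handler is called
        if (semantic == Semantic::enforce) std::abort();   // enforce: terminate
        return false;                                      // observe: keep running
    }
};

// Reads v[i] when in bounds. On a violation that is not enforced, the sketch
// returns 'fallback' instead of touching memory out of bounds.
inline int checked_read(Checker& c, const std::vector<int>& v,
                        std::size_t i, int fallback) {
    if (!c.check(i < v.size(), "vector index out of bounds")) return fallback;
    return v[i];
}
```

With `Semantic::observe`, an out-of-bounds read adds one entry to `log` and the program continues, which is the "drive down the bugs for a release or two" phase. Switching the same code to `Semantic::enforce` turns that violation into termination.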
So this set of features provided by contracts,
especially the contract violation handling part of it,
is just like a perfect match for exactly what we're
trying to do in the standard library.
So I'm really, really happy that both got in together.
Yeah, I'm really happy as well.
And a lot of the motivation for, for example,
having observe that you
have for the standard library is the same motivation that you have for
the base feature, right?
Where somebody adds their own pre or post and they want exactly that.
They want to be able to deploy that without crashing their code if they got it wrong.
So, it's just really interesting to see how like from both sides, it all kind of comes
together and makes sense.
That makes me very happy.
Yeah.
Yeah.
Well, on the one hand, it does seem that the timing is right to do it now in a number of ways,
in the contracts being there, the focus on safety and security.
On the other hand, this is something we could have done many years ago, maybe right at the start.
But why didn't we?
And for a lot of people, that's because there's got to be a performance cost to doing these checks. So is that really something that we can afford? I think it is. Well, so actually,
we, you know, libc++ hardening has been around for a while now, and we went through like many
sort of iterations of it. And we can talk about that, you know, later maybe. But part of that,
you know, as part of that, we have examples of like basically standard library
hardening being deployed in production. I mean, we've deployed it inside the kernel, in WebKit,
we've deployed it in a bunch of internal code bases, and Google has also deployed it in their
whole code base, which if I understand correctly,
that includes the service like, you know, Chrome, a bunch of mainstream apps. And they actually
wrote a blog post with their experience and they showed that they got it down to a
0.3% performance slowdown. So is it something that you can afford? Well, I guess that depends.
But I would say, given that they found over a thousand bugs and they estimate that it's going to reduce their number of bugs by one or two thousand yearly,
and they reduced their baseline segfault rate by 30%, I mean, those numbers are amazing. And if the only thing that you have to pay is 0.3% performance,
I don't know, but to me, it's worth it. So it's not zero cost, because obviously you're inserting more instructions, but there's a lot of aspects of the hardening proposal,
and of course, since it comes from libc++ hardening, of the design of that specific
feature in libc++, that are meant to enable that, to make that
possible. So for example, the way that we implement traps,
which is actually taken from the Swift standard library,
is that we enable like a single,
sorry, we insert a single instruction to trap.
So it's very, very low cost.
It does not increase your code size a lot.
It's something that the compiler can reason about,
maybe can hoist it out of loops.
And it's just like, it's a very, very low cost approach
to doing this. So yes, I do think it is something that people can,
most people should be able to afford that sort of performance hit because it's very, very small.
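The single-instruction trap described here can be sketched roughly as follows. This is only an illustration under stated assumptions, not libc++'s actual code: `HARDENING_TRAP`, `checked_at` and `sum_first` are hypothetical names, and `__builtin_trap()` is the GCC/Clang builtin (it typically lowers to one trapping instruction), with `std::abort()` as a fallback on other compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical macro: stop the program with the cheapest sequence available.
#if defined(__GNUC__) || defined(__clang__)
#  define HARDENING_TRAP() __builtin_trap()
#else
#  define HARDENING_TRAP() std::abort()
#endif

// Hypothetical bounds-checked accessor: one comparison plus a trap on failure.
template <class T>
const T& checked_at(const std::vector<T>& v, std::size_t i) {
    if (i >= v.size()) {
        HARDENING_TRAP();  // out of bounds: terminate instead of reading stray memory
    }
    return v[i];
}

// The check is an ordinary comparison, so the optimizer is free to reason
// about it (for example when n is known not to exceed v.size()).
inline long sum_first(const std::vector<int>& v, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += checked_at(v, i);
    }
    return total;
}
```

The failure path carries no message formatting and no handler call, which is what keeps the per-check code size small in this style.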
And then the other thing is, well, the libc++ implementation of
hardening, at least, allows you to select
whether you want to enable hardening on a per translation unit basis.
So if you actually have like a very hot loop that may not be security critical, you know,
then you might not want to actually enable hardening in that translation unit, but then
you harden the rest of your application.
So you can actually cherry pick the things that you want to harden.
And yeah, this fine grained control is like really important if you want to actually do it in the real world.
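As a sketch of what per-translation-unit selection can look like: libc++ documents a `_LIBCPP_HARDENING_MODE` macro that can be set on the compiler command line for each source file, for example `-D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST` for most files and `_LIBCPP_HARDENING_MODE_NONE` for a hot, non-critical one. The helper below is hypothetical and only reports which mode the current translation unit was compiled with; it assumes that including any libc++ header makes the mode macros visible, and it falls back to "unavailable" with other standard libraries or older libc++ versions.

```cpp
#include <cassert>
#include <string>
#include <vector>  // assumption: any libc++ header pulls in the hardening configuration

// Hypothetical helper: name the hardening mode this translation unit was built with.
inline std::string hardening_mode_of_this_tu() {
#if defined(_LIBCPP_HARDENING_MODE) && defined(_LIBCPP_HARDENING_MODE_NONE)
#  if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_NONE
    return "none";
#  elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_FAST
    return "fast";
#  elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_EXTENSIVE
    return "extensive";
#  elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
    return "debug";
#  else
    return "unknown";
#  endif
#else
    return "unavailable";  // not libc++, or a libc++ without hardening modes
#endif
}
```

Two source files in the same program can then legitimately report different modes, which is exactly the situation the ODR discussion that follows is about.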
So I'm happy that you're mentioning a per-translation unit, because that gets us
into a whole interesting set of questions that we had to answer for the base contracts proposal
as well. And I'm curious how you answered it for the standard library. Right. So for things like, you know, what happens if you have
the same function in one translation unit with hardening and in another one without hardening?
What if that function is inlined? Does that affect ABI?
How does the linker choose between the hardened and
the non-hardened one if it's inline? How does the user know which one gets picked?
There's a whole set of questions here.
How does this actually work?
It's actually very difficult.
It's not a fully solved problem, just to be very clear.
I think it's useful maybe for listeners.
I think it's useful to step back and to basically the idea here is that
you when you have a translation unit, right, you select a set of, I'm going to say, configuration
options that affects how like what code is generated. So these configuration options
that includes like dash F, like do you have exceptions enabled? Do you have hardening enabled?
Even the standard mode actually affects that
because we have if-def based on the standard mode
inside the standard library,
and you may have that in your code as well.
So there's a lot of different, you know,
compiler flags that affect the actual binary output
or, you know, code gen that you get
inside a translation unit.
Now, the idea is that with templates,
we have the one definition rule where basically,
if you have a function with the same name,
that gets instantiated in different translation units,
the linker needs to be able to keep just one of them,
when it links everything together to produce an
actual executable. So the idea is that one definition rule says that if they have the
same name, if all the functions have the same name, then they need to have the same code
gen. They need to, it needs to be the same code because otherwise the linker is basically
picking one at random. And so if you have functions with the same name, but that have
different code gen, now
basically you don't know which function you're going to be ending up with in your final executable.
And that is the root of this whole question.
And hardening, contracts evaluation semantics, -fno-exceptions, even though that is
not acknowledged by the standard, but that still exists. Those are all things that affect the codegen. And we need a way to make sure that
you actually use the function that has the codegen that you wanted.
Because here is an example of something that can happen otherwise. So let's say you call a function that may throw an
exception, right? And you're compiling your code with exceptions enabled. So you call the function,
you have a try-catch, you're doing everything correctly to handle your errors. But then you
actually link against a distant translation unit over there that has been compiled with exceptions disabled.
And with exceptions disabled, the function instead aborts the program. Right. And
it seems cool, because, I mean,
you enabled exceptions in your translation unit. Right. And then when you actually link everything
together, the problem is if the linker decides to pick the version of the function that was compiled with exceptions disabled, that thing actually
aborts. It does not throw an exception. So now you've got a try-catch around a function
that actually aborts, because that's the version that the linker picked.
You don't see it in your code, right? You're like, why the hell is this aborting? I've seen bugs
like this. They're very, very, very hard to even figure out
what the hell is going on.
Like as soon as you have an ODR bug,
like you're gonna spend days on this.
It's terrible.
And you're like, you're just staring at this.
You're just debugging this and staring at the code
and being like, why is this aborting?
I am catching the exception.
And it's just because, well,
you're just not calling the function that you think you are, right?
You're just calling the, it's the other version of the if-def.
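The shape of that bug can be sketched in a few lines. The names here (`parse_digit`, `parse_or_default`) are hypothetical, and `__cpp_exceptions` is the standard feature-test macro that compilers define when exceptions are enabled. Imagine the first function living in a header that is included both by a file built with exceptions and by a file built with `-fno-exceptions`: both object files then contain an inline function with the same mangled name but different code.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>

// Hypothetical header content, seen by several translation units.
// Its body depends on a compiler flag, so two TUs can disagree about it.
inline int parse_digit(char c) {
    if (c < '0' || c > '9') {
#if defined(__cpp_exceptions)
        throw std::invalid_argument("not a digit");  // the version this caller expects
#else
        std::abort();                                // the version built without exceptions
#endif
    }
    return c - '0';
}

// Hypothetical caller, compiled with exceptions enabled. It is written
// correctly, but it only works if the linker keeps the throwing copy.
inline int parse_or_default(char c, int fallback) {
    try {
        return parse_digit(c);
    } catch (const std::invalid_argument&) {
        return fallback;
    }
}
```

In a single translation unit this behaves as written. The failure only appears at link time, when the linker keeps the aborting copy of `parse_digit` and the `catch` block silently becomes dead code.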
So with all of that, you know, understood now, basically,
how do we solve that problem?
So the problem comes at the end from the fact that we have a
function where two, well, I'm going to say two functions really with the same name,
but they have different code gen, right?
And so we need to encode in the name of the function,
the configuration, like the properties
that it has in the code gen.
So what we do in libc++,
and that's not actually like a full solution,
but it helps a lot, right?
So in libc++, we actually compute
what we call an ODR signature
based on various properties of the current build.
So we actually sniff some of the compiler flags,
and then we say, all right, hardening mode is fast,
exceptions are enabled.
Here is the numerical version of the library,
and there's other stuff like the standard mode we could add
and then a bunch of stuff like that.
But for now, we just like sniff these three properties. We then at the pre-processor level, we form a string,
right? That looks like an E and then a number, whatever. It's just like some sort of hash,
right? So you basically mangle the compiler flags somehow.
Exactly. So we take a hash of the compiler flags. It's a very, very, very primitive version of that.
I did not know that. That's interesting. That's pretty cool.
Yeah, we actually do that. If you grep for ODR signature in libc++, there's a very long comment
that explains how this works, why it's done, etc. And it's pretty interesting. It came from,
we converged towards this solution after years of banging our heads against these sort of issues.
And basically, what we do with that string now, and
that's the nasty part, I guess, is that we include that
special hash in the mangling of every function that we produce, ever.
Right.
Well, everything that is not part of the
dylib's ABI, essentially, right?
So everything that we don't want to keep ABI stable.
And that's this _LIBCPP_HIDE_FROM_ABI macro.
That macro applies this ABI tag,
this special mangling property, to the functions.
So that's how we fix that.
And the result is basically that now, if you are
calling vector operator brackets from a hardened translation unit, you're really not calling
vector operator brackets. You're calling vector operator bracket underscore hardened fast.
And if someone else is calling vector operator brackets from a non-hardened translation unit, they're calling vector operator brackets underscore not hardened. And so when you throw the
linker at these translation units, there's no confusion as to which function were you calling.
Right? So there's no chance that you're actually going to think that you're calling a hardened
implementation when in reality, whoops, you were calling a non- like an unsafe one.
This is all going to work as intended.
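A minimal sketch of the ABI-tag idea, assuming GCC or Clang, where the `abi_tag` attribute adds a tag to a function's mangled name. `MYLIB_HARDENED`, `MYLIB_ODR_SIGNATURE`, `MYLIB_HIDE_FROM_ABI` and `element_at` are hypothetical stand-ins, not libc++'s real macros or signature format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical configuration switch, set per translation unit (e.g. -DMYLIB_HARDENED).
#if defined(MYLIB_HARDENED)
#  define MYLIB_ODR_SIGNATURE "hardened_fast"
#else
#  define MYLIB_ODR_SIGNATURE "not_hardened"
#endif

// Attach the signature to the mangled name where the attribute is supported.
#if defined(__GNUC__) || defined(__clang__)
#  define MYLIB_HIDE_FROM_ABI __attribute__((__abi_tag__(MYLIB_ODR_SIGNATURE)))
#else
#  define MYLIB_HIDE_FROM_ABI
#endif

// Same source-level name in every TU, but a different symbol per configuration,
// so the linker never merges the checked and unchecked bodies.
template <class T>
MYLIB_HIDE_FROM_ABI inline const T& element_at(const std::vector<T>& v, std::size_t i) {
#if defined(MYLIB_HARDENED)
    if (i >= v.size()) {
        std::abort();  // checked configuration only
    }
#endif
    return v[i];
}
```

Callers keep writing `element_at(v, i)`; only the symbol differs. As discussed next, a user function that is not tagged and calls this one can still be merged across configurations.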
Right. So that's fascinating. Two questions immediately after this explanation. First,
does that have an impact on binary size? And how bad is that? And the second question is what
happens when the function is inlined and you never have a symbol anywhere?
Right. So it does have an impact on binary size, but basically the impact it has is that now you have more copies of a function, but you have one copy per configuration that is required. In other words, you don't have duplicate code.
You just have the correct code.
You have as much code as you need
to actually do the thing you want to do.
Correctly.
Whereas previously some of that was missing.
Yeah, okay, okay.
Yeah, so previously it's as if you had
some really nice outlining optimizations in your compiler,
but these outlining optimizations were incorrect.
And so that's what you get by default.
So we're just restoring that.
So yeah, if you do that sort of trick, you will notice maybe a little bit of a code size change,
but at the same time, this is a must for correctness.
Now, the other thing is it's actually not a full solution because the moment you pass through
a function that does not have an ABI tag like
that.
Now, so for example, if you get inlined into another function, like a user function
that does not have an ABI tag, well, I guess if you get inlined, then it's
okay because the code that gets inlined is the one that you selected with your compiler flags,
so that's fine. But if the function that you were inlined into is a user function,
which also gets ODR deduplicated by the linker, and it didn't have an ODR signature, then it's the same
problem all over again, and you don't get that guarantee. So that's why libc++ itself is
protected from these issues. But
because, you know, the only way libc++ is useful is by being
used from user code. And because user code is not protected
from this issue. You know, there's only like, it's not a
complete solution, basically.
Interesting.
Yeah, it doesn't happen very often. But technically, you
know, it is still a problem.
But we don't see it very often, at least not since we've introduced these ODR signatures.
You mentioned that you, um, you sniffed the compiler flags, which makes me think that
you're, you're looking for ODR odors.
Oh yeah.
Classic Phil.
I like that one again. Right.
So, I mean, obviously we just standardized this, but it's been around for quite a lot longer, right?
In the form of vendor-specific compiler flags.
Can you talk a little bit about the history and also how much deployment experience you already had with this stuff?
Is this out in the wild? Are people already using this?
Can you tell us a little bit about what you found out?
Yeah, for sure. So actually, it started a long time ago. It was totally not libc++ hardening, but we've been
interested in hardening C++ code as a general thing for a very long time. And often, we wanted to do that in contexts that were not
hosted. So contexts like, you know, firmware or the kernel, or places where
libc++ was not typically available. Because actually, libc++ was not really usable outside of
a hosted environment a couple of years ago. So we tried a bunch of things, and eventually the increased support for freestanding in libc++
opened new doors and we realized that hey, these environments where security is super important,
that didn't have libc++ before, now we can actually have libc++ in it.
And libc++ provides awesome tools like span, std array,
std vector, and things like that that are more strongly typed than your typical sort of just
C raw pointers all over the place kind of thing. And so then we realized, well, maybe actually
using libc++ in these places and making sure that libc++ provides increased bounds safety
would be a good way to make progress. And so that's what led essentially to this safe mode
in libc++, which we released in LLVM 15 in 2022. And there was some adoption in WebKit, Chrome was also an early adopter of that.
And then that was a basic sort of on-off mode where you could get essentially a bunch of
assertions that we had in the library. They were not really audited, they were not really
surveyed or anything, but you got a bunch of assertions that were not too
expensive, you know, it wasn't very formalized at that point.
But that's you know, what that was like the first incarnation
of it. And you did get a function that allows you to
sort of customize how like what would happen when an assertion
went wrong, right. And then we improved it in 2024 in LLVM 18 into the
full hardening mode that we know today. The main difference here basically is that
the hardening mode that we have today provides four different modes. It has no hardening, obviously.
Then it has fast hardening, extensive and debug.
And the idea is that like not all assertions
are equally important, right?
For example, a vector operator bracket
is pretty important because it's an out of bounds issue.
But a null pointer dereference
is usually not that big of a deal
because it reliably leads to a crash.
I say reliably, it depends on a bunch of stuff. And I know that dereferencing a null pointer can
lead to a heisenbug where the compiler is going to time travel because it sees that you're doing
something bad down there. And so it can actually cause issues. But in practice, usually if you dereference a null pointer, you will get a segfault. And so those are not equally dangerous. And so for that reason,
we actually classify assertions and we give people high level modes so that they can decide
what's right for them and enable basically that mode in their code.
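The idea of classifying assertions and mapping them onto cumulative modes can be sketched like this. The four mode names come from the discussion above; the three check categories and the table assigning them to modes are simplified, hypothetical examples and should not be read as libc++'s actual classification.

```cpp
#include <cassert>

// The four modes mentioned above, ordered from least to most checking.
enum class Mode { none = 0, fast = 1, extensive = 2, debug = 3 };

// Hypothetical assertion categories, for illustration only.
enum class Check { valid_element_access, non_null, internal_consistency };

// Hypothetical table: the cheapest mode in which each category is checked.
constexpr Mode cheapest_mode_enabling(Check c) {
    switch (c) {
        case Check::valid_element_access: return Mode::fast;       // e.g. out-of-bounds indexing
        case Check::non_null:             return Mode::extensive;  // usually crashes anyway
        case Check::internal_consistency: return Mode::debug;      // expensive sanity checks
    }
    return Mode::debug;
}

// Modes are cumulative: a stricter mode enables everything a cheaper one does.
constexpr bool is_enabled(Check c, Mode m) {
    return static_cast<int>(m) >= static_cast<int>(cheapest_mode_enabling(c));
}
```

With a table like this, a user picks one high-level mode and the library decides which individual checks that implies.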
So that's one of the main differences. And then there's a bunch of other design principles
that we can get into,
like the fact that we can pick a default mode,
like the vendors can pick the default mode
that the library comes with.
So if you're on a platform where safety is
super important and performance is maybe also
important but less than safety, you can ship a libc++ that is
hardened by default. Then we already talked about the ODR stuff.
Basically, that was also a requirement for us, to make it
ODR safe and to allow enabling different modes per TU, because that allows
incremental adoption by people. They can enable hardening in the part of their codebase, but
not all of it. And things should not blow up from an ODR perspective, like we talked
about earlier. And yeah, so there's a few design principles like that, but basically
that's where it came to be
and where it is now.
So now you've got hardening into C++ 26.
Is that work all done?
Is it all tied up, or is there more to do?
Right, so there is still some more to do.
I think there is, so first of all,
the first hardening paper we have does
not cover the full standard library. So we did like a survey of the most important ones.
And one of our guiding principles there was that we only wanted to standardize assertions
that we already had deployment experience with in libc++. And that was just like to
short circuit any, we were kind of late in the cycle
too. So we wanted to short circuit any sort of, you know, discussion that would potentially
question, you know, has this actually been implemented? Is this actually, you know,
worth it and blah, blah, blah. So we just, we basically just, we just decided to standardize things that we had already deployed.
So there's still more to do.
We want to do a good survey of, like, ranges, for example,
to add some missing preconditions.
I think there is some work to do in mdspan as well.
So yeah, there's still some work to do from the hardening perspective.
But we have a good start.
This is a fair coverage at this point.
Right, so nearly there, but a bit more to do.
But I mean, is hardening the only route to safety
in C++, or are there other areas you're interested in?
Yeah, actually there's another proposal
that we've been working on.
Uh, it's a paper that's in the pipeline.
It's in core now.
It's adding a typed operator new.
A typed operator new?
Yeah, it's a typed operator new.
So it's basically a way to define operator new as a template.
And the first parameter of operator new is a type identity.
So you actually get the type that is being allocated as a parameter inside operator new.
And the benefit here, this actually originates from a project that aimed at improving the security of existing code as well.
And the goal here was to mitigate type confusion attacks.
So type confusion is basically when an attacker,
let's say you have like a dangling pointer to something and you're interpreting,
like you have a pointer to a foo, but in reality, it's a dangling pointer.
And what's really in that memory location is not a foo,
it's a bar.
And that bar is like a super interesting control data
structure from your kernel, for example, right?
And let's say you have like a legitimate API
to actually set the foo that is dangling.
Now, you know, if foo is a string, for example, and you have a legitimate API
that you can use to set that string to whatever byte values you want, now you can
control exactly the contents of the bar with your legitimate API, right? And you can get the program
to often do like basically whatever you want. So that's the general idea. So now this originates from two things. First, yes,
there is a dangling pointer somewhere. So that's bad.
But also there is basically an aliasing issue where you
think it's, you think it's a foo, but really it's a bar, right?
And so the idea is that if you had a magic system
allocator that makes sure that you never allocate memory for a foo
in the same place where you have allocated memory for a bar, so if you make sure that a foo and a bar,
dynamically allocated foo and bar, can never ever ever alias, you will still have your dangling pointer bug,
but you're not going to be able to exploit it in the same way. That's the idea. So that's a mitigation technique. It does not fix the bug, but it makes it so much
harder for a bad person to exploit. So if we like unroll this, basically, the idea here is that
in order to implement that sort of, we call it type isolation, right? The system allocator needs a knowledge of what type is being allocated.
But that is not knowledge that we have available in operator new today, because the signature
of operator new is like, I get a size_t, maybe a bunch of other stuff, but I don't know what
type I'm allocating. So this is when we actually implemented a Clang extension that passes a 64-bit integer alongside operator new.
We have a funky operator new, like a new variant of operator new in libc++, that describes the type
being allocated.
And then we pass that down to our system allocator, which does some amount of type isolation and
really, really helps mitigate
these sort of issues.
And so what we asked ourselves is, is there a way that we can make this technique more
easy to do without having to do like a complicated vendor extension?
And so that's why basically we proposed adding a typed operator new where operator new now knows
the type being allocated. So if it wants to,
it can actually pass that down to the system allocator or to any custom allocator you might
implement for yourself. So that's kind of the idea. And we think that there is also a lot of other
benefits to like, there's other non-safety related things that you can do with this feature. So we're
pretty excited about that. Nice. I look forward to all the discussion about new new versus old new.
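The type isolation idea can be sketched without the proposed language feature. The code below does not use the typed operator new syntax from the proposal, and it is not how a real system allocator works; `TypeIsolatedPool`, `typed_new`, `typed_delete`, `Foo` and `Bar` are hypothetical names. It only shows the property being discussed: once the allocation path knows the type, memory freed by one type can be kept for that type and never handed to another. Over-aligned types and thread safety are ignored for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical per-type pool: freed blocks are only ever reused for the same T.
template <class T>
class TypeIsolatedPool {
public:
    static void* allocate() {
        std::vector<void*>& free_blocks = blocks();
        if (!free_blocks.empty()) {
            void* p = free_blocks.back();  // reuse a block that previously held a T
            free_blocks.pop_back();
            return p;
        }
        return ::operator new(sizeof(T));  // otherwise get fresh memory
    }

    // The block is kept in T's pool instead of going back to the shared heap.
    static void deallocate(void* p) { blocks().push_back(p); }

private:
    static std::vector<void*>& blocks() {
        static std::vector<void*> free_blocks;
        return free_blocks;
    }
};

// Hypothetical typed allocation entry points: the type is known at the call.
template <class T>
T* typed_new() { return ::new (TypeIsolatedPool<T>::allocate()) T(); }

template <class T>
void typed_delete(T* p) {
    p->~T();
    TypeIsolatedPool<T>::deallocate(p);
}

// Same size on purpose, so a shared allocator might well reuse one for the other.
struct Foo { char name[32]; };
struct Bar { char control[32]; };
```

With this arrangement a dangling `Foo*` can at worst observe another `Foo`, never a `Bar`. The dangling pointer is still a bug, but it is much harder to turn into control over an unrelated object.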
All right. So we talked a lot about library hardening and contracts and,
and the new operator new and other safety and security related things, but we are actually
coming up on time here. So I think we should slowly wrap up. But before we do that, we have our traditional last question, which is, apart from all of the stuff that we talked about, is there anything else in the world of C++ right now that you find particularly interesting or exciting?
I just want to say it again. But really, I'm really happy to see reflection making progress.
I think it's just going to open the door to just so many awesome things, right?
Like type erasure, also interop with other languages.
It's going to be like a, you know, I think it's a great feature, a very powerful feature
especially.
And as an implementer, I'm very scared though, of what users are going to be able to do.
We've been discussing with Clang people about ways
to protect our implementation against abuse.
Otherwise, we think that compiler upgrades
are going to be very, very difficult,
because people are just going to do weird stuff
with our implementation details.
And yeah, basically, it's going to be very powerful. But
also, you know, if you can use implementation details too much, then it gets hard to do any
kind of upgrade. And so I think we'll need to figure out ways to protect ourselves against that,
maybe some special attributes, special compiler diagnostics that prevent people from shooting
themselves in the foot. But apart from that, I think that's fine. I think there's a technical
solution to that. Another thing that I'm pretty excited about is Hana's work on expanding
constexpr, really. I'm really glad that she's working on taking constexpr to its full potential.
I think she's been doing amazing work. So I'm actually really looking forward to that. It does
create quite a bit of work for me, but I think it's worth it.
I think it's important.
Yeah.
I'm glad she's doing that.
So yeah, I think those are the two things on my radar.
So what's the latest there?
We can now throw exceptions at compile time?
Yeah.
She's got like an implementation of that.
I mean, it's super, super interesting.
I think coroutines is the one remaining thing, isn't it? I'm actually not sure what is the state of the coroutines one.
So that was, I think, controversial.
Some people were saying it's not really implementable.
There was a lot of discussion at the last meeting that I have completely missed because
I was busy with getting contracts into 26.
So I have actually no idea where that is right now.
Probably suspended.
Okay.
That was a good one.
That was a good one.
All right. Well, we started with Boost.Hana.
We finished with Hana Dusíková.
So that was a nice complete circle, but I think
we need to wrap up there.
So thank you for coming on the show, Louis.
Is there anything else you want to tell us before we go?
Like where we can find you if we want to continue the conversation?
Sure.
I mean, you can reach out to me by email mostly.
Otherwise, I'm on the LLVM Discord.
Or you can reach out to me by just creating an issue against libc++.
So I guess that's pretty much all I do these days.
So yeah, yeah.
And thanks for having me.
It was a real pleasure.
Thanks for coming on.
Thanks so much for listening in.
As we chat about C++,
we'd love to hear what you think of the podcast.
Please let us know if we're discussing the stuff
that you're interested in,
or if you have a suggestion for a topic,
we'd love to hear about that too.
You can email all your thoughts to feedback at cppcast.com.
We'd also appreciate it if you can follow cppcast at cppcast on X
or at mastadon at cppcast.com on mastadon and leave us a review on iTunes.
You can find all of that info and the show notes on the podcast website at cppcast.com.
The theme music for this episode was provided by podcastthemes.com.