CppCast - MSVC's STL and vcpkg
Episode Date: May 7, 2020

Rob and Jason are joined by Billy O'Neal from Microsoft. They first discuss some news from various conferences and user groups that are going online. Then they talk to Billy O'Neal from Microsoft's Visual C++ team. He tells them how he joined the team and some of the projects he's worked on, including some recent work on vcpkg.

News
- Modules the beginner's guide - Daniela Engert - Meeting C++ 2019
- C++ London goes online
- We're welcoming you to CoreCpp
- C++ On Sea On Line
- Useful tools for checking and fixing C/C++ code

Links
- Envoy Proxy
- Microsoft STL Changelog
- Vcpkg 2020.04 Update and Product Roadmap
- binskim

Sponsors
- PVS-Studio. Write #cppcast in the message field on the download page and get one month license. Read the article "Checking the GCC 10 Compiler with PVS-Studio" covering 10 heroically found errors despite the great number of macros in the GCC code.
- Use code JetBrainsForCppCast during checkout at JetBrains.com for a 25% discount
Transcript
Episode 246 of CppCast with guest Billy O'Neal. This episode of CppCast is sponsored by JetBrains, the maker of smart tools like PyCharm and ReSharper. To help you become a C++ guru, they've got CLion, an intelligent IDE, and ReSharper C++, a smart extension for Visual Studio.
Exclusively for CppCast, JetBrains is offering a 25% discount on yearly individual licenses
on both of these C++ tools, which applies to new purchases and renewals alike.
Use the coupon code JetBrainsForCppCast during checkout at JetBrains.com to take advantage of this deal.
In this episode, we discuss conferences and user groups going online.
Then we talk to Billy O'Neal from Microsoft.
Billy talks to us about his work on Microsoft's standard library and vcpkg updates. Welcome to episode 246 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
I am all right. Rob, how are you doing?
Doing just fine. You know, getting by.
You have any news or announcements you want to share?
Well, I don't know. There's conference changes happening all over the place,
but I don't have anything to share specifically at the moment, maybe next week regarding that stuff.
But it does look like I am going to be doing a live stream with Matt on Friday,
probably, although that has not been officially announced yet.
And it'll probably be about the same time this is going live.
Okay.
Matt Godbolt, that is.
If you're tuning in on the day this podcast comes out,
maybe check Matt's and my Twitter feeds and see if we're doing a live stream.
Yeah, we might be doing a YouTube thing there.
Okay, cool.
We teased it on Twitter last week.
What would this live stream be about?
What would the topic be?
Well, we haven't decided yet, so I can't tell you.
Okay.
Well, at the top of our episode, I'd like to read a piece of feedback. We got this tweet from the Rust and C++ Cardiff user group saying, we had a chance to listen to the story behind Daniela Engert and C++ modules in the CppCast episode not so long ago, need to catch up with the video. Thanks, Meeting C++, for the reminder.
So, yeah, her talk, which we kind of mentioned
and then talked a little bit about when we had Daniela Engert on,
is now live.
It is live. Okay, great.
It is live.
So if you want to watch modules, the beginner's guide,
that is now on YouTube.
You can go watch it.
Sounds like something we all need to watch.
Yeah, we definitely need to learn more about modules.
They're coming.
Okay, well, we'd love to hear your thoughts about the show.
You can always reach out to us on Facebook, Twitter, or email us at feedback at cppcast.com.
And don't forget to leave us a review on iTunes or subscribe on YouTube.
Joining us today is Billy O'Neal.
Billy is a developer and standard library maintainer at Microsoft.
He is also often on loan to other teams, such as the Visual C++ Infrastructure team,
for example, working on the distributed compiler test harness,
and most recently, the vcpkg team.
Before joining the C++ team, Billy worked on security compliance tooling
in the former Trustworthy Computing team.
Billy, welcome to the show.
Hi.
How's it going?
It's all right here. Yeah.
How does that work? How do you get loaned out to other teams? Do they have a library card that they have to stamp?
So basically, I've just, you know, made it known to my bosses that I'm not super picky about what exactly I'm working on. If you talk to Stephan and ask him to do things that are not standard library related, Stephan is going to be very angry at you.
Relatively speaking.
Whereas I'm never going to be the template metaprogramming wizard that Stephan and Casey are. But what I am willing to do is whatever the heck needs to be done right now.
So the vcpkg team needs somebody.
Wow, that was loud.
The vcpkg team needs somebody. Okay, go send Bill to do that. That's kind of how it happens.
It's not unilateral.
Somebody on the vcpkg team can come over and say,
hey, Bill, I need help with this.
It's more like they go to my boss and say,
can we have Bill for a couple months?
And then at some point your boss says,
okay, we're ready for him to come back.
Pretty much.
Interesting.
You get a lot of interesting exposure to a lot of different types of, you know, projects and different types...
Yes, even more than just normal standard library shenanigans do.
Yes, I can see that. I mean, some of the most interesting experience I got at work was when a new project would come up and I would volunteer for it, and nine times out of ten it was stuff I didn't know anything about,
and somehow I convinced them to trust me to try.
Yes.
Okay, well, Billy, we've got a couple news articles to discuss.
Feel free to comment on any of these,
and then we'll start talking about the work you've been doing at Microsoft, okay?
All right.
All right, so first one, we kind of have three of these.
Yeah.
Conference-related
announcements where, because of what's going on in the world right now,
the conferences and meeting user groups are moving
online. So the first one we have is C++
London. They're going to be doing a live YouTube stream
for their future C++ London user groups.
They're going to have a Slido site set up,
which I'm not familiar with,
where you can ask questions and do polls.
And they're also going to try to do
like a post-presentation video conference
where people can just hang out.
They even said drinks are optional,
so I thought that was pretty neat.
Okay, yeah.
And the other one is Core C++,
which we've talked about several times as the conference
but it's also just the Jerusalem-area user group, right, Jason?
Well, between Jerusalem and Tel Aviv. They move back and forth.
Right. So they're also going to be, you know, doing an online version of their meetup, and I think they're doing theirs on Zoom. And then what was the last one, Jason?
C++ on Sea, right? The actual conference is moving online. So this is interesting, because Phil says on here that there will be a greatly reduced ticket price, but does not yet say what that means exactly. Oh no, here it is: we're reducing the ticket price to 150 pounds for the full conference. So two full days of conference. I think that's right.
That's interesting. I wonder how well something like that is going to do with, you know, others offering content for free.
I am actually very interested in this as well, both as a business model and how successful it will be or not. He does specifically state that he's using the experience that they've already gained from the C++ London meetup for how to handle a project
like this. So if you can imagine multiple tracks going at once with the ability to do polls and
feedback from your audience and everything, it seems the most promising of what I've seen so far.
Yeah. Well, hopefully it does well.
Yeah. And speaking of online conferences, Billy, Microsoft actually just put one on last week,
right? Were you involved in that at all? I wasn't directly involved.
I got asked to do some reviews of things that were put up,
and I got asked to provide some demos and things like this.
But that was mostly Mahmood, my boss, doing that on his own.
And that was the Pure Virtual C++ conference.
I think I watched the first hours of that, maybe, and it was pretty good.
I checked in once or twice.
I didn't stay on for it.
Well, they did turn each of the talks
into separate YouTube videos.
So if anybody is interested in modules
or vcpkg or standard library stuff and didn't want to stay on an eight-hour live stream.
Right. And I like the way they handled it with, you know, Sy getting everyone to pre-record the content and then show it live with a Q&A session afterwards. I thought that was a very good way of doing it.
Yeah, I think he drove most of that whole conference effort. He was the...
I don't know.
He has this thing where he's like,
submit three talks to CppCon just in case.
All three get accepted.
All three get accepted.
That's been his MO, I guess.
Okay.
Go ahead.
Before I move on, I'd just also mention one reason I wanted to call out that these meetups are online. I mentioned specifically the C++ London and the Core C++ ones, but it's just the fact that everyone is doing this. So if you keep your eyes open and look at your local meetup, they're probably doing something online.
I know my meetup has moved online partially just because I don't think we could stomach the idea of missing one, because we've got like a 36-month record now, something like that. So our meetup is going to be tomorrow night. It'll be too late for this podcast, but you can find the C++ Denver meetup on meetup.com and join as well, if whoever's listening wants to.
It'll be interesting to see how much, you know, online membership differs from your average normal meetup, because anyone can attend if they want to.
Yes.
I have no idea how this is going to play out really ultimately.
And then the last thing we have is this blog post,
Useful Tools for Checking and Fixing C and C++ Code, and other languages too.
And it's really just a list of different static analyzers and other tools that I thought was a pretty good resource.
CppCheck, Coverity.
We've talked about most of these before.
PVS Studio.
I don't think I've seen SonarCloud before.
Are you familiar with that one, Jason?
I've had students mention it in classes,
but I've not used it myself.
I think that's like a big,
like do analytics on what tools report
on your code base overall.
At least that's what it was.
I don't know, a few years back
when I was in the security team,
they were talking about this being the new thing
that would replace a whole bunch of internal stuff that we had built.
Like an aggregate kind of thing?
Yeah, like, show me all of the
potentially uninitialized variable used hits in Windows.
Right.
Kind of tool.
I would hate to see that report, I'm sorry.
I mean, because it's just such a large code base with such a history,
I'm guessing staying on top of that kind of thing is quite difficult, actually.
Yeah. For example, the C++17 guaranteed order of evaluation rules
actually broke stuff in NT because they were dependent on the particular unspecified
order of evaluation that the compiler
optimizer had chosen.
Yes.
So, you know, if you change anything
Windows will notice.
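A minimal illustration, not taken from Windows sources, of the kind of code whose behavior changed when C++17 pinned down evaluation order:

    #include <iostream>
    #include <map>

    int main() {
        std::map<int, int> m;
        // Before C++17 the evaluation order of the two sides of this built-in
        // assignment was unspecified, so m[0] might be created before or after
        // m.size() was read. C++17 sequences the right-hand side first, so this
        // now reliably prints 0.
        m[0] = static_cast<int>(m.size());
        std::cout << m[0] << '\n';
    }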
That's kind of awesome, actually.
So it gives the compiler team
a lot to play with when they need to, huh?
Or standard library.
Yeah, that's a blessing and a curse.
Because if you want to see, like,
will the compiler change you just made build Windows,
well, even on, like, a fleet of machines,
that's a full-day thing.
Right.
So, like, you kick that off,
and you come back the next day and it might have worked.
Can you request, I don't know, some sub-part of Windows to at least do a quick taste test to see if your change is going to break something obvious?
I don't know.
I actually haven't used the Windows.
I haven't made changes to the compiler itself very much.
And Windows doesn't ingest the libraries on every change that we do.
So I haven't interacted with that system very much,
other than to hear compiler folks complain.
But for the most part, we don't need that, because we have a million plus of our own tests
that we do every commit, basically.
We have, I don't know, it only takes
20 minutes to run, but that's spread over
100 machines.
The compiler has been building test collaterals since the mid-70s.
Wow.
You find tests with Dave Cutler's
initials on them. Dave Cutler is the guy who designed
NT, designed the kernel.
There's a book written during the development of NT that's really well written and at one time was hugely popular.
I can't remember the name of it now.
Look that up?
I'm going to attempt to look it up,
but we don't have to stop the conversation.
Maybe I'll have to read something like that then.
It's like, I know who the guy is just because, you know, there's only one senior technical fellow at the company.
Is he still there, then?
Yes. Yeah, he works in Azure stuff now, I think.
That's cool. The book was Showstopper! The Breakneck Race to Create Windows NT and the Next Generation at Microsoft, 1994.
If you want a copy of it today, it's $85 because it's out of print.
Okay.
Well, Billy, why don't you tell us a little bit more about how you got started with C++? Obviously, we went over a little bit of your bio.
Well, it was really like middle school, high school era, and I did volunteer work on these internet forums. Bleeping Computer was one of them, and there were a whole bunch of others, where at the time Windows didn't come with antivirus software.
And so there were lots of users who would install their 50 free screensavers from whatever website they had visited.
And they would get their 50 free screensavers and malicious software.
And we had tools that would list all of the auto-start places in Windows and files that were modified recently and stuff like this.
And then we'd dump that into a report.
And then the user would paste that report into a forum post.
And then we would read that and go, oh, that is 8randomcharacters.dll in your auto-start entry.
That's probably bad.
And then we would write a little script that would delete that thing. It was all just for the fun of it, volunteer stuff.
Yeah, something like that. And I wanted to write tools that this community would use. And because of that environment, you need single-binary, here's-the-EXE-and-that's-it deployment, because machines that are broken because they are full of viruses tend to not like running installers and that sort of thing.
So if you
wanted that at the time,
the only game in town was C++.
So that's how I started looking at that, and I saw, like, Stephan's Channel 9 talks. This would have been, what, 2006-ish or 2009-ish, about 10 years ago, I guess. And then right after that I did a whole bunch of Stack Overflow question answering and met James McNellis from that community. And, well, Stephan and James, back in 2015 when I joined the C++ team, were on the C++ team. So James was the C runtime maintainer and Stephan was the STL maintainer.
And so when the security team that I used to be part of got, let's say, downsized, when they removed the Trustworthy Computing org, I asked to interview with them, and they thought I was useful in some capacity, I guess.
That's cool. So you did this work, you know, debugging people's computers effectively, de-virusing them.
I would say it's largely manual.
It's like an automated manual process.
Well, at the time it was. But for the most part,
I haven't actually worked on an antivirus engine.
But I'm pretty sure that nowadays, what they were doing, something like Defender would just stop that.
Right.
And right around the time when I left that community, the really bad things had started to do things like hijacking your master boot record to hide themselves. That stuff becomes more, well, you need a tool written by some antivirus vendor, and you tell the user to run it and hope it works. It's a lot less, oh, well, just find the random DLL names in your auto-start list.
But that kind of hobby work directly led to your job then, since you were at the security team?
Yes. For example, when they interviewed me to be an intern to work on that security team,
they asked me if I knew how the Stack Protector worked. And I just said, Canaries! And they were
like, okay, that's better than every other intern we've
interviewed. I'm like, I don't know
the specific assembly, but I know that
make sure the thing
before the return address hasn't been overwritten.
That sort of thing.
And I think as an intern
I did a static analysis tool that they
threw away.
What happens?
Well, what that team does, or did, I should say, is they took stuff that came out of Microsoft Research. Like, let's say Blaster comes out, or one of the big, you know, this is a vulnerability that's in the news.
And they would write a tool, and the
question would be, okay, well, we're Microsoft
and we have, you know, umpteen millions of lines of code.
Can we find all of the places that are just like this exploit we just fixed?
And so research would write a thing that would be better than nothing.
Right.
Like, it would have a ridiculous false positive rate, but it's still faster than auditing millions of lines of code for this pattern.
And what my previous team did is they took that thing that barely worked
that came out of research and tried to make it good enough
that we could make it a release gate.
So you do not ship Windows while this tool says you have,
for example, potentially uninitialized pointer.
Like there's a list of compiler warnings, for example.
And then there's a tool that makes sure
that you didn't turn off those compiler warnings
in the compiler.
I feel like a lot of projects I work on
need a similar tool.
Yeah, in fact, the current incarnation of that is called BinSkim, and it's on GitHub.
It cracks open your PDBs
and makes sure there are no compiler command lines with
warning disable specific warning.
I'm sorry, this is BinSkim?
Yes.
Okay. We'll put a link to
that because I have to know
that that exists. I have to know about all the tools.
Yes.
Yeah, and it doesn't... It's not like you can't suppress this warning ever. It's more that we don't want you to suppress the warning on a project-wide level, sort of thing.
Neat.
Yes.
So how long did you do that?
Well, I was, let's see, I graduated 2012, so I was an intern the previous summer.
So if you count the internship, that would have been from 2011 to 2015-ish. So, what, four years? Wow. That means I've actually been on the C++ team for as long as I was on that team. That's interesting to think about. Anyway, yes, something like that.
I'm always fascinated by these stories where someone's personal interest or hobby or whatever kind of directly led to what they got paid to do, at least for some amount of time.
Yes.
Thank you.
Yeah, the libraries team are workaholics, and probably not in a good way, but that's why all of us are crazy people.
It's fine.
Okay, so you joined the MSVC library team, I guess at a pretty exciting time, because they were, you know, working a lot harder on conformance. What were some of the first things you worked on when you joined?
Oh, so the first thing I worked on, well, the first thing I remember working on, let's go with that. Sure. There was a bug in our regex engine where it was,
if you had all spaces star, all things that are not spaces star, that whole quantity starred. So it's just a really long way of saying anything.
Right. Okay.
Our regex engine really didn't like when you did that. It still really doesn't like when you do that, but at least I made it not emit wrong answers.
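For reference, a minimal sketch of what that pattern looks like with std::regex; this is a reconstruction from the description, not the actual repro from the bug.

    #include <iostream>
    #include <regex>
    #include <string>

    int main() {
        // "All spaces star, all non-spaces star, that whole quantity starred":
        // a roundabout way of matching anything, and a pattern whose nested
        // quantifiers are notoriously hard on backtracking regex engines.
        const std::regex anything{R"((\s*\S*)*)"};
        const std::string input = "hello world";
        std::cout << std::boolalpha << std::regex_match(input, anything) << '\n';
    }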
And so I spent weeks debugging this thing,
or, I don't know, maybe a week debugging this thing,
and I wrote a thing that would dump out the DFA
that the RegEx parser spat out
so that I had some idea of what it was doing.
And it turned out, and the reason I remember this is, the exact line that had the failure, and the exact failure, was caught by the compiler. But the regex engine that we had licensed had a thing where they put the function local into a local static and then just immediately read it out of the local static, and there was a comment next to it: to quiet diagnostics. You remove that, and the compiler was like, here's the bug! It was like a char out of range thing.
Yeah, so needless to say, right after that, I went and found all of the to quiet diagnostics comments in the codebase and removed those.
Wow.
That's, I mean, just a really important lesson.
I don't know how many times that I've interacted with people that are like, oh, well, I tried turning on that warning, but it was just far too noisy, so I turned it back off again.
I mean, well, there's a reason.
Static analysis warnings can be very noisy. That's why we tell people not to use /Wall in our compiler. Because /Wall for us is equivalent to GCC or Clang's -Weverything, and includes warnings like, this behavior changed since Visual C++ 6.
Yeah, I mean,
people want to be able to audit their codebase
for that, but that doesn't mean it makes sense to turn on all the time.
But yeah, compiler diagnostics, the ones that make it into -Wextra or /W4, they go there because they find substantial numbers of bugs.
And I don't know how the other standard library maintainers live with their treating the standard library as a system header thing.
Okay, let's go ahead and talk about that, because that is one of my personal pet peeves.
So go ahead and explain what you just said, for the sake of our listeners.
Sure. So the other standard libraries, Libstdc++ and Libc++, which are the standard libraries for GCC and Clang, respectively,
operate in this
world where they
use this GCC feature called system
headers that Clang presumably implemented
for GCC compatibility,
where if the header doesn't come
from, the idea is, if the header doesn't
come from your project, you don't get
warnings for it, because, well,
presumably you can't do anything about it.
It's a header that came with your
system.
If unistd.h
or windows.h has a
warning in it, what are you supposed to do
about that?
We don't do that.
Our standard library doesn't do that
because the standard library is full of
templates. So you can convince the standard library to do do that because the standard library is full of templates.
So you can convince the standard library to do things on your behalf,
and you should get warnings for those because they're your fault.
So, for example, if you assign a 64-bit unsigned number into an int,
the compiler will emit a warning saying, you know, this is a possibly narrowing conversion.
You probably didn't want to do that.
Or you probably at least wanted to check what you were doing.
If you get std::transform to do that same assignment by giving it a transform that returns a big unsigned int and assigning it into an int range, that's still you writing the same code.
And so our view is that you should get that warning.
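A small sketch of the scenario being described; the exact warning text and number depend on the compiler and warning level.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    int main() {
        std::vector<std::uint64_t> big{1, 2, 3};
        std::vector<int> small(big.size());

        // Written directly, the narrowing assignment is in your code and the
        // compiler warns about possible loss of data:
        //     int x = big[0];

        // Routed through std::transform, the same assignment happens inside a
        // standard library header, but it is still the caller's code that asked
        // for it, so MSVC's STL lets the warning fire instead of silencing
        // everything in its headers.
        std::transform(big.begin(), big.end(), small.begin(),
                       [](std::uint64_t v) { return v; });
    }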
And that means that
our headers, we assume, have
all of the warnings turned on, or most
of the warnings turned on. If you look at our
sources that we just shipped on GitHub
in September,
and you look at our tests,
we run it with /W4 and a list of the warnings from /Wall that people have argued for us to turn on.
And we do have some
library-wide suppressions.
Unfortunately, the unreachable
code warning is a very good warning
to find vulnerabilities,
but in templates
it's difficult to use correctly.
So we have that one suppressed in the headers.
Okay.
But we do that on a case-by-case basis.
We don't just say, this isn't a standard library header, therefore no warning.
Because then those cases like transform or for_each, where the user wrote the thing that's being warned about, effectively, they deserve to get that notification.
So you said this was your pet peeve.
What's, uh...
My pet peeve is the standard
libraries that exclude themselves.
Oh, I see. Because of the number of
warnings that are missed. So I'm
fully on the Microsoft team
when it comes to this. I see.
I wasn't trying to give you a hard time. I wanted you
to explain why it's important.
Well, we do occasionally miss things.
And I do understand users who are angry.
Like, I can't turn on this warning in my code base
because you have a mistake over here.
Standard library maintainers, what are you doing?
Like, I don't know.
For a long time, random, for example,
was full of, if you make the random distribution for float instead of for double,
well, the promotions really, the core language rules for promotions really want everything to be double.
Right.
And we didn't have all of the casts. When user types can be smaller than int or double, for integers or floating point respectively, you need to cast everything back to the original type.
Because char plus char is int.
Yeah.
That one has caught me several times in code as well,
and I'm like, okay, I'm tired of the static casts
that I have to write now.
But you have to, and I've learned in my code that without those static casts there,
I actually did have some actual bugs in the code
where I was doing promotion that could lead to an actual incorrect answer in some cases.
Yes.
I saw people complaining like,
GCC's warning for this complains about, like, char plus-equals int. It's like, well, that's because plus-equals forms the int result and then assigns it back into the char, which doesn't do what people expect a lot of the time.
And the only way to do that without a warning is char equals static cast char plus int, or whatever.
Yes, you have to calculate the result and then explicitly...
It doesn't give you the correct result that you expected,
but it at least makes it explicit what it's doing, if that makes sense.
Yeah, because if there's overflow, there's overflow, and there's...
Well, it's because everything promotes first.
If I recall correctly, you can't even calculate char plus char, because both of the parameters get promoted to int first.
Perhaps. That I'm not positive about.
But yeah, the usual arithmetic conversions are... I'm sure they were a good idea in the 70s.
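A quick sketch of the promotion behavior being discussed, with small values to keep everything in range:

    int main() {
        char a = 60;
        char b = 60;

        // char + char promotes both operands to int first, so the addition is
        // done in int and the result is an int (120 here).
        int wide = a + b;

        // Assigning that int back into a char is a narrowing conversion;
        // compilers can warn, and the fix is to make the narrowing explicit
        // once you have decided it is what you want.
        char narrow = static_cast<char>(a + b);

        // Compound assignment does the same thing under the covers: a += b
        // forms the int result and then converts it back to char.
        a += b;

        (void)wide;
        (void)narrow;
    }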
I want to interrupt the discussion for just a moment to bring you a word from our sponsor,
PVS Studio.
The company behind the PVS Studio Static Code Analyzer, which has proven itself in the search
for errors, typos, and potential vulnerabilities.
The tool supports the analysis of C, C++, C Sharp, and Java code.
The PVS-Studio analyzer is not only about diagnostic rules, but also about integration with such systems as SonarQube, PlatformIO, Azure DevOps, Travis CI, CircleCI, GitLab CI/CD, Jenkins, Visual Studio, and more. However, the issue still remains,
what can the analyzer do as compared to compilers? Therefore, the PVS Studio team
occasionally checks compilers and writes notes about errors found in them.
Recently, another article of this type was posted about checking the GCC 10 compiler.
You can check out the link in the description of the podcast.
Also, follow the link to the PVS Studio download page.
When requesting a license, write the hashtag CppCast and receive a trial license not for one week, but for a full month. So over the past five years,
what are some of the more interesting or challenging parts you've worked on
in the standard library?
A lot of it is that it's not, like, standard library code authoring stuff. It's taking proposals that come out of the committee where, you know, not to fault the committee or anybody on the committee, but big chunks of it come from the Linux HPC world.
The committee has a lot of people who
work at the US Department of
Energy labs, for example.
These are the folks who program
Summit, the biggest supercomputer in the world,
to do things.
And so their model
is we own
the whole machine. We have a zillion cores
on this box and
we own the box
while we're running on it and
we don't need to worry.
We assume we can take over the system
and as long as we get our
workload done faster
that's a win.
Whereas we don't target Linux or a POSIX environment, and we are more concerned with the health of the system.
So burn all of the CPU cores to get this done 5% faster is a win for the HPC guys and is a loss for us.
And also there are things like Windows Dynamic Linking Model is different than POSIX.
So if you notice, if you look at std::atomic for not-lock-free things, you can make a std::atomic of, I don't know, a 200-byte struct, for example. No CPU in the world has a 200-byte atomic instruction. So you're going to need... you're going to touch the non-lock-free atomics.
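A minimal sketch of that situation; Big is a made-up type, just to show that a payload this size can never be lock-free and has to fall back to a lock somewhere.

    #include <atomic>
    #include <cstring>

    struct Big {
        char bytes[200];  // far too large for any hardware atomic instruction
    };

    int main() {
        static_assert(!std::atomic<Big>::is_always_lock_free,
                      "a 200-byte payload cannot be lock-free");

        std::atomic<Big> shared{};
        Big b{};
        std::memcpy(b.bytes, "hello", 5);
        shared.store(b);          // falls back to taking a lock and copying
        Big out = shared.load();  // same on the way out
        (void)out;
    }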
We put the lock for the non-lock-free atomics
in the atomic itself, and the other standard libraries
have this external lock table thing.
Okay. Why?
Well, because
you will only need as many
locks as you have threads
on average.
They have a hash table of locks
that use the address
of the atomic as
the key into the table.
And they're only holding the lock for as long as it takes to do the memcpy or memcmp of the type in the atomic. So it's more
space efficient to have the external table
especially because a lot of these things often
have large alignment requirements
Like, if you put an alignas(200) thing in that atomic, the fact that we stick one extra byte in for the lock doubles the size of the atomic on you.
There are good reasons to move it out, but the problem is
that we,
in Windows' linking model,
and it's going to sound like
I'm ripping on Windows' linking model.
There are ABI reasons and stuff that I like
Windows' model for, but one of the consequences
is a typical Windows
program has four standard
libraries in it.
Windows has theirs, and they don't want their use of strtok to stomp on the user's use of strtok, for example.
So in the Windows model, every symbol is namespaced to the module it comes from. So it's not strtok, it's ucrtbase!strtok.
it's difficult for us to use that external
locktable model because
we want you to be able
to pass pointers to atomics around
between modules
and that just work.
Whereas if we use the external locktable,
if the pointer to the atomic was passed around between
two different DLLs that statically link the standard library, they would be using
different lock tables, which means they wouldn't be atomic
anymore. Whereas in
the POSIX model, they have a process-wide symbol table,
and so you can have as many copies of the standard library as you want
because the operating system loader in that environment will just pick one.
So you don't have this multiple people want to provide the same facility problem.
Of course, the advantage of the Windows model is
you can have different modules that use different standard libraries,
and that works.
You don't have this ABI,
like if people disagree on what the layout of string is,
it causes a problem.
It doesn't really cause a problem on Windows
as long as they don't put that in their interface,
in their exported interface, if that makes any sense.
So your original question was... I'm sorry, I went on this little rant. Your original question was, what is the challenging thing? It's explaining to people who aren't from this world, like, why their thing that seems so obvious to them can't work for you.
Right.
And things like this external lock table is one of them. Like, recently there was a proposal for RCU that went through the concurrency and parallelism group.
RCU is a, like, pseudo-garbage collection thing that solves lock-free programming problems.
Okay.
And they wanted the standard library to, like, create a thread whose job it is to reclaim the memory
for the RCU. And I'm like, well, I can't do that. Like, if somebody loads the standard library into a shell extension and then unloads the shell extension, I can't have any threads that are still running, because then you unload the code the thread is running, and
Huh.
That's interesting.
How often does it come up when there's something like that, where you're like,
well, actually, that's not possible
on whatever?
It doesn't come up super often,
but it's...
I mean, I think I see it more
than some of my coworkers just because the sort of, most of the things that I've touched in the standard library have been the concurrency thing.
And most of the things the rest of the committee proposes don't have this, like, deep platform integration impact.
Like, vector doesn't care about this problem.
Right.
Like, format doesn't care about this problem. Only really the concurrency stuff does, and that's because the concurrency stuff has this non-composition problem. Like, if you have two libraries that want to maintain thread pools and you try to use them together, the result is not usually good.
Okay.
Like, you can create deadlocks where the thread pools are waiting on each other and nobody can create new threads, for example.
And so a lot of the things SG1 wants to do really have to be program-global for them to work, and in this world where we don't agree on what standard library we're using, that's difficult.
I think, I mean, I'm a huge fan of cross-platform code. Like everything that I work on, I try to
make it cross-platform, multi-platform. And I feel like these differences that you're describing,
some people might be listening to it and go, well, that's stupid. Windows should just use
the model that everyone else uses. I'm sitting here going, this is hugely important that we don't have the entire computing world doing the same thing.
And instead, we understand a larger picture and different issues.
Um, I'm, yes, I'm not sure. I think the systems that we have ended up with are sort of products
of the world they're intended to target.
So, like,
POSIX world, for the most part,
it's assumed that you distribute things
as source code. Maybe I'm overgeneralizing
here, but, like, it's expected
that you can just rebuild your program
to some extent.
Whereas
Windows has always been
a very binary deployment environment.
We talk to customers and they're like,
we're so happy that Visual C++ is finally conforming.
Now the bottleneck is GCC, some ridiculous old version,
because that's what's bundled on some distro we have to support.
So the consequence is, on the POSIX implementations, they get better code sharing; it's easier to share, like, templated code between people. But that's why we can run our new shiny standard library on Vista, which shipped in 2006, and trying to run modern code on a 2006 Linux distro would be interesting.
That's a good point.
But there are consequences that are surprising.
People don't expect putting a local static in a header
to give you one local static per DLL
that included the header.
Right.
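A tiny sketch of the surprise being described; widget_count is a hypothetical header-only helper, not anything from the MSVC STL.

    // Imagine this inline function lives in a header included by several DLLs.
    inline int& widget_count() {
        // On POSIX, the dynamic loader collapses the duplicate definitions, so
        // every module shares one counter. On Windows, each DLL that compiled
        // this header in gets its own copy of the static, so two DLLs
        // incrementing "the" counter are actually incrementing two different
        // counters.
        static int count = 0;
        return count;
    }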
And I've heard people tell me that that's broken.
And, well,
maybe, but that's
like, firewalling things within
the individual who's deploying that
thing is the only way to ensure
that you can cleanly
swap out modules without
worrying about ABI
explosions. Like, that's one of the reasons why
the committee recently has had this
ABI problem, because in the POSIX world, only one entity gets to say what vector::vector does.
Since you mentioned it, do you have any opinions you wanted to share about the ABI discussions that have been ongoing in the committee?
So it's... I mean, we are...
We still have lots of customers.
We've been ABI-stabilized since VS
2015. That was kind of actually
sprung on us. We got asked
to do that, and we said, okay, we need to do all
these things to make that better.
And then
we got told, well, no,
people need this now.
So, it was very...
So anyway, we have more ABI lock-in problems
than the other implementations do
because we allow people to deploy the libraries themselves.
So we are obligated to break ABI occasionally
because we have to do it even to add stuff.
So we can't add new exports from our own library
because
somebody could have
already deployed an older copy of the library
next to their XE.
The operating system loader picks that one.
Right.
If you deployed something
that needs to be one of these guest-in-somebody-else's-process scenarios, like a shell extension
or a print driver or whatever,
the user will go to print
and the process will crash because
these new exports that you're looking
for are not present in the version of the library
that's loaded. And we've
been able to work around this by adding
more DLLs.
That's why when you deploy our library,
we have a DLL that
only contains the default memory resource
variable, and that's it.
Okay.
So for some things, we're able to add
more libraries to work around the problem, but
eventually there will be things
where they need to interact with other things that are
already in the separately compiled
library, and then we're stuck.
We have to rename the
DLL to something else
and that's an ABI break. So in a sense, we need to do it more often than the other vendors might need to, because we let users deploy that. So we care about ABI in that we don't want to do it gratuitously. I don't know. I think we proposed something like every five or ten years
we should just do that.
But it's easier for us as well
because of that whole DLL scenario.
You can have one program with two components
compiled with different standard libraries in them,
and it's fine.
Whereas on the POSIX implementations,
I don't know how they would solve that problem
because both standard libraries would want the same symbol names.
You pick whichever one is linked first.
Right, but the operating system loader does that.
Yes. Yeah, you don't really necessarily have a way to control it.
Right. So if you have, in the POSIX world,
if you have a print driver that wanted standard library version 1.2.3.4,
and its string layout is this.
Yeah, too bad.
Yeah, like you're toast.
There's no saving that.
And that's why on lots of Linux distros, your only options are GCC and libstdc++, you know, outside of toy examples.
Makes sense.
So I'm actually, to go back, if you don't mind, a little bit to your standard library work.
You were involved in, do I remember correctly,
that you were involved in the implementation of the parallel STL algorithms?
Yes, that was an interesting,
like when that paper went in,
there were like eight supposed implementations of this thing.
And we were surprised to find that supposedly one of them was ours.
Anyway.
Yeah, so I think Intel's TBB implementation was the first one of those that shipped, and then we went right after them. And that's another example of this, we care about things that maybe the HPC community doesn't care about.
With us, you can call Parallel Sort, and as soon as Parallel Sort returns, unload the standard library.
Okay.
Which means that we don't have thread pools inside the standard library itself. Literally, when you call parallel sort,
we spin up a whole work
stealing scheduler just to do
sort, and then
tear it all down before we return.
And surprisingly, that
performed very well in testing, so
hooray.
Whereas Thrust, for
example, is NVIDIA's implementation of the same thing, like, well, they're targeting a GPU. We wanted to limit the damage of calling parallel anything with bad sizes. So, for example, if you wanted to parallel sort six elements, or for_each six elements, if you do this on Thrust, well, they're going to do this copy-the-memory-to-the-GPU operation, and then sort the six elements on the GPU, and then copy the result back.
And so the consequence of doing that when n is too small and parallelism doesn't win can be relatively expensive.
Right. Whereas we assume that most of the time N is small.
And so we take steps to make the,
like,
we gave up some percentage of maximum theoretical performance with N being
huge in order to limit the damage when N is small.
Okay.
If that makes any sense.
Yeah. We had someone on here, I don't remember who it was, a long time ago, when we were first talking about the parallel STL, if I recall correctly. He suggested that in the future, when we all have the parallel algorithms, we should just specify, is it sequential or... I forget what the three are.
It's seq, par, and par_unseq.
You should specify that as appropriate for your data every single time you call an algorithm and let the standard library decide if it should be doing anything different with it.
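For reference, a minimal sketch of what those three execution policies look like at a call site:

    #include <algorithm>
    #include <execution>
    #include <vector>

    int main() {
        std::vector<int> v(100'000, 1);

        std::sort(std::execution::seq, v.begin(), v.end());        // sequential
        std::sort(std::execution::par, v.begin(), v.end());        // parallel
        std::sort(std::execution::par_unseq, v.begin(), v.end());  // parallel and vectorized
    }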
Right, and that's why
we were so paranoid about making sure
that
the result when n was small was
not too bad. Right.
So would you suggest that, or if I as the
programmer know that this thing is never
more than 10 elements, should I pass those
flags, or should I just say, no, just use
this normal sort?
I would say
profile. Okay. Because
I mean,
how do I put this? There are some cases where you might be
able to avoid profiling so like when i a general rule of thumb is if the algorithm has to do more
than order of n work it's probably a good idea to parallelize so like the the crossover point
for sort was i think like sort ints was like a,000 elements when I tested it.
Lots of programs sort more than 1,000 elements.
Sure.
Whereas for something like foreach or transform reduce,
it has to be quite large before the parallel version wins.
Because the instant you involve another thread,
you have to cross caches onto another core.
So if all the data you're working with
was on the local cache on your core,
then parallelism will lose.
But that depends on what your data looks like,
and so it's hard to give you a hard and fast...
Sure, right.
...this rule.
And I think, I don't know,
there seems to be a thought that the standard library can do more
than it really can on some of these things like i've had people say that they expect parallel copy
to somehow discover um which memory which numa nodes the memory that are input to copy come from
and like that sounds unlikely and like paralyze appropriately to that like you know do the right
numa node copying and yeah i i that was my response like how do you expect stud copy to
figure that out um so i i don't know um like given the fact that this thing like there are a lot of
promises of the parallel algorithms library but i I haven't seen anybody actually implement this
idea of
the compiler can do something
smart about parallelizing this.
I'm not saying it's impossible,
but it's
certainly a useful tool.
But
I would
expect people to assume
it's library tech.
We're going to spawn some number of threads
and call your callbacks on them.
Not reach into your callback
and see that you're calling another parallel algorithm inside
and that sort of thing.
So I'm not sure if I answered your question.
Yeah, sure.
I guess maybe I don't know how to answer your question.
No, that's fine. You gave a lot of information. I'm still kind of waiting to use the parallel algorithms, because I haven't tried them on Visual Studio yet, and let's be fair, that's the only implementation that we actually have in our standard...
Doesn't GCC 10 have it?
Yes, but you have to link to a separate library, which I think is the TBB from Intel.
TBB from Intel.
I see. Okay.
So, yes, now
I could also, as of what,
very recently... No, actually, GCC 10
is not even out yet, technically.
So, theoretically, very soon,
I would be able to. I would be interested to hear
if they ended up fixing the forward
progress problem with
TBB.
Like, one of the rules in the spec for these is that the parallel algorithms library
is not supposed to be able to introduce deadlocks into your program.
That sounds reasonable, yeah.
Well, you say that, but a lot of parallelism frameworks
assume when you give them work
that their thread pool will have a thread available.
Okay.
Which is a reasonable assumption for a lot of the time.
So what, it would have to try to create a thread, and if it can't, then go back to doing the work in the current thread?
Right.
So what we do is we use the current thread to help.
We go to the system thread pool, which is CreateThreadpoolWork, SubmitThreadpoolWork, WaitForThreadpoolWorkCallbacks.
Those are Win32 APIs.
And so we go to the thread pool and we say, here's some work.
Can you please help us?
And then we just start doing the work.
And if the thread pool doesn't get to it before we're done, we yank the work off of the thread pool and say, nope, sorry, we didn't want that.
That's fine.
And that's super important for handling that small n case I was talking about,
because quite often, if n is like 100,
one context switch is more than it takes to do all of that processing.
So you never get a thread pool thread,
even if there are thread pool threads available,
because you're already done.
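A minimal sketch of that pattern using the Win32 thread pool APIs Billy names; the SortJob type and the claiming scheme are illustrative stand-ins, not MSVC's actual implementation.

    #include <windows.h>
    #include <atomic>

    // Hypothetical chunk of work that either the calling thread or the system
    // thread pool will claim; at most one of them runs it.
    struct SortJob {
        std::atomic<bool> claimed{false};
        void run() { /* ... sort a chunk ... */ }
    };

    static void CALLBACK PoolCallback(PTP_CALLBACK_INSTANCE, void* context, PTP_WORK) {
        auto* job = static_cast<SortJob*>(context);
        if (!job->claimed.exchange(true)) {
            job->run();  // the thread pool got here first
        }
    }

    int main() {
        SortJob job;

        PTP_WORK work = CreateThreadpoolWork(PoolCallback, &job, nullptr);
        if (!work) return 1;
        SubmitThreadpoolWork(work);  // "here's some work, please help"

        // Meanwhile the calling thread just starts doing the work itself.
        if (!job.claimed.exchange(true)) {
            job.run();
        }

        // If the pool never got around to it, cancel the pending callback
        // ("yank the work off the thread pool") and wait for any running one.
        WaitForThreadpoolWorkCallbacks(work, TRUE);
        CloseThreadpoolWork(work);
    }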
And I don't believe, I don't know if they ever actually implemented
the forward progress guarantee stuff.
I know they had issues once upon a time.
But now it sounds like I'm bad-mouthing another implementation that I haven't seen, so maybe I shouldn't. I don't intend to say that.
I don't intend to say that.
I don't think so.
That's, yeah.
Like, maybe they do have this implemented somehow.
I don't know.
So we've talked a lot about, you know, different standard library work you've done, but you did mention in your bio how you are sometimes on loan. I think you're currently doing work for the vcpkg team. Do you want to tell us a little bit about what you've done there?
Yes. So, yeah, we just had a blog post a couple days ago about that. So for those who aren't familiar, vcpkg is
a tool that
downloads source code and builds
it for a whole bunch of open source
libraries, that sort of thing.
And there
are, for example, there are things that have been put into
the standard library, like say RegEx
that we think
have been kind of mistakes.
Like, all of the RegEx implementations in the three major standard libraries are bad.
Lose by hundreds of times to regex engines like Google's RE2.
Right. Or PCRE.
Yeah. And that's because the people who wrote those regex engines are people who eat text processing for breakfast
and are geniuses. And
the people who wrote the standard regex engines
are standard library maintainers.
I don't want to denigrate
standard library maintainers, of course, but
we
touch regex today and
touch atomic and parallel algorithms tomorrow
and filesystem the next day. Like, we're not specialized in anything. And so, you know, if one of the implementations was bad, then I could go, okay, well, maybe those guys didn't know what they were doing. But all three of them being bad suggests a problem. So anyway, we want to be able to tell people,
just vcpkg install re2. Things that
don't need this ABI stabilized...
There's no
argument that programs need
a regex engine, for example.
But the argument is, well, does that really
belong in std if
we have a tool available, or if we have
tooling available on all the major platforms
where you can just go grab a thing that does that thing.
It only needs to be in std if everybody needs to agree
exactly on what its interface and layout are.
So anyway, we're trying to...
Today, when you use vcpkg, you actually build the tool itself. You call bootstrap-vcpkg and it builds itself using whatever compiler you have locally. And we've had people who don't like to do that. Like, they want to have a binary so that they're not building it every time. So that's kind of what I've been doing for the vcpkg team recently. It's, like,
get that team ready to do code signing, go through all the security policies and legal stuff.
Because when you're one of the top 10 most valuable companies, people like to go after you for things that they wouldn't go after, let's say, other vendors for.
Right.
So, and also, like, it would be bad, like, setting up that infrastructure for them, because, for example, today, when you submit a package to vcpkg for inclusion in the catalog, we build it. Well, what if your package is an "I'm going to try to steal the Microsoft signing key" package? It turns out there are lots of people that would love to have Microsoft signing keys.
Fascinating.
Yes, like some of the malware vendors that we started this conversation about, for example. So, you know, we need to do things like never build the user code on the machines that can sign.
Okay.
That sort of thing. And, yeah, that's what I've been doing
for the last couple of months for them.
And also onboarding them to this
preview Azure Pipelines scale sets feature.
So previously, because building the whole catalog
takes like nine hours,
but building most submissions only takes like five minutes.
So they had like 12 machines always on in Azure,
which tends to be kind of expensive.
Whereas the Azure pipelines team just came out with this thing
where it will automatically scale like,
oh, I'm seeing more pull requests on GitHub come in,
go fire up some more machines.
Okay.
So I'm kind of curious about your opinion on this now. Well, I am curious, so I'm going to ask. To say I'm kind of curious is weird. Anyhow, every project now, I don't want to discuss Conan versus vcpkg or anything like that. But every project that I'm involved in right now
has Conan. It has a package manager, more to the point. So based on what you just said a moment ago, do you see, hypothetically, a decreasing role for the standard library as more people are using package managers?
I think that's what we would like to see, yes.
Okay.
Right, because...
And by we, do you mean like the royal we, or do you mean the standard committee or? So I don't know if that's a universally held opinion of the C++ team,
but many,
many people on the C++ team have that opinion because like a lot of the stuff
that I just talked about,
like the unload problem or the be a guest in somebody else's process problem,
like those aren't problems for everyone,
right?
Like the,
the, the advantage of putting it
in stood is it's available everywhere the corollary to that is everything we put in stood has to be
able to go everywhere right and that means there's useful stuff that we're leaving on the floor
and that we must leave on the floor for stood to achieve its mission of being universally available um so something like
thrusts which is a let's go target gpus with stuff library undeniably does something useful
but i don't think we could ever put it into std because we can't assume that you have an nvidia
gpu right okay yeah um you know that sort of thing um And there are lots of software ecosystems in the world that have small standard libraries, relatively speaking, and package managers to fill in the gaps.
Like npm for node.js is a classic example.
Rust has Crate.
And so, sure, they don't have a web server in their standard library, but who cares?
They have libraries readily available that do.
Right. It's interesting getting your opinion as both STL maintainer and someone who's worked on a package manager, that you're definitely very pro package manager.
Yes. Like I said, I don't know if that opinion is universal on the team. I haven't done that polling.
That's fine, yeah. Okay. Well, Billy, it's been great having you on the show today.
Thanks.
It's been a great experience.
Thanks so much for listening in as we
chat about C++. We'd love to
hear what you think of the podcast. Please let
us know if we're discussing the stuff you're interested in
or if you have a suggestion for a topic.
We'd love to hear about that too.
You can email all your thoughts to feedback at cppcast.com. We'd also appreciate if you can
like CppCast on Facebook and follow CppCast on Twitter. You can also follow me at Rob W. Irving
and Jason at Lefticus on Twitter. We'd also like to thank all our patrons who help support the
show through Patreon. If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast.
And of course, you can find all that info
and the show notes on the podcast website
at cppcast.com.
Theme music for this episode
was provided by podcastthemes.com.