Algorithms + Data Structures = Programs - Episode 11: What belongs in the standard library?
Episode Date: February 5, 2021
In this episode, Bryce and Conor talk about standard libraries, open source libraries and more.
Date Recorded: 2021-01-24
Date Released: 2021-02-05
C++ Standard Library
Python Standard Library
Python Built-i...n Functions
Python argparse
Python itertools
Python more-itertools
Alex Stepanov Papers
C++TO April 2019: Jon Kalb “C++ Today” (Include History of STL)
C++ Boost Libraries
C++ Boost Filesystem
CUDA Thrust Parallel Algorithms
C++ Microsoft STL on Github
C++ Ranges-v3
P0443 C++ Executors Proposal
Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8
Transcript
I think I look fantastic. I look like Legolas. Here we go.
That's not how you pronounce that.
Legolas? Legolas? Whatever.
Oh no. Oh no. Oh no. That is not good. What were you thinking?
And it's like shaved on the side and, oh no. Oh no, Conor.
Do they really let you have hair like that and be an actuary?
Welcome to ADSP: The Podcast, episode 11, recorded on January 24th, 2021. My name is Conor, and
today with my co-host Bryce we talk about what belongs in a standard library and more.
So, I got an idea. You got an idea? Okay. Yeah, I think we should talk about what belongs
in the standard library. Okay. Is this C++ specific, or is this... I think it is. Okay. Because, well, I guess it's kind of a
comparison of language standard libraries, that's really a part of this, because I think that
there are a lot of people trying to put things in the C++ standard library.
And I think a lot of the things that people wish to see in the C++ standard library that aren't suitable for the C++ standard library,
folks want to see those things because they expect them to be there in other languages' standard libraries. And they perhaps don't understand the difference between
the C++ standard library and the standard library of other languages. So just because two languages
both say that they have a standard library or foundational libraries doesn't mean that
they're the same thing,
or that they have the same requirements,
or that they have the same challenges.
And I think first we have to start by defining
what a standard library is for C++ or for any language.
And I think that standard libraries have one key property,
and it's the one key property that everybody's interested in.
And that property is the standard libraries are the set of libraries
that are available when you install the language
without adding any extra packages.
So when you just install the base Python packages,
which libraries do you get without adding other Python libraries?
When you install a C++ compiler, what libraries do you have without adding other libraries?
When you install Java, what class libraries do you have without going and adding other
ones? And the reason why this property is key is because things that are in that set of standard libraries that are automatically installed with the language or compiler, they get implicit distribution. If you come across an awesome package and you're like, great, I want to use it, you're more likely to use
that package if you don't have to install it, if you just happen to have it installed
on your system.
And that is the case for the standard library.
Whereas if the package is something that you have to install, even if it's in a language
where it's very easy to install a third-party package, something like Python or Rust,
that is still an extra hurdle.
And you might be in a space where you're like, oh, I don't want to add extra dependencies.
There's still this extra step that you have to do,
and it makes it less likely that people are going to use that
library. And so that's what I think really makes a standard library is that it is a component that
comes with the language by default. Does that make sense? Yes, that made sense. And to summarize,
so the key property of a standard library is just that it comes with the language. That's the key property. Yeah, exactly. And so, you know, that explains why there is a desire to have rich standard
libraries because, you know, if you think that there's something that's really important for
your programmers to have access to, or that's going to make your language better,
then why wouldn't you want it to just be there by default?
Like, it's the easiest option if all you care about is getting your library into the hands
of programmers. The easiest option by far is just to have everybody implicitly install it. It's the easiest route to get
adoption. But, you know, there is a cost. So, you know, the first cost is, of course, that
the larger that core set of libraries is, the larger your actual, you know, core package is.
One of the key principles of C++ is you don't pay for what you don't use.
And so everything that you add to that standard library makes the size of that core package larger and larger.
But I think more importantly, perhaps, there is a maintenance cost. I think most languages try to have a set of standard libraries that have a coherent design to them,
that it's not a collection of libraries, but it is one coherent library where all of the APIs sort of follow the same style.
I will note what I think is an important exception to this, which is the Python standard libraries.
If you look at the set of like core Python libraries, there is actually a fairly notable divergence in the
API style and even the naming conventions among some of those library components.
And I believe that's because the way that it works in Python is that a number of those libraries started off as third-party libraries and then they were adopted into sort of the core set.
Do you know if that's accurate?
I cannot confirm or deny that.
And I actually don't even know.
Off the top of my head, I know that there's a built-in functions page.
But what are the Python... are they called core or standard?
I don't know what they're called, but
I'm thinking like the Python standard library. Yeah, yeah. The
one that comes to mind to me is the Python libraries for parsing arguments, because
in my history of writing bad Python code,
I think there are two different argument parsing libraries that have been
around historically.
I know one of them is like discouraged from being used these days, and that's probably
the one that I'm used to using.
And then I think the newer one's called argparse.
But yeah, that's an example.
You've never had to install the package for argparse.
That just comes with your Python installation.
But I believe that that was something that started off as its own library, and then it got added into that standard set.
Somebody wrote some third-party library, and maybe they put it on GitHub, and then eventually the Python community decided to adopt it as part of the standard library.
Yeah, my guess is just on, like, I'm on the Python docs website, and there is a massive number of libraries that are supported here.
So I think, you know, if we have Python listeners, they can tweet at us and correct
us. But I'm pretty sure for some of these it's exactly what you said: it was a standalone library
written by potentially, you know, core developers, but potentially not. And then it
just became so useful that over time it was folded into the standard. Yeah, specifically,
like I'm looking at itertools, which is probably like one of the most common libraries
that I use.
There's another version, more-itertools,
which is basically just like an extension of that library.
But that one's not in the standard.
You have to go to PyPI or whatever package manager
and pip install it.
Yeah.
And so one of the nice things
about the Python standard library is it's very rich.
You know, there's a lot of different capabilities in there that you just get by default.
But I think one of the weaknesses is that it does not have the same interface consistency
as other languages' standard libraries, because in Python,
it's really a collection of libraries.
It's not one coherent library.
It's a collection of different libraries.
And some of those different libraries may have slightly different ways of doing things
or slightly different styles to them.
And, you know, if you look at, you know, a language like C++,
which is very different from Python in a number of ways,
but arguably is most different in the design of the library
and in terms of the package management story.
Even in C++, pretty much everything that's in the standard library
started off as a third-party library.
The original basis for the C++ standard library that was adopted in 98 was Alex Stepanov's STL library,
which was this third-party library. Now, the standards committee, they knew they needed a
standard library before Alex Stepanov came around. And there are components of the standard library that weren't taken from
the STL. I believe that locales and IO streams were not part of the STL. That was a separate
component that did not necessarily come from any pre-existing third-party library. But things like
the containers and the algorithms, etc., those all
came from the STL. So the committee, back when they were putting together 98, they said they
recognized we need a library. And they started designing some of the components to it. But then
at some point, I don't remember who, but somebody became aware of Alex's STL library.
And he was like, you know what?
This is what the standard library should be.
And he brought it to the committee.
And he said, hey, look, here's this.
It's novel.
It's groundbreaking.
And I think that we should ship it.
And the committee really took a big risk on shipping the STL.
And if you ever want to learn a little bit more about the history of this,
Jon Kalb has given some really great talks about the history of C++ in general,
but in particular, talking about this moment in C++'s history.
I'm sure we'll add some links to them in the show notes.
But this was really a momentous decision for C++, to take this risk on this sort of unproven design.
And then if you look at the next big revision of the C++ standard library, C++11,
back during that time, people joked that that was essentially just
standardizing Boost. Boost was this, you know, this popular collection of third-party
C++ libraries, and C++11 adopted many of those libraries, like the Boost atomics library
and the Boost thread library. There's probably a number of other ones that I'm not
thinking of, but being primarily somebody who does concurrent programming, those are the two
that come to mind. And then in C++17, our next big library release, we standardized parallel
algorithms based on a library I work on, Thrust, and a file system library based on Boost file system.
And ranges that came in C++20 started off as a very popular library on GitHub.
So C++ has followed the same model of sort of adopting designs from existing third-party libraries. The
mantra I like to tell people about what belongs in the standard library is: we standardize
existing practice when there is a clear need. And by "standardize existing practice," what I mean is we don't try to invent things ourselves in the committee when possible.
Whenever we have the opportunity to look at what's in the wild and to adopt that, we try to do that.
And one of the ways that we know that something is maybe ripe for standardization is if we see three or four different versions of basically the same
thing in different frameworks, then maybe that means that it's time to standardize it.
So one of the best ways to get your thing added
to the C++ standard library is if you can make a list in your paper where you can say, here are links to five open source libraries that have a construct called, you know, whatever, that's the same thing as the construct in this paper.
And if I can go and look at each one of those things and say, hey, yeah, all of these things are the same thing, then that indicates that, yeah, this is a design that is used in the wild by a lot
of people. And it's even better if you can do this in a way that demonstrates that there's a
need for interoperability. Like, if all five of those libraries expose this type, but they're not all interoperable, and a user might want to pass one of these kinds of things into each one of those
libraries, well, then that makes a case for, hey, yeah, maybe this should be a part of the standard
library, part of the vocabulary. And yeah, so that is often sort of key to how we standardize in C++, this idea of we want to look at existing
practice and we want to adopt existing practice, not necessarily invent on our own.
But this gets us to the big difference between the C++ standard library and a lot of other
languages, which is the C++ standard library is a specification.
It's not an implementation.
And so even if we're adopting something that is based upon some existing third-party library, we're not just taking that library and shipping it with C++ compilers.
There are three major standard library implementations. And when we standardize
something, all three of those implementations have to go and implement it. And we have in recent
years started to have some shared implementation between different libraries. And this has been
accelerated by the fact that the Microsoft standard library, which was the one of the three big ones that was not open source,
is now open source. So now all of the libraries are open source.
The GCC one's under a GPL license. The other two
are under a more liberal type of license.
The LLVM one's under the LLVM license, of course.
And so this makes it easier to share code between them.
And so it looks like there's going to be some shared implementation of, for example,
the parallel algorithms between some of these standard libraries. And my team, which has done some implementation work on
some concurrency primitives, has made contributions to multiple standard libraries. So we're starting
to see there be some shared implementation between them, but they're still different code bases.
And so just off the bat, if you want to add something to the standard library in C++,
it is 3x the implementation work and 3x the maintenance work of shipping it as a third-party library.
Because you need to land it in each one of these major standard library code bases.
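To make the "specification, not implementation" point concrete, here is a minimal sketch of how a program can tell which of the three implementations it was built against. The function name `standard_library_name` is made up for this illustration; the three macros are the identification macros that libc++, libstdc++ and the Microsoft STL define, assuming at least one standard header has been included first.

```cpp
#include <cassert>
#include <string>  // including any standard header defines the vendor's identification macro

// Hypothetical helper: reports which standard library implementation
// this translation unit was compiled against.
std::string standard_library_name() {
#if defined(_LIBCPP_VERSION)
    return "libc++ (LLVM)";
#elif defined(__GLIBCXX__)
    return "libstdc++ (GCC)";
#elif defined(_MSVC_STL_VERSION)
    return "MSVC STL (Microsoft)";
#else
    return "unknown";
#endif
}

// Example use: the call is identical everywhere, but a different branch
// is selected depending on whose code base provides <string>.
void check_standard_library_name() {
    assert(!standard_library_name().empty());
}
```

The same source line `#include <string>` resolves to three separately written and separately maintained code bases, which is where the 3x cost comes from.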
And another one of the challenges is, you know, the standard library is really
deployed at a scale which is uncommon for other software.
It runs on essentially every device on the planet. There are about 5 million
C++ developers, which is a very large user base. And it has to run on pretty
much all types of hardware and for all various different types of applications. So it has to worry about corner cases that other libraries may not need to.
Because if I'm providing a third-party library to do something, I can choose to say, I'm going
to focus on these sets of users. And I recognize the validity of this other use case that you might have, but it's not a priority for me right now.
Like for example, ranges v3 selected a very specific set of compilers that it was going to
focus on in a very specific set of platforms. Whereas C++ ranges in the standard library,
they don't have that luxury because they have to be deployed everywhere that C++ compilers are deployed.
And we don't have the choice of just saying this library or this feature is not going to be available for all C++ users.
So it's both a blessing and a curse to have this implicit availability everywhere.
Yes, it means that everybody who installs the language just has the library, but it also means that the library has
to work for everybody that's going to install the language. That's a really high bar to clear.
And the other real problem is that the people who are standard library maintainers aren't
domain experts. They're not necessarily
the right person to implement, you know, some particular type of library.
Maybe they don't have the right expertise to implement, you know, an XML parser. And the
people who are domain experts, who are experts in implementing XML parsers, they don't have the experience with
standard library development. They don't know what it's like to have to ship in a C++ standard
library at that scale. So it's difficult to do the actual work, which again has to be done in three different places.
And so I'd argue that putting things in the standard library is actually very inefficient.
You end up having to support a lot of users who may not actually need your thing
and a lot of environments and platforms where your library may not actually really need to be supported
because you might not have users there,
but you still have to support them if you're in the standard library.
And it's very inefficient to implement
because you've got to implement it
in all of these different standard library implementations
in the way that a standard library expects.
And of course, that's just leaving out the whole question of,
you know, how does it actually get into the standard. You have to write an actual normative specification. And the standard library is very good at stability, but it's not so good at
fixing its mistakes. It's very hard for us to make ABI breaking changes and API breaking changes in
the standard library. And that's not the right fit for all third-party libraries. Some third-party libraries might want to evolve faster maybe
because the domain that they're in is a domain where there's active research and where every
few years a new algorithm comes out or there's some innovation that requires interface changes.
So that's my rant on the nature of standard libraries.
So API stability, ABI stability, three times the implementation work, three times the maintenance
work, platform support. Are you trying to build an argument that nothing should go, or not nothing,
but a very select number of things should be going in the C++ standard library for all of these
reasons? Like you started, I guess, with sort of the one key property, which is that, you know, it's what comes with the language and people have easy access to it and it'll get utilized more.
But then there's obviously like a huge price tag and all these things.
So I guess, what are your thoughts on how much stuff should go in? Like,
should we be trying to decelerate the rate that we put things in?
Because while you were going through 98 and 11 and 17 and 20, you only mentioned
a fraction of the things. You know, variant, optional, the list goes on and on and on
of what's being added to the standard library. And, you know, I don't want to pick
out any of the
papers that are currently in the proposal process. But there are a number of, you know, non trivial
libraries that are quite large that are being proposed. And, you know, I'm sure not all of
them will get in, but I'm sure some of them will. Yeah, what are your thoughts on, you know,
should we be trying to find a small set of things and propose those? Or should we be trying to put all our energy into
like a package manager? So, I mean, I definitely am an advocate for a small C++ standard library.
And I do believe that a lot of the desire for a large standard library comes from the deficiency of C++ package management. I've said for a few years
that I think the single most important thing that we can do for C++ over the next 10 years is make it
orders of magnitude easier to use third-party libraries in C++ projects. And that means package management, that means build
systems, and that doesn't necessarily mean doing either of those things within the context of
the standard. The standard committee really only owns the text of C++ programs. And dealing with things like package management
or build systems is a little bit outside of our mandate
and would also be tricky for us to do.
But I think that if C++ had as rich
a package management system as, say, Rust or Python,
I don't think there would be as much demand
to put things in the standard library.
The standard library though is not a substitute for package management. I think that's the key
thing is that it is not a sufficient justification to put things in the C++ standard library simply
so that they're widely available to people because of the high costs to implement, because it's very inefficient
to put things in the standard library,
because once we put things into the standard library,
it's very hard to fix them later.
All of those things, I think,
push us towards wanting a smaller standard library
because that's lower risk.
Putting things into the standard library is higher risk.
We have to be very certain that we've got them right. As for what I think belongs in the standard library,
I think things that go into the standard library need to meet one of these three criteria.
This is what I refer to as the clear need in my mantra of we standardize existing practice when there is a clear need.
Yeah, I was going to ask about that because, yeah, clear need, depending on who you're talking to,
if it's a scientist versus a hardware engineer versus something else,
like their clear needs are going to be completely different.
Right, right. And when I say clear need, I mean one of these three things. A, wide use in interfaces. So this is something like std::tuple or std::vector, where the reason to have it in the standard library is that if you don't have it in the standard library, then a bunch of different third-party libraries will
provide their own version. And these things tend to appear in interfaces. So,
you know, let's use strings as an example, because that is one of the ones that's often a pain point
in the field. Let's say that
library A has its own string type and library B has its own string type and library C has its
own string type. And none of those string types are convertible or interoperable between them.
And you, the programmer, want to use all three of those libraries. So now you've got to traffic in three different string types in your program
and you may have to write code to convert between them.
And all those libraries, they've probably got interfaces that take strings.
So it's not just an implementation detail of the library.
It's something that's surfaced in the interface that's exposed to you, the user.
So in that case, for something like that, there is a strong motivation for us to put it in the standard library so that it's part of the standard vocabulary.
We often call these things vocabulary types. So then we can have one string type, std::string,
and all three of those libraries can accept std::string
as part of their interface. So that's bullet number one, which is when it's something that's
widely used in interfaces and we want to have one standard version of it.
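The string situation described above can be sketched in a few lines. Everything here is hypothetical and not from the episode: `lib_a`, `lib_b`, their string types, and the `to_lib_b` glue function are invented purely to show what "trafficking in several string types" looks like, followed by the same two interfaces expressed in terms of `std::string`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical library A ships its own owning string type.
namespace lib_a {
struct String {
    std::string bytes;
};

// Returns an upper-cased copy (ASCII letters only).
inline String to_upper(String s) {
    for (char& c : s.bytes) {
        if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
    }
    return s;
}
}  // namespace lib_a

// Hypothetical library B has a different, unrelated, non-owning string type.
namespace lib_b {
struct Text {
    const char* data;
    std::size_t size;
};

// Counts ASCII vowels, upper or lower case.
inline std::size_t count_vowels(Text t) {
    const std::string vowels = "aeiouAEIOU";
    std::size_t n = 0;
    for (std::size_t i = 0; i < t.size; ++i) {
        if (vowels.find(t.data[i]) != std::string::npos) ++n;
    }
    return n;
}
}  // namespace lib_b

// Glue the user has to write because the two types do not interoperate.
// The returned Text only borrows from `s`, so `s` must outlive it.
inline lib_b::Text to_lib_b(const lib_a::String& s) {
    return lib_b::Text{s.bytes.data(), s.bytes.size()};
}

// The same two interfaces, hypothetically rewritten around the vocabulary type.
namespace lib_a2 {
inline std::string to_upper(std::string s) {
    return lib_a::to_upper(lib_a::String{std::move(s)}).bytes;
}
}  // namespace lib_a2

namespace lib_b2 {
inline std::size_t count_vowels(const std::string& s) {
    return lib_b::count_vowels(lib_b::Text{s.data(), s.size()});
}
}  // namespace lib_b2
```

With the first pair, combining the libraries means keeping a `lib_a::String` alive and converting it by hand. With the second pair, `lib_b2::count_vowels(lib_a2::to_upper("hello"))` composes directly, because both sides speak `std::string`.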
The second thing, the second criterion, is things that encapsulate non-portability.
So things like std::filesystem, the concurrency library, std::atomic, iostreams,
things that are going to be implemented in a different way for each different platform,
in each different operating system, each different embedded environment, etc.,
where you, the programmer, probably don't want to be writing those yourself.
You probably want the person who's providing your platform,
who's the vendor of your platform, who's the expert on it,
to provide a suitable version for their platform.
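As a small illustration of encapsulated non-portability, the sketch below uses only `std::filesystem` and `std::ofstream` from C++17. The function name `write_and_measure` and the scratch directory name are made up for this example. The code contains no platform-specific calls; finding the temp directory, joining paths, and querying the file size are all left to whichever standard library the platform vendor ships.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: writes `text` to a scratch file, asks the platform
// for the file's size, cleans up, and returns that size.
std::uintmax_t write_and_measure(const std::string& text) {
    // The location of the temp directory differs per platform; the library hides that.
    const fs::path dir = fs::temp_directory_path() / "adsp_episode11_demo";
    fs::create_directories(dir);

    // operator/ inserts the platform's preferred separator.
    const fs::path file = dir / "notes.txt";
    {
        std::ofstream out(file, std::ios::binary);  // binary: no newline translation
        out << text;
    }  // stream closed here, so the size below reflects everything written

    const std::uintmax_t size = fs::file_size(file);
    fs::remove_all(dir);
    return size;
}
```

Calling `write_and_measure("hello")` should report 5 bytes on any platform with a writable temp directory, without a single `#ifdef` in user code.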
Those sorts of things belong in the standard library. And then the last thing is facilities
that are not really truly purely library facilities, but that have a language component or that require language support. For example, things like type traits
or all of the coroutines library bits. Those belong in the standard library because, well,
they're part of the core language or they have ties to the core language and they have to be
implemented through some internal contract
between the compiler and the standard library. It's not something that you could implement on
your own. And that's it. Those are my three criteria. It's things that are widely used
in interfaces, things that encapsulate non-portability, or things that require
language support. I believe that only things that check off at least one of those bullet points belong in the standard library.
So I guess we can wrap
this episode up or wind it down by maybe, or maybe you don't want to pick on any, but are there
proposals in the pipeline, you know, that are slotted for C++23, 26, 29, that are clearly examples that map to one of those three?
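The third criterion, facilities that need language support, can be illustrated with the type traits mentioned a moment ago. This is a sketch under stated assumptions: `my_is_same`, `Point`, `Shape`, `Circle`, `Color` and `safe_to_memcpy` are names invented here. The first trait is the kind a third-party library can write itself; the others are ones that, as far as I know, have no portable pure-library implementation, so standard libraries rely on compiler built-ins for them.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// A trait that ordinary library code can express with partial specialization alone.
template <class T, class U>
struct my_is_same : std::false_type {};
template <class T>
struct my_is_same<T, T> : std::true_type {};

// Illustrative types, invented for this example.
struct Point { int x; int y; };
struct Shape { virtual ~Shape() = default; };
struct Circle final : Shape {};
enum class Color : unsigned char { red, green };

static_assert(my_is_same<int, int>::value, "identical types compare equal");
static_assert(!my_is_same<int, long>::value, "distinct types compare unequal");

// These answers come from the compiler; the library only exposes them.
static_assert(std::is_trivially_copyable<Point>::value, "plain aggregate of ints");
static_assert(!std::is_trivially_copyable<std::string>::value, "owns memory");
static_assert(std::is_final<Circle>::value, "declared final");
static_assert(!std::is_final<Shape>::value, "not declared final");
static_assert(
    std::is_same<std::underlying_type<Color>::type, unsigned char>::value,
    "underlying type as declared");

// Hypothetical helper showing how user code typically consumes such a trait.
template <class T>
constexpr bool safe_to_memcpy() {
    return std::is_trivially_copyable<T>::value;
}
```

User code can build on `safe_to_memcpy<T>()`, but it could not answer the underlying question without the compiler's help, which is the argument for shipping such traits with the language.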
Yeah. Okay. So I'll give you a few. So first let's talk about the encapsulating non-portability.
And we should say, just because we're mentioning these proposals doesn't mean they're guaranteed to get in, but continue. Right. So one of the big focuses for C++23 has been networking support. A
networking library is something that would be very similar to something like file system in that it's an encapsulation of non-portability. Windows and Unix platforms have very different network APIs. And so it would
be very useful to users for us to create an abstraction that they could use on both platforms
portably to write the same code and have it be, you know, have the implementation be provided by the experts at Microsoft who
know how the Microsoft networking APIs work and the experts who work on GCC who know how
Unix sockets work.
Another example, and this is one that's really an example of both encapsulating non-portability and wide use in interfaces, would be executors.
So executors are intended to be an abstraction around execution resources, things like thread pools, etc.
So they sort of fit into the category of encapsulating non-portability. One of the things
that's in the executor proposal is this static thread pool type, which is a concrete executor
type. On different platforms, that static thread pool is going to be implemented in different ways.
And it might be even more customized than you think. Like you might think, well, there's just going to be one implementation
on the different Unix platforms
and then one on the different Windows platforms,
but that might not be the case.
Think about the difference between an embedded Linux platform,
what you might want a thread pool to look like there,
versus a high-performance server that's running in an HPC cluster.
But executors are also vocabulary types because we think that they'll sort of be like the
iterators of concurrent algorithms, where they'll be something that's going to be passed in
to a bunch of different interfaces.
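The "iterators of concurrent algorithms" analogy can be sketched with a deliberately tiny stand-in. This is not the P0443 API: `inline_executor`, `deferred_executor`, `drain` and `square_each` are hypothetical names, and the only assumption is that an executor is "something with an `execute(f)` member". No threads are used, so the sketch stays single-threaded and deterministic.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical executor #1: runs each task immediately on the calling thread.
struct inline_executor {
    template <class F>
    void execute(F&& f) const {
        std::forward<F>(f)();
    }
};

// Hypothetical executor #2: queues tasks and runs them later, on request.
struct deferred_executor {
    std::vector<std::function<void()>> pending;

    template <class F>
    void execute(F&& f) {
        pending.emplace_back(std::forward<F>(f));
    }

    // Runs every queued task in submission order, then empties the queue.
    void drain() {
        for (auto& task : pending) task();
        pending.clear();
    }
};

// A generic algorithm written once against the shared "execute(f)" shape.
// `values` must stay alive until every submitted task has run.
template <class Executor>
void square_each(Executor& ex, std::vector<int>& values) {
    for (int& v : values) {
        ex.execute([&v] { v *= v; });
    }
}
```

`square_each` never names a concrete executor, in the same way `std::sort` never names a concrete container. With `inline_executor` the vector is squared by the time the call returns; with `deferred_executor` nothing changes until `drain()` is called. Agreeing on that shared shape is what a standard definition would provide.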
So it's very useful for us to define the core notion of what this
abstraction is in the standard library so that people can go and write generic algorithms that
use that abstraction in their own code. And then finally, let's use the reflection library as an
example of something that will require language support. So there's a lot of work going on to add static
reflection to the C++ language. And obviously that has a very large compiler component.
But on top of the compiler mechanisms that actually expose the data, you need a way to actually access it
in your program. And that's going to be through a library style API. And also, you don't just need
the library API for getting that reflection metadata. You also need the compile time programming facilities to work with and manipulate that reflection
information and use it. So I think that's a really good example of something that
you can't do on your own. You can't make reflection happen solely on your own as a
third-party library. It requires compiler support. Yeah, those are great examples.
Yeah.
I thought you were going to ask me for examples of things
that don't fall into those categories.
And I was going to say, I don't think I can do that.
Because, of course, there are differing beliefs on the committee.
And at the end of the day,
we had a meeting this week of the C++ Library Evolution Committee where we were talking about design guidelines.
And one of the first things that we said is, you know, we're not going to have design rules.
It's sort of that Pirates of the Caribbean meme of they're not rules, they're more like guidelines.
The code is more what you call guidelines than actual rules.
There are really no hard and fast rules in the C++ standard library design. Just because I,
the current chair, and a bunch of other people have a certain belief about what belongs in the
standard library, A, doesn't mean that everybody agrees with us, and B, it doesn't mean that there aren't exceptions to those rules, you know.
If there is something that the committee agrees is important that doesn't really fall into my
criteria, I'm not going to stand on principle.
At the end of the day, we have to be flexible and we don't want to let a set of criteria
or a set of rules overly constrain us.
But I do fundamentally believe that a smaller standard library
is better for C++ than a bigger one
and that what we really need is to make it easier to use third-party libraries
in C++ and that that will alleviate a lot of the desire to have a wide, large standard library in
C++. Yeah, I think for our listeners, regardless of whether they agree or disagree with your philosophy and the three things that define a clear need, I think it's useful just to keep in mind if you are trying to propose something and get it standardized in C++, keeping in mind that list of three things and potentially molding your proposal so that it more clearly satisfies one or more of those three things is not
going to hurt. Right. So yeah. Yep. I certainly agree with that. And with that, yeah, we should
probably call it. Yep. Sounds good. Thanks for listening and have a great day.