Algorithms + Data Structures = Programs - Episode 11: What belongs in the standard library?

Episode Date: February 5, 2021

In this episode, Bryce and Conor talk about standard libraries, open source libraries and more.

Date Recorded: 2021-01-24
Date Released: 2021-02-05

C++ Standard Library
Python Standard Library
Python Built-i...n Functions
Python argparse
Python itertools
Python more-itertools
Alex Stepanov Papers
C++TO April 2019: Jon Kalb “C++ Today” (Include History of STL)
C++ Boost Libraries
C++ Boost Filesystem
CUDA Thrust Parallel Algorithms
C++ Microsoft STL on Github
C++ Ranges-v3
P0443 C++ Executors Proposal

Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons - Attribution 3.0 Unported - CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

Transcript
Starting point is 00:00:00 I think I look fantastic. I look like Legolas. Here we go. That's not how you pronounce that. Legolas? Legolas? Whatever. Oh no. Oh no. Oh no. That is not good. What were you thinking? And it's like shaved on the side and oh no. Oh no, Conor. Do they really let you have hair like that and be an actuary? Welcome to ADSP: The Podcast, episode 11, recorded on January 24th, 2021. My name is Conor, and today with my co-host Bryce we talk about what belongs in a standard library and more.
Starting point is 00:00:57 So uh I got an idea. You got an idea? Okay. Yeah I think we should talk about what belongs in the standard library. Okay. Is this C++ specific or is this uh i think it is okay because well i i guess it's kind of a comparison of language standard libraries is really a part of this because um i think that the the the the there are a lot of people trying to put things in the C++ standard library. And I think a lot of the things that people wish to see in the C++ standard library that aren't suitable for the C++ standard library, folks want to see those things because they expect them to be there in other languages' standard libraries. And they perhaps don't understand the difference between the C++ standard library and the standard library of other languages. So just because two languages both say that they have a standard library or foundational libraries doesn't mean that
Starting point is 00:02:00 they have the same, that doesn't mean that they're the same thing, or that they have the same requirements, or that they have the same challenges. And I think first we have to start by defining what a standard library is for C++ or for any language. And I think that standard libraries have one key property, and it's the one key property that everybody's interested in. And that property is the standard libraries are the set of libraries
Starting point is 00:02:29 that are available when you install the language without adding any extra packages. So when you just install the base Python packages, which libraries do you get without adding other Python libraries? When you install a C++ compiler, what libraries do you have without adding other libraries? When you install Java, what class libraries do you have without going and adding other ones? And the reason why this property is key is because things that are in that set of standard libraries that are automatically installed with the language or compiler, they get implicit distribution. If you find some awesome package and you're like, great, I want to use it, you're more likely to use that package if you don't have to install it, if you just happen to have it installed
Starting point is 00:03:30 on your system. And that is the case for the standard library. Whereas if the package is something that you have to install, even if it's in a language where it's very easy to install a third-party package, something like Python or Rust, that is still an extra hurdle. And you might be in a space where you're like, oh, I don't want to add extra dependencies. There's still this extra step that you have to do, and it makes it less likely that people are going to use that
Starting point is 00:04:05 library. And so that's what I think really makes a standard library is that it is a component that comes with the language by default. Does that make sense? Yes, that made sense. And to summarize, so the key property of a standard library is just that it comes with the language that's the key property yeah um exactly and and so you know that explains why there is a desire to have rich standard libraries because um you know if you think that there's something that's really important for your programmers to have access to um or that's going to make it, you know, it's going to make your language better. And like, why wouldn't you want it to just be there by default? Like, it's the easiest option if all you care about is getting your library into the hands of programmers. The easiest option by far is just to have everybody implicitly install it. It's the easiest route to get
Starting point is 00:05:06 adoption. But, you know, there is a cost. So, you know, the first cost is, of course, that the larger that core set of libraries is, the larger your actual, you know, core package is. One of the key principles of C++ is you don't pay for what you don't use. And so everything that you add to that standard library makes the size of that core package larger and larger. But I think more importantly, perhaps, there is a maintenance cost. I think most languages try to have a set of standard libraries that have a coherent design to them, that it's not a collection of libraries, but it is one coherent library where all of the APIs sort of follow the same style. I will note what I think is an important exception to this, which is the Python standard libraries. If you look at the set of like core Python libraries, there is actually a fairly notable divergence in the
Starting point is 00:06:29 API style and even the naming conventions among some of those library components. And I believe that's because the way that it works in Python is that a number of those libraries started off as third-party libraries and then they were adopted into sort of the core set. Do you know if that's accurate? I cannot confirm or deny that. And I actually don't even know. Off the top of my head, I know that there's a built-in functions page. But what are the Python – are they called core or standard? I don't know what they're called but
Starting point is 00:07:06 I'm thinking like the, um, the Python standard library. Yeah, like the one that comes to mind to me is the Python libraries for parsing arguments, because in my history of writing bad Python code, I think there are two different argument parsing libraries that have been around historically. I know one of them is like discouraged from being used these days, and that's probably the one that I'm used to using. And then I think the newer one's called argparse.
Starting point is 00:07:44 But yeah, that's an example. You've never had to install the package for argparse. That just comes with your Python installation. But I believe that that was something that started off as its own library, and then it got added into that standard set. Somebody wrote some third-party library, and maybe they put it on GitHub, and then eventually the Python community decided to adopt it as part of the standard library. Yeah, my guess is just on, like, I'm on the Python docs website, and there is a massive number of libraries that are supported here. So I think, you know, if we have Python listeners, they can tweet at us and correct
Starting point is 00:08:25 us. But I'm pretty sure for some of these it's exactly what you said: it was a standalone library written by potentially, you know, core developers, but potentially not. And then it just became so useful that over time it was folded into the standard. Yeah, specifically, like I'm looking at itertools, which is probably like one of the most common libraries that I use. There's another version, more-itertools, which is basically just like an extension of that library. But that one's not in the standard.
Starting point is 00:08:54 You have to go to PyPI or whatever package manager and pip install it. Yeah. And so one of the nice things about the Python standard library is it's very rich. You know, there's a lot of different capabilities in there that you just get by default. But I think one of the places where it, one of the weaknesses is that it does not have that same interface consistency
Starting point is 00:09:13 as other languages, standard libraries, because the, in Python, really that it's a collection of libraries. It's not one coherent library. It's a collection of different libraries. And some of those different libraries may have slightly different ways of doing things
Starting point is 00:09:30 or slightly different styles to them. And, you know, if you look at, you know, a language like C++, which is very different from Python in a number of ways, but arguably is most different in the design of the library and in terms of the package management story. Even in C++, pretty much everything that's in the standard library started off as a third-party library. The original basis for the C++ standard library that was adopted in 98 was Alex Stepanov's STL library,
Starting point is 00:10:10 which was this third-party library. Now, the standards committee, they knew they needed a standard library before Alex Stepanov came around. And there are components of the standard library that weren't taken from the STL. I believe that locales and IO streams were not part of the STL. That was a separate component that did not necessarily come from any pre-existing third-party library. But things like the containers and the algorithms, etc., those all came from the STL. So the committee, back when they were putting together 98, they said they recognized we need a library. And they started designing some of the components to it. But then at some point, I don't remember who, but somebody became aware of Alex's STL library.
Starting point is 00:11:06 And he was like, you know what? This is what the standard library should be. And he brought it to the committee. And he said, hey, look, here's this. It's novel. It's groundbreaking. And I think that we should ship it. And the committee really took a big risk on shipping the STL.
Starting point is 00:11:25 And if you ever want to learn a little bit more about the history of this, Jon Kalb has given some really great talks about the history of C++ in general, but in particular, talking about this moment in C++'s history. I'm sure we'll add some links to them in the show notes. But this is really a momentous decision for C++ to take this risk on this sort of unproven design. And then if you look at the next big revision of the C++ standard library, C++11, back during that time, people joked that that was essentially just standardizing Boost. Boost was this, you know, this popular collection of third-party
Starting point is 00:12:12 C++ libraries, and C++11 adopted many of those libraries, like the Boost Atomics library, the Boost concurrent, the Boost thread library. There's probably a number of other ones that I'm not thinking of, but being primarily somebody who does concurrent programming, those are the two that come to mind. And then in C++ 17, our next big library release, we standardized parallel algorithms based on a library I work on, Thrust, and a file system library based on Boost file system. And ranges that came in C++20 started off as a very popular library on GitHub. So C++ has followed the same model of sort of adopting designs from existing third-party libraries. One of the things that the mantra I like to tell people about what belongs in the standard library is we standardize
Starting point is 00:13:16 existing practice when there is a clear need. And by standardized existing practice, what I mean is we don't try to invent things ourselves in the committee when possible. Whenever we have the opportunity to look at what's in the wild and to adopt that, we try to do that. And one of the ways that we know that something is maybe ripe for standardization is if we see three or four different versions of basically the same thing in different frameworks, then maybe that means that it's time to standardize it. So one of the best ways to get your thing standardized in C++, to get your thing added to the standard library, is if you can make a list in your paper where you can say, here's links to five open source libraries that have a construct called, you know, whatever, that's the same thing as this construct in this paper. And if I can go and look at each one of those things and I can look through and say, hey, yeah, all of these things are the same thing, then that indicates that, yeah, this is a design that is used in the wild by a lot of people. And it's even better if you can do this in a way that demonstrates that there's a
Starting point is 00:14:39 need for interoperability like if of all five of those libraries expose this type, but they're not all interoperable and a user might want to pass one of these kinds of things into each one of those libraries, well, then that makes a case for, hey, yeah, maybe this should be a part of the standard library, part of the vocabulary. And yeah, so that is often sort of key to how we standardize in C++, this idea of we want to look at existing practice and we want to adopt existing practice, not necessarily invent on our own. But this gets us to the big difference between the C++ standard library and a lot of other languages, which is the C++ standard library is a specification. It's not an implementation. And so even if we're adopting something that is based upon some existing third-party library, we're not just taking that library and shipping it with C++ compilers.
Starting point is 00:15:41 There are three major standard library implementations. And when we standardize something, all three of those implementations have to go and implement it. And we have in recent years started to have some shared implementation between different libraries. And this has been accelerated by the fact that the Microsoft standard library, which was the one of the three big ones that was not open source, is now open source. So now all of the libraries are open source. The GCC one's under a GPL license. The other two are under a more liberal type of license. The LLVM one's under the LLVM license, of course.
Starting point is 00:16:30 And so this makes it easier to share code between them. And so it looks like there's going to be some shared implementation of, for example, the parallel algorithms between some of these standard libraries. And my team, which has done some implementation work on some concurrency primitives, has made contributions to multiple standard libraries. So we're starting to see there be some shared implementation between them, but there's still different code bases. And so just off the bat, if you want to add something to the standard library in C++, it is 3x the implementation work and 3x the maintenance work of shipping it as a third-party library. Because you need to land it in each one of these major standard library code bases.
Starting point is 00:17:27 And another one of the challenges is, you know, the standard library, it really, it's deployed at a scale which is uncommon for other software. It runs on essentially every device on the planet. There's about 5 million C++ developer users, which is a very large developer user base. And it has to run on pretty much all types of hardware and for all various different types of applications. So it has to worry about corner cases that other libraries may not need to. Because if I'm providing a third-party library to do something, I can choose to say, I'm going to focus on these sets of users. And I recognize the validity of this other use case that you might have, but it's not a priority for me right now. Like for example, ranges v3 selected a very specific set of compilers that it was going to
Starting point is 00:18:33 focus on and a very specific set of platforms. Whereas C++ ranges in the standard library, they don't have that luxury because they have to be deployed everywhere that C++ compilers are deployed. And we don't have the choice of just saying this library or this feature is not going to be available for all C++ users. So it's both a blessing and a curse to have this implicit availability everywhere. That, yes, it means that everybody can just install your, like, everybody who installs the language just has the library, but it also means that the library has to work for everybody that's going to install the language. That's a really high bar to clear. And the other real problem is that the people who are standard library maintainers aren't domain experts. They're not necessarily
Starting point is 00:19:25 the right person to implement, you know, some particular type of library. They're not the right, maybe they don't have the right expertise to implement, you know, an XML parser. And the people who are domain experts, who are experts in implementing XML parsers, they don't have the experience with standard library development. They don't know what it's like to have to ship in a C++ standard library at that scale. So it's difficult to do the actual work, which again has to be done in three different places. And so I'd argue that putting things in the standard library is actually very inefficient. You end up having to support a lot of users who may not actually need your thing and a lot of environments and platforms where your library may not actually really need to be supported
Starting point is 00:20:24 because you might not have users there, but you still have to support them if you're in the standard library. And it's very inefficient to implement because you've got to implement it in all of these different standard library implementations in the way that a standard library expects. And of course, that's just leaving out the whole question of, you know, how does it actually get into the standard. You have to write an actual normative specification very good at stability, but it's not so good at
Starting point is 00:21:06 fixing its mistakes. It's very hard for us to make ABI breaking changes and API breaking changes in the standard library. And that's not the right fit for all third-party libraries. Some third-party libraries might want to evolve faster maybe because the domain that they're in is a domain where there's active research and where every few years a new algorithm comes out or there's some innovation that requires interface changes. So that's my rant on the nature of standard libraries. So API stability, ABI stability, three times the implementation work, three times the maintenance work, platform support. Are you trying to build an argument that nothing should go, or not nothing, but a very select number of things should be going in the C++ standard library for all of these
Starting point is 00:22:04 reasons? Like you started, I guess, with sort of the one key property is that, you know, it's what comes with the language and people have easy access to it and it'll get utilized more. But then there's obviously like a huge price tag and all these things. So I guess is there like what are your thoughts on how much stuff should, like, should we be trying to, should we be trying to decelerate, like, the, the rate that we put things in, like, because, like, while you were going through 98 and 11 and 17 and 20, you only mentioned, like, a fraction of the things that you, like, you know, variant optional, the list goes on and on and on of, like, what's being added to the standard library, and, you know, I don't want to pick
Starting point is 00:22:44 out any of the papers that are currently in the proposal process. But there are a number of, you know, non-trivial libraries that are quite large that are being proposed. And, you know, I'm sure not all of them will get in, but I'm sure some of them will. Yeah, what are your thoughts on, you know, should we be trying to find a small set of things and propose those? Or should we be trying to put all our energy into like a package manager? So, I mean, I definitely am an advocate for a small C++ standard library. And I do believe that a lot of the desire for a larger standard library comes from the deficiency of C++ package management. I've said for a few years that I think the single most important thing that we can do for C++ over the next 10 years is make it
Starting point is 00:23:37 orders of magnitude easier to use third-party libraries in C++ projects. And that means package management, that means build systems, and that doesn't necessarily mean doing either of those things within the context of the standard. The standard committee really only owns the text of C++ programs. And dealing with things like package management or build systems is a little bit outside of our mandate and would also be tricky for us to do. But I think that if C++ had as rich a package management system as, say, Rust or Python, I don't think there would be as much demand
Starting point is 00:24:23 to put things in the standard library. The standard library though is not a substitute for package management. I think that's the key thing is that it is not a sufficient justification to put things in the C++ standard library simply so that they're widely available to people because of the high costs to implement, because it's very inefficient to put things in the standard library, because once we put things into the standard library, it's very hard to fix them later. All of those things, I think,
Starting point is 00:24:54 push us towards wanting a smaller standard library because that's lower risk. Putting things into the standard library is higher risk. We have to be very certain that we've got them right. As for what I think belongs in the standard library, I think things that go into the standard library need to meet one of these three criterion. This is what I refer to as the clear need in my mantra of we standardize existing practice when there is a clear need. Yeah, I was going to ask about that because, yeah, clear need, depending on who you're talking to, if it's a scientist versus a hardware engineer versus something else,
Starting point is 00:25:38 like their clear needs are going to be completely different. Right, right. And when I say clear need, I mean one of these three things. A, wide use in interfaces. So this is something like std::tuple or std::vector, where the reason to have it in the standard library is that if you don't have it in the standard library, then a bunch of different third-party libraries will provide their own version. And these things tend to appear in interfaces. So if standard library, you know, let's use strings as an example, because that is one of the ones that's often a pain point in the field. Let's say that library A has its own string type and library B has its own string type and library C has its own string type. And none of those string types are convertible or interoperable between them. And you, the programmer, want to use all three of those libraries. So now you've got to traffic in three different string types in your program
Starting point is 00:26:49 and you may have to write code to convert between them. And all those libraries, they've probably got interfaces that take strings. So it's not just an implementation detail of the library. It's something that's surfaced in the interface that's exposed to you, the user. So in that case, for something like that, there is a strong motivation for us to put it in the standard library so that it's part of the standard vocabulary. We often call these things vocabulary types. So then we can have one string type, std::string, and all three of those libraries can accept std::string as part of their interface. So that's bullet number one, which is when it's something that's
Starting point is 00:27:35 widely used in interfaces and we want to have one standard version of it. The second thing, the second criterion, is things that encapsulate non-portability. So things like std::filesystem, the concurrency library, std::atomic, IO streams, things that are going to be implemented in a different way for each different platform, in each different operating system, each different embedded environment, etc., where you, the programmer, probably don't want to be writing those yourself. You probably want the person who's providing your platform, who's the vendor of your platform, who's the expert on it,
Starting point is 00:28:22 to provide a suitable version for their platform. Those sorts of things belong in the standard library. And then the last thing is facilities that are not really truly purely library facilities, but that have a language component or that require language support. For example, things like type traits or all of the coroutines library bits. Those belong in the standard library because, well, they're part of the core language or they have ties to the core language and they have to be implemented through some internal contract between the compiler and the standard library. It's not something that you could implement on your own. And that's it. Those are my three criterion. It's things that are widely used
Starting point is 00:29:16 in interfaces, things that encapsulate non-portability, or things that require language support. I believe that only things that check off at least one of those bullet points belong in the standard library. So I guess we can we can wrap this episode up or wind it down by maybe or maybe you don't want to pick pick on but are there proposals in the pipeline that you think you know that are slotted for C++ 23, 26, 29, that clearly are examples that map to one of those three? That are clearly examples that map to one of those three. Yeah. Okay. So I'll give you a few. So first let's talk about the encapsulating non-portability. And we should say, just because we're mentioning these proposals doesn't mean they're guaranteed to get in, but continue. Right. So one of the big focuses for C++23 has been networking support. Networking, networking library is something that would be very similar to something like file system in that it's an encapsulation of non-portability. Windows and Unix platforms have very different network APIs. And so it would
Starting point is 00:30:33 be very useful to users for us to create an abstraction that they could use on both platforms portably to write the same code and have it be, you know, have the implementation be provided by the experts at Microsoft who know how the Microsoft networking APIs work and the experts who work on GCC who know how Unix sockets work. Another example of encapsulating, this is one that's really an example of both encapsulating non-portability and wide use in interfaces would be executors. So executors are intended to be an abstraction around execution resources, things like thread pools, etc. So they sort of fit into the category of encapsulating non-portability. One of the things that's in the executor proposal is this static thread pool type, which is a concrete executor
Starting point is 00:31:33 type. On different platforms, that static thread pool is going to be implemented in different ways. And it might be even more customized than you think. Like you might think, well, there's just going to be one implementation on the different Unix platforms and then one on the different Windows platforms, but that might not be the case. Think about the difference between an embedded Linux platform, what you might want a thread pool to look like there, versus a high-performance server that's running in an HPC cluster.
Starting point is 00:32:06 But executors are also vocabulary types because we think that they'll sort of be like the iterators of concurrent algorithms where there'll be something that's going to be passed in to a bunch of different interfaces. So it's very useful for us to define the core notion of what this abstraction is in the standard library so that people can go and write generic algorithms that use that abstraction in their own code. And then finally, let's use the reflection library as an example of something that will require language support. So there's a lot of work going on to add static reflection to the C++ language. And obviously that has a very large compiler component.
Starting point is 00:32:56 But on top of the compiler mechanisms that actually expose the data, you need a way to actually access it in your program. And that's going to be through a library style API. And also, you don't just need the library API for getting that reflection metadata. You also need the compile time programming facilities to work with and manipulate that reflection information and use it. So I think that's a really good example of something that you can't do on your own. You can't make reflection happen solely on your own as a third-party library. It requires compiler support. Yeah, those are great examples. Yeah. I thought you were going to ask me for examples of things
Starting point is 00:33:48 that don't fall into those categories. And I was going to say, I don't think I can do that. Because, of course, there are differing beliefs on the committee. And at the end of the day, we had a meeting this week of the C++ Library Evolution Committee where we were talking about design guidelines. And one of the first things that we said is, you know, we're not going to have design rules. It's sort of that Pirates of the Caribbean meme of they're not rules, they're more like guidelines. The code is more what you call guidelines than actual rules.
Starting point is 00:34:26 There are really no hard and fast rules in the C++ standard library design. Just because I, the current chair, and a bunch of other people have a certain belief about what belongs in the standard library, A, doesn't mean that everybody agrees with us, and B, it doesn't mean that there aren't exceptions to that, to those rules, you know. If there is something that the committee agrees is important that doesn't really fall into my criterion, I'm not going to stand on the principle. At the end of the day, we have to be flexible and we don't want to let a set of criterion or a set of rules overly constrain us. But I do fundamentally believe that a smaller standard library
Starting point is 00:35:18 is better for C++ than a bigger one and that what we really need is to make it easier to use third-party libraries in C++ and that that will alleviate a lot of the desire to have a wide, large standard library in C++. Yeah, I think for our listeners, regardless of whether they agree or disagree with your philosophy and the three things that define a clear need, I think it's useful just to keep in mind if you are trying to propose something and get it standardized in C++, keeping in mind that list of three things and potentially molding your proposal so that it more clearly satisfies one or more of those three things is not going to hurt. Right. So yeah. Yep. I certainly agree with that. And with that, yeah, we should probably call it. Yep. Sounds good. Thanks for listening and have a great day.
