CppCast - Microsoft's STL

Episode Date: February 7, 2017

Rob and Jason are joined by Stephan T. Lavavej to talk about Microsoft's STL and some of the changes to the library coming in the VS 2017 release.

Stephan T. Lavavej is a Principal Software Engineer at Microsoft, maintaining Visual C++'s implementation of the C++ Standard Library since 2007. He also designed a couple of C++14 features: make_unique and the transparent operator functors. He likes his initials (which people can actually spell) and cats (although he doesn't own any).

News
CppChat "The Great Functor Debate" is Saturday
Implementing State Machines with std::variant
STL learning resource

Stephan T. Lavavej
@StephanTLavavej

Links
STL Fixes in VS 2017 RTM
C++ 14/17 Features and STL Fixes in VS "15" Preview 5
C++ 14/17 Features and STL Fixes in VS "15" Preview 4

Sponsor
Backtrace
JetBrains

Transcript
Starting point is 00:00:00 This episode of CppCast is sponsored by Backtrace, the turnkey debugging platform that helps you spend less time debugging and more time building. Get to the root cause quickly with detailed information at your fingertips. Start your free trial at backtrace.io/cppcast. And by JetBrains, maker of intelligent development tools to simplify your challenging tasks and automate the routine ones. JetBrains is offering a 25% discount for an individual license on the C++ tool of your choice: CLion, ReSharper C++, or AppCode. Use the coupon code JetBrains for CppCast during checkout at JetBrains.com. Episode 88 of CppCast with guest Stephan T. Lavavej, recorded February 7th, 2017.
Starting point is 00:01:01 In this episode, we discuss the upcoming Great Functor Debate on CppChat. Then we talk to STL about Microsoft's STL. Stephan talks to us about the changes coming to the STL in VS 2017 and more. Welcome to episode 88 of CppCast, the only podcast for C++ developers by C++ developers. I'm your host, Rob Irving, joined by my co-host, Jason Turner. Jason, how are you doing today? Great, Rob. This is the episode I've been waiting for. We can finally go back in time. Okay.
Starting point is 00:01:50 88 miles per hour. There we go. Yeah, that's the one we're looking forward to, not episode 100. I don't know. I was excited about it. Well, at the top of every episode, I'd like to read a piece of feedback. This week, we got a nice tweet from Felix, who just listened to the Beast episode. He said, CppCast, so glad I found this podcast. I am a C++ student, and the interview with Vinnie gave me an idea for trying my hand at a WebSocket app. So I thought this was great.
Starting point is 00:02:28 It's always nice to see C++ students getting into some little projects on their own. I'm sure Vinnie's very happy to see some new users for his library. Yeah, I hope we can encourage students, and particularly given the number of younger guests that we've had on the show too, I think that should be encouraging also. Yeah, absolutely. And one thing I wanted to mention is we are going to start working with JetBrains again to give away some product licenses to the listeners who write in to us with feedback. So I'll be reaching out to Felix to give him a license to one of the JetBrains products,
Starting point is 00:02:58 either CLion or ReSharper. So that's a nice thing they're doing for us there. Yeah. Yeah. So we'd love to hear your thoughts about the show as well. You can always reach out to us on Facebook, Twitter, or email us at feedback@cppcast.com. And don't forget to leave us reviews on iTunes. Joining us today is Stephan T. Lavavej. He's a Principal Software Engineer at Microsoft, maintaining Visual C++'s implementation of the C++
Starting point is 00:03:22 Standard Library since 2007. He's also designed a couple of C++14 features, make_unique and the transparent operator functors. He likes his initials, which people can actually spell, and cats, although he doesn't currently own any. STL, welcome to the show. Hi, thanks for having me. So did you actually change your middle
Starting point is 00:03:40 name so that you could get STL for your initials? That's what I'm thinking happened here. What I did is I actually went back in time and chose my father's name to start with a T. And then I got my mother to give me a variant of his name that also starts with T. So time travel was involved. Okay, that makes perfect sense. So there's time travel in the future then. That's promising.
Starting point is 00:04:03 I am actually honestly curious how you got your start in C++. It was actually 15 years ago. Although I started with C, which was a mistake, but I didn't realize it at the time. When I graduated high school, I was like, oh, I want to implement SHA-1 hashing. Because that was the best hash algorithm at the time, you know, hilarious now. And at the time, I only knew, like, BASIC, so I wanted to, you know, step up to a more powerful language. So I started learning C, and I eventually, you know, figured it out, self-taught, basically. But then when I tried to write a program using
Starting point is 00:04:32 dynamic memory allocation, the thing always crashed. I just did not understand how to work with malloc. And it was about 2002 that I finally said, you know, I was avoiding C++ because I heard it was complicated, and I wanted C's simplicity, but maybe there is something to it. Let me try this thing.
Starting point is 00:04:51 And when I saw basically the power of encapsulation with classes, and then eventually when I started using the STL, although originally I did try to write my own STL. I think everybody sort of has to do that before they see the light. I just really never looked back. I was like, okay, the language is admittedly more complicated, but it lets me write simpler code. And it did cost me like a year and a half struggling with C to learn that. And I would not recommend today that somebody start with C. But for me, it was a formative experience. And pretty shortly after that, I got hired by Microsoft. I started working actually in Outlook. I worked on Outlook search for a couple years. When stuff lights up yellow, that's me.
Starting point is 00:05:32 And in 2007, I switched over to Visual C++. And since then, I've been maintaining our STL implementation. Wow. So I recently celebrated my 10th anniversary on the team. I like to say that I have outlasted everybody else on the libraries team and everyone in my management chain up to and including the CEO. Because Ballmer's out and Satya Nadella is now CEO. Congratulations on that.
Starting point is 00:05:56 Although I'm far from the most senior member of the Visual C++ team because we've got compiler devs that have been there 20, 25 years. It's just in the libraries team, 10 is a lot. Is there anyone that's been on the compiler team that long that we would know, I guess? Jonathan Caves? Have you talked to him?
Starting point is 00:06:18 Okay, he's been there. Well, we haven't interviewed him, but I do know who he is, yes. I think he joined Visual C++ in 1996 or something. There are compiler back-end devs who have been there for over 25 years. It's really amazing how senior some of those devs
Starting point is 00:06:34 are. Wow. In our team, it's like, oh, you've been here five years? Congratulations. You're still the new dev. Wow. Okay. Well, we have a couple news articles to go through. Feel free to comment on any of these, and then we'll start talking to you about all the work you're doing at Microsoft on the STL. Okay, cool. Okay, this first one: we talked a little bit about Jackie Kay's functor debate article that she posted last week in our news articles. And Jason and I kind of admitted that we weren't the best ones
Starting point is 00:07:08 to talk about it because we didn't fully understand functors. So CppChat is coming back with Jon Kalb, and he's going to have Jackie Kay and also Ben Deane and Jonathan Müller on the show this weekend, and they're going to hash it out. They're going to talk about functors. So that should be worth listening to. Yeah, and Ben is big on functional programming and these kinds of technical definitions, and so is Jackie. It should be interesting.
Starting point is 00:07:36 Is this like a one-sided argument, or is there someone defending the view that she was against? I have absolutely no idea. I'm curious about that. So I have a sort of contrary view because I did see this when it was posted on Reddit because I moderate the cpp subreddit
Starting point is 00:07:51 and I'm a co-moderator, junior moderator. And I saw this article and it was like, stop calling function objects functors. And I've done that in the past. So I was like, oh, you know, what is this article proposing?
Starting point is 00:08:03 And I have sort of an interesting history with functional programming, because when I went to college, one of the courses in introductory CS was based on Scheme, which is a dialect of Lisp with lots of parentheses, very functional. And I went through that as a freshman. I actually TA'd it for three years as a sophomore, junior, senior. So I've seen a lot of Scheme, and I haven't used it since then, but it sort of left an indelible impression on me. And sort of my philosophy is that functional programming, like actual functional languages, they are interesting from a theoretical point of view, and they do influence how I write code. But I ultimately rejected them. I view, especially Scheme, which is the one I'm familiar with, as overly complicated,
Starting point is 00:08:48 and it involves ideas that are very difficult for what I view as even skilled programmers to grasp, like environment diagrams and closures, things like that. It does simplify things by saying, oh, persistent mutable state is bad, but then it sneaks it in through the back door anyways with set!. And C++ sort of offers a better blend because in C++, if you have a function object or a lambda with modifiable state, it's just a data member. Everybody knows in C++ how data members work. They're not mysterious or anything. They're not grabbing parts of the outer environment. It's you have a data member. It's going to capture something by value, or it's going to be a reference that you then bind to something else. And you know how references work, and you know
Starting point is 00:09:36 how data members work. So congratulations, you understand lambdas. You just need to figure out the syntax. Scheme, and I watched year after year, freshmen try to understand it, was so much harder if you wanted to essentially create a lambda with modifiable state, as you would in C++. So while it has influenced how I write and think about code, if I see modifiable state, I'll think, oh, you know, do we really need that? I view that as a cost, just like complicated control flow is a cost. But I don't view functional programming as some sort of ideal endpoint that C++ should aspire to be. I think it offers lessons for C++, but I don't automatically say, oh, functional programming is good.
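To make the "it's just a data member" point concrete, here is a minimal sketch (not code from the episode; the names Counter, make_counter_lambda, and sum_by_reference are made up for illustration). It shows a hand-written function object, an equivalent lambda that captures by value, and a lambda that captures by reference.

```cpp
#include <vector>

// Hand-written function object: the modifiable state is an ordinary data member.
struct Counter {
    int count = 0;
    int operator()() { return ++count; }
};

// Roughly equivalent lambda (C++14 init-capture). The closure type gets a
// data member named count; `mutable` allows the call operator to modify it.
inline auto make_counter_lambda() {
    return [count = 0]() mutable { return ++count; };
}

// Capturing by reference: the closure holds a reference bound to `total`,
// so calling it modifies the local variable directly.
inline int sum_by_reference(const std::vector<int>& values) {
    int total = 0;
    auto add = [&total](int x) { total += x; };
    for (int v : values) {
        add(v);
    }
    return total;
}
```

Each call to a Counter object or to the lambda returned by make_counter_lambda() increments that object's own copy of the state, exactly as a data member would behave.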
Starting point is 00:10:19 So when I see a long post that talks about mathematical concepts like monads and category theory and, oh, functor means this thing. I'm like, that's nice, especially for mathematicians. I don't view it as having any influence on my day-to-day work as a programmer. And if people want to use functor as an abbreviation of function object, just as I would say comparator as an abbreviation for comparison function object, why not let them? C++ is just going to have terms like vector that are very different from the mathematical meaning of terms. If you'd like all these mathematical things to influence your view of metaprogramming like Boost.Hana, that's very well,
Starting point is 00:10:57 but I don't think that it should try to influence the way that we talk about the language, especially when the existing practice is you're never going to get people to stop calling function objects functors. It's not going to happen. We have more of a chance getting them to stop calling containers collections, which drives me more nuts for completely unrelated reasons. Well, you know, though, as we are kind of adopting
Starting point is 00:11:19 more and more functional programming techniques into C++, I do think it raises a valid question of whether or not we should at least be aware of what the proper terms are so that we don't have too much overlap in our vocabulary or mismatch in our vocabulary. Yeah, it's hard, though, because sometimes there's just not enough words
Starting point is 00:11:35 in the English language, and you just have to start grabbing some. Right. Well, yeah, I mean, after our last episode, I was looking at my open-source project and realized that I actually have Functor embedded in the vocabulary of my project. Like, crap, do I need to change this or not? Now, perhaps there is some hope, because as re-implementing <functional> taught me, which took me, I think, over a month,
Starting point is 00:12:00 in C++17, and going actually all the way back to TR1, but in C++17 it's really first-class, there's the notion of a callable object, which is basically things that we wish were callable like functions. That includes actual functions, function pointers, lambdas, which are a special case of function objects, things that have conversions to function pointers, which nobody remembers, but your implementer must remember. And pointers to member functions and pointers to member data, which are totally not callable with parenthesis syntax. But the standard library, going back to TR1, has looked at those things and said, you know, wouldn't it be nice if you could call these things? And all of these objects are treated uniformly by what I think of as the invoke protocol.
Starting point is 00:12:46 So in C++17, the non-member function invoke takes an arbitrary callable object, which is any one of those things, and arguments, and then feeds them in a well-defined manner according to the type of the callable object. Lots of things use the invoke protocol, and it may be used even more broadly in the future. Potentially, a new ranges STL could say, oh, you know, instead of calling things with parentheses, like current STL algorithms do, I'm just going to feed them all to invoke. So if you want to pass a pointer to member function, and call a member function on every element in a range, you can do that now, and you don't need like a special lambda or anything. Could happen, especially if compilers get like optimized special cases for invoke, because right now invoke is relatively heavy on the metaprogramming. So there is somewhat of a compiler throughput concern to shoving everything
Starting point is 00:13:36 through it. But in terms of convenience, it is kind of nice, which is why the STL relies on it directly or indirectly in many cases. So perhaps callable object is the term that we should be using, not function object. Like, if you're using function objects, in some sense your worldview is limited to the C++, you know, 03 world of, oh, I need parentheses. But if you use callable object and put everything through invoke,
Starting point is 00:13:58 now you respect pointers to members. I thought I just read that the callable type trait is being renamed to invocable. Did I miss? Yes, I think that is on the table because it actually does answer the question, can I use this thing in invoke? Then there was this, this is what the standardization committee spends ridiculous amounts of time on. Do you spell invocable with a C or with a K? Just unbelievable amounts of time.
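As a rough illustration of the invoke protocol described above, the following sketch passes each kind of callable object mentioned in the conversation to C++17's std::invoke. The types Widget and ConvertsToFunctionPointer and the demo_* helpers are hypothetical names chosen for this example only.

```cpp
#include <functional>

struct Widget {
    int id;
    int doubled() const { return id * 2; }
};

inline int add_one(int x) { return x + 1; }

// A class with a conversion function to a function pointer. A typedef
// (here a using-alias) keeps the syntax manageable.
struct ConvertsToFunctionPointer {
    using FnPtr = int (*)(int);
    operator FnPtr() const { return &add_one; }
};

// Plain function.
inline int demo_function() { return std::invoke(add_one, 41); }

// Lambda, which is a function object.
inline int demo_lambda() {
    return std::invoke([](int x) { return x * 3; }, 5);
}

// Object convertible to a function pointer.
inline int demo_conversion() {
    return std::invoke(ConvertsToFunctionPointer{}, 9);
}

// Pointer to member function: the first argument is the object.
inline int demo_member_function() {
    return std::invoke(&Widget::doubled, Widget{21});
}

// Pointer to member data: "calling" it reads the member.
inline int demo_member_data() {
    Widget w{7};
    return std::invoke(&Widget::id, w);
}
```

The last two cases are the ones that cannot be written with ordinary parenthesis syntax, which is why routing calls through std::invoke accepts a wider set of arguments than calling f(args...) directly.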
Starting point is 00:14:25 And mailing lists have been spent on this. Wow. I think they're going, don't quote me on this, I think they're going with C because they looked it up in a dictionary or something. But it could go the other way. Who knows? So at the risk of really going off into the weeds here,
Starting point is 00:14:40 I have to ask a question. You said things that are convertible to function pointers. And the only thing that I'm aware of is non-capturing lambdas. Are there other things? Oh yeah, you can write just a struct with operator function pointer, paren, paren. It's best if
Starting point is 00:14:55 you actually have a typedef for the function pointer so the syntax doesn't go out of hand. But yeah, you can just do it. Now, the lambda trick is that they can generate that thing automatically. But if you have a function out there, you can return a pointer to it from operator function pointer type. Never even crossed my mind to make a conversion function to function pointer. I know, it's totally evil, right? And yet it's something that the core language lets you do. And in fact, such a type will work with the STL
Starting point is 00:15:26 because anytime it expects a function object, it expects something callable with parentheses. It doesn't actually care and cannot sense how that happens. So you can feed that to like a C++03 STL and it will work if your compiler is sufficiently bug-free. Mostly I know about this because when I overhauled our <functional> header, I had to make sure that
Starting point is 00:15:47 such things worked with invoke and result_of and is_callable and so forth. Wow. Okay. I feel super bad for the compiler implementers who have to deal with these ridiculous conversions. Like, why should they have to
Starting point is 00:16:03 imagine, oh, you know, what function pointer are you convertible to? But they actually have to, and it works most of the time. That's why I'm a library dev instead of a compiler dev. I want an easy job. I don't want a hard job. I see the ridiculous stuff that I write, and I make them compile. I don't want to be on the other end of that. Okay, this next article,
Starting point is 00:16:27 Implementing State Machines with std::variant. So this basically implemented some code that was briefly seen in a slide from Ben Deane's CppCon talk Using Types Effectively, where he does some interesting things with std::optional and std::variant.
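For readers who have not seen the technique, here is a minimal sketch of a variant-based state machine. It is not the code from the article or from the talk; the states (Idle, Connecting, Connected), the Advance visitor, and the step function are invented for illustration. Each state is its own type, the current state is a std::variant of those types, and transitions are written as overloads applied with std::visit.

```cpp
#include <string>
#include <variant>

// Each state is a distinct type and carries only the data it needs.
struct Idle {};
struct Connecting { int attempts; };
struct Connected { std::string peer; };

using State = std::variant<Idle, Connecting, Connected>;

// Visitor describing what a single "advance" event does in each state.
// Every overload returns State, as std::visit requires one return type.
struct Advance {
    State operator()(const Idle&) const { return Connecting{1}; }
    State operator()(const Connecting& c) const {
        if (c.attempts >= 2) {
            return Connected{"example-peer"};  // hypothetical peer name
        }
        return Connecting{c.attempts + 1};
    }
    State operator()(const Connected& c) const { return c; }  // stay connected
};

// Dispatches on whichever alternative is currently held.
inline State step(const State& s) { return std::visit(Advance{}, s); }
```

Because the visitor must handle every alternative, adding a new state type to the variant without adding a matching overload is a compile-time error rather than a forgotten case in a switch over integer IDs.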
Starting point is 00:16:44 What did you guys think of this article? Go ahead. I read it. I don't know if this was actually linked on the cpp subreddit, so I think I read it for the first time when you sent me the link. And it was actually really cool. Now, admittedly, I have not played around with variant extensively
Starting point is 00:17:01 because one of our other STL maintainers, Casey Carter, implemented it for VS 2017. But I reviewed his code and played around a little bit with it in tests. And it was a really cool article. I view it as a sort of natural way to represent a state machine
Starting point is 00:17:18 that respects the type system. I liked how it used visit instead of just having, you know, like pointers and, you know, integer IDs saying, oh, we're in state X, Y, and Z and unions and, you know, the disgusting sort of casts you would need. I think it is actually a very high level application of variant. And I'm actually excited to try it out in my own code when I see a use for that. I'm thinking that variant's going to be used in a lot of interesting ways
Starting point is 00:17:47 that we haven't really seen yet. And this is just one more data point in that, I think. Yeah, absolutely. This next one might appeal to you, STL. It's the STL Learning Resource on the Fluent C++ blog.
Starting point is 00:18:04 And I think we talked about one of Jonathan Boccara's articles last week, Jason? I think so, yes. Yeah, so he's putting together just a collection of articles to introduce newer C++ developers to the STL. And he's kind of halfway in the process of writing all these articles. He has a bunch of titles down, but has only about seven articles written so far. But it looks pretty promising. I read through some of the articles, and he's got some nice, clear examples
Starting point is 00:18:33 with both code samples and sometimes visual aids. So I thought this was a pretty good idea. It looks like an ambitious project. Sure. I thought it was a very nice-looking website. I wish I had those sort of web design skills. And that actually makes the content much more appealing. I felt that the content was actually pretty good.
Starting point is 00:19:00 Although, if I were writing it, I probably would have started more from the fundamentals: you haven't used the STL before, so here's how you stick data in a vector. You need to remember you need to push_back. You can't just say v[5] and get a magical fifth element. Oh, now you've got all these elements, how would you like
Starting point is 00:19:18 to process them? Well, you could iterate with indices or pointers, but here's a better way with iterators. I felt that that sort of groundwork really should be done before introducing somebody to, you know, here's set_symmetric_difference, the longest named algorithm in the STL. Yeah, one thing I thought that was interesting is he has algorithms catalog and then understanding the STL. It really seems like understanding the STL should be the first group of articles that you're presenting. Yeah, like in my intro STL video series,
Starting point is 00:19:48 I started with, you know, here's how you store stuff and also explained the difference between the STL and other libraries that people use. Because the STL, if you don't know anything about it, and you look at it for the first time, you're very tempted to say, why do you have these, you know, containers, iterators, algorithms? Why are the algorithms not member functions of the containers? Why do you need to say .begin(), .end()? That's stupid. Why can't I just say v.sort()? Why is sort this non-member function?
Starting point is 00:20:14 And it involves sort of a shift in worldview to think, oh, this solves the n times m problem, where if you have n containers and m algorithms, if they were member functions, you would need to multiply them through. And every container would need an implementation of every member function. That'd be terrible. And if you simply say, oh, containers, they're just going to return iterators to their elements, which abstract away the notion of traversal and access, so that you can blast through a vector, which is contiguous stuff in RAM, just as easily as you blast through a list, which is linked pointers or linked nodes. Then algorithms, which only care about getting from element A to element B and not how it's done,
Starting point is 00:20:57 can then be implemented in terms of those iterators in a generic fashion. Then you only need to learn n containers plus m algorithms, which is a whole lot simpler to learn and to implement. Of course, you know, it breaks down a little because like map has member functions, because it knows special things about its data structure that linear, you know, sequence-based STL algorithms don't know. So it's not, you know, a perfect separation, but it's a pretty good separation. And I feel that that sort of needs to be communicated before you even learn any STL algorithms. And then once you know that, it's like, oh, in some sense, every STL algorithm is like every other. They take some iterators, they do stuff in some sort of linear fashion, and then they're done. And they follow
Starting point is 00:21:42 certain protocols like the less than protocol, better not be less equal, or the notion of using end iterators as sentinel values to say, I didn't find your thing, or the insert before protocol that many containers or lower_bound follow. Once you know all those things, they really do all seem similar to the point where I just need to open up the standard and look at the prototype for an algorithm. Then I know basically what it does. I hardly ever need to read the actual semantics of it unless it's something relatively obscure. Then I need a reminder. That reminds me that figuring out the STL was one of my major hurdles to learning C++.
Starting point is 00:22:30 I'd forgotten that. It's actually like, I think by volume of the standard, it's pretty big. Because the standard is currently 30 clauses, a clause in the standard. And if you've never read the standard, it's actually quite useful to at least look at the working paper and get some idea of what's in there. Because admittedly, the core language part, which is the first 16 clauses or chapters in some sense of the standard, it's quite impenetrable. If you're not a compiler dev or library dev reading every day, you can get some things out of it. But answering questions like, why is this overload preferred over this other one?
Starting point is 00:23:09 That is hard. I would not recommend that for most people. But on the library side, which is clauses 17 and above, which is I think half or more than half the number of pages, it's actually simpler because the library standard is it does have global wording that says, whenever we mention stuff in the library, it means this, like the notion of what does effects equivalent to mean. That's global. But the library is less interconnected. I don't need to know anything about bit set in order to go look at the definition of vector. So if I want to know how a vector works, I can just open up clause 23 containers, hit plus in my PDF viewer until I find the section. And then I can see the class definition. I can see definitions of how all the member functions are supposed to work.
Starting point is 00:23:50 And it's a lot nicer than reading the implementation, which I do not usually recommend, and more authoritative than reading websites. But if you want to stick with CPP reference, you can. I just personally directly go to the standard. Okay. But yeah, The library is... People always complain, oh, you can't say STL.
Starting point is 00:24:09 That's an old historical library. And I always respond as an STL maintainer, I have the sovereign right to call what I work on what I want. And it is the STL. Because it's a valid use of metonymy, which is a thing in English. The STL, which I think of as the portion of the C++ standard library
Starting point is 00:24:27 that is derived in principle from Stepanov's original work, which is, you know, containers, iterators, algorithms, and stuff like shared_ptr, maybe stuff like iostreams, not really, and definitely not the locale stuff. It's a significant portion of the standard library and the part that's actively growing. I consider variant, optional, and any to be honorary citizens of the STL now, because they are like pair and tuple, which were added in TR1 and C++11. So it is a big part of learning C++.
Starting point is 00:24:59 Like C++ without the STL, I don't know why anybody would program in that language. I often say C++ without the STL is a terrible language know why anybody would program in that language. I often say C++ without the STL is a terrible language. Why would you do such a thing? But with the STL, it's wonderful. That's the reason. Yeah. I still see comments from time to time about trying to avoid the standard library, which I don't understand. I mean, if you have specific reasons about exceptions or dynamic allocation for specific cases, fine. But, you know, still use the algorithms at the very least. Yeah.
Starting point is 00:25:30 It does seem like every large company still writes their own. Well, Microsoft generally does not. We do have one internal thing that tries to really avoid exceptions, but even that is discouraged in new code. It exists really for legacy code. I would say most or almost all of Microsoft uses the real STL that we are shipping in Visual Studio. Not always the latest version. Sometimes it's older versions, but even Windows is trying to move everybody to the latest copy of the STL that they have. And we really don't have internal libraries that try to imitate the STL. It's like, you have the one, and you're going to like it. That's cool. Now, I did read all those STL
Starting point is 00:26:20 learning resource pages, and I did have quibbles with a couple of the assertions in there. Like, I saw, and if I were delivering this feedback, I can just tell you guys, in the algorithms, back_inserter appeared a lot. And it is very nice to be able to write to with algorithms that take output iterators, but there was no mention of the ways it is suboptimal. And I always like to tell people, if you're going to put a bunch of elements into a vector, especially if you already have them and you don't need to process them with transform or something, copy to back_inserter is not what you want to say. Instead, you want to use range insertion. You should say
Starting point is 00:27:02 v.insert at v.end() and then give the range. And that's more efficient because when the vector sees a range of iterators being inserted, it can ask, hey, are you forward or better iterators? Oh, you are. I can figure out the distance between you. Oh, you have 100 elements. Well, I better reallocate exactly once to have enough capacity and then slam in all the elements. So you pay at most one reallocation. Whereas if you do a copy to back_inserter, that's essentially doing a repeated push_back or emplace_back of all the elements. And so the vector is going to constantly need to ask,
Starting point is 00:27:34 do I have enough capacity? It's going to do n checks for that, and the allocations will be more gradual, but you still could pay more than one reallocation. It's less efficient and for no good reason. If you already have that range of elements, insert at end is self-explanatory. So, you know, algorithms, they're good, but they're not everything. Sometimes you actually do want to use container member functions. And I also had quibbles with its emphasis on std::function. I've done entire talks at CppCon where I try to tell people, you know, std::function, it's great,
Starting point is 00:28:05 but don't use it gratuitously. Only use it when you actually need its type erasure properties. And merely storing lambdas in a std::function so you can mention the return type, that's not really a good use of std::function. Even though it does have the small functor optimization and tries to avoid dynamic memory allocation, it does pay non-zero costs. It's one of the few things in the STL that does pay a cost whether you need it or not. So I sort of disagreed with the emphasis on std::function. I felt that it was somewhat unwarranted. Well, I guess you are qualified to disagree on that.
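The range-insertion advice from a moment ago can be shown side by side with the back_inserter form. This is a small sketch with hypothetical function names; both functions produce the same contents, and the difference is only in how many capacity checks and reallocations the vector may perform.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Element-by-element: behaves like a repeated push_back, so the vector
// checks its capacity for every element and may reallocate several times.
inline void append_with_back_inserter(std::vector<int>& dest,
                                      const std::vector<int>& src) {
    std::copy(src.begin(), src.end(), std::back_inserter(dest));
}

// Range insertion: vector iterators are random-access, so the vector can
// compute the distance up front and grow at most once before copying.
inline void append_with_range_insert(std::vector<int>& dest,
                                     const std::vector<int>& src) {
    dest.insert(dest.end(), src.begin(), src.end());
}
```

back_inserter remains the right tool when the elements are being produced by an algorithm such as std::transform and do not already exist as a range.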
Starting point is 00:28:41 Okay, let's start talking about, you just published, I think, two days ago, the list of STL fixes that are going to make it into the VS 2017 RTM. Do you want to start by discussing the vector overhaul? Yeah, so vector is, I think of it as the STL's most important container. It's my favorite. It should be everyone's favorite. It is way better than the other containers like list and forward_list that nobody uses except for test code and even like map and set. They're just not as warm and fuzzy as vector is. So it's sort of the most important container. It's received a lot of work over the years with C++11, adding move semantics to everything,
Starting point is 00:29:21 but we'd never really gone through it with a fine-tooth comb. And it had accumulated a lot of debt over the years. Like back in 2010, when we added move semantics to the STL, we just unconditionally moved elements when reallocating. Turns out the standard doesn't actually let you do that. Vector needs to ask a very complicated question when it reallocates in order to preserve the strong exception guarantee that it had somewhat unwisely applied to push_back back in 98 and even more unwisely has extended to various other insertion scenarios in the standard. So users who wanted the strong exception guarantee would not get that in the presence of throwing move constructors. And other bugs had crept in, like we've had for many years, just nonstop problems with
Starting point is 00:30:10 aliasing. And this is when users, and when I speak about users, I mean everybody who uses the STL, as opposed to implementers like me. And as an implementer, I feel that users are constantly trying to, like, stab me in the back. You know, I just want to give them a vector. And then they say, oh, but I want to say v.push_back(v[0]). I'm going to push back a vector's own element.
Starting point is 00:30:31 Ha ha ha ha ha. That's really problematic because as part of push_back, we sometimes need to, you know, reallocate. And that could invalidate the element that you just gave us. And that aliasing is problematic. We try to defend against it with a check to see, oh, the address of the element you're pushing back, is it within my memory block? That kind of works, but it actually fails in certain scenarios. And then when we got emplace_back, it's even worse. Because when you emplace_back, you're saying, please construct an element of
Starting point is 00:30:59 type T, the vector's element type, from these arguments A, B, and C. We have no way of knowing whether A, B, and C are actually within the vector's memory block or controlled by elements of the vector's memory block, because they could be totally different types. So in Visual Studio's STL, from the time that we implemented emplace_back up until and including 2015, if you tried to say v.emplace_back(v[0]), we would emit essentially silent bad codegen. It would compile, and it would probably crash or misbehave at runtime. And that made me feel like super bad. And it was, you know, a bug on our radar.
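The aliasing case described here can be written down in a few lines. This is a minimal sketch, not code from any STL: the two function names are made up for illustration, and shrink_to_fit is only a non-binding request, so the reallocation is likely but not guaranteed. On a conforming implementation both calls must leave a correct copy of the first element at the back, even though the argument refers to storage the vector may release while growing.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: appends a copy of the vector's own first element
// while the vector is (most likely) full, so the append has to reallocate.
inline std::vector<std::string> push_back_own_element() {
    std::vector<std::string> v{"a string long enough to live on the heap"};
    v.shrink_to_fit();   // non-binding request for size() == capacity()
    v.push_back(v[0]);   // the argument aliases storage that may be freed
    return v;
}

// Same idea with emplace_back, where the implementation cannot inspect
// the argument types in general to detect aliasing.
inline std::vector<std::string> emplace_back_own_element() {
    std::vector<std::string> v{"a string long enough to live on the heap"};
    v.shrink_to_fit();
    v.emplace_back(v[0]);
    return v;
}
```

Checking that element 1 equals element 0 after each call is enough to expose the kind of misbehavior described for older releases.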
Starting point is 00:31:33 So I finally got a chance to really sit down with vector and say, okay, I'm going to look at every member function, and I'm going to rewrite it until it is perfect. Fixing all of these issues with exception handling guarantees, aliasing, iterator invalidation checks, performance, because we had users filing a dozen bugs about that stuff. So it took me a month, but I was able to rewrite all of this stuff while preserving binary compatibility. Vector's actual representation did not change,
Starting point is 00:32:04 only the implementations of its member functions. So that's how we're able to ship it in 2017 RTM and continue to claim that this is binary compatible all the way back to 2015 RTM, which is the first time we're doing this for quite some time in a major release of the STL. So as part of this I looked at things like our insertion scenarios. In some cases, we would insert by pushing or emplacing an element at the end of the vector and then rotating it into place with an STL algorithm. Rotate is cool. And I believe Sean Parent has talks where he shows like the wonderful powers of rotate.
Starting point is 00:32:40 And it's so amazing. But if you can avoid doing that, it's even better. And if there's one place where programmers should write handwritten code to avoid calling an STL algorithm, it is within the implementation of the STL itself. And within a vector, if you are inserting, in almost all scenarios, you can ensure that there is space for you to just airdrop the elements in place without having to go rotate them. And so I rewrote our insertion member functions to do this, and now we only perform a rotate in the one case where it is actually necessary, which is fortunately quite obscure when you're working with input-only iterators.
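The older "append, then rotate into place" strategy can be sketched as follows. This is an illustration of the technique only, not Visual C++'s actual code, and insert_by_rotate is a hypothetical name. It gives the same final sequence as calling v.insert(v.begin() + index, value), but std::rotate performs extra element moves or swaps, which is the overhead users were reporting.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: insert `value` at position `index` by appending it
// and then rotating the tail so the new element lands where it belongs.
template <class T>
void insert_by_rotate(std::vector<T>& v, std::size_t index, const T& value) {
    v.push_back(value);  // new element is now last
    // rotate(first, middle, last) makes *middle the new first element
    // of the range, shifting [first, middle) one slot to the right.
    std::rotate(v.begin() + static_cast<std::ptrdiff_t>(index),
                v.end() - 1,
                v.end());
}
```

A direct implementation instead opens a gap at the insertion point and constructs the new element there, so each existing element is moved at most once.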
Starting point is 00:33:18 So that instantly, or in one fell swoop, resolved all the bugs where users filed saying, oh, I did an insertion or an emplacement, and I saw you paying a whole bunch of copy assignments or move assignments and where are these all coming from? And the answer was, oh, we were calling rotate that was doing a whole bunch of extra work. Now, with a completely independent implementation written from scratch, we do the same number of element operations as libc++'s STL does. Except in one scenario where I actually pay a little more than libc++,
Starting point is 00:33:47 and that's because I'm right and they're wrong. There's a case where they were trying to take a shortcut by doing an are-you-aliased-within-my-memory-buffer check, and that actually works almost all the time, except in the case where a T element owns another T element. So in 2017, VS will do the strictly correct thing at the cost of an extra operation sometimes. And I mentioned this to libc++'s maintainers,
Starting point is 00:34:09 who I work fairly closely with these days, and they said, yeah, it's such a shame that what we're doing is so fast and yet not quite right. So that's a bug that they're aware of, and hopefully it'll get fixed someday. Or maybe not. Maybe they just don't care about that one scenario and they'd rather have the fast code.
Starting point is 00:34:24 That's up to them. I wanted to be strictly correct because I wanted to fix all the bugs in vector. I'm curious when you say you spent a month rewriting functional, you spent a month working on vector, and it's all right if you can't really talk about this, but do you just say, hey, this is screwed up. I need to spend a month working on it. Or how are your priorities set for working on the STL? Yeah, so I have some priorities that come down from management. But I also get to sort of decide what order to sequence things in and what we should work on. Should we be working on bugs,
Starting point is 00:34:59 new features, other work like harnessing test suites. I'm not a manager at Microsoft, but I'm basically a tech lead for the STL. So when we had all these vector bug reports coming in, I basically looked at our schedule and I said, you know, nobody is actually screaming at me for C++17 features, got to have them now, now, now. Nobody needs fixes for X, Y, and Z. And I've got all these vector bugs piled up. And we've got this release where I would like to be able to do it in a major release,
Starting point is 00:35:31 because it is a relatively large change. Even though the goal is for it to be binary compatible, I feel more comfortable releasing the vector overhaul in 2017 RTM rather than something like 2015 Update 3. So I basically said, you know, the timing looks right. And I don't have any other higher priorities. So I'm just going to overhaul vector now. And I just started and I figured out how, you know, I realized how long it would take. I sort of mentally budgeted about a month and it took about that long. And we do the same elsewhere, like our STL maintainer, Billy O'Neal,
Starting point is 00:36:03 has overhauled string for performance, not quite to the level that I spent on vector, but fixed a bunch of longstanding issues in there. We've got overhauls coming in the future of atomic. Filesystem is up next. All the multi-threading headers have been rewritten. And we just try to fit those in while making progress on the features, because customers do have a right to expect that we are working on C++17 and so forth. But in the libraries, I feel that we're in a pretty good state for conformance. I had published a blog post around 2015 Update 2 saying that, hey, in the libraries, not the compiler, but in the STL specifically, we were actually feature complete with C++17 as of that date. We had implemented all the stuff they had voted into the working paper. That is no longer
Starting point is 00:36:46 true because the dastardly committee has continued voting in stuff to the working paper, so we're behind at the moment. But for a brief shining moment back in Update 2, we were current. So while we still have work to do, we're pretty conformant right now, and I feel that that
Starting point is 00:37:02 gives us some time to go back and do things like increase the quality of vector, increase the performance of string, because those things are important too, even if they don't directly result in nice little green yes boxes on our feature tables. Yeah, and those are very, very highly used parts of the STL. Yeah, exactly. I would rather spend a month working on vector than a month implementing some C++17 feature that almost nobody's going to use. We will get to all of them, but I felt that vector was important enough to spend time on.
Starting point is 00:37:32 It's just a judgment call. We do listen to customers when they're saying, oh, I really want so-and-so feature. We're like, okay, that does influence our thinking. But we also have to make our own calls. Do you want us to talk about the warning level, which was also labeled as an overhaul
Starting point is 00:37:52 in this changelog document? Yeah, that was a long-standing, again, another one of these long-standing things, where we have a different philosophy than other STLs. For example, in Clang and GCC, they have the notion of a system header. If you're a header found in a system directory, they just suppress all warnings under the notion that, you know, if it comes from the system, those developers know what they're doing.
Starting point is 00:38:15 And while I agree, like, in theory, that implementers should know what they're doing, even if your implementers are perfect, and of course they are, it's not always the case that you want to suppress warnings in system headers. And the example I've given is when you instantiate a templated algorithm from the STL, you're asking that template to perform actions on your behalf, like please compare this element to this value, or please assign this thing to an output iterator. And those actions,
Starting point is 00:38:43 depending on the types that you give us, could be harmful, potentially harmful, and compiler warnings could identify bugs. For example, oh, you're assigning to an output iterator, you're truncating from signed 64-bit to unsigned 8-bit. That's probably bad, and compilers have signed conversion and truncation warnings for that. If you've simply suppressed warnings in system headers, and if that applies to an instantiation of a template,
Starting point is 00:39:15 then you'll never hear about that. So you basically, you lost the ability to get a warning just because you asked the STL to perform an action for you, even though having the STL do it didn't make it any more inherently safe. So Visual Studio does not have the concept of a system header that suppresses all warnings. Instead, we are exposed to user warnings in the STL's headers, but we still want to avoid warning in our own code. Sometimes we do things that, while correct, are somewhat suspicious, and we need to silence warnings, and we can't always do that with extra parentheses or a static_cast. So we used to do that in a very blunt manner by just pushing our warning level to three and then shutting up even more warnings with a push-disable-pop through pragmas. But that was overly aggressive in the sense that
Starting point is 00:39:55 there are many valuable warnings at level four, like non-standard extension warnings, some truncation warnings, or sign conversion warnings. So as some time freed up on our schedule and as I was harnessing libc++'s open source test suite, I looked at our warning situation and especially the fact that like Billy and Casey, as they had been adding new headers like variant and string_view to the STL,
Starting point is 00:40:18 they were following a cleaner approach by pushing the warning level to four and being very selective about what they were disabling. And I looked at our existing headers like, hmm, I wonder how hard it would be to make everything follow this highly clean procedure of push to level four and then only disable stuff in a targeted manner. So I cleaned up the entire STL to do this, cleaned up all of our test suites, which also had warnings being emitted, went back and fixed up something, because if the user compiles at warning level 3, they don't
Starting point is 00:40:47 necessarily want the STL to be emitting level 4 warnings. Even if I think it's good for them, they complain that it breaks the code at /WX, and fine, whatever. So we got this thing from the compiler. This is the sort of eternal war between users and implementers, that they just want
Starting point is 00:41:04 their code to keep compiling. And I want to tell them that their code is bad and they don't want to hear it. So we sort of reached a compromise where if users compile at warning level three, then the STL will just push to level three as we did before. And they might see a few new warnings because we did remove some overly broad suppressions. But they shouldn't see like blizzards of warnings. But if the user compiles at warning level four, we can sense that through a new macro. And the STL will now respect that by pushing its warning level to four and allowing more warnings to fire in the STL headers. And I believe that when 2017 is released, and if users say, oh, you broke my code because I'm
Starting point is 00:41:41 compiling at warning level four, and now we're emitting all these warnings in the STL's headers because you weren't shutting them up. I'll be like, well, you deserve it because you asked for warning level four, and that's very good. And now we're emitting these warnings that you asked for. So you can't complain that we're now firing them. Go fix your code. And we feel that this is a good sort of compromise between the desire to avoid breaking changes and the desire to identify erroneous code constructs. Because we have all these great warnings, most of them are great, in the compiler, and the STL was just clobbering some of them. And now we're trying to avoid doing that. So did you find any interesting bugs when pushing your own warning level up to four?
Starting point is 00:42:13 We found a couple minor things. We were generally pretty clean. One of the minor things was I did find an intentional truncation in the STL in the random number library. It turns out there's... And this is interesting because it was found by libc++'s test suite. We're using a different STL's test suite against our STL. They had constructed an independent_bits_engine. I think, yeah, it was that one.
Starting point is 00:42:40 Wrapping an mt19937, which is a 32-bit engine, and yet the independent_bits_engine they had was asked to generate 64-bit integers. So totally fine thing to do, but if you look at the standardese, what it actually implies is that when you seed this wrapper with a 64-bit integer, it then gives that seed to the inner engine, but that thing only takes a 32-bit seed. So you actually get a standard mandated data loss truncation. The upper 32 bits of your 64-bit seed will be dropped on the floor by the seeding. And we were just passing it along. So we emitted a truncation warning, and that was revealed by pushing the warning level to four. So to fix that, because it is mandated by the standard, I inserted a static_cast. So now we
Starting point is 00:43:25 static_cast down from whatever the outer seed is to the inner seed type, and the compiler will not warn when it sees that. So that was not quite a bug, but more an unexpected find. We found other bugs through warnings, but generally not. So recently on Reddit, I saw an article about how the Qt headers were leaking their warning suppressions into their client code. Do you have any way of testing and guaranteeing your code, your headers, that it doesn't leak the suppressions out? I believe that the Visual C++ compiler now has a warning for when it sees imbalanced push pops. I could be wrong about that. We basically avoid that by being extremely disciplined. We have a certain pattern of push, disable, pop,
Starting point is 00:44:14 where we defend against the pragma pack level, the pragma warning level, macroized new because people are evil, including MFC, and we repeat that in every header that we have. And whenever we make a new standard library header, the first thing I do is check, okay, have we copied the right dance of push, disable, pop in the header?
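The "dance" of push, disable, pop described here might look like the sketch below. Everything in it is an illustrative assumption: the include guard, the my_lib namespace, the saturate_to_byte function, the chosen warning level, and the choice of C4127 ("conditional expression is constant") as the disabled warning are not taken from the real STL headers. The pragmas are guarded with _MSC_VER so the file also compiles with other compilers.

```cpp
#ifndef MY_LIB_EXAMPLE_HPP   // hypothetical include guard
#define MY_LIB_EXAMPLE_HPP

#include <cassert>

#ifdef _MSC_VER
#pragma pack(push, 8)            // pin struct packing, whatever the user set
#pragma warning(push, 3)         // save the user's warning state, use level 3 here
#pragma warning(disable : 4127)  // targeted suppression: conditional expression is constant
#pragma push_macro("new")        // save a user macro named `new`, if any...
#undef new                       // ...and hide it while this header is parsed
#endif

namespace my_lib {
// Explicit static_cast documents an intentional narrowing, so no
// truncation warning fires even at a high warning level.
inline unsigned char saturate_to_byte(long long value) {
    if (value < 0) return 0;
    if (value > 255) return 255;
    return static_cast<unsigned char>(value);
}
}  // namespace my_lib

#ifdef _MSC_VER
#pragma pop_macro("new")   // restore in reverse order so nothing leaks out
#pragma warning(pop)
#pragma pack(pop)
#endif

#endif  // MY_LIB_EXAMPLE_HPP
```

Keeping every push paired with a pop in the same preprocessor branch is what prevents suppressions from leaking into code that includes the header.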
Starting point is 00:44:36 And we try really hard to avoid mixing things like pragma pushes and disables with other preprocessor directives. Because if you have multiple modes and you've interleaved push pops there, it's quite easy to get confused. We generally have few modes like that in the STL and fewer that are interleaved with suppressions. So we haven't really needed lots of validation to avoid leaking pushes and pops. And that's why I don't know whether the compiler for sure has the imbalanced push-pop. I think it does, but maybe I'm thinking of Clang. All the front ends sort of get
Starting point is 00:45:09 mixed together in my head sometimes. That sounds like it'd be handy. Whereas we do have a regression test for other errors in the STL that are easy to commit. For example, one of the first things I added when we were picking up TR1 back in 2008 is we
Starting point is 00:45:25 forgot inline on a header-only function. That's totally cool if you only include it in one .cpp file, one translation unit. But the moment you link together two of them, forgetting inline on a non-templated function outside of a class definition is
Starting point is 00:45:42 a guaranteed linker error. We have a test in the STL that drags in every single STL header into two translation units and links them together to make sure we don't forget inline, because that's so easy. Whereas messing up the pushes and pops, that's something that is in some sense harder to do, as long as you're being disciplined about copy pasting. We actually copy paste code a lot in the STL. We're just very careful about it.
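The rule being tested can be shown with the contents of a small header. This is a sketch with made-up names (shared_math, twice, thrice, Doubler); a single file cannot reproduce the linker error, which only appears once two translation units include the header and are linked together.

```cpp
#include <cassert>

// Imagine this namespace lives in a header included by several .cpp files.
namespace shared_math {

// Non-template function defined at namespace scope in a header: it needs
// `inline`. Without it, each translation unit that includes the header emits
// its own definition, and linking two of them fails with a duplicate symbol.
inline int twice(int x) { return 2 * x; }

// Function templates may be defined in multiple translation units
// without the `inline` keyword.
template <class T>
T thrice(T x) { return 3 * x; }

// Member functions defined inside the class body are implicitly inline.
struct Doubler {
    int operator()(int x) const { return twice(x); }
};

}  // namespace shared_math
```

A regression test in the style described would include every header from two separate source files and link them into one program.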
Starting point is 00:46:04 I actually, I've had to implement a similar regression test myself, but on a much smaller scale. Yeah. That one, that one takes a while to include every STL header. I envy projects that are smaller that only need to include like, you know,
Starting point is 00:46:16 a megabyte of source code. Right. Yes, definitely. I wanted to interrupt this discussion for just a moment to bring you a word from our sponsors. Backtrace is a debugging platform that improves software quality, reliability, and support by bringing deep introspection and automation throughout the software error lifecycle. Spend less time debugging and reduce your mean time to resolution by using the first and only platform to combine symbolic debugging, error aggregation, and state analysis.
Starting point is 00:46:43 At the time of error, Backtrace jumps into action, capturing detailed dumps of application and environmental state. Backtrace then performs automated analysis on process memory and executable code to classify errors and highlight important signals such as heap corruption, malware, and much more. This data is aggregated and archived in a centralized object store, providing your team a single system to investigate errors across your environments. Join industry leaders like Fastly, Message Systems, and AppNexus that use Backtrace to modernize their debugging
Starting point is 00:47:12 infrastructure. It's free to try, minutes to set up, fully featured with no commitment necessary. Check them out at backtrace.io slash cppcast. Another thing you mentioned in your article was we're further improving STL's defenses against overloaded
Starting point is 00:47:28 operator,() and operator&(). What does that entail exactly? This is definitely where users hate implementers and want to cause them unbounded amounts of misery. The way the STL is specified is we basically
Starting point is 00:47:43 have to handle users doing arbitrary stuff unless they do certain bad things. Some things are out of bounds. You cannot give a destructor that actually emits an exception to the STL. We do not handle that. We're not required to, and we will generally just slam into noexcept and terminate the process. We don't have to handle operator less-thans that actually return less-than-or-equal. That's a precondition violation, totally bad. We don't have to handle you if you define macros that follow the form underscore capital letter. Those names are reserved. We do have to handle code that macroizes anything else. So if you want to macroize an identifier like meow, that's actually totally fine. And that means that we can't use local variables that have pretty names. So that's why if you open up the STL headers,
Starting point is 00:48:26 they're written in this really strange fashion with all these underscore capitals for Visual C++, or double underscores for like libc++ and libstdc++, as I understand it. Because we've got to defend ourselves against user macros. And this applies to all sorts of other things. And this is one of the ways that reading standardese is sometimes hard.
Starting point is 00:48:42 There's nothing in the STL that forbids users from overloading operator comma and operator ampersand very aggressively, at least as of C++11. There used to be some prohibitions back in 03, which are unfortunately gone now. So if you have a type that overloads the address-of operator or the comma operator, or even worse, if you have non-member functions that can be found through argument-dependent lookup, then if the STL ever does anything, like say, &elem to get the address of an element, or if it says ++iter1, ++iter2 in a for loop, both of those things can be hijacked by overloaded operators. And the STL is actually not supposed to be vulnerable to such hijacking. Now, this is true of every library, but somebody like Boost has the luxury of saying, oh, you overloaded operator ampersand. I don't need to handle
Starting point is 00:49:32 that nonsense. Bug won't fix. The STL, because we're controlled by an international standard, I don't actually get to point to anything and say, you're being bad. The standard allows users to be evil in that fashion. So to be technically correct, we need to defend ourselves against this. And when I say evil, there are definitely types like CComPtr that overload operator ampersand. But I still think of them as relatively evil. And indeed, in our regression tests, the type that overloads operator ampersand and operator comma is called capital Evil because it tries to be maximally hostile to the STL. So we have to defend against this through a couple mechanisms. One is the thing that was
Starting point is 00:50:08 added in C++11 and further enhanced called std::addressof. This is a function that is provided by the header <memory> that takes an object by reference, it can also take a function, and returns a pointer to the thing and will always get a pointer to the thing, even if it overloads operator ampersand. Previously, this had to be done through a certain sequence of casts. reinterpret_cast to like char ref and handle cv-qualifiers and blah blah blah. Nowadays,
Starting point is 00:50:36 we actually need to use a compiler hook, because addressof needs to be constexpr. And reinterpret_cast is not constexpr okay. So we need a compiler hook to say, pretty please compiler, give me the true address of this thing, even if there's an overloaded operator. And by the way, make it constexpr. So C1XX, which is MSVC's front end, and EDG both implement the compiler hook that Clang also implements. And that's how our STL can use it with
Starting point is 00:51:03 all of these different frontends. So whenever we work with a type that we do not control, like a user element type, or a type that has been infected by a type that we do not control, like a vector of user element, argument-dependent lookup
Starting point is 00:51:19 could find an overloaded operator ampersand, and that means that we need to use std::addressof. So if you look at our headers, you're going to see a lot of std::addressof, where almost everybody else would just write an ampersand. But hey, it makes people using, you know, these very bizarre types happy. For operator comma, we need to use void casts. So if we want to write a for loop where we increment two iterators, we need to say ++iter1 comma, and then cast to void ++iter2. Because if you look at the way the core language works, if one of the operands to operator comma is void,
Starting point is 00:51:49 it's not going to consider any overloads because you can't have an overload taking like a void argument. That's like totally bogus. So the presence of the void cast, and it can actually be on either the left or the right, but I put it on the right of the comma, basically in the middle, because there, if you use like a C cast, which is okay for void, it's sort of syntactically unambiguous.
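Both defenses can be seen in a small sketch. The evil::Counter type and the advance_both and advance_three helpers are made up for illustration, loosely modeled on the idea of a maximally hostile test type rather than copied from any real test suite. The deleted operator, template is found through argument-dependent lookup, so writing ++a, ++b with this type would not compile, while the version with a void cast uses the built-in comma.

```cpp
#include <cassert>
#include <memory>

namespace evil {
// Hypothetical hostile type: hijacks unary & and, via ADL, operator comma.
struct Counter {
    int n = 0;
    Counter& operator++() { ++n; return *this; }
    int operator&() const { return -1; }  // not an address at all
};

// Any comma expression with a Counter on the left and a non-void right
// operand selects this deleted overload and fails to compile.
template <class T>
void operator,(const Counter&, const T&) = delete;
}  // namespace evil

// The cast to void on the right operand means no overload can match,
// so the built-in comma operator is used.
inline void advance_both(evil::Counter& a, evil::Counter& b, int steps) {
    for (int i = 0; i < steps; ++i) {
        ++a, (void) ++b;
    }
}

// One cast in the middle protects both commas: the first comma yields
// void, which then becomes the left operand of the second comma.
inline void advance_three(evil::Counter& a, evil::Counter& b, evil::Counter& c) {
    ++a, (void) ++b, ++c;
}
```

With an evil::Counter named a, the expression &a calls the overload and yields -1, whereas std::addressof(a) still returns a real evil::Counter*.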
Starting point is 00:52:16 Whereas if you use a C cast for void on the left-hand side of a comma, it looks really weird, even though the cast does bind tighter than the comma. The only time we put the cast on the left-hand side there is when we need the result from the right, and that does occur in a couple places. The other nice thing about putting it in the middle, in some sense, is that it also works when we increment three iterators. If we say, ++iter1, comma, cast to void, ++iter2, comma, ++iter3, the single cast there protects
Starting point is 00:52:38 both commas, which a cast on the left-hand side would not do. And we do have a few of those in the STL. Fortunately, we never need to mess with four iterators in a single expression, so that just doesn't come up. But this has been sort of an iterative thing. This has been going on since, I think, like 2013, certainly 2015.
Starting point is 00:52:58 We have been adding more and more defenses to the STL against this. I now believe that we're completely clean against such operators being picked up through argument-dependent lookup, but we are not clean against such operators being defined at global scope. Fortunately, nobody else is. So here's a secret. You can kill
Starting point is 00:53:15 any STL in the world by defining a non-member operator ampersand at global scope that picks up everything and then just delete the overload. If you do that before you include STL headers, you will kill every single one of them. I've checked on Wandbox, like libc++, libstdc++, Visual C++. We just say ampersand on too many things that don't depend on user types. So if it weren't for such global overloads, we would be
Starting point is 00:53:40 okay. But the presence of that global overload will just prevent us from compiling. And technically, nothing in the standard makes that invalid. But I have no plans to handle that yet. Nobody's quite that evil. But if you pick it up through ADL, yeah, we'll try to defend against it. I think we're good there. There is a known bug where if you define overloads for other operators, we still get hijacked. Notably, if you overload things like operator bitwise-or, operator xor, and binary operator ampersand for like enums, there are parts of our regex implementation and iostreams implementation that will get hijacked when they mix like one of our enums and an integer. It's possible for a user overload to be more preferred, and we get hijacked. So we've got an active
Starting point is 00:54:27 bug, we've got to go audit all the uses of these bitwise operators and cast both operands or feed them to a function or something. It hasn't been a huge priority for us, but it does come up from time to time. The work is ongoing. How tempted are you to send an email back to the user and be like,
Starting point is 00:54:43 stop doing that? I am often so tempted, but I don't get to cite a section of the standard. And I feel that if it's permitted by the standard, I do have to uphold it. Whereas if a user violates the standard and they say, pretty please make my code work, there I get to say, no, by section 1234, meow.operator, your code is invalid. I will not fix this thing. I love saying that. But on the flip side, if it is permitted by the standard, then I probably need to hold up my end of the bargain.
Starting point is 00:55:12 So maybe switching topics just a little bit here. I've seen a lot of trend towards putting more and more stuff on the stack and avoiding dynamic allocations. And we see this small function optimization, small string optimizations. These things are pushed heavily.
Starting point is 00:55:27 But do you think there's like a limit or a balance here? Like how much stuff should we put on the stack? When should we move to the heap? Yeah, you definitely want to avoid putting way too much stuff on the stack, especially if you have like deeply recursive functions. Stack space is a finite resource. And if you run out, it's very bad.
Starting point is 00:55:43 This is one of the things I encountered when I was first learning C++. I wrote a recursive tree traversal algorithm that would just die if you iterated over or recursed through a million nodes. And I had a tree that was a million tall. And it was actually very mysterious to me. Why is my program just terminating when faced with this input that I think is totally correct? And I needed to realize and learn, oh, stack overflow, that's what that is.
Starting point is 00:56:07 This was, I think, actually before stack overflow was the question and answer site. Right. It was a long time ago. So you definitely don't want to be putting, you know, like multi-kilobyte buffers on the stack. But on the other hand, dynamic memory allocation, it is a cost, even if your allocator is very efficient. You would prefer to avoid it. And you get increased locality by putting stuff on the stack or within an object, and that's why things like the small string and small
Starting point is 00:56:31 functor optimization exist. So we have actually had to think about, you know, where do we tune it? Right now, we are looking at retuning the small string optimization in a future version, but back in 2015, when I overhauled our functional implementation, and that was a binary breaking release, so I could change the representation of objects, I had to look at it and say, okay, how big of a function object do I want to be able to store locally within a std::function, assuming all the other criteria are met, without dynamically allocating memory. Previously, we had hoped to store function objects up to like three void pointers in
Starting point is 00:57:07 size, but various bloat and other inefficiencies in our representation meant that we could only store things like up to two void pointers in some cases. That was like really small. And users would very commonly have functors that stored like a couple long longs or a long long and a couple pointers or whatever, or a double, doubles are eight bytes. And they would exceed the small functor optimization and stuff would be quite slow. We even had one bug report where a user said, my code is 10 times slower on 64-bit than it is for 32-bit. What's going on here?
Starting point is 00:57:41 And the answer was that on 64-bit, the function object that they were giving us, which was produced by bind, but that was kind of irrelevant, grew in size by just enough due to the pointer inside that it tripped our small functor optimization limit and we started dynamically allocating. Whereas on x86, it would not.
Starting point is 00:58:01 I mean, it was like a 10 times performance difference because they were constantly hammering this thing and it was doing a bunch of allocations and deallocations. So I knew I wanted to increase the size of the small functor optimization. So I talked to our CRT maintainer at the time and he also worked on the STL a little,
Starting point is 00:58:15 James McNellis, who now works for the Windows debugger team. And he actually suggested a heuristic that I ended up using in std::function and that we have followed elsewhere in the STL. I think it's std::optional that follows the same philosophy, or is it std::any. All of these are
Starting point is 00:58:34 blurring together. I didn't write them, it was Casey, so I don't know about those. But with std::function, the heuristic that James suggested is we should be able to store function objects up to and including a std::string in size. Because it seems natural to be able to want to bind a string or store a string within a lambda and then go iterate with it over some range.
Starting point is 00:58:56 And string is actually one of our larger classes because of the small string optimization and extra debugging information we sometimes put into it. It ends up being, and don't quote me on this, but I think it's like 24, no, it's like 28 bytes on x86, and it's like 32 or 48 on x64. It's relatively big, and the numbers vary depending on whether it's release or debug. So I tuned std::function so that it can store an object of that size, plus its extra bookkeeping information, like a pointer to the function object that it stores, and a virtual function pointer, and so on and so forth. And that ended up making std::function even larger than string,
Starting point is 00:59:36 of course. I think it's up to like 64 bytes on x64, and it's a little smaller, like 48 on x86. And I felt that that was okay because users are unlikely to have huge amounts of std::functions. Like, yeah, a vector of std::function is very powerful, but are you going to have a million such things? Relatively unlikely. Whereas a vector of string, that's very, very common. So we want string to be a little smaller. And so I felt that tuning it to store one std::string was sort of a good compromise. And if you come to us with anything bigger, well, we tried our best, we've got to dynamically allocate.
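The byte counts quoted above are recalled from memory in the conversation, so a quick way to check them on a given toolchain is to print the sizes directly. The following is a small sketch; `Sizes`, `measure_sizes`, and `print_sizes` are hypothetical helper names, and the output will vary by library, architecture, and build mode.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>

struct Sizes {
    std::size_t string_bytes;    // sizeof(std::string)
    std::size_t function_bytes;  // sizeof(std::function<...>)
    std::size_t closure_bytes;   // size of a lambda that captures one string
};

Sizes measure_sizes() {
    std::string s;
    auto closure = [s] { return s.size(); };
    return Sizes{sizeof(std::string),
                 sizeof(std::function<std::size_t()>),
                 sizeof(closure)};
}

void print_sizes() {
    const Sizes sz = measure_sizes();
    // The closure embeds the captured string, so it cannot be smaller.
    assert(sz.closure_bytes >= sz.string_bytes);
    std::cout << "sizeof(std::string)   = " << sz.string_bytes << '\n'
              << "sizeof(std::function) = " << sz.function_bytes << '\n'
              << "sizeof(closure)       = " << sz.closure_bytes << '\n';
}
```

Comparing `function_bytes` against `closure_bytes` gives a rough hint of whether a given library could hold a string-capturing closure locally, although the real decision also depends on the wrapper's internal bookkeeping.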
Starting point is 01:00:12 But it's a heuristic. Other libraries, they do differ. At the time that I implemented this, I checked, just by compiling programs on, like, Wandbox, what the behavior of the other STLs was, and not all of them happened to satisfy the same heuristic. I forget whether it was libc++ or libstdc++, but in some cases they could not hold a std::string locally due to the relative sizes of their strings and functions. But if they're ever interested in making binary-compatible changes,
Starting point is 01:00:37 I would recommend following our heuristic, because I think it's a good one. Thanks, James. It does make sense. Now, trying to think of other places where we really try to avoid dynamically allocating memory, aside from the small object optimizations, we generally don't do crazy contortions to avoid it,
Starting point is 01:00:55 but we certainly will not allocate stuff just for fun. We definitely do not use std::function unnecessarily within the STL. We try to watch our allocations and deallocations. We will spend them when necessary. Sometimes we need to make a shared_ptr to something. But generally in the STL, we don't. I guess the one major opportunity for that would be like in regex. Our regex implementation right now is just a blizzard of dynamically allocated nodes where probably we could, when we construct the regex, just pack them all into a vector.
Starting point is 01:01:27 Because we're not going to be adding nodes later on in the regex's lifetime. In fact, I feel pretty strongly that regex should have such a representation. But that would be a major overhaul to a 150 kilobyte header. So we currently don't do that. But I could see us doing something like that in the future.
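The idea of packing regex nodes into a vector at construction time can be sketched like this. It is a toy illustration under strong assumptions: `FlatPattern`, `Node`, and `Kind` are hypothetical names, the pattern language supports only literal characters and `.`, and nothing here reflects the layout of any real `<regex>` implementation. Nodes refer to each other by index rather than by pointer, and the vector is sized once and never grows afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node layout, invented for this sketch.
enum class Kind { Literal, AnyChar, Accept };

struct Node {
    Kind kind;
    char ch;           // used only by Literal nodes
    std::size_t next;  // index of the following node, instead of a pointer
};

class FlatPattern {
public:
    // Supports only literal characters and '.', to keep the sketch short.
    explicit FlatPattern(const std::string& pattern) {
        nodes_.reserve(pattern.size() + 1);  // one allocation for all nodes
        for (char c : pattern) {
            const std::size_t next = nodes_.size() + 1;
            nodes_.push_back(c == '.' ? Node{Kind::AnyChar, '\0', next}
                                      : Node{Kind::Literal, c, next});
        }
        nodes_.push_back(Node{Kind::Accept, '\0', 0});
    }

    // True when the whole input matches the pattern.
    bool matches(const std::string& input) const {
        std::size_t node = 0;
        for (char c : input) {
            const Node& n = nodes_[node];
            if (n.kind == Kind::Accept) return false;  // input is too long
            if (n.kind == Kind::Literal && n.ch != c) return false;
            node = n.next;
        }
        return nodes_[node].kind == Kind::Accept;
    }

private:
    std::vector<Node> nodes_;  // never grows after construction
};

inline void example_usage() {
    const FlatPattern p("a.c");
    assert(p.matches("abc"));
    assert(!p.matches("abcd"));
}
```

A real regex needs alternation, repetition, and captures, so its nodes would carry more fields, but the same indexing scheme would still apply because the node set is fixed once the pattern has been parsed.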
Starting point is 01:01:43 Okay. I feel like we're running out of time here. Where can people find you online, STL? It'd be great to have you on another episode, because I feel like we could really go on forever with you. Oh, that'd be cool. I'm active on the cpp subreddit. And not only do I post things like our STL changelogs, but I comment pretty frequently, usually on Visual C++ related matters, because that's what I happen to know the most about, but general standards questions, I answer a bunch of those. And on Twitter, I finally joined, I think it was shortly after CppCon 2015, it was like the September of that year, that I joined Twitter. So you can follow me there. I generally tweet about standards related topics
Starting point is 01:02:29 or Visual C++ related topics. And I also, in every changelog that I publish on our Visual C++ team blog, VCBlog, make it a point to include my email address. I like our user programmers to know that, hey, actual real-life humans
Starting point is 01:02:44 are maintaining the STL that they're constantly trying to subvert by overloading operator comma and operator ampersand and doing these other nasty things. So if people want to get in touch with me directly, they can mail me at work. I do recommend only mail
Starting point is 01:03:00 me if it's important. I get people sometimes who are like, oh, can you file a bug against the compiler or against Windows or against Xbox? I don't actually have access to most of those databases. But if somebody has an STL bug against Visual C++, they may as well report it to me directly
Starting point is 01:03:15 because I'm going to have to look at it anyways even if I do end up sending it to one of my coworkers. But we also have other channels for feedback. That's like the direct channel to me though. Right. Awesome. Well,
Starting point is 01:03:28 it's great having you on the show today. Thanks for coming on. Cool. Thanks for having me. Yeah. Thanks for joining us. Thanks so much for listening in as we chat about C++. I'd love to hear what you think of the podcast.
Starting point is 01:03:40 Please let me know if we're discussing the stuff you're interested in, or if you have a suggestion for a topic, I'd love to hear about that too. You can email all your thoughts to feedback at cppcast.com. I'd also appreciate if you like CppCast on Facebook and follow CppCast on Twitter. You can also follow me at @robwirving and Jason at @lefticus on Twitter. And of course, you can find all that info and the show notes on the podcast website at cppcast.com. Theme music for this episode is provided by podcastthemes.com.
