CppCast - San Diego EWGI Trip Report
Episode Date: December 13, 2018

Rob and Jason are joined by JF Bastien from Apple to discuss the San Diego C++ Committee meeting from his perspective as the chair of the new Evolution Working Group Incubator. JF Bastien is the C++ lead for Apple's clang front-end, where he focuses on new language features, security, and optimizations. He's an active participant in the C++ standards committee, where he chairs the Language Evolution Working Group Incubator ("Oogie" for short). He previously worked on WebKit's JavaScriptCore Just-in-Time compiler, on Chrome's Portable Native Client, on a CPU's dynamic binary translator, and on flight simulators.

News: Exploring C++20 Designated initialisers; std::embed for the poor (C++17); RangeOf: A better span

JF Bastien: @jfbastien

Links: 2018 San Diego ISO C++ Committee Trip Report; C++ Current Status

Sponsors: Download PVS-Studio; Technologies used in the PVS-Studio code analyzer for finding bugs and potential vulnerabilities; JetBrains

Hosts: @robwirving @lefticus
Transcript
Thank you. ... and C#. The PVS-Studio team will also release a version that supports analysis of programs written in Java.
And by JetBrains, maker of intelligent development tools to simplify your challenging tasks and automate the routine ones. JetBrains is offering a 25% discount for an individual license on the C++ tool of your choice: CLion, ReSharper C++, or AppCode. Use the coupon code JetBrainsForCppCast during checkout at JetBrains.com. In this episode, we discuss designated initializers and std::embed
alternatives.
Then we talk to JF Bastien from Apple.
JF talks to us about the San Diego C++ committee meeting.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how's it going?
Pretty good, Rob. How are you doing?
Doing okay. Doing great since I saw you two hours ago.
Yes, it's a bit of an inside joke for our listeners, I guess.
But we're recording two episodes back-to-back for the first time ever.
Yes, we've sometimes done two in a week, but we're doing two in a day just for scheduling reasons. Yes. Yeah. So this is the disembodied voice of the past or something like that, right? Well, anyway, at the top of the episode, I'd like to read a piece of feedback. We got an iTunes comment here, and it says, great hosts and always informative. This is from Plain Jim.
And he says,
this is a wonderful podcast that helps me learn about important people in the
C++ and software community.
It's not dry at all.
And I really enjoy the kind and professional hosts who make it very special.
So thank you,
Jim.
Yeah.
Very nice.
It's not dry at all.
Yeah.
Well, I can see how some programming podcasts could be quite dry.
Yeah.
No, I mean, I would expect ours sometimes is, honestly.
But it's good he doesn't think it is.
Yeah.
Well, thank you for the kind words, Jim.
We'd love to hear your thoughts about the show.
You can always reach out to us on Facebook, Twitter, or email us at feedback@cppcast.com.
And don't forget to leave us reviews on iTunes.
Joining us today is JF Bastien.
JF is the C++ lead for Apple's Clang front end,
where he focuses on new language features, security, and optimizations.
He's an active participant in the C++ Standards Committee, where he chairs the Language Evolution Working Group Incubator, Oogie for short.
He previously worked on WebKit's JavaScriptCore just-in-time compiler, on Chrome's Portable Native Client, on a CPU's dynamic binary translator, and on flight simulators. JF, welcome back to the show. Hey, thanks for having me again. Yeah, this is your second time, right? But it's been a while. Yes, it's been a long time, like three years or something, right? Yeah, something like that. Maybe, yeah.
I think last time we talked about WebAssembly,
shortly after it was announced by the four browser vendors.
Yes.
And now that's a real thing.
That's in the real world doing things.
Yeah, it is.
It seems to be getting a lot of traction among just web developers in general.
There's still a lot of ongoing work,
but the basics are there and seem to be working really well.
I just saw a comment on Reddit from someone that said
that they are in the process of porting some C++ code to TypeScript
and was looking for some guidance.
And I have respect for TypeScript,
but my immediate response was, why don't you just
compile that C++ for your web browser? Yeah, that's certainly easy. It's certainly the easiest
approach. But in a lot of cases, if you do something that's native to the platform,
that's well supported, you can trim down code a lot and have kind of faster execution.
Not that WebAssembly can't get that,
but there is a cost to the bridging,
so you kind of have to think about
the way you're going to do that bridging.
Right, so if you're having to do lots of bridging.
Okay.
Right, exactly.
And so there's a lot of upsides.
If you're trying to do compute-intensive stuff,
WebAssembly is really good,
but if you're going to try to get, say, data in and out of the WebAssembly sandbox or something
like that, then your C++ code might not be the bottleneck.
Right.
In some cases, it makes sense.
In some cases, just TypeScript or JavaScript are really good solutions.
That's what the web is, right?
They do it really well.
Right. It's really a question of what's the better tool for the job. Yeah. I mean, it's just a random comment on Reddit, but it would be curious to have a full conversation with this person and see what they had attempted so far and checked out and everything. Yeah. Okay. Well, JF, we've got a couple of news articles to discuss, and then we want to talk more about your own thoughts out of the San Diego C++ meeting and the work you did as the Oogie chair. Okay. Sounds great. Okay. So this first post is on Therocode's blog, and it's exploring
C++20 designated initializers. And it is a really good post going into kind of the history of aggregate initialization and why designated initializers are going to be a great feature for C++20. This kind of seemed like it was a long time coming. I don't know when it made it into C, but a while ago, right? C11 or before. All right, this is a feature that came from C, right? Yeah. Right, it is. The difficulty
though is C++ has constructors, and so designated initializers, just based on that, have a few issues. If you can reorder the members or something like that, it's the same as the constructor where you use the colon and you have the list of initializations. If you don't put them in the same order as they were declared in the struct, then there aren't necessarily obvious rules on what gets initialized when. And so I think that was one of the biggest holdbacks from getting it into C++, just kind of getting some sane rules in there. And otherwise, the C rules are kind of weird in some cases. And so the committee really wanted to not just adopt the C rules as is, but have something that's compatible yet fits well with C++, I think.
And it's ultimately not going to be fully compatible,
right? Because it will allow
braced initialization of
the named
designated initializers,
which C will never
allow. Is that correct?
Well, I don't know if they'll never allow it.
Well, no, I mean, right, but it would be incompatible today.
Yeah, certainly there's a common subset that will work in both,
and then there's things both in C and C++ that won't interoperate at all.
Okay, right.
But I think we got a pretty good design out of that feature,
so I'm pretty excited.
No, I'm pretty happy with it.
The only thing in my playing with it,
because as far as I know,
no compiler currently supports
the braced initialization syntax.
They all basically just, as far as I can tell,
for the most part,
enabled what they had from the C side of things
at the moment.
Right.
And some of the things that I want to know
is how does this interact
with
deduction guides?
That kind of thing.
Because you can kind of
if you squint your eyes
synthesize a constructor
by using a deduction guide
and I want this to play nicely with deduction guides
but it wasn't clear to me in reading it
and playing with it how well it would.
Yeah. Yeah, I don't know.
Okay.
I have no idea.
Okay.
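The C++20 ordering rule discussed above, designators must follow declaration order, and skipped members fall back to their defaults, can be sketched like this (the struct and values here are invented for illustration):

```cpp
#include <cassert>

// A plain aggregate: no user-declared constructors, so it is
// eligible for designated initialization.
struct Point {
    int x = 0;
    int y = 0;
    int z = 0;
};

Point make_point() {
    // C++20 designated initializers: designators must appear in
    // declaration order, and skipped members (here .y) fall back
    // to their default member initializers.
    return Point{.x = 1, .z = 3};
    // Point{.z = 3, .x = 1};  // ill-formed in C++20: out of declaration order
}
```

The ordering restriction is exactly the "sane rules" trade-off JF describes; C is more permissive here, which is part of why the two dialects of the feature aren't fully compatible.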
So the next post we have is "std::embed for the poor (C++17)," or cross-platform resource storage inside the executable.
And this references std::embed, which we talked about with JeanHeyd Meneide a while back.
It doesn't look like std::embed is going to make it into C++20.
So this is kind of an alternative tool that you can use for similar functionality.
And the post kind of dives into how he goes about doing it.
Yeah, it makes a lot of sense. Yeah. I didn't want to joke around that. Yeah, totally. Go ahead. Go ahead, Jason.
On the Reddit post about this article, specifically, JeanHeyd Meneide comments, like, has a discussion with the author of this article about the pros and cons of the different options for how he implemented this. So I would suggest that for our listeners who want some more information on the background and history of these options.
Right, and I expect that we'll see... I don't want to give too much away, but we saw std::embed in San Diego, and I expect we'll see an update in the next upcoming meeting in Kona.
And I think there's a pretty big discussion
going on inside the committee
about stuff related to that in general.
So I'm pretty hopeful that we'll get something
in that direction.
I'm just very worried about the build system implications, right?
If you can pass a constexpr string to std::embed, such that at compile time you figure out which file to include, then your build system has to run all of constexpr evaluation before being able to determine the dependencies.
Now, it's kind of a niche feature in my opinion.
So I'd be really happy to not have that part of the feature
yet have the other type of dependencies, right?
Especially with modules, like you kind of want to have really quick, easy parsing of the module,
and std::embed kind of defeats that, whereas modules tries to give you,
kind of make your build system easier to support in some senses.
So I think there's a balance to be struck, and it's not clear where the committee wants to land on that.
So you're saying, if I understand correctly, a string literal is fine,
but if it's any constant expression, that becomes more difficult.
Well, so the idea is you would say std::embed(foo), where foo has been derived from playing an Amiga emulator as a constexpr thing.
As a hypothetical possibility.
As a hypothetical thing.
I'd be worried about the compile time
aspect of that, of figuring out what the
dependency was. I'd like
to construct a graph really
quickly by quickly pre-processing
the top
of every file. And then when you
see the keyword
module or something like that, you know you're done
with pre-processing
the module.
In a perfect, modules-are-in-C++20 world, your build system should be able to figure out file dependencies really quickly in an ideal world. And I'm worried that std::embed would defeat that. But at the same
time, I want std::embed.
I've done stuff similar to this blog post before, and so it's a really interesting feature to have, to be able to say, take this file and constexpr-create this result from the file. So, yeah, I kind of want both. Yeah, the project I'm currently working on with a client has been dealing with some version of this solution, I think, for six years or so,
rolling their own CMake kinds of things
for trying to make effectively a single binary
that has everything it needs in it.
That's really our goal.
We don't need the data at compile time,
but we definitely need that data in the binary
and access to it at runtime.
And that aspect of this, I think, is highly necessary. The compile-time part is fun, because I like doing compile-time things.
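The "poor man's std::embed" pattern the post describes, a build step that turns a resource file into a generated header of bytes, can be sketched roughly like this (the array contents and names are invented; a real project would generate the header with a script or a CMake step):

```cpp
#include <cstddef>

// What a generated header might contain: the resource file's bytes,
// baked into the binary as a constexpr array (contents invented).
constexpr unsigned char kResource[] = {'H', 'e', 'l', 'l', 'o'};
constexpr std::size_t kResourceSize = sizeof(kResource);

// The data is usable at compile time...
static_assert(kResourceSize == 5, "resource size known at compile time");

// ...and at runtime it is just static data in the binary, no file I/O.
inline const unsigned char* resource_data() { return kResource; }
```

std::embed would make the generation step unnecessary, which is exactly the build-system trade-off discussed below.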
Right. Okay. And then the next post we have is "RangeOf: A better span," and this is a follow-up post where the author just talked about why he doesn't like span. I didn't read the first post.
Did you, Jason?
I did not.
Okay.
But this one just goes over kind of refreshing why he doesn't like span
and this alternative solution with using the ranges, which are going to be in C++20.
Yes. Yes.
Yeah.
The committee is pretty split on that in general, on how span should look, whether its size
is signed or unsigned, and other stuff around that, how comparisons work or don't, whether
the type should be regular or not.
So there's kind of a lot of different trade-offs in that design space, and people who are much better than me at the library side of things are having pretty serious discussions about this, and content is part of that. So it's been good to see people try to find an agreeable solution.
I'll be curious how it all shakes out.
I don't feel like I personally have, I don't know, a horse in this race or whatever, like
whatever the committee gives me, I'll be fine with. I have to admit though, when I was reading this article, I got
distracted by something right towards the middle where he says, but in general, I think we should
try to move away from the header source file separation model. And that's a link to a separate
article that he had written a month and a half ago.
And it seems to be effectively an argument for header-only files, like, you know, just putting everything inline, everything inline all the time. And I know organizations who are doing this, right? We discussed that with Lenny two episodes ago. Yeah.
And I was just kind of curious if JF had seen this
or if he had an immediate gut reaction to it.
Yeah, well, I mean, especially in, you know, the pre-C++11 times, we used to do that for header-only stuff everywhere, kind of like what Boost does, right?
And I've kind of gone back and forth on things like that.
And in a way, I find it hard to navigate when there's all this code in there, right?
I kind of like the...
It's not just like you abstract away the class, but it's also when, as a human, you read the
code.
I like having the abstraction of the header and the CPP file or something like that.
But at the same time, you could argue that's the job of the IDE to kind of hide things from you unless you actually want to see them.
And again, I kind of go back and forth between using an IDE and not.
So I use Xcode fairly extensively, except when I use Emacs.
And there's no logic to my usage.
I just kind of go back and forth and try to deal with the two of them really well.
I think modern codebases care about compile time a lot,
and so they don't tend to put everything in the header file that much.
But I guess we'll see what happens with modules, right?
How that changes the landscape,
how IDEs make your life easier there or not.
So, JF, do you want to start off by just telling us a little bit about the new position you
had in San Diego as the Oogie chair?
Right.
Yeah.
So, Herb sent out an email or a blog post, I guess.
Well, first he sent an email to the committee, then he posted a blog post saying like, we've
got a huge influx of papers coming in. And it's kind of interesting because Bjarne
had kind of predicted it in a way. And everyone was like, nah, you're just worried that there
seems to be a trend, but that'll die down. And it turns out Bjarne was really right.
So for San Diego, there was an influx of papers. I think it was 276 or something like that. That's more than twice what there's ever been in a pre-meeting mailing. And so, you know, it's not possible to read all the papers. I think he calculated it's more than three times the works of Shakespeare or something like that. Again, it doesn't make sense, right?
And at the same time, there's an influx of new participants.
So I think there were 180 people who showed up in San Diego.
Now, some of it is historically there's been an influx just before a standard kind of gets nailed down, right?
So before it's kind of finalized.
And so some of it is trying to get last-minute features in
or last-minute fixes in so that the standard is better.
And you definitely want to prioritize those things.
At the same time, there's features that you don't want to rush into the standard.
So you want to say, well, let's take a step back.
And this feature could make it to 20, but it's not a good idea.
We're not confident about the quality of the feature yet, even though it might seem good.
So there's kind of a balance to be struck.
And also because we have a lot of new people coming in,
it's hard to know how to write a paper that does the right thing.
There's a lot of unwritten rules about how the committee works
that people have to kind of just figure out.
And we've been trying to get better at documenting those things, but it's just a fact that the committee is not that great at documenting
those things yet. So you'll write a paper and you'll think like, I got this, it's really great.
And then you'll realize, oh, there's all these things I didn't follow. And that's true both on
the language side and on the library side. And so I think the goal that Herb had was to really help new people write higher quality
papers by having an incubator that tries to work on the features in the smaller group,
such that when it gets to the library group or language group, the feature is way more polished.
You don't waste as many people's time trying to polish
something, right? So that's one aspect. Another aspect was to just say no to features that
shouldn't go forward as early as we can with as few people involved as necessary, right?
It's not really saying no, but it's more figuring out why
something fits or doesn't fit and what could be done to make it fit, right? Sometimes you're
saying no, because you'd like to see the feature done a different way. In a lot of cases, it's just,
well, this is nice. We could create a macro for this, but really we should work on constexpr to
make it happen instead, or on reflection to make it happen instead. So that's another way of trying to redirect the feature: like, well, you want to do this, but have you thought about doing it this other way? And then it's trying to merge features as well. We had, you know, two proposals for pattern matching, which is a huge language change, and we don't need two proposals for pattern matching, right? It's good to get to a certain point where you can evolve proposals separately,
but at a point in time, you've got to choose one.
So the goal in the incubator is to kind of try to see trade-offs
and try to get just one feature out of two papers.
So I think those are the goals of the committee.
And at the same time, the language group and library group want to move forward on certain papers that will go into 20. But you don't want to pause the world and not see anything that will go after 20, right? You want to have some amount of iteration. Even though you're trying to finish 20, you still want to work on future features, just like, you know, pattern matching. So that's another goal of the incubators: really, let's get a bit of iteration going, so that once 20 is finished, we have stuff that's not all brand new. That's kind of ready to go, or closer to ready to go than if we just hit the pause button.
So that was really the goal of the incubators.
And I think it was a pretty good setup that Herb had to try to make that happen.
And then he asked me and Bryce to chair the incubators, the language one for me and the library one for Bryce.
And it seems like it's worked pretty great.
Throughout the week, we had 10 to 20-ish people.
I think at the top, we had 30 people in the room for pattern matching at some point,
but not that many people compared to the other rooms. So I think that goal was met. And we saw,
I think for my group, something like 30 papers. So that was pretty productive, I think.
In a way, it sounds like the incubator would be the more interesting one for a new person to visit, because you're just, you know, seeing exciting new things coming in and kind of seeing the start of the process, I guess. Yeah, definitely. Although part of the goal of the incubator is to also
bring some of the worries that the later groups would have to those papers. And so I think it's
useful to, if you're new to the committee, kind of shop around the different groups, see what they talk about,
see what concerns they bring up, and also see who the...
We really have...
I don't think there's one expert who understands all C++,
but in the committee there's key people who understand things very, very well.
And knowing who to go to to get good input on a particular topic
is also really useful.
So it's useful to go around and see
who everyone turns to for particular
things so that you can
not waste everyone's time and just
talk to that person directly if you have questions
on a particular thing.
And, you know, different groups really have different worries, right? The final groups, the core group and the library group, really care about getting the wording right,
integrating things into the standard really well,
whereas the evolution groups have more kind of higher level,
where should the language go or the library go, what should it do,
how should that feature look like?
And things like that.
So there's really different concerns.
And to me, it was obscure before joining the committee
how that part worked.
And just seeing it happen has been kind of really interesting.
It's a fairly well set up set of responsibilities.
And people kind of hold themselves to that.
It's really well done, I think.
So if you're in one of these discussions, like an incubator, and you're like, well, we don't really know how this would interact with this other feature, and so-and-so is the expert on that, but that person's in a different room, do you ever, like, go try to grab that other person real quick? Oh yeah, definitely. Okay, so there's an IRC channel where the committee, you know, we announce which paper is being discussed where,
and where people get queries for like, hey, could this person be sent here?
Or, you know, you're about to present, how about you show up type of thing.
Because the scheduling is pretty flexible, right?
There's so many people, you can't have everyone in the room at the same time.
And so you kind of give a heads up to people.
And definitely, when there's a question where you'd like to have someone in the room, that happens. Although what we usually try to do is we post a schedule internally of which paper is going to be discussed roughly when, and then people can either talk to the chair or post on the wiki or something like, hey, I want to be there for this discussion. So you try to ping them beforehand so that they're there. So they kind of self-nominate.
But another thing we do, at least in the incubators,
for example, we did this with std::embed,
is we say, well, we're not ready to forward this, because this isn't the group to make this decision.
We'd like you to go to, for example, the language evolution group,
ask this set of questions,
then come back here with the answers, and we can iterate in that direction.
So we did that with std::embed.
We did that with nodiscard.
So there's a paper that adds a message to nodiscard.
And we want to know, would library use it?
And if so, how?
Do they want to have a policy of this, or do they not want to use it for a variety of reasons right it's i think it's
important for us when we standardize a language feature to know whether a library would use it
and how and so that's part of the discussion so we forwarded that paper with that question
still open yeah it's interesting to me that you just brought up no discard because i just got a
message from someone on twitter saying that he started putting no discard throughout his code base and then wrote a,
uh,
lib tooling tool to go in and automatically apply it to every function that
he thought was,
was appropriate and ended up finding apparently dozens of bugs and their
source because they were ignoring things they shouldn't have been dropping.
And it seems like a lot of people are like, wow, nodiscard should kind of almost be everywhere. I mean, for some definition of everywhere.
Right, and it's pretty interesting. The committee
has been discussing this this week, and there's
places you don't want it, right?
You might say, well, it's a kind of
misfeature of the library
if it returns something
that's fine to ignore, right?
But something like printf is the
one nobody ever checks, right? Like nobody looks at the return of printf. But you might argue,
well, you shouldn't have been returning something if nobody checks it. But I guess the idea is also
should the default in the language be nodiscard, and then, you know, you would say maybe-ignored in the cases where it makes sense to ignore them.
So I think what the committee is going to try to do is write a paper that tries to see
where you should apply it or you shouldn't as a kind of policy matter,
and then decide whether we actually do apply it to the STL
or whether it's just a proposal that implementers do this.
So I think Stefan, STL at Microsoft,
volunteered to write such a paper,
and I guess we'll see where it falls.
The thing to remember is attributes are kind of weird in C++
where compilers can just ignore them,
and so they're not really normative in that sense.
And so even if we put it in the whole STL,
it doesn't really mean anything.
Although, you know, I'd like my STL implementation to follow whatever rules are there for attributes.
So, you know,
attributes are a bit weird, honestly.
Yeah. Yeah, but, you know,
it's almost peer pressure in a way.
Like, if two out of the three compilers are giving us good warnings on nodiscard, the third one kind of has to catch up and do it too.
Right, yeah.
And in a way, language implementations already give you warnings on discarded results, right? So if you do some comparisons or certain expressions and they have a side effect but you don't look at the result, then, with -Wall or something, you'll get warnings in some cases. And so it's a bit weird that we have a distinction between library and language, where discarding things already warns in some cases but not others. So I guess it will be nice to kind of find those bugs with nodiscard.
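A small sketch of the [[nodiscard]] behavior under discussion (the function names are invented; the message form is the C++20 extension mentioned above, so it's guarded by a feature test):

```cpp
#include <cassert>

// C++17 [[nodiscard]]: compilers warn if the result is discarded.
[[nodiscard]] int checked_sum(int a, int b) { return a + b; }

// C++20 adds an optional message, the extension the nodiscard paper
// proposes; guard it so pre-C++20 compilers still build this sketch.
#if __has_cpp_attribute(nodiscard) >= 201907L
[[nodiscard("ignoring the handle leaks the resource")]]
#else
[[nodiscard]]
#endif
int open_handle() { return 42; }  // invented stand-in for a real resource

int use_both() {
    // checked_sum(1, 2);  // would typically warn: result discarded
    int h = open_handle(); // fine: result is used
    return h + checked_sum(1, 2);
}
```

Note the point JF makes below: attributes are ignorable, so a compiler that emits no warning here is still conforming.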
So we mentioned std::embed and nodiscard.
Were there any highlights out of the Oogie group?
Were there any papers that you kind of just forwarded straight on to Evolution?
Yeah.
So total, we had 60 papers on our plate.
We saw 30 of them.
There are five that we sent on to EWG, and two that we sent just to get feedback. And there are six that had no consensus to continue work as is, but we gave feedback to the author on why we think that's the case and what could be done to change this.
So at a high level, the ones we sent to EWG were really just ready, and I think it was trying to make sure that they were ready before EWG saw them, because their schedule was really, really overbooked.
So first one, there was array size deduction in new expressions, which was kind of just an oversight that was missing in the language.
I think Bjarne pointed it out. Nobody really seems to have run into this ever, but it was just a weird inconsistency. And so this was just fixing something where Bjarne was just like, oops, I made a mistake like 30 years ago. How about I fix it?
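The inconsistency being fixed here (array size deduction in new-expressions) is easy to show; a hedged sketch, with a pre-C++20 fallback since compiler support for the deduced form is recent:

```cpp
int sum_new_array() {
    int local[] = {1, 2, 3};  // size deduction always worked for locals

#if __cplusplus >= 202002L
    int* p = new int[]{1, 2, 3};   // C++20: the size is deduced here too
#else
    int* p = new int[3]{1, 2, 3};  // previously the size had to be spelled out
#endif

    int s = p[0] + p[1] + p[2] + local[0];
    delete[] p;
    return s;
}
```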
So that was kind of neat. Then there was something that was almost a no-op, where we're making string literals for char16_t and char32_t be explicitly UTF-16 and UTF-32.
It was effectively the case, but we didn't say it, and the wording around it was a bit weird.
Okay.
And so the paper just fixed things in a way that was sensible. Then there was something about named character escapes,
which came out of the SG16 Unicode group that I really liked.
So the idea is, you know, when you do Unicode literals,
you can do, like, backslash U something,
or some syntax like that in a character literal,
and that inserts a Unicode character in your string.
And you have to provide the code point for that.
And I don't know about you,
but I don't know that many code points off the top of my head.
So I know that 20 is a space or whatever,
but that's about the extent of my knowledge.
So what that does is there's a Unicode standard
for named characters for Unicode.
And those are stable names.
And so it means you can put the Unicode character in your code using that escape sequence,
and you name it, and you say, like, this is the character I have.
And then it's obvious what the character is, right, because it's spelled out.
And it's useful because, you know, plain backslash-u escapes are really not obvious about what you're talking about.
But it's also, if you just insert the Unicode character in your source code,
some of them look like other characters, and you kind of don't want that in a lot of cases.
So having the named escape makes it much nicer.
You can just say, no, this is not like the letter K.
It's the letter K for Kelvin, which is separate in Unicode for some weird reason.
So it allows you to kind of call these things out much more easily in your source code. So that's really neat.
So is this like an optional name that's with the code point, or you name it outside of the
string literal and then can use it inside of it?
So instead of using the code point, you can use that.
Oh, okay.
It's really like you say,
put this character here inside the string literal,
and so then it substitutes that in there.
I think for some reason... So it's the same thing as backslash-u something, but with the name instead.
Okay.
So that was a pretty neat feature.
It's adopting something that Unicode has standardized,
but Unicode gives you a lot of options on what you accept.
It allows you to ignore underscores and dashes and spaces
and ignore the capitalization and other things.
And so we had to choose which subset of that we were going to allow.
And I think we allowed something that is non-surprising,
but that works pretty well.
Non-surprising.
Well, yeah, because you can change the capitalization
and add spaces and dashes and whatever
according to the Unicode spec,
and then that's the same stable name, in effect.
And that seems a bit weird,
so I think we allowed capitalization to change
but not spacing and dashes and underscores.
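The readability point can be illustrated with the Kelvin-sign example JF gives (the named-escape form is shown only in a comment, since at the time of this discussion it was a proposal; it later became valid C++23 syntax):

```cpp
// With only code-point escapes, the intent is opaque: this looks
// like a K but is actually U+212A KELVIN SIGN.
constexpr char16_t kelvin = u'\u212A';

// The named-escape form from the paper spells the intent out:
//   constexpr char16_t kelvin_named = u'\N{KELVIN SIGN}';
// Per the committee's choices described above, capitalization of the
// name may vary, but spacing, dashes, and underscores may not.
static_assert(kelvin == 0x212A, "same code point, clearer spelling");
```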
Right.
Right.
So it sounds like the incubator groups
were definitely a success.
Do you think they're going to be meeting
at all future ISO meetings?
Is that not decided yet?
So the intent was to relieve pressure for San Diego,
but I think at least for the near future,
we'll still probably meet and we'll try to gauge how much influx there is for new papers, right?
And how much of a need there is.
So in the case of Oogie, we only met for three days out of the six that the committee usually meets. And Loogie met for three and a half, if I remember correctly.
And so we'll kind of go on an as-needed basis.
And we might do telecons in the middle
to try to move certain things forward as well.
But if there's no need anymore,
then I don't think we should keep meeting.
The same way the other study groups
have shut down over time, right?
So the file system study group doesn't meet anymore
because filesystem has already shipped.
So I guess we'll see how things work out.
Okay. Okay.
I wanted to interrupt the discussion
for just a moment to bring you a word from our sponsors.
PVS Studio Analyzer
detects a wide range of bugs.
This is possible thanks to the combination of
various techniques, such as data flow analysis,
symbolic execution,
method annotations, and pattern-based
matching analysis. The PVS-Studio team invites listeners to get acquainted with the analyzer.
So I want to maybe talk a little bit more about some of the other core features
that had movement at San Diego.
There was a special meeting on modules before CppCon,
and they're now saying it achieved consensus.
So it looks like it will be making it into C++20?
Well, so it's still not clear.
So modules haven't been voted into 20 yet.
And we'll see what happens.
I think the upcoming meeting is the last one to make that happen.
But fundamentally, there was the TS,
which was championed by Gaby at Microsoft.
And the TS did a lot of things really well.
And at least within Microsoft's experience, they had a lot of very positive uptake of
their modules approach.
But at the same time, the implementation of modules that predates the TS that exists in
Clang has had a lot of success with Clang users in general.
Especially, you know, Google was really behind the way Clang did some of the features of modules.
And so was Apple.
And so we worked with Google to try to take the core of what we thought Clang did better than the TS.
And then Richard Smith wrote a proposal called Atom, I think,
that had kind of alternatives for what the TS was trying to do. And he was joined by
Nathan at Facebook, who works on the GCC implementation of modules. And so I think
by having those three implementations try different things out and see what works for
their user bases, we got a better
outcome through that meeting that we had about modules. And that's where we reached consensus
and where there's ongoing work to try to get the TS into a state where what we voted on is what the
wording reflects. And then that can be merged into the IS potentially at the upcoming meeting.
And so I think everyone's kind of happy with the outcome.
I think we got some really good changes in there.
Cool. Okay.
How about the coroutines proposal?
There's the core coroutines paper that was discussed,
and that's another one where it was said
there was going to be a decision to be made in Kona.
Right, right.
So coroutines as a TS got a lot of traction early on,
especially because Gor, who's also at Microsoft,
did not just the implementation in MSVC,
but he went and did an implementation in Clang
and optimized it through LLVM as well.
And he gave some pretty entertaining talks about it.
And that really helped the TS move forward much faster because he was willing to do all
that work to show that it's not just a one implementation thing.
Now, the problem was as people start using coroutines, they realize that they had issues
with some of the capture, right?
So it's not obvious when a coroutine is going to capture certain things
and cause heap allocations.
And it looks a bit different from what lambda syntax looks like.
And so there was a feeling by some people that coroutines should kind of look more like
what lambdas look like without being very clear on what that meant, right?
So the core coroutines paper
tried to explore some of that.
And at the same time,
there's maybe a dozen or so customization points
in the current coroutines TS.
And it's a bit weird.
The language feature of coroutines
reaches into library-like things
to do customization.
It looks a bit weird.
And there was a feeling that we might be able to do something better,
but not really clear on what that would be.
And so that's the other thing that people were interested in exploring.
So that's what Core Coroutines tried to do.
The first version of the paper was really rough.
It was very hand-wavy and didn't really have enough concrete stuff to convince people.
And so Coroutines was trying to move forward, but just barely didn't make it into the standard. And now Core Coroutines is in a better state.
The authors have been working really closely with Gore to address some of the issues they have.
And at the same time, some of the Facebook people brought up worries that they had with
just kind of the asynchronous natures of Coroutines and how you would use them and integrate that with the library and other stuff.
So that's another thing that's ongoing of trying to get all of these actors
really happy about what coroutines do,
so that if we do vote it into 20,
it's a really solid feature that we don't regret.
And there's a lot of demand for coroutines.
People really want it.
So it's a bit difficult to say,
well, let's hold off for 20, right?
Like maybe we want to see what it'll look like
with parallel algorithms.
Maybe we want to see what it'll look like with networking.
Or maybe we don't.
Maybe we believe it's good enough to ship as is.
So I guess we'll see in Kona how things fall.
And I have to ask you about the paper you presented, Signed Integers are Two's Complement.
Right.
How did that go?
Yeah, so I presented that paper a while ago, and I gave a CppCon talk about it this year
as well, and it went really well, surprisingly.
I thought I would get more pushback. So initially I came at it with kind of a mild rage, because that type of thing causes security bugs, as I talk about in my CppCon talk, and I didn't know of any architecture that wasn't two's complement that was relevant to C++ nowadays. And so I kind of started doing some research to figure out, well, why do we have...
Well, first, what liberties do we have
in C and C++ implementations
with respect to signed integer representation?
And second of all, does anyone take those liberties?
What does it look like when people try to use them?
And then if we made changes,
say we said everything was two's complement,
what changes could we make,
and what effects would that have on the language and on users?
So initially I was proposing that not just storage be two's complement,
but also arithmetic, meaning that overflow would wrap.
And that received huge pushback, partly because in a lot of cases,
when you have a bug, having wrap around wouldn't actually fix your bug.
And I go into examples of that in my
CppCon talk where
there's bugs where wraparound
was the behavior that occurred
and the bug was still there
like in Donkey Kong
and Pac-Man
you have bugs like that
and having wraparound wouldn't fix anything
same thing with like
the Ariane 5 rocket that exploded a few years ago.
Having wraparound would not have prevented the rocket from exploding.
So one of the worries people had was,
well, you're going to define wraparound, it's not going to fix any bugs,
but it'll make life harder on sanitizers,
because then they can't just trap when wraparound occurs
and still be able to conform to the semantics of the language.
That's one worry that people had.
And at the same time, there was a worry that it would hurt performance.
And there's numbers that people have measured, in the range of insignificant to a 30% perf hit,
if you were to define integer overflow to wrap.
And I think we can make up some of that perf hit, but you can't just say,
well, we made the language nicer,
but your code now runs 10% slower, right?
So it's something that people weren't willing to commit to
making integer overflow well-defined.
And so what we did instead is we defined the storage,
but not the arithmetic.
And the goal is instead to have library features
that allow you to express the type of behavior you want on overflow,
whether you just want to track where the overflow occurred
and do something later, whether you want it to trap
or whether you want it to wrap or something like that.
So I think as a library feature, we're going to try to do that.
And the numerics group has been working on that for a while.
So I presented the paper, and it went really well. Aaron Ballman presented it to the C committee; they want the same thing to be adopted for C. And then Jens Maurer took my wording, which was expressed in engineering terms, and transformed it into a more mathy thing. So he wrote a paper that was eventually adopted that made everything more mathy and, in my mind, less intelligible. But that's the way the standard is usually created. And so, you know, it should be equivalent in all cases, but the Core group thought it was really important to have a more mathy description rather than an engineering description.
So it is specified now that that part has been accepted,
it is specified that storage is two's complement for integers.
Right, yeah.
Okay.
And then now this gives the library,
core libraries, something to do
to give us more tools around that, basically.
Exactly, yeah. And that work
was ongoing before I presented my paper,
so it's not like I made it happen.
I think I just kind of precipitated things
and told people, like, okay, well, if the language is not going to do it,
the library really should be doing it.
So Numerics is looking at that pretty
intensely. I think they met in San Diego
at the same time
as my group was meeting, so I wasn't able to attend, but I think that's one of the
things they've been working on. Do you have any idea what that will ultimately
look like? Will we have new strongly typed signed integer
wrapper type things? Yeah, I would hope.
It's probably something that you can customize through templates or whatever, what type of behavior
you want, whether you want wrapping or trapping
or something like that.
All right.
But, I mean, I've worked on a handful of codebases
that have this type of templated integer
and floating point, you know, safe helper.
So, you know, Clang has some with its APInt class and APFloat class,
the arbitrary-precision ones; those are
used to implement things like constexpr and stuff like that.
And so those allocate when you overflow and things like that by just being effectively
a big int implementation.
But then Chrome and WebKit both have implementations of effectively the same thing, which allow
you to detect whether overflow occurs.
And so any untrusted input in browsers
just goes through that type of class
because you just do the arithmetic.
You don't trust yourself to do it right.
You don't trust the user to not be malicious.
And so it's kind of your security boundary.
Anything that comes from potentially malicious sources,
you feed it through that integer class
and you check whether something bad happened or
not. And so, and you know, Firefox is the same thing, I think it's called SafeInt that they use.
So there's a handful, and I think I mentioned them in my talk as well. But it's a very common
thing. It's just a question of what do we want the standard one to look like?
That's assuming the performance characteristics of it are good. Or, as you say, if there's a templated set of things that we can choose from. In an emulator I was working on recently, I had to deal with integer overflow and then set the emulated CPU flags appropriately. And the best way I could come up with to do that was to do the 32-bit math in a 64-bit value and then check to see if any of the high-order bits were set, basically.
And also, compilers have built-ins as well for that. If you look at __builtin_add_overflow and things like that, they can do that work for you as well.
Okay. The other way that people do it is you cast to unsigned, you do the arithmetic, and then you cast back to signed, right? Because a lot of the instructions, like add and multiply, are exactly the same.
So that's another approach to doing it.
Right.
So going back over some of the other features that made it through core and evolution, there
are a lot of changes made to constexpr.
Is that right?
Yeah.
Yeah.
So there's a big push to make everything constexpr.
And so Louis Dionne, who works at Apple and maintains libc++ for us, has been instrumental in that. And he's got a lot of support from people like Daveed Vandevoorde, who's at EDG. And they're just attacking every single constexpr thing, one at a time, to try to make it possible to have things like constexpr string and constexpr vector and other things like that. And so some of what they did was just, you know, allow unions to be constexpr, so that an implementation of string that uses the small string optimization can be constexpr, right? And part of the concern they have is, well, they can't break their API, their ABI, through C++20.
And so you want whatever is implemented right now to just be constexprable, right?
So union was one of the requirements here.
And so what that does, basically, is the compiler has to track what the active member of a union is through constexpr evaluation,
which initially was thought to be pretty hard, but people experimented a bit, and it turns out it's not that hard, especially for EDG, because EDG emulates a lot of other compilers, and so the worry was that for them it would be impossible. But they tried it out, and it's not, so they were really happy to go with that.
But there's other stuff, like constexpr try: you can't throw in a constexpr context, but you can have try and catch, and if you encounter them, it's still valid in a constexpr function.
Then constexpr dynamic cast, constexpr type ID,
constexpr pointer traits,
and there's a handful of other stuff.
Eventually we'll get constexpr allocations
in the standard as well.
So that will allow you to eventually have
just vector pushback and other stuff
all be constexpr.
Do you expect...
I'm sorry, go ahead.
Yeah, go for it.
Do you expect that the constexpr allocation will make it into 20,
or do you expect that's something we'll see later?
Oof, I'm not sure.
I think we...
There's someone in my team who started working on implementing
some of those aspects
for LLVM.
Especially for allocation,
I think we might need some
experience to make sure that what we're doing
makes sense.
You want the allocation to, whatever you've
created as a vector, then be usable at run time.
And so I'm not sure it'll make it to 20, because only papers that were presented in San Diego to the language group can potentially make it into 20.
And I think that one was discussed, but it's a bit tight.
So I guess we'll see.
We still have hope that it might, but it's really not clear, and I wouldn't put money on it.
Okay.
Another feature I wanted to talk about was the revised memory model.
What is changing exactly?
Right.
So SG1 is the group that does concurrency and parallelism, chaired by Olivier Giroux, who you've had
on semi-recently.
And part of what they do is they own like Atomics and Executors and a bunch of other stuff like that, like futures and async.
And part of the memory model for coroutines and stuff.
And what they've done over time is they've tried to have a formal proof that the memory model holds on existing architectures. So there's a guy who did a PhD under Peter Sewell called Mark Batty,
called Mathematizing C++ Atomics or something like that. I don't quite remember. But the idea was
that when you write atomics and you don't have a data race, then on every architecture that C++
targets, you can prove that your program behaves as if it were sequentially consistent.
So the model is called Sequentially Consistent for Data-Race-Free Programs
that was published in the early 90s, I think, by Sarita Adve.
And the idea is to use theorem proving to prove that that's the case.
It's pretty rare that you can prove anything
about programs, and it's kind of interesting that they've
done that for Atomics, and especially since
Atomics are very
unintuitive to use
in a lot of cases, it's nice that
if you stick to these rules,
and release, acquire, and sequence
consistent things, then your program will behave in this bounded set of ways on all the architectures that you target.
Now, since theorem provers are also themselves programs,
it turns out that there was a flaw in some of the theorems.
And so they found that flaw, and they've been working at kind of fixing things, right?
So the idea was, well, some really weak architectures like power and NVIDIA's GPUs,
weak in the sense of the memory model,
don't actually obey what C++ said should be obeyed.
And so we can either say that they're non-conformant
or we can fix the memory model to make them conformant.
And what you end up doing there is looking at the programs that are
non-conformant and saying, would anyone actually
write this code? Does anyone rely on this?
And it turned out that nobody would.
It's programs with four threads
doing weird things with
different accesses to
the same variables using acquire, release,
and relax, and sequential consistent.
Nobody does this. And so they found a way
to fix the memory model to
still be compliant on those architectures
without degrading performance.
They're really minor fixes.
It's really kind of a
niche topic, but it's really
important for hardware people to know
that people
writing C++ code will have
expected outputs when that code is executed
on their architectures.
And so for them, it was really important.
And they spent a bunch of time trying to fix this and finally came up with something that
everyone thought was palatable.
So I think it's pretty cool if you're into that type of topic.
For most people, you can just keep using atomics and it'll still do the same thing.
Right.
Right.
I thought we had covered basically everything.
Oh, nested inline namespaces are left, huh?
Yes, that's another thing that we discussed, and I think it moved forward for C++20,
and I'm not a fan of that one, I have to say.
Okay. So the idea is you can, when you have inline namespaces,
you want to be able to nest them and kind of say like,
oh, I'm in this namespace right now, but it's inline.
And so you can say, right now in C++17, I think: namespace foo::bar::baz, and then open a curly.
But you can't do that if it's inline, right?
Okay.
What this does is it allows you to put inline in there to say, oh, actually, it's inline.
First, the syntax, I think, looks kind of ugly, but it's not that bad, right?
But it's not really fixing what inline namespaces wanted to fix, is my opinion.
And the goal of inline namespaces was to be wordy, right?
So I kind of don't mind having to type more stuff when using an inline namespace, right?
Like, I agree it's convenient to type less.
But in this case, that was kind of the point.
So it's not a big deal.
At the end of the day, I just won't use that feature and
whatever. But some people really wanted it, and it got enough consensus in the language working
group that it moved forward. So that's kind of how the standard works, right? Some people think
some stuff is important. Some people think it's not as important. And you kind of want to allow
people to get the features that they want
outside of language.
If you don't like it, you don't have to use it in this case.
Right.
Well, is there anything that was
a highlight for you, JF, that we haven't
gone over yet?
Well, so one thing that I was really
enthused about is I wrote a paper about
deprecating volatile, and I
have been joking
about this for a while and finally pulled the trigger and wrote that paper. And well,
first I wrote the paper with the title deprecating volatile just to get people to click on it
because they're like, well, you what now?
Clickbait with the standards committee, in other words.
Exactly, exactly. And I think that the subtext should really be deprecating most of volatile
or deprecating the bad parts of volatile.
And so I think I made a pretty strong case,
first for why volatile is the way it is, right, in the paper,
and then for why I think we should deprecate parts of it in C++ and in C.
So I'm going to have to try to talk to the C committee about some of that volatile stuff.
Now, I think everyone agrees if you design a language
from scratch, you wouldn't design it with volatile
as it is in C++, right?
You'd have something like a volatile load and volatile
store, and volatile
non-tearing load and non-tearing store, and that
would be about it. You'd participate
in the memory model. So languages
like Rust and other languages
do stuff like that already.
So volatile has useful semantics.
I try to go through them in the paper.
I want to preserve those,
so the valid uses of volatile,
but I want to remove the ones
that are super error-prone
or that are really non-intuitive.
So for example,
if you have a volatile variable
like B and C
and you do A equals B equals C,
how many volatile loads and stores did you perform doing that?
Well, it depends whether you're in C or C++, and which version of the language you're using.
So stuff like that is just really unintuitive.
And so my paper goes through the stuff I think is important that we want to preserve, even
though it might be distasteful and we do it some other way.
Like, I don't want to break those uses, right?
What low-level code like, you know, kernels do, we want to preserve those.
And at the same time, what's unintuitive, what's usually broken when you write it,
what should you not do?
And I had a bit of fun writing the paper, too.
So I kind of quoted one of my favorite books from Patrick Rothfuss in the paper, kind of using some of his quotes.
I thought that was pretty fun. And he seemed to enjoy it on Twitter, too. That was pretty great.
So overall, the paper got really good reception from the committee. And I'm going to send an
update for the next meeting proposing actual wording changes, not just, hey, how about we
do these things? So I got really strong support from the committee for that.
That was my personal highlight, because I was pretty excited about it.
It's in my vein of trying to deprecate things out of C++
that we shouldn't have anyways.
So without breaking people's code,
I'd like to simplify the language
and bring it to look like
a more mature language.
A more well-thought-out language.
Yeah, it looks like a very mature language at the moment.
Right, right. It has all the warts.
I'd like it to be mature and pretty.
And then Bjarne was really clear.
If you've read The Design and Evolution of C++,
from early on, he put volatile in C++,
but he wasn't really happy about what was there.
People felt like they really needed it, and they did.
And so he wanted to have a solution for them
without breaking compatibility with C.
And that's really why it's there.
And then it kind of feature crept
into looking like what const looks like, right?
So into qualifiers and other stuff,
and you can qualify member functions with it,
and arguments and return values.
Like, what does it mean to have a volatile
argument? Well, for
function passing purposes, it doesn't mean anything.
The calling convention is exactly the same. It doesn't
show up in the ABI. It just means something
local to the function itself. And same thing
for a volatile return. It doesn't mean anything.
You can, in fact, have the
forward declaration not have the volatile and have
the actual function definition
have it, and it's totally valid C++. And so there's a lot of really weird stuff that volatile has
that's kind of accreted through what was perceived as consistency with const.
And I think that's kind of a misfeature that I'd like to get rid of.
And I think if we could take just a second to back up and make sure if I maybe I could try to summarize what you said.
You want to remove most of volatile, but leave some ability to do a volatile load in a volatile store.
And for our listeners who are not aware, if you can correct me if I'm wrong, please.
The only legitimate use of volatile is when you're talking directly to hardware.
So this volatile load and volatile store would be,
I need something at this address right here, please.
Well, yes, there are actually a few valid uses of volatile,
and I go through them in the paper.
So load and store are really the main thing.
It's not normative, but what the language says in notes
and in the design intent papers that accompany the language
is that if you wrote
volatile in the code, then the abstract machine should do a load or a store, right? Like that's
what should happen. And it's really hand wavy how it's specified, but we kind of all understand
what's going on, right? The idea is like you might have a register that has special meaning or a
memory location that does special things when you touch it, and you really want it to happen.
So that's the main usage that I've seen for volatile. But there's uses for something like
shared memory, where alighting a load or a store through shared memory could be a security issue.
And so I have a few examples of that in actual exploits in the paper where a hypervisor or
browser used shared memory but failed to use volatile,
and the compiler just removed the loader of the store.
And so that means that you have a time-to-check,
time-to-use bug in your code.
So you can cause a race to happen if you're on the other side of the shared memory
and you're a malicious program,
and you can cause a higher privilege process
to do something it didn't want
because it loads the size twice or something like that.
Then it's valid to use volatile
for setjmp and longjmp to preserve
values across the setjmp.
So when you longjump back, the value is still there.
And that's
important because setjmp
can return twice, which no other function
can. And so the
compiler
has to know that it has to preserve that
value across that twice-returning
function.
And then it's also valid to use for signal
handlers. So that's the other
use case for that.
Signal handlers kind of behave the same as
setjmp and longjmp in that case. It's like,
well, the value should be there.
Now, the thing to keep in mind is volatile can tear.
And volatile doesn't participate in the memory model.
It's not sequentially consistent or anything
unless you use volatile atomic.
And so it's really hard to use properly
if you're trying to do certain things, right?
And there was a feeling initially pre-C++11
that volatile could be used to make threads happen.
And in a way, it was, right?
So Microsoft's implementation of C++ on x86,
to this day, guarantees sequential consistency
for volatile loads and stores.
It might still tear on their implementation,
but it'll be sequentially consistent if it doesn't.
And if it tears, then it tears in a predictable order.
And so there's kind of a wide breadth of implementations
of volatile. And it's really,
especially when you're trying to deal with hardware, it's
really up to the
vendor of your compiler to tell you what
that specific thing means on
that specific hardware.
So yeah, it's a bit tricky
and I'd like to make that a bit less
tricky. Keep the good uses and just
get rid of the bad ones.
Okay.
All right.
Okay, well, it's been great having you on the show again, JF.
Where can people find you online?
Online?
Well, so I have my usual Twitter account
where I just tweet jokes about C++.
I don't really tweet anything serious.
People take me seriously, but I don't. tweet anything serious. People take me seriously, but I don't.
Please don't.
Otherwise, I don't know.
I participate in LLVM quite a bit.
And if they want to meet in person, I'll be at the LLVM meetup and other things like that.
So CppCon is another place where I hang out.
I try to not go to too many conferences because it's kind of hard with a kid to do that.
But I'll usually show up at CppCon and the standards meetings.
Between the standards meetings, the LLVM meetings,
there's not a lot of time left for conferences, I would imagine, besides...
Right, well, the LLVM meetings are more kind of LLVM socials in the Bay Area,
and so they're kind of once a month.
I usually go there, bring my kid.
Okay.
And it's not just LLVM people.
It's more like people interested in LLVM,
but most of the people who contribute to LLVM in the Bay Area usually show up,
like, semi-frequently.
So it's kind of neat to meet people you effectively work with
in the open source community and be able to move things forward
more easily than through email or IRC or something like that.
And because often our colleagues and other companies
also work on the Standards Committee,
we also end up kind of talking about Standards Committee-related stuff as well.
So it's kind of an interesting little group of people.
Cool.
Okay. Thanks again, JF.
Yeah, thanks for having me.
Yeah, thanks for coming.
Thanks so much for listening in as we chat about C++.
We'd love to hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested in, or if you have a suggestion for a topic; we'd love to hear about that too.
You can email all your thoughts to feedback@cppcast.com.
We'd also appreciate it if you like CppCast on Facebook and follow CppCast on Twitter.
You can also follow me at @robwirving and Jason at @lefticus on Twitter.
and Jason at Lefticus on Twitter.
We'd also like to thank all our patrons who help support the show through Patreon.
If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast.
And of course, you can find all that info and the show notes on the podcast website at cppcast.com.
Theme music for this episode was provided by podcastthemes.com.