CppCast - libunifex and std::execution
Episode Date: June 28, 2024

Jessica Wong and Ian Petersen join Timur and Phil. Ian and Jessica talk to us about libunifex and other async code projects at Meta, how it has evolved into the proposed std::execution, and what structured concurrency is.

News
Xcode 16 beta
The standard library that ships with Xcode 16 supports "hardening"
libc++ hardening modes
"What's the deal with std::type_identity?" - Raymond Chen
"C++ programmer's guide to undefined behavior: part 1 of 11" - PVS-Studio
"C++ Brain Teasers: Exercise Your Mind" - Anders Schau Knatten

Links
"std::execution" - P2300R9
"async_scope – Creating scopes for non-sequential concurrency" - P3149R3
"Notes on structured concurrency, or: Go statement considered harmful"
Folly Coro
Transcript
Episode 385 of CppCast with guests Jessica Wong and Ian Petersen, recorded 24th of June 2024.
In this episode, we talk about the Xcode 16 beta and the libc++ hardening modes,
two new C++ books, and about std type identity.
Then we are joined by Jessica Wong and Ian Petersen.
Jessica and Ian talk to us about libunifex and asynchronous code at Meta.
Welcome to episode 385 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Timur Doumler, joined by my co-host, Phil Nash.
Phil, how are you doing today?
I'm all right, Timur. How are you doing?
I'm good. I just arrived last night in St. Louis, Missouri, for the committee meeting that's going on right now. So I am actually recording this from my hotel room. So let's
see how this is going to go. I have an eight-hour jet lag. I just got here. I don't know how stable
the hotel Wi-Fi is. I also got a brand
new set of Bluetooth headphones that I have literally never used before. So what could
possibly go wrong? Yeah, you'll be fine. You'll be fine. Yeah. Oh, I also found out on my way here
yesterday, you know, when you board a flight to the US from Europe and they ask you, you know,
do you have any items from other people in your luggage, has the luggage always been with you on your way to the airport, and, the third question, do you have any new electronic items in your luggage? And actually, literally yesterday, my Bluetooth headphones that I normally use to travel stopped working, so I bought a new pair at the airport, right? And so I was like, yeah, I actually have brand new headphones here that I just bought at the airport. And they were like, okay. And it turns out, when you have a new electronic device, they basically take you into this extra room and they search all your stuff and everything. So that was interesting.
So is that just in case the headphones are bugged by a rival podcast or something?
Yeah, I don't know. But apparently that's a regulation: whenever you board a flight to the US, you can't have new electronic devices, otherwise you get an extra search.
Sure. I'll bear that in mind and only bring old tech in future.
Yeah. All right. What about you, Phil? What are you up to these days?
Oh, I wish I knew. I hate to play the sleep-deprived card again as well.
Yeah, I've been traveling for the last 10 days.
I think I've been sleeping in eight different cities in that time.
So a little bit tired, but all I have to look forward to now is running a conference next week.
Is that one of the workshops that you're doing?
What are you doing there?
Where are you going?
No, next week I'm running C++ on Sea.
All right. Yeah, I know about that one. Okay. I actually have one more thing I want to talk about briefly. I will actually take a couple of months off. I'm taking some parental leave, or you could also call it a mini sabbatical. So July and August, and probably the first half of September as well, I will not be around. I decided to take a little bit of a break: don't do any work; ideally, don't look at my laptop unless I really have to. It felt like I really need a break. It's been a lot of stuff going on. So I want to focus on my family, spend time with them, take a few months off. So I will not be around for the next few episodes. So hopefully, Phil, you can keep it running, and I'll be back in September.
And yeah, do you know already, like, are you going to stick to the two-week schedule or are you going to take a break here as well?
Are you going to have guest co-hosts? Do you know?
Well, I will hold the fort.
I mean, just to say, I think this has been a long time coming.
You deserve some time off.
Actually get some sleep, I hope.
I'm going to plan to try to stick to the two-week schedule.
And it shouldn't be too bad over the summer.
Not going to be quite as busy as last time we did this.
So I'm hopeful.
So we do have a pipeline of very exciting guests for the next few episodes.
So I'm very much looking forward to listening to those. And thanks so much, Phil, for holding the fort.
No problem.
I will be back in September.
Enjoy your time off.
All right. So thank you so much. At the top of every episode, we'd like to read a piece of feedback. We did this episode with Sean Baxter about safety and Circle a couple of episodes ago, and we got a lot of feedback after that episode.
So here's another email.
So somewhere during that episode,
when we were talking about Circle,
the borrow checker, and all of that stuff,
I made a claim that game dev
is one of those industries
that don't care quite as much about memory safety
as some other industries.
And we got an email from Justin
who disagrees with that.
And Justin wrote,
I wanted to push back on this idea from the last episode that memory safety is not a game dev concern. In fact, I'd say industry trends have increasingly made stability and safety a higher
priority. Long-running live services are becoming more and more common, and memory and security bugs
just aren't acceptable anymore. Additionally, game engines are running in all sorts of other
industries now. Actually, that's
true. We had a guest, I think, a few
months ago, Mark Gillard, who was
working on medical software that
actually is running Unreal Engine to
do medical simulations and stuff.
Yes, it's definitely a thing. Thanks,
Justin. The days where you put a
game in a box and could throw your code base away
are long gone. Safety,
correctness, and accuracy
are more important than they've ever been, and trading them for micro-optimizations is often
not worth it anymore. C++ will continue to bleed users to other languages unless safety is addressed
and game dev is not exempt from that. What's the point of trying to evolve the language if the
hard problems don't get solved? If safety doesn't get addressed, you may as well just make 23 the
last standard and move on to Rust. Thanks for the great podcast.
Well, thank you, Justin.
I am very happy that you're enjoying the podcast.
And thank you very much for setting me straight about what I said about game dev.
You're right.
It's good to know.
And thanks for that.
And that's what happens when you make a throwaway comment.
Yeah.
So we'd like to hear your thoughts about the show.
You can always reach out to us on X, Mastodon, or LinkedIn, or email us at feedback@cppcast.com. Joining us today are Jessica Wong and Ian Petersen. Jessica is a software engineer at Meta. She started learning
C++ while working on real-time backend systems and now works with Unifex, a new paradigm of
asynchronous programming in C++. Jessica is using her experience in Unifex to contribute to std execution, which will
hopefully be included in the upcoming C++26 standard.
Outside of work, Jessica enjoys traveling and experiencing the world upside down through
aerial silks.
Ian has been a software engineer at Meta since 2018, working primarily on libraries and tools
for mobile C++.
His current focus is on maintaining
and deploying Unifex.
Prior to Meta, Ian spent nine years in the Office division at Microsoft, working on Project's scheduling algorithm and Outlook's search experience.
Ian's interest in C++ is a balance
between reveling in the language's deep intricacies
and striving to write concise, correct code.
He also has fun coaching others
in the effective use of the language
and debugging his colleagues' weirdest crashes.
Jessica, Ian, welcome to the show.
Hello.
Thank you.
Hello.
Thank you.
Now, Jessica, I've actually got two questions for you
based on your bio.
Okay.
You mentioned aerial silks.
I wanted to know what they are.
But before you answer that one, the other question is,
does seeing the world upside down actually help with async workflows?
Sometimes.
It might uncover things you haven't seen before.
But yeah, no, aerial silks is if you've gone to like a Cirque du Soleil show
or a circus show, right, you've seen these like really intricate, I guess,
fabric that hangs from the ceiling.
And then you have these like acrobats that kind of just do tricks in them.
And that's, that's what aerial silks is.
And I found that's really good exercise.
So if you ever want to give that a try, I highly recommend it.
That sounds really cool. Yeah. I think I've seen a performance of that once or twice, but never tried it. But that sounds really fascinating.
You should try it.
That might help you see the world differently.
That's awesome.
All right, so Jessica, Ian,
we'll get more into your work in just a few minutes.
But before we do that,
we have a couple of news articles to talk about.
So either of you can comment on these if you like.
The first news item I have for this week is that Xcode 16 is not out yet, but the beta is out. You can download it from developer.apple.com.
And apparently there are people who use Xcode still as an IDE for C++.
So I think the proper release of Xcode 16 will probably be around September.
I think that's the usual schedule,
right? But you can download the beta now. It has some new IDE features, which I'm not going to go
into. I don't think that's the most interesting part. I think the most interesting part is that
if you use Xcode as an IDE for C++, Xcode 16 also comes with Apple Clang 16, right? So it's an
update of the compiler. So Apple Clang, they kind of have their own fork.
They also have their own version numbers.
That always confuses me.
So I think Apple Clang 16 is now actually based on proper Clang 17.
I never know how to figure out, like,
which version of which relates to which version.
If anybody knows, please let me know.
I think you just have to know.
But I think it's basically Clang 17.
So it supports a bunch of new C++23 and even some C++26 features, from both the language and the library, that you can now get out of the box on your Mac if you use the native Mac toolchain. So that's really cool.
The other thing, which is another feature that libc++ has from that version on, and that now also ships with Xcode and the Apple toolchain, is hardening.
That is super interesting.
And they made that accessible directly through Xcode.
There's a new build setting for hardening. There are four modes: none, fast, extensive, and debug.
And those are basically a form of contract checking.
So 'none' means you get no extra checks; it's whatever it used to do before. With 'fast' you get the most critical checks. That's basically container element access, and whether your input range is valid for some of the STL stuff, which now gets checked. There's a bounds check on it, and if that bounds check fails, your program terminates hard and fast, to save you from undefined behavior and other nasty things that can happen if you have that undefined behavior. That's the recommended mode for production now.
Then there's 'extensive', which enables not just the range and container checks, but all the checks that are low overhead and easy to check. And then there's the 'debug' mode, which enables all the checks that are there, including the internal asserts of the library itself, which can be quite slow, so you're not supposed to use that in production, but maybe for debugging.
So that's really, really interesting. It's kind of a superset of a subset of what we're doing for the contracts proposal for C++26. But it's really cool that you get that out of the box from one of the major compiler toolchain vendors now. So that's really cool.
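For anyone who wants to try this by hand outside Xcode, here's a minimal sketch. The macro spelling below follows the current libc++ documentation (the Xcode build setting is a front end for the same mechanism); exact names may differ between toolchain versions, so treat this as illustrative.

```cpp
// Minimal sketch: enabling libc++ hardening directly via the macro.
// Compile with, for example:
//   clang++ -std=c++20 -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST demo.cpp
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};
    // Out-of-bounds element access: classic UB with hardening off.
    // In "fast" mode and above, the bounds check fails and the program
    // terminates deterministically instead.
    return v[3];
}
```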
There's also an article on llvm.org about how this hardening actually works, with a bit more detail about the different options, what they mean, what checks you get, and all of that. So I think that's really a major step towards safer and more secure standard library implementations. So I thought that was newsworthy.
Yeah, I mean, it's been around for a while, but it ships now with Xcode, so that's kind of the news here.
Yeah, and that's the thing here: this is not really about Xcode, it's about the Apple Clang version, which, as you say, is based on Clang 17. But even if you don't use Xcode, the version that ships with the latest Xcode is usually the one that's going to ship with the next macOS release, which will also be out in September or October. So you might want to target that if you're going to be relying on having the latest OS platform in the near future. So that's significant, even if you don't use Xcode.
Do you know if these hardening levels affect how the optimizer takes advantage of UB?
It sounds like it addresses things like you won't dereference some random memory, but does it also affect things like the examples of time travel
that Raymond Chen, I think his name is,
has talked about in Old New Thing?
Oh, yeah, yeah.
So, I mean, this is the thing with contract checks in general, right?
This is just like one implementation of that.
But the thing is that if you have a check, and you know that the program is going to deterministically terminate if that check fails, which is exactly what happens here, then there can be no time-travel optimization through that. If you dereference an invalid pointer or index or something, the compiler can't just assume that you will never reach that function and optimize away the whole code path, because everything that is before the check still has to be there. Because now it's basically not UB anymore, right? Because if the check fails, you don't get to that line that does the illegal dereference.
Nice.
So, yeah, that is quite interesting. So it's basically not UB anymore to do this. This is kind of the whole point: you get deterministic termination instead.
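A tiny hand-written illustration of that argument (not from the episode): because the check terminates deterministically, the compiler can no longer reason backwards from the would-be UB.

```cpp
#include <cstdio>
#include <cstdlib>

// The side effect before the check must still happen: the check below
// terminates instead of invoking UB, so the compiler cannot assume this
// function is never called with valid == false and delete the code path.
int get(const int* p, bool valid) {
    std::puts("before the check");  // observable, cannot be optimized away
    if (!valid) {
        std::abort();               // hardened check: deterministic termination
    }
    return *p;                      // only reached when the check passed
}

int main() {
    int x = 42;
    return get(&x, true) == 42 ? 0 : 1;
}
```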
All right. So, speaking of Raymond Chen, it feels like he's releasing a new blog post every day or so. There have actually been quite a few new blog posts in the last couple of weeks, since we recorded the last episode, that were interesting, not just from Raymond but from a lot of other people. We don't have time to cover them all. I just want to quickly point out two that I found interesting.
One is the one by Raymond Chen that I noticed, which was about std::type_identity, which is a little feature that's been around since C++20.
And you wouldn't think that it's particularly interesting, but it's like this
little thing that you can wrap around a type and thus disable type deduction, for example, if you don't want type
deduction to happen
for whatever reason, because you're doing CTAD
or there are other scenarios where
you don't want that. Anyway, it's a little utility,
but that blog post had
quite a few likes on Reddit, so
I was kind of surprised like,
oh, wow, like people don't know
that this little feature exists still.
And apparently it's quite interesting.
So yeah, he's basically explaining what it's for.
It's kind of cool also for me to see
because that's a paper that I actually wrote.
So we standardized that back in C++20.
So that's kind of fun that people still care
about this little feature.
I thought that was fun.
It's one of those things that you don't know you need until you need it.
Yeah, exactly.
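For anyone who hasn't seen the trick, here's a small hand-written example (not from Raymond's post itself): wrapping a parameter type in std::type_identity_t makes it a non-deduced context, so deduction happens only from the other argument.

```cpp
#include <type_traits>

// T is deduced only from the first parameter; the second parameter is a
// non-deduced context thanks to std::type_identity_t.
template <class T>
T clamp_min(T value, std::type_identity_t<T> lo) {
    return value < lo ? lo : value;
}

int main() {
    // Without type_identity_t this call would be ambiguous: T would be
    // deduced as int from 5 and double from 2.5. Here T = int, and 2.5
    // is simply converted to int.
    return clamp_min(5, 2.5) == 5 ? 0 : 1;
}
```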
Oh, and then we have two new C++ books that
have been published just in the last few weeks. One is an e-book
written by our friends from PVS-Studio, Andrey Karpov and Dmitry Sviridkin. It's called "C++ Programmer's Guide to Undefined Behavior".
And it's an 11-part book,
and part one of that has now been published.
It deals with implicit type conversions.
It's kind of similar to some of the stuff
that the PVS Studio people have published before. It's kind of some examples of horrible code that unexpectedly compiles and does something,
and then you don't understand why it does that. And then they explain it to you. And then through
that, you learn something interesting about C++. So there's a new ebook full of that stuff.
Everything they do is usually quite high quality, so I think it's pretty cool stuff. I haven't actually read it myself yet; I just scanned through it, and it looks pretty good.
And the other new book that came out literally just now, which I thought was also newsworthy, is by Anders Schau Knatten. Well, I'm not sure how to pronounce it. I think he's Norwegian. I probably didn't pronounce his name correctly.
My name is Anders Knotten.
But he is the guy who runs cppquiz.org and organizes C++ quizzes at different conferences, and they're usually awesome.
And so his book is kind of riffing off that.
It's called C++ Brain Teasers, Exercise Your Mind.
So it's a little bit similar
to the other one. He has 25 short C++ programs, and you kind of have to guess what the output is,
which is also how his quizzes work. And then he kind of explains why that's the case and why the
language works like that. However, unlike C++Quiz.org, the book has more elaborate and
well-written explanations, according to Anders, explaining the underlying principles of the language.
And he also says that the puzzles were selected to be more cohesive
and relevant to real-world users.
And the explanations include lots of practical tips
to write better and safer code in practice.
That sounds really good.
That is available as an e-book and also actually as a physical paperback,
but only if you live in the US.
I couldn't figure out how to order a hard copy to Europe.
Maybe that's going to be possible at some point, which is ironic because I think Anders actually lives in Europe.
Yeah.
Well, Norway.
Norway, yeah.
So let's see.
But that's another book that I think sounds really interesting.
Yeah. And Anders will be at C++ on Sea, which as we record this is next week.
So still time to buy tickets and come and see him.
All right.
So that wraps up our news items for today.
So we can transition to our main topic.
So we're going to talk about libunifex and asynchronous code, which is a really exciting topic.
The idea for this started actually at the last committee meeting in Tokyo in March,
where I bumped into Jessica and we had not met before, but we kind of briefly had a chat,
the usual, like, where do you work?
What do you do?
And we were thinking for a long time about doing a CppCast episode about Unifex because
it's a very interesting library and kind of sender, receiver and asynchronous programming in general.
People struggle with it.
It's kind of very interesting, right?
And it seemed to me immediately
that Jessica would be a great person
to talk to about this
because she just seemed to know a lot about this.
And then we invited her on the show
and then Jessica said,
oh, I would like to bring along my colleague Ian
because that would be even more fun
to have both of us on the show.
So we did that. So now we have both of us on the show. So we did that.
So now we have both of you on the show. And thank you very much for coming and welcome again.
Thanks so much for having us.
Thank you. It's exciting to be here.
And I should also say this is the first time Phil and I are doing an episode with two guests at the
same time. So let's see how that goes. I think Rob and Jason did a few of those. But yeah,
for us, it's the first time. So hopefully it's going to go smoothly.
Well, we're talking about async, so it should sort itself out.
All right.
So my first question to both of you actually is,
so what is libunifex actually?
And what is the problem to which unifex is the solution?
Yeah, so unifex is an open source C++ library.
It's almost completely headers, but it does have some source files.
It's maintained by Meta. And the original problem that it solves, as far as I know, is proving that the P0443 proposal, which I think was called "A Unified Executors Proposal" or something like that, could be implemented. So the first commit to
the GitHub repo was in something like 2018 by Lewis Baker, and it's something like a thousand
lines. I think he must've been working in private before he made the first commit. So Meta joined
with the P0443 proposal midway through its life. I think that paper had something like 14 revisions.
And the first time that a Meta employee was listed as an author was around revision seven or so.
And so Lewis Baker, Eric Niebler, Lee Howes, and Kirk Shoup were employees of Meta at the time,
and they were contributing to this paper. And I think that they joined the paper to sort of steer it in a different direction.
And in order to aid their argument that that direction ought to be pursued, they implemented
Unifex as like a prototype, a proof of concept to prove that it could be implemented.
So that's sort of the practical solution that was the reason that we needed code. But the programming problem that it solves is it is a library of sort of asynchronous combinators.
It seems to me that it's similar in spirit to the ranges library, that there's lots of little small algorithms that can be composed together to solve larger problems. But the problems that it's solving is describing asynchronous and concurrent
algorithms rather than range-based algorithms. It's a fascinating algorithms library. I think the most interesting thing about it is that it brings structured concurrency to C++ in a composable way. And structured concurrency,
I think the best explanation I've seen of it is a blog post. I forget the author's name,
sorry. But the title is "Notes on structured concurrency, or: Go statement considered harmful".
And it goes into explaining how to extend Edsger Dijkstra's notes on structured programming to include concurrency in your programming model, and it goes on to explain why we're doing concurrent programming all wrong in C++. And Unifex is an implementation of a solution to that problem. So how do you structure concurrency? You introduce an invariant: no parent completes until all of its children have completed.
And here I'm talking about children are like asynchronous things that you've kicked off.
So once you enforce that invariant, a common asynchronous pattern is no longer allowed.
And that is that you put a function on the back of a thread queue to be run on that thread later.
Or you register a listener on some event source
to be invoked when an event happens.
All of these things amount to creating a divergence
in your control flow where the thing doing the registering
or adding the function to the thread
is one control flow that continues after the registration has happened.
And the code that has been registered is a new control flow that is logically a child of the registerer.
But there's no back pointer.
Once the registered child code runs, there's no join back with the parent.
So the blog post that I referred to explains that when Dijkstra talks about structured programming,
at the time, the only way to implement an if is check a condition. And then if a condition is true,
literally just jump somewhere else. That was the goto. With structured programming, any of the jumps have to come back. So for example,
if you have an if, you may or may not evaluate the body of the if depending on the value of
the condition that you check at runtime. But regardless of whether you do evaluate the body
of the if, the next thing after that is always whatever comes after the
if statement. Same thing with functions. If you invoke a function, whatever happens inside the
body of the function, magic happens. But whatever that is, once it's done, you return to the call
site. So in order to enforce this same invariant with concurrent code, you can imagine that a
function might run its body in parallel, but the function doesn't return until all of those parallel tasks have completed.
So once you introduce and enforce this invariant, you have to write your code differently.
But then you regain this abstraction that structured programming introduced, where you can sort of squint and ignore that inside a function call there is concurrency happening so it's almost
like a new control structure that you could call like fast sequential like code like you call a
function and internally it's faster because it's doing concurrency or maybe it's not faster maybe
it's just better in some other way but it's logically still a control structure that starts and then completes.
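In libunifex terms, that invariant looks roughly like this. This is a minimal hand-written sketch; the combinator names and header paths are taken from the libunifex repo, so treat it as illustrative rather than canonical.

```cpp
#include <utility>

#include <unifex/just.hpp>
#include <unifex/sync_wait.hpp>
#include <unifex/then.hpp>
#include <unifex/when_all.hpp>

int main() {
    // Two child computations, described as senders. Nothing runs yet:
    // these are just descriptions of work.
    auto child1 = unifex::then(unifex::just(20), [](int x) { return x + 1; });
    auto child2 = unifex::then(unifex::just(2), [](int x) { return x * 10; });

    // when_all joins the children, and sync_wait acts as the parent: it
    // does not complete until both children have completed, which is the
    // structured-concurrency invariant described above.
    auto result = unifex::sync_wait(
        unifex::when_all(std::move(child1), std::move(child2)));
}
```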
Right. Yeah, that makes a lot of sense. Thank you.
So I remember first coming across this idea, I think it was 2019, I think, or 2018, 2019.
We had a committee meeting, I think it was in Belfast, where that was one of the meetings where this earlier version of that proposal was discussed quite a lot.
This whole idea about executors and sender receiver and all that.
And I remember at some point, I just didn't understand it.
And I asked Kirk Shoup, who was one of the authors,
Hey, can you talk me through this?
So we ended up going to a pub and he talked to me for three hours.
He tried to talk me through how this actually works, and I remember not understanding it. Later, I think I had a conversation with somebody else who did explain it to me in a way that kind of makes sense. But can you tell me if this is roughly right? What we normally do today is we have a function, and then, for example, you want to register a callback or something. So we just say, you know, register callback, and we give it a function pointer, and then that function is going to be called on some other thread, somewhere else. And you don't do it like that anymore, right? Instead, you have something like, here's the shape, almost like a graph: what are the different streams of execution, and how do they interact with each other?
And then you kick it off separately from that.
You kind of separate like what the shape of this asynchronous problem is and like when and how and where you run it.
You kind of separate those things a little bit, like how you separate, I don't know,
like iterators, algorithms, and containers in the STL.
Is it a little bit like that?
Or is that not helpful?
So I think that's true.
But if I'm understanding you correctly,
what you've described is almost an implementation detail
of how structured concurrency is being introduced
in P2300 and in Unifex. I think the central new thing that you need to learn while you are busy
unlearning how you think you should be doing asynchrony in C++ is if you think about a function call, a regular old synchronous
function call, when you invoke a function, the stack frame, the activation frame for
the caller is effectively suspended, right?
None of those variables are actually directly usable anymore by the CPU.
You've substituted a new activation frame for the callee.
And while you're busy running the called function, there's
an activation frame that is the stack frame for the local variables of the callee. And
the callee can't access the caller. So this is sort of the structured property. You suspend
the caller, then the callee takes over and is blocking the thread. And then once it's done,
it returns and then you reactivate the caller. So the other caller. The same thing happens with
structured concurrent code, except that there's this new primitive that you don't necessarily
have to dominate the thread for the entire duration of the activation of the child.
So you start a new operation that's a child. It's a bit like a function, but instead of only being allowed to either continue computing or return, you can also suspend and say: there's something I need that's not available yet; I need to wait for it to become available to me. You can suspend, which requires moving your activation frame onto the heap, and then once whatever it is you're waiting for is available, you get resumed.
And so you can pick up where you left off.
And then once you're done, then you return to your caller.
Well, that sounds a lot like coroutines to me.
Yeah, it's very similar.
It's compatible with coroutines.
And that's kind of what makes Unifex so awesome, I guess.
So, Jessica, could you maybe talk a little bit about what the difference is between this model and the actual coroutines the way we have them in the language now? Because it seems like what you're doing is kind of different. It's more stuff on top of that, and it doesn't actually use coroutines; it's just a similar mental model. What's the difference there?
I wouldn't say there's... I mean, it is different,
but a coroutine is also like a type of sender. So that's why they're compatible in that sense. So for a lot of the sender algorithms in Unifex, you can actually co_await them. So in that sense, they're actually compatible. I think the biggest difference really is that the sender algorithms that we're proposing, such as any_sender_of, are not coroutines; they're senders. So I think you can think of it like: all coroutines are kind of a subset of senders, but not all senders are coroutines, if that makes sense.
So a bit like a plain language array is a container, but not every container is an array.
Yeah, yeah.
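Roughly what that compatibility looks like in code, as a hand-written sketch using unifex::task from the libunifex repo (treat the details as assumptions, not gospel):

```cpp
#include <unifex/just.hpp>
#include <unifex/sync_wait.hpp>
#include <unifex/task.hpp>

// A coroutine whose type, unifex::task<int>, is itself a sender...
unifex::task<int> answer() {
    // ...and whose body can co_await other senders, such as just(41).
    int x = co_await unifex::just(41);
    co_return x + 1;
}

int main() {
    // Because task<int> is a sender, it composes with sender algorithms
    // like sync_wait, which blocks until the work completes.
    auto result = unifex::sync_wait(answer());  // holds 42
}
```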
That's interesting. Yeah, I need to think about that. It's funny, because this proposal has been around since, I think, at least 2018. And every time I hear somebody explain it, there's a little detail of it that is new, and I go, oh, this is an interesting idea, I understand that. But it's very difficult to grasp the entire picture of it. There's something about it that makes that hard.
I don't know what it is.
Perhaps one way to get to the bottom of that is by example.
So how do you use Unifex at Meta?
Yeah.
So Unifex adoption at Meta was actually inspired by Folly,
specifically Folly Coro,
which is another type of C++ async library
that we use internally. But the primary use case for Folly Coro is mainly server-side development.
And so it doesn't really optimize for things like binary size and memory utilization, which are both resource limitations on smaller devices.
And so when we first introduced Unifex,
Oculus was actually really excited about it
and is one of our early adopters.
And Messenger and Instagram on both Android and iOS
also were one of our early adopters,
mainly through rsys, which is our C++ client library that mainly does calling. And actually, rsys is currently our largest production deployment of Unifex. So this is kind of where Unifex really shines: it's mostly for resource-constrained devices, bringing structured concurrency to them.
Nice. Interesting that you contrast that with server-side applications, because I think there is a move to make those more efficient as well, for lots of the same reasons, really. So I could see it being very useful there.
Yeah, there's definitely, you know, capacity issues right now. But you can definitely add as much RAM as you want, right, to solve immediate problems if you need to, whereas you don't really have that solution for mobile devices, right?
But if you are running on a server, you've got the problem of working at scale as well. So does it scale well as a solution?
You mean Folly Coro on servers, or Unifex?
I was thinking about Unifex, but just these libraries in general.
Oh, I don't know if we actually have any deployment of Unifex in server-side development, but I do think it would scale. Well, I don't know. Ian, do you know?
Yeah, I think that Unifex has benefited from, like, Lewis Baker was heavily involved in developing the coroutine library in Folly. And so Unifex is kind of his second system. He's learned from Folly, and so I think that Unifex would scale. And actually, I should step back a little bit.
The proposal P2300 has evolved and Unifex has not kept up.
And so there are some features in the proposed standard that are not in Unifex that I think could affect the scaling question.
So I think the standard library will actually scale better than Unifex does. We did have, you can look in the history of the PRs in the Unifex
GitHub, that
somebody tried to use the
static thread pool that's part
of Unifex and ran
into scaling issues. There's a
lock and a
condition variable, and somebody
discovered through profiling that signaling that
condition variable was showing up on profiles.
There was a PR that was not merged because Lewis figured out that it was going to introduce a race condition.
It was trying to address this contention issue.
I think the problem there is really that the static thread pool in Unifex is kind of,
I don't know if it's fair to call it prototype code.
It's not been hand-tuned for large scale.
If somebody was to write a work-stealing thread pool and make it model the scheduler concept,
I don't see any reason you wouldn't be able to use that within the sender-receiver model on large machines.
So I think the gap that Jessica is describing is not that Unifex isn't suitable for servers. It's that we have found that Folly Coro, while it is usable and great for using on servers, doesn't get the attention that it might need to scale down to smaller devices.
Whereas Unifex takes more advantage of static polymorphism versus the dynamic polymorphism in Folly.
And that has allowed us to specialize things for phones more effectively.
So one thing that I find really interesting,
I think most libraries that end up either being very popular
or end up in a standardization proposal or potentially both,
they have this kind of life cycle where somebody starts out solving
a very concrete problem for a concrete product.
Like, for example, we had Nils Lohmann here on the show a couple of months ago who was talking about his JSON library, right?
He was saying, okay, we had a very concrete problem.
So I wrote some code to solve that.
Then it became a library.
Then it kind of started growing.
And sometimes those libraries then eventually become standardization proposals, like with some of the boot stuff that ended up like in the standard, right?
But it seems like for Unifex, it's kind of the other way around, right?
People have tried to develop a, you know, from first principles, like a model for how to do structured, you know, concurrency, like for the standard.
And then you implemented that as just a proof of concept
and then later it developed into like something
that you can use in production code.
So it's kind of almost the other way around, right?
I find that really interesting.
Yeah.
But you mentioned that there are differences between,
like now they have kind of diverged again.
So there are differences between what we're basically discussing here
at the committee meeting, what you want to get, hopefully, into C++26,
and what the library that you're using at Meta actually does.
Can you talk a little bit more about,
are there any concrete differences that are interesting
where things have diverged for particular reasons?
The thing that comes to mind is the dependently typed senders.
So I'm not sure how to explain that if you don't already know how senders and receivers work.
Well, maybe start with "dependently typed".
Yes. Is that in the C++ template sense, or is that in the sort of functional programming sense?
So I only know enough about functional programming to be dangerous, but I think the answer is the latter.
So suppose you have a, as Jessica said earlier, you can think of a sender and a coroutine.
They're quite analogous to each other.
So if you think of a coroutine that produces, say, a vector of stuff, a vector of ints, right? One of the features of P2300 is that when you are
running the work described by a sender, you have previously connected it to a receiver to produce
an operation state. And a receiver is just the place that's going to receive the results of the
work. So the sender is a description of work. And once you start running it, it will do some
stuff and then it will produce an output. and the output is given to the receiver.
And a receiver provides something called an environment to a sender, which is sort of like an execution context.
It can include things like stop tokens or an allocator that you might use or an execution context like a thread pool or something that if you need to schedule work, you can use the scheduler. So the sender can ask the receiver for this environment and use the
results of that to change how it does its work. And one of the things you can ask it for is an
allocator. So if you want to start some work and provide, say, a pool allocator just for that work,
that's an option that you can do.
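To make the connect/start vocabulary concrete, here is a tiny self-contained model of the protocol. It's hand-rolled for illustration only; real P2300 senders also carry completion signatures, environments, stop tokens, and so on.

```cpp
#include <cstdio>
#include <utility>

// A receiver: the place where the result of the work is delivered.
struct print_receiver {
    void set_value(int v) { std::printf("got %d\n", v); }
};

// A sender: a description of work that will eventually send 42.
struct just_42_sender {
    // Connecting a sender to a receiver produces an operation state...
    template <class Receiver>
    struct operation {
        Receiver r;
        void start() { r.set_value(42); }  // ...and start() runs the work.
    };

    template <class Receiver>
    auto connect(Receiver r) {
        return operation<Receiver>{std::move(r)};
    }
};

int main() {
    auto op = just_42_sender{}.connect(print_receiver{});  // connect...
    op.start();                                            // ...then start.
}
```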
So then you might want this coroutine that's going to produce a vector of ints to produce one specialized on the allocator provided by the receiver.
And you don't actually know what that allocator is until you connect the sender to the receiver.
So you can't, I don't think you can do this with coroutines because if you were to write this as
a coroutine, you need to provide in the function signature
that this returns a task of vector of int and an allocator.
But you won't know what the allocator is
until you've invoked it, until you've started the work.
And so the return type of that coroutine
can't be specified with the current coroutine spec.
But in P2300...
Could you do some kind of polymorphic allocator type-erasure thing, something like that?
Right.
So if you wanted to use literally coroutines to do it,
you could work around the problem by saying it's a vector of int
with a PMR allocator, a PMR vector of int.
Yeah.
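That workaround looks something like this (hand-written sketch; std::pmr type-erases the allocator, so the return type no longer depends on it):

```cpp
#include <memory_resource>
#include <vector>

// The return type names one type-erased allocator, so it doesn't have to
// change when the caller picks a different allocation strategy.
std::pmr::vector<int> make_ints(std::pmr::memory_resource* mr) {
    std::pmr::vector<int> v{mr};  // allocation strategy chosen by the caller
    v.assign({1, 2, 3});
    return v;
}

int main() {
    std::pmr::monotonic_buffer_resource pool;
    auto v = make_ints(&pool);  // same function, caller-supplied allocator
    return static_cast<int>(v.size()) == 3 ? 0 : 1;
}
```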
But P2300 allows you to say,
well, the thing that I produce depends on which execution environment I run in.
And so you defer computation of the type.
So you can make the allocation strategy dependent on
whether it runs on just a thread or maybe on a GPU
or somewhere else or whatever, right?
And you don't have to make that choice when you write the algorithm.
Yes.
That is so cool.
It is super cool.
The thing about it that is exciting to me
as someone who's focused on mobile code
is that in Unifex,
you can't say whether a given sender will produce an exception
because determining that answer
depends upon the receiver that you're connected to,
because connect is allowed to throw an exception.
So when you, or no, not connect, set value.
There's various parts in the communication between the sender and receiver
that could potentially throw an exception.
And so you can't know in advance whether a sender could produce an exception,
because it might produce an exception right at the end as it tries to produce its output.
But with P2300, the way that it's evolved away from
Unifex, you can actually make the error results of a sender depend on the receiver that it is
eventually connected to. And so by the time it's got connected, you can say, ah, this receiver will
not cause me to add exceptions to my possible error output. So if you're otherwise a no-fail
sender and you're
connected to a receiver that does not introduce problems, then you can say we don't throw
exceptions. So sort of the equivalent of being able to determine at compile time, but rather
late in the process that this is effectively a no-accept sender, which in mobile code means you
get to eliminate a bunch of exception tables and reduce your binary size.
And then I remember Eric posted a Godbolt in some context.
I forget where he shared it.
That shows a function that returns a sender and the work described by that sender could be done in parallel.
If you were to run this work on a thread pool,
you could take advantage of the multiple threads.
But if you're going to run it on a single thread,
you could also do that. It would just run in
series. And he has an example showing where you have a function that returns the sender
and whether it runs in parallel or not is determined by which execution context you run it
on. So you can invoke the same function twice, take one result and run it on a thread pool,
take the other result and run it on a single thread, and it's the context in which you run that sender that determines how much parallelism you extract from the algorithm. And this, I believe, also relies on the dependently typed sender feature of P2300.
Interesting. So you mentioned P2300 quite a lot. Obviously, that's the proposal that's making its way through the standard. You're both actually active on the committee now, right? So, Jessica, we met at the last committee meeting in Tokyo, and Ian, you were here this week in St. Louis. Are you actually involved in the standardization process? Do you know where this proposal is? Is there any chance that we still get it in C++26? Do you know what the current status is? Because, you know, we wanted to get something like this into C++20, it didn't work out, and then it didn't work out with C++23.
Like, where are we now?
Yeah, so the paper Ian and I are working on is only one paper amongst the collection of papers that will be part of std::execution. So Ian and I are working on P3149, which introduces Unifex's async_scope, hopefully standardizing it in C++26. We're having regular meetings with the other authors of std::execution, like Eric Niebler, Lewis Baker, Kirk Shoup, and some others as well. And I think we're making good progress.
So far, we do seem to be on track.
So hopefully we will get this in C++ 26.
That is really exciting.
But don't quote me on that if that doesn't happen.
I mean, this has happened before, right?
That you were really excited about getting something in
and then at the last minute it was like,
oh, now there's this other issue we have to figure out
and we then have to also review all the wording for this
and we don't have time.
Yes, yes.
And it's a massive paper.
Yeah, yeah.
But I keep my fingers crossed.
This is really exciting.
And it seems like it unlocks whole new paradigms for how to write asynchronous code. So yeah, it would be really cool to have that just out of the box in C++. So good luck with that. I have to admit I haven't been following it very closely. I kind of dive in a little bit from time to time and try to understand, okay, what's going on there, but I don't really closely follow it; there's just too much going on in the committee. But I keep my fingers crossed for you.
Oh, thank you.
I've been following it even less, but
I seem to remember, in the C++23 timeframe, that at the start of a new standard there's always a document, well, there has been for a while anyway, you know, the bold plan for version X of C++. And there was a prioritization of std::execution for C++23. Clearly that wasn't quite enough. Has that been re-expressed for C++26? Because I haven't looked at that document. Is it part of the bold plan?
Yeah.
There's definitely a lot of push from the organizers to get P2300 into C++26. Ian and I have both experienced a lot of urgency from the organizers about making sure we have our paper ready, and making sure we help get P2300 into C++26. So yeah, there's definitely a lot of urgency and push there to make sure this happens.
So I'm just looking this up. You mean this paper by Ville Voutilainen, right? "To boldly suggest an overall plan for C++26".
So it mentions four kind of major features
that we should make progress on in this cycle.
And execution is actually the first one that's being listed.
It's even ahead of reflection.
It says like execution reflection contracts pattern matching.
So execution is listed even ahead of reflection.
That's a good sign.
So yeah, we are, I mean, that paper is just a suggestion, right?
Like at the end of the day, you know,
we are working our way through the stuff and what gets in, gets in.
But like my impression also,
just from the volume of emails that have been going on,
like it's a very high priority thing
on the committee right now.
I think there's more work being done on this
than on reflection right now, probably.
Obviously, reflection is also making progress
and everybody's excited about reflection.
But yeah, there's also a lot of focus on execution as well.
So there's actually another related topic
that I wanted to mention before we wrap up this episode.
We have a little bit of time left, I think, to talk about it,
which is not writing asynchronous code,
which can be very difficult,
and there are different paradigms to do it
and different ways to do it,
and you're working on one such way.
and you're working on one such way. But it's also, once you have that code, you need to maintain it, you need to debug it. And debugging asynchronous code is something that can be very, very hard, and I'm not aware of really good tools for that. So I was wondering if the two of you have anything you can share on that. Like, how do you deal with asynchronous code at Meta that you have to debug, or look into what's going on? Do you have any particular tools for that?
Or does Unifex actually help with that?
Does kind of structured concurrency
make it easier?
And anything, any advice
you can give to people
who have this problem themselves?
So I think that Unifex,
it makes it both better and worse. I think the way that it
makes it better is that it's a, I think it's a common experience when debugging asynchronous
code that, you know, if you're looking at a crash or whatever, or if you're stopped in the debugger
and you're looking at the state of the system. Often the current stack is, you know,
several frames that you're interested in.
It's code that you've written
that is whatever business logic
that you're trying to understand.
And then you go back several frames
and then there's like the work loop of some executor,
some thread loop that's just spinning,
taking work off a queue.
And then going back further from there,
you get to like your pthreads library,
whatever, having spawned the thread.
And that's it.
And there's no state that you can observe in the debugger that explains to you why that particular piece of logic that you're interested in is running at this moment.
There's no equivalent to a return address that shows you the chain of execution that led to this asynchronous work starting. And structured concurrency brings with it the necessary sort of foundation to solve that problem
because the invariant that we mentioned earlier in the episode that no parent completes until
all of its children have completed means that there is a return address somewhere.
There is the parent that's waiting for this work to finish somewhere in the system.
Now, the existing tools don't make that parent obvious,
but at least it's somewhere there in memory.
And so you could build tools that use the pointers that you've left lying around
to go find that parent.
And you could build what amounts to a stack trace,
except that instead of it showing what the current thread is doing,
and it started because you launched a thread some while ago, you can instead show what the
current task is doing, where a task is whatever asynchronous unit of work you've launched.
So structured concurrency brings this ability to the table. And if you build the tools,
then you can all of a sudden see
what your asynchronous code is doing
as if it was a synchronous function.
And Folly Coro actually has the tooling available now.
This is something that Lewis Baker did
while he was still at Meta.
If you're using the coroutine type Folly Coro task
to express your asynchronous work,
then while you push and pop coroutines in your overall asynchronous task,
in the background, the library is busy pushing and popping an asynchronous stack
that is an in-memory manually managed data structure
that represents the stack of that work.
And the library also ships with...
I know that the integration with LLDB works.
I'm not sure if GDB integration also works,
but it's intended to.
You can actually inspect that stack
with debugger integration.
There's a Python script
that knows how to look at the memory,
find this data structure
and dump an asynchronous stack
to your debugger console.
And I'm working on bringing that to Unifex as well.
My plan is to just take that library and suck it into Unifex
and augment all of the sender algorithms with async stack management
so that once the integration is done, you'll be able to do the same thing
and see the asynchronous stack at any point in a debugger
or if you collect crash dumps and teach your crash dump parser how to find this information,
you could also get the same thing out of crashes. So with the right tooling, it's significantly
better because you all of a sudden have this new understanding. As of today, Unifex does not have this support.
And so if you spend a lot of time and effort learning how to read Unifex's stack traces,
then you can learn to interpret them, which I think does make debugging better than what you have with just a callbacks-based asynchronous
model.
But I will freely admit that that learning process
is arduous.
One of the ways that both Unifex and P2300 express
this structured concurrency is you end up building a type,
a very deeply nested template type,
where the async stack is encoded in the type.
So I have seen crash dumps where the symbol for a single frame,
just the symbol for that one frame is 11 kilobytes,
11,000 bytes just for the name of a function on the stack.
So parsing that, you have to run the frame through clang-format just to be able to see
with levels of indentation, what the heck is this?
Once you do that, if you know how to read what those symbols mean, you can figure out
like, oh, I know where this came from and what state I'm in.
But it's difficult to understand. So right now it's
better and worse.
So when you're looking at these stack
traces, are you saying, I don't even
see the matrix anymore?
No,
I'm not there yet.
I'm at a
point where I can
read the matrix.
Let me put it that way.
Okay.
But that's very specific to LibUnifex in this case.
Do you have any more general advice
for anybody working in async code in general?
Or is it always going to be very specific
to the library or language feature that you're basing on?
Well, I mean, I think once structured concurrency is accessible to people and the tooling catches
up, then you won't need special advice, right? I think that's an aspect of the main selling point
of structured concurrency is that the structure of your program, even though it's an asynchronous one, is embedded in the runtime state.
If you are not benefiting from structured concurrency, then I'm just as stuck as you are.
So I do have actually one more question here because, you know, we mentioned that, you know, structured concurrency will eventually take over the world.
So I have written asynchronous code before. I used to do lots of audio programming, in particular the low-level glue code that you need to make that work, like when I was working on the JUCE framework, for example, or earlier at Native Instruments, where you interact with a lot of APIs that are just callback-based, where you pass around function pointers.
For example, you want to, you know,
make some sound and there's like some kind of low level OS API that gives you say, okay, here you can register a callback.
That's where you put in your like processing function.
And then whenever the audio interface calls that on some other thread that is
somewhere else, then that's where you get your processing done.
Or, you know, if the, if the user yanks out the headphone plug or whatever,
you get another callback that the configuration has changed
on whatever thread that you don't control.
There's a lot of stuff where that is just very unstructured, I would say, asynchronous programming. But where that comes from, that's just how the operating system API works.
So that's just how this library works.
So that's how we've been doing it for the last 20 years.
So we're going to continue doing it this way, right?
I think, especially in audio, I don't know about other domains, but in audio this thinking is quite pervasive.
So I wonder if there's any advice or anything you can say to somebody who lives in that world. Like, should we all migrate to structured concurrency? Is Unifex something you could use in other domains, other companies, other places? Or other libraries that people are going to write that kind of make that better? Where do you see the future of all this, beyond Meta and the more specific use cases that you're working on?
So I think callbacks are always going to be like a thing, right?
But Unifex has a bunch of algorithms that kind of help make that a little bit easier. Like unifex::create can take your callback and convert it into a sender that you could just use.
So then it kind of brings you some level of structure, right?
But there's still, you know,
that boundary between structured and unstructured code.
So I definitely think outside of meta,
Unifex has applications for that.
I know that NVIDIA has their own version of this called stdexec, right?
You can kind of play around with that if you'd like,
but there's definitely, you know,
Unifex is there for applications
beyond just structured code, right?
It was really helpful when we were actually trying to convert unstructured code to structured code. And the async_scope paper that we're proposing is kind of one of the algorithms that is the glue that makes that conversion a lot easier.
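For a flavor of that glue, here is a hand-written sketch in the spirit of unifex::create. The callback API read_async is hypothetical (a stand-in so the sketch is self-contained), and the exact create<> contract is documented in the unifex repo; treat this as illustration, not production code.

```cpp
#include <functional>

#include <unifex/create.hpp>
#include <unifex/sync_wait.hpp>

// Hypothetical callback-based API we don't control, with a stand-in
// definition that invokes the callback synchronously.
void read_async(std::function<void(int)> on_done) { on_done(42); }

// Wrap the unstructured callback in a sender with unifex::create.
auto read_sender() {
    return unifex::create<int>([](auto& ctx) {
        read_async([&ctx](int result) {
            ctx.set_value(result);  // deliver the result into the sender world
        });
    });
}

int main() {
    // The callback is now a structured child: sync_wait, the parent,
    // does not return until the callback has delivered its value.
    auto value = unifex::sync_wait(read_sender());
}
```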
I will say that the ramp-up is pretty steep, I'm sure you've noticed, right? In terms of just talking about senders, receivers, and schedulers and all that stuff. Even internally, when we were working with the rsys engineers, the ramp-up was pretty steep. And that was a large challenge for everyone, for all of us involved, right? None of us had really ever used it before; all these concepts were really, really new to all of us. So we learned a lot by making mistakes. And I think we're kind of at a place now where we kind of know what we're doing. But I do think it is worth it in the end. But it will take, as Ian mentioned, a lot of time and pain to get there.
Yeah.
And I think that in the long, long run, you should be able to push that boundary that Jessica's talking about down to very low levels.
Kirk Shoup built, I think he called it a cyclotron.
And I don't know if it's literally a cyclotron or if it's like a simulator on some kind of one of those, like an Arduino or a Raspberry Pi or something.
It's one of those very small computing devices.
And he mapped the interrupt service vectors to the sender receiver model. And so fundamentally, an interrupt service vector,
that's the CPU invoking a callback that you've previously registered.
So at that level, you're just stuck with whatever the API,
the chip vendor has given you.
But like Jessica mentioned, Unifex has this create algorithm
that can adapt those kinds of things.
I don't know how, I haven't seen the code,
so I don't know how Kirk did it,
but you can map callbacks to senders pretty concisely.
And so he was able to go all the way down to the metal and write, I don't know if he would consider
whether he's written an operating system
or if it's an application that runs directly on the CPU,
but it is written in sender/receiver-style code.
And you may want to talk to Ben Deane.
I forget who else he's working with.
I think he's got a partner working on the library together.
They've got a publicly accessible implementation of sender-receiver,
and I think it's called Bare Metal Sender-Receiver.
And Ben said to me once that the code that he works on
runs on the power supply of your computer.
And so it never actually stops running
because it's the thing that responds to the power button or something like that
And he's using sender/receiver to do that level of code. So I think what I would say to Timur's question is: I hope that sender/receiver, well, not specifically sender/receiver, I hope structured concurrency takes over the world.
Sounds like a future
I would like to live in.
Nope, I don't
have a future.
So you've both been forced to become experts at async programming. But has that left you enough time for anything else in the world of C++ that you find particularly interesting or exciting?
I'm excited for modules. I know that's already out, but we don't actually have it at Meta yet.
Ah, it's more out than execution.
But we don't have it yet at Meta. So I'm really, really excited for when we actually get that. I think that will help deduplicate a lot of header code.
Yeah. Yeah, I'm curious to see what's going to happen with... there's various papers floating around related to making exceptions more efficient. I'm curious to see what happens in that space.
Yes.
I think reflection is interesting.
Seems like it's a long ways off.
Well, maybe not.
Well, actually, people are saying you might get it in C++26.
Oh, that's exciting.
Yeah, yeah.
Yeah, it's making good progress.
It's been making good progress very recently; this year there has been a lot of progress. I think there's been a lot of progress all this time under the hood, but during the last half a year or so it kind of resurfaced, and people are more aware of it, and the paper is quite mature.
Oh, the other thing that I'm really curious to see where it goes, but I think it's early days yet, is the response to memory safety being a concern.
I'm interested to see how C++ evolves in sort of contention with languages like Rust.
Yeah, I mean, we had quite a few episodes on this topic.
Different people have very different opinions on this, so I guess you'll just have to wait and see.
Yeah.
All right.
So we're again over time.
So I think we have to wrap up,
but this was a fascinating discussion.
Thank you both very much for coming on the show.
I learned a lot about structured concurrency
and asynchronous programming and LibUnifix.
And I wish you all the best for your proposal. I hope it goes through and gets into the upcoming standard. And yeah, thanks again for being on our show. It was a pleasure to have you on.
Thanks so much for having us.
Thanks for having us. This was fun.
Yeah, this was great.
Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested in, or if you have a suggestion for a guest or topic; we'd love to hear about that too. You can email all your thoughts to feedback@cppcast.com. We'd also appreciate
it if you can follow CppCast on Twitter or Mastodon. You can also follow me and Phil
individually on Twitter or Mastodon. All those links, as well as the show notes, can be found on the podcast website at cppcast.com.
The theme music for this episode was provided by podcastthemes.com.