CppCast - libunifex and std::execution
Episode Date: June 28, 2024

Jessica Wong and Ian Petersen join Timur and Phil. Ian and Jessica talk to us about libunifex and other async code projects at Meta, how it has evolved into the proposed std::execution, and what structured concurrency is.

News
Xcode 16 beta
The standard library that ships with Xcode 16 supports "hardening"
libc++ hardening modes
"What's the deal with std::type_identity?" - Raymond Chen
"C++ programmer's guide to undefined behavior: part 1 of 11" - PVS-Studio
"C++ Brain Teasers: Exercise Your Mind" - Anders Schau Knatten

Links
"std::execution" - P2300R9
"async_scope – Creating scopes for non-sequential concurrency" - P3149R3
"Notes on structured concurrency, or: Go statement considered harmful"
Folly Coro
Transcript
Episode 385 of CppCast with guests Jessica Wong and Ian Petersen, recorded 24th of June 2024.
In this episode, we talk about the Xcode 16 beta and the libc++ hardening modes,
two new C++ books, and about std type identity.
Then we are joined by Jessica Wong and Ian Petersen.
Jessica and Ian talk to us about libunifex and asynchronous code at Meta.
Welcome to episode 385 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Timur Doumler, joined by my co-host, Phil Nash.
Phil, how are you doing today?
I'm all right, Timur. How are you doing?
I'm good. I just arrived last night in St. Louis, Missouri, for the committee meeting that's going on right now. So I am actually recording this from my hotel room. So let's
see how this is going to go. I have an eight-hour jet lag. I just got here. I don't know how stable
the hotel Wi-Fi is. I also got a brand
new set of Bluetooth headphones that I have literally never used before. So what could
possibly go wrong? Yeah, you'll be fine. You'll be fine. Yeah. Oh, I also found out on my way here
yesterday, you know, when you board a flight to the US from Europe and they ask you, you know,
do you have any items from other people in your luggage, has the luggage always been with you on your way to the airport, and, the third question, do you have any new electronic items in your luggage? And actually, literally yesterday, my Bluetooth headphones that I normally use to travel stopped working, so I bought a new pair at the airport, right? And so I was like, yeah, I actually have brand new headphones here that I just bought at the airport. And they were like, okay. And it turns out, when you have a new electronic device, they basically take you into this extra room and they search all your stuff and everything. So that was interesting.
So is that just in case the headphones are bugged by a rival podcast or something?
Yeah, I don't know. But apparently that's a regulation: whenever you board a flight to the US, you can't have new electronic devices, otherwise you get an extra search.
Sure. I'll bear that in mind and only bring old tech in future.
Yeah. All right. What about you, Phil? What are you up to these days?
Oh, I wish I knew. I hate to play the sleep-deprived card again as well.
Yeah, I've been traveling for the last 10 days.
I think I've been sleeping in eight different cities in that time.
So a little bit tired, but all I have to look forward to now is running a conference next week.
Is that one of the workshops that you're doing?
What are you doing there?
Where are you going?
No, next week I'm running C++ on Sea.
All right. Yeah, I know about that one. Okay. I actually have one more thing I want to talk about briefly. I will actually take a couple of months off. I'm taking some parental leave, or you could also call it a mini sabbatical. So July and August, and probably the first half of September as well, I will not be around. I decided to take a little bit of a break: don't do any work; ideally, don't look at my laptop unless I really have to. It felt like I really need a break. It's been a lot of stuff going on. So I want to focus on my family, spend time with them, take a few months off. So I will not be around for the next few episodes. So hopefully, Phil, you can keep it running, and I'll be back in September.
And yeah, do you know already, like, are you going to stick to the two-week schedule or are you going to take a break here as well?
Are you going to have guest co-hosts? Do you know?
Well, I will hold the fort.
I mean, just to say, I think this has been a long time coming.
You deserve some time off.
Actually get some sleep, I hope.
I'm going to plan to try to stick to the two-week schedule.
And it shouldn't be too bad over the summer.
Not going to be quite as busy as last time we did this.
So I'm hopeful.
So we do have a pipeline of very exciting guests for the next few episodes.
So I'm very much looking forward to listening to those. And thanks so much, Phil, for holding the fort.
No problem.
I will be back in September.
Enjoy your time off.
All right. So thank you so much. At the top of every episode, we'd like to read a piece of feedback. We did this episode with Sean Baxter about safety and Circle a couple of episodes ago, and we got a lot of feedback after that episode.
So here's another email.
So somewhere during that episode,
when we were talking about Circle,
the borrow checker, and all of that stuff,
I made a claim that game dev
is one of those industries
that don't care quite as much about memory safety
as some other industries.
And we got an email from Justin
who disagrees with that.
And Justin wrote,
I wanted to push back on this idea from the last episode that memory safety is not a game dev concern. In fact, I'd say industry trends have increasingly made stability and safety a higher
priority. Long-running live services are becoming more and more common, and memory and security bugs
just aren't acceptable anymore. Additionally, game engines are running in all sorts of other
industries now. Actually, that's
true. We had a guest, I think, a few
months ago, Mark Gillard, who was
working on medical software that
actually is running Unreal Engine to
do medical simulations and stuff.
Yes, it's definitely a thing. Thanks,
Justin. The days where you put a
game in a box and could throw your code base away
are long gone. Safety,
correctness, and accuracy
are more important than they've ever been, and trading them for micro-optimizations is often
not worth it anymore. C++ will continue to bleed users to other languages unless safety is addressed
and game dev is not exempt from that. What's the point of trying to evolve the language if the
hard problems don't get solved? If safety doesn't get addressed, you may as well just make 23 the
last standard and move on to Rust. Thanks for the great podcast.
Well, thank you, Justin.
I am very happy that you're enjoying the podcast.
And thank you very much for setting me straight about what I said about game dev.
You're right.
It's good to know.
And thanks for that.
And that's what happens when you make a throwaway comment.
Yeah.
So we'd like to hear your thoughts about the show.
You can always reach out to us on X, Mastodon, or LinkedIn, or email us at feedback@cppcast.com. Joining us today are Jessica Wong and Ian Petersen. Jessica is a software engineer at Meta. She started learning
C++ while working on real-time backend systems and now works with Unifex, a new paradigm of
asynchronous programming in C++. Jessica is using her experience in Unifex to contribute to std execution, which will
hopefully be included in the upcoming C++26 standard.
Outside of work, Jessica enjoys traveling and experiencing the world upside down through
aerial silks.
Ian has been a software engineer at Meta since 2018, working primarily on libraries and tools
for mobile C++.
His current focus is on maintaining
and deploying Unifex.
Prior to Meta, Ian spent nine years in the Office division at Microsoft, working on Project's scheduling algorithm and Outlook's search experience.
Ian's interest in C++ is a balance
between reveling in the language's deep intricacies
and striving to write concise, correct code.
He also has fun coaching others
in the effective use of the language
and debugging his colleagues' weirdest crashes.
Jessica, Ian, welcome to the show.
Hello.
Thank you.
Hello.
Thank you.
Now, Jessica, I've actually got two questions for you
based on your bio.
Okay.
You mentioned aerial silks.
I wanted to know what they are.
But before you answer that one, the other question is,
does seeing the world upside down actually help with async workflows?
Sometimes.
It might uncover things you haven't seen before.
But yeah, no, aerial silks is if you've gone to like a Cirque du Soleil show
or a circus show, right, you've seen these like really intricate, I guess,
fabric that hangs from the ceiling.
And then you have these like acrobats that kind of just do tricks in them.
And that's, that's what aerial silks is.
And I found that's really good exercise.
So if you ever want to give that a try, I highly recommend it.
That sounds really cool. Yeah. I think I've seen a performance of that once or twice, but never tried it. But that sounds really fascinating.
You should try it.
That might help you see the world differently.
That's awesome.
All right, so Jessica, Ian,
we'll get more into your work in just a few minutes.
But before we do that,
we have a couple of news articles to talk about.
So either of you can comment on these if you like.
The first news item I have for this week is that Xcode 16 is not out yet, but the beta is out. You can download it from developer.apple.com.
And apparently there are people who use Xcode still as an IDE for C++.
So I think the proper release of Xcode 16 will probably be around September.
I think that's the usual schedule,
right? But you can download the beta now. It has some new IDE features, which I'm not going to go
into. I don't think that's the most interesting part. I think the most interesting part is that
if you use Xcode as an IDE for C++, Xcode 16 also comes with Apple Clang 16, right? So it's an
update of the compiler. So Apple Clang, they kind of have their own fork.
They also have their own version numbers.
That always confuses me.
So I think Apple Clang 16 is now actually based on proper Clang 17.
I never know how to figure out, like,
which version of which relates to which version.
If anybody knows, please let me know.
I think you just have to know.
But I think it's basically Clang 17.
So it supports a bunch of new C++23 and even some C++26 features, from both the language and the library, that you can now get out of the box on your Mac if you use the native Mac toolchain. So that's really cool.
The other thing, which is another feature that libc++ has from that version on, and that now also ships with Xcode and the Apple toolchain, is hardening.
That is super interesting.
And they made that accessible directly through Xcode.
There's a new build setting for hardening. There are four modes: none, fast, extensive, and debug.
And those are basically a form of contract checking.
So 'none' means you get no extra checks; it's whatever it used to do before. With 'fast' you get the most critical checks. That's basically container element access, and whether your input range is valid for some of the STL stuff, which now gets checked. There's a bounds check on it, and if that bounds check fails, your program terminates hard and fast, to save you from undefined behavior and other nasty things that can happen if you have that undefined behavior. That's the recommended mode for production now.
Then there's 'extensive', which enables not just the range and container checks, but all the checks that are low overhead and easy to check. And then there's the 'debug' mode, which enables all the checks that are there, including the internal asserts of the library itself, which can be quite slow, so you're not supposed to use that in production, but maybe for debugging.
So that's really, really interesting. It's kind of a superset of a subset of what we're doing for the contracts proposal for C++26. But it's really cool that you get that out of the box from one of the major compiler toolchain vendors now. So that's really cool.
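For anyone who wants to try this by hand outside Xcode, here's a minimal sketch. The macro spelling below follows the current libc++ documentation (the Xcode build setting is a front end for the same mechanism); exact names may differ between toolchain versions, so treat this as illustrative.

```cpp
// Minimal sketch: enabling libc++ hardening directly via the macro.
// Compile with, for example:
//   clang++ -std=c++20 -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST demo.cpp
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};
    // Out-of-bounds element access: classic UB with hardening off.
    // In "fast" mode and above, the bounds check fails and the program
    // terminates deterministically instead.
    return v[3];
}
```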
There's also an article on llvm.org about how this hardening actually works, with a bit more detail about the different options, what they mean, what checks you get, and all of that. So I think that's really a major step towards safer and more secure standard library implementations. So I thought that was newsworthy.
Yeah, I mean, it's been around for a while, but it ships now with Xcode, so that's kind of the news here.
Yeah, and that's the thing here: this is not really about Xcode, it's about the Apple Clang version, which, as you say, is based on Clang 17. But even if you don't use Xcode, the version that ships with the latest Xcode is usually the one that's going to ship with the next macOS release, which will also be out in September or October. So you might want to target that if you're going to be relying on having the latest OS platform in the near future. So that's significant, even if you don't use Xcode.
Do you know if these hardening levels affect how the optimizer takes advantage of UB?
It sounds like it addresses things like you won't dereference some random memory, but does it also affect things like the examples of time travel
that Raymond Chen, I think his name is,
has talked about in Old New Thing?
Oh, yeah, yeah.
So, I mean, this is the thing with contract checks in general, right?
This is just like one implementation of that.
But the thing is that if you have a check, and you know that the program is going to deterministically terminate if that check fails, which is exactly what happens here, then there can be no time-travel optimization through that. If you dereference an invalid pointer or index or something, the compiler can't just assume that you will never reach that function and optimize away the whole code path, because everything that is before the check still has to be there. Because now it's basically not UB anymore, right? Because if the check fails, you don't get to that line that does the illegal dereference.
Nice.
So, yeah, that is quite interesting. So it's basically not UB anymore to do this. This is kind of the whole point: you get deterministic termination instead.
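A tiny hand-written illustration of that argument (not from the episode): because the check terminates deterministically, the compiler can no longer reason backwards from the would-be UB.

```cpp
#include <cstdio>
#include <cstdlib>

// The side effect before the check must still happen: the check below
// terminates instead of invoking UB, so the compiler cannot assume this
// function is never called with valid == false and delete the code path.
int get(const int* p, bool valid) {
    std::puts("before the check");  // observable, cannot be optimized away
    if (!valid) {
        std::abort();               // hardened check: deterministic termination
    }
    return *p;                      // only reached when the check passed
}

int main() {
    int x = 42;
    return get(&x, true) == 42 ? 0 : 1;
}
```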
All right. So, speaking of Raymond Chen, it feels like he's releasing a new blog post every day or so. There have actually been quite a few new blog posts in the last couple of weeks, since we recorded the last episode, that were interesting, not just from Raymond but from a lot of other people. We don't have time to cover them all. I just want to quickly point out two that I found interesting.
One is the one by Raymond Chen that I noticed, which was about std::type_identity, which is a little feature that's been around since C++20.
And you wouldn't think that it's particularly interesting, but it's like this
little thing that you can wrap around a type and thus disable type deduction, for example, if you don't want type
deduction to happen
for whatever reason, because you're doing CTAD
or there are other scenarios where
you don't want that. Anyway, it's a little utility,
but that blog post had
quite a few likes on Reddit, so
I was kind of surprised like,
oh, wow, like people don't know
that this little feature exists still.
And apparently it's quite interesting.
So yeah, he's basically explaining what it's for.
It's kind of cool also for me to see
because that's a paper that I actually wrote.
So we standardized that back in C++20.
So that's kind of fun that people still care
about this little feature.
I thought that was fun.
It's one of those things that you don't know you need until you need it.
Yeah, exactly.
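For anyone who hasn't seen the trick, here's a small hand-written example (not from Raymond's post itself): wrapping a parameter type in std::type_identity_t makes it a non-deduced context, so deduction happens only from the other argument.

```cpp
#include <type_traits>

// T is deduced only from the first parameter; the second parameter is a
// non-deduced context thanks to std::type_identity_t.
template <class T>
T clamp_min(T value, std::type_identity_t<T> lo) {
    return value < lo ? lo : value;
}

int main() {
    // Without type_identity_t this call would be ambiguous: T would be
    // deduced as int from 5 and double from 2.5. Here T = int, and 2.5
    // is simply converted to int.
    return clamp_min(5, 2.5) == 5 ? 0 : 1;
}
```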
Oh, and then we have two new C++ books that
have been published just in the last few weeks. One is an e-book
written by our friends from PVS-Studio, Andrey Karpov and Dmitry Sviridkin. It's called "C++ Programmer's Guide to Undefined Behavior".
And it's an 11-part book,
and part one of that has now been published.
It deals with implicit type conversions.
It's kind of similar to some of the stuff
that the PVS Studio people have published before. It's kind of some examples of horrible code that unexpectedly compiles and does something,
and then you don't understand why it does that. And then they explain it to you. And then through
that, you learn something interesting about C++. So there's a new ebook full of that stuff.
Everything they do is usually quite high quality, so I think it's pretty cool stuff. I haven't actually read it myself yet; I just scanned through it, and it looks pretty good.
And the other new book that came out literally just now, which I thought was also newsworthy, is by Anders Schau Knatten. Well, I'm not sure how to pronounce it. I think he's Norwegian. I probably didn't pronounce his name correctly.
My name is Anders Knotten.
But he is the guy who runs cppquiz.org and organizes C++ quizzes at different conferences, and they're usually awesome.
And so his book is kind of riffing off that.
It's called C++ Brain Teasers, Exercise Your Mind.
So it's a little bit similar
to the other one. He has 25 short C++ programs, and you kind of have to guess what the output is,
which is also how his quizzes work. And then he kind of explains why that's the case and why the
language works like that. However, unlike C++Quiz.org, the book has more elaborate and
well-written explanations, according to Anders, explaining the underlying principles of the language.
And he also says that the puzzles were selected to be more cohesive
and relevant to real-world users.
And the explanations include lots of practical tips
to write better and safer code in practice.
That sounds really good.
That is available as an e-book and also actually as a physical paperback,
but only if you live in the US.
I couldn't figure out how to order a hard copy to Europe.
Maybe that's going to be possible at some point, which is ironic because I think Anders actually lives in Europe.
Yeah.
Well, Norway.
Norway, yeah.
So let's see.
But that's another book that I think sounds really interesting.
Yeah. And Anders will be at C++ on Sea, which as we record this is next week.
So still time to buy tickets and come and see him.
All right.
So that wraps up our news items for today.
So we can transition to our main topic.
So we're going to talk about libunifex and asynchronous code, which is a really exciting topic.
The idea for this started actually at the last committee meeting in Tokyo in March,
where I bumped into Jessica and we had not met before, but we kind of briefly had a chat,
the usual, like, where do you work?
What do you do?
And we were thinking for a long time about doing a CppCast episode about Unifex because
it's a very interesting library and kind of sender, receiver and asynchronous programming in general.
People struggle with it.
It's kind of very interesting, right?
And it seemed to me immediately
that Jessica would be a great person
to talk to about this
because she just seemed to know a lot about this.
And then we invited her on the show
and then Jessica said,
oh, I would like to bring along my colleague Ian
because that would be even more fun
to have both of us on the show.
So we did that. So now we have both of us on the show. So we did that.
So now we have both of you on the show. And thank you very much for coming and welcome again.
Thanks so much for having us.
Thank you. It's exciting to be here.
And I should also say this is the first time Phil and I are doing an episode with two guests at the
same time. So let's see how that goes. I think Rob and Jason did a few of those. But yeah,
for us, it's the first time. So hopefully it's going to go smoothly.
Well, we're talking about async, so it should sort itself out.
All right.
So my first question to both of you actually is,
so what is libunifex actually?
And what is the problem to which unifex is the solution?
Yeah, so unifex is an open source C++ library.
It's almost completely headers, but it does have some source files.
It's maintained by Meta. And the original problem that it solves, as far as I know, is proving that the P0443 proposal, which I think was called "A Unified Executors Proposal" or something like that, could be implemented. So the first commit to
the GitHub repo was in something like 2018 by Lewis Baker, and it's something like a thousand
lines. I think he must've been working in private before he made the first commit. So Meta joined
with the P0443 proposal midway through its life. I think that paper had something like 14 revisions.
And the first time that a Meta employee was listed as an author was around revision seven or so.
And so Lewis Baker, Eric Niebler, Lee Howes, and Kirk Shoup were employees of Meta at the time,
and they were contributing to this paper. And I think that they joined the paper to sort of steer it in a different direction.
And in order to aid their argument that that direction ought to be pursued, they implemented
Unifex as like a prototype, a proof of concept to prove that it could be implemented.
So that's sort of the practical solution that was the reason that we needed code. But the programming problem that it solves is it is a library of sort of asynchronous combinators.
It seems to me that it's similar in spirit to the ranges library, that there's lots of little small algorithms that can be composed together to solve larger problems. But the problems that it's solving is describing asynchronous and concurrent
algorithms rather than range-based algorithms. It's a fascinating algorithms library. I think the most interesting thing about it is that it brings structured concurrency to C++ in a composable way. And structured concurrency,
I think the best explanation I've seen of it is a blog post. I forget the author's name,
sorry. But the title is "Notes on structured concurrency, or: Go statement considered harmful".
And it goes into explaining how to extend Edsger Dijkstra's notes on structured programming to include concurrency in your programming model, and it goes on to explain why we're doing concurrent programming all wrong in C++. And Unifex is an implementation of a solution to that problem. So how do you structure concurrency? You introduce an invariant: no parent completes until all of its children have completed.
And here I'm talking about children are like asynchronous things that you've kicked off.
So once you enforce that invariant, a common asynchronous pattern is no longer allowed.
And that is that you put a function on the back of a thread queue to be run on that thread later.
Or you register a listener on some event source
to be invoked when an event happens.
All of these things amount to creating a divergence
in your control flow where the thing doing the registering
or adding the function to the thread
is one control flow that continues after the registration has happened.
And the code that has been registered is a new control flow that is logically a child of the registerer.
But there's no back pointer.
Once the registered child code runs, there's no join back with the parent.
So the blog post that I referred to explains that when Dijkstra talks about structured programming,
at the time, the only way to implement an if is check a condition. And then if a condition is true,
literally just jump somewhere else. That was the goto. With structured programming, any of the jumps have to come back. So for example,
if you have an if, you may or may not evaluate the body of the if depending on the value of
the condition that you check at runtime. But regardless of whether you do evaluate the body
of the if, the next thing after that is always whatever comes after the
if statement. Same thing with functions. If you invoke a function, whatever happens inside the
body of the function, magic happens. But whatever that is, once it's done, you return to the call
site. So in order to enforce this same invariant with concurrent code, you can imagine that a
function might run its body in parallel, but the function doesn't return until all of those parallel tasks have completed.
So once you introduce and enforce this invariant, you have to write your code differently.
But then you regain this abstraction that structured programming introduced, where you can sort of squint and ignore that inside a function call there is concurrency happening so it's almost
like a new control structure that you could call like fast sequential like code like you call a
function and internally it's faster because it's doing concurrency or maybe it's not faster maybe
it's just better in some other way but it's logically still a control structure that starts and then completes.
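In libunifex terms, that invariant looks roughly like this. This is a minimal hand-written sketch; the combinator names and header paths are taken from the libunifex repo, so treat it as illustrative rather than canonical.

```cpp
#include <utility>

#include <unifex/just.hpp>
#include <unifex/sync_wait.hpp>
#include <unifex/then.hpp>
#include <unifex/when_all.hpp>

int main() {
    // Two child computations, described as senders. Nothing runs yet:
    // these are just descriptions of work.
    auto child1 = unifex::then(unifex::just(20), [](int x) { return x + 1; });
    auto child2 = unifex::then(unifex::just(2), [](int x) { return x * 10; });

    // when_all joins the children, and sync_wait acts as the parent: it
    // does not complete until both children have completed, which is the
    // structured-concurrency invariant described above.
    auto result = unifex::sync_wait(
        unifex::when_all(std::move(child1), std::move(child2)));
}
```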
Right. Yeah, that makes a lot of sense. Thank you.
So I remember first coming across this idea, I think it was 2019, I think, or 2018, 2019.
We had a committee meeting, I think it was in Belfast, where that was one of the meetings where this earlier version of that proposal was discussed quite a lot.
This whole idea about executors and sender receiver and all that.
And I remember at some point, I just didn't understand it.
And I asked Kirk Shoup, who was one of the authors,
Hey, can you talk me through this?
So we ended up going to a pub and he talked to me for three hours.
He tried to talk me through how this actually works, and I remember not understanding it. Later, I think I had a conversation with somebody else who did explain it to me in a way that kind of makes sense. But can you tell me if this is roughly right? What we normally do today is we have a function, and then, for example, you want to register a callback or something. So we just say, you know, register callback, and we give it a function pointer, and then that function is going to be called on some other thread, somewhere else. And you don't do it like that anymore, right? Instead, you have something like, here's the shape, almost like a graph: what are the different streams of execution, and how do they interact with each other?
And then you kick it off separately from that.
You kind of separate like what the shape of this asynchronous problem is and like when and how and where you run it.
You kind of separate those things a little bit, like how you separate, I don't know,
like iterators, algorithms, and containers in the STL.
Is it a little bit like that?
Or is that not helpful?
So I think that's true.
But if I'm understanding you correctly,
what you've described is almost an implementation detail
of how structured concurrency is being introduced
in P2300 and in Unifex. I think the central new thing that you need to learn while you are busy
unlearning how you think you should be doing asynchrony in C++ is if you think about a function call, a regular old synchronous
function call, when you invoke a function, the stack frame, the activation frame for
the caller is effectively suspended, right?
None of those variables are actually directly usable anymore by the CPU.
You've substituted a new activation frame for the callee.
And while you're busy running the called function, there's
an activation frame that is the stack frame for the local variables of the callee. And
the callee can't access the caller. So this is sort of the structured property. You suspend
the caller, then the callee takes over and is blocking the thread. And then once it's done,
it returns and then you reactivate the caller. So the other caller. The same thing happens with
structured concurrent code, except that there's this new primitive that you don't necessarily
have to dominate the thread for the entire duration of the activation of the child.
So you start a new operation that's a child. It's a bit like a function, but instead of only being allowed to either continue computing or return, you can also suspend and say: there's something I need that's not available yet; I need to wait for it to become available to me. You can suspend, which requires moving your activation frame onto the heap, and then once whatever it is you're waiting for is available, you get resumed.
And so you can pick up where you left off.
And then once you're done, then you return to your caller.
Well, that sounds a lot like coroutines to me.
Yeah, it's very similar.
It's compatible with coroutines.
And that's kind of what makes Unifex so awesome, I guess.
So, Jessica, could you maybe talk a little bit about what the difference is between this model and the actual coroutines the way we have them in the language now? Because it seems like what you're doing is kind of different. It's more stuff on top of that, and it doesn't actually use coroutines; it's just a similar mental model. What's the difference there?
I wouldn't say there's... I mean, it is different,
but a coroutine is also like a type of sender. So that's why they're compatible in that sense. So for a lot of the sender algorithms in Unifex, you can actually co_await them. So in that sense, they're actually compatible. I think the biggest difference really is that the sender algorithms that we're proposing, such as any_sender_of, are not coroutines; they're senders. So I think you can think of it like: all coroutines are kind of a subset of senders, but not all senders are coroutines, if that makes sense.
So a bit like a plain language array is a container, but not every container is an array.
Yeah, yeah.
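Roughly what that compatibility looks like in code, as a hand-written sketch using unifex::task from the libunifex repo (treat the details as assumptions, not gospel):

```cpp
#include <unifex/just.hpp>
#include <unifex/sync_wait.hpp>
#include <unifex/task.hpp>

// A coroutine whose type, unifex::task<int>, is itself a sender...
unifex::task<int> answer() {
    // ...and whose body can co_await other senders, such as just(41).
    int x = co_await unifex::just(41);
    co_return x + 1;
}

int main() {
    // Because task<int> is a sender, it composes with sender algorithms
    // like sync_wait, which blocks until the work completes.
    auto result = unifex::sync_wait(answer());  // holds 42
}
```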
That's interesting. Yeah, I need to think about that. It's funny, because this proposal has been around since, I think, at least 2018. And every time I hear somebody explain it, there's a little detail of it that is new, and I go, oh, this is an interesting idea, I understand that. But it's very difficult to grasp the entire picture of it. There's something about it that makes that hard.
I don't know what it is.
Perhaps one way to get to the bottom of that is by example.
So how do you use Unifex at Meta?
Yeah.
So Unifex adoption at Meta was actually inspired by Folly,
specifically Folly Coro,
which is another type of C++ async library
that we use internally. But the primary use case for Folly Coro is mainly server-side development.
And so it doesn't really optimize for things like binary size and memory utilization, which are both resource limitations on smaller devices.
And so when we first introduced Unifex,
Oculus was actually really excited about it
and is one of our early adopters.
And Messenger and Instagram on both Android and iOS
also were one of our early adopters,
mainly through rsys, which is our C++ client library that mainly does calling. And actually, rsys is currently our largest production deployment of Unifex. So this is kind of where Unifex really shines: it's mostly for resource-constrained devices, bringing structured concurrency to them.
Nice. Interesting that you contrast that with server-side applications, because I think there is a move to make those more efficient as well, for lots of the same reasons, really. So I could see it being very useful there.
Yeah, there's definitely, you know, capacity issues right now. But you can definitely add as much RAM as you want, right, to solve immediate problems if you need to, whereas you don't really have that solution for mobile devices, right?
But if you are running on a server, you've got the problem of working at scale as well. So does it scale well as a solution?
You mean Folly Coro on servers, or Unifex?
I was thinking about Unifex, but just these libraries in general.
Oh, I don't know if we actually have any deployment of Unifex in server-side development, but I do think it would scale. Well, I don't know. Ian, do you know?
Yeah, I think that Unifex has benefited from, like, Lewis Baker was heavily involved in developing the coroutine library in Folly. And so Unifex is kind of his second system. He's learned from Folly, and so I think that Unifex would scale. And actually, I should step back a little bit.
The proposal P2300 has evolved and Unifex has not kept up.
And so there are some features in the proposed standard that are not in Unifex that I think could affect the scaling question.
So I think the standard library will actually scale better than Unifex does. We did have, you can look in the history of the PRs in the Unifex
GitHub, that
somebody tried to use the
static thread pool that's part
of Unifex and ran
into scaling issues. There's a
lock and a
condition variable, and somebody
discovered through profiling that signaling that
condition variable was showing up on profiles.
There was a PR that was not merged because Lewis figured out that it was going to introduce a race condition.
It was trying to address this contention issue.
I think the problem there is really that the static thread pool in Unifex is kind of,
I don't know if it's fair to call it prototype code.
It's not been hand-tuned for large scale.
If somebody was to write a work-stealing thread pool and make it model the scheduler concept,
I don't see any reason you wouldn't be able to use that within the sender-receiver model on large machines.
So I think the gap that Jessica is describing is not that Unifex isn't suitable for servers. It's that we have found that Folly Coro, while it is usable and great for using on servers, doesn't get the attention that it might need to scale down to smaller devices.
Whereas Unifex takes more advantage of static polymorphism versus the dynamic polymorphism in Folly.
And that has allowed us to specialize things for phones more effectively.
So one thing that I find really interesting,
I think most libraries that end up either being very popular
or end up in a standardization proposal or potentially both,
they have this kind of life cycle where somebody starts out solving
a very concrete problem for a concrete product.
Like, for example, we had Nils Lohmann here on the show a couple of months ago who was talking about his JSON library, right?
He was saying, okay, we had a very concrete problem.
So I wrote some code to solve that.
Then it became a library.
Then it kind of started growing.
And sometimes those libraries then eventually become standardization proposals, like with some of the boot stuff that ended up like in the standard, right?
But it seems like for Unifex, it's kind of the other way around, right?
People have tried to develop a, you know, from first principles, like a model for how to do structured, you know, concurrency, like for the standard.
And then you implemented that as just a proof of concept
and then later it developed into like something
that you can use in production code.
So it's kind of almost the other way around, right?
I find that really interesting.
Yeah.
But you mentioned that there are differences between,
like now they have kind of diverged again.
So there are differences between what we're basically discussing here
at the committee meeting, what you want to get, hopefully, into C++26,
and what the library that you're using at Meta actually does.
Can you talk a little bit more about,
are there any concrete differences that are interesting
where things have diverged for particular reasons?
The thing that comes to mind is the dependently typed senders.
So I'm not sure how to explain that if you don't already know how senders and receivers work.
Well, maybe start with "dependently typed".
Yes. Is that in the C++ template sense, or is that in the sort of functional programming sense?
So I only know enough about functional programming to be dangerous, but I think the answer is the latter.
So suppose you have a, as Jessica said earlier, you can think of a sender and a coroutine.
They're quite analogous to each other.
So if you think of a coroutine that produces, say, a vector of stuff, a vector of ints, right? One of the features of P2300 is that when you are
running the work described by a sender, you have previously connected it to a receiver to produce
an operation state. And a receiver is just the place that's going to receive the results of the
work. So the sender is a description of work. And once you start running it, it will do some
stuff and then it will produce an output. and the output is given to the receiver.
And a receiver provides something called an environment to a sender, which is sort of like an execution context.
It can include things like stop tokens or an allocator that you might use or an execution context like a thread pool or something that if you need to schedule work, you can use the scheduler. So the sender can ask the receiver for this environment and use the
results of that to change how it does its work. And one of the things you can ask it for is an
allocator. So if you want to start some work and provide, say, a pool allocator just for that work,
that's an option that you can do.
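To make the connect/start vocabulary concrete, here is a tiny self-contained model of the protocol. It's hand-rolled for illustration only; real P2300 senders also carry completion signatures, environments, stop tokens, and so on.

```cpp
#include <cstdio>
#include <utility>

// A receiver: the place where the result of the work is delivered.
struct print_receiver {
    void set_value(int v) { std::printf("got %d\n", v); }
};

// A sender: a description of work that will eventually send 42.
struct just_42_sender {
    // Connecting a sender to a receiver produces an operation state...
    template <class Receiver>
    struct operation {
        Receiver r;
        void start() { r.set_value(42); }  // ...and start() runs the work.
    };

    template <class Receiver>
    auto connect(Receiver r) {
        return operation<Receiver>{std::move(r)};
    }
};

int main() {
    auto op = just_42_sender{}.connect(print_receiver{});  // connect...
    op.start();                                            // ...then start.
}
```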
So then you might want this coroutine that's going to produce a vector of ints to produce one specialized on the allocator provided by the receiver.
And you don't actually know what that allocator is until you connect the sender to the receiver.
So you can't, I don't think you can do this with coroutines because if you were to write this as
a coroutine, you need to provide in the function signature
that this returns a task of vector of int and an allocator.
But you won't know what the allocator is
until you've invoked it, until you've started the work.
And so the return type of that coroutine
can't be specified with the current coroutine spec.
But in P2300...
Could you do some kind of polymorphic allocator type-erasure thing, something like that?
Right.
So if you wanted to use literally coroutines to do it,
you could work around the problem by saying it's a vector of int
with a PMR allocator, a PMR vector of int.
Yeah.
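That workaround looks something like this (hand-written sketch; std::pmr type-erases the allocator, so the return type no longer depends on it):

```cpp
#include <memory_resource>
#include <vector>

// The return type names one type-erased allocator, so it doesn't have to
// change when the caller picks a different allocation strategy.
std::pmr::vector<int> make_ints(std::pmr::memory_resource* mr) {
    std::pmr::vector<int> v{mr};  // allocation strategy chosen by the caller
    v.assign({1, 2, 3});
    return v;
}

int main() {
    std::pmr::monotonic_buffer_resource pool;
    auto v = make_ints(&pool);  // same function, caller-supplied allocator
    return static_cast<int>(v.size()) == 3 ? 0 : 1;
}
```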
But P2300 allows you to say,
well, the thing that I produce depends on which execution environment I run in.
And so you defer computation of the type.
So you can make the allocation strategy dependent on
whether it runs on just a thread or maybe on a GPU
or somewhere else or whatever, right?
And you don't have to make that choice when you write the algorithm.
Yes.
That is so cool.
It is super cool.
The thing about it that is exciting to me
as someone who's focused on mobile code
is that in Unifex,
you can't say whether a given sender will produce an exception
because determining that answer
depends upon the receiver that you're connected to,
because connect is allowed to throw an exception.
So when you, or no, not connect, set value.
There's various parts in the communication between the sender and receiver
that could potentially throw an exception.
And so you can't know in advance whether a sender could produce an exception,
because it might produce an exception right at the end as it tries to produce its output.
But with P2300, the way that it's evolved away from
Unifex, you can actually make the error results of a sender depend on the receiver that it is
eventually connected to. And so by the time it's got connected, you can say, ah, this receiver will
not cause me to add exceptions to my possible error output. So if you're otherwise a no-fail
sender and you're
connected to a receiver that does not introduce problems, then you can say we don't throw
exceptions. So sort of the equivalent of being able to determine at compile time, but rather
late in the process that this is effectively a no-accept sender, which in mobile code means you
get to eliminate a bunch of exception tables and reduce your binary size.
And then I remember Eric posted a Godbolt in some context.
I forget where he shared it.
That shows a function that returns a sender and the work described by that sender could be done in parallel.
If you were to run this work on a thread pool,
you could take advantage of the multiple threads.
But if you're going to run it on a single thread,
you could also do that. It would just run in
series. And he has an example showing where you have a function that returns the sender
and whether it runs in parallel or not is determined by which execution context you run it
on. So you can invoke the same function twice, take one result and run it on a thread pool,
take the other result and run it on a single thread, and it's the context in which you run that sender that determines how much parallelism you extract from the algorithm. And this, I believe, also relies on the dependently typed sender feature of P2300.
Interesting. So you mentioned P2300 quite a lot. Obviously, that's the proposal that's making its way through the standard. You're both actually active on the committee now, right? So, Jessica, we met at the last committee meeting in Tokyo, and Ian, you were here this week in St. Louis. Are you actually involved in the standardization process? Do you know where this proposal is? Is there any chance that we still get it in C++26? Do you know what the current status is? Because, you know, we wanted to get something like this into C++20, it didn't work out, and then it didn't work out with C++23.
Like, where are we now?
Yeah, so the paper Ian and I are working on is only one paper amongst the collection of papers that will be part of std::execution. So Ian and I are working on P3149, which introduces Unifex's async_scope, hopefully standardizing it in C++26. We're having regular meetings with the other authors of std::execution, like Eric Niebler, Lewis Baker, Kirk Shoup, and some others as well. And I think we're making good progress.
So far, we do seem to be on track.
So hopefully we will get this in C++ 26.
That is really exciting.
But don't quote me on that if that doesn't happen.
I mean, this has happened before, right?
That you were really excited about getting something in
and then at the last minute it was like,
oh, now there's this other issue we have to figure out
and we then have to also review all the wording for this
and we don't have time.
Yes, yes.
And it's a massive paper.
Yeah, yeah.
But I keep my fingers crossed.
This is really exciting.
And it seems like it unlocks whole new paradigms for how to write asynchronous code. So yeah, it would be really cool to have that just out of the box in C++. So good luck with that. I have to admit I haven't been following it very closely. I kind of dive in a little bit from time to time and try to understand, okay, what's going on there, but I don't really closely follow it; there's just too much going on in the committee. But I keep my fingers crossed for you.
Oh, thank you.
I've been following it even less, but
I seem to remember, in the C++23 timeframe, that at the start of a new standard there's always a document, well, there has been for a while anyway, you know, the bold plan for version X of C++. And there was a prioritization of std::execution for C++23. Clearly that wasn't quite enough. Has that been re-expressed for C++26? Because I haven't looked at that document. Is it part of the bold plan?
Yeah.
There's definitely a lot of push from the organizers to get P2300 into C++26. Ian and I have both experienced a lot of urgency from the organizers about making sure we have our paper ready, and making sure we help get P2300 into C++26. So yeah, there's definitely a lot of urgency and push there to make sure this happens.
So I'm just looking this up. You mean this paper by Ville Voutilainen, right? "To boldly suggest an overall plan for C++26".
So it mentions four kind of major features
that we should make progress on in this cycle.
And execution is actually the first one that's being listed.
It's even ahead of reflection.
It says like execution reflection contracts pattern matching.
So execution is listed even ahead of reflection.
That's a good sign.
So yeah, we are, I mean, that paper is just a suggestion, right?
Like at the end of the day, you know,
we are working our way through the stuff and what gets in, gets in.
But like my impression also,
just from the volume of emails that have been going on,
like it's a very high priority thing
on the committee right now.
I think there's more work being done on this
than on reflection right now, probably.
Obviously, reflection is also making progress
and everybody's excited about reflection.
But yeah, there's also a lot of focus on execution as well.
So there's actually another related topic
that I wanted to mention before we wrap up this episode.
We have a little bit of time left, I think, to talk about it,
which is not writing asynchronous code,
which can be very difficult,
and there are different paradigms to do it
and different ways to do it,
and you're working on one such way.
and you're working on one such way. But it's also, once you have that code, you need to maintain it, you need to debug it. And debugging asynchronous code is something that can be very, very hard, and I'm not aware of really good tools for that. So I was wondering if the two of you have anything you can share on that. Like, how do you deal with asynchronous code at Meta that you have to debug, or look into what's going on? Do you have any particular tools for that?
Or does Unifex actually help with that?
Does kind of structured concurrency
make it easier?
And anything, any advice
you can give to people
who have this problem themselves?
So I think that Unifex,
it makes it both better and worse. I think the way that it
makes it better is that it's a, I think it's a common experience when debugging asynchronous
code that, you know, if you're looking at a crash or whatever, or if you're stopped in the debugger
and you're looking at the state of the system. Often the current stack is, you know,
several frames that you're interested in.
It's code that you've written
that is whatever business logic
that you're trying to understand.
And then you go back several frames
and then there's like the work loop of some executor,
some thread loop that's just spinning,
taking work off a queue.
And then going back further from there,
you get to like your pthreads library,
whatever, having spawned the thread.
And that's it.
And there's no state that you can observe in the debugger that explains to you why that particular piece of logic that you're interested in is running at this moment.
There's no equivalent to a return address that shows you the chain of execution that led to this asynchronous work starting. And structured concurrency brings with it the necessary sort of foundation to solve that problem
because the invariant that we mentioned earlier in the episode that no parent completes until
all of its children have completed means that there is a return address somewhere.
There is the parent that's waiting for this work to finish somewhere in the system.
Now, the existing tools don't make that parent obvious,
but at least it's somewhere there in memory.
And so you could build tools that use the pointers that you've left lying around
to go find that parent.
And you could build what amounts to a stack trace,
except that instead of it showing what the current thread is doing,
and it started because you launched a thread some while ago, you can instead show what the
current task is doing, where a task is whatever asynchronous unit of work you've launched.
So structured concurrency brings this ability to the table. And if you build the tools,
then you can all of a sudden see
what your asynchronous code is doing
as if it was a synchronous function.
And Folly Coro actually has the tooling available now.
This is something that Lewis Baker did
while he was still at Meta.
If you're using the coroutine type Folly Coro task
to express your asynchronous work,
then while you push and pop coroutines in your overall asynchronous task,
in the background, the library is busy pushing and popping an asynchronous stack
that is an in-memory manually managed data structure
that represents the stack of that work.
And the library also ships with...
I know that the integration with LLDB works.
I'm not sure if GDB integration also works,
but it's intended to.
You can actually inspect that stack
with debugger integration.
There's a Python script
that knows how to look at the memory,
find this data structure
and dump an asynchronous stack
to your debugger console.
And I'm working on bringing that to Unifex as well.
My plan is to just take that library and suck it into Unifex
and augment all of the sender algorithms with async stack management
so that once the integration is done, you'll be able to do the same thing
and see the asynchronous stack at any point in a debugger
or if you collect crash dumps and teach your crash dump parser how to find this information,
you could also get the same thing out of crashes. So with the right tooling, it's significantly
better because you all of a sudden have this new understanding. As of today, Unifex does not have this support.
And so if you spend a lot of time and effort learning how to read Unifex's stack traces,
then you can learn to interpret them, which I think does make debugging better than what you have with just a callbacks-based asynchronous
model.
But I will freely admit that that learning process
is arduous.
One of the ways that both Unifex and P2300 express
this structured concurrency is you end up building a type,
a very deeply nested template type,
where the async stack is encoded in the type.
So I have seen crash dumps where the symbol for a single frame,
just the symbol for that one frame is 11 kilobytes,
11,000 bytes just for the name of a function on the stack.
So parsing that, you have to run the frame through clang-format just to be able to see
with levels of indentation, what the heck is this?
Once you do that, if you know how to read what those symbols mean, you can figure out
like, oh, I know where this came from and what state I'm in.
But it's difficult to understand. So right now it's
better and worse.
So when you're looking at these stack
traces, are you saying, I don't even
see the matrix anymore?
No,
I'm not there yet.
I'm at a
point where I can
read the matrix.
Let me put it that way.
Okay.
But that's very specific to LibUnifex in this case.
Do you have any more general advice
for anybody working in async code in general?
Or is it always going to be very specific
to the library or language feature that you're basing on?
Well, I mean, I think once structured concurrency is accessible to people and the tooling catches
up, then you won't need special advice, right? I think that's an aspect of the main selling point
of structured concurrency is that the structure of your program, even though it's an asynchronous one, is embedded in the runtime state.
If you are not benefiting from structured concurrency, then I'm just as stuck as you are.
So I do have actually one more question here because, you know, we mentioned that, you know, structured concurrency will eventually take over the world.
So I have written asynchronous code before. I used to do lots of audio programming, in particular the low-level glue code that you need to make that work, like when I was working on the JUCE framework, for example, or earlier at Native Instruments, where you interact with a lot of APIs that are just callback-based, where you pass around function pointers.
For example, you want to, you know,
make some sound and there's like some kind of low level OS API that gives you say, okay, here you can register a callback.
That's where you put in your like processing function.
And then whenever the audio interface calls that on some other thread that is
somewhere else, then that's where you get your processing done.
Or, you know, if the, if the user yanks out the headphone plug or whatever,
you get another callback that the configuration has changed
on whatever thread that you don't control.
There's a lot of stuff where that is just very unstructured, I would say, asynchronous programming. But where that comes from, that's just how the operating system API works.
So that's just how this library works.
So that's how we've been doing it for the last 20 years.
So we're going to continue doing it this way, right?
I think, especially in audio, I don't know about other domains, but in audio this thinking is quite pervasive.
So I wonder if there's any advice or anything you can say to somebody who lives in that world. Like, should we all migrate to structured concurrency? Is Unifex something you could use in other domains, other companies, other places? Or other libraries that people are going to write that kind of make that better? Where do you see the future of all this, beyond Meta and the more specific use cases that you're working on?
So I think callbacks are always going to be like a thing, right?
But Unifex has a bunch of algorithms that kind of help make that a little bit easier. Like unifex::create can take your callback and convert it into a sender that you could just use.
So then it kind of brings you some level of structure, right?
But there's still, you know,
that boundary between structured and unstructured code.
So I definitely think outside of meta,
Unifex has applications for that.
I know that NVIDIA has their own version of this called stdexec, right?
You can kind of play around with that if you'd like,
but there's definitely, you know,
Unifex is there for applications
beyond just structured code, right?
It was really helpful when we were actually trying to convert unstructured code to structured code. And the async_scope paper that we're proposing is kind of one of the algorithms that is the glue that makes that conversion a lot easier.
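For a flavor of that glue, here is a hand-written sketch in the spirit of unifex::create. The callback API read_async is hypothetical (a stand-in so the sketch is self-contained), and the exact create<> contract is documented in the unifex repo; treat this as illustration, not production code.

```cpp
#include <functional>

#include <unifex/create.hpp>
#include <unifex/sync_wait.hpp>

// Hypothetical callback-based API we don't control, with a stand-in
// definition that invokes the callback synchronously.
void read_async(std::function<void(int)> on_done) { on_done(42); }

// Wrap the unstructured callback in a sender with unifex::create.
auto read_sender() {
    return unifex::create<int>([](auto& ctx) {
        read_async([&ctx](int result) {
            ctx.set_value(result);  // deliver the result into the sender world
        });
    });
}

int main() {
    // The callback is now a structured child: sync_wait, the parent,
    // does not return until the callback has delivered its value.
    auto value = unifex::sync_wait(read_sender());
}
```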
I will say that the ramp-up is pretty steep, I'm sure you've noticed, right? In terms of just talking about senders, receivers, and schedulers and all that stuff. Even internally, when we were working with the rsys engineers, the ramp-up was pretty steep. And that was a large challenge for everyone, for all of us involved, right? None of us had really ever used it before; all these concepts were really, really new to all of us. So we learned a lot by making mistakes. And I think we're kind of at a place now where we kind of know what we're doing. But I do think it is worth it in the end. But it will take, as Ian mentioned, a lot of time and pain to get there.
Yeah.
And I think that in the long, long run, you should be able to push that boundary that Jessica's talking about down to very low levels.
Kirk Shoup built, I think he called it a cyclotron.
And I don't know if it's literally a cyclotron or if it's like a simulator on some kind of one of those, like an Arduino or a Raspberry Pi or something.
It's one of those very small computing devices.
And he mapped the interrupt service vectors to the sender receiver model. And so fundamentally, an interrupt service vector,
that's the CPU invoking a callback that you've previously registered.
So at that level, you're just stuck with whatever the API,
the chip vendor has given you.
But like Jessica mentioned, Unifex has this create algorithm
that can adapt those kinds of things.
I don't know how, I haven't seen the code,
so I don't know how Kirk did it,
but you can map callbacks to senders pretty concisely.
And so he was able to go all the way down to the metal and write, I don't know if he would consider
whether he's written an operating system
or if it's an application that runs directly on the CPU,
but it is written in sender/receiver-style code.
And you may want to talk to Ben Deane.
I forget who else he's working with.
I think he's got a partner working on the library together.
They've got a publicly accessible implementation of sender-receiver,
and I think it's called Bare Metal Sender-Receiver.
And Ben said to me once that the code that he works on
runs on the power supply of your computer.
And so it never actually stops running
because it's the thing that responds to the power button or something like that
And he's using sender/receiver to do that level of code. So I think what I would say to Timur's question is: I hope that sender/receiver, well, not specifically sender/receiver, I hope structured concurrency takes over the world.
Sounds like a future
I would like to live in.
Nope, I don't
have a future.
So you've both been forced to become experts at async programming. But has that left you enough time for anything else in the world of C++ that you find particularly interesting or exciting?
I'm excited for modules. I know that's already out, but we don't actually have it at Meta yet.
Ah, it's more out than execution.
But we don't have it yet at Meta. So I'm really, really excited for when we actually get that. I think that will help deduplicate a lot of header code.
Yeah. Yeah, I'm curious to see what's going to happen with... there's various papers floating around related to making exceptions more efficient. I'm curious to see what happens in that space.
Yes.
I think reflection is interesting.
Seems like it's a long ways off.
Well, maybe not.
Well, actually, people are saying you might get it in C++26.
Oh, that's exciting.
Yeah, yeah.
Yeah, it's making good progress.
It's been making good progress very recently; this year there has been a lot of progress. I think there's been a lot of progress all this time under the hood, but during the last half a year or so it kind of resurfaced, and people are more aware of it, and the paper is quite mature.
Oh, the other thing that I'm really curious to see where it goes, but I think it's early days yet, is the response to memory safety being a concern.
I'm interested to see how C++ evolves in sort of contention with languages like Rust.
Yeah, I mean, we had quite a few episodes on this topic.
Different people have very different opinions on this, so I guess you'll just have to wait and see.
Yeah.
All right.
So we're again over time.
So I think we have to wrap up,
but this was a fascinating discussion.
Thank you both very much for coming on the show.
I learned a lot about structured concurrency
and asynchronous programming and LibUnifix.
And I wish you all the best for your proposal. I hope it goes through and gets into the upcoming standard. And yeah, thanks again for being on our show. It was a pleasure to have you on.
Thanks so much for having us.
Thanks for having us. This was fun.
Yeah, this was great.
Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested in, or if you have a suggestion for a guest or topic; we'd love to hear about that too. You can email all your thoughts to feedback@cppcast.com. We'd also appreciate
it if you can follow CppCast on Twitter or Mastodon. You can also follow me and Phil
individually on Twitter or Mastodon. All those links, as well as the show notes, can be found on the podcast website at cppcast.com.
The theme music for this episode was provided by podcastthemes.com.