CppCast - Modules and build systems
Episode Date: June 9, 2023
Daniel Ruoso joins Phil and Timur. After covering a couple of blog posts and a new UI library, we welcome Daniel back to talk with us about modules, package and build systems, and SG15, the tooling study group. We also revisit the Ecosystem International Standard.
News:
- Modern C++ In-Depth — Is string_view Worth It?
- How to check if a pointer is in a range of memory - Raymond Chen
- Nui - new C++ Webview UI library
- Nui on Reddit
- Timur's Undefined Behaviour survey
Links:
- P2898R0 - "Importable Headers are Not Universally Implementable"
- "Clang Automated Refactoring for everyone with clangmetatool" - Daniel's C++ Now 2019 talk
- P1689R5 - "Format for describing dependencies of source files" (Kitware)
Transcript
Episode 362 of CppCast with guest Daniel Ruoso, recorded 7th of June 2023.
This episode is sponsored by JetBrains. We cover a couple of blog posts and a new C++ user interface library.
Then we are joined by Daniel Ruoso.
Daniel talks to us about his work on C++ modules.
Welcome to episode 362 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Timur Doumler, joined by my co-host, Phil Nash. Phil, how are you doing today?
I'm all right, Timur. How are you?
I'm not too bad. I should say, if my audio sounds a little bit different or less good than usual, it's because I'm traveling.
I don't have my usual mic set up with me. I'm currently in Thessaloniki in Greece,
visiting a friend.
And next week, I'm traveling from here
to Varna in Bulgaria,
which is where we're going to have the committee meeting.
It's actually not very far from here at all.
And then, yeah, head back home to Finland after that.
How are you, Phil?
Are you also still traveling?
Or how's it going?
Well, after my trip to Norway a couple of weeks ago, which concluded my Scandinavian tour,
I'm actually done with traveling.
At least for a couple of months now, although I have a holiday here in August. But apart from that, and C++ on Sea in a couple of weeks, which is only an hour or so drive, but I will be staying away from home.
Apart from that, I'm going to be looking forward to reminding myself what home looks like.
All right, so at the top of every episode, I'd like to read a piece of feedback.
This time we have a tweet by René Ferdinand Rivera Morell, who commented on our Conan 2.0 episode — that was, I believe, two episodes ago, with Luis Caro Campos. And René said, good interview, love the ecosystem IS mentions, smiley face. So René, I'm very happy you liked
the episode. And actually, I'm sure that the ecosystem IS
will be mentioned again today
because we are continuing our mini-series
on C++ tooling,
which we started in that Conan episode.
Before we get to that,
I just want to briefly mention also that
apparently there's been a bit of confusion
with the last CppCast episode,
the one with Anthony Peacock,
where apparently some people initially only saw
a six-minute long audio file instead of the full episode. So Phil, can you explain what was going
on there? Angle brackets, sigh. Yes. First of all, the issues were all resolved. This is a new one,
actually, because I was traveling and we had recorded the week before, but I uploaded the episode
from the airport in Oslo and I thought it would have gone through fine.
But apparently there was, I think, a processing glitch on the host.
So initially it only went out as a six minute episode.
So of course, when I landed, I corrected it, uploaded the full episode, thought it was
all fine.
Then I started getting reports that, yeah, there's only a six minute episode. And it's another one of these things where every podcast host or podcast
redistributor, they have their own way of caching things and clearing things. Some cases you just
have to re-download. In other cases, I had to re-upload or give it a new GUID. But eventually
Spotify was the holdout, where the only way I could get Spotify to not serve up a cached version that was only six minutes long was to change the file name that it was coming from. So I did that, and then it seemed to be fine. Except that, for me — and I don't know if anybody else has seen this, I've not heard anybody else report it — I'm not getting past episodes showing up in my podcast player, and every time I remove them it just gives me another batch. So I don't know if anybody else has seen that.
Hopefully it's just me,
because I can't see why it would be related.
But if you have seen that, do let me know.
I'm not sure there's much we can do about it at this point,
but I will investigate further.
Other than that, hopefully we're back on track.
Well, thanks, Phil.
And we'd like to hear your thoughts about the show.
You can always reach out to us on Twitter or Mastodon
or email us at feedback at cppcast.com.
Joining us today is Daniel Ruoso. Daniel has been working in and around build systems and package management for over 20 years. He introduced a package management system at Bloomberg that is used today by over 10,000 C++ projects.
In the last five years, Daniel focused on static analysis, automated refactoring, and
building consensus on engineering practice.
In the last two years, he collaborated with the ISO C++ Tooling Study Group to help figure
out C++ modules.
Daniel, welcome to the show.
Thank you for having me.
It's a pleasure to be here for the second time.
Many of the things in your bio we're actually going to talk about a bit more when we get into the interview. But the one thing that I picked up on that stood out to me — because I work at Sonar, we do static analysis tools — that mention of static analysis got my attention. So is that something that you are or have been doing in-house at Bloomberg, or is it just using existing tools?
Both. So we have a team that is called the STAR team, Static Analysis and Automated Refactoring.
It's a nice acronym.
And the role that we have is to introduce static analysis, both for off-the-shelf tools, and we use Clang Tidy and related tools.
We also use CodeQL, which is the GitHub Advanced Security,
and Fortify, and a bunch of other tools.
We happen to also have some Sonar,
but we're not actually driving the Sonar Analyzer a lot.
It's actually something that we have to investigate
how much overlap there is with the other tools we do.
But we also have a very dedicated effort
to building custom tools for specific purposes.
So we have, for instance,
a feature enablement library.
And so we have a custom tool that will go in and retire the code that's behind the switch after the switch has been fully rolled out, and do that automatically for the users when the switch is set.
And also a bunch of other refactorings that we do,
mostly using Clang tools.
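For illustration, here's a hedged sketch of the kind of rewrite such a tool performs — the names are invented for this example, not Bloomberg's actual API:

```cpp
#include <string_view>

// Hypothetical stand-ins, for illustration only.
namespace feature { bool isEnabled(std::string_view name); }
int computePriceV1(int order);
int computePriceV2(int order);

int price(int order)
{
    // While the "new-pricing" switch is rolling out, both paths coexist:
    if (feature::isEnabled("new-pricing")) {
        return computePriceV2(order);
    }
    return computePriceV1(order);

    // Once the switch is fully rolled out, an automated refactoring would
    // retire it, leaving just:
    //     return computePriceV2(order);
}
```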
Yeah, a lot of interest in those custom rules
and custom refactorings at the moment, I think.
We probably need better tooling, too, to support that,
but Clang tooling is pretty good.
Yeah, that was actually the subject of a talk I gave
back in 2019 in C++ Now.
So if anyone is interested, there is more details.
The talk is about the clangmetatool library that we developed at Bloomberg. It's open source, at github.com slash bloomberg slash clangmetatool.
That essentially makes it very easy to build a small standalone tool.
And the workflow is usually you're going to build that small tool,
run it across your code base, be done with it, and throw it away.
Nice. We'll put a link to that in the show notes. Thanks.
All right, Daniel. We'll get more into your work in just a few minutes,
but we have a couple of news articles to talk about.
So feel free to comment on any of those, okay?
So the first one, I already kind of mentioned it briefly,
next week is the committee meeting in Varna, Bulgaria.
So I'm very much looking forward to that.
It's going to be our first post-C++23 meeting
where we can start voting new things into C++26.
So that's going to be very exciting.
And yeah, I expect there's going to be lots of trip reports afterwards so you can see what happened there if you haven't had the opportunity to be there yourself.
Yeah, and I think Varna was also the one that was cancelled during the pandemic.
The first one to be cancelled, I seem to remember.
Yes. Well, there were quite a few that were cancelled during the pandemic,
but that was, I think,
the first one that would have been summer 2020.
Yes, and then it has been postponed multiple times,
and I'm very, very happy that you finally get to go there.
I actually have been to Varna myself a few years ago
on a completely unrelated vacation trip.
So it's a really nice place.
I'm very much looking forward to being back there.
I'm jealous.
I'll see you there.
See you there, yeah.
Then we have a couple of blog posts
that caught my attention this week.
First one was by Michael Christophic,
and it's actually part of his series,
Modern C++ in Depth.
And the blog post is called,
Is StringView Worth It?
And I found that really interesting
because it was kind of from the perspective
of a large financial data company
that started using C++
before std::string was actually available
in the standard library.
So they developed their own string class,
like I'm sure thousands of companies out there.
I've actually worked at one of those myself, a music tech company that had their own std::string-like class way before std::string was a thing, and obviously with different semantics.
So that's quite a common scenario, I think,
especially for bigger companies
that have been around for longer.
And so the article actually says,
well, StringView can be helpful
as a kind of lingua franca
if you want to move away from char pointers as the lingua franca, when you have this kind of situation.
And then it gives a really nice overview
of when and how you actually should use the string view
and when you really shouldn't.
And I thought that was a really nice overview.
It really discusses the typical pitfalls
you get with the string view.
It has reference semantics, so like every reference, it can dangle. But actually, it's even worse than normal references, because if you have a plain const ref and you assign, like, a temporary object to it, then that's going to extend the lifetime of the temporary object. But std::string_view doesn't actually do that. So it's even easier to end up with kind of dangling string views.
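For illustration, a minimal sketch of the dangling pitfall being described (not code from the blog post itself):

```cpp
#include <string>
#include <string_view>

std::string make() { return "hello"; }

int main()
{
    // Binding a temporary to a const ref extends the temporary's lifetime:
    const std::string& ref = make();   // fine, ref stays valid

    // std::string_view does not extend lifetimes: the temporary returned by
    // make() dies at the end of this statement, so sv dangles immediately.
    std::string_view sv = make();
    // Reading sv's characters here would be undefined behaviour.

    (void)ref;
    (void)sv;
}
```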
And yeah, I thought that this blog post was a really nice overview. Because now that we've had string_view since C++17, so for six years, it's kind of nice to look back and see, okay, what are the use cases, what have we learned from using it in the field.
Yeah, I actually had my own string class,
post-C++98, in fact, even post-C++11, because there were some semantics that we particularly wanted that std::string didn't give us. And we had our own string view, which we called string_ref. And the nice thing is that we could actually make them work well together. So our string class was reference counted, and the string view could also have, like, a weak reference in it. So you could take a strong string from it at the end, and it would pick up the reference count again.
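As a hedged sketch of that idea — these aren't Phil's actual classes — std::shared_ptr and std::weak_ptr can stand in for the reference-counted string and the weak view:

```cpp
#include <memory>
#include <string>
#include <string_view>

// A view that remembers the reference-counted string it came from, so it
// can be promoted back to a strong reference later.
class StringRef {
public:
    explicit StringRef(const std::shared_ptr<const std::string>& s)
        : weak_(s), view_(*s) {}

    std::string_view view() const { return view_; }

    // Re-acquire a strong, reference-counted string if it's still alive.
    std::shared_ptr<const std::string> lock() const { return weak_.lock(); }

private:
    std::weak_ptr<const std::string> weak_;
    std::string_view view_;
};

int main()
{
    auto s = std::make_shared<const std::string>("hello");
    StringRef ref(s);
    if (auto strong = ref.lock()) {   // picks the reference count back up
        // *strong is safe to use for as long as strong exists
    }
}
```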
So there are still some things, still some reasons, that you might want to have your own string class. You should think very carefully before doing so, but it may still apply.
I think the interesting thing this blog post made me think about is just how deep into the C++ ecosystem the problem of vocabulary goes, where you end up with these niche places, each one with its own set of words to talk about things, and how important it is that we start converging on those basic vocabulary words that we use.
Yeah.
Right.
And so the second blog post that I want to briefly mention is by Raymond Chen.
It's called How to Check if a Pointer is in a Range of Memory.
And that was from the Microsoft Dev Blog.
And I thought that was a really fascinating blog post.
It basically asks the question, given a range described by a pointer and a size, can you
check if some other pointer
actually lies within that range?
And you might be tempted
to say, well, okay, we'll just check the
integer value of the pointer and see if
it's numerically between
pointer and pointer plus size.
But that's actually wrong
because conceptually
in the C++ abstract machine,
pointers are not just integers, right?
They have an address, but they also have a provenance.
And so that kind of makes this approach not work at all.
And you have to think about this quite differently.
And on modern platforms, you typically have a flat memory model, where in the end a pointer is, again, just an address, at least at runtime.
But Raymond actually gives an example
of an older architecture,
like the 80286 processor,
where that is actually not the case.
And pointers, indeed, are not just addresses; they have different parts that mean different things, and you kind of need this concept of provenance in the C++ abstract machine to actually work with that kind of thing.
So I thought that was really cool and fascinating.
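For illustration, a hedged sketch of the two approaches the post contrasts (not Raymond's actual code):

```cpp
#include <cstddef>
#include <functional>

// The tempting version: relational comparison of pointers into different
// objects has an unspecified result in standard C++, and under pointer
// provenance it doesn't mean what we want anyway.
bool naive_contains(const char* begin, std::size_t size, const char* p)
{
    return begin <= p && p < begin + size;
}

// std::less is guaranteed to impose a total order over all pointers, so this
// version at least stays on well-defined ground -- though, as the post
// explains, it still can't tell whether p was derived from this range.
bool ordered_contains(const char* begin, std::size_t size, const char* p)
{
    std::less<const char*> before;
    return !before(p, begin) && before(p, begin + size);
}
```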
Yeah, it was a real trip down memory lane for me.
I did start out working with near and far pointers on the 286 back in the day.
I'd sort of forgotten what it was like to work with that, but yeah. The idea that a near pointer is sort of within a single 16-bit integer jump from where you are now — you could actually address that more quickly than something that was further away, where it needed, like, the two segments. So yeah, these things still exist.
I think, actually, in the very near future we're starting to go to a place where we're going to have more exotic architectures that we interface with, as, like, GPU code gets more common, and other specialized chips — if you think about how the RISC-V architecture is being designed, where there are all these extensions. So I think we're actually going to be at a point where these kinds of subtleties in the abstract machine are going to become more relevant.
The one thing that the post made me think about
was a joke that we have internally with some coworkers, that we really should build a troll OS, where everything that's in the standard is implemented to the letter, and then everything else is just randomized.
That's a great idea.
I actually thought about that too at some point. Like, you can have, you know, 11-bit bytes and all kinds of weird stuff, and, like, sizeof(int) can be equal to sizeof(char), and all kinds of evil things. That could be fun.
I'm curious how many library test suites would fall apart if you do that. Let's do it.
Right.
And so one more thing that I want to mention is
I saw this initially on Reddit.
There was a post about a new C++ user interface library
called Nui.
And so there was a discussion on Reddit,
but there's also GitHub,
and there's also a dedicated website.
So that's a new GUI library that is permissively licensed.
It has a boost license, and it lets you write a UI in C++.
But it's not like Qt or anything like that,
because it then turns that UI into a web view.
So it's a bit more like Electron or something like that,
but you actually write the UI in C++.
So my understanding is that the C++ you write for UI
gets compiled
to WebAssembly and then rendered by
WebView in a finished app.
It does use modern C++. It has
quite a few interesting features.
The author says that it still needs
some polish and features, but it's already
fully usable and documented.
And currently only Linux and Windows are supported, but the author is hoping for a contribution to add Mac support in the future.
So I thought that was kind of an interesting approach.
I'm curious what you think about that.
I think interesting is doing a lot of work there.
Now, I will confess I haven't actually read the article,
so maybe some of this is explained there,
but I wonder what the use case for this would be because
obviously there are plenty of webview-based frameworks that are more JavaScript-based or TypeScript-based, and it seems like a better fit for that sort of thing. Unless, of course, you're trying to bundle in a big C++ library — that's the only reason I could think of. But I just think there are probably still better ways to interoperate. But maybe there's a good use case for it.
It also struck me as amusing
that one reason that you would want
to do this more web-based is
for maximum cross-platform capabilities,
but it's not available in all
platforms yet.
I was actually wondering the same thing, like what's the use case for this? Because, you know, from what I've seen, I would kind of approach it the other way around, in that you often don't want to write the GUI in C++, right?
Because it's not necessarily
a language that lends itself to that.
Like, I've used Qt and JUCE
and like other frameworks,
and it's typically quite painful
to write a GUI in C++
compared to some like declarative approach
or, you know, JavaScript
or something like that.
Particularly because as integration with WebAsm becomes a thing,
if you have a specific part of your UI that has a heavy computation step,
then you can still write that heavy computation step in C++
and call that from the JavaScript UI code.
Yeah.
But what I found interesting is that in the Reddit discussion,
there were lots of very positive comments like,
oh, this is amazing, this is great.
So apparently, you know, there seems to be a use case for this kind of stuff.
Oh, technically, I'm sure it is amazing.
It does sound like a very interesting problem to be solved.
I'm just not sure.
I will claim defensive ignorance on writing GUIs. It's been at least 20 years since I did one.
Not my area at all.
Yeah, and it seems like if you are writing a GUI in C++,
then you kind of want it to be a native GUI
rather than a web view, right?
But again, maybe there is a use case here that I'm missing.
All right, so that kind of concludes the news articles,
but I do actually have two things on my own behalf
that I quickly want to mention.
First one is, if you remember,
I said I'm going to organize a meetup in Helsinki
and actually it's going really well.
So we're going to have our first meetup in two weeks
on Tuesday, the 20th of June.
We're going to have our first ever C++ Helsinki meetup.
So if you are in Finland, please come.
We will have — so, it's going to be the week after Varna — we're going to have a talk from Jari Ronkainen about forming habits and teaching C++. But we're also going to have a report about what happened in Varna by Lauri Vasama, Mark Gillard, and myself. And yeah, it's our first ever meetup, so I'm very curious about how it's going to go. Hopefully not the last.
So this is the second time that you've done your first ever meetup in Finland, isn't it?
Right. So the other first ever meetup was basically just a very informal thing.
It's just a bunch of people met in a bar who were on the Discord.
It was never announced officially on the internet anywhere.
It was just whoever was on the Discord channel, we just got together for a beer, basically.
But that was kind of the unofficial first meetup.
But this is going to be the official first meetup. We're going to have talks and everything. And we expect quite a few people to come. So we're going to be excited.
Well, good luck with that.
Thank you. Thank you. And the other thing is that my undefined behavior survey for C++ on Sea is also still running. So please, if you haven't done so yet, please participate at timur.audio slash survey. And if it hasn't terminated at this point, it may well be undefined behavior, as we said before. So with that, let's transition to our main topic for the day, which is modules in C++. And our guest, Daniel Ruoso. Hello again. Hello. So you mentioned in your bio that
you work on build systems.
So how did you get to work on build systems?
This is something that I know some people I've met
try to avoid working in,
although it's a fascinating kind of area.
So I'm curious how you got into that particular slice
of the C++ universe.
So I think the way that the industry worked when I started,
which was like 98, 99,
we still have very few division of labor classes.
Like we were all playing like all the roles all the time.
And I ended up being the person
that started putting together the package management.
At the time, it was not mostly C++. It was mostly a Perl shop, but we still had to build packages and create deployable artifacts and create, like, a CI/CD framework of some sort.
And I ended up being the person that solved that problem.
And that's kind of the decision that starts pushing you in a direction in your career,
regardless of how much you want it or not.
It's not that I didn't want it, but it's kind of like a self-guiding process, where the more you start looking at that problem, the more complex the problem becomes, and the more you have to spend on it. Then, after this initial contact, I built a little automation. At the time, we were using CVS, and we were using Debian packages. And so I had a little automation that would watch the CVS repo, and whenever you changed the package, it would rebuild the Debian package and ship it to the repo, where it would then get installed on the production machines.
And that actually was how I ended up becoming a Debian developer myself
around 2004.
And that,
that like just pushed me all the way through.
I got involved into trying to bootstrap new architectures.
I was at some point trying to build a Debian
based on uClibc to run on really small devices.
But that didn't go anywhere.
But that was a huge introduction to the whole world of
toolchains. And then I was just lost to the world. And that was my life.
So you work at Bloomberg, but you're also on the C++ Standards Committee. And if you draw a Venn diagram of well-known C++ people at Bloomberg and people on the C++ Standards Committee, there's quite a big overlap.
But what is it you do both at Bloomberg and on the committee?
And is there a relation between the two?
There is.
So I joined Bloomberg in February 2011, so 12 years ago at this point.
And one of the very few — well, not the very first, but one of the first — projects I worked on was actually, again, introducing automation for package management and builds. And, ironically — or not ironically, but funnily enough — also based on the Debian packaging system. So I was, like, back to 1999, building the same thing at Bloomberg,
but now with sandboxing and a bunch of other things
and supporting C++.
And it was only around 2019
that I started to be more interested
in the C++ standard committee work
because I was honestly scared of the direction of modules
because they seemed somewhat incompatible with the way that we built code.
So in the early experiences, a lot of the discussions that happened in the context of defining how modules were going to work were heavily influenced by organizations with what I call heavily regulated build systems, or monorepos. And their requirements are quite different from the requirements of organizations like Bloomberg, where we have an open-ended, package-based build system, where you don't see the source of the other package. You just see files on disk that were produced by the build process. And that's the only interface you have. And the different projects could be using entirely different build systems. And in our case, sometimes they do. It's often that they do.
So it was clear to me that we had a huge gap to get to a point where we could have, like, an SCons project and a CMake project in a C++ ecosystem with modules, and everything worked. And that's what pushed me into the work in the C++ standard committee.
And later, it's what pushed Bloomberg to start funding Kitware to work on the implementation of modules, to drive a vision where our universe at Bloomberg actually could work with modules.
In a way, our biggest fear at the time
is that we would have a big fork of the C++ ecosystem
where you would have some parts of the ecosystem
that could use modules
and some parts of the ecosystem that couldn't.
And now either you're in these companies that have these huge monorepos and modules work for them, or you're not.
And then there's a bunch of open source libraries that you can't use.
So that was our biggest fear.
Right. So let's talk about modules.
We can get into the specific problems maybe in a second.
But let me kind of zoom out and ask a very kind of general question
that I think quite a lot of people are asking themselves these days.
So we have modules and standards in C++ 20,
so it's been three years now.
They're still not widely used in the C++ community, right?
If you go and get, I don't know, whatever library,
it might even be using very modern C++,
but it's going to be a header,
it's going to be a header and source or something.
It's not going to be a module, right?
And so you don't really see modules
really being widely used in the C++ community.
And I'm just curious why that is,
like what's the actual challenge there?
And did we as a committee get them wrong?
Like, what do you think is the main
issue there? Why do we not
see them in the wild?
So I want to
focus on named modules first
because named modules are
something a lot easier to talk about
and we can get into
importable headers or header units
later. Can you maybe quickly
for our listeners,
can you say what named modules are?
Yeah, so named modules are what you see when you say import std. So you say std without angle brackets, without quotes. There is this new namespace of, like, module lookup — an entirely new avenue of how things are named in C++, because C++ doesn't have enough types of namespaces. And the main distinction between a named module and a header unit is that the import statement cannot affect the state of the preprocessor.
But all the entities that were exported from that module are now reachable to the translation
unit that did the import.
What does that mean in practice?
Right.
So if I have a file that's, like, module blah, and then export this, export that, and then in another file I say import blah — then that's what you're talking about? That's what named modules are?
Correct, yeah. And the reason why they're called named modules is because the compiler has this new namespace, which is, like, the module name lookup. And then you can tell the compiler, for module blah, you can take the pre-built module interface from this location, right?
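For illustration, a minimal named-module sketch of what's being described — file names and extensions are just conventions here, and compiler support varies:

```cpp
// math.cppm -- the module interface unit
export module math;            // declares the named module "math"

export int add(int a, int b)   // exported entities are reachable to importers
{
    return a + b;
}

// main.cpp -- a consumer
import math;                   // a name lookup, with no preprocessor effect

int main()
{
    return add(2, 3);
}
```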
And the distinction from that
to importable headers or header units
is primarily that,
one, you don't have this new namespace.
When you say import angle-bracket iostream, you're supposedly doing the same lookup as you would do with the include statement.
But we don't really have a concept of identity of headers, right?
Is that also why you can't standardize pragma once,
even though everybody's using it?
Yeah, that is exactly the reason.
Once what?
We barely acknowledge in the standard that the things are in files.
How can we even define that they're the same file?
What is a file?
And so that's the first challenge.
So if I have to give the compiler the pre-built module interface — or the built module interface; we use the acronym BMI all the time — saying this header unit is in this location, what is there to tell that this is going to be coming from the same source as what the compiler would have seen if it was doing an include? There's nothing to say this. And for the user, I think this is going to be very challenging.
And the second part is that the way a chunk of code is understood by the compiler can be drastically different if you allow the compiler to do the source inclusion, or if now you're saying, take this BMI instead.
So that's the main difference.
And I think we kind of stepped ahead a bit on the discussion,
but that's the main thing that's the biggest challenge for header units.
And there's a lot of additional problems in the tooling space
that come because of it.
Right. So let's rewind maybe and start with the named modules.
So you say that those are actually a lot more straightforward,
but those are also not really
widely adopted yet.
The main thing is there is a
gigantic shift that happens when
it comes to the adoption of C++ modules in the tooling ecosystem.
We like to use this term embarrassingly parallel when we talk about C++ build systems, because
up to modules, the order in which you translated individual units was irrelevant.
There were no dependencies across translation units.
There were dependencies from translation units to source files that were included.
And the traditional way that this is implemented is you have the first translation generate a dependency file
that gets read by the build system when it's available,
and that tells the build system
when an incremental build of that translation unit is necessary.
So this is how C and C++ build systems have worked forever. Now, when we get to modules, we have a significant change, in which not only is the order of translation units now relevant, but we have a new type of relationship between translation units.
So before, we had only one kind of translation unit relationship, which was linkage. So we only had an ABI relationship between different translation units. Either you built your objects coherently and everything should work, or you built your objects incoherently and you're going to have an ODR violation, which hopefully fails at link time, but most likely will just segfault in production.
With built module interfaces,
you have this new import relationship
where in order to translate this particular unit,
you need to have translated the module being imported beforehand.
So now you need the build system to topologically sort
all the translation units in order to build them in the right order
because otherwise it just can't build.
It would just fail saying,
hey, I can't find this module.
I can't compile this translation unit.
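As a toy illustration of that topological-sort requirement — not any real build system's code:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Given "module -> modules it imports", produce an order in which every
// module is compiled only after everything it imports. Module import graphs
// are acyclic by definition, so no cycle check is needed here.
std::vector<std::string> buildOrder(
    const std::map<std::string, std::set<std::string>>& imports)
{
    std::vector<std::string> order;
    std::set<std::string> scheduled;

    auto visit = [&](auto&& self, const std::string& m) -> void {
        if (!scheduled.insert(m).second) return;   // already handled
        if (auto it = imports.find(m); it != imports.end())
            for (const auto& dep : it->second)
                self(self, dep);                   // imports come first
        order.push_back(m);
    };

    for (const auto& [mod, deps] : imports) visit(visit, mod);
    return order;
}

// e.g. buildOrder({{"app", {"math"}}, {"math", {}}}) yields {"math", "app"}.
```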
Yeah, but isn't this like the whole point of modules?
Pre-modules, you can do hash-include vector in your 2,000 different source files, and then you're going to be parsing and compiling the vector header 2,000 times, right?
And then you throw it all out again when you link.
And so you just don't want to spend all of those resources
on recompiling the same stuff over and over again.
You want to compile it once and then reuse it.
So this relationship, that's kind of the whole point, right?
That's what you want.
Yeah, and conceptually, it's all great.
It's just, for this to work, there are a lot of steps in converging how build systems work.
For instance, how do we know the order
in which the translations have to happen?
Which means that now we need a dependency scanning step before the build
even starts, because the build needs to know what modules are where and which modules depend
on what modules.
And so there has been a long process of figuring out.
So now we have this extra dependency scanning.
What is the output of this dependency scanning?
Great.
So now we have a common output format for the dependency scanning.
Now let's figure out what that is. There was a paper that Kitware wrote back in 2020, I think, that essentially describes the output format for the dependency scanning, which will say, this source file provides this named module and requires these other named modules.
Oh, is it this JSON format that I think CMake now supports?
I think the three major compilers are supposed to support it now, I guess.
GCC is almost there, I think.
There's a patch that Kitware is pushing upstream on this.
But the reality is, it's only last year that we got all the compilers,
or at least MSVC and Clang, and a patched GCC to produce this format.
And this doesn't even work with header units on GCC.
Because, and we'll go back to header units.
I'm very aggravated about header units.
So let's stay with named modules for now.
So now that we converged on the dependency scanning step,
and then we had a second problem, which is,
right, so CMake generates a build system in Ninja, but Ninja didn't support dynamically adding dependency nodes
between translation units,
between build nodes, right?
You couldn't add dependency edges between nodes
dynamically as part of the build.
And it turns out that Kitware had had a fork of Ninja for a long time, because that was required for Fortran modules — because C++ modules are heavily inspired by Fortran modules. And it was only when it became clear that this was a requirement for C++ that Ninja upstream finally accepted the patch. And so CMake could generate a Ninja build that worked with upstream Ninja to support building modules in the right order.
And so it has just been this very long process of getting the tools in place.
And we're confident that, with the work that Bloomberg is helping fund with Kitware, by the end of this year,
if you have a CMake project
and you want to use modules internally to that project,
it's going to be a viable solution.
Or if you have a multi-project integrated CMake build
where everything is in the same CMake project,
kind of like submodule style,
that this is likely going to work.
Or even with export files, as long as you're careful about making the flags close enough.
Because here comes the other challenge of modules, which is that the implementation of the built module interface, and how that gets imported, actually has its roots in precompiled headers. Which means that this new relationship between the importing and the imported translation units is actually driven by the performance of the import statement rather than by the interoperability of the tooling.
What does that mean?
What I mean is,
while different objects produced by different compilers,
as long as they use the same standard library
in ABI-compatible ways,
can be linked together — if you have a BMI that was produced by Clang and you want to import it from GCC, it's just not going to work at all.
It's even worse.
If you have a BMI produced by Clang 14
and you want to import it in a translation unit where you're using Clang 15, that's not going to work. Because the import process is actually doing heavily optimized mmap/memcpy kinds of things to make the import really, really fast. But it gets even worse, because specific flags,
even if you're using
exactly the same compiler,
will change the way
that the abstract syntax tree
is constructed,
which means that that BMI
is not going to be useful.
So let's say you have a library that was built with standard C++20, and then you have your translation unit building with standard C++23 — the BMI is not going to be usable.
And so that's the second challenge that we have been working through.
And we kind of have a consensus around the idea that the compiler needs to advertise kind of an opaque hash token describing what the compatibility of this BMI is. And then you can ask the compiler doing the import what the compatibility of its BMIs is.
And essentially, in a way,
you kind of have to build in the most pessimistic way possible
as if every translation unit doing
an import needs its own build of every module that it needs transitively, and then just
hope that the deduplication across those translation units is actually going to get you better performance. Because if we don't do that, what ends up happening is your build system starts working, and then suddenly you just get a compilation failure saying, you chose the wrong flags — too bad.
Sounds fun.
But again, we're
almost there.
Does all of this imply that we actually got it wrong when
we standardized modules
in the first place?
That was one of Timur's original questions.
I don't think we got it wrong.
I think the main thing is a lot of the challenges would have been significantly easier if we had done packaging before modules. Because a lot of the things that we spent a year talking about in SG15
was how do I ship a pre-built library with modules?
What does that look like? What are the files on disk?
How can CMake import a library that has modules that were built in SCons?
And we have been slowly working through it,
and we have a general mental model of how that should work.
And now it's a simple matter of programming to get CMake to actually implement all of that.
But I don't think it was...
From a language perspective,
I think modules
are fine. Named modules are fine.
I think it's
just there was a general
underestimating
of the impact
that it would have
in the tooling space.
And for that reason,
the effort required to make it work
is significantly higher than most people expect.
GCC has had module support since GCC 10, I think.
But the tooling to make it usable is just
not there yet.
Well, I know you want to get on to talk about
importable headers and
header units. But before we do
that, it's a good time to take a little
break. And while we're talking about
the sorry state of C++,
it's a great time to
talk about Sonar, the home of Clean Code, a sponsor for this episode.
So SonarLint is a free plugin for your IDE. It helps you find and fix bugs and security issues from the moment you start writing code. You can also add SonarQube or SonarCloud to extend your CI/CD pipeline and enable your whole team to deliver clean code consistently and efficiently on every check-in or pull request. SonarCloud is completely free for open-source projects and integrates with all of the cloud DevOps platforms.
All right, Daniel, so let's talk about importable headers.
So you recently wrote a paper, P2898,
saying that importable headers are not universally implementable.
And you also had a talk at C++ Now a few weeks ago,
which was a great talk, by the way,
about the challenges of implementing header units.
First of all, apologies if that's a stupid question,
but are importable headers and header units the same thing?
It's not a stupid question at all.
Naming things is really hard.
But it's essentially talking about similar things
just on different levels of abstraction.
When the standard talk about importable headers,
it's talking about the header in the abstract
from the perspective of how the compiler should think
about the semantics of the importation process.
And when we talk about header units,
we're talking about how the build system needs to think about it
in terms of the fact that there is now one more node in the build graph
to produce that header as a translation unit,
and that's what we call a header unit.
So it's a very similar thing just talking about them
in different layers of abstraction
because the header unit is the translation of an importable header.
All right.
And so when you say they're not universally implementable,
what does that mean and how bad is it and can you fix it?
So let me start by talking about what universal means, right?
So — and this goes back to the conversation about how the discussion of modules in the beginning was very focused on specific environments — again, what I call the highly regulated environments. And the main difference is that those environments tend to have
very elaborate and powerful build systems
where adding new nodes to the build graph
during the build itself is a normal thing to do.
But when we consider the C++ ecosystem as a whole,
then we have to consider that POSIX make
is a part of the C++ ecosystem, right?
And while it is possible to get named modules to work in POSIX make — and it's definitely not going to be the case that you're going to be manually editing those makefiles; that part is gone — from the perspective of the tooling ecosystem, it's still possible for CMake, for instance, to generate POSIX makefiles that will be able to build a system using named modules. And the reason for that is that the dependency scanning process happens before everything else.
And that's a requirement in all cases.
But as I was saying before, in the case of named modules, the import statement is not allowed to affect the state of the preprocessor at the time of the import, which means that you can look at a
translation unit in isolation, and you're going to find all the edges that this translation unit
has in terms of dependencies. And it doesn't matter that they're dangling at the time of the dependency scan, because
you can do the dependency scan in an embarrassingly parallel step, and then you collate all that
data into a coherent build graph, and it works.
With importable headers, on the other hand, the import statement — or the transparent rewriting of a pound-include into an import by the compiler, which is allowed by the standard — is allowed to influence the state of the preprocessor. What does that mean? It means that, if we are doing a header import, and we are saying that the translation of the header unit happens independently from this translation unit, I can't just pretend that the source inclusion is equivalent to the import. Because if I have something in my preprocessor state that would result in that header being interpreted in a different way than if it was imported standalone, I will end up with an incoherent build.
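For illustration, a hedged sketch of the kind of incoherence being described, with hypothetical files:

```cpp
// speed.h -- a header whose meaning depends on the preprocessor state:
#ifdef USE_DOUBLE
using speed_t = double;
#else
using speed_t = long;
#endif

// consumer.cpp:
#define USE_DOUBLE
#include "speed.h"
// With plain source inclusion, USE_DOUBLE is visible, so speed_t is double.
// But if the compiler transparently rewrites the include into an import of a
// header unit that was translated standalone -- without USE_DOUBLE -- this
// translation unit sees speed_t as long, and the build is silently incoherent.
```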
And the end result of that thought process is that the list of header units becomes a dependency of the dependency scanning process itself. So we need to know the list of all importable headers before we read any file. But it gets worse than that.
Because as we translate the header unit independently,
that header unit needs its own compiler flags.
And the compiler flags of the header unit
are not necessarily the same
as the compiler flag of the translation unit
doing the import.
In fact, this is very much a desired outcome.
Like one of the things that we want is to be able to isolate the preprocessor flags such that
everyone sees Iostream the same way. On the other hand, that means that I need to know
what are the compiler flags for Iostream before I do the dependency scanning.
And the dependency scanning now needs to emulate what the import process will do. What does that emulation look like? The preprocessor stops at the point of the import, starts a new preprocessor context informed by the compile command
of the header unit
that's going to be translated later,
processes that header unit,
gets the final state of the preprocessor
at that point,
and merges it back into
the original preprocessor state.
Right?
So that's what's necessary
to correctly do the dependency scanning with header units.
The end result of that is that the list of all header units, and all the arguments to all the header units, is now a dependency of the dependency scanning process itself. Consequently, any change to any one of those things — to the list of header units, either adding a new header unit or removing one, or changes to the arguments with which those header units are translated — effectively invalidates the entire build. Because with POSIX make, if a target gets invalidated, it's over; the invalidation goes all the way down the build plan.
So a workaround for that
is that in the case of Ninja, for instance,
you can use the restat option on that target
and say, okay, if the dependency scanning ran and the output of
the dependency scanning is the same just don't like essentially write through an intermediary file
and then have the final file depend on the intermediate file with the restat option but
only copy it over if the contents are different.
And then Ninja can stop the invalidation after that.
Or in the case of SCONs,
where you have checksum-based invalidations.
So if the dependency scanning produces the same checksum,
again, the invalidation stops.
But in the case of POSIX make,
that's not how it works.
The moment that you invalidate a rule,
all downstream rules are automatically invalidated.
So what does that mean?
It can mean two things.
It can mean, one, that header units should not be used, because they're unusable in environments where C++ is used today.
Or it means that we're declaring, like Michael Scott style,
that POSIX make is no longer a valid part of the C++ ecosystem.
And it may be that we get to that point,
but my main concern has been
that we have been driving this conversation
in a very implicit way,
as saying like, oh, the standard requires this,
therefore the standard is right.
Therefore, it doesn't matter what the cost is.
And the thing that I want us to be explicit about is
if we're saying that we want to commit to what's in the standard,
then we're explicitly saying, fine,
like we're explicitly choosing to say
POSIX make is no longer a valid C++ build system driver.
I personally think it's weird for us to make that choice,
but I'm not going to hold everyone back if that's where the consensus is going.
This definitely brings to mind that quote, and I used it in a talk just recently.
No plan survives first contact with the enemy.
I think that's definitely what we're seeing with modules.
We spent years specifying it, and then, when we actually try to use it, we hit all these rafts of unexpected consequences that we're still working through.
So I want to thank you for your part
in trying to make sense of all this
and even do something about it.
Thank you for recognizing the pain.
Yeah.
Well, we definitely need to get somewhere with it.
I actually want to switch gears a little bit because we don't have a lot of time left
and there's another bit that we want to get into.
So just taking a step back and talking about SG15,
the tooling study group,
because you're quite a central member of that group.
Can you talk about the group and what they do?
So the main thing, it's a bit of a weird thing,
and it has been weird for a while, but I think we're now finding a different way of framing it.
Because I remember at CppCon 2021 there was a panel with, like, the chairs of the standard committee, and there were quite a few questions. I asked a question about whether the ecosystem is something that WG21 should be interested in — not as individuals, but WG21 being interested in the ecosystem as a whole, and not just the semantics of the language itself. And at the time, there was a surprising yet very clarifying answer
that there was a sentiment that this is out of scope for WG21,
that WG21 was meant to work on the language itself
and that the ecosystem was not part of the scope.
Since then, there have been a number of conversations with various people, and in Kona last year — October, I think — there was a substantive shift in how the WG21 chairs think about this problem. And there is a consensus building that driving the ecosystem as a whole, not just the semantics of the language itself, is and should be part of the WG21 scope.
And it was one of the most expressive, positive votes in the room in Kona, where there was this
realization that, yes, we need to work to consolidate how the ecosystem is driven, because the amount of divergence we have — and most of the time unnecessary divergence — is hurting the ecosystem a lot.
So what is the scope of this ecosystem IS?
Is, for example, the JSON file that we mentioned earlier
that describes the dependencies of the modules,
is that something that would be standardized in there? And what else could be part of this new standard?
So the basic framework in which this was presented is a framework of interoperability. So we're not trying to specify what the standard build system for C++ is, or what the standard package manager is. Right, that's not the goal. We are profoundly aware that this would not only fail catastrophically at actually being standardized, it would actually be profoundly damaging to the ecosystem to commit to a single build system.
But what is important is for us to be able to say, hey, I have a CMake project, you have an SCons project — maybe we should be able to share code, right? Like, I should be able to ship you a library, and you should be able to import my library into your build system. That's not...
I think we have a bit of a Stockholm Syndrome thing
with the C++ ecosystem,
where we just accept that this is how things are.
But if you explain this to anyone
that hasn't been in this ecosystem for long,
they will look at us like we're crazy.
And it's like, how is this still a problem in your ecosystem?
Every other ecosystem has dissolved.
Yes, I've experienced this where people that were relatively new to C++,
they were like, okay, so I need this library.
How do I use it?
Which command do I need to type to download and install this library
and link against it and everything.
And then you have to go and say, well, that's not really a thing.
You have to actually get the source, and then it depends on the build system,
how you compile it, and then you get into this mess.
And as you say, everybody looks at this and says, this is crazy.
How can you get any work done if this is how your ecosystem works?
If you come from Rust or Python or anything like that.
But the reality is that a lot of the C++ shops,
they have solved this problem internally.
Like Bloomberg, we have a package management ecosystem
with more than 10,000 C++ projects.
And if you're at Bloomberg as a C++ developer writing a library,
you know exactly what you have to do.
And if you're consuming someone else's library,
you know exactly what you have to do.
The same thing is true for Google.
Google, they have their Blaze build system.
And if you want to consume a library,
you know exactly what you're going to do. And if you want to consume a library, you would know exactly what you're going to do.
And if you want to ship a library,
you also know exactly what you're going to do.
And it's also not entirely broken
in some ecosystems like GNU Linux distributions.
If you commit to a specific Linux distribution,
you actually have a fairly reasonable way to specify things.
Like, you know how you depend on the library, you know how you ship a library.
And if you commit to that ecosystem, it kind of works.
And we now also have things like vcpkg.
And as we discussed a few episodes ago, we have Conan. Yeah, and those are starting to create
specialized ecosystems where people have been able
to reduce this kind of divergence. But we are now at this — again, I talked before about bifurcating the ecosystem — we're kind of at this point where, well, I'm in Conan,
but this library
I need doesn't have a Conan file
so
maybe I can't use that library
or I need to become a Conan expert
to be able to figure
out how to package a
third-party library that I
barely know how it builds.
So it seems like that solves the problem by just bumping it one level up, right?
Kind of, right? It just makes it more affordable to the people for whom it was completely unaffordable before, without actually solving the problem. So what we're doing is essentially finding a bunch of local maxima.
And some local maxima have a lot of investment. Like, Bloomberg has a lot of people working on the packaging system, on the build orchestration, on making sure everything's coherent, all of that. And then you have, like, small organizations that can't afford that, and so they will find whatever is the local maximum that works for them. But I think the thing that we're hoping for is that we can break through those local maxima via interoperability. Let's imagine a world where a C++ project describes, in an automatable way, the steps for its build, and which dependencies it has.
How do you look up dependencies?
So we don't need everyone to converge on the same package manager.
We don't need everyone to converge on the same build system. We just need to find which
languages we're missing to allow this interoperability to happen today. And so that's
what you're going to put into this or what you're aiming to put into this ecosystem international
standard that's being developed?
That is the goal, yeah. It's just build interoperability. One of the first things that is being discussed — and, to be fair, I'm still focusing mostly on modules, I'm not really investing a lot of effort in that right now — is an introspection mechanism, where you can ask your toolchain or your tooling environment what its capabilities are in regard to the specified interoperability languages.
Because then CMake can go and say like, oh,
this package manager supports this format.
I can go and ask for what libraries are installed in the system
and what modules come with those libraries in an interoperable way.
So we might finally get something richer than the compilation database.
Yeah, the compilation database is a very good example. It's something that has been profoundly useful, but everyone that seriously interacts with it knows just how painful it is to try to extract semantics out of the compile commands. Like, in the case of Bloomberg, we have legacy environments, and we run Clang tooling on the legacy environments, so we have a bunch of code that just goes, oh, I recognize the semantics of this particular compiler — let me rewrite this compilation command into a Clang equivalent of whatever this compiler was doing.
So we need to raise the semantics from dash capital Y
into more structured data.
What do we mean for the compiler to be doing?
Well, we are running long again.
We didn't really get to ask you any more personal questions,
so just very, very quickly,
if you could say one other thing that we haven't talked about so far
in the world of C++ that you find interesting or exciting,
what would it be?
Oh, I don't know.
I've been so sucked into the tooling world
that I don't even know.
Well, to be fair, you're actually working
on some pretty interesting, exciting stuff there.
So we'll give you a pass on that one.
But anything else that you do want to tell us, though?
Anything you want to let our listeners know
before we wrap up?
Just if you're
a person that is
tooling-inclined,
we do have
a lot of work
to get through
the tooling ecosystem
international specification.
So this is all work.
It will take effort.
And we need people to actually come in and chip in on figuring stuff out.
There's a lot of stuff to be figured out.
And if people don't come in to join the effort, it's just going to take longer.
And hopefully we're still going to be motivated five years from now.
So if you're tooling inclined,
please pretty much get involved
in the committee work to help us get there.
Well, there's a call to action.
Thanks for that.
And thank you so much for being a guest
on the show today and telling us all
about the current state of modules
and build systems and the Ecosystem International Standard.
Thank you for having me. It's a pleasure.
So how can people reach you if they want to talk to you about this stuff?
So my email is druoso at bloomberg.net or my personal email daniel at ruoso.com.
I am on Twitter, but I don't really use Twitter. I'm there just because Twitter was annoying me enough about not having an account that I eventually created one. So it's Daniel Ruoso there, if you want to DM me. And I also try to follow the SG15 mailing list, so if you want to discuss something tooling-based, maybe just go straight there.
All right. Well, thank you so much, Daniel, for joining us today and for this fascinating
discussion. I found that very, very informative. Thank you for having me.
Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the
podcast. Please let us know if we're discussing the stuff you're interested in, or if you have
a suggestion for a guest or topic, we'd love to hear about that too.
You can email all your thoughts to feedback at cppcast.com.
We'd also appreciate it if you can follow CppCast on Twitter or Mastodon.
You can also follow me and Phil individually on Twitter or Mastodon.
All those links, as well as the show notes, can be found on the podcast website at cppcast.com.
The theme music for this episode was provided by podcastthemes.com.