CppCast - Spack
Episode Date: May 28, 2021

Rob and Jason are joined by Todd Gamblin and Greg Becker. They first discuss a documentation tool, a blog post about floating point numbers, and yet another post about ABI changes. Then they talk to Todd and Greg from Lawrence Livermore National Laboratory (LLNL), who both work on Spack, the popular open source package manager aimed at HPC.

News
- Poxy: a Doxygen frontend with extra fancy
- Mostly harmless: An account of pseudo-normal floating point numbers
- Removing an empty base class can break ABI

Links
- Spack
- Spack on GitHub
- Spack Tutorial
- Spack Slack
- Build all the things with Spack: a package manager for more than C++ - Todd Gamblin - CppCon 2020
- Clingo: A grounder and solver for logic programs
- Build: Solving the Software Complexity Puzzle

Sponsors
- PVS-Studio. Write #cppcast in the message field on the download page and get one month license
- Date Processing Attracts Bugs or 77 Defects in Qt 6
- COVID-19 Research and Uninitialized Variables
Transcript
Discussion (0)
Episode 301 of CppCast with guests Todd Gamblin and Greg Becker, recorded May 18th, 2021.
Sponsor of this episode of CppCast is the PVS Studio team.
The team promotes regular usage of static code analysis and the PVS Studio static analysis tool. In this episode, we discuss a documentation tool and floating point numbers.
Then we talked to Todd Gamblin and Greg Becker.
Todd and Greg talked to us about Spack, the package manager for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
I'm all right, Rob. How are you doing?
Doing okay. Don't think I have any news to share or anything.
How about you?
Nothing at the moment.
Although I guess we passed six years doing this now.
Because I got a bunch of LinkedIn comments from people.
Congratulations on your work anniversary.
And I'm like, what are you talking about?
Yeah, that sounds right.
I mean, we started in like February, or the very first episode was in February, which was just me and John.
Yeah, I didn't join you until May, something like that.
April or May, yeah. But yeah, six years going strong, and we're now past another big round number.
Yeah. Numbers are meaningless, it's fine.
They are. I mean, the numbers aren't even visible; the numbers are for our internal tracking, really.
Right. Okay. Well, at the top of every episode I'd like to read a piece of feedback. This week we got a comment on YouTube, this from Aaron, saying, hey guys, just wanted to say thanks for producing such high quality content so consistently. I've learned so much about C++ over the years of listening. Hope to do an EmptyCrate training course one day.
And yeah, well, thanks for listening, and I'm glad the content is appreciated.
That would be fun. I don't know when I'll next
be offering a class that
random people can sign up for. We'll see.
You're starting to do more training now
that the pandemic is
coming to an end?
I'm starting to plan more training. There's nothing that is definite yet, but it looks possible that maybe at some upcoming conferences that are meeting in person, and potentially some on-site corporate kind of classes, will be coming up.
Okay. Well, we'd love to hear your thoughts about the show.
You can always reach out to us on Facebook, Twitter, or email us at feedback at cppcast.com.
And don't forget to leave us a review on iTunes or subscribe on YouTube.
Joining us today first is Todd Gamblin.
Todd is a computer scientist in the Advanced Technology Office in Livermore Computing at Lawrence Livermore National Laboratory. He created Spack, a popular open source package manager aimed at HPC,
which has a rapidly growing worldwide community of contributors. He also leads the packaging technologies area of the U.S. Exascale Computing Project and Build,
an LLNL strategic research initiative on software integration and dependencies.
His research interests include dependency management, software engineering, parallel computing,
performance measurement, and performance analysis.
Todd has been at LLNL
since 2008. And also Greg
Becker. Greg is a computer scientist
in the tool development group in Livermore
Computing at Lawrence Livermore
National Laboratory. His focus is on
bridging the gap between research and production
software at LLNL. His work
in software productization has led him to work on Spack,
an open-source package manager for high-performance computing,
with users and contributors all over the world.
Greg also works on the Build Research Project,
working to resolve dependency integration issues related to binary interfaces,
and on internal LLNL infrastructure using Spack.
He's been at LLNL since 2015.
Welcome both to the show.
You're muted, Todd. Thanks.
So can one of you tell me what exascale means?
So exascale, it refers to exaflops.
So in supercomputing, we measure
everything in flops. So floating point
operations per second. And exaflop
is 10 to the 18 flops.
And so the supercomputing generations tend to go in these multiples of 1,000.
So before exascale, there was petascale, and there was terascale before that.
And those are the milestones, and those are the things around which the funding sort of
revolves.
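For scale, those generation names are just powers-of-1000 jumps in flops; a quick illustrative sketch (not from the episode):

```python
# Supercomputing generations are named by powers-of-1000 jumps in FLOPS
# (floating point operations per second).
SCALES = {"tera": 10**12, "peta": 10**15, "exa": 10**18, "zetta": 10**21}

# Each generation is a 1000x jump over the previous one.
assert SCALES["peta"] == 1000 * SCALES["tera"]
assert SCALES["exa"] == 1000 * SCALES["peta"]
print(f"1 exaflop = {SCALES['exa']:.0e} flops")  # → 1e+18 flops
```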
Basically, there's an argument made to Congress every 1,000x in performance that we need funding to continue to push performance to that level, to support, you know, all the physics simulations that we do at Livermore and so on.
And so, I mean, that's what the ECP is.
It's designed to build a robust software ecosystem for exascale computing.
And so are you at exascale now?
No.
So it's aimed at machines that are going to start hitting the floor late this year, early
next year.
And, you know, it's been going on for a few years now.
And so Frontier at Oak Ridge, Aurora at Argonne, and then El Capitan at Livermore are the three machines
that are going to, those are the US Exascale machines. And maybe someone else will, you know,
come out with an Exascale machine in the meantime or before or after ours too. But we're hoping to
hit it first. So now I'm just going off rails here, but as soon as you actually hit Exascale,
are you going to change the name of your department or the project so that it's whatever's after exascale, since you're always
pushing for that next 1000 times? I mean, that has happened, right? Like a lot of research grants
before we were shooting for exascale, we're shooting for petascale. And so, you know, those,
those culminated with, you know, to some extent, I guess, Blue Gene, well, let's see, the Sierra system that we currently have on
the floor at Livermore, which is a big POWER9/Volta system. And then before that, you know,
our computing facility used to be called the Terascale Computing Facility at Livermore, and it's
been renamed to, I think, the Livermore Computing Complex, because the name sounded slower and slower
over time. So yeah, these things happen.
So what's after exascale?
Yeah, I was thinking the same thing.
Zettascale.
Zettascale is next.
I'd say the US Exascale project in particular
is like a limited duration project.
So that funding will be going back
to kind of general HPC research,
instead of that particular form of project
going on to zettascale.
It might be reconvened later to get there,
but it won't go straight through.
Okay.
All right.
Well, Todd and Greg,
we got some news articles to talk about,
so feel free to comment on any of these
and then we'll start talking more about Spack
and some of the other things you're working on at LLNL, okay?
Yeah, sounds good.
All right.
So this first one we have is Poxy,
and this is a library on GitHub,
and I think it's one of these ones we've talked about so many
of the libraries from this particular author, I believe.
Right, Jason?
It looks familiar to me.
But Poxy is a documentation generator for C++ based
on Doxygen, and it looks like it's just a little prettier than default Doxygen. It cleans some
things up, and you can take a look at what some of the output looks like on a link to the
toml++ documentation, which is another project from this
author. And
the docs looked really
nice. They have a really nice dynamic
search function. So definitely worth
checking out if you're already using Doxygen. I think
it would be pretty easy to switch over to using this.
I've always heard it "Toml."
It could be, yeah. I have no idea.
I'm
not going to weigh in an opinion here.
Although I agree.
It is, it can be like a bit of a pain to get Doxygen set up.
Nice.
So to have something that gives you a nice, good, clean template to start from sounds good.
And I also like, there's a nice touch. If you go to the toml++ website, all these docs built with their flavor of their Doxygen front end here have, um, there we go, yeah, the examples have a highlighted link here, "Try this code on Compiler Explorer," and it takes you straight over to a Compiler Explorer link where you can actually play with the examples.
Very nice. Okay, the next thing we have is a blog post on Red Hat's developer blog.
And this is Mostly Harmless, an account of pseudo-normal floating point numbers.
And this refers to an interesting set of errors they had where isnan would not work correctly with some, like, malformed doubles. Is that right, Jason? That's how
I read it.
Yeah, yeah, the 80-bit Intel floating point representation. It's definitely a little scary, because isnan is something you would call to make sure you have something valid, and if that itself
can choke on a malformed double, you're in a bad situation. Is this the kind of thing you all have to deal with?
Floating point representations with all of your fancy exaflops?
Yeah, yeah.
There's a lot of research going into non-standard floating point
in terms of trying to get a little bit more performance out of the system,
especially with AI workflows and machine learning
using a lot of single and even half precision.
And so that's what's becoming cheap in terms of hardware
because that's where the demand is.
So then there's a lot of research going into
how can we take our double precision algorithms
and run them at 4x speed on half precision floats
and things like that.
Can we get similar correctness with that performance boost?
Sorry, there's also some really long-term research on non-standard floating point representations, like unums and posits, so non-IEEE, which have better error properties for some stuff.
Oh, okay. Yeah, because I'm just thinking, like, half precision or something, like what, 16-bit floating point numbers? If you're, you know, doing all these fancy calculus physics simulations, how do you get a reasonable answer back with that?
Livermore has a whole bunch of applied math people who specialize in that kind of stuff. It's way beyond my expertise, but I mean, there's a whole solver team, there's people who work on finite element methods, and basically all the numerical scientists are well steeped in the ways of floating point errors and limiting them.
And I think partially you take a little bit of domain knowledge
and you say, okay, this is the part of our calculation
that's really sensitive, and this part isn't so sensitive,
and maybe we can get a performance boost
on the part that's not so sensitive.
Okay, interesting.
Yeah, some of the, I mean, there have been people
who've done dynamic analysis to try to find parts of the program where, you know, for a given run, certain calculations are
not, the error is not significant enough that you would need a double precision floating
point number.
And I mean, most of that is targeted at GPUs where half precision floating point goes way,
way faster than double precision.
And so we can get a pretty big speed boost,
like 3x for going half precision.
It also, if you can rework your data structures
to use smaller floating point numbers,
it reduces the memory bandwidth required for the calculation
because you're not transferring as much data.
And so if that's your bottleneck,
you can make your algorithm faster that way too.
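Both effects, the rounding and the smaller footprint, are easy to see with just the standard library's half-precision struct format (a small sketch; the speakers' GPU speedup numbers are not reproduced here):

```python
import struct

# Round-trip 0.1 through IEEE-754 half precision ('e' format, 16 bits).
# Only ~11 significand bits survive, so the value visibly rounds.
half = struct.unpack("e", struct.pack("e", 0.1))[0]
print(half)  # ≈ 0.0999755859375, not 0.1

# The bandwidth win: each half-precision element is a quarter the size
# of a double, so the same calculation moves a quarter of the data.
assert struct.calcsize("e") == 2
assert struct.calcsize("d") == 8
```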
So are either of you, or any of the people at LLNL, involved in any of these standard committee things to try to get 16-bit floating point into the standard?
I don't think that's been our focus. I mean, we send people to the standard committee. So you may know Tom Scogland.
He's from Livermore.
He's been to a few standard committee meetings.
And I guess Chris Earl used to go.
We send some people.
But I don't think 16-bit floating point's been a focus for us.
Because most of our physics calculations are still using double precision for now.
I think a lot of these are still in the research phase. Okay.
And then the last blog post we have
is on Arthur O'Dwyer's blog.
And this is removing an empty base class can break ABI.
And obviously we've been talking about this subject
a lot lately.
And this is pointing out how,
you know, with some changes in C++14, I believe, you could take out base
classes that no longer are necessary, but the compiler implementers actually can't because
it would break the ABI. Right, Jason?
Well, yeah. And even more to the point, to me
it's less interesting that removing them would break ABI, in a way,
or removing them as base classes,
because unary_function and binary_function not only are no longer necessary,
they were removed in C++17.
A conforming C++17 compiler shouldn't even have those types,
but they still have them as base classes for things like std::plus so that they don't break ABI.
Yeah.
Greg, I noticed in your bio, there was a mention of dependency integration issues related to binary interfaces.
Is this a subject you deal with a lot, ABI compatibility?
Yeah.
So we started last year, I guess. Todd can correct me.
Yeah, default. Todd's actually the
PI on the project.
This research
project on
binary dependency
management. And the
idea was that we both
work on Spack a lot, and we
see the version constraints that users give to their package manager.
And someone has to write it by hand, and hopefully they use SemVer, and that makes it a little easier.
But really, we as developers generally don't have a great understanding of our own application's binary interface, let alone anyone else's that
we're trying to integrate with. And so if we can go in and actually look at the binary interface,
you know, use something like Red Hat's libabigail, get the symbols, see which symbols are
actually accessible from the functions that we're interested in, then if our dependency bumps a version and we know actually this version
only changed types that we don't touch,
well, then we don't have to be too worried
about a conflict there.
But if they release a patch version,
but it changes the type that we touch,
we might have to treat that like a major version upgrade
in terms of dependency compatibility.
So then are you going to...
Oh, sorry, go on.
Yeah, so the goal is to really take that kind of information
and integrate it with the dependency solvers
and package managers
so that we're not using constraints from humans
to figure out if two libraries are compatible.
We're actually using the ground truth from the binary.
That's the sort of long-term goal for the project.
So then are you in the situation where if you do detect
or are made aware of one of these breaking binary compatibilities
that you can rebuild world and redeploy when necessary?
Yeah, we usually are,
with the exception of some vendor libraries on the machine.
So a lot of people like to use the host MPI
or some host math libraries,
and so we don't typically rebuild those.
But for the most part, HPC stuff is open source, or at least open to you while you're working on it; like for the export-controlled codes and stuff, you have the code if you're on the project. So we deploy stuff from source, and so yeah, we rebuild everything, but it takes a while. Nobody likes that. They'd like to be able to develop quickly. And, you know, we'd like to be
able to reuse binaries on the system more easily, because right now we do kind of rebuild the world.
And one of our complaints is like, why is Spack building Perl? Or why is Spack building, you know,
this other thing? And it's for reproducibility. So like system Perls are not all created equal,
some of them lack certain modules. And if you rely on them for your deployment,
you're going to break some places.
And so, yeah, we rebuild from source a lot,
but we want to accelerate it by building from binary.
And so we're trying to enable that.
I feel like it would be tempting
if I were in either of your positions
to fall into the guy who's always in a bad mood role.
So that if someone's like,
"I don't understand why we had to rebuild Perl," and be like,
"Do not even get me started on why we had to rebuild Perl," and then just keep walking, you know. We haven't tried that.
Yeah. They're usually in a bad mood when they come to us, though. So they've got the preemptive bad mood strike going on.
Yeah.
And yeah,
we try to deescalate.
We work with a package manager,
which means our users work with build systems.
So they're always in bad moods to begin with.
Yeah.
If you're asking for help on the package manager channel,
you've probably been through something.
What does a rebuild world look like to you?
Is that like hours, days, minutes, months of rebuilding?
It depends on the project.
I mean, so Spack does parallel rebuilds.
It'll synchronize the DAG bottom-up with file system locks.
And so you can srun on a single node
and rebuild 300 packages in 90 minutes
for pretty sizable packages on a big node.
So it doesn't take that long.
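The bottom-up order Todd describes is essentially a topological sort of the dependency DAG; a minimal sketch with hypothetical package names (and without the file-system locking Spack layers on top):

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Map each package to the dependencies that must be installed before it.
deps = {
    "app":   {"boost", "hdf5"},
    "hdf5":  {"zlib"},
    "boost": {"zlib"},
    "zlib":  set(),
}

# static_order() yields every package after all of its dependencies,
# i.e. the bottom-up order a parallel builder can follow.
order = list(TopologicalSorter(deps).static_order())
print(order)  # zlib first, app last
assert order[0] == "zlib" and order[-1] == "app"
```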
But yeah, go ahead, Greg.
It depends what sort of rebuild world we're talking about.
Are we talking about someone who has like an application that they care about?
Yeah.
Or are we talking about someone who's actually deploying like the system software
that they make available to their users?
Someone who has an application, if they've got a really complicated application and they're not even building in parallel, we're probably talking three or four hours of rebuild time. But then you go to a system deployment where they're doing all the compilers, all the MPI implementations, LAPACK, they're probably talking,
if they do it in parallel,
maybe 12 hours to redeploy everything.
I think that's what some of the system folks at Oak Ridge
who do that have been telling us.
Yeah, they deploy about 1,300 packages for their users.
And so their rebuild is like an overnight thing.
It's like building an entire distribution, basically. Yeah, and that's pretty much what life at an HPC site is like if you're
on the facility side, like you're deploying all the MPI implementations and things in libraries
that your users use. And combinatorially for different compilers and different MPI implementations,
different math libraries, whatever affects ABI. And yeah, that can take a long time.
And not just whatever affects ABI,
but whatever we think might affect ABI,
because like we said earlier,
we don't actually have a great detailed understanding
of what that is all the time.
Right.
Well, at this point, I think it's probably a good idea
to actually tell our listeners a little bit more
about what SPAC is,
because that's what you're both here to talk about.
And we've mentioned it a few times.
So who wants to go?
What is SPAC?
I guess I'll go.
It's a flexible package manager for building things all the different ways.
And so if you think about your standard Linux distro package manager, it's for building one version of something
and for upgrading it when a new version comes along.
And Spack is not that.
It's a system for building the versions you want of something
and maybe lots of them.
And so we call it combinatorial builds.
You can essentially take a matrix of compilers, MPIs,
different dependency versions, different flags,
different options on the dependencies,
and build all of that from source. Or you can cache it as a binary and reinstall it from binary.
We do relocatable binaries. But it provides essentially two languages, if you will. There's
one for the command line to talk about the parameters you want for the build. So you can
say, hey, build HDF5 plus the high-level interface
plus the Fortran interface.
Or you can say, you know,
build boost with streams or without streams.
And you can say particular versions
or what compiler you want that built with.
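That command-line "language" is Spack's spec syntax, where '+' and '~' toggle variants, '@' picks versions, and '%' picks the compiler. A toy parser sketches the idea (a simplified grammar for illustration, not Spack's real parser):

```python
import re

def parse_spec(spec):
    """Parse a simplified Spack-style spec string (toy sketch, not
    Spack's real grammar): name, @version, +/~variants, %compiler."""
    name = re.match(r"[\w-]+", spec).group(0)
    rest = spec[len(name):]
    vm = re.search(r"@([\w.:]+)", rest)
    cm = re.search(r"%(\S+)", rest)
    variants = {v: (sign == "+") for sign, v in re.findall(r"([+~])(\w+)", rest)}
    return {
        "name": name,
        "version": vm.group(1) if vm else None,
        "compiler": cm.group(1) if cm else None,
        "variants": variants,
    }

spec = parse_spec("hdf5@1.12+hl+fortran~shared %gcc@9.3")
print(spec["name"], spec["variants"])  # hdf5 {'hl': True, 'fortran': True, 'shared': False}
```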
And then there's a package language.
All the package files are basically Python files,
but they're parameterized by these things.
And so you could think of Spack as a system
that takes this sort of abstract spec from the user, makes it concrete into, you know, something
that you can actually build with all the options set on it, and then uses those package files to
instantiate that build. So they're written in a way that you don't have to rewrite the package
every time something changes. You want to add anything to that, Greg?
I think the key thing that then ends up being different about Spack
because of this is our core use case.
How do we support that?
The key thing that ends up being different is that
we're not installing into like a system location.
We're installing in user space.
And we install into these complex paths
that actually include a hash
at the end. That's the full provenance of the package and all of its dependencies.
Wow.
So because that hash is different, I can actually have coexisting installs of the exact same
configuration of Boost with all of its dependencies the same, except for one dependency I changed the
version of way down at the bottom of the tree.
And those are going to be separate installs.
They're going to live in separate places on my system.
And then you can spack load one of them, spack unload it, spack load the other, and swap between them.
Okay.
Can we go more into why is that necessary to have all these different built versions of a library like boost on your system? Different applications depend on different versions of boost. So I mean,
we have users who even in the same workflow, they might use a mesh partitioner that depends on one
version of Boost or HDF5 or some other library, and then their application that they want to use
in the same environment depends on another one. So you don't you don't always have a software stack
that actually, you know,
is consistent. Different applications can have different versions of things. And the way that
we link stuff in Spack, we use RPATH. And so basically every install knows where its dependencies
live and which ones it built with. And so one of the core design philosophies is
you run the way you built. Because, I mean,
basically we see a lot of people screw this up, right?
LD_LIBRARY_PATH, people who mess with that,
they're in for a lot of pain when they try to deal with something like this.
Because, you know, it's a global variable in your environment that tells the linker where to find stuff.
And if two different programs depend on two different versions of the library,
then you're in trouble.
So the idea with Spack is that you, or the package manager, knew what you were doing when you built this thing.
And you've probably forgotten that long since when you get around to running it.
And so we want to run the way you built.
So if I have three applications that all rely on one specific version of Boost, and then a fourth application that uses a different version of Boost, I will have two versions of Boost installed?
Yeah, that's right.
Okay. I was just trying to think, like, initially I thought you were saying that it would be kind of like, I cannot remember the name of these tools, but where you can, like, bundle all of your Python stuff into, okay, where it would have everything bundled together for that application.
But no, you do have a shared location.
Yeah. So every package
goes in its own prefix when we install
it. And the prefix gets a hash that is
basically a function of that package's
configuration and all of its dependencies configuration.
So it's a Merkle hash.
It's like a Git commit tree.
And you share where it's possible,
but you can differentiate where it's not.
And then for a single build, you know, spack install boost,
if there are multiple dependency paths to the same package,
those are unified.
So we use a directed acyclic graph as our internal data
structure, and we make sure that each package appears only once in the DAG.
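A minimal sketch of that Merkle-style hashing over the dependency graph (illustrative only; Spack's real hash covers much more metadata and uses a different encoding):

```python
import hashlib

def spec_hash(name, config, dep_hashes):
    """Hash a package's own configuration plus the hashes of its
    dependencies, so any change deep in the tree changes the root."""
    h = hashlib.sha1()
    h.update(name.encode())
    h.update(repr(sorted(config.items())).encode())
    for dep in sorted(dep_hashes):  # dependency hashes, like a Git tree
        h.update(dep.encode())
    return h.hexdigest()[:7]

zlib_a = spec_hash("zlib", {"version": "1.2.11"}, [])
zlib_b = spec_hash("zlib", {"version": "1.2.12"}, [])
boost_a = spec_hash("boost", {"version": "1.76"}, [zlib_a])
boost_b = spec_hash("boost", {"version": "1.76"}, [zlib_b])

# Same Boost config, but a different zlib far below → a different install.
assert boost_a != boost_b
print(boost_a, boost_b)
```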
Okay, and the idea there is that your linker only supports one version
of a given library in a particular process. And so you can have two programs running
with different versions of a library, but at least the way that all C runtimes that I know of right now work, you can only have, you know, one version
of a library linked with your program, unless you want to have nice race cases at runtime.
Right. And for C++, I mean, it's actually an ODR violation to do that.
So, you know, it doesn't stop us. It does not stop us. And I kind of like that, because I would like to exploit it one day,
but we're not there yet.
So I'm trying to then wrap my mind around like,
I mean,
cause Conan does kind of similar things, right? I'm assuming you've tried Conan.
I have a cursory familiarity with Conan.
Right. So if you install a package with whatever specific set of compiler flags and compiler version, then it makes a hashed installation as well. And then when you go to use Conan, it says, okay, oh, I want this version, and oh, by the way, these are all the flags that I'm using, and it goes to look to see if you have that installed. And does that sound similar to what you guys are doing?
Yeah, I looked into this initially.
Conan and Spack are similar in a lot of ways.
Spack's actually older.
It started in 2013 and Conan came along later.
But Conan, it seems like if you pick a version on a package,
it doesn't do anything to ensure
that you're running with what you built with.
So you can pretty easily get yourself into an ABI nightmare that way
by setting versions on things throughout your stack
and forcing them to be a certain thing in the resolver.
Whereas with Spack, at least right now, we deploy as if built from source.
And so there's this sort of hash structure that we talked about
where you get all the dependencies that you built with,
and we don't try to mix that stuff at deployment time.
I mean, that's actually one of the motivators
for this project that we talked about earlier, right,
for build,
is we'd like to have a little more flexibility in deployment
to reuse more binaries,
but that means, in many cases,
potentially violating ABI
because you're no longer in a situation
where you're deploying as if you built from source.
So we're trying to add support
and we have some preliminary prototypes
for stuff where we would swap in a binary
with another ABI
and we keep the provenance for that
so that we can go and check the metadata
and understand if we did violate ABI somehow when we do that deployment.
So we're moving to a model where we could safely reuse binaries, but I don't think we're quite there yet.
That's the research project.
Okay, that makes some sense.
Yeah, because yeah, with Conan, it's a build packaging tool, package manager.
But when it goes to deployment, you're on your own.
Like, how do I
get those binaries? Or do I just try to statically link everything or whatever? Yeah, that's right.
Okay. Yeah, I'd say the package model is different. And I mean, Spack also has, I mean, I was looking
through, I guess, key differences between Spack and Conan. I'd say Conan has more of a C++ focus,
obviously, because we're trying to support a bigger ecosystem. So we're trying to support Fortran and Python and R
and all the other things that you would combine
with your C++ libraries.
The other difference between Spack and Conan
is the dependency resolver.
So we have recently gutted Spack's concretizer,
which is what we call the dependency resolver,
and replaced it with an answer set programming solver.
And what that looks like is, it's Prolog-like, and it boils down to SAT on the back end.
Wow.
And so it's kind of cool because you write your dependency resolution rules in first order logic.
And users don't see this, so don't get scared.
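The flavor of what such a solver does can be sketched as brute-force optimization over version choices (a toy with a hypothetical constraint; Clingo grounds first-order rules and solves at vastly larger scale, with around eleven weighted criteria rather than one score):

```python
from itertools import product

# Candidate versions per package, and constraints between them.
versions = {"hdf5": [10, 12], "zlib": [11, 12, 13]}
constraints = [
    # Hypothetical rule: hdf5@12 needs zlib@12 or newer.
    lambda pick: not (pick["hdf5"] == 12 and pick["zlib"] < 12),
]

# Enumerate assignments, keep the feasible ones, optimize for recency.
best = None
for combo in product(*versions.values()):
    pick = dict(zip(versions, combo))
    if all(rule(pick) for rule in constraints):
        score = sum(pick.values())  # crude "prefer newer versions"
        if best is None or score > best[0]:
            best = (score, pick)

print(best[1])  # → {'hdf5': 12, 'zlib': 13}
```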
That's kind of awesome.
It's actually legitimately
the first commercial use of Prolog
I've ever heard of.
It's not Prolog, but it's...
But you said it's like Prolog
that boils down into SAT solvers, right?
It's a very similar-looking language,
but it's a little different.
It's got some quantifier statements that I don't think are
directly analogous to anything in Prolog, for example. You can say
I need at least one and at most one things that look
like this. And I think the reason we added
this is because all of these conditional dependencies in our packages got to be
really complicated to manage. Spack has a very expressive DSL. You can say that, you know,
Boost depends on MPI when the MPI option is enabled. And MPI is a virtual package
in Spack, so you can use any of the MPIs that support the API that you want there. So, like, you
can build Boost with MPICH, Boost with MVAPICH, Boost with OpenMPI.
Those are all separate installs, right?
And same for, you know, a normal library
that's not HPC specific.
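The shape of that conditional-dependency DSL can be sketched with stand-in directives (a toy imitation; real Spack package files import Spack's own `depends_on` and `variant` directives and subclass its package types):

```python
# Stand-in directives that just record metadata, imitating how Spack's
# DSL turns class-body calls into package metadata.
DIRECTIVES = []

def variant(name, default=False, description=""):
    DIRECTIVES.append(("variant", name, default))

def depends_on(spec, when=None):
    DIRECTIVES.append(("depends_on", spec, when))

class Boost:
    """Toy package file: MPI support is optional and conditional."""
    variant("mpi", default=False, description="Enable MPI support")
    depends_on("mpi", when="+mpi")  # 'mpi' is virtual: any provider works
    depends_on("zlib")

print(DIRECTIVES)
```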
All of that, you know, conditionality in the solve
meant that, you know, our dependency solves
were becoming increasingly wrong
with our old greedy solver.
And so the new one is,
it's pretty awesome to see what the ASP solver can do,
because it's essentially doing SAT plus optimization on
the back end.
recent versions, default values of build options, and so on. And the solver is, you know, it can
tweak the options. So if you say, I want this package built
with, you know, MPI, or with MPICH somehow, it'll figure out that, oh, on that package,
I need to flip the MPICH option, or the MPI option, on, so that I can depend on MPICH and
get it in the graph. And, you know, there's a lot of cool stuff like that. It'll, it'll solve for,
if you ask for a particular compiler, and it knows that that
compiler does not support your micro architecture, it'll, you know, set the micro architecture to
something lower that the compiler can actually generate. So you're sort of simultaneously
solving for compiler architecture support, and, you know, pick an architecture for a compiler,
it'll warn you, if you say like, hey, I have the GCC
4.x
from my distro in my
configuration registered
and you try to install something built for Skylake,
it'll say, yeah, you can't do that.
That compiler does not support Skylake.
So we've spun out a separate
project called ArchSpec,
which is the
detection and compiler support levels for all of the different
architectures. So, you know, if you read /proc/cpuinfo and you get these flags, you know,
it's a Skylake. And if it's a Skylake and you're trying to build with GCC, well, that's supported starting at this version.
And in the first two versions that support it,
the flag is this,
and then it settles in to be -march=skylake,
things like that.
And so it's basically a giant JSON file
with all of this information
and then a little front end around that
to do the detection.
Wow.
And that's something you can use in another tool
because like Greg said, it's just a JSON file.
It's got a Python interface on top of it,
but you can use that to reason about compatibility.
So if you want to tag your binaries
with the microarchitecture they were built for,
then you can take a binary, look at it,
say, oh, this is built for Skylake with AVX 512.
And you can say, oh, this is not compatible with my Haswell system, because it's going to have instructions that my architecture doesn't support. And you've got a less-than and a greater-than operator that you can use on these CPU names. I'm fascinated by
Greg's comment. It sounded like you said the tool can actually look at /proc/cpuinfo, whatever, and figure out what the CPU flags are.
On your HPC clusters,
are the CPU architectures 100% homogeneous?
That depends a lot on the cluster
and on the philosophy of the site that hosts that cluster.
Yeah.
Okay.
I'd say at Livermore,
we have almost entirely homogeneous clusters.
Well,
homogeneous in the CPU architecture.
We have heterogeneous clusters where it's CPU GPU.
Right.
Yeah.
But,
but homogeneous in the, in the CPU architecture.
And that's per cluster.
Yeah, and so this only comes up when you go between clusters at our site.
But there are other sites that, as they get new nodes,
they add those nodes into one sort of large cluster,
and they might have separate slurm queues for the different architectures.
But instead of getting, you know, five new nodes and setting up a little cluster,
they just add those to the big cluster they have. And now you've got a heterogeneous cluster. And so
we'll see a lot of folks at sites that do that will set things in SPAC that say like,
set my default architecture to be Ivy Bridge
instead of whatever I happen
to be on at the moment.
Because they know that that is the lowest
common denominator of
every chip on the system that they're targeting.
There are sites like Fugaku where the compute nodes on the cluster are A64FX, so that's the Arm chip with SVE, but they have ThunderX2 and x86-64 front-end nodes for whatever reason.
And that can be somewhat difficult to work with.
I only thought about this because Compiler Explorer, if you do dash march equals native,
Compiler Explorer will pop up a warning saying,
you don't really know what CPU you're running on, because of Amazon's clusters. Well, that's exactly why we wanted ArchSpec, because previous to this, we were essentially building with -march=native. And if you want to distribute binaries, and you want them to be optimized, and you use -march=native, you have no idea where the binary can be used.
So that was the motivation for ArchSpec.
We want to label it with a microarchitecture, not just x86-64.
Yeah, there's two ways to solve this.
You can either label them, or you can just build everything for the baseline architecture. Building everything x86-64 works, but it's not really appealing for high-performance computing. We want to get everything we can out of these chips. Right. Have you seen the x86-64 levels that they introduced into recent GCC and Clang? No, I guess not. So other folks have realized that this is a problem, and they've tried to simplify the,
you know,
optimized builds that you would do.
So there's x86-64, which is what you're familiar with. There's v2, which I think is Nehalem-like, v3, which is like Haswell, and v4, which is like Skylake with AVX-512. Interesting. V2 is when you add, like, SSE4, I think. Okay. V3 is where you add AVX2, and v4 is where you add AVX-512.
So they're nice because, I mean, it's kind of like the Arm versions. They're virtual architectures, and you can build for those and get some optimization, because the compiler can use all the fast vector instructions. But you're not building for really, really specific microarchitectures.
So we're actually adding those to ArchSpec.
Someone contributed that recently.
Pretty happy about that.
That's cool.
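The levels described above map to feature baselines roughly like this (the feature lists are abbreviated; the full definitions live in the x86-64 psABI), and picking the best level for a chip is just a subset test:

```python
# Abbreviated sketch of the x86-64 microarchitecture levels: each level is a
# virtual target defined by a feature baseline that the compiler may assume.
X86_64_LEVELS = [
    ("x86-64",    {"sse2"}),
    ("x86-64-v2", {"sse2", "sse4_2", "popcnt"}),
    ("x86-64-v3", {"sse2", "sse4_2", "popcnt", "avx", "avx2"}),
    ("x86-64-v4", {"sse2", "sse4_2", "popcnt", "avx", "avx2", "avx512f"}),
]

def best_level(cpu_features):
    """Highest level whose entire baseline the CPU supports."""
    chosen = "x86-64"
    for name, required in X86_64_LEVELS:
        if required <= cpu_features:
            chosen = name
    return chosen

# A Haswell-class chip has AVX2 but no AVX-512:
print(best_level({"sse2", "sse4_2", "popcnt", "avx", "avx2", "bmi2"}))
```

With GCC or Clang you would then build with something like `-march=x86-64-v3` instead of a chip-specific target.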
And I guess on these supercomputing clusters, they take a while to spec out, build, and install.
So even if it's brand new, it's not necessarily the CPU architecture that came out
yesterday, right? It depends. So like, in some cases, we get unit zero machines. So like for
our biggest machines, like the Blue Gene machines, Sierra, and then El Capitan that's coming along,
we work with the vendor like five years in advance to set up that contract. And then our machine will
be, you know, one of the first with a new processor generation.
Okay.
Yeah.
So it really depends on the system.
For our commodity clusters,
we're targeting something different when we buy them.
We're going for price or for optimizing for cost.
And so we'll typically choose something
that's not the bleeding edge because, you know,
you get a better price performance that way.
Sponsor of this episode is the PVS-Studio team. The team develops the PVS-Studio static code analyzer. The tool detects errors in C, C++, C#, and Java code. When you use the analyzer regularly, you can spot and fix many errors right after you write new code. The analyzer does the tedious work of sifting through the boring parts of code. It never gets tired of looking for typos.
The analyzer makes code reviews more productive by freeing up your team's resources.
Now you have time to focus on what's important, algorithms and high-level errors.
Check out the team's recent article, Date Processing Attracts Bugs or 77 Defects in Qt 6, to see what the analyzer can do.
The link is in the podcast description.
We've also added a link there to a funny blog post, COVID-19 Research and Uninitialized Variables, though you'll have to decide for yourself whether it's funny or sad.
Remember that you can extend the PVS Studio trial period from one week to one month.
Just use the CppCast hashtag when you're requesting your license.
So a few minutes ago, you mentioned how Spack works with other languages, like Fortran and R, I think you said. Is that because users of Spack are not really writing C++ programs, but are probably writing something in Python that's making use of C++ libraries? Do you want to go into that a little bit? Sure. Do you want to take it, Greg? Sure. Go for it. So users of Spack are writing everything.
We have SPAC users who are writing pure C++ libraries
that only depend on other C++ libraries
and could use Conan, maybe have a Conan package as well,
but have also contributed a package to SPAC
because they don't know what their users are going to want.
We have some folks who exist in kind of a pure R ecosystem, and they contribute a bunch of our R packages. And those packages all depend on each other and are fairly self-contained. Some of them have C and C++ components as well.
And then the HPC-specific thing is you get a lot of these kind of big physics codes
that use C, C++, and Fortran all at the same time.
And Python and Lua.
Yeah, and everything else that they can get their hands on
because whatever was best for this particular sub package
is what they used.
And the build system is a problem to figure out later.
And so we've got some of those at Livermore.
We've got some of those in the open source community
that use SPAC.
And they kind of have their own challenges.
And we think SPAC's good at a lot of things,
but that's really where our starting use case was
in terms of this HPC workflow
where all the languages are thrown in together at once,
and that's kind of what has defined
what features we need desperately
in terms of being able to support, you know,
intermixed C and Fortran compilers and things like that.
It's getting good.
I was wondering, along the thought that Greg was talking about, do these things play nicely with other package managers, like pip, for example? Like, if I say... You can make a Spack environment and you can use pip inside of it. So, like, an environment. Yeah, a Spack environment.
So Spack supports the notion of virtual environments, kind of like what virtualenv does in Python.
Okay.
We said earlier that you install into separate prefixes,
but one cool thing is that you can also say,
I have this stack of software.
I want to present it to users in a different way.
And that can either be,
you know,
if they're all consistent,
you can link them into one prefix, and that becomes a virtual environment. Or you can put them in some nice file system layout that differentiates them by, like, MPI, compiler, and so on. But yeah, if you make a Spack environment, you write this YAML file that says, I need these six Spack packages. You can say spack install on that, and it'll spit out a lock file with all the versions that it actually concretized to. But then inside of that, if you want to use pip, you can. To enable that, we do the same kind of tricks that virtualenv does: we copy the Python interpreter in, so it thinks it actually lives there, and everything else is linked out. And then, yeah, you can use pip like normal inside the Spack environment to deploy some Python packages on top of what you've got there.
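The YAML file Todd describes might look roughly like this. The `spack:`, `specs:`, and `view:` keys follow Spack's documented environment schema, but the package names and options here are purely illustrative:

```yaml
spack:
  # abstract specs: what you want, not the fully resolved versions
  specs:
    - hdf5 +mpi
    - cmake
    - py-numpy
  # merge everything into a single prefix, like a virtualenv
  view: true
```

Running `spack install` inside this environment concretizes the abstract specs and writes the resulting exact versions to a lock file.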
All right, so if I create a SPAC,
I keep wanting, like, first of all,
before I get to my actual question,
how often do you like have jokes or whatever
about spackling over the problem or something like that?
Because I just keep wanting to... There's a whole thing about wanting to spackle it. I don't know if we've had that one so much. We have spacktivate and despacktivate for our environments, though. Okay. Yeah, and that was done to differentiate from deactivate, which pip has, so you could use both at the same time. All right, so I create a Spack environment. I install my favorite version of, I don't know,
Boost and libformat or something like that.
And now I want to compile my C++ project
that uses CMake inside of there.
What does this look like?
Does CMake just find those packages?
Because, like, what is...
You want to take it? So when we... Go ahead.
When we activate the environment,
we
put all of the paths
to the things in the environment in your
user environment.
And so one of the...
And so we have what we call prefix inspections.
We look at the package, and we see it has a bin directory. Okay, if it has a bin directory,
that bin directory gets added to your path.
If it has a lib directory, that gets added to your LD library path.
And one of the prefix inspections we do
is just for the prefix itself,
which every package has,
and we stick that in CMake prefix path.
So CMake is going to go look
where all of our packages actually are.
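The prefix inspections Greg describes can be sketched as a small function: given the install prefixes in an environment, derive the environment variables a shell (or CMake) needs. The variable names here follow common convention; Spack's real inspection rules are configurable and cover more variables than this:

```python
import os

def inspect_prefixes(prefixes):
    """Rough sketch of prefix inspections: build env vars from prefixes."""
    env = {"PATH": [], "LD_LIBRARY_PATH": [], "CMAKE_PREFIX_PATH": []}
    for prefix in prefixes:
        # every prefix goes on CMAKE_PREFIX_PATH so find_package() can work
        env["CMAKE_PREFIX_PATH"].append(prefix)
        bindir = os.path.join(prefix, "bin")
        libdir = os.path.join(prefix, "lib")
        if os.path.isdir(bindir):
            env["PATH"].append(bindir)
        if os.path.isdir(libdir):
            env["LD_LIBRARY_PATH"].append(libdir)
    # join each non-empty list into a pathsep-separated value
    return {k: os.pathsep.join(v) for k, v in env.items() if v}
```

Activating the environment then amounts to prepending these values onto the user's existing variables.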
We try really hard to make it so that when
you activate the environment that
you're going to find all the packages that are
in the environment. That sounds pretty
fancy. It sounds like if I'm
managing three clients at once with my contracting work, this could be helpful. Instead of having three virtual machines set up with all of those things, which is how I would have been working for the last 10 years, I can have one and use Spack. Yeah. And we have support for containerization too. So if you make a Spack environment, you can take that and you can deploy it on bare metal. And you can choose whether you want it in one prefix or all separate or whatever.
But I do think that by bare metal, you do actually mean on an operating system. You don't mean it's an operating system by itself. It is not yet an operating system by itself.
Yeah.
If we got down to LibC,
we could start thinking about really minimizing the environments and having a self-contained stack.
We have not gotten down to that level yet.
So SPAC is basically stopping at the compilers and runtime libraries.
But yeah, you can then call SPAC containerized in your environment,
and it'll spit out a Docker file that builds you a container that has that same environment in it.
That is fancy.
Yeah. So you can use it to have the same environment
deployed on bare metal
as you might run in the cloud in a container.
And the abstract description
with the package names is similar,
or, well, the same.
And then the lock file that you get
in either of those environments is different.
Huh.
So if, you know, GCC 12 were to come out tomorrow and it destroys ABI compatibility and breaks the standard library, you guys are just like, no big deal, we're ready for this, go ahead, press the button, we can rebuild with GCC 12, even though it broke all the things. Yeah, and then we'll patch all the packages that stuck -Werror in there, for all the new compiler errors that show up when that happens. But, you know, aside from things like that... As a public service announcement here, PSA: don't put -Werror in your public flags on your projects. Yeah, help your local package manager. Stop using -Werror.
We may just, I mean,
so one thing spec does that may be interesting to you guys is we use
compiler wrappers internally.
And so, I mean,
the way that we get all the R paths that we mentioned earlier,
so that the libraries know where to find their dependencies in there is we
have wrappers that inject the R paths and include and lib paths for like your
your dependencies so i mean that's one way that we try to isolate these things in some ways like
a spac install in its own build environment it thinks all the dependencies are just installed
on the system because things like you know auto tools builds that test for whether they can include
a header they just work um and that's how we let you deploy with different flags.
They get injected via the compiler wrappers.
And packages can opt out of that and pass them through the build system if they want to, but that's our default.
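The compiler-wrapper trick Todd describes can be sketched as a tiny argument rewriter: the wrapper sits in front of the real compiler and splices in include, lib, and RPath flags for each dependency prefix before delegating. This is a toy version under assumed paths; Spack's real wrappers handle language detection, flag ordering, and per-package overrides:

```python
def wrap_args(user_args, dep_prefixes):
    """Inject -I/-L/-Wl,-rpath flags for each dependency prefix, then pass
    the user's arguments through.  A package manager could also sanitize
    flags at this point, e.g. drop -Werror before it breaks a rebuild."""
    injected = []
    for prefix in dep_prefixes:
        injected += [f"-I{prefix}/include",
                     f"-L{prefix}/lib",
                     f"-Wl,-rpath,{prefix}/lib"]
    cleaned = [a for a in user_args if a != "-Werror"]
    return injected + cleaned

# Hypothetical dependency prefix, for illustration only:
print(wrap_args(["-O2", "-Werror", "main.c"], ["/opt/spack/hdf5"]))
```

The build thinks it is calling the compiler directly, which is why autotools-style header probes just work inside a Spack build environment.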
So yeah, I'm very, very tempted to take the compiler wrappers and just start stripping -Werror out. We could do something like that.
And I think that would make things work pretty well. Is it fair to say that this only works on Linux, based off of our conversation at this point? It does at the moment, but we're working on it. It works on macOS also. Yeah, that's true. Okay, so Unix-like operating systems, I guess, at the moment. But Windows support is underway, and Kitware is working on that along with TechX.
And they have some preliminary builds of initial packages done.
So like Gtest built with an unmodified SPAC package on Windows
with their modifications to SPAC.
There's a lot of plumbing to rip out,
but we're looking at some interesting things.
So one thing, it's cool working with the Kitware folks, because they come up with neat build things, because they've seen it all via CMake. We talked about this RPath thing, and, you know, PE doesn't support RPath on Windows. The executable format just doesn't have it. Right.
Although, the tooling doesn't support it, but you can have, in the format, a full path to your dependency library in a Windows binary.
And so Brad King came up with a way to hack .lib files on Windows to get something like RPath, so that we can use our tooling to not have this restriction that Windows has, that the libraries basically have to be deployed alongside the executable, if you've experienced that. So we're hoping we can
have kind of the same link model
when we get Windows support as we do
on Linux
and macOS because our path is
super helpful for this kind of stuff. We don't want
library hell.
So we've talked about how SPAC is built
very specifically for the HPC research community,
but it does sound like it could be very useful
for kind of more general use case application developers
who want a package manager.
Do you agree?
Like, does it make sense for non-HPC,
non-research developers to use SPAC?
Yeah.
And are those packages available?
Like, you know, lots of other libraries not necessarily meant for...
Oh, I mean, like, CalSA is in SPAC.
Like, all sorts of things that are not HPC are in SPAC.
There's 5,500 packages and, you know...
Oh, wow.
Yeah.
We joke that I'm the first native speaker of Spack, because I started working on it straight out of undergrad and didn't really have a lot of experience with the Linux package managers. I use Spack as my package manager on my laptop. Yeah, me too. So you don't even bother using your, you know, pacman or whatever? I'm on a Mac, so it would be Brew.
your your you know pac-man or or whatever i'm on a Mac, so it would be better.
But I don't have Brew installed.
I just, well, and since I work on SPAC,
if a SPAC package doesn't exist for something I want,
I just create one.
And then I keep using SPAC
instead of having to use another tool.
Yeah, you can do, you can just say SPAC create URL
and it'll examine your tarball or whatever
and make a template SPAC package for you to hack on.
It'll say, oh, this is a CMake package.
So I'll make you a CMake boilerplate build and you can go and write some code to actually
execute the, you know, CMake and make install.
I guess I was just going to ask how easy it is to make a SPAC package if you want to add
a new library.
It sounds like it's extremely easy.
Well, yeah, I mean, it depends on how complicated your library is,
right? If you're making a new SPAC package, and luckily you don't have to for TensorFlow,
then it is rather complicated, right? Like there's a lot of options, a lot of dependencies to add.
You're still responsible for adding your dependencies and setting, you know, version constraints and things like that. So if it's a big package, yeah, it can be complicated. But
for something simple to get you started, yeah, I mean, I think you can just say spec create,
give it a URL to the thing, and it'll generate a recipe. If you're a developer of a package,
and you have a good sense of what your dependencies already are, then it then it's pretty easy. If
you're looking at someone else's package, and they haven't told you what the dependencies are, then you have to figure that out. But beyond that, it's kind of as easy as their build system makes
it. You have to figure out how to pass the options that it needs. If their build system is arcane,
maybe you've got a little bit of digging to do to figure out exactly what the options are. But
if their build system is reasonable, you just pass the obvious options. And that's, that's it.
And it's been around long enough that we have, you know, some canned support for the common build systems. So it gets back to, I don't know if you saw Robert Schumacher, the vcpkg guy, his talk at CppCon about writing packageable software. Like, make sure that your software is using something known
and don't try to do your own thing
because it's better for you
because the people installing your software
don't understand your bugs.
They understand CMake's bugs.
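A recipe generated by `spack create` for a CMake project has roughly this shape. The directive names (`version`, `depends_on`, `cmake_args`, the `CMakePackage` base class) follow Spack's documented package API, but a real recipe imports them from `spack.package`; tiny stubs stand in here so the sketch is self-contained, and the package name, URL, and checksum are made up for illustration:

```python
# --- stubs standing in for Spack's real directives and base class ---------
class CMakePackage:                   # Spack's CMake build-system base class
    pass

def version(ver, sha256=None):        # registers a known version + checksum
    pass

def depends_on(spec, type=None):      # declares a dependency spec
    pass
# --------------------------------------------------------------------------

class Mylib(CMakePackage):
    """Hypothetical C++ library packaged for Spack."""
    homepage = "https://example.com/mylib"
    url = "https://example.com/mylib-1.0.0.tar.gz"   # placeholder URL

    version("1.0.0", sha256="<checksum goes here>")  # placeholder checksum
    depends_on("cmake@3.10:", type="build")          # build-only dependency

    def cmake_args(self):
        # extra flags passed to cmake; Spack supplies the common ones
        return ["-DBUILD_TESTING=OFF"]
```

Because the package declares its build system, Spack knows to run the configure/build/install stages itself; the recipe only fills in what is specific to this library.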
So this can actually, I'm sorry,
but I was just browsing through your package database.
It looks like if I go to install,
if I'm just starting from scratch
and I'm going to install libformat, for example,
because it's one that I just looked at,
which depends on CMake, it will also install CMake automatically so that it can then build it. Yep.
It'll go down to the build dependencies.
It really is like a full operating system package manager.
It's almost like a part of me is like,
in a way you reinvented Gentoo Linux, but people have said that before. Yeah. Gentoo is single prefix, right? And so it does not have the same kind of multiversion support. They have slots in Gentoo, but it's not the full combinatorial space like Spack is. Right. Yeah. Gentoo is also a full OS, and it goes down to libc. And so that's one distinction, at least for the moment.
I was going to say, you're not that far off.
Yeah, exactly.
One of the things we're adding is better support for compilers.
And I mean, the incompatibilities that we see now in builds with SPAC
have to do a lot with the runtime libraries underneath compilers.
So if you're doing mixed compiler builds, which our code teams do,
and you want to build with the Intel compiler for most of your stack,
but you have some things that only build with GCC,
getting the runtime libraries to be compatible there,
especially libstdc++, can be pretty complicated. And so we're trying to model that fundamentally in Spack. We want to have libstdc++ in the DAG, and we want to be able to ensure that the libstdc++ that this package used is compatible with the one that this other package used, and they can be linked together. Or, ideally, just use one and get all the compiler flags correct so that Intel and GCC are using the same one. Or, with Clang this comes up, it can do libstdc++ or libc++. I am legitimately confused as to why the DOE projects
that I've been working on for the last 10 years
haven't considered using Spack for dependencies.
It's getting more uptake.
Which ones are you working on?
I've been involved with EnergyPlus and OpenStudio
off and on for 10 years now.
I'm assuming you're at least familiar with EnergyPlus.
Actually, no. So what is that?
It's building energy modeling simulation software.
It's the other part of the Department of Energy that's not involved in nuclear physics.
Okay. Yeah, so we're mostly with the Office of Science and the NNSA labs, which are all nuclear physics.
Well, not all. There's a bunch of climate science and other stuff that goes on. I mean, it's a big place. Right.
Yeah.
So, I mean, I guess I would say SPAC, it's a general package manager.
And if you're thinking about, you know, how that applies to C++, I would say that, like, if you have an ABI problem in C++, like, people in HPC have probably seen that either in C++ or somewhere else.
So, like, we're solving, in some sense, a more general problem than what the C++ folks are doing.
We have all the same problems.
Fortran is worse about ABI than C++. Even the module files change compatibility
with compiler minor versions sometimes.
So it gets pretty nasty.
And because we support that,
I think it's something for the C++ folks
to consider pretty seriously because C++ apps often aren't just C++.
Like, you know, we link in a lot of Fortran, we link in a lot of other stuff.
And at some level, you get into this interop state where you need something that really thinks about all these packages.
Well, we're starting to run out of time.
But before we let you go, I'm wondering what else is on your roadmap for the future.
You mentioned that Windows support is being worked on. Anything else you want to highlight? Windows support, better compiler support, the ABI analysis that we're doing under BUILD. And then, Greg, you got anything else? Developer features.
We want it to be that you can just specify a Git hash, and we'll treat that as a version. We'll actually pull the Git repo, look for the versions we know about, and figure out where the commit you specified sits between those versions. So that if you have something in the package that says, if it's version three or earlier it needs this flag, and otherwise it needs this other flag, we need to figure out whether this random commit that we're building is before or after three, to be able to throw the right build system flags, or even maybe use the right build system if you switched to CMake at a certain point. Yeah. So that's something that we're working on. Binary bootstrapping
for Spack dependencies. The default concretizer in Spack is still the original one, and we have this new one with ASP. But the ASP solver isn't pure Python; we can't just vendor it into our project. And we really like the fact that with Spack currently, you can just git clone Spack and then run it. And so we're working on having binaries built on manylinux, so they'll work on all the operating systems, and built for the base architectures, AArch64, x86-64, PowerPC64 little-endian, that have the binaries for Clingo, which is the ASP solver that we're using, so that we can just automatically grab those binaries when we need them and use the more advanced solver right from the get-go instead of having to bootstrap it.
Well, I guess that would be bootstrapping it,
but instead of having to bootstrap it with a source install
that requires then concretizing with the old one to get there. Yeah. And for developers, right now you can make a Spack environment and say spack develop some package in your DAG. It'll check out the source, and you can work on it and build the whole stack with that thing. It'll rebuild the stuff that depends on it if you modify the source. But the Git integration Greg was talking about is so that you can just,
you know, check out an arbitrary version of the thing.
We want to make that integration smoother right now.
It kind of has to be written into the package.
Okay. All right. Well, it was great having you both on the show today.
Thank you so much for telling us about SPAC.
Anything else you want to plug before we let you go?
No, give Spack a try. And, you know, join our Slack. There's over a thousand people on there, and there's help if you need it.
We're also on the Cpplang Slack, so there's a Spack channel. It's not too active at the moment, but we'd love to see more C++ people.
Give us a try.
Awesome.
Oh,
I need to get on that one.
Yeah.
All right.
Thanks, Todd. Thanks, Greg. All right. Thank you. Thanks a lot. Thanks. That was awesome.
Thanks so much for listening in as we chat about C++.
We'd love to hear what you think of the podcast.
Please let us know if we're discussing the stuff you're interested in,
or if you have a suggestion for a topic, we'd love to hear about that too.
You can email all your thoughts to feedback at cppcast.com.
We'd also appreciate if you can like CppCast on Facebook
and follow CppCast on Twitter. You can also follow me at Rob W. Irving and Jason at Lefticus on
Twitter. We'd also like to thank all our patrons who help support the show through Patreon.
If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast.
And of course, you can find all that info and the show notes on the podcast website at cppcast.com.
Theme music for this episode was provided by podcastthemes.com.