CppCast - Spack

Episode Date: May 28, 2021

Rob and Jason are joined by Todd Gamblin and Greg Becker. They first discuss a documentation tool, a blog post about floating point numbers, and yet another post about ABI changes. Then they talk to Todd and Greg from Lawrence Livermore National Laboratory (LLNL), who both work on Spack, the popular open source package manager aimed at HPC.

News

- Poxy: a Doxygen frontend with extra fancy
- Mostly harmless: An account of pseudo-normal floating point numbers
- Removing an empty base class can break ABI

Links

- Spack
- Spack on GitHub
- Spack Tutorial
- Spack Slack
- Build all the things with Spack: a package manager for more than C++ - Todd Gamblin - CppCon 2020
- Clingo: A grounder and solver for logic programs
- Build: Solving the Software Complexity Puzzle

Sponsors

- PVS-Studio. Write #cppcast in the message field on the download page and get one month license
- Date Processing Attracts Bugs or 77 Defects in Qt 6
- COVID-19 Research and Uninitialized Variables

Transcript
Starting point is 00:00:00 Episode 301 of CppCast with guests Todd Gamblin and Greg Becker, recorded May 18th, 2021. Sponsor of this episode of CppCast is the PVS-Studio team. The team promotes regular usage of static code analysis and the PVS-Studio static analysis tool. In this episode, we discuss a documentation tool and floating point numbers. Then we talked to Todd Gamblin and Greg Becker. Todd and Greg talked to us about Spack, the package manager for C++ developers by C++ developers. I'm your host, Rob Irving, joined by my co-host, Jason Turner. Jason, how are you doing today? I'm all right, Rob. How are you doing?
Starting point is 00:01:22 Doing okay. Don't think I have any news to share or anything. How about you? Nothing at the moment. Although I guess we passed six years doing this now, because I got a bunch of LinkedIn comments from people: congratulations on your work anniversary. And I'm like, what are you talking about? Yeah, that sounds right.
Starting point is 00:01:40 I mean, we started in, like, February, or the very first episode was in February, which was just me and John. Yeah, I didn't join you until May, something like that. April or May, yeah. But yeah, six years going strong, and we're now past another big round number. Yeah. Numbers are meaningless, it's fine. They are. I mean, the number's not even visible; the numbers are for our internal tracking, really. Right. Okay. Well, at the top of every episode, I'd like to read a piece of feedback. This week we got a comment on YouTube from Aaron, saying: hey guys, just wanted to say thanks for producing such high quality content so consistently. I've learned so
Starting point is 00:02:24 much about C++ over the years of listening. Hope to do an EmptyCrate training course one day. And yeah, well, thanks for listening, and I'm glad the content is appreciated. That would be fun. I don't know when I'll next be offering a class that random people can sign up for. We'll see.
Starting point is 00:02:40 You're starting to do more training now that the pandemic is coming to an end? I'm starting to plan more training. There's nothing that is definite yet, but it looks possible that maybe some upcoming conferences will be meeting in person, and potentially some on-site corporate kind of classes will be coming up. Okay. Well, we'd love to hear your thoughts about the show. You can always reach out to us on Facebook, Twitter, or email us at feedback at cppcast.com. And don't forget to leave us a review on iTunes or subscribe on YouTube.
Starting point is 00:03:17 Joining us today first is Todd Gamblin. Todd is a computer scientist in the Advanced Technology Office in Livermore Computing at Lawrence Livermore National Laboratory. He created Spack, a popular open source package manager aimed at HPC, which has a rapidly growing worldwide community of contributors. He also leads the packaging technologies area of the U.S. Exascale Computing Project and Build, an LLNL strategic research initiative on software integration and dependencies. His research interests include dependency management, software engineering, parallel computing, performance measurement, and performance analysis. Todd has been at LLNL since 2008. And also Greg
Starting point is 00:03:52 Becker. Greg is a computer scientist in the tool development group in Livermore Computing at Lawrence Livermore National Laboratory. His focus is on bridging the gap between research and production software at LLNL. His work in software productization has led him to work on Spack, an open-source package manager for high-performance computing,
Starting point is 00:04:10 with users and contributors all over the world. Greg also works on the Build research project, working to resolve dependency integration issues related to binary interfaces, and on internal LLNL infrastructure using Spack. He's been at LLNL since 2015. Welcome both to the show. You're muted, Todd. Thanks. So
Starting point is 00:04:29 can one of you tell me what exascale means? So exascale refers to exaflops. In supercomputing, we measure everything in flops, floating point operations per second. An exaflop is 10 to the 18th flops. And so the supercomputing generations tend to go in these multiples of 1,000.
Starting point is 00:04:49 So before exascale, there was petascale, and there was terascale before that. And those are the milestones, and those are the things around which the funding sort of revolves. Basically, there's an argument made to Congress every 1,000x in performance that we need funding to continue to get performance to this level, to support, you know, all the physics simulations that we do at Livermore and so on. And so, I mean, that's what the ECP is: it's designed to build a robust software ecosystem for exascale computing. And so are you at exascale now? No.
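For reference, the scales mentioned here, in floating point operations per second, each step being a factor of 1,000:

```python
# Supercomputing milestones in floating point operations per second.
FLOPS = {
    "terascale": 10**12,
    "petascale": 10**15,
    "exascale": 10**18,  # 1 exaflop = 1,000 petaflops
}
```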
Starting point is 00:05:29 So it's aimed at machines that are going to start hitting the floor late this year, early next year. And, you know, it's been going on for a few years now. Frontier at Oak Ridge, Aurora at Argonne, and then El Capitan at Livermore are the three machines; those are the US exascale machines. And maybe someone else will, you know, come out with an exascale machine in the meantime, or before or after ours, too. But we're hoping to hit it first. So now I'm just going off the rails here, but as soon as you actually hit exascale, are you going to change the name of your department or the project so that it's whatever's after exascale, since you're always
Starting point is 00:06:08 pushing for that next 1,000 times? I mean, that has happened, right? Like, a lot of research grants, before we were shooting for exascale, were shooting for petascale. And so those culminated with, to some extent, I guess the Blue Gene, well, let's see, the Sierra system that we currently have on the floor at Livermore, which is a big POWER9 Volta system. And then before that, you know, our computing facility used to be called the Terascale Computing Facility at Livermore, and it's been renamed to, I think, the Livermore Computing Complex, because the name sounded slower and slower over time. So yeah, these things happen. So what's after exascale?
Starting point is 00:06:45 Yeah, I was thinking the same thing. Zettascale. Zettascale is next. I'd say the US Exascale project in particular is, like, a limited-duration project. So that funding will be going back to kind of general HPC research, instead of that particular form of project
Starting point is 00:07:07 going on to zettascale. It might be reconvened later to get there, but it won't go straight through. Okay. All right. Well, Todd and Greg, we got some news articles to talk about, so feel free to comment on any of these,
Starting point is 00:07:21 and then we'll start talking more about Spack and some of the other things you're working on at LLNL, okay? Yeah, sounds good. All right. So this first one we have is Poxy, and this is a library on GitHub, and I think we've talked about so many of the libraries from this particular author, I believe.
Starting point is 00:07:39 Right, Jason? It looks familiar to me. But Poxy is a documentation generator for C++ based on Doxygen, and it looks like it's just a little prettier than default Doxygen; it cleans some things up. And you can take a look at what some of the output looks like via a link to the toml++ documentation, which is another project from this author. And the docs looked really
Starting point is 00:08:09 nice. They have a really nice dynamic search function. So definitely worth checking out if you're already using Doxygen. I think it would be pretty easy to switch over to using this. I've always heard it pronounced "Tom-ul." It could be, yeah. I have no idea. I'm not going to weigh in with an opinion here.
Starting point is 00:08:26 Although I agree, it can be a bit of a pain to get Doxygen set up nice. So to have something that gives you a nice, good, clean template to start from sounds good.
Starting point is 00:08:35 And I also like, there's a nice touch: if you go to the toml++ website, all these docs built with their flavor of the Doxygen front end have, um, there we go, yeah, the examples have a highlighted link, "Try this code on Compiler Explorer," and it takes you straight over to a Compiler Explorer link where you can actually play with the examples. Very nice. Okay, next thing we have is a blog post on Red Hat's developer blog, and this is Mostly Harmless: an account of pseudo-normal floating point numbers.
Starting point is 00:09:13 And this refers to an interesting set of errors they had where isnan would not work correctly with some, like, malformed doubles. Is that right, Jason? That's how I read it, yeah. Yeah, the 80-bit Intel floating point representation. It's definitely a little scary, because isnan is something you would call to make sure you have something valid, and if that itself can choke on a malformed double, you're in a bad situation.
Starting point is 00:09:59 especially with AI workflows and machine learning using a lot of single and even half precision. And so that's what's becoming cheap in terms of hardware because that's where the demand is. So then there's a lot of research going into how can we take our double precision algorithms and run them at 4x speed on half precision floats and things like that.
Starting point is 00:10:23 Can we get similar correctness with that performance boost? Sorry, there's also some really long-term research on non-standard floating point representations, like unums and posits, so non-IEEE, which have better error properties for some stuff. Oh, okay. Yeah, because I'm just thinking, like, half precision, or something like 16-bit floating point numbers: if you're doing all these fancy calculus simulators, physics simulations, like, how do you get a reasonable answer back with that? Research-wise, Livermore has a whole bunch of applied math people who specialize in that kind of stuff. It's way beyond my expertise, but I mean, there's
Starting point is 00:11:14 a whole solver team; there's people who work on finite element methods; and basically all the numerical scientists are well steeped in the ways of floating point errors and limiting them. And I think partially you take a little bit of domain knowledge and you say, okay, this is the part of our calculation that's really sensitive, and this part isn't so sensitive, and maybe we can get a performance boost on the part that's not so sensitive. Okay, interesting.
Starting point is 00:11:41 Yeah, some of the, I mean, there have been people who've done dynamic analysis to try to find parts of the program where, you know, for a given run, certain calculations are not, the error is not significant enough that you would need a double precision floating point number. And I mean, most of that is targeted at GPUs where half precision floating point goes way, way faster than double precision. And so we can get a pretty big speed boost, like 3x for going half precision.
Starting point is 00:12:08 Also, if you can rework your data structures to use smaller floating point numbers, it reduces the memory bandwidth required for the calculation, because you're not transferring as much data. And so if that's your bottleneck, you can make your algorithm faster that way too.
Starting point is 00:12:23 So are either of you, or any of the people at LLNL, involved in any of these standards committee things, to try to get 16-bit floating point into the standard? I don't think that's been our focus. I mean, we send people to the standards committee. So you may know Tom Scogland. He's from Livermore. He's been to a few standards committee meetings. And I guess Chris Earl used to go. We send some people. But I don't think 16-bit floating point's been a focus for us, because most of our physics calculations are still using double precision for now.
Starting point is 00:13:04 I think a lot of these are still in the research phase. Okay. And then the last blog post we have is on Arthur O'Dwyer's blog, and this is Removing an Empty Base Class Can Break ABI. And obviously we've been talking about this subject a lot lately. And this is pointing out how, you know, with some changes in C++14, I believe, you could take out base
Starting point is 00:13:29 classes that are no longer necessary, but the compiler implementers actually can't, because it would break the ABI. Right, Jason? Well, yeah. And even more to the point, to me, it's less interesting that removing them would break ABI in a way, or removing them as base classes, because unary_function and binary_function not only are no longer necessary, they were removed in C++17. A conforming C++17 compiler shouldn't even have those types, but they still have them as base classes for things like std::plus so that they don't break ABI.
Starting point is 00:14:07 Yeah. Greg, I noticed in your bio there was a mention of dependency integration issues related to binary interfaces. Is this a subject you deal with a lot, ABI compatibility? Yeah, so we started last year, I guess. Todd can correct me. Todd's actually the PI on the project.
Starting point is 00:14:32 This research project on binary dependency management. And the idea was that we both work on Spack a lot, and we see the version constraints that users give to their package manager. And someone has to write it by hand, and hopefully they use SemVer, and that makes it a little easier.
Starting point is 00:14:56 But really, we as developers generally don't have a great understanding of our own application's binary interface, let alone anyone else's that we're trying to integrate with. And so if we can go in and actually look at the binary interface, you know, use something like Red Hat's libabigail, get the symbols, see which symbols are actually accessible from the functions that we're interested in, then if our dependency bumps a version and we know, actually, this version only changed types that we don't touch, well, then we don't have to be too worried about a conflict there. But if they release a patch version,
Starting point is 00:15:36 but it changes a type that we touch, we might have to treat that like a major version upgrade in terms of dependency compatibility. So then are you going to... Oh, sorry, go on. Yeah, so the goal is to really take that kind of information and integrate it with the dependency solvers and package managers
Starting point is 00:15:52 so that we're not using constraints from humans to figure out if two libraries are compatible. We're actually using the ground truth from the binary. That's the sort of long-term goal for the project. So then are you in the situation where if you do detect or are made aware of one of these breaking binary compatibilities that you can rebuild world and redeploy when necessary? Yeah, we usually are,
Starting point is 00:16:15 with the exception of some vendor libraries on the machine. So a lot of people like to use the host MPI or some host math libraries, and so we don't typically rebuild those. But for the most part, HPC stuff is open source, or at least open to you while you're working on it; like, for the export-controlled codes and stuff, you have the code if you're on the project. So we deploy stuff from source, and so yeah, we rebuild everything. But it takes a while; nobody likes that. They'd like to be able to develop quickly. And, you know, we'd like to be
Starting point is 00:16:45 able to reuse binaries on the system more easily, because right now we do kind of rebuild the world. And one of the complaints is like, why is Spack building Perl? Or why is Spack building, you know, this other thing? And it's for reproducibility. So, like, system Perls are not all created equal; some of them lack certain modules, and if you rely on them for your deployment, you're going to break some places. And so, yeah, we rebuild from source a lot, but we want to accelerate it by building from binary. And so we're trying to enable that.
Starting point is 00:17:15 I feel like it would be tempting, if I were in either of your positions, to fall into the guy-who's-always-in-a-bad-mood role. So that if someone's like, I don't understand why we had to rebuild Perl, you'd be like, do not even get me started on why we had to rebuild Perl, and then just keep walking, you know,
Starting point is 00:17:33 like we haven't tried that. Yeah. They're usually in a bad mood when they come to us, though. So they've got the preemptive bad mood strike going on. Yeah. And yeah,
Starting point is 00:17:44 we try to deescalate. We work with a package manager, which means our users work with build systems. So they're always in bad moods to begin with. Yeah. If you're asking for help on the package manager channel, you've probably been through something. What does a rebuild world look like to you?
Starting point is 00:18:03 Is that like hours, days, minutes, months of rebuilding? It depends on the project. I mean, Spack does parallel rebuilds; it'll synchronize the DAG bottom-up with file system locks. And so you can srun on a single node and rebuild 300 packages in 90 minutes, for pretty sizable packages on a big node. So it doesn't take that long.
Starting point is 00:18:28 But yeah, go ahead, Greg. It depends what sort of rebuild-the-world we're talking about. Are we talking about someone who has, like, an application that they care about? Yeah. Or are we talking about someone who's actually deploying, like, the system software that they make available to their users? Someone who has an application, if they've got a really complicated application and they're not even building in parallel, we're probably talking three or four hours of rebuild
Starting point is 00:18:56 time. But then you go to a system deployment, where they're doing all the compilers, all the MPI implementations, LAPACK; they're probably talking, if they do it in parallel, maybe 12 hours to redeploy everything. I think that's what some of the system folks at Oak Ridge who do that have been telling us. Yeah, they deploy about 1,300 packages for their users. And so their rebuild is like an overnight thing.
Starting point is 00:19:23 It's like building an entire distribution, basically. Yeah, and that's pretty much what life at an HPC site is like if you're on the facility side, like you're deploying all the MPI implementations and things in libraries that your users use. And combinatorially for different compilers and different MPI implementations, different math libraries, whatever affects ABI. And yeah, that can take a long time. And not just whatever affects ABI, but whatever we think might affect ABI, because like we said earlier, we don't actually have a great detailed understanding
Starting point is 00:19:56 of what that is all the time. Right. Well, at this point, I think it's probably a good idea to actually tell our listeners a little bit more about what SPAC is, because that's what you're both here to talk about. And we've mentioned it a few times. So who wants to go?
Starting point is 00:20:10 What is Spack? I guess I'll go. It's a flexible package manager for building things all the different ways. And so if you think about your standard Linux distro package manager, it's for building one version of something and for upgrading it when a new version comes along. And Spack is not that. It's a system for building the versions you want of something, and maybe lots of them.
Starting point is 00:20:35 And so we call it combinatorial builds. You can essentially take a matrix of compilers, MPIs, different dependency versions, different flags, different options on the dependencies, and build all of that from source. Or you can cache it as a binary and reinstall it from binary. We do relocatable binaries. But it provides essentially two languages, if you will. There's one for the command line to talk about the parameters you want for the build. So you can say, hey, build HDF5 plus the high-level interface
Starting point is 00:21:05 plus the Fortran interface. Or you can say, you know, build boost with streams or without streams. And you can say particular versions or what compiler you want that built with. And then there's a package language. All the package files are basically Python files, but they're parameterized by these things.
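For a concrete picture, here is a minimal sketch of what one of those package files can look like. The package name, URL, and options are invented for illustration, and the import line varies between Spack versions:

```python
# Hypothetical package.py, sketching the "package language" described above.
from spack.package import *  # older Spack versions use `from spack import *`

class Mylib(CMakePackage):
    """Illustrative library with an optional MPI feature."""

    homepage = "https://example.com/mylib"
    url = "https://example.com/mylib-1.2.0.tar.gz"

    version("1.2.0", sha256="...")  # checksum elided

    # A build option ("variant") that parameterizes the build.
    variant("mpi", default=True, description="Build with MPI support")

    # Conditional dependency: only present when the variant is on.
    # "mpi" is a virtual package; mpich, openmpi, etc. can provide it.
    depends_on("mpi", when="+mpi")
    # Spec syntax inside a dependency: HDF5 with its high-level interface.
    depends_on("hdf5+hl")

    def cmake_args(self):
        # Translate the variant into a CMake option.
        return ["-DENABLE_MPI=" + ("ON" if "+mpi" in self.spec else "OFF")]
```

On the command line, the same spec syntax selects a configuration, for example `spack install mylib+mpi ^mpich`, where `^` picks a dependency.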
Starting point is 00:21:22 And so you could think of Spack as a system that takes this sort of abstract spec from the user, makes it concrete into, you know, something that you can actually build with all the options set on it, and then uses those package files to instantiate that build. So they're written in a way that you don't have to rewrite the package every time something changes. You want to add anything to that, Greg? I think the key thing that ends up being different about Spack because of this is our core use case: how do we support that? The key thing that ends up being different is that
Starting point is 00:21:56 we're not installing into like a system location. We're installing in user space. And we install into these complex paths that actually include a hash at the end. That's the full provenance of the package and all of its dependencies. Wow. So because that hash is different, I can actually have coexisting installs of the exact same configuration of Boost with all of its dependencies the same, except for one dependency I changed the
Starting point is 00:22:24 version of, way down at the bottom of the tree. And those are going to be separate installs. They're going to live in separate places on my system. And then you can spack load one of them, spack unload it, spack load the other, and swap between them. Okay. Can we go more into why it's necessary to have all these different built versions of a library like Boost on your system? Different applications depend on different versions of Boost. So, I mean, we have users where, even in the same workflow, they might use a mesh partitioner that depends on one version of Boost or HDF5 or some other library, and then their application that they want to use
Starting point is 00:22:58 in the same environment depends on another one. So you don't always have a software stack that actually, you know, is consistent. Different applications can have different versions of things. And the way that we link stuff in Spack, we use RPATH. And so basically every install knows where its dependencies live and which ones it built with. And so one of the core design philosophies is you run the way you built, because, I mean, basically we see a lot of people screw this up,
Starting point is 00:23:28 right? LD_LIBRARY_PATH, people who mess with that, they're in for a lot of pain when they try to deal with something like this. Because, you know, it's a global variable in your environment that tells the linker where to
Starting point is 00:23:39 find stuff. And if two different programs depend on two different versions of the library, then you're in trouble. So the idea with Spack is that you, or the package manager, knew what you were doing when you built this thing, and you've probably forgotten that long since when you get around to running it. And so we want to run the way you built. So if I have three applications that all rely on
Starting point is 00:24:05 one specific version of Boost? Yep. And then a fourth application that uses a different version of Boost? I will have two versions of Boost installed. Yeah, that's right. Okay. I was just trying to think: initially I thought you were saying that it would be kind of like, I cannot remember the name of these tools, but where you can bundle all of your Python stuff into... okay, where it would have everything bundled together for that application. But no, you do have a shared location.
Starting point is 00:24:34 Yeah. So every package goes in its own prefix when we install it. And the prefix gets a hash that is basically a function of that package's configuration and all of its dependencies configuration. So it's a Merkle hash. It's like a Git commit tree. And you share where it's possible,
Starting point is 00:24:49 but you can differentiate where it's not. And then for a single build, you know, spack install boost, if there are multiple dependency paths to the same package, those are unified. So we use a directed acyclic graph as our internal data structure, and we make sure that each package appears only once in the DAG.
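As a toy illustration of that hashing scheme (this is not Spack's actual code, just the shape of the idea):

```python
# A package's install hash covers its own configuration plus the hashes of
# all of its dependencies, Merkle-tree style, so a change anywhere below
# propagates up and yields a different install prefix.
import hashlib

def install_hash(pkg):
    h = hashlib.sha256()
    h.update(pkg["name"].encode())
    h.update(pkg["version"].encode())
    for opt in sorted(pkg["options"]):
        h.update(opt.encode())
    # Identical dependency subtrees hash identically and can be shared;
    # any difference below changes this hash too.
    for dep in sorted(pkg["deps"], key=lambda d: d["name"]):
        h.update(install_hash(dep).encode())
    return h.hexdigest()

zlib = {"name": "zlib", "version": "1.2.11", "options": [], "deps": []}
boost = {"name": "boost", "version": "1.76.0", "options": ["+iostreams"], "deps": [zlib]}
print(install_hash(boost)[:7])  # short hash, like the suffix on an install prefix
```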
Starting point is 00:25:22 Okay, and the idea there is that your linker only supports one version of a given library in a particular process. And so you can have two programs running with different versions of a library, but, at least the way that all C runtimes that I know of right now work, you can only have one version of a library linked with your program, unless you want to have nice race cases at runtime, right? And for C++, I mean, it's actually an ODR violation to do that. So, you know, it doesn't stop us. It does not stop us, and I kind of like that, because I would like to exploit it one day, but we're not there yet.
Starting point is 00:25:55 So I'm trying to wrap my mind around, like... because Conan does kind of similar things, right? I'm assuming you've tried Conan. Okay. I have a cursory familiarity with Conan. Right. So if you install a package with whatever specific set of compiler flags and compiler version, then it makes a hashed installation as well,
Starting point is 00:26:13 and then when you go to use Conan, it says, okay, I want this version, and, oh, by the way, these are all the flags that I'm using, and it goes to look to see if you have that installed. And does that sound similar to what you guys are doing? Yeah, I looked into this initially. Conan and Spack are similar in a lot of ways. Spack's actually older; it started in 2013, and Conan came along later. But with Conan, it seems like if you pick a version on a package, it doesn't do anything to ensure
Starting point is 00:26:42 that you're running with what you built with. So you can pretty easily get yourself into an ABI nightmare that way, by setting versions on things throughout your stack and forcing them to be a certain thing in the resolver. Whereas with Spack, at least right now, we deploy as if built from source. And so there's this sort of hash structure that we talked about, where you get all the dependencies that you built with, and we don't try to mix that stuff at deployment time.
Starting point is 00:27:11 I mean, that's actually one of the motivators for this project that we talked about earlier, right, for build, is we'd like to have a little more flexibility in deployment to reuse more binaries, but that means, in many cases, potentially violating ABI because you're no longer in a situation
Starting point is 00:27:27 where you're deploying as if you built from source. So we're trying to add support and we have some preliminary prototypes for stuff where we would swap in a binary with another ABI and we keep the provenance for that so that we can go and check the metadata and understand if we did violate ABI somehow when we do that deployment.
Starting point is 00:27:47 So we're moving to a model where we could safely reuse binaries, but I don't think we're quite there yet. That's the research project. Okay, that makes some sense. Yeah, because with Conan, it's a build packaging tool, a package manager, but when it goes to deployment, you're on your own. Like, how do I get those binaries? Or do I just try to statically link everything or whatever? Yeah, that's right. Okay. Yeah, I'd say the package model is different. And I mean, Spack also has, I mean, I was looking
Starting point is 00:28:16 through, I guess, key differences between Spack and Conan. I'd say Conan has more of a C++ focus; obviously, we're trying to support a bigger ecosystem. So we're trying to support Fortran and Python and R and all the other things that you would combine with your C++ libraries. The other difference between Spack and Conan is the dependency resolver. So we have recently gutted Spack's concretizer, which is what we call the dependency resolver,
Starting point is 00:28:43 and replaced it with an answer set programming solver. And what that looks like is, it's Prolog that boils down to SAT on the back end. Wow. And so it's kind of cool, because you write your dependency resolution rules in first-order logic. And users don't see this, so don't get scared. That's kind of awesome. It's actually legitimately the first commercial use of Prolog
Starting point is 00:29:10 I've ever heard of. It's not Prolog, but it's... But you said it's like Prolog that boils down into SAT solvers, right? It's a very similar-looking language, but it's a little different. It's got some quantifier statements that I don't think are directly analogous to anything in Prolog, for example, that you can say
Starting point is 00:29:32 I need at least one, and at most one, things that look like this. And I think the reason we added this is because all of these conditional dependencies in our packages got to be really complicated to manage. Spack has a very expressive DSL. You can say that, you know, Boost depends on MPI when the MPI option is enabled, and, you know, MPI is a virtual package in Spack, so you can use any of the MPIs that support the API that you want there. So, like, you can build Boost with MPICH, Boost with MVAPICH, Boost with OpenMPI. Those are all separate installs, right?
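To give a flavor of what those rules look like, here is a toy version-selection program using Clingo's Python bindings; this is illustrative only, not Spack's concretizer:

```python
# Requires the clingo Python package (pip install clingo).
import clingo

program = """
node(boost).
possible_version(boost, v176).
possible_version(boost, v175).
% exactly one version per package: the "at least one, at most one" rule
1 { version(P, V) : possible_version(P, V) } 1 :- node(P).
% optimization: prefer the newer version by penalizing the older one
:~ version(boost, v175). [1@1]
"""

ctl = clingo.Control()
ctl.add("base", [], program)
ctl.ground([("base", [])])
ctl.solve(on_model=lambda model: print(model))  # prints the chosen atoms
```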
Starting point is 00:30:07 And same for, you know, a normal library that's not HPC-specific. All of that conditionality in the solve meant that our dependency solves were becoming increasingly wrong with our old greedy solver. And so with the new one, it's pretty awesome to see
Starting point is 00:30:25 what the ASP solver can do. Because it's essentially doing SAT plus optimization on the back end. So we optimize for, like, 11 different criteria in the solver, to try to pick, you know, recent versions, default values of build options, and so on. And the solver, you know, it can tweak the options. So if you say, I want this package built with, you know, MPI, or with MPICH somehow, it'll figure out that, oh, on that package, I need to flip the MPICH option, or the MPI option, on, so that I can depend on MPICH and get it in the graph. And, you know, there's a lot of cool stuff like that. It'll solve for, if you ask for a particular compiler, and it knows that that
Starting point is 00:31:06 compiler does not support your microarchitecture, it'll, you know, set the microarchitecture to something lower that the compiler can actually generate. So you're sort of simultaneously solving for compiler architecture support and, you know, picking an architecture for a compiler. It'll warn you: if you say, like, hey, I have the GCC 4.x from my distro registered in my configuration, and you try to install something built for Skylake,
Starting point is 00:31:34 it'll say, yeah, you can't do that; that compiler does not support Skylake. So we've spun out a separate project called archspec, which is the detection and compiler support levels for all of the different architectures. So, you know, if you read /proc/cpuinfo and you get these flags, you know it's a Skylake. And if it's a Skylake and you're trying to build with GCC...
Starting point is 00:32:06 Well, that's supported starting at this version. And in the first two versions that support it, the flag is this, and then it settles in to be -march=skylake, things like that. And so it's basically a giant JSON file with all of this information, and then a little front end around that to do the detection.
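That detection front end is published as the standalone archspec Python package; a sketch of how it can be used (method names as I understand them; they may differ by version):

```python
# pip install archspec
import archspec.cpu

host = archspec.cpu.host()                 # detect the local microarchitecture
print(host.name)                           # e.g. "skylake"

# Microarchitectures are partially ordered, so compatibility is a comparison:
print(archspec.cpu.TARGETS["haswell"] < host)

# Ask which flags a given compiler version needs to target this chip:
print(host.optimization_flags("gcc", "10.2.0"))
```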
Starting point is 00:32:24 Wow. And that's something you can use in another tool, because, like Greg said, it's just a JSON file. It's got a Python interface on top of it, but you can use that to reason about compatibility. So if you want to tag your binaries with the microarchitecture they were built for, then you can take a binary, look at it, and say, oh, this is built for Skylake with AVX-512. And you can say, oh, this is not
Starting point is 00:32:41 compatible with my Haswell system, because it's going to have instructions that my architecture doesn't support. And, you know, you've got, like, a less-than and a greater-than operator that you can use on these, you know, CPU names. I'm fascinated by Greg's comment that, it sounded like you said, the tool can actually look at /proc/cpuinfo, whatever, and figure out what the CPUID flags are. On your HPC clusters,
Starting point is 00:33:13 are the CPU architectures 100% homogeneous? That depends a lot on the cluster and on the philosophy of the site that hosts that cluster. Yeah. Okay. I'd say at Livermore, we have almost entirely homogeneous clusters. Well,
Starting point is 00:33:36 homogeneous in the CPU architecture. We have heterogeneous clusters where it's CPU GPU. Right. Yeah. But, but homogeneous in the, in the CPU architecture. And that's per cluster. Yeah, and so this only comes up when you go between clusters at our site.
Starting point is 00:33:52 But there are other sites that, as they get new nodes, they add those nodes into one sort of large cluster, and they might have separate Slurm queues for the different architectures. But instead of getting, you know, five new nodes and setting up a little cluster, they just add those to the big cluster they have. And now you've got a heterogeneous cluster. And so a lot of folks at sites that do that will set things in Spack that say, like, set my default architecture to be Ivy Bridge, instead of whatever I happen
Starting point is 00:34:28 to be on at the moment. Because they know that that is the lowest common denominator of every chip on the system that they're targeting. There are sites like Fugaku where the compute nodes on the cluster are A64FX, so that's the ARM
Starting point is 00:34:44 with SVE, but they have ThunderX2 and x86-64 front-end nodes, for whatever reason. And that can be somewhat difficult to work with. I only thought about this because Compiler Explorer, if you do -march=native, Compiler Explorer will pop up a warning saying, you don't really know what CPU you're running on, because of Amazon's clusters. Well, that's exactly why we wanted archspec: previous to this, we were essentially building with -march=native, and if you want to distribute binaries, and you want them to be optimized, and you use -march=native, you have no idea where the binary can be used.
Starting point is 00:35:27 So that was the motivation for archspec. We want to label it with a microarchitecture, not just x86-64. Yeah, there's two ways to solve this. You can either label them, or you can just build everything for the baseline architecture. And building everything x86-64 works, but it's not really appealing for high performance computing; we want to get everything we can out of these chips. Right. Other folks, have you seen the x86-64 levels that they introduced into recent GCC and Clang? No, I guess not. So other folks have realized that this is a problem, and they've tried to simplify the, you know,
Starting point is 00:36:07 optimized builds that you would do. So there's x86-64, which is what you're familiar with. There's v2, which I think is Nehalem-like, v3, which is like Haswell, and v4, which is like Skylake, like AVX-512.
Starting point is 00:36:19 Interesting. v2 is when you add, like, SSE4, I think. v3 is where you add AVX2, and v4 is where you add AVX-512. So they're nice because, I mean, it's
Starting point is 00:36:37 kind of like the, you know, ARM versions; they're virtual architectures, and you can build for those and get, you know, some optimization, because the compiler can use all the fast vector instructions, but you're not building for really, really specific microarchitectures. So we're actually adding those to archspec. Someone contributed that recently. Pretty happy about that.
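For reference, a rough summary of those levels (abbreviated; the full feature lists are longer):

```python
# x86-64 microarchitecture levels as supported by recent GCC and Clang
# via -march=x86-64-v2 and friends; representative features only.
X86_64_LEVELS = {
    "x86-64":    "baseline SSE2",
    "x86-64-v2": "adds SSE4.2, POPCNT (roughly Nehalem)",
    "x86-64-v3": "adds AVX2, FMA, BMI1/2 (roughly Haswell)",
    "x86-64-v4": "adds core AVX-512 (roughly Skylake-AVX512)",
}
```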
Starting point is 00:36:53 That's cool. And I guess on these supercomputing clusters, they take a while to spec out, build, and install. So even if it's brand new, it's not necessarily the CPU architecture that came out yesterday, right? It depends. So like, in some cases, we get unit zero machines. So like for our biggest machines, like the Blue Gene machines, Sierra, and then El Capitan that's coming along, we work with the vendor like five years in advance to set up that contract. And then our machine will be, you know, one of the first with a new processor generation. Okay.
Starting point is 00:37:27 Yeah. So it really depends on the system. For our commodity clusters, we're targeting something different when we buy them. We're going for price or for optimizing for cost. And so we'll typically choose something that's not the bleeding edge because, you know, you get a better price performance that way.
Starting point is 00:37:50 Sponsor of this episode is the PVS-Studio team. The team develops the PVS-Studio static code analyzer. The tool detects errors in C, C++, C#, and Java code. When you use the analyzer regularly, you can spot and fix many errors right after you write new code. The analyzer does the tedious work of sifting through the boring parts of code. It never gets tired of looking for typos. The analyzer makes code reviews more productive by freeing up your team's resources. Now you have time to focus on what's important: algorithms and high-level errors. Check out the team's recent article, Date Processing Attracts Bugs or 77 Defects in Qt 6, to see what the analyzer can do. The link is in the podcast description. We've also added a link there to a funny blog post, COVID-19 Research and Uninitialized Variables, though you'll have to decide by yourself whether it's funny or sad. Remember that you can extend the PVS-Studio trial period from one week to one month. Just use the CppCast hashtag when you're requesting your license.
Starting point is 00:38:28 So a few minutes ago, you mentioned how Spack works with other languages, like Fortran and R, I think you said. Is that because users of Spack are not really writing C++ programs, but are probably writing something in Python that's making use of C++ libraries? Do you want to go into that a little bit? Sure. Do you want to take it, Greg? Sure. Go for it. So users of Spack are writing everything.
Starting point is 00:39:09 We have Spack users who are writing pure C++ libraries that only depend on other C++ libraries and could use Conan, maybe have a Conan package as well, but have also contributed a package to Spack because they don't know what their users are going to want. We have some folks who exist in kind of a pure R ecosystem, and they contribute a bunch of our R packages. And those packages all depend on each other and are fairly self-contained. Some of them have C and C++ components as well.
Starting point is 00:39:48 And then the HPC-specific thing is you get a lot of these kind of big physics codes that use C, C++, and Fortran all at the same time. And Python and Lua. Yeah, and everything else that they can get their hands on because whatever was best for this particular sub package is what they used. And the build system is a problem to figure out later. And so we've got some of those at Livermore.
Starting point is 00:40:16 We've got some of those in the open source community that use Spack. And they kind of have their own challenges. And we think Spack's good at a lot of things, but that's really where our starting use case was: this HPC workflow where all the languages are thrown in together at once. And that's kind of what has defined
Starting point is 00:40:38 what features we need desperately, in terms of being able to support, you know, intermixed C and Fortran compilers and things like that. It's getting good. I was wondering, along the lines of what Greg was talking about: do these things play nicely with other package managers, like pip, for example? Like, if I say,
Starting point is 00:40:57 you can make a Spack environment, can you use pip inside of it? A Spack environment, yeah. So Spack supports the notion of virtual environments, kind of like what virtualenv does in Python. Okay.
Starting point is 00:41:10 We said earlier that you install into separate prefixes, but one cool thing is that you can also say, I have this stack of software; I want to present it to users in a different way. And that can either be, you know, if they're all consistent, you can link them into one prefix, and that becomes a virtual environment, or you can put them in some
Starting point is 00:41:30 nice file system layout that differentiates them by, like, MPI, compiler, and so on. But yeah, if you make a Spack environment, you write this YAML file that says, I need, like, these six Spack packages. You can say spack install on that, and it'll spit out a lock file with all the versions that it actually concretized to. But then inside of that, if you want to use pip, you can. To enable that, we do the same kind of tricks that virtualenv does: we copy the Python interpreter in, so it thinks it actually lives there, and everything else is symlinked out. And then, yeah, you can use pip like normal inside the Spack environment to deploy some Python packages on top of what you've got there.
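A minimal sketch of that workflow; the environment name and package list here are invented for illustration:

```
# Create and activate an environment, add abstract specs, and install:
#   spack env create myenv
#   spack env activate myenv
#   spack add hdf5+hl boost
#   spack install            # concretizes and writes spack.lock
#
# The environment's spack.yaml holds the abstract description:
#   spack:
#     specs:
#     - hdf5+hl
#     - boost
```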
Starting point is 00:42:12 All right, so if I create a Spack... I keep wanting, like, first of all, before I get to my actual question: how often do you have jokes or whatever about spackling over the problem or something like that? Because I just keep wanting to... There's a whole thing about wanting to spackle it. I don't know if we've had that one so much. We have spactivate and despactivate for our environments, though. Okay. Yeah, and that was done to differentiate between deactivate,
Starting point is 00:42:38 which pip has, so you could use both at the same time. All right, so I create a Spack environment. I install my favorite version of, I don't know, Boost and libfmt or something like that. And now I want to compile my C++ project that uses CMake inside of there. What does this look like? Does CMake just find those packages? Because, like, what is... You want to take it?
Starting point is 00:43:05 Go ahead. When we activate the environment, we put all of the paths to the things in the environment into your user environment. And so we have what we call prefix inspections.
Starting point is 00:43:21 We look at the package, and we see it has a bin directory. Okay, if it has a bin directory, that bin directory gets added to your PATH. If it has a lib directory, that gets added to your LD_LIBRARY_PATH. And one of the prefix inspections we do is just for the prefix itself, which every package has,
Starting point is 00:43:39 and we stick that in CMAKE_PREFIX_PATH. So CMake is going to go look where all of our packages actually are. We try really hard to make it so that when you activate the environment, you're going to find all the packages that are in the environment. That sounds pretty fancy.
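A simplified sketch of that inspection logic (not Spack's actual implementation):

```python
import os

def prefix_inspections(prefix, env):
    """Extend environment variables based on what a package's prefix contains."""
    if os.path.isdir(os.path.join(prefix, "bin")):
        env.setdefault("PATH", []).append(os.path.join(prefix, "bin"))
    if os.path.isdir(os.path.join(prefix, "lib")):
        env.setdefault("LD_LIBRARY_PATH", []).append(os.path.join(prefix, "lib"))
    # Every package contributes its prefix, so CMake's find_package() and
    # find_library() can locate installed dependencies.
    env.setdefault("CMAKE_PREFIX_PATH", []).append(prefix)
```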
Starting point is 00:43:56 It sounds like if I'm managing three clients at once with my contracting work, this could be helpful, instead of having three virtual machines set up with all of those things, which is how I would have been working for the last 10 years; I can have one and have Spack. Yeah, and we have support for containerization too. So if you make a Spack environment, you can take that and you can deploy it on bare metal, and you
Starting point is 00:44:24 know, you can choose whether you want it in one prefix or all separate or whatever. But I do think that by bare metal, you do actually mean an operating system. You don't mean, like... it's not an operating system by itself. It is not yet an operating system by itself. Yeah. If we got down to libc,
Starting point is 00:44:42 we could start thinking about really minimizing the environments and having a self-contained stack. We have not gotten down to that level yet. So Spack is basically stopping at the compilers and runtime libraries. But yeah, you can then call spack containerize on your environment, and it'll spit out a Dockerfile that builds you a container that has that same environment in it. That is fancy. Yeah. So you can use it to have the same environment deployed on bare metal
Starting point is 00:45:09 as you might run in the cloud in a container. And the abstract description with the package names is similar, or, well, the same. And then the lock file that you get in either of those environments is different. Huh. So if, you know, GCC 12 were to come out tomorrow and it destroys ABI compatibility and
Starting point is 00:45:28 breaks all of the standard library, you guys are just like, no big deal, we're ready for this, go ahead, press the button, we can rebuild with GCC 12 even though it broke all the things? Yeah. And then we'll patch all the packages that stuck -Werror in there, for all the new compiler errors that show up when that happens. But, you know, aside from things like that... As a public service announcement here, PSA: don't put -Werror in your public flags on your projects. Yeah, help your local package manager; stop using -Werror.
Starting point is 00:46:12 We may just... I mean, so one thing Spack does that may be interesting to you guys is we use compiler wrappers internally. And so, I mean, the way that we get all the RPATHs that we mentioned earlier, so that the libraries know where to find their dependencies, is we have wrappers that inject the RPATHs and include and lib paths for your dependencies. So, I mean, that's one way that we try to isolate these things. In some ways, a Spack install, in its own build environment, thinks all the dependencies are just installed on the system, because things like, you know, Autotools builds that test for whether they can include
Starting point is 00:46:39 a header, they just work. And that's how we let you deploy with different flags: they get injected via the compiler wrappers. And packages can opt out of that and pass them through the build system if they want to, but that's our default. So yeah, I'm very, very tempted to take the compiler wrappers and just start stripping -Werror out. We could do something like that, and I think that would make things work pretty well. Is it fair to say that this only works on Linux, based off of our conversation at this point? It does at the moment... well, it works on macOS also. Yeah, that's true.
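A toy sketch of such a wrapper; Spack's real wrappers are shell scripts, and the environment variable here is invented for illustration:

```python
#!/usr/bin/env python3
# Inject include, library, and RPATH flags for each dependency prefix,
# then exec the real compiler. SPACK_DEP_PREFIXES is a hypothetical name.
import os
import sys

args = sys.argv[1:]
for prefix in filter(None, os.environ.get("SPACK_DEP_PREFIXES", "").split(":")):
    args += ["-I" + os.path.join(prefix, "include"),
             "-L" + os.path.join(prefix, "lib"),
             "-Wl,-rpath," + os.path.join(prefix, "lib")]
# The flag filtering joked about above would also live here:
args = [a for a in args if a != "-Werror"]
os.execvp("g++", ["g++"] + args)  # hand off to the real compiler
```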
Starting point is 00:47:17 Okay, so Unix-like operating systems, I guess, at the moment. But Windows support is underway, and Kitware is working on that along with TechX. And they have some preliminary builds of initial packages done. So, like, GTest built with an unmodified Spack package on Windows, with their modifications to Spack. There's a lot of plumbing to rip out, but we're looking at some interesting things. So, like, one thing is, it's cool working with the Kitware folks, because they
Starting point is 00:47:48 come up with neat build things, because they've seen it all via CMake. You know, we talked about this RPATH thing, and, you know, PE doesn't support RPATH on Windows. The executable format just doesn't have it. Right. Although you can,
Starting point is 00:48:07 and the tooling doesn't support it, but you can have, in the format, a full path to your dependency library in a Windows binary. And so Brad King came up with a way to hack .lib files on Windows to get something like RPATH, so that we can use our tooling to, you know, not have this restriction that Windows has, that the libraries basically have to be deployed alongside the executable,
Starting point is 00:48:27 if you've experienced that. So we're hoping we can have kind of the same link model when we get Windows support as we do on Linux and macOS, because RPATH is super helpful for this kind of stuff. We don't want library hell.
Starting point is 00:48:44 So we've talked about how Spack is built very specifically for the HPC research community, but it does sound like it could be very useful for kind of more general use case application developers who want a package manager. Do you agree? Like, does it make sense for non-HPC, non-research developers to use Spack?
Starting point is 00:49:07 Yeah. And are those packages available? Like, you know, lots of other libraries not necessarily meant for... Oh, I mean, like, CalSA is in Spack. Like, all sorts of things that are not HPC are in Spack. There's 5,500 packages, and, you know... Oh, wow. Yeah.
Starting point is 00:49:22 We joke that I'm the first native speaker of Spack, because I started working on it straight out of undergrad and didn't really have a lot of experience with the Linux package managers. I use Spack as my package manager on my laptop. Yeah, me too. So you don't even bother using your, you know, pacman or whatever? I'm on a Mac, so it would be Brew. But I don't have Brew installed. I just, well, and since I work on Spack, if a Spack package doesn't exist for something I want, I just create one.
Starting point is 00:49:57 And then I keep using Spack instead of having to use another tool. Yeah, you can just say spack create with a URL, and it'll examine your tarball or whatever and make a template Spack package for you to hack on. It'll say, oh, this is a CMake package, so I'll make you CMake boilerplate, and you can go and write some code to actually execute the, you know, cmake and make install.
Starting point is 00:50:19 I guess I was just going to ask how easy it is to make a Spack package if you want to add a new library. It sounds like it's extremely easy. Well, yeah, I mean, it depends on how complicated your library is, right? If you're making a new Spack package, and luckily you don't have to for TensorFlow, then it is rather complicated, right? Like, there's a lot of options, a lot of dependencies to add. You're still responsible for adding your dependencies and setting, you know, version constraints and things like that. So if it's a big package, yeah, it can be complicated. But for something simple, to get you started, yeah, I mean, I think you can just say spack create, give it a URL to the thing, and it'll generate a recipe.
Starting point is 00:50:48 If you're a developer of a package and you have a good sense of what your dependencies already are, then it's pretty easy. If you're looking at someone else's package and they haven't told you what the dependencies are, then you have to figure that out. But beyond that, it's kind of as easy as their build system makes it. You have to figure out how to pass the options that it needs. If their build system is arcane, maybe you've got a little bit of digging to do to figure out exactly what the options are. But if their build system is reasonable, you just pass the obvious options, and that's it. And it's been around long enough that we have, you know, some canned support for the common build systems. So it gets back to, I don't know if you saw Robert Schumacher, the vcpkg guy,
Starting point is 00:51:37 his talk at CppCon about writing packageable software: like, make sure that your software is using something known, and don't try to do your own thing, because it's better for you, because the people installing your software don't understand your bugs; they understand CMake's bugs. So this can actually... I'm sorry,
Starting point is 00:51:55 but I was just browsing through your package database. It looks like if I go to install, if I'm just starting from scratch and I'm going to install libfmt, for example, because it's one that I just looked at, which depends on CMake, it will also install CMake automatically so that it can then... Yep.
Starting point is 00:52:12 It'll go down to the build dependencies. It really is like a full operating system package manager. It's almost like, part of me is like, in a way you reinvented Gentoo Linux. But people have said that before. Yeah. Gentoo is single-prefix, right? And so it does not have the same kind of multiversion support.
Starting point is 00:52:28 They have slots in Gentoo, but it's not the full combinatorial space like Spack is. Right. Yeah. Gentoo is also a full OS, and it goes down to libc. And so that's one distinction, at least for the moment. I was going to say, you're not that far off. Yeah, exactly.
Starting point is 00:52:46 One of the things we're adding is better support for compilers. And I mean, the incompatibilities that we see now in builds with Spack have a lot to do with the runtime libraries underneath compilers. So if you're doing mixed compiler builds, which our code teams do, and you want to build with the Intel compiler for most of your stack, but you have some things that only build with GCC, getting the runtime libraries to be compatible there, especially libstdc++, can be pretty complicated.
Starting point is 00:53:15 And so we're trying to model that fundamentally in Spack. We want to have libstdc++ in the DAG, and we want to be able to ensure that the libstdc++ that this package used is compatible with the one that this other package used, and they can be linked together. Or, you know, ideally just use one, and get all the compiler flags correct so that, you know, Intel and GCC are using the same one. Or, you know, with Clang this comes up: it can do libstdc++ or libc++. I am legitimately confused as to why the DOE projects that I've been working on for the last 10 years
Starting point is 00:53:47 haven't considered using Spack for dependencies. It's getting more uptake. Which ones are you working on? I've been involved with EnergyPlus and OpenStudio off and on for 10 years now. I'm assuming you're at least familiar with EnergyPlus. Actually, no. So what is that? It's building energy modeling simulation software.
Starting point is 00:54:07 It's the other part of the Department of Energy that's not involved in nuclear physics. Okay. Yeah, so we're mostly with the Office of Science and the NNSA labs, which are all nuclear physics. Well, not all. There's a bunch of climate science and other stuff that goes on. I mean, it's a big place. Right. Yeah.
Starting point is 00:54:47 So, I mean, I guess I would say Spack is a general package manager. And if you're thinking about, you know, how that applies to C++, I would say that, like, if you have an ABI problem in C++, people in HPC have probably seen that, either in C++ or somewhere else. So, like, we're solving, in some sense, a more general problem than what the C++ folks are doing. We have all the same problems. Fortran is worse about ABI than C++; even the module files change compatibility with compiler minor versions sometimes. So it gets pretty nasty. And because we support that, I think it's something for the C++ folks to consider pretty seriously, because C++ apps often aren't just C++.
Starting point is 00:55:08 Like, you know, we link in a lot of Fortran; we link in a lot of other stuff. And at some level, you get into this interop state where you need something that really thinks about all these packages. Well, we're starting to run out of time, but before we let you go, I'm wondering what else is on your roadmap for the future. You mentioned that Windows support is being worked on. Anything else you want to highlight? Windows support,
Starting point is 00:55:28 better compiler support, the ABI analysis that we're doing under Build. And then, Greg, you got anything else? Developer features, developer features.
Starting point is 00:55:37 We want it to be that you can just specify a git hash and we'll treat that as a version. We'll actually pull the git repo, and we'll look for the versions we know about, and we will figure out where the commit that you specified sits between those versions. So that if you have something in the package that says, if it's version three or earlier it needs this flag, and otherwise it needs this other flag, we need to figure out whether this random commit that we're building is before or after three, right, to be able to throw the right build system flags, or even maybe use the
Starting point is 00:56:18 right build system, if you switched to CMake at a certain point. Yeah. So that's something that we're working on.
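A toy sketch of that ordering test using git's ancestry check (repository path and tag are invented):

```python
import subprocess

def is_ancestor(repo, ancestor, commit):
    # `git merge-base --is-ancestor A B` exits 0 iff A is an ancestor of B.
    result = subprocess.run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", ancestor, commit])
    return result.returncode == 0

# If tag v3.0.0 is an ancestor of the commit being built, the commit is
# "after version three" and should get the newer build flags:
# is_ancestor("path/to/repo", "v3.0.0", "abc1234")
```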
Starting point is 00:57:09 Binary bootstrapping for Spack dependencies: the default concretizer in Spack is still the original one, and we have this new one with ASP. But ASP isn't pure Python; we can't just vendor it into our project. And we really like the fact that with Spack, currently, you can just git clone Spack and then run it. And so we're working on having binaries built on manylinux, so they'll work on all the operating systems, and built for the base architectures, aarch64, x86-64, and PowerPC64 little-endian, that have the binaries for Clingo, which is the ASP solver that we're using, so that we can just automatically grab those binaries when we need them and use the more advanced solver right from the get-go instead of having to bootstrap it. Well, I guess that would be bootstrapping it, but instead of having to bootstrap it with a source install that requires then concretizing with the old one to get there. Yeah. And for developers, I mean, right now you can make a Spack environment and say, you know, spack develop some package in your DAG. It'll check out the source, and you can work on it and build the
Starting point is 00:57:39 whole stack with that thing. It'll rebuild the stuff that depends on it if you modify the source. But the git integration Greg was talking about is so that you can just, you know, check out an arbitrary version of the thing. We want to make that integration smoother; right now it kind of has to be written into the package. Okay. All right. Well, it was great having you both on the show today. Thank you so much for telling us about Spack. Anything else you want to plug before we let you go?
Starting point is 00:58:02 No, just give Spack a try. I mean, and, you know, join our Slack. There's over a thousand people on there, and there's, um, you know, help,
Starting point is 00:58:10 if you need it. We're also on the Cpplang Slack, so there's a Spack channel. It's not too active at the moment, but we'd love to see more C++ people. Give us a try. Awesome.
Starting point is 00:58:18 Oh, I need to get on that one. Yeah. All right. Thanks, Todd. Thanks, Greg.
Starting point is 00:58:24 All right. Thank you. Thanks a lot. Thanks. That was awesome. Thanks, Greg. All right. Thank you. Thanks a lot. Thanks. That was awesome. Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested in,
Starting point is 00:58:35 or if you have a suggestion for a topic, we'd love to hear about that too. You can email all your thoughts to feedback at cppcast.com. We'd also appreciate if you can like CppCast on Facebook and follow CppCast on Twitter. You can also follow me at Rob W. Irving and Jason at Lefticus on Twitter. We'd also like to thank all our patrons who help support the show through Patreon. If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast. And of course, you can find all that info and the show notes on the podcast website at cppcast.com. Theme music for this episode was provided by podcastthemes.com.
