CppCast - Performance Matters
Episode Date: October 1, 2020

Rob and Jason are joined by Emery Berger from the University of Massachusetts Amherst. They first discuss updates to GCC and the September ISO mailing. Then they talk to Emery Berger about performance tooling and how improvements in performance should be measured.

News:
- New C++ features in GCC 10
- September C++ ISO mailing
- std::exchange patterns: Fast, Safe, Expressive, and Probably Underused
- #include <C++> Meeting C++ 2020 scholarship

Links:
- Plenary: Performance Matters - Emery Berger - CppCon 2020
- Quantifying the Performance of Garbage Collection vs. Explicit Memory Management
- Stabilizer
- Coz

Sponsors:
- PVS-Studio. Write #cppcast in the message field on the download page and get one month license
- PVS-Studio is now in Compiler Explorer!
- Free PVS-Studio for Students and Teachers
Transcript
Episode 267 of CppCast with guest Emery Berger recorded September 30th, 2020.
The sponsor of this episode of CppCast is the PVS-Studio team.
The team promotes regular usage of static code analysis and the PVS-Studio static analysis tool. In this episode, we talk about C++20 support in GCC and pattern matching.
Then we talk to Emery Berger from the University of Massachusetts Amherst.
Emery talks to us about performance profiling tools and more. Welcome to episode 267 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
I'm all right, Rob. How are you doing?
I'm doing okay.
Still recovering from all the conferencing,
but I'm glad to get back to normal.
All the virtual conferencing.
Not quite as exhausting as actually flying out for a week,
but it can be a little bit. I don't know. In some ways, I think it was more exhausting.
I don't know.
Yeah, maybe.
You have something to share, I think, right, Jason?
Yeah, we've mentioned that I've been working on a C++ best practices book, but I did officially
release it last week. So it is available. And I'm guessing we can probably put a link to it in the
notes here at some point. It's at leanpub.com slash cpp best practices. And well, it's out
there now, I guess. I won't spend a whole lot of time talking about it since we have a guest today.
But how long is the finished book, though?
About 125 pages, and a kind of typical technical-sized book, like, you know, about
the size that you would expect for a programming book, you know, one of the smaller ones,
not the tome kind that has, you know, the complete C++
reference and they're full, like, 8.5 by 11, John Lakos's thousand-page book, I think. Right, right, right. So it
is intentionally short. This is one of the things I've learned as a contractor: if you
write a lot of words, people don't tend to read them. So I aimed for fewer words, hoping that people would read them. Very cool. Okay, well, I'll have to read that myself and I'll put the link in the show notes like we
said. Thanks, Rob. Yeah. Well, at the top of the episode, I'd like to read a piece of feedback.
This week we got this tweet from Harold saying it was a bit confusing to get CppCast already
on Thursday morning, because usually it's an indicator that the weekend is close, but it was again a very interesting episode, especially
if you have some interest in organizing C++ events. And yes, for anyone who follows, you
know, when the episodes actually drop online, I did accidentally post this last episode a day
early. Yeah, I apologize for that. You know, I did the same thing
with C++ Weekly a few weeks ago, also accidentally dropped it a day early, and I'm like, oh, oh well, I
guess I'm just sticking with it now. Yeah, yeah, there's no undoing that. No, you really can't.
Well, we'd love to hear your thoughts about the show. You can always reach us on Facebook, Twitter,
or email at feedback at cppcast.com.
And don't forget to leave us a review on iTunes or subscribe on YouTube.
Joining us today is Emery Berger.
Emery is a professor at the College of Information and Computer Science at the University of Massachusetts Amherst.
He graduated with a PhD in computer science from the University of Texas at Austin in 2002 and has been a visiting scientist at Microsoft Research.
Professor Berger's research spans programming languages, runtime systems, and operating systems with a particular focus on systems that transparently improve reliability, security,
and performance. Emery, welcome to the show. Thanks so much. Thanks for having me. I feel
like there's some other people that we know who were at the University of Texas, also in the C++ world. Isn't that where, no wait, where was
Bjarne and Gabby? Right, so they were at Texas A&M, which is the permanent rival of the University
of Texas. I realized as soon as you corrected me that it was Texas A&M that, well, clearly your
computer science department or that you work in is better than the Texas A&M one, though, right?
I'm not going to say that because I have some very good friends who actually went from UT Austin to Texas A&M.
So all fine departments.
Well, sorry about that.
That makes up there.
I know school rivalries are weird.
Since I get the opportunity to slam Texas A&M for a moment, I will take advantage of it.
Go for it.
It's a very asymmetric rivalry.
So they are obsessed with the University of Texas and lurk on message boards and things like this.
And their big insult is to refer to UT Austin, which is UT, as lowercase T-U.
And we're all supposed to be clutching our pearls.
This is so upsetting.
And they actually have us in their fight song.
Really?
Yeah, yeah.
It's completely insane.
They're obsessed with UT Austin.
And UT Austin is not at all obsessed with them.
So it's pretty funny.
Virginia Tech has University of Virginia in their fight song as well, actually.
Virginia Tech is where I went.
So I'd have no idea if University of Virginia feels the same way.
I mean, the college football games are always a big deal.
But other than that, yeah.
Yeah, yeah.
Anyway.
Yeah.
It's pretty hilarious.
That's funny.
Okay, well, Emery, we got a couple news articles to discuss.
So feel free to comment on any of these.
And we'll start talking more about
your work on performance.
Sounds great.
So this first one we have is a
blog post on the Red Hat Developer blog
and it's about new C++ features
in GCC 10
which
adds some of the C++ 20 features
that are official
now. So we can get some of them.
Not everything is available yet from C++20, though.
They're still working on some features, right, Jason?
Yeah, but I've been using...
Well, for the stuff that I've been prototyping recently in C++20,
GCC right now is my go-to compiler
because it has, at the moment, the most solid support
for the features that I care about. So I'm using, you know,
concepts pretty extensively and lots of constexpr things. So that's obviously where my,
my interests are going to pop up on these, which that seems to have pretty solid support for at
the moment. Yeah. That's the first thing they mentioned, that concepts, I think they're
saying, is complete based on what's in C++20, right?
Yeah, I don't know. It's all early stuff, right? I mean, if they say it's complete,
I haven't hit a problem, but I assume I'm going to at some point, right? Because
this is still largely untested, to be fair, right? So yes, I'm sure it's complete, but how complete? How stable is
it? I don't know. I definitely take a pessimistic view on that, just out of... nothing personal to
the GCC developers. If any compiler said, we're complete on the C++20 features right now, I'd be
like, I'm sure there's corner cases left. Sure, it's just how it goes. Emery, have you... are you also playing with C++20 features
right now? So I haven't started, mostly because of compatibility fears. Right? I mean, I'm
definitely one of those people who likes to move to, to a newer standard as soon as possible.
But there are people who use software that I make, who are not so quick. And so that's always a challenge.
I mean, I'm actually, for reasons that are related to performance, I'm really excited about stuff
like constinit, because this has been a thorn in my side for quite some time. So we do a lot of
stuff where we interpose on libraries. And so we have some static initialization that happens.
And, you know, the initialization definitely happens at the beginning, it's guaranteed, but we suffer because we have to take the check every single time it gets accessed. Being able to actually just tell the compiler, look, this is initialized at compile time, guaranteed, makes a pretty big difference in performance for some cases, like replacing the memory manager, replacing some thread libraries, and things like that.
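As a quick illustration of what constinit buys you, here is a minimal sketch; the library-interposition scenario Emery describes is not shown, and the names below are made up for illustration, but constinit itself is standard C++20:

    #include <atomic>

    int compute_initial() { return 42; }   // not constexpr, so it forces dynamic initialization

    // A function-local static with a dynamic initializer is initialized on first
    // use, so every call pays an "already initialized?" guard check.
    int& counter_with_guard() {
        static int counter = compute_initial();
        return counter;
    }

    // constinit guarantees initialization at compile time, so there is no runtime
    // guard on the hot path; it fails to compile if that guarantee cannot be met.
    constinit std::atomic<long> hits{0};

    void record_hit() {
        hits.fetch_add(1, std::memory_order_relaxed);
    }

    int main() {
        counter_with_guard()++;
        record_hit();
    }

The guard branch in the first version is the per-access check Emery is talking about; constinit makes it go away.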
Awesome.
So that's pretty cool.
So yeah, so I'm always looking for the things that will enhance performance.
Also, all the enhancements for, I know that this is like a bad word for some people, but like meta compilation support.
I depend on that pretty heavily.
And it's also a way to get amazing performance gains.
And obviously at the expense of amazing compile times.
Do you mean like template metaprogramming kind of things?
Yeah, so I mean, I used to use template metaprogramming a lot.
And I've been gradually moving out into the constexpr world
as that's been made possible. And so, you know, there's an infrastructure that I built years ago for
my PhD called Heap Layers that we use to build a bunch of fast memory managers. So there's this
thing called Hoard that I worked on for my PhD, that forms the basis of the macOS
allocator, actually. And it depended on a lot of template metaprogramming
to compute a bunch of stuff at compile time
that avoided lookups at runtime.
And so moving all that overhead to compile time
makes a lot of sense,
but it can get pretty gory, as you know.
So cleaning that all up is really nice.
And just being able to put constexpr and constants
to every place will be tremendous.
It's true. I find myself doing almost nothing that I would call template metaprogramming
anymore. It's just constexpr, just constexpr. Yeah. Yeah. I mean, there's still cases
where I need it, but I would absolutely love to, you know, be in a place where I don't have to do it at all.
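As a made-up illustration of that shift (this is not code from Hoard or Heap Layers, just the general idea): a compile-time computation that used to need a recursive template metaprogram can now be a plain constexpr function.

    #include <cstddef>

    // Old style: a recursive template metaprogram to compute floor(log2(N)).
    template <std::size_t N>
    struct Log2 {
        static constexpr std::size_t value = 1 + Log2<N / 2>::value;
    };
    template <>
    struct Log2<1> {
        static constexpr std::size_t value = 0;
    };

    // New style: an ordinary constexpr function, evaluated at compile time,
    // that reads like normal code.
    constexpr std::size_t log2_floor(std::size_t n) {
        std::size_t result = 0;
        while (n > 1) {
            n /= 2;
            ++result;
        }
        return result;
    }

    static_assert(Log2<4096>::value == 12);
    static_assert(log2_floor(4096) == 12);

    int main() {}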
Yeah. Okay. Next thing we have is another mailing list from the ISO. So this is,
you know, mailing list for September, 2020. I didn't go through any of these specific articles,
but I definitely wanted to bring it up. Jason, do you have a chance to look at any of these?
I did. Anything particularly interesting for you? Yeah. I flipped through a fair number of them.
I want to make what I think will be a controversial statement. I think the most important paper for C++23
is pattern matching. And that's in here. And we talked about that a while ago with Michael Park.
I hope it makes it into C++23. Yeah, I do hope some version of it makes it into C++23. Honestly, since
I played with Rust and saw some of the power of what a good pattern matching syntax can do,
I've been doing versions of that using variant visitors in C++, I guess I'm using C++20, yes.
And seeing like, okay, I need this to go to the next step.
It can clean up so many things.
Yeah, that's good.
How about you, Emery?
So I had not been aware of this.
I've not been following C++ 23 development.
It's actually, it's crazy to see something like this.
I mean, so I spent a year when I was in an undergrad in England,
and I was at one of the homes of functional programming.
And so one of the languages that we learned was a predecessor to Haskell.
And it's just crazy to look at the code that comes out of the pattern matching stuff.
I mean, it just looks straight out of, well, you know, with a few ampersands thrown in.
There's always more ampersands. There we go. That's right.
We need, we need triple ampersands. I think that'll really, that'll really clear up all the
confusion. But, but yeah, I mean, a lot of it makes it look, you know, like this very, very
nice, clean, you know, ML style, Haskell style way of doing pattern matching. It just avoids a lot
of boilerplate.
And it's really, you know, you look at the code and you're like, I know what this code does.
So that's super cool. It really, you know, if I show this, I mean, I'm totally after I get up this call, I'm going to send this paper around to some of my colleagues.
And I'll be like, how is this C++?
It just no longer looks like C++ at all.
I was I read through the paper this time looking for more implementation details.
Because if it were an enhancement to lambdas, they would say, it's as if the compiler did this, for example, and show you how it could be translated into C++ 20.
But there's nothing like that in the pattern matching paper that I can see.
I'm pretty sure it's going to be very much, for lack of a better word, compiler magic compared to most features that get added these days, not just syntactic sugar.
Yeah, I mean, these things, I mean, they all get, I mean, they are kind of syntactic sugar in a way, but I mean, in a very deep way, right? Like a compiler has to really, it has to do a lot of work. And there's no straightforward translation of these things. So yeah, but it's, you know, this is I
will say this, I mean, this is incredibly mature language technology, right? People have been doing
so, you know, there's this term for pattern matching, I don't know how much it implements
of this, but it's called Hindley-Milner type inference. And you can actually do this kind
of destructuring without even adding types, and it'll infer the right types for you.
And this is something that functional programmers have had literally since the 80s,
right? And so, you know, that's a long time ago. And so seeing this enter C++ this way,
it's great. That would be super cool. I can see why it will be controversial, though,
because it's definitely a shocking change in the look of everything. I think one thing that
I mean, it's been since my first C++... the second C++Now conference I went to, so five or six
years ago, I've been watching talks from people who want multi-method dispatch kind of libraries, and
that's painful to do in C++. You can do it with some tricks today, but this pattern matching, it just falls out of
there if you need that kind of thing. So it's... yeah, I'll be interested to see... I'm sorry, go ahead. No,
no, that's all right. What I was going to say is, I was just interested to see what's going to be the
interaction of these things with the type system writ large. I mean, you've got all the virtual
dispatch, and you have all the crazy things that can happen with, you know, diamond inheritance and other insanity, you know, things that people should maybe try to avoid,
but how this plays with this... I mean, this is a traditional PL problem, where you're like, I'm going
to put in some type thing, and does this touch all the other types, and do I have to deal with the
cross product of all the interactions, or is it something that's kind of separable? So that I'm not sure. I mean, it looks really clean on paper right now, but I would be personally
terrified of trying to implement this and making sure everything is good. So we'll see what happens.
That is definitely something that has been happening. So, as I said, I've been using
std::visit with variants
to do similar kinds of pattern matching things. For our listeners who don't know, you can take
std::visit, pass in the visitor, and then pass in multiple variants, or... I forget which order those go in,
regardless. So if you've got four variants that you pass in, it's going to generate, like you
said, the cross product of all the possible interactions of all of these to try to generate
all of the calls to the visitors. That can be very painful at compile time, but it does
seem to actually generate efficient code.
It's a couple of, you know, nested lookup tables in it and it's good, but the compile
time can get a little scary on it.
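For listeners who haven't tried the approach Jason describes, here is a minimal sketch; the event types and the overloaded helper below are made up for illustration:

    #include <iostream>
    #include <variant>

    struct KeyPress  { char key; };
    struct MouseMove { int x, y; };
    struct Quit      {};

    using Event = std::variant<KeyPress, MouseMove, Quit>;

    // The usual "overloaded" helper: glues several lambdas into one visitor.
    template <class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
    template <class... Ts> overloaded(Ts...) -> overloaded<Ts...>;  // deduction guide (not needed in C++20)

    void handle(const Event& e) {
        std::visit(overloaded{
            [](const KeyPress& k)  { std::cout << "key "  << k.key << '\n'; },
            [](const MouseMove& m) { std::cout << "move " << m.x << ',' << m.y << '\n'; },
            [](const Quit&)        { std::cout << "quit\n"; },
        }, e);
    }

    int main() {
        handle(KeyPress{'a'});
        handle(MouseMove{10, 20});
        handle(Quit{});
    }

Passing more than one variant to std::visit is what produces the cross product of cases mentioned above, and that is where compile times start to hurt.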
Yeah. I mean, it's clearly a concern. I mean, one of the things that... So years ago,
Rob Pike gave this keynote talking about Go when the language was just brand new.
And the primary motivation, not the primary motivation, but one of the primary motivations
was compile time concerns that they had with C++.
And I was like, Jesus Christ, we're building a whole new language because of compile time?
It seems insane.
But you're Google.
You can do things like this.
And they're recompiling stuff all the time, so it makes sense.
And if you put something in that is going to lead to quadratic or worse explosion in compile time, it's definitely a risk.
So we'll see what happens. Yeah. But, but I mean, you can already do it today. I mean,
you can write your template metaprograms that compute, you know, the Ackerman function or
something if you feel like it. So there's nothing stopping you from shooting yourself in the foot
in the finest of C++ traditions. You know you're doing something right
if you have to up the template recursion limit
on your compiler parameters, right?
I have to confess,
I have that in a couple of my projects
and it's set in the thousands.
I think I could have assumed that
based on what you said.
Yeah, dirty little secret.
Don't look too closely at the command line, everybody.
That's awesome.
Okay.
Next thing we have is a guest post on the Fluent CPP blog,
which I don't think we've talked about for a while,
but there's always good posts here.
And this is a guest post from Ben Deane,
who we've obviously had on the show before.
And this is a guest post about std::exchange,
which I'll admit I was not that familiar with, and I think that was kind of the point of his post. I did a C++ Weekly episode about it, Rob.
Did you do one? Okay. Yeah.
I think that was kind of the point of his post, though, that a lot of people don't know about std::exchange, and he's trying to kind of
showcase some of its uses. Yes, and he definitely showed uses in here that I
had never considered
but that's what Ben does. Yeah, I'm not going to go too much into detail on this, but I definitely
would encourage reading the blog post, especially if you're like me and weren't too familiar with std::exchange before.
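For anyone who wants a quick taste before reading the post, the classic place std::exchange shows up is in move operations; this little type is made up for illustration:

    #include <utility>

    // A toy handle-owning type: std::exchange grabs the other object's pointer
    // and leaves nullptr behind, all in a single expression.
    class Buffer {
        int* data_ = nullptr;
    public:
        Buffer() = default;
        explicit Buffer(int n) : data_(new int[n]{}) {}
        Buffer(Buffer&& other) noexcept
            : data_(std::exchange(other.data_, nullptr)) {}
        Buffer& operator=(Buffer&& other) noexcept {
            if (this != &other) {
                delete[] data_;
                data_ = std::exchange(other.data_, nullptr);
            }
            return *this;
        }
        ~Buffer() { delete[] data_; }
    };

    int main() {
        Buffer a(16);
        Buffer b(std::move(a));  // a no longer owns the array; b does
        a = std::move(b);        // and back again
    }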
Okay, and then the last thing I wanted to mention is Meeting C++
is coming up soon. I think it's, yeah, the second week of November, the 12th to the 14th.
And #include <C++> is a group we've certainly talked about
on the show before.
And they like to support scholarships
for sending attendees to some of these conferences
who might not otherwise be able to go.
And they are running a scholarship
for Meeting C++ this year,
which is still
going to have an in-person component. So if you're interested in going to Meeting C++ and you're
interested in applying to the scholarship, this link will be in the show notes. And I'm not seeing
a link for, like, supporting the scholarship, though, but they're probably looking for that
type of support as well, I would assume, if you go to includecpp.org. So these scholarship tickets are given by Meeting C++ to #include <C++>,
but also, to clarify, this is specifically for the online tickets. Yes, Meeting C++ is still, as far as I can tell,
planning to have an in-person version, but this does say that these are free online tickets for
Meeting C++. Okay, gotcha. Cool. Well, yeah, like I said, if you're interested in going to Meeting C++ virtually,
you should definitely check out and apply to the free ticket sponsorship.
Yeah, scholarship.
Okay.
I'm so curious how this goes, just for the record,
how the in-person version is going to work out for them.
I hope it goes.
Yeah, I hope it goes well.
Okay, so Emery,
so you gave a great talk at CppCon this year,
one of the keynotes where you talked about performance.
I definitely encourage listeners to go and watch that talk
once it's available on YouTube.
I'm not sure if it's up there yet,
but I wanted to just start off
by talking a little bit about performance
and some of the things that you think can affect performance with C++.
Sure. So the talk is actually up. I'll share it with you.
Or you can find it yourself.
So, yeah, so performance in C++.
I mean, the real issue, I mean, obviously, you know, people use C++ primarily because it is close to the metal and can give you really great performance.
You know, the code gen is great.
There's, you know, very little in the way.
You have access to assembly language.
You have all this sort of stuff.
There's no garbage collection, which I should stress is something we've also studied that is mostly a space tradeoff.
So you can run your C++ programs in way, way smaller memory footprints than you could run something like Java or JavaScript, no matter how good the code is.
Because of garbage collection?
Yeah.
So I'll explain it briefly.
So we have a whole paper on this.
It's quite old, but the lessons still hold.
Essentially, the issue is as follows.
So suppose you have, so most garbage collectors trigger collection once the heap fills up to a certain amount, right? And you have some heap size parameter. And so if you set the heap super,
super tight, you could be in a situation where you've got a bunch of memory in use, and then
you allocate something and then you allocate something, and then
you free it, in effect, right? It goes away. You're not using it anymore. But you're bumped
up against the edge of the heap limit. And so it triggers a full garbage collection, and it
reclaims one object. And then you call new again, and then rinse and repeat. And so you can be in
a situation where the runtime just goes through the roof because the heap is too small. And so
as the heap gets smaller and
smaller, you get almost an exponential curve that just goes up and up and up. It's actually power
law, but anyway. And then as the heap gets bigger and bigger, then the runtime that you spend
collecting decreases because you allocate, allocate, a bunch of stuff dies, a bunch of stuff
dies, a bunch of stuff dies. You do a GC, you reclaim a ton of stuff, right? And then you have
a lot of headroom until you have to collect again. And in the limit, like in the limit
where the heap is infinitely large, you never collect, right? And so what happens is at some
point you get to a steady state where you're pretty close to barely collecting at all.
This is especially true for a generational garbage collector that is periodically reclaiming
very, very short-lived objects.
But it holds for any garbage collector.
But the problem is that the amount of space that you need before you get to the point where the runtime is basically the same as C or C++ running malloc and free or new and
delete is like three to five times as much memory.
Interesting.
Yeah.
So people just are like, oh, garbage collection is great.
And yeah, garbage collection is super convenient, but it comes at a great space cost. And if you have
plenty of RAM, terrific. But if you would need that RAM, or you're making really a lot of use
of the RAM, like it's a cache, or it's an in-memory database or key value store, you know,
you choose, you end up throwing away a lot of capacity by using a garbage collected language.
Go ahead.
Quantifying the performance of garbage collection?
Yeah, that's the paper. Exactly.
Okay. I know there's at least a few listeners who will be very curious in that.
Yeah. So actually Chris Lattner, who is the creator of LLVM and the designer,
or at least co-designer of the Swift language, specifically cited that paper as a justification
for why Swift doesn't use ordinary garbage collection
and uses reference counting.
Wow, okay.
So anyway, sorry for that segue.
Sorry.
Anyway, so be that as it may,
if you get rid of your garbage collection,
what are you left with?
You're left with the metal.
You're left with whatever machine you're running on.
And the problem is these machines have gotten enormously complex. So, you know, processors got, you know, they used to be really simple.
When I started out, I actually had an Apple II Plus, and it had a 6502. And in the 6502,
the instructions in the reference manual literally said how many cycles it takes for every instruction, which now is hilarious because it's like, well, did it hit in the cache?
Oh, there were no caches.
So there were no caches.
There were no branch predictors.
There was no virtual memory.
So there was no TLB.
There were all of these things.
There's no pipeline.
There was no dependency on the past.
There's all kinds of complexity in modern hardware.
And this complexity, unfortunately, surfaces in ways that can be very surprising.
So if you, you know, what they do, so just briefly talk about like branch predictors.
Branch predictors essentially record history of which way your if was taken.
Did it go the if way or the else way?
And it does this so it can prefetch the instructions and start loading them and executing them speculatively. And if it guesses correctly, most of the time,
it saves a lot of time. So it's not just stuck waiting to evaluate the if expression. It just
goes forwards and keeps running. So you have all of this parallelism that's happening.
So it has to be pretty accurate. And when it is, that's terrific. The way that it actually
manages all these history tables is by hashing the program counter,
which is just the instruction pointer, the address.
So this means that if you have a bunch of things that map to the same address, they
can actually overflow the buffers.
And then you get misses.
And so then the predictor doesn't work as well.
So this is referred to as aliasing for branch predictors.
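For anyone who wants to see the branch predictor in action, here is a classic stand-alone demonstration (a standard benchmark trick, not something specific to Emery's work); with optimizations the compiler may turn the branch into a conditional move and flatten the difference, so treat the numbers as illustrative:

    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <random>
    #include <vector>

    long long sum_big(const std::vector<int>& v) {
        long long sum = 0;
        for (int x : v)
            if (x >= 128) sum += x;   // this is the branch the predictor learns
        return sum;
    }

    int main() {
        std::vector<int> data(1 << 24);
        std::mt19937 gen(42);
        std::uniform_int_distribution<int> dist(0, 255);
        for (int& x : data) x = dist(gen);

        auto time_it = [&](const char* label) {
            auto t0 = std::chrono::steady_clock::now();
            long long s = sum_big(data);
            auto t1 = std::chrono::steady_clock::now();
            auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
            std::printf("%s sum=%lld time=%lld ms\n", label, s, static_cast<long long>(ms));
        };

        time_it("unsorted (branch outcome ~random):");
        std::sort(data.begin(), data.end());
        time_it("sorted   (branch outcome predictable):");
    }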
But it's the same problem for caches, for the instruction-level caches, for data caches, for the TLB. It's a problem because the TLB maps your pages of virtual memory
to physical memory. So can you explain TLB, or tell us what TLB means? Sure, sure. So it's a
stupid name. Unfortunately, it's almost better to not know what it means. So it stands
for translation lookaside buffer. So again, a terrible name.
Like I told you the name and now you're like, now you understand, right?
Yeah, so it's a stupid name.
But the idea basically, you can think of it as just being a map.
It's just a map that maps the start address of a page that's in virtual memory to the
start address of the page that's actually the physical memory in your machine.
Okay.
Right?
So, you know, your machine has a bunch of RAM and it goes and it puts pages wherever.
And then your virtual memory, you have this abstraction, which is really pretty. Like
everything seems to be organized like in a contiguous way. But of course you're sharing
your, your thing with many processes and they're actually picking things that are discontinuous.
So you have to have this map. And so this map is, it's just a,
so it's actually stored in memory in its full glory, but there's a cache to that map.
And the cache to that map is the TLB.
That's all it is.
It really should be called virtual page cache or something,
but it's not.
And so if you don't,
if you have an application that actually spans more pages
than fit in that cache,
then on like an Intel, it will go out to a data structure that's in RAM. And this happens every
single time you access any data at all, or any instructions at all. So if you're in a situation
where it's in cache, then it's free. It returns typically in a cycle. So it's like essentially
invisible. If it goes to RAM, it could miss in the cache,
it could miss in the L3 cache, it could go all the way out to RAM, and it could take hundreds
of cycles. So imagine you could be in a situation where something accidentally takes 100x longer
just because of this failure to fit in the cache. Yeah. So this stuff is really nasty.
And it's, I think, poorly understood just how brittle performance can be.
You can change a line of code,
you can add another new,
you can restructure things,
you can change your makefile
so it has the .o files in different places
and this can lead to gigantic performance swings.
Yeah, so that kind of brings us
to some of the tooling you went over in your talk.
You talked about how performance is so brittle
and you introduced these tools
that can be used to analyze performance
in such a way to get around that brittleness.
Can you tell us a little bit about those?
Yeah, sure.
So I'm going to have to say up front
that the tool that we used for this experiment,
unfortunately, LLVM itself is a moving target.
And we relied on some internal plumbing because we were trying to make LLVM do things it's not meant to do.
And so LLVM changed.
And it's changed to an extent where it would require months of work for somebody to go and port it forwards.
And, of course, you port it forwards and then LLVM could change again.
So it's not the happiest story for a graduate student who wants to like get out
with a PhD. So unfortunately, it suffered from bit rot. We've talked about reviving it,
but it just seems like such a thankless task that we're just like, look, we did this. We
understand what the performance is. We know how to do this. If somebody wants to actually do this
as an industrial tool or bring this into LLVM, that'd be great. But we're not going to do it. So let me just explain what it does. So the problem, like I described, is things in memory
shift around. Your performance can go one way or the other. And you can be like, oh, I had an
awesome performance increase because I'm a genius. Or you change something like, oh, oh, darn.
Things went south, like performance fell apart. I need to, you know, I have this giant regression.
I better roll it back, but it could just be a complete accident. And it's just depending on
where things ended up getting laid out in memory. And, you know, this can even be affected by what
directory you're in, what the day of the week is, because it adjusts your environment variables and
your environment variables shift the stack. Yeah. Freaking crazy. I actually had code once. I didn't
talk about this in the talk, but I had code that ran faster on Wednesdays than on Tuesdays. And you were able
to actually, like, quantify this? Yeah, yeah, yeah. I actually changed the clock and went back to
Tuesday, and that was the problem. It was the day of the week that somebody
was storing in an environment variable, and the length of the string: literally W-E-D-N-E-S-D-A-Y is longer
than Tuesday. Yeah, yeah. So, yeah, you know, the moral of the story is obviously only program on
Wednesdays. So anyway... Wait, Monday should be faster yet then, right? Yeah, yeah, but
well, but then it just screws everything up. No. Anyway. Yeah, so it's obviously terrible. This is
a bad situation. So the challenge is, how do you tease apart the thing you did with the code, even what a compiler
does when it optimizes the code, from where the stuff got laid out in memory and all these
interactions below. So what we did is we built this system that we jokingly called Stabilizer,
because it actually, like, messes everything up. What it does is it randomly moves everything in
memory periodically during the runtime of the program.
And the reason that we do this, first, if you just do random at startup, that's not enough.
That's just like a fixed layout.
And it doesn't, like the effects will still manifest.
Like you ended up in one layout.
So what you want to do is the moral equivalent of a randomized control trial.
You're basically randomizing everything.
We randomize where the globals are, where the functions are. We have a randomizing heap. So when you allocate
new objects, it's a little bit decorrelated with where the previous one had been freed.
And then you can actually, you know, run the code a bunch of times, try it with your optimization or
whatever your code change is, try it with, you know, try it with something else, and then you can compare,
and you know whatever the change is
has nothing to do with the layout.
Okay, so you mentioned that this tool
has suffered from some bit rot.
Do you know if there's any other similar tools
out there that someone could try if they wanted to?
Yeah, that's a great question.
So Stabilizer is super extensive in what it does.
And this is why it relied on LVM. Like it literally changes where the stacks are laid out.
And that's, you know, that's right at the heart of the compiler, um, you know, generating stacks.
So, um, that said, we have a few randomizing allocators that we've built for various purposes
and any of these undermines the effect of layout in the heap. So it doesn't affect the stack, it doesn't affect the globals, it doesn't affect functions, but it changes where objects will be laid out in the heap, and so that particular kind of confounding factor goes away. One of these is called DieHard, which we built actually for reliability. If you have a program with memory errors, DieHard makes it probabilistically
likely that your program will run correctly. But as a side effect, it also, well, it's part of the
whole thing is it's randomizing the location of things. Let me, I see lots of puzzled faces. So
let me explain how it helps really quickly. So the way that it helps with use after free errors
or dangling pointer errors, whatever you want to call them, where you free something too soon and it gets recycled.
So a conventional allocator, when you free something, it's immediately available for reclamation.
And then when you call new, it's almost certain to be the next object.
So you call delete, you call new, you probably get that same object right back, which is the worst possible situation.
If you had something pointing in that wanted the old value,
it gets immediately clobbered, right?
And so this is the problem that garbage collection solves.
Like garbage collection makes sure nobody has a pointer into anything
before anything gets reclaimed, right?
So what DieHard does is it actually has a bitmap-based allocator,
and it randomly chooses among all of the freed objects
for the next object to be used.
So when you call delete, it just sets a bit, and the bit is set to zero. So zero means it's free.
And then it just says, hey, find me a zero, and it randomly pokes into this bitmap. And if it
finds a zero, it returns that object. So if you have a very large heap, suppose you have a million
objects on the heap, a million objects have been freed. When you call new, you have a one in a million chance of clobbering the object you just
freed, right? Because you're basically like, you know, here's a million bits in my hand,
right? Like, here's the one that you freed, there's a million other ones,
right? So, one in a million chance. And that chance continues to exist, right? So it's not
going to be... you know, maybe it's not going to survive a million news.
But it could actually survive in principle for a very, very long time.
So that's one thing that Die Hard does.
The other thing, which is maybe even easier to understand, is that the heap that it allocates is somewhat larger than required.
And then it randomly places the objects in memory.
So there's a likelihood that if you have an overflow,
it'll overflow into nothing.
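To make the bitmap idea concrete, here is a toy sketch of a randomized fixed-size-object pool; it is nothing like the real DieHard code, just an illustration of "pick a random free slot" versus "hand back the most recently freed one":

    #include <cstddef>
    #include <random>
    #include <vector>

    // Toy fixed-size-object pool: one bit per slot, false = free, true = in use.
    // Allocation picks a *random* free slot instead of the most recently freed
    // one, so a dangling pointer is unlikely to be clobbered right away.
    // (Assumes the pool is never completely full; the real DieHard over-provisions.)
    class RandomPool {
        std::vector<char> storage_;   // count slots of object_size_ bytes each
        std::vector<bool> in_use_;    // the "bitmap"
        std::size_t object_size_;
        std::mt19937 rng_{12345};
    public:
        RandomPool(std::size_t object_size, std::size_t count)
            : storage_(object_size * count), in_use_(count, false),
              object_size_(object_size) {}

        void* allocate() {
            std::uniform_int_distribution<std::size_t> pick(0, in_use_.size() - 1);
            for (;;) {                      // expected O(1) probes while the pool stays sparse
                std::size_t i = pick(rng_);
                if (!in_use_[i]) {
                    in_use_[i] = true;
                    return &storage_[i * object_size_];
                }
            }
        }

        void deallocate(void* p) {
            std::size_t i = (static_cast<char*>(p) - storage_.data()) / object_size_;
            in_use_[i] = false;             // just clear the bit; don't hand it straight back
        }
    };

    int main() {
        RandomPool pool(32, 1024);
        void* a = pool.allocate();
        pool.deallocate(a);
        void* b = pool.allocate();   // very unlikely to be the same slot as 'a'
        (void)b;
    }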
Okay.
Okay.
I get it.
I understand what you're saying and I understand why it makes the program more stable,
but my,
the rest of my brain says,
but I wish that it made the program less stable so I could use it to find
these kinds of errors.
Totally.
Totally.
I, so, um, so this is actually part of a line of work
that we did. Some of this found its way into Windows.
As Rob mentioned, I've spent actually a lot of time at Microsoft.
And so Microsoft had this genius thing that they did, which is an adaptation
of this idea that they called the fault-tolerant heap. So they would detect
if a program had been
crashing more than a certain number of times over a certain period, they would swap out the heap
with a heap kind of like DieHard, inspired by DieHard, that would then, like... it's sort of like,
all right, there's pain that this person is experiencing. Let's see if we can do something
to lessen the pain. And then we built other stuff that follows on that actually is designed to automatically find the bugs and fix them.
So we had a follow-on paper called Exterminator, which builds on DieHard to do that.
And then we have another paper called DieHarder.
You can see that I really liked... I like Bruce Willis a lot.
And so DieHarder is a secure allocator. And so DieHarder actually is the opposite.
It's more like what you're asking for, Jason, in a way, which is it makes it very, very unlikely that you'll have any information that you can leverage for an attack.
And so one of the things that it does is it randomly allocates things, but everything is a chunk that's separated in space, in virtual address space.
So it's super
far away from the next chunk and it's randomly located. So if you try, if you do a buffer overflow,
it's very likely you'll segfault. Interesting. Okay. So I know this is not really why I got
brought on to talk about stuff today, but anyway, this is how this goes sometimes. Yeah. Yeah. So
yeah, I mean, we actually also have, I mean, inside of exterminator, we have a thing
called DieFast, which is designed to do what it sounds like.
But, but the secret sauce with exterminator is if you have a program and you run a program
a bunch of times and you normally look at the heap, like suppose it was deterministic,
the heap is the same, like suppose you hit the exact same error, you know, five times in a row. If you look at the heap, the heap state
is identical. So it gives you no information. You could run it five times, a thousand times,
you're getting the same heap over and over and over again. But by using the randomization,
all the heaps are different. And so you can actually identify like, oh, it's this, when this
object is this many bytes before this object,
that's when the thing fails. So it's a buffer overflow of this size from this object. And then
we could use that information to basically create these things so that when you run the program,
again, it would patch the allocator. So we would say, hey, when you allocate something at this
line of code, pad it by this many bytes.
Wow.
So you can use that information to send it home to the developer and also keep the program running.
So win-win.
I have no idea if you said you've spent a lot of time at Microsoft Research.
I don't know if you can speak to this at all.
But as you're telling these stories, I'm thinking,
is this how we still get some of these old Win32, Windows 3.1 applications that can still run on Windows 10?
Is it because they can do some of these tricks and just be like, pretend like it's all working fine?
Yeah, I wish I could say yes, but the answer is no.
I mean, they've done a lot of engineering, obviously, to keep these things alive.
I will say this.
One of the things that is pretty funny, when I first went to Microsoft when I was a PhD student, I was like, hey, I have this super fast allocator.
Your allocator is garbage.
I'm going to replace the Windows allocator.
I'll speed up all Microsoft products.
You know, woo, win.
And what I found was, so then I had access to, like, Microsoft's code, right?
So I could recompile things.
So I went to recompile some Office code and some, I think, SQL Server, and everything crashed.
Every single thing crashed as soon as I replaced the memory allocator.
And I was like, oh, my God, I guess my memory allocator has a bug.
But that was not the problem.
The problem was that everybody who had written this code had debugged their code with Microsoft's allocator.
And so, for example, if you were using Microsoft allocator,
suppose you requested 24 bytes, this is all made up, but this is the idea. You request 24 bytes,
Microsoft allocator might've rounded it up to 32. So you have an overflow from the 24 bytes,
25th, 26th, 27th, 28th, like you're writing way past the end. Doesn't matter if it lands in the
32. So you had all these latent errors that were lying around because of course they debugged
it with the allocator they had.
But as soon as I replaced it with another allocator that didn't have those exact same
sizes and the exact same way it reused memory, then everything fell apart.
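A tiny made-up illustration of that kind of latent bug (the sizes are invented, exactly as Emery says):

    #include <cstdlib>
    #include <cstring>

    int main() {
        // Request 24 bytes. Many allocators round this up to a larger size class
        // (say 32 bytes), so writing a few bytes past the end "works" by accident.
        // It is still undefined behavior, and it blows up the moment you switch to
        // an allocator with different size classes or different reuse behavior.
        char* p = static_cast<char*>(std::malloc(24));
        std::memset(p, 0, 28);   // off the end: a latent bug hidden by the rounding
        std::free(p);
    }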
Wow.
Yeah.
So this is, it's an interesting, it was a good lesson to learn.
It's like, you know, legacy software is hard. And maintaining things forever is hard. And it means you can't often, you know, move forwards, because you're stuck in this situation. You can't just be like, hey, could we just stop your billion dollar revenue stream for a few weeks and fix this? Like, that's not how this works. Right.
Well, so, we started this... so the scary part of your talk, and the thing that you've kind of touched on, is when you said, like,
the day of the week can change how fast the program runs. And then we talked about
Stabilizer, right? And then you said... just for reviewing in my own mind, to make sure I got
this all right: the other products that you have that do still work today
without requiring Herculean effort
to port them to a current LLVM
all affect the heap.
And so I was curious
how much the stack layout and memory
is important versus the heap layout and memory
for stable execution timing runs and these kind of
things? Yeah, yeah, that's a good question. I mean, I think I mean, I'm not sure I can give you
a solid answer. I will say this: I mean, the stack is always allocated by the compiler as
a contiguous chunk, right? So, you know, you've got all your local variables, they all appear in the
stack frame, and they're all together. So this means that, you know, they're all going to be in cache
almost certainly, right? So that's something that, you know, takes away a certain
performance question. The cache is almost always hot, right? Because you're accessing the
cache, you're executing functions, you're just visiting the same memory over and over and over
again. So the memory at, you know, depending on your definition, the top
or the bottom of the stack is always hot, right? So that means that the stack has less performance
impact in general than the heap, because the heap is kind of like, you have many objects,
they're spread around, it's all dependent. Like if I allocate one more object here or a different sized object, then it can change the whole layout of everything.
So it's a lot more brittle than the stack.
That said, the start of the stack, we found significant impact just from moving it around.
And in fact, the stack is exactly what gets moved by the environment variable shift.
Okay.
Yeah.
Interesting.
So that part is fixable, by the way, in GCC, or at least in LD.
You can tell it, you can give it a linker script.
This is super obscure, but you can give it a linker script, and you can align the segments
of where things get mapped to page boundaries, and then this brittleness goes away.
It may not be a good, it may be that when you align it, it actually is slower than when you don't align it because of random effects. But it guarantees
much more reliability. So that's something they could do today that they just don't do for some
reason. And telling people, hey, put in this, you know, scary compiler, like linker flags or
linker script stuff is maybe a bridge too far for some people, but that can definitely make
the performance a lot less susceptible to that kind of issue.
Wow.
Today's sponsor is the PVS-Studio team.
The company develops the PVS-Studio static code analyzer, designed to detect errors in
the code of programs written in C, C++, C#, and Java.
Recently, the team has released a new analyzer version.
In addition to working under Windows,
the C Sharp part of the analyzer can also operate under Linux and macOS.
However, for C++ programmers, it will be much more interesting to find out
that now you can experiment with the analyzer in online mode
on the godbolt.org website.
The project is called Compiler Explorer
and lets you easily try various compilers and code analyzers.
This is an indispensable tool for studying the capabilities of compilers.
Besides that, it's a handy assistant when it comes to demonstration of code examples.
You'll find all the links in the description of this episode.
So you mentioned that Stabilizer can't really be used today,
but if you want to go back to an earlier version of LLVM,
could you build and run a program with Stabilizer?
And is that still worth doing
if you want to do some profiling?
Yeah, I mean, you could do it.
I'm not sure if it's good or not.
I mean, LLVM has moved on, right?
So maybe the code gen is a lot better.
I don't think it's a gigantic difference,
but it's going to be some difference.
There's going to have been bug fixes and so on. I mean, obviously, people are welcome to do it.
There's the specific version of LLVM. If you go to the GitHub site, right, it says, you know,
here's all the information you need. But honestly, I think the, you know, using some sort of
randomizing heap is probably the easiest thing for somebody to do, you know, just to try to iron out these things. But at the end of the day, maybe it makes the
most sense to just be like, look, performance is going to go up and down. It's going to change a
lot. And it could change by huge amounts. And so I need to be really careful when I get a regression,
like, one of my former students at UMass worked on the V8 project at Google, the JIT compiler for
JavaScript, and he was saying that they would actually roll back things that caused a performance
regression on their benchmarks of one percent. And I was like, dude, no, that is ridiculous, right? Like,
one percent? It's like, I mean, if you jump up and down in the same room where the program
is running, right, it's like maybe it'll warm up the temperature by one degree, and then it'll
throttle down the CPU, right? That's just crazy noise. You can't possibly be making, like, software
engineering decisions on the back of a one percent change. That's just freaking noise. Don't do that.
I feel like you just need to repeat that again, because I know lots of people who,
you know, live on the 1%. They're constantly aiming to get the half a percent faster, the
microseconds. I mean, you know, we talked about high frequency traders on the show,
we have them on that kind of thing. So you're saying just that's just crazy talk,
you have no way of knowing if
that 1% is real. It's exceedingly unlikely. I mean, I think again, like literally, when a CPU
gets warmer, it lowers its clock speed. Right? So just ambient temperature changes are going to
affect the performance of your program. This is noise. Run your program a few
times, see what the noise is. If you're, you know, uh,
and even then you're still running it in the exact same configuration. This is something that I
stress. People will be like, Oh, I ran my program 30 times and only has this much noise. It's like,
yes, but it's always 30 times in the exact same layout. And as soon as you change a tiny little
thing, right, somebody blows on that house, it's going to all fall down. Um, it's yeah,
it's kind of crazy. I mean, you know,
people talk about like, performance optimization being, they kind of misquote Knuth, right? He
says, like, you know, basically, don't optimize, it's the root of all evil. That's
the line: premature optimization is the root of all evil. And then he immediately followed it up by saying,
but sometimes it really matters. So sometimes it does really
matter. And you know, if you can get consistently significant improvements, then great. But I do
think that, I'm going to give you a very simple example. So when I was working on Horde, I was
like, okay, I'm going to try to, to like speed up this particular path. And because replacing the
memory allocator screws with all your tools, like all your tools cease to work.
So I'm basically putting in some other way to track which paths I went down.
So I literally had a printf on a path to see if this path was taken.
The path was never taken.
Sped up the program by 15%.
Adding the printf, then.
Yes.
Yes, adding the printf.
And for a moment, I was tempted to leave in the printf.
Right? Because I was like, like, this speeds it up by 15%. But of course, you know, if I change
any other code, maybe it will, you know, so that that will go away. So like having code that
literally never executes can speed up or slow down your program by, you know, up to 40%. So
focusing on 0.5% is probably crazy.
So do you find that, like, in extremely large, complicated systems, this kind of printf
somewhere is going to have less impact or more impact? Or is it still just completely
you have no way of knowing?
So yes to all three. So I will say one thing that definitely has the effect of smoothing things out
is concurrency. So if you have a multi-threaded program, and the multi-threaded program is
constantly running different threads that are not being synchronized, right? So it's doing a good
job. Everything is actually running more or less independently. This is going to smooth things out
because everybody is just
kind of running code and that code is changing when it's executing and what it's operating on
all the time. It's like an electron cloud, right? It's just fuzz. And so, you know, what people
often do with performance optimization, they'll be like, let's turn off all the, you know,
the non-determinism. Let's like, you know, lock everything down and let's make it, you know,
kill all the background processes and, you it, make it run the exact same job
over and over again. But that's, that's no good. You're like missing all the noise that's inherent
in the system. And so you'll get, it will be noisier, but it will also tend to cancel out a
lot of effects. Okay. All right. So, I mean, you can, like, you can think of it, you're going to
have some peaks and you're going to have some valleys. And then those peaks and valleys are going to kind of smooth out and you'll get just kind of a mean.
So it's actually, in that situation, probably a lot better in a highly complex app if it is full of stuff that is asynchronous.
So which brings us to Coz, then.
Yeah.
Okay.
Do you want to tell us a little bit about Coz?
Sure. So basically, what we discovered when we were doing some performance analysis and
research into things is that the existing profilers that were out there really didn't
help. And the reason that they didn't help was they were kind of designed for the applications
of the 80s or earlier. So if you have a sequential program that you just care about how
long it takes from the start to the end, then these programs, these profilers are fine. They're
not great, but they're fine. You know, they tell you like, oh, here's the line, if you're lucky,
the line of code, you know, how much time it spent on the line of code, you know, how often that line
of code is executed, and that can help point you to a place to optimize your code, right? So this is how
classically prof worked from Unix and then gprof, which is included with GNU.
And they've been improved for concurrent programs. There's some stuff on finding critical paths.
The problem with the critical path is that there's no, well, there might be a critical path,
which just means the longest amount, like from start to finish, the longest sequence of code, right? That's the critical path. And so in principle, you should always optimize the critical path. Because if you have a concurrent program, like the critical path is what's slowing everything down. Like if everything is finishing super fast, and there's the one thing that takes a long time, that's the blocker, right? That's the bottleneck. The problem is in a real program, if you focus all your efforts on one critical path, it's like whack-a-mole.
Like that critical path goes away and then another thing becomes a critical path.
It's not like you suddenly went from, oh, I got the critical path and now my program runs 10 times
faster. It could be much worse. It could be like, here's a critical path, here's the other critical
path, except it's 0.0001 seconds faster, right? Imagine
if you went and spent weeks working on critical path one, and then you're done and you optimize
it, you could optimize it to zero. And then critical path two will mean that you actually
had no impact at all. And then, you know, we also care about other things these days. Like we don't
care, like start up a program and see how long it takes to run. Like you don't start up your browser and, and like optimize it. So it quits faster,
right. Um, or your server, right. You have programs that run forever and you care about
things like latency and throughput, not like program total execution time. And profilers just
tend not to do like, that's not what they do. Like profilers, traditional profilers are like
start to end. That's all I care about.
So we were like, what can we do to attack this problem?
And what we were looking for was some way
where we could have a profiler just tell us,
here is what would happen
if you optimize this line of code.
What would the impact be on latency
or what would the impact be on throughput?
And what we wanted, ideally, was like a graph
where we could literally be like,
here on the x-axis is how much I optimize this line of code from zero to a hundred percent. And then on the Y axis is how much does the overall program speed up or how much does
latency decrease or how much does throughput increase? And so if you have a graph that looks
like this, right, you would never optimize that line of code at all. This is a flat line,
right? A flat line says, no matter how much I speed up this line of code,
program still takes, it's unaffected.
It doesn't affect performance.
It's not on a critical path.
Maybe it's IO bound, who knows?
But if you had one where it was like,
man, you optimize that line of code by 10%
and your program speeds up by a large factor,
you would definitely work on that line of code.
So what we were looking for was,
so this thing we call a causal profile. So it tells you, if you do this, it will have this effect for sure,
as opposed to the profilers in the past, where, well, they tell you where something spent its time,
but it's kind of just like, that's where it spent its time. But that doesn't mean if you change it,
it's going to make things faster. Right. So that's what led to Cause. So Cause, C-O-Z, is a causal profiler for
Linux. And it gets these graphs through a kind of trickery. So you can't really know, just by
looking at a line of code, how much performance would increase by optimizing it from zero to 100%.
If you could optimize everything by zero to 100%, you would just optimize everything by 100%, right?
Like, who needs Coz?
Just use that.
So instead, what it does,
it basically takes advantage of this kind of nice insight,
which is that you can get the effect
of speeding something up
by slowing everything else down.
So if I've got some line of code,
what I can do is I can look around and see all the other
threads that are running and tell them to wait for a certain amount of time. So I literally just
signal them. And then they pause for some amount of time. And then I can, and I do this with
sampling, I don't actually run the thing forever. I just hit it for a little bit, slow everything
down with kind of a pulse. And then I can observe the effect out the other side. And so that's what Coz does. It randomly
injects these delays, which it does with sampling. So it doesn't have much effect on overall runtime.
Like, you can run Coz in production. And yet it produces these profiles that you can send
out on a socket if you want. And you can look at the performance profile and say,
Oh, here's these lines of code that I really should be working on.
So is it just for multi-threaded applications?
Good question.
So you can run it for a single-threaded application.
I personally, obviously I'm super biased,
but I actually use it even for sequential code.
It's just convenient to get this result where you get these causal graphs,
as opposed to like poring over, you know, perf results or something. It really shines though,
when you have concurrency, when you have asynchrony. So it doesn't have to have multiple
threads; using async I/O, for example, will also, you know, introduce concurrency, right? So
you could have like an event-driven
server that conceptually has no threads, but the OS has threads. So concurrency is in there and
Coz will work better on that. And then anytime you care about latency or throughput, a conventional
profiler has nothing to say about those things. So with Coz, you can actually say, here's the
start of something and here's the end. Like imagine it was just a sequential server that just took in an input, did something with it and produced a result.
Right. So classic kind of event loop.
You could just say, here's the begin point.
So we call them progress points.
This is progress begin.
This is progress end.
And then what Coz will do, it will try to find the lines of code that will reduce the latency. So for a game developer, would it be fair to say, like, the start of a game tick is the begin?
Do you do all the calculations, whatever, before you like dump it to the screen?
That's the end.
Do we actually have to put, like, markers in our code to tell Coz that this is what we care about?
Yeah, yeah, you have to do that.
So there's literally just three macros.
One is a progress point, which is for throughput. And the others are begin and end, which are for latency. So you do have to say something. But I think most, you know, if you're a game developer, you probably know where the start is of when you're actually doing the calculations and where the end is. And so you literally just put in a line at the beginning, a line at the end, you can have multiple ones too. So you can give them, um, identifiers if you want. Okay. And then we run it and it produces a magic graph that says
Okay. And then we run it and it produces a magic graph that says, speed up line 12 and pretty much it'll be faster. Yeah, yeah, exactly. Yeah. I mean, we actually...
it was funny, because we built it, and we're like, you know, there's some, like,
light theorems in the paper. They're not super deep. But anyway, we were like, okay,
so we have mathematical proofs in effect that this is going to work. And we demonstrated with
some simple programs. And then it was like, all right, here's the plan. I told my student who's
now a professor at Grinnell College, Charlie Curtsinger, I was like, Charlie, go and take
this benchmark suite of concurrent multi-threaded programs that Intel put together with Princeton and spend no more than an hour on each program and see how much you can optimize it
with Coz. And so these are programs he'd never even looked at. So he had no idea at all of what
was going on inside. He ran the programs, looked at the code, very quickly found places where he
could insert these progress points and came out the other side with optimizations ranging from 10% to 70%.
Wow.
And Coz doesn't rely on any weird LLVM internals or anything like that,
so it should be around for a while?
Yeah, yeah, yeah.
So it's pretty stable.
I mean, you can go and install it like, you know, with apt or snap or what have you.
It's, it relies on this package that Austin Clements put together.
He's the head of Go development at Google.
It's called libelfin.
So it manages, like, reading stuff out of ELF format, which is something that it relies on.
And that's it.
And so I happen to know Austin,
and I was like, there was some little change we needed him to make because they changed
the ELF format, so I was like, please do this. And so he did it, which is great. But yeah,
that's it. So it's good to have few dependencies. Do we need to compile with -g to get symbols, or... Right. Um, so,
so let's see. So it can find, if I recall correctly, it can actually find, like, whatever information is out there. But I don't think that the... yeah, I actually can't remember. I always use -g when I do it.
It doesn't matter if it's -O3 or whatever. Like, that's just fine. It's totally fine.
It does line-granularity stuff even though it is optimized, so that part is totally fine. You may
need to use -g. Like I said, we always use it. I always use it. Yeah, I have other friends who
recommend just always using it, because why not? You're going to want debugging symbols at some point.
Exactly. It doesn't affect the resulting binary, right? I mean, it just produces this stuff in
DWARF format, and you can just access it.
I should also add, even though this is a, you know, C++ podcast, like, we're among friends, we can talk about other languages.
It works for C, of course. It works for Rust as well.
Okay.
And somebody made a version of it for Java,
so there's a version called JCoz
that works for Java programs.
And in principle, it could work for anything
that generates debugging output.
So it's possible to do for JavaScript,
but we just haven't done it.
Interesting.
Yeah, very cool.
Well, I think we are running a little low on time, Emery,
but is there anything else you want to share with us
before we let you go?
Jeez.
I've shared so much. You have, you have. So yeah, I mean, the only thing
I would say is, obviously, you know, we welcome feedback about these things. So you know, for
listeners out there who go and use Coz, you know, please, you know, if you discover some issue,
let us know on GitHub. If you optimize something and you have a great experience with Coz, we'd love to hear about that.
If you discover some use cases that Coz doesn't work for, there's some other interesting things.
We're always interested in hearing from real developers out there solving real problems.
Awesome. Very cool. Thanks, Emery.
All right. Thanks so much, guys.
Thanks.
Thanks so much for listening in as we chat about C++.
We'd love to hear what you think of the podcast.
Please let us know if we're discussing the stuff you're interested in,
or if you have a suggestion for a topic, we'd love to hear about that too.
You can email all your thoughts to feedback at cppcast.com.
We'd also appreciate if you can like CppCast on Facebook and follow CppCast on Twitter.
You can also follow
me at Rob W Irving and Jason at Lefticus on Twitter. We'd also like to thank all our patrons
who help support the show through Patreon. If you'd like to support us on Patreon you can do
so at patreon.com slash CppCast and of course you can find all that info and the show notes
on the podcast website at cppcast.com. Theme music for this episode was provided by podcastthemes.com.