CppCast - Programming History, JIT Compilations and Generic Algorithms
Episode Date: October 23, 2020

Rob and Jason are joined by Ben Deane from Quantlab. They first discuss the 11.0 update of Clang and a blog post highlighting some of the smaller features that were added in C++17. They then talk to Ben about some of his recent CppCon talks, including one on what we can learn from the history of programming languages and another on the ability to JIT C++ code.

News
- Clang 11.0.0 is out
- 17 Smaller but Handy C++17 Features

Links
- Careers at Quantlab
- Constructing Generic Algorithms: Principles and Practice - Ben Deane - CppCon 2020
- Just-in-Time Compilation: The Next Big Thing? - Ben Deane & Kris Jusiak - CppCon 2020
- How We Used To Be - Ben Deane - CppCon 2020

Sponsors
- PVS-Studio. Write #cppcast in the message field on the download page and get one month license
- PVS-Studio is now in Compiler Explorer!
- Free PVS-Studio for Students and Teachers
- Use code JetBrainsForCppCast during checkout at JetBrains.com for a 25% discount
Transcript
Episode 270 of CppCast with guest Ben Deane, recorded October 21st, 2020.
Sponsor of this episode of CppCast is the PVS Studio team.
The team promotes regular usage of static code analysis and the PVS Studio Static Analysis Tool.
And by JetBrains, the maker of smart IDEs and tools like IntelliJ, PyCharm, and ReSharper.
To help you become a C++ guru, they've got CLion, an Intelligent IDE,
and ReSharper C++,
a smart extension for Visual Studio.
Exclusively for CppCast,
JetBrains is offering a 25% discount
on yearly individual licenses
on both of these C++ tools,
which applies to new purchases and renewals alike.
Use the coupon code JetBrainsForCppCast
during checkout at JetBrains.com
to take advantage of this deal.
In this episode, we talk about an update to Clang and smaller features from C++17.
Then we talk to Ben Dean from QuantLab.
Ben talks to us about just-in-time compilation and the history of programming languages.
Welcome to the first podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
I'm all right, Rob.
You know, on the topic of being the first podcast,
CppChat actually recorded an episode yesterday, and I was in on that one.
Oh, what was the topic on that?
Rust and C++ Roundtable was the title.
Did you do that with your cousin as well?
No, he did not. He was not on it.
It was other people, including Nicole Mazzuca as well, actually, who we know in the C++ community here, right?
How'd that go?
Pretty good. I don't know, I wasn't watching it so much as being on it.
But it sounds like Rob watched it. I'm giving a sneak peek here; we can ask him what he thought later.
I did watch it.
Okay, well, before we get to Ben, at the top of the episode I'd like to read a piece of feedback.
We got this tweet this week from, I can't pronounce this name, but the Twitter handle is claim.red, and they said, awesome episode, build system related casts are definitely my jam. Bazel's worldview looks really similar to FASTBuild: shared cache, vendorization, etc.
FASTBuild. We've heard of FASTBuild before, right?
Yes, I think so. I don't think we've talked about them in any detail, but yeah, I'm glad you enjoyed the talk on Bazel. It was a good one. You look concerned, Jason.
Oh, I was just wondering why we had heard of FastBuild before.
I was trying to remember who owned that or whatever.
I don't know.
Anyhow.
Well, we'd love to hear your thoughts about the show.
You can always reach out to us on Facebook, Twitter, or email us at feedback@cppcast.com.
And don't forget to leave us a review on iTunes or subscribe on YouTube.
Joining us today is Ben Deane.
C++ wasn't even among the first 10 languages that Ben learned on his programming journey,
but it's been the one that has paid the bills for the last 20-odd years.
He spent most of that time in the games industry.
Many of the games he worked on used to be fondly remembered,
but now he's accepted are probably mostly forgotten.
These days, he works in the finance industry,
writing high-frequency trading platforms,
and the most modern C++ that compilers can support.
In his spare time, he watches a lot of YouTube's educational sector,
practices the Japanese art of tsundoku,
reads about the history of programming, avoids doing DIY,
and surprises his wife by waking in the middle of the night
yelling, of course, it's a monad,
before going back to sleep and dreaming of algorithms.
Ben, welcome to the show.
Thanks very much for having me.
All right, I've got like two questions from this bio.
I think I wrote the bio to tease you there, Jason, a bit.
Well, okay, so maybe I'll take a slightly different tack than you were hoping for, but how many of your first 10 languages were dialects of BASIC?
I'm going to say maybe two or three.
So I started out with Sinclair BASIC on the spectrum.
Then, of course, BBC BASIC and Quick BASIC or QBASIC.
I actually list QBASIC and Quick BASIC separately, personally.
Yeah.
I think there are very minor differences, right?
It's pretty minor, yeah.
Yeah, so three basics, if you like.
But yeah, before C++, there was, let's see, Logo, ML, Pascal, Fortran 77, Lisp, Modula-3, maybe a couple other things.
These were various courses that I had through about second year of university.
Okay.
And when I first went to work, I was writing in C.
We weren't allowed to use C++.
It was very, you know, game developers are very skeptical about new technology,
and C++ compilers weren't nearly as advanced as they are today.
And so C was the thing that everyone trusted.
Okay, now at this moment,
I feel that I must go completely off script
and completely outside of your bio here
and do something that's kind of more interview-ish,
if you don't mind.
Rob, I hope you don't mind.
Go ahead.
But it came up in that Rust C++ roundtable,
and I've heard this conversation,
I've heard this argument many times from many
different people, saying that one of the main problems with C++ is that each company only
allows a subset of the language, so basically everyone's programming in a different subset of
C++, because, you know, whatever, and a lot of the newer features can't even be used because companies don't allow them.
Now, I've heard this comment from many people, but I've never actually seen it happen,
not in any of the companies that I've ever gone teach at or in any of the jobs that I've had personally. So I'm kind of curious, Ben and Rob, since you're both here, if you have seen this,
where you're like, oh, no, my company doesn't allow, and if you're saying my company doesn't allow raw new because we use make unique, that's not what I'm talking about, right?
I'm talking about like my company doesn't allow operator overloads or whatever.
Yeah.
Yeah.
Exceptions is kind of the obvious thing to mention here.
Okay.
Many people who work in games and embedded and systems like that don't use exceptions at all.
They turn off exceptions with a compiler flag.
But other than that,
yes, in particular
this was visible to me around
the time of adoption of C++11 and
14.
So working in the games industry at that time,
everyone
was sort of waiting to see what the new standards would be.
And there were so many features that came in with 11.
It took us a while to, it took people a while to get to know them.
Some were very quickly adopted.
Some were universally a good thing.
Scoped enums are in that category.
Like, game developers just loved scoped enums more or less from the get-go.
They were obviously...
Anything where it's obviously a benefit with absolutely no performance impact,
like that, great.
Things that take a bit more getting used to, like lambdas,
I think it's now pretty much accepted that lambdas,
at least shortish, let's say, Lambdas, are a good thing, right?
In game development, I might say.
Although I haven't been in game development for a couple of years.
But it took a while for people to accept Lambdas.
It took a while for people to be educated about Lambdas.
In particular because I think std::function came in at the same time, I think.
And that confuses a lot of people.
And that really doesn't have the same
performance profile as Lambdas.
So people
were confused about those two for
a couple of years back there in the
mid-teens.
And so, yeah, more it came down to, at least at the company I was working at at the time, the individual game teams. The technical directors and the lead engineers of the individual game teams would get together and decide which features they wanted to open up, as it were,
and which features they definitely didn't want used in their code base.
Okay. Well, that answers my question.
Because, like I said, I've never really seen that.
Every now and then when teaching, I'll have someone say, oh, we're not allowed to do X, and I'll be like, well, you're wrong, and let me show you why, and then we move on. But it tends to be like small, minor things.
Well, and I think quite a few companies adopt, you know, the big third-party coding standards that are out there, like Google's coding standard. There are some prohibitions in there.
Oh, yes. It's been modified over
the years. Yeah, the early versions of that
were terrible.
The early versions of it were absolutely terrible
because it disallowed basically everything that makes
C++ good.
You effectively couldn't use
RAII, effectively.
You couldn't do any kind of operator
overloading. You couldn't do anything
that makes code readable, basically. It forced you into a corner of fancy C, unreadable code.
Okay. I haven't looked at it lately, but do they allow... I imagine something they still
maybe don't allow is user-defined literals because of the scoping issues and the maintenance issues
around that. I don't think Titus is a big fan.
That might be true.
I also know Chandler's not a big fan of them.
So that,
uh,
yeah.
Anyhow.
Okay.
Well,
that was quite the diversion from your bio,
but thank you.
Great.
Yeah.
Okay.
Well,
Ben,
we got a couple of news articles to discuss.
Uh,
feel free to talk and comment on any of these,
and then we'll start talking more about what you've been up to lately.
Okay.
All right.
So this first one is Clang 11.0 has been released and you've got a big
change log here.
Some of the things I thought were interesting was they improved the AST for
when you get some broken code, so they can better diagnose it. It's always good to have more expressive errors. Anything else you want to point out, Jason?
Well, to me, the most notable thing is that they didn't add any C++ features. They have one bullet point of C++ language changes, but yeah, it's pretty minor.
Yes. And I'm like, wait, am I missing something? So I went to the cppreference chart of C++20 features in Clang, and sure enough, Clang 11 doesn't add any new C++20 features to it. I was just surprised.
I mean, I'm not on the Clang development team, obviously, but it could be that they're not counting sort of improvements or bug fixes to existing features, or sort of ongoing work on, for example, concepts, a feature that was added recently, not in this release but before. But, you know, I'm sure concepts has had a few bugs, and they're sort of working on those in general.
But it's not called out as a new feature per se.
That's a conversation you and I had relatively recently about some concepts, differences between Clang and GCC.
Yes, yeah.
It'd be interesting to see if they more agree with each other now or not.
Okay. Anything else we want to call out with this?
I guess at this point, we can go more into new C++ features.
I do find it good and bad that Clang format is still constantly being developed.
Because there's things that Clang format doesn't do right.
Sometimes it gets confused around lambdas and such.
And one of the things that they have in here is lambda things.
But if you are on a team that relies on Clang format and everyone does a Clang format before
they commit their work, then sometimes you end up in a scenario where people don't have quite
the same version of Clang format installed. And then you're going like back and forth on one line
of code or something like that. There's no way around it. It's just a tiny, mild frustration for me.
Yeah, I am on the team that does that.
We use Clang format.
We have a pre-commit hook that,
I think it's a pre-commit hook,
that enforces it.
And we have to make sure everyone's
got the same version, I guess.
Yeah.
I think, were you thinking of
the new familiar template syntax lambdas
when you were saying that Clang format doesn't quite get lambdas right in some cases?
I don't know if it's still...
I know there was a time when there were cases there where it didn't get them right.
Or it didn't, I mean, obviously.
It didn't get them as beautiful as one might expect, I guess.
They're still great.
And there's something else that I have in my current code base
that I'm experimenting with, and I don't remember what it is.
I'd have to go look.
It's not worth the time at the moment.
But clang-format is like not adding a space where it's supposed to.
It's obvious.
This is a comparison, and it has a space on the left side of the comparison but not on the right side, in one particular context, and I'm like, what are you doing?
Yes, I've encountered things like that, also with a requires clause.
Yeah, if it's just, like, requires a single term, like X, whatever, it seemed to format it
slightly differently than if it's requires and then you have something in parens.
It takes away the space between the requires keyword and the paren.
I think I've run into that.
That might be related.
But as a whole, I didn't used to use Clang format when I worked in the games industry.
There was some push to use it.
In fact, I was sort of trying to convince people to use it,
but getting 200 programmers to agree,
we didn't quite get there in my time.
But coming into the finance industry,
I just came onto a team at QuantLab
where we just, Clang Format's just a thing.
It was already there.
And so it was just a case of, all right,
now I just don't, you know,
I don't worry about formatting anymore.
It's just something that happens on save.
And for me, I'd like...
Yeah, if it comes up weird, then I just...
I'm not worried. I know it's still correct.
So I just sort of think, oh, well, that's just how it is right now.
Right.
Fine.
I think on open source projects, too,
it lowers the barrier to entry if you can say...
Instead of having to argue with the person, saying your code is terribly formatted, just say, please run clang-format first, and I'll review your commit after that, or whatever.
Yeah. I often find it useful, when I get an error message that is very templated, to just copy-paste that error into a C++ buffer in my editor
and clang-format it.
And of course,
then it's much, much easier to read.
That's a neat idea.
Yeah, I've never heard of that.
Okay, next thing we have
is a blog post on Bartek's coding blog.
And this is 17 smaller but handy C++ 17 features.
So features that we probably haven't spent too much
of any time talking about,
but are nonetheless useful features.
And he's got a list of 17 here.
Any specific ones you wanted to mention, Jason?
I know one of the ones that I liked
was removing auto_ptr,
just no longer having to see that in the code, not having to worry about it anymore is nice.
Removing old functional stuff recently came up when I was trying to get an older code base
compiling with C++17 Visual Studio, because Visual Studio is the only one that actually
removes these features, just for the record.
Oh, really? Oh, yes. Oh, okay. They don't want to break other people's code, basically. Right, right.
There's one here that has, to me, like this huge implication, and I hadn't even been aware of it before: try_emplace. In this example in Bartek's blog, he passes an object by move into try_emplace. And this becomes a maybe-move, because if the try_emplace doesn't succeed, because an object of that key is already in the map, then it doesn't move that value. Now, C++20 had one change for
this to make move more automatic in more places. And there's another push to make move more
automatic in more places where if you pass something by rvalue reference, it's just going
to be moved. So we don't have to type std move everywhere inside our functions. Well, here is
an example in the standard library of something
that's a maybe move. And if we make move more automatic and more places, that becomes tricky,
potentially. In this case, the example shows moving a string as what will be the mapped type
in the map, right? Yes. The signature of try_emplace actually takes an arg pack
that will construct the mapped type in place, I believe.
So it takes a key type and then an arg pack,
which would be perfectly forwarded to construct the mapped type.
But only if that key doesn't already exist.
Yes, only if the key doesn't already exist.
So I think what you're saying still applies.
It's one of these cases where, you know,
std::move doesn't move, it's just a cast to rvalue,
and here it is writ large.
And I'm not saying I like the fact that std::move doesn't move,
that it's just a cast, right?
I'm just saying whenever I bring this up,
that in cases you might have a maybe-move,
people are like, oh, but... well, here's a case in the standard library.
That's a maybe move.
Right.
One of the things here that I'm interested in is the variable templates for traits.
So in 17, the type traits got the underscore V versions for just the Boolean values.
I've run into this recently because, you know,
I do a lot of, on my team,
we do a lot of compile-time programming
and we use a lot of type traits.
And the interaction with concepts here is interesting
because, so normally you have like trait underscore T,
and that is usually a thing
either of type true type or false type, right?
And then you have a trait underscore V, which uses the value out of that trait, typically. But there was, I think I remember, some discussion in the community about whether the underscore V should use the value, or just be an instantiation of the true type or false type, which is convertible to a bool.
Okay.
Right. So, and I've noticed that, at least, and I suspect this is a standard thing,
because the compilers do this right now in concepts.
You can use bools in concepts, but you can't use true type or false type.
There's no conversion there.
It just doesn't happen in the context of the concept, I guess.
I haven't looked up chapter and verse in the standard for this,
but I found that when...
So the upshot is now all of my underscore V declarations that I use,
I make them use the value from inside the trait
rather than instantiating the trait's type as true type or false type.
And so that makes them usable in concepts.
And that is what the standard type traits do as well, right?
Their underscore Vs are just true or false.
Yeah.
Yeah.
They're just plain old bools.
Which actually makes them faster to compile then,
because instantiating an object of type true type or false type
actually has a teeny bit more compile time overhead.
But it can add up in a heavily templated code base.
I guess so.
One more I wanted to mention was the __has_include preprocessor expression.
You can now basically have a preprocessor check
to see if you have an include header available.
And if you don't, have some different alternative code
to use something
without the dependency on that header. And I know I've seen lots of older open source
libraries where they always have, like, different has-whatever-header preprocessor
defines that you have to define when you're building the library, and you could get rid of all
that, the config.h kind of thing that has to set all these things up.
Yeah.
Yeah.
I would like to call...
Go ahead.
I was just going to say,
I would like to call myself out here and say,
I still don't understand how VoidT works,
even though the first time I watched a conference talk on it
was in like 2015.
I just, you know what?
I'm happy concepts are here.
Yeah, in most cases, it's probably super simple concepts, right?
So you kind of dodged it there.
It had its time.
It's like Visa local bus.
It was here for a while and then gone.
I was going to call out from_chars and also to_chars,
but I found myself using from_chars a lot more.
I like it, because I really just want the one function, where I don't want to have to look at whether to do, you know, atoi or strtod or whatever it might be. With from_chars, I just want the one place where I can go and know that that's the fastest possible thing. Now, all the implementations aren't quite yet there, I think.
But, you know, for example, Microsoft's STL,
Stephan Lavavej, he did a great job, you know, doing
to_chars and from_chars. He had a talk about that, I think, last CppCon.
The one thing I don't like about it is it's not constexpr.
It should be.
It absolutely should be.
And I asked Stephan whether it would be easy to make it constexpr,
and he said it should be fairly easy
because it's mostly just plain old algorithmic stuff.
There's a couple of intrinsics in there,
but nothing that would hinder constexpr, according to him.
And in particular, with the new stuff with C++20
with more and more using string-like things
at compile time as non-type template parameters
and things like that,
I found myself needing from chars
in the constexpr context a lot.
And I'm usually just, you know,
I have to make do with just a very sort of naive
decimal integer only type loop that I have to write.
By any chance, do you just wrap that, just for the fun of it, in is_constant_evaluated, and then call the more efficient thing if you're not being constant-evaluated at the time?
Actually, no. I think that's a good idea, though.
Most times I need it, I know that it's going to happen at compile time lately.
But yeah, that's a good idea. Something for me to think about, and one of the things to play with.
It's also, I think, really important to point out that from_chars and to_chars do not use locale.
And as far as I know, it is the only string number conversion things in the entire standard library that don't use the current locale,
which makes that significant if you care about,
oh, hypothetically, you have a scripting language
that needs to be able to parse floating point numbers,
and someone's using it in Europe,
and they have their locale actually not set to en-US,
and then everything breaks.
I had to write my own number parsing.
Yes.
Okay.
Well, Ben, you had several CppCon talks
that I think we wanted to touch on a little bit today.
To start off, you did this one lightning talk
about older programming language books,
or one specific one, but it reminded me that Conor Hoekstra also did a similar talk about another, like,
50- or 60-year-old programming language book. So what should we, you know, be learning from some of
these older programming language books? What did they know that we, you know, think is still new to us?
Well, I picked up this book because I like to read about the history of our field.
And the more I read about it, the more I find that all the things we think are new these days.
You know, there's a lot of new technology these days, and there are new things these days.
But all the sort of, as I said in the talks, all the elements of the programmer's condition have been the same for the last 50, 60 years, probably. And it's just fascinating to go
back and, you know, this book was printed in 1969. It's by Jean Sammet. And it contains a large
number of languages. You know, you wouldn't necessarily think that there were that many languages extant
at the end of the 60s.
But there are, you know, as well as the ones we know these days
like Fortran and COBOL,
there was just really an explosion of languages
and they were very much more,
a lot of them were very much more special purpose
or fitted to their domain than languages are these days.
So you had languages for numeric computation, languages for string handling.
You know, all these different domains had separate languages.
We can see that a bit in the history of Unix as well, things like awk and sed and nroff
and those kind of languages.
And they harken back to the day when there was this amazing fecundity of languages
and people trying out all these different things
before we sort of settled down
into what we think of today as a language,
which is sort of a general-purpose,
Turing-complete thing.
So your question was, what can we learn?
What was your question?
What should we take away from that, perhaps?
Just that, you know, it's a sense of history.
It's a sense that the same issues we have today, surprisingly, you know, you think of, you know,
are we worried about compile time or we're worried about performance?
They had all those same issues then.
And in fact, sometimes even in ways we don't
think of now, like compile time, compile time was a concern back then, not just because,
you know, it could take programmers time, but because you might, it was much more common to
be compiling something which would only run maybe once or twice, because it was the problem was very
specific at the, you know, and they didn't have general-purpose tooling
to build up things to solve problems.
They had to write programs specifically
to solve specific problems in some cases.
So the compile time could easily be...
Compiling is very expensive
compared to maybe solving the problem.
So the compile time could easily be 50% of the total time,
computer time, spent on the problem.
Whereas these days, we like to minimize compile time,
but for most people, run time outweighs compile time,
almost no matter how long the compile takes. Almost. Right.
Do you ever find
yourself since you do enjoy this history of programming languages and such, saying, you know, this one function right here
that I'm currently working on,
I wish I could write it in Fortran or Lisp or whatever.
E-Lisp.
In your case, maybe.
Well, sometimes.
I don't really ever wish I could write it in...
I mean, sometimes you run up against, you know,
in C++ some particular
odd or annoying
syntax.
C++ is pretty powerful, though, in terms of
what it can do.
It's not always beautiful or nice
to do things.
But yeah, it's more of a case of
not wishing for other languages,
but using...
And the reason to study all sorts of different languages
is to find out how
to differently do things, different ways of looking
at things
most obviously perhaps in the last decade
the influence of Haskell
has shown that in many languages
C++ included
different way of thinking about things, different way of expressing
things to look at
a computation differently. That's, I think, where it comes through.
Okay. How else does your knowledge of all these older programming languages, and the way they did things back in the 60s, help you with your day-to-day work?
To be honest, mostly I'm glad that I have modern tools available, you know, because it's all very well to study history, but it's kind of like going back and playing a video game from your youth and thinking it's going to be great. It's great for five minutes, you know, and then you're like, okay, I've remembered that, I can go away now, and it's best left fondly remembered.
Having said that, there are, you know, just a little bit sort of,
I guess now the 70s counts as a long time ago.
Yeah.
I think of it as a long time ago.
But oftentimes, you know, having a knowledge of things like awk,
for example, the AWK language,
that comes in handy still many, many times,
like maybe a few times a month at least,
when you want to go through some data.
Although Python is pretty much the glue language
that a lot of people use these days for gluing things together
when you don't need that high performance.
But awk in particular is a great language to know
just for throwing things around on the command line
and chopping things up very easily.
And it's surprisingly difficult to beat performance-wise, actually.
Funnily enough, Unix tools,
which have been optimized over 40 years,
are pretty fast.
And for any given problem involving chopping up text, doing it in five minutes on the Unix command line gets you a solution that, even performance-wise, is hard to beat, even if you go to C.
Because, after all, it's all written in C and pretty optimized under the hood.
There was a C++ meetup in the Bay Area talk I went to
a couple of years ago now,
where the speaker compared Unix tools
versus a C solution versus a C++ solution
to solve a problem like that on the Unix command line.
The takeaway I got from that talk was,
you know, just use Unix tools.
Just use sed, awk,
or grep. They're pretty much fast enough, and it would take you quite some effort to beat them.
You see, that's like the kind of thing that I was thinking of when I asked if you wished you
could just write this function and insert language here. I've never thought about this before,
but it does seem like it would be kind of handy if, in the middle of my C++ code, I could be like, I need to do a quick regex replace on this string, you know, at runtime. If I could just put a sed string in there, that would just be sweet.
Yeah, that would be good.
You should work on that, maybe. I'm not sure. But yeah, when I joined Quantlab, the interview test was: write a little C++ program to go through some trading data, you know, do some stats crunching, find averages, min and max, and stuff like that, from a CSV file, basically.
And writing that in C++, yeah, you can do it. It's not difficult. It takes an hour or two, and you end up with a program that's fine. It's probably a few hundred lines.
After I joined the company, I think about this time last year, I went back to the exercise and I did it in awk, and it was 40 lines, and 20 of those are because awk is just so primitive it doesn't have things like min and max. And it handily beats the performance of, certainly, my original solution, and a couple of candidate solutions I tested it against.
Wow.
And it was five to ten minutes' work.
Today's sponsor is the PVS Studio team.
The company develops the PVS Studio Static Code Analyzer
designed to detect errors in the code of programs
written in C, C++, C Sharp, and Java.
Recently, the team has released a new analyzer version.
In addition to working under Windows,
the C Sharp part of the analyzer
can also operate under Linux and macOS.
However, for C++ programmers,
it will be much more interesting to find out
that now you can experiment with the analyzer
in online mode on the godbolt.org website.
The project is called Compiler Explorer
and lets you easily try various compilers
and code analyzers.
This is an indispensable tool for studying the capabilities of compilers.
Besides that, it's a handy assistant when it comes to demonstration of code examples.
You'll find all the links in the description of this episode.
Okay, well, let's maybe move from history and older programming languages
to some new things that are happening in C++.
You also did a talk on JITs, right?
Yeah, Kris Jusiak and I put together a talk on just-in-time compilation
based on a paper by Hal Finkel from a few years ago.
Now, it's gone through some revisions.
The idea behind it is: we extend Clang,
and there's a new attribute you can use on a function template,
the Clang JIT attribute.
Okay.
And what that means is that the function can be,
the template can be instantiated at runtime through the just-in-time compilation mechanism.
And basically the mechanism for that is you build a string, literally a std string, and
you put the contents of that std string in as the template argument.
So your std string might literally be the string int, I-N-T.
And when that gets jitted, the function template is instantiated with a template
argument, int. But you can put
any string in there, so as long as that string
in some sense resolves to a
type, that gets
just-in-time compiled.
That template is instantiated at run time
at the point of use.
So that's the basic mechanism of it.
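A rough sketch of what that looks like, going by the description here and Hal Finkel's paper. This requires the experimental Clang fork, so it does not compile with a stock compiler, and the exact attribute spelling should be treated as an assumption:

```cpp
// Sketch only: [[clang::jit]] exists in the experimental Clang-JIT fork,
// not in mainline Clang.
#include <iostream>
#include <string>

template <typename T>
[[clang::jit]] void crunch() {
    std::cout << sizeof(T) << '\n';
}

int main(int argc, char* argv[]) {
    // The template argument is a runtime string: "int", "double", or
    // anything that resolves to a type when it is JIT-compiled.
    std::string type_name = argc > 1 ? argv[1] : "int";
    crunch<type_name>();  // instantiated at run time by the embedded compiler
}
```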
And it means, yes, it means,
now you're thinking, yes, it means that the whole compiler is based, or at least the whole clang,
yeah, the whole compiler is inside your application.
And so naively your application is quite large.
But there's a way, as Kris and I showed in the talk,
to once you've instantiated the template,
basically in memory then you have the IR, the AST,
and you can save out the IR, which has already been through the Clang frontend
and through the optimization passes, and then is ready for basically
very cheaply turning into an object file and linking.
So although your naive sort of program that is all-singing, all-dancing
and allowing all of this JIT to happen is large,
you can end up having JITed the right things.
You can end up outputting a relatively small binary that does exactly what you need.
But instead of being based on things you had to foresee at ahead-of-time compile time. This JIT phase I think of as sort of a configuration time
or an install time setting. You know, it's distinct from ahead-of-time compile, and it's
distinct from final runtime, but one usage of it is sort of as a configuration
time where you're like, okay, I know something about my data now. I can use that to output
exactly the optimized binary that I need to crunch my data.
But is it only types, template parameter types, or can you also do non-type template parameters?
You can also do non-type template parameters. Yeah, I was explaining it in terms of regular templates, but you can do non-type template parameters.
Okay, but it is only the template parameters,
only the bits between the angle brackets,
if you will. Yes.
But that gives you a lot of power
as you can imagine.
In particular,
when you mix it together with
lambdas in
unevaluated contexts,
being able to do the decltype
of a lambda expression, and
in particular the decltype of an
immediately invoked lambda expression,
then basically you can
write an arbitrary lambda,
returning an arbitrary thing, doing
arbitrary computation, immediately
invoke it, stick the whole thing in a decltype,
pass that to a function template,
and now you have the ability to provide for your users
to inject types at this JIT time.
Because inside the lambda, you can have a local struct,
and you can return a thing of that type,
and you can put functions in there, in your struct, inside the lambda,
and you can apply concepts to this type.
So you can say, user, give me a lambda,
write this object in your own way for your use case,
but I expect it to adhere to these concepts.
I expect it to be some kind of IO thing,
so I expect it to have a read function
and a write function, for example.
So at which point do you get the compile-time error, if something went wrong, versus the JIT-time error, I guess?
So, yes, you get a compiler error, a JIT-time compiler error, if there was a malformed
piece of input there, right, as you might expect.
And so it does sort of involve sometimes turning your users into C++ programmers.
They're used to working in a certain domain.
They need to get a little bit more used to C++ to know what the error messages mean sometimes.
So do you have practical, where you've actually seen this be useful kind of things that you can talk about?
Well, I can't get into the details, but yes, you know, exactly.
Kind of, if you extrapolate from what I've been saying, you know, imagine that you have a large amount of data that you want to crunch.
You want to get some statistics out of
the data, right? It puts the power, it puts the choice of how to do that in the hands of the users,
rather than having to bake in everything ahead of time. So if the user only wants to look at this field
out of the data, and all these other five fields they don't really care about this time, they get to do
just that, through JITting what they
need.
So the type that you would pass to your template might be something like what we would consider
dependency injection.
I think that might be a good word, yeah.
Yeah, where you could say, okay, here is a class that I just created just for your template, and it has a compute function, and now it's going to
be able to highly optimize that whole
flow, because that compute function
is known at JIT compile time
for this set of data.
That's a good way to put it.
And there really is no difference
because it's still Clang
under the hood. It's still the compiler.
And it's still
you can actually
pass in JIT-time compilation flags as well. Usually we match them to the ahead-of-time compilation
flags. So, for example, you know, if everything is O3, you pass in O3, and everything is optimized with
O3. So it's literally the case that the JIT-time code is as optimal. It's sometimes more optimal, right?
Because you know more at JIT time than you did ahead of time.
But it's, you know, in theory, absolutely no different.
The code is just the same.
It's just coming out the same compiler as it were.
Okay.
Now, once you have this object file, I mean, I've played a little bit with embedding Clang
in a project and I'm like, okay, I created an object file.
Now I need to link it.
And it seems like invoking the LLD is a little bit painful
if you want to invoke it from inside your program.
I mean, if you want to embed the linker versus calling the external linker.
Right, right, right.
So I'm just curious if you solved that problem by calling the external linker
or if you tried embedding LLD also.
No, we didn't embed LLD.
We have some driver scripts that make the binaries out of the emitted IR code.
I would stick with that plan.
Yeah, that seemed like a good idea.
A better idea than trying to do everything in one.
Compartmentalization of concerns, that's
a good thing, right?
The project I'm working on, just as an aside, it has to be cross-platform.
Everything I work on has to be cross-platform, and the goal is to make a single binary with
no external dependencies. So embedding LLD then became the next phase of pain. Right. Sounds like fun. Yes.
What kind of limitations are there with this? I mean, how far have you
pushed it?
Most of the limitations come from
a couple of limitations.
Really, we're trying to use things
which are so modern
that they aren't quite here yet
in some cases.
So, Lambdas in unevaluated contexts is a very new feature,
and the compilers do an okay job,
but they don't cover all of the corner cases.
When you get into actually using this thing for real
and actually doing sort of arbitrary things inside Lambdas,
you can expect sometimes to run into issues.
One of the things that we do that's not in standard C++ yet, we're hoping it will get in, is
allowing templates in local functions.
Is that right? Templates in local functions?
Templates in local functions, yeah, has still not been allowed. I mean, we get around it with generic lambdas.
Generic lambdas are templates,
and they can be in local functions.
Except they're not.
They're defined outside the func... Well, whatever.
Well, yes.
Anyway, the fact that you can put generic lambdas
already in functions indicates that
this probably doesn't have an implementation bottleneck
in a compiler.
And so it would be nice to have that.
There is actually a proposal out for that right now.
There is a proposal. I'm not sure of its status. There has been a proposal a while ago. There were two proposals, I think, and they're about to be, or have
been, merged. The two authors working on them were told to get together by the committee, I think.
think and i feel like that restriction is like you can't –
I've tried to work around it because you can have a local class.
That's not a problem.
But you still can't have a template function inside of a local class
inside of a function.
It'll be like, no, you're cheating.
You're not allowed to do that.
Right.
Yeah, and there are ways around that, as you say, with generic lambdas.
Yeah.
Which really are templates inside.
If you're interested in playing around with the JIT,
is there somewhere on GitHub you can go to test this out?
There is. Hal Finkel had a fork of Clang.
As I say, we're using version one of his paper,
and his paper has progressed since version one
and has changed in a few ways based on feedback from the committee and others, I think.
Actually, I'm not sure offhand where to go, but we can dig up a link and put it in the show notes.
Since you mentioned the paper again, how is the committee responding to this
paper? Do you know? I don't know.
I don't know how actively... Clearly, this is a massive feature,
and I don't think the committee has time
right now to review this feature, what with all
the others, probably. I don't know if it's something that we'll get eventually. You
know, I'm not optimistic about it being in the standard inside of 10 years.
Right, but yeah, after everything else is done, we'll get to that paper.
Yeah, maybe. Maybe. After Bjarne
gets operator dot, which is probably
at least nine years away at the
moment.
Okay. And then you also did
a talk at CppCon
about algorithms, right?
Yeah. I did a talk about
how to construct
generic algorithms.
And what are some of the guidelines you go over
about constructing your own generic algorithms?
Yeah, well, that was an interesting talk because...
So the first thing you get whenever you do a talk
that involves that sort of problem,
there's the tendency for people to...
You know, I got messages on social media after the talk saying, hey, you know, good talk,
did you know you could also solve the problem this way? People get caught up in how
you're solving the problem rather than the process of constructing the generics, if you like.
So yeah, the idea behind that talk was not really to say, this is how you solve this problem, but rather to say, here are some things: given that you have solved a problem in your code in a non-generic way, here's how you can alter that code. Here's how you can make it generic without losing speed and, in some cases, adding genericity, so making it useful for more use cases. In particular, things like looking at, you know, the obvious thing is,
you convert it to a template, and then it works with iterators and/or ranges, and you have to decide,
do I want to provide this for forward iterators or random access iterators or bidirectional
iterators, and what are the various performance trade-offs there, perhaps.
And then there are things like affine space types,
like std::chrono
has, where you look at
separating the types.
So
Adi Shavit and Björn Fahller had a great
talk about this.
The idea is that sometimes
you shouldn't
assume that a type is one thing.
For example, you've got a function that takes a couple of arguments.
And when you're writing it first, you think of those arguments as having the same type
because you just want to add one to the other or do something like that.
But affine space types are two types that work together to form an affine space. And
what that means is, basically, one of the types is a point type, meaning a position
in the space, and one of the types is a vector type, meaning a length or a direction or a duration.
So in chrono we have time_point and duration, and if you look in the standard, you can find that you can
add a duration to a time_point, and you can take away a duration from a time_point. If you subtract two
time_points, the result is a duration. If you add two time_points, well, you can't do that, because
it's meaningless to add points, right? Even though these things have the same representation under
the hood, even though they're just ints or floats when you get down to the registers.
The type system gives you this ability
to encode what's sensible for the types.
And so how that converts to generic algorithms is,
you know, I have a generic algorithm
that works on integers.
I like to see if it works on chrono durations,
for example, because that's quite often
a useful use case.
And if it works on those, does it work on time_points?
And, you know, sort of my go-to test case for making sure this affine space
stuff kind of shakes out correctly is time_point and duration.
Okay.
So you said, you started that by saying you tend to think of two types as the same, and
you're talking about like reducing the requirements, I guess,
on your algorithms?
Yes, yeah.
So, or making them sometimes more generic.
You know, another good example of this is when ranges,
we get ranges in 20, but before that in 17,
we got the sort of relaxation of begin and end iterator,
stuff for like range force.
So instead of being two iterators,
it's an iterator and a sentinel now okay and so we've more correctly more more more generically realized
what's required of that of that end iterator as iterator as was but now we call it a sentinel
and what's required is really that it's comparable to the begin iterator and the other position.
But that end iterator never gets dereferenced, obviously,
never gets incremented or anything like that.
And so now we've sort of more cleanly recognized that
and scoped the requirements on that type.
And therefore allowed things to be more generic.
The functional helpers, like plus and minus and less-than,
those things, pre-C++14,
I think it was, maybe it was 11,
they required both sides to be the same type,
but now they've got a generic void type
for the template instantiation that lets them be two different
types.
Oh, right, okay. Sounds like a similar kind of scenario.
Yeah, could well be.
I'm curious about something. As you're talking about algorithms and your interest in compile-time stuff,
if I recall correctly, back in the C++11 timeframe, you re-implemented the standard algorithms in
constexpr, C++11 constexpr.
Yeah, I did. I have a GitHub repo, which is now lying fallow, which
was experiments around that, yeah.
Did it all work eventually, is what I'm curious about?
Well, a qualified all.
All that I bothered to implement, and I wasn't rigorous in any
sense. But as you know, most
of the algorithms were made constexpr just by putting the keyword constexpr in front of them.
Right. And so in that sense, although, you know, that was C++14
constexpr, when it was
not just a single expression.
But yes, I mean,
back in C++11,
it was a lot of recursion,
a lot of use of the ternary operator,
a lot of simple
linear recursion on sequences.
But yeah, it basically worked,
I think. As I say,
I haven't really kept it up. The GitHub repo was an experiment.
It's out there, but it's lying fallow.
Just a random aside, I thought about it while you were talking about this.
Yeah, I had a lot of fun doing that.
And somewhat masochistically, perhaps,
I also did implementations of MD5 string hashing
at compile time in C++11.
And that was interesting.
Someone just shared with me a whole CRC library
that is 100% constexpr.
That's interesting.
Maybe something for us to talk about in a future episode, Rob.
Okay.
Well, Ben, it's been great having you on the show again today.
Is there anything else you wanted to talk about before we let you go?
Well, my company is hiring.
We're actively looking for good people all the time.
So I can put a plug in for that.
Quantlab.com slash careers.
So, and in fact, my team is hiring.
So, you know, if you'd like to work on a team using the most modern C++ that compilers support,
solving really interesting problems, then drop us a line.
And a team that cares about good code quality as well.
Yes. Yeah.
I have to say, actually, the team I work on right now is really one of the best teams I've worked on in my career. And it was a definite... you know, so when I worked in the games industry,
I was taking an interest in modern C++, and I was doing a lot in my spare time,
and I was starting to do conference talks.
And then making a career change, coming to the team I'm on now,
that also was a sort of sea change in the style of C++ that I was writing.
So I learned a lot.
Cool.
And I continue to.
Okay.
Well, it's been great having you on the show again today, Ben.
Thank you.
Thanks for having me.
Thanks for coming on.
Thanks so much for listening in as we chat about C++.
We'd love to hear what you think of the podcast.
Please let us know if we're discussing the stuff you're interested in,
or if you have a suggestion for a topic, we'd love to hear about that too.
You can email all your thoughts to feedback@cppcast.com. We'd also appreciate it if you can like CppCast on Facebook and follow CppCast on Twitter. You can also follow me
at Rob W. Irving and Jason at Lefticus on Twitter. We'd also like to thank all our patrons who help
support the show through Patreon. If you'd like to support us on Patreon, you can do so at patreon.com/cppcast. And of course, you can find all that info and the