CppCast - High Performance Computing
Episode Date: November 12, 2015

Rob and Jason are joined by Dmitri Nesteruk to talk about High Performance Computing and some of the new features coming to CLion and ReSharper for C++.

Dmitri Nesteruk is a developer, speaker, podcaster and a technical evangelist at JetBrains. His interests lie in software development and integration practices in the areas of computation, quantitative finance and algorithmic trading. His technological interests include C#, F# and C++ programming as well as high-performance computing using technologies such as CUDA. He has been a C# MVP since 2009.

News
Visual Studio 2015 Update 1 RC Available
Reverse Iteration with Range-Based for Loops
Interactively create clang-format configurations

Dmitri Nesteruk
@dnesteruk
Dmitri Nesteruk's Pluralsight courses

Links
Webinar Recording: A Tour of Modern C++
What's New in CLion 1.2
What's New in ReSharper C++
High Performance Computing in C++
Transcript
This episode of CppCast is sponsored by JetBrains, maker of excellent C++ developer tools including
CLion, ReSharper for C++, and AppCode.
Start your free evaluation today at jetbrains.com slash cppcast dash cpp.
Episode 34 of CppCast with guest Dmitri Nesteruk, recorded November 2nd, 2015.
In this episode, we discuss upcoming changes in Visual C++ 2015 Update 1.
Then we'll interview Dmitri Nesteruk from JetBrains.
Dmitri will talk to us about high-performance computing and some of the new features coming to CLion and ReSharper for C++. Welcome to episode 34 of CppCast, the only podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
All right, Rob, how about you?
Doing pretty good. This little kind of time change we're doing with the recording this week is a
little odd. Just so people know, we're recording this episode about two weeks before it's going to
air. So some of the articles might seem a bit older. That's why. So at the top of every episode,
I'd like to read a piece of feedback. This one, we got a lot of great feedback on Andrei
Alexandrescu's episode on D.
And this one I picked out was a Reddit comment
where someone listened to the episode
and was saying how they talked with Andrei during a break at CppCon,
and he was incredibly humble, down to earth.
And it was after his talk, and he said he can now get out of character.
And he was the only guy in the whole conference
who actually had an interest in what he was working on.
And Andrei goes and actually replies to him and says,
so my trick to pretend to listen did work.
So it's great to see Andrei's sense of humor.
It was really great having him on the show.
Right, Jason?
Yeah, so it was a good talk.
Yeah.
So we'd love to hear your thoughts about the show as well.
You can always email us at feedback at cppcast.com,
follow us on Twitter at twitter.com slash cppcast,
and like us on Facebook at facebook.com slash cppcast.
And you can always review us on iTunes as well.
Joining us today is Dmitri Nesteruk.
Dimitri is a developer, speaker, podcaster,
and a technical evangelist at JetBrains.
His interests lie in software development and integration practices in the area of computation,
quantitative finance, and algorithmic trading.
His technological interests include C-sharp, F-sharp, and C++ programming, as well as high
performance computing using technologies such as CUDA.
He has been a C-sharp MVP since 2009.
Dmitri, welcome to the show.
Thank you. Glad to be here.
So you're also a podcaster.
Well, yeah, I do a podcast, although my podcast is in Russian.
Oh, okay. So is it a C-Sharp or C++ focused podcast?
Well, it's actually called Solo on .NET, so it originally started as a podcast I was doing with
other people, related to the user group that I was taking care of back in St. Petersburg. And it
was .NET related, but from then on it kind of branched out into all sorts of
directions, because obviously my personal interests have diverged somewhat.
Also, recently, the last, I think, two or three episodes were
actually both podcasts as well as video recordings, because I got a new camera.
I thought I'd check out, you know, 4K recording and all the rest of it. So it's now dual mode, if you will.
Wow. You know, it would be a little embarrassing to find out if there had been a Russian language
C++ podcast this whole time
while we've been calling ourselves the only C++ podcast.
Well, yeah, a lot of it is C++.
Okay.
Okay, maybe we can still say we're the only podcast completely focused on C++,
although I guess that's not true if we've had episodes on D and Rust.
Oh, well.
Yeah.
So I want to go over a couple news items. This first one is Visual Studio 2015 Update 1 Release Candidate is available.
And again, we are going through a bit of a time change.
So maybe by the time this episode airs, the Update 1 release will actually be out.
Hopefully.
Well, there's an interesting note from Eric on that where he says,
well, I can't give a specific date, but the conference is coming up.
Connect 2015 is coming up on November 18th. So maybe there'll be some interesting news then.
Yeah. You know, thinking back to last year, I think they had a similar event. I actually got
to attend that one, and they released Visual Studio. I guess it may have been still in beta
at that point, but they made a bunch of releases on that day.
So hopefully update one will come out
during that event.
You know what's funny? I remember
releases of at least
2013, maybe even
2012, where Microsoft explicitly
promised that the updates to the C++
compiler would be out of band.
They basically said that from now on, we're going to
do everything kind of asynchronously
whenever it happens.
And it still hasn't happened.
So maybe we're finally seeing this sort of thing.
Maybe, but I mean, we are expecting Update 1.
I think everything's coming with that,
from what I can tell.
So some of the things coming with this update, though,
one of the big ones that we've talked about already
was the C++ build tools, where you can have a separate package of just the C++ compiler without any of the Visual Studio IDE.
I know a lot of people are looking forward to that.
And it looks like they're also making improvements with memory diagnostics, compiler improvements, improvements with the cross-platform development tools.
Jason, was there anything else you wanted to call out in this?
Well, just specifically constexpr and SFINAE support
are things that keep coming up over and over again.
And if you've been following any of the tweets from STL,
it looks like they're very, very close to having good support for those things now.
Yeah, it says partial expression SFINAE support.
So I'm not sure what that means exactly
for it to be partially done.
I don't know.
Sorry, wasn't it the case at CppCon
that they said they would have support for modules as well?
Yes, I think actually they said it currently in...
Oh, shoot.
I saw something about it.
They explicitly said Update 1 will have like a beta preview thing
with modules. Yeah, so in the comments they say the
support for modules is still going to be unofficial, and you're going to have to use whatever out-of-band,
unofficial documentation, blog postings, and whatever to get the information you need to use
it. But it will be in Update 1, I believe. That's good. I personally kind of get the feeling that that's what
many people are waiting for in terms of the build process. I mean, certainly having a separate
kind of Visual-Studio-free stack is great for continuous integration, but it still doesn't
cover the problem of why is my program so slow to compile. But I guess even the introduction of modules, I
imagine, would still leave the problem of, well, you have your STL and your Boost, and they're not, you
know, they're not going to jump to modules overnight. It's going to be a process. Right, Jason?
Speaking about modules, we really need to get Gaby on to talk about modules in more depth. I think
that'd be a great episode. Yes, we should try to do that.
Okay, so the next article is
reverse iteration with range-based for loops.
Jason, do you want to go over this one?
Yeah, it's just a quick comment from someone on Reddit
who's like, hey, how come we can't use
the range-based for syntax in C++11
to do a reverse iteration?
And I think it's an interesting point
that as useful as range-based for loops are,
they do have a lot of limitations.
But the basic answer is,
well, we pretty much have to wait
until Eric Niebler's ranges proposal gets fully accepted.
But someday it'll be here.
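A minimal sketch of the usual workaround in the meantime, assuming C++14 (the reverse_view and reversed names here are purely illustrative): a tiny wrapper whose begin() and end() forward to rbegin() and rend(), which is all a range-based for loop actually requires.

    #include <iostream>
    #include <iterator>
    #include <vector>

    // Expose a container's reverse iterators as begin()/end().
    template <typename Range>
    struct reverse_view {
        Range& range;
        auto begin() { return std::rbegin(range); }
        auto end() { return std::rend(range); }
    };

    template <typename Range>
    reverse_view<Range> reversed(Range& range) { return {range}; }

    int main() {
        std::vector<int> v{1, 2, 3, 4};
        for (int x : reversed(v))
            std::cout << x << ' '; // prints: 4 3 2 1
    }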
Yeah, I think an interesting question is
what would happen in such an arrangement
if you've got the yield keyword in?
Because yield kind of goes forward in time.
You get your next value and your next value and the next value.
So starting from the end, the paradigm no longer applies.
Because we're kind of encouraged in a way to use the for keyword together with yield.
But it doesn't make sense if you introduce some sort of for r keyword, which does it backwards,
because there is no final element, effectively.
So that's an interesting problem.
Yeah.
I've not yet watched any of the videos
on the continuations and yield work.
Well, I'm coming from the C-sharp world
where yield is kind of a standard thing.
And you have this idea of infinite collections
or generators which will effectively yield the values infinitely
however many times you ask.
Right.
So in the context of this, reverse iteration kind of doesn't make much sense.
Right.
That's an interesting point.
Okay, so this last article is really a project which is pretty interesting.
You can now interactively create
Clang format configurations. And basically they have this interactive web page where they have
a little code sample on the right. And you can switch between the default styles that clang-format
offers: LLVM, Google, Chromium, Mozilla, and WebKit. And then you can dive down into all the different clang-format options
and see it live in this little interactive code editor
on the right.
So if you're deciding what clang-format style you want to use,
it's really the best way I've seen
to test different options.
Yeah, I feel like if you have any interest at all
in libclang and the stuff that you can do with it,
you should just go and play with the formatter for a little while.
Because it's just kind of fun to play with, too.
Yeah.
Is there anything you want to add with this, Dmitri?
Oh, I don't know.
I mean, we're kind of the competitor in this space in the sense that all of our tools provide a certain formatting support.
An interesting story, actually.
I know you already had Anastasia here on the show a couple of episodes ago, but the formatting options in CLion, as opposed to ReSharper C++, were actually
inferred using genetic algorithms, which I think is pretty impressive, because we hardly see any sort
of AI-type tech being used for development specifically. But the case there was that you essentially took a
bunch of code which already conformed to a particular coding style, and you ran a genetic
algorithm, shifting all the options that you have in the IDE until your gene sequence, if you will,
matched the formatting that was originally in the code. So that's an interesting point. I think we
also have to do this in ReSharper at some point as well,
because I think it kind of nails perfectly the set of settings
that you need for a particular coding style.
Okay, do you mind if we dig into that for just a second?
Does that work with anyone's...
I mean, starting from my own code base, can it learn what my style is?
No, this was essentially something
that we did. We have this hackathon
thing once a year where everybody at the company
just takes the
weekend to do their own crazy thing.
And so this is essentially somebody's
hackathon project that was applied specifically
to samples taken from
the particular settings. I suppose
we can start talking about including it
in a kind of inferential
way in the sense that you feed it your own code and say, can you please infer all the stuff from
here. But from what I remember, this is not exactly a quick operation in the sense that we,
I think when the presentation was happening live in front of an audience, we actually had to
just look at the final results, because it wasn't quick, you basically have,
I don't know, you might have a couple of hundred settings
that are constantly being readjusted and re-evaluated.
And because all of CLion's infrastructure has to be effectively engaged
to build the parse tree and everything, it's not a cheap operation to do so.
I don't know if we can actually make it palatable to the end user
in the sense of not annoying them and having them
wait for, like, I don't know, 10-15 minutes while the code is being analyzed. And if I had to do
that exactly once on my project, I think I would probably put up with it personally. Although my
style is not so far out there that I can't generally adjust it with whatever settings are available.
Well, yeah, we can sort of discuss it and make a feature
request. That would certainly
get somebody's attention. But from what I understand,
we generally try not to skew
the user experience
in terms of long waits, because even
if you do explain clearly that there is
a long wait coming
at the end of this, people are still going to be
annoyed about it.
That's pretty neat, though.
Yeah, I think so.
So, we'll dive
a little bit deeper into
what you've been working on with JetBrains. It's actually been
29 episodes since we had your
colleague Anastasia on.
What has the JetBrains team
been busy with since then?
Well, we've been doing pretty much the same
stuff. In fact, today, we've had
the world release.
We released just about everything, every product line that we do in a single day, which is kind of a bit crazy.
But we've still been working on CLion, obviously, improving the cross-platform story and improving the different features there. In addition, we have been working on
ReSharper C++, which is something that I guess didn't get covered so much in your previous
podcast. That's essentially the ReSharper. We have a product, let's start from the beginning. We have
a product called ReSharper, which is over 10 years old, and it supports primarily.NET development,
though in recent times, it's branched into all sorts of things. It's done
web languages like HTML, CSS, JavaScript. It's also done all sorts of specific formats, like,
for example, in the latest version, we support things like the Google Protocol Buffers format
or the JSX format from Facebook. So sort of format-specific kind of tech. And in recent years, we've added support for C++ there as well.
So the story right now is that
if you are into cross-platform development
or if you want an IDE,
which is a standalone IDE,
then CLion is the way to go.
However, if you're still in the Visual Studio mindset,
you work with a Microsoft compiler,
then ReSharper C++.
So essentially, we've made a kind
of umbrella product, if you will, called ReSharper Ultimate, and that's something that includes
everything from ReSharper, supporting .NET and whatnot, as well as the support for C++, and also
our tools for things like memory profiling and performance profiling and that sort of thing.
And just to add to this a little bit, you know, working a bit in the C-Sharp community
myself, ReSharper has been like, you know, a must have for Visual Studio developers for
a long time.
So do you think that's going to become the same case with ReSharper C++ for Visual Studio
developers?
I think if you're on Visual Studio, then, well, I'm certainly hoping that it does.
And I think that as we, I mean, it's a fairly
young product, but as we provide more and more value, I think people are going to sort of see
that it's really, you know, it adds so much in terms of, you know, making your life easier, then
it's silly not to use it. Certainly, that's what we've been experiencing with ReSharper. And I think
the challenges are, in most cases, quite similar.
Although I have to admit that, of course, supporting C++ is a bit harder
because you effectively have to implement your own preprocessor
and you effectively have to, I mean, to analyze a particular translation unit correctly,
you have to build it yourself.
So even if you have a tiny little hello world,
which includes some boost header somewhere,
you may end up with 300 megabytes worth of text that ReSharper would have to process.
So the story is a lot harder than it is with .NET.
I'm hoping it will improve with modules and whatnot.
That's why I'm sort of enthusiastic about them.
But at the moment, it's quite a challenge. So one thing worth mentioning is with Visual Studio 2015, Microsoft actually delivered some of their own built-in refactoring support, which I think has been a big request for years, and they finally delivered on it.
What does ReSharper bring to enhance that?
Well, if you look at the feature set just like for like, then we generally provide so much that I could take up a podcast or two just talking about
the features. And, I mean, even if you restrict it just to the features which have
their identical kind of counterparts in Visual Studio, then what we pride ourselves on
is the correctness in terms of how they work. I mean, doing an ordinary kind of basic refactoring is
something that everybody can more or less manage.
But doing a refactoring where your variable is kind of dragged through a lambda or
used in some bizarre setting, or it's actually part of a macro where, you know, if you do the
refactoring, you actually break what this macro is doing. These sorts of things are the things that
we provide diagnostics for. And quite often in
ReSharper, what you're going to see is you're going to see a window pop up and that window will say,
oh, by the way, we found a couple of conflicts where this is simply not going to work because
of the way you're using it. And people, I think, are sometimes not ready for that because they
assume that if you're doing a rename, a rename will work consistently across the board, whereas this only works if you're just using plain C++ without macros, without any kind of template
magic and whatever. So I would say even on like-for-like features, we pride ourselves on the
thoroughness and the correctness. And of course, we put a lot of things on top of that, a lot of
analyses, a lot of interesting kind of, you know, notions to help
people to make their lives better. So, I mean, you just consider a simple printf statement where you
make a mistake in the format specifier. That's not something that Visual Studio will necessarily
pick up, but this is something that we can do. And of course, the fact that we implemented our own preprocessor implies that one of the
things we can do is we can expand the actual preprocessing macros to like a depth of one,
for example, or an infinite depth.
So what this means is, let's say you're writing Google tests or something, which is just,
you know, macros basically being used to actually manufacture these massive tests.
If you want to take a peek at what's going on with ReSharper, you can just navigate on
top of your test, press Alt-Enter and expand the entire macro.
So you get to see the final code.
It's, you know, readable.
Finally, you see what's wrong and then you can undo and go back.
And that way you can sort of get a feel for what's actually going on in macro, what's
going wrong there. Because otherwise, the diagnostic information is effectively lost. You get an
error at compile time, and you're unable to determine what actually happened.
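A tiny illustration of why seeing the expansion helps (an invented example, not one from the show): the bug is invisible at the call site and obvious once the macro is expanded.

    #include <cstdio>

    // Classic unparenthesized-macro pitfall.
    #define SQUARE(x) x * x

    int main() {
        // Expands to 1 + 2 * 1 + 2, which is 5, not the expected 9.
        std::printf("%d\n", SQUARE(1 + 2));
    }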
Well, there have been some times in the past when I was using the Boost preprocessor library, before
variadic templates were available. And I could see being able to expand those preprocessor macros would be huge.
Does it work like that? Well, I mean, it doesn't really matter how complex the macros are.
It doesn't, because unlike certain other products (I won't say which), we actually took this whole
thing seriously, because one approach to macros would be to simply ignore them. And that's what
some people do. They just look at the C++ code,
and if they see a macro, they assume it to be redundant tokens that we have nothing to do with.
Whereas with ReSharper, we first of all, internally expand all the macros, and then we perform the
analysis on top of what's expanded, which is interesting. So if you use the macro incorrectly,
you might actually get a highlighting on top of the macro, which will tell you an error which relates to what would be expanded. So it gives you an insight into the
final kind of output. So you don't have to wait until compilation time to find out something's
wrong. ReSharper will just tell you right there. I'm curious, like how much time did you guys
spend developing the ability to be able to handle this much C++ parsing before you're
able to actually release it to the public?
It's actually difficult to say because I think Anastasia already went over the history.
Essentially, originally, we were going through the phase where we were adding C++ support
to Objective-C programs because they can use a kind of C++, a really somewhat bizarre version of
C++ with some of the things removed, like destructors, for example.
Very strange.
But we knew we had to do it at least to some extent.
And then from then on, we kind of branched out into these two products.
So it took several years and it took quite a few people to get where we are.
And we're still improving.
It's still an ongoing process. I would say that the ReSharper C++ team, if you include the testers,
it's about 10 people, maybe less than 10. And yeah, it did take a couple of years. But,
you know, when you go to market, well, you don't want any
false positives. And certainly one of the kind of challenges
is that we still have false positives.
I recently put out an article on why that is.
Essentially, the Microsoft compiler
does some things incorrectly with respect to C++.
And as a result, we have to take sides.
Do we side with the Microsoft compiler
or do we side with correctness?
Because we can imagine situations where people would work in Visual Studio
and then try to cross-compile on Linux and whatnot.
In this case, it's a problem.
And that's why if you go into some of the usages of Boost,
like using AccumulatorSet, for example,
then yeah, we're still going to have a few nitpicks here and there.
And these, unfortunately, I mean, in my post,
I just explained this is Microsoft's problem.
And we've been in communication with Microsoft about it.
We told them, you know, and you will see there was Herb Sutter's comment
in that original blog post saying that they are working
on fixing those issues so that hopefully some years down the line we will have a kind of perfect consistency.
But we did get the product to a state where the vast majority of things are parsed and resolved correctly.
And that includes handling all those crazy cases.
I'm sure you've seen the C++ WAT talk from the lightning talks at CppCon.
We do handle just about everything, including the really bizarre.
Cool.
So besides refactoring, it looks like ReSharper C++ is also going to do some static analysis
in the IDE.
Is that correct?
Yes.
Yes, indeed.
Well, essentially, as soon as you open up a solution, ReSharper starts continuously analyzing
everything that you edit.
And it kind of does this in real time effectively.
So as you type your code, if you make a mistake or if you write something which isn't a mistake
but can be improved, then ReSharper is going to underline it with a wavy underline.
And you will have a pop-up where you can, for example, fix an error or you can somehow
improve the code.
And there is also a marker bar on the right-hand side of the editor
that shows you throughout the whole file where particular issues are,
from subtle hints to actual warnings and errors.
Cool. Okay.
So changing gears a little bit,
I see that you recently finished up a Pluralsight course
on high-performance computing in C++.
I was first just wondering what your background was with high-performance computing.
Well, a couple years ago, I got into quant finance.
I'm kind of self-taught, but quant finance is all about doing math and doing all sorts
of simulations and mathematical modeling.
And the thing about this is it's one of those areas where, you know, if you're doing
any kind of random simulations, the more computing power you have, the better, essentially. So I had
to get into computation just by virtue of wanting my stuff to run faster than it runs in MATLAB. So
QuantFinance is actually very much a C++ oriented kind of business. It always was and I imagine it will be for a very
long time. So it's the language that they teach if you go and you actually do like a master's in
financial engineering. So in this setup, after I found a very large machine cluster where I could
actually do my research, I realized that I didn't have the necessary skills to actually, you know, leverage all this power. And so I went, I kind of systematically picked out the topics that
I wanted to master in order to be able to basically leverage all this wealth of computing
power. So that's, the course was born from that. Okay, could you give us an overview of some of the,
you know, instructions you go over in the course? It looks like the first chapter covers SIMD? Well, I mean, you have your algorithm, and you want to give your algorithm more entities to compute on. And by entity, we can mean different things. So
at the simplest level, at the most basic level, we have this idea of single instruction, multiple
data. So essentially, in addition to the ordinary registers, you have very large registers on the
CPU, and you can stick several values into them, like instead of sticking one value, you can stick four values in a register, and then you can stick another four values in
another register. And when you perform the add operation, it's not going to be an ordinary sort
of x86 add, it might be an add PS. So it would add these four values and these four values and
give you kind of four additions at the same time. So you can see the performance improvement in this regard.
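As a concrete sketch of that four-at-a-time addition (illustrative only; the course may use different examples), here it is with SSE intrinsics, where _mm_add_ps maps to the addps instruction Dmitri mentions:

    #include <xmmintrin.h> // SSE intrinsics

    // Add four pairs of floats with a single SIMD instruction.
    void add4(const float* a, const float* b, float* out) {
        __m128 va = _mm_loadu_ps(a); // load four floats into a 128-bit register
        __m128 vb = _mm_loadu_ps(b); // load another four
        _mm_storeu_ps(out, _mm_add_ps(va, vb)); // one addps: four additions at once
    }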
And this has been going on for a very long time.
I think when I was a child and processors were like 166 megahertz, not gigahertz.
I think we started out back then with technologies like MMX.
They were primarily targeting multimedia back then.
But right now, you know, it's open to anybody,
including, you know, codecs
and certainly scientific computing to leverage this.
Unfortunately, this is something
that you can only really do in C++.
It's not available in .NET just yet.
And I don't think it's available on the JVM just yet.
There are efforts in both cases
to bring it to those platforms,
but, you know, it might take a few more years
to get the JIT working
because in the managed world,
people just assume that your JIT compiler
will do everything for you.
And unfortunately, it doesn't.
It really, really doesn't,
even in the simplest cases.
And nobody is really complaining.
I mean, if you go out on the market
and you actually buy yourself a math library,
let's say you buy yourself a popular .NET math library, that's just a wrapper around a C++ library.
It's a C++ library with all the optimizations, and then you get a wrapper on top of it for.NET or Java or whatever.
So, SIMD is the first step.
And certainly, the compiler tries to help you in this regard.
So unlike the managed compilers,
the C++ compilers like the Intel compiler,
they're actually very smart.
So they try to, you know,
they have vectorization built in.
So if they see something that's obviously vectorizable, they would go and rewrite it
in terms of these large registers.
And also you can give the compiler hints
in terms of pragmas and whatnot
to make it even more efficient.
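A sketch of what such a hint can look like, using OpenMP 4's simd directive (other compilers have their own spellings, such as Intel's #pragma ivdep; treat the exact choice here as illustrative):

    // The pragma asserts the loop is safe to vectorize; __restrict additionally
    // promises the compiler that the arrays don't alias.
    void scale(float* __restrict dst, const float* __restrict src, float k, int n) {
        #pragma omp simd
        for (int i = 0; i < n; ++i)
            dst[i] = src[i] * k;
    }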
So I'm sorry, what compilers did you say actually support that?
Well, I think pretty much every popular C++ compiler
supports vectorization to some extent.
Some do it better than others.
I was recently surprised that GCC apparently supports
the vectorization on Intel
CPUs, which are not even out yet. So it would actually use instructions that no CPU can use,
which is fantastic in terms of, I guess, some sort of future-proofing. But the problem with
SIMD, you have to realize, is it makes your code not really portable, because it's not part of the sort of baseline x86.
It's something that's been evolving in time.
So if I have my SIMD code using the newest AVX extensions
and I run it on an older machine,
well, the program will just go boom.
It will say, I'm sorry,
but these instructions are not supported.
That's it.
So it's not, in terms of portability, it's not great.
But if you have your machine cluster
and you know exactly what versions of what sort of CPUs you're using,
you can build for specifically this instruction set,
and it will be just fine.
So you cover in your course how best to write your C++ code
so that the compiler can leverage SIMD and that kind of thing.
Well, actually, to start with, I show inline assembly.
I know it's a cardinal sin to do this kind of stuff,
but I start with inline assembly and then move on to mnemonics,
which are little wrappers around assembly language
for doing those little operations like initialize
or large register with four values.
There would be a tiny little kind of C-like function
which would do that for you.
And then, of course, yes, the next stage is really just getting the compiler
and giving the compiler hints on how to do it properly.
Okay.
So then where does this next chapter come in on open multiprocessing?
All right.
So you're leveraging your instructions to the best of your ability,
and the next kind of entity, the next level of scale is multi-core.
And of course, we kind of rely on this idea
that if you spin up multiple threads,
then the operating system will actually
put them on different cores
and it will all run concurrently
and therefore improve your program.
And here, there are actually two approaches.
There is an imperative approach,
which is kind of straightforward.
You make your threads,
and you sort of make your own thread pool maybe, even use some library functions. But
there is also a declarative approach. So I thought that since somebody else on Pluralsight is doing
a course on Intel Threading Building Blocks, which is the imperative library for multi-threading,
I decided to go with the declarative route. And the idea of declarative was once again,
that instead of doing the parallelization manually, you actually give the compiler certain
instructions on how to do that. So once again, you do it with pragmas. These are very smart
pragmas; you can stick a lot of stuff into them. They essentially say, oh,
by the way, here is a loop, and I know that you, the compiler, can parallelize this loop. And here
are some hints as to which variables you can capture and where, which ones are private or shared and whatnot.
And the end result is OpenMP, which is a compiler-plus-library solution. So the compiler
has to support these pragmas. But in addition, you have some library function calls for actually
deploying the stuff using the thread pool and whatever.
So this compiler plus library solution is what actually turns your code parallel.
And the interesting thing about modern optimizing compilers is that even if you're not using
OpenMP and you turn on the parallelized flag in your compiler, what will happen is your
compiler will still use OpenMP behind the scenes.
You can look at the disassembly and see that. So in both cases, and OpenMP is a very old technology. It's a very
mature technology. It looks weird because, I mean, people look at all these pragmas and they think,
haven't we gotten away from it already? But in actual fact, it just works and people are
sufficiently happy with it. So you have a choice in a way. You can go with the imperative route
and just, you know, write your functors and
write a parallel_for, which takes a lambda plus an iteration variable, and that will work just fine,
or you can use it this way. And OpenMP is less intrusive because you can take existing code and
you don't have to rewrite it to any great degree. You just put the pragmas. If the compiler doesn't
know those pragmas, if it doesn't know OpenMP, you just get serial execution. So that's it.
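A minimal sketch of that markup on a hypothetical data-parallel loop: the pragma is the only change to the serial code. Build with OpenMP enabled (-fopenmp on GCC/Clang, /openmp on MSVC) and the iterations are spread across cores; without OpenMP support the pragma is ignored and the loop runs serially, exactly as described.

    // Each iteration is independent, so OpenMP may split the range across threads.
    void saxpy(float a, const float* x, float* y, int n) {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }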
So do most compilers support the OpenMP pragmas?
Yeah, they totally support it.
OpenMP is now at version 4.
There is a lot of stuff you can do in there, not just parallelizing loops.
For example, you can just slice up your code into different tasks, and those would run in parallel.
And this is, once again, a very unobtrusive operation.
So you have a chunk of code that you've written already. And you say,
oh, by the way, these things are independent. So why don't I slice it into three little tasks,
and then they can complete at the same time, and I'll just wait for them. Very convenient.
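A sketch of that task slicing, with three hypothetical functions standing in for independent chunks of existing code:

    void load_configuration(); // hypothetical, mutually independent work
    void warm_up_caches();
    void connect_to_feed();

    void run_startup_work() {
        #pragma omp parallel
        #pragma omp single
        {
            #pragma omp task
            load_configuration();
            #pragma omp task
            warm_up_caches();
            #pragma omp task
            connect_to_feed();
            #pragma omp taskwait // wait for all three tasks to complete
        }
    }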
Yeah, I don't know if I should admit this on the air or not. But I'm pretty sure the only time I've
actually seen OpenMP in use was in Fortran code. So I might have to take another look at that
for the C++ support.
Yeah, C++ and Fortran are kind of,
they go kind of hand in hand for scientific computing.
Although I would say that Fortran is mainly a remnant
which is used by astronomers, surprisingly enough.
So if you're into astronomy or I guess cosmology,
then you might see some Fortran.
For the most part, I think people use C++ these days.
Right.
Yeah, I'll definitely have to look into that a bit myself.
The next article or the next chapter you have
is on message passing interface.
How does that factor in?
All right, so that's the highest level essentially.
So you've got your machine cluster
of however many thousand cores you have
and you want to run your calculations, your Monte Carlo simulations, whatever on them.
It turns out that the API, the most common API that people use for actually spreading out the data and then sort of collecting the end results is called MPI.
That's short for message passing interface. It's been around for years, and it doesn't look like anybody is replacing it anytime soon, because it works pretty well. Although personally, the variety of MPI that
I use is called Boost MPI. That's essentially a wrapper around the core MPI. So when you build
Boost by default, it doesn't get built because it's kind of, how should I put it? It's library
specific. There are different implementations of MPI: for example, the Microsoft MPI, MPICH (the MPI Chameleon), the Intel MPI. So I build it
for my variety of MPI and I get these nice boost wrappers. So it becomes, because originally it's
kind of like C style functions, it looks pretty ugly and it has pointers all over the place.
And then you plug in Boost and suddenly you get things like Boost serialization.
So you can just send an object across the wire as a single parameter.
It just works.
I mean, it's beautiful, and the friction is minimal.
So I love Boost MPI.
It's very useful.
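A sketch of that low-friction style (illustrative; a real run would be launched under mpirun with at least two processes):

    #include <boost/mpi.hpp>
    #include <boost/serialization/string.hpp>
    #include <iostream>
    #include <string>

    namespace mpi = boost::mpi;

    int main(int argc, char* argv[]) {
        mpi::environment env(argc, argv);
        mpi::communicator world;

        if (world.rank() == 0) {
            // Any serializable object goes across the wire as a single parameter.
            std::string msg = "hello from rank 0";
            world.send(1, 0, msg); // destination rank 1, tag 0
        } else if (world.rank() == 1) {
            std::string msg;
            world.recv(0, 0, msg); // from rank 0, tag 0
            std::cout << msg << '\n';
        }
    }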
I wanted to interrupt this discussion for just a moment to bring you a word from our
sponsor, JetBrains.
ReSharper C++ makes Visual Studio a much better IDE for
C++ developers. It provides on-the-fly code analysis, quick fixes, powerful search and
navigation, smart code completion, automated refactorings, a wide variety of code generation
options, and a host of other features to help increase your everyday productivity.
Code refactoring for C++ helps change your code safely, while context actions
let you switch between alternative syntax constructs and serve as shortcuts to code
generation actions. With ReSharper C++, you can instantly jump to any file, type, or type member
in your solution. You can search for usages of any code and get a clear view of all found usages with grouping and preview options. Visit jb.gg slash cppcast dash rcpp
to learn more and download your free 30-day evaluation. And use the following coupon code
to get a 20% discount for the ReSharper C++ personal license. CppCast, JetBrains CPP tools.
And then the last chapter you have there is C++ Accelerated Massive Parallelism.
Now, is this where you're using the GPU to parallelize?
Yes, indeed.
And first of all, I have to mention that in addition to this,
I also have another course on CUDA up on Pluralsight.
And the GPU story is actually a very interesting story.
So maybe I can elaborate a bit more, especially for those of your listeners who are not so aware of the situation.
So originally, years ago, we had a situation where the graphics output went to the monitor right from the motherboard.
And nobody found that in any way surprising.
So at some point, people realized that, you know, there are two ways this can go.
We can improve the motherboard or we can start selling a separate piece of hardware,
which would go into the motherboard on some sort of bus and then take care of the rendering.
And that's the rise of the GPU.
And it's an interesting, it's a unique story in a way, because if you think about, for example, sound cards,
then not many of us have discrete sound cards these days.
Most people just use what's on the motherboard.
But for GPU, what happened essentially is the GPU industry got a lot of money flowing in all the time
because people would buy more modern games and they would require better and better hardware.
I think it's slowed down a little bit now, but it used to be the case of kind of an arms race.
So as a consequence,
a lot of money got funneled into development of GPUs. And at some point, we got this idea of
programmable shaders. Now, a shader is just a piece of microcode that you can, it's kind of like
hardware programming in a way. You program your GPU to, for example, take a single pixel and color it in a specific way to perform a particular effect, in a kind of C-like or assembly-language-like
fashion. So that came in and it was very popular because, I mean, the GPU is a massively
parallel kind of construct. It runs lots of threads at the same time. So it's specifically
designed for processing like large arrays of pixels or large arrays of vertices. So people could write these little microcodes to actually
like do all sorts of wonderful fantastic effects. And they still do, by the way. However, at some
point, the sort of scientific community and the computation community realized that this
capability can be tricked. You can actually take your, like, let's say you want to multiply two matrices.
You turn those matrices into textures.
You feed those textures to the GPU.
You perform the multiplication using microcode,
and then you get the data back from the resulting texture.
And people realized that this was so fast that it was worth actually doing.
And back then, the technology wasn't really that advanced.
It didn't support, you know, double precision variables, for example.
It was really kind of coming from the ground up in this regard.
But what happened, and this was great, is that manufacturers themselves realized that this was worth it.
This was worth doing.
So this explains why now you can buy GPUs like the Tesla GPU from NVIDIA, for example, which doesn't even have video output.
I mean, it's a graphics card that doesn't output any video because it's sold specifically for
computation. So we're in a way lucky that this kind of evolutionary approach led us to a situation
where for massively parallel tasks, for data parallel tasks, like when you have, let's say,
you have a huge array of financial data, and you want
to perform the same operation on the same kind of groups within that data, for example, the GPU is
a fantastic tool, and it saves a lot of time. So on the market, we at the moment, we have two players,
we have AMD, formerly ATI, they bought ATI, and we have NVIDIA. NVIDIA is the one that's
commercially successful, and it provides fantastic tools, and it
introduced this idea of CUDA,
which is, well, a slight
small extension to the C language,
which support the programming model
for their specific graphics
cards. However, the problem is, if you
use CUDA, it doesn't support ATI
or AMD, as it
is now. It doesn't support those graphics cards.
And let me just remind you that when the Bitcoin,
when Bitcoins were mined on GPUs,
they were mined on ATI GPUs and not on NVIDIA GPUs
because one particular instruction there
was twice as fast as on NVIDIA.
And that kind of settled the field back then.
Of course, right now we're past that.
You need ASICs to mine GPUs, mine Bitcoin rather.
But the end result is that NVIDIA's tool set is fantastic.
I did a course on it.
However, it only targets one side of the equation.
If you want to use AMD GPUs, then, well, you would have to use OpenCL.
And OpenCL is great.
OpenCL is an open standard, which aims not to support just GPUs, but also things like
the Intel Xeon Phi, and even Altera is now
experimenting with OpenCL for FPGA development. So it's a great technology. However, if you're
targeting just GPUs and nothing else, it gets really verbose. You have to write a lot of code
for the simplest of things because it tries to be so general. So what Microsoft did is they said, hey, how about we try to get our compiler to
produce uniform code or to produce code which will actually, you know, you write it once and
it works both on NVIDIA and on the AMD device as well. And maybe at some point it will also work on,
you know, other classes of devices. And they did exactly that. So they came out with C++ AMP,
which is essentially a technology.
It's just two extensions to the C++ compiler that Microsoft does, which enable them to figure out
that a certain chunk of code is intended for the GPU. So essentially, what does it mean that the
chunk of code, the chunk of C++ code is intended for the GPU? This means that you cannot just call
any arbitrary function. You cannot pass a string into it. You can only do what's possible on the GPU because GPU is not an x86 device. So you cannot just
run arbitrary code on it. And essentially, they compile things in a kind of uniform fashion.
And it's great. It's a great approach. And it would be even more great if the device manufacturers
kind of jumped in and took a more active participation.
So Microsoft provides what they call a reference implementation.
So they kind of, they show you how to do it
and what you have to support.
Unfortunately, I haven't seen any other compiler support C++ AMP,
but what Microsoft is doing is already pretty interesting.
And this part of the HPC course,
I'm actually showing how you can get started with it
and get some work done, basically.
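For flavor, a sketch of what a C++ AMP kernel looks like with the public API (Visual C++ only; the element-wise add is an invented example): the restrict(amp) clause is what marks the lambda as GPU-eligible code.

    #include <amp.h>
    #include <vector>
    using namespace concurrency;

    // Element-wise vector add on whatever accelerator the runtime picks.
    void add(const std::vector<float>& a, const std::vector<float>& b,
             std::vector<float>& c) {
        const int n = static_cast<int>(c.size());
        array_view<const float, 1> av(n, a), bv(n, b);
        array_view<float, 1> cv(n, c);
        cv.discard_data(); // c is write-only, so skip the copy to the device

        parallel_for_each(cv.extent, [=](index<1> i) restrict(amp) {
            cv[i] = av[i] + bv[i]; // only GPU-legal operations allowed here
        });
        cv.synchronize(); // copy results back to the host vector
    }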
So do you have any hope for that, that other compilers might support it,
or there might be other initiatives that are as flexible as the Microsoft solution?
I don't know. I think we're still in limbo in terms of some sort of convergence language or
convergence technology, because we have to mention other classes of devices. We have to mention the
Intel Xeon Phi, which is essentially your 60-core
coprocessor solution that you can also stick into the PCI slot.
And it's something, I mean, we see some innovation from Intel.
Again, Altera, which I guess is now also Intel since they bought them, Altera is experimenting
with using OpenCL for the FPGA side of things.
So I think in the long run, we might see a convergent technology.
It's certainly still a C- or C++-like language.
So essentially, at some point, it might be folded into C++ proper
and just become part of a compiler.
I don't know how exactly that will happen,
because obviously, you know, in the modern world,
whenever Microsoft comes up with a standard,
people are a bit apprehensive.
They think that, you know, you've built this for your own platform. Because, I mean, the C++ AMP technology is currently reliant
upon DirectX, which is, I mean, great for Windows, not so great if you want to have widespread
availability, because DirectX is Windows-only. So these are, you know, some of the hurdles that have
to be overcome somehow. I don't think anybody has the sort of silver bullet
or the magic solution that will work on every class of device,
but at least we're seeing new things.
And so I try to basically cover two out of three of these technologies on Pluralsight.
Okay.
So you came at all this with an interest in quantitative finance.
If you're like an application or library developer
and you're interested in paralyzing your code,
where do you think would be a good place to start?
Well, it depends on whether you have written code already
or whether you're writing something from scratch
because I think you can sort of, if you're starting from the ground up,
you have to ask yourself whether you're writing the library
just kind of for your internal use or for external use.
Because if you're like the authors of Eigen, for example, which is a very popular matrix manipulation library, then your best goal, and that's what the Eigen people are doing, your best goal is to basically target every kind of CPU support,
every SIMD level that there is.
And it's a huge task, effectively, to make a library
which performs its best under all CPUs.
But unfortunately, you would have to have this kind of stratified testing.
This problem, by the way, it's not just a CPU problem.
The same happens on GPUs because GPUs get more and more modern.
And so if you want to leverage all the best stuff,
then it becomes a case of testing on all sorts of different devices.
And that's why quite often you see programs like, let's say,
MATLAB or Mathematica, they only support a cross-section of GPUs.
They're honest.
They're saying we're only supporting the latest
because it is extremely difficult to write code which is portable across, you know, a whole history of GPUs across time. Right. And what
if you're working with an existing code base and you're trying to increase performance through
parallelization? Well, guided auto-parallelism is an interesting idea, and that's something that
Intel is doing. So essentially what they do is they kind of run your code, and they give you hints on possible locations where you might actually be able to parallelize
things. Because, I mean, certainly, if you're doing, let's say, some data parallel stuff,
and you have for loops all over the place, that's easy to detect. Unfortunately, in the real world,
if you're working with, let's say, a tree-based structure, for example, it becomes a lot less
obvious. And in this case, you would have to use your brain much more to figure out where the parallelism lies.
But once you figure that out, then you have plenty of choices for how you want to actually parallelize things.
And certainly we're finally having thread support as part of C++ proper.
But in addition, there are so many libraries out there.
I would say that the pair of libraries, the Intel Threading Building Blocks and the Microsoft Parallel Patterns Library, their interfaces are almost identical. The difference is that Intel's implementation gives you a bit more in terms of data structures and whatnot. At the very basic level, you can try OpenMP, just marking up code and seeing how that goes. At the more
advanced level, where, for example, you're using, let's say, a vector, you might have to turn that
into some sort of concurrent, you know, hash map or something in order to get the parallelism,
to get the correctness, effectively. Because remember, STL constructs are not thread safe
by default. So this is where, you know, if you start writing to a structure from multiple threads,
you might get all sorts of problems. So that requires analysis, and that requires time, unfortunately.
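A minimal sketch of that hazard and the simplest fix, in plain C++11 (a concurrent container such as TBB's tbb::concurrent_vector would be the lock-free alternative):

    #include <mutex>
    #include <vector>

    std::vector<int> results; // STL containers are not thread safe by default
    std::mutex results_mutex;

    // Unsynchronized results.push_back() from several threads is a data race;
    // guarding the container with a lock is the simplest correct fix.
    void record(int value) {
        std::lock_guard<std::mutex> lock(results_mutex);
        results.push_back(value);
    }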
So if you were to throw all these solutions, as much as you could, at your
problems, SIMD and C++ AMP and whatever, what kind of crazy performance improvements have
you seen? A hundredfold is not unheard of. Essentially, the issue here is that some algorithms
are what we call embarrassingly parallel, in the sense that if you're not parallelizing them, you
should be embarrassed with yourself, because it's really so obvious. Like, for example, a typical
matrix multiplication is something that you can speed up quite a bit. However, it's not really
so straightforward. For example, there is still,
I would say, a fairly significant difference between single precision and double precision.
And that's something that you would particularly feel on the GPU. So once again, you might jump
from a CPU implementation to a GPU and not see such a massive increase because double precision
on the GPU, depending on the class of the device, can still be not as great.
But in terms of... It's very difficult to put a number.
I would say that if you are just parallelizing multi-core and your algorithm is embarrassingly
parallel, you should see a linear increase, more or less.
However, if you get some sort of entanglement, or at some point you have a barrier, or you're waiting on something, or I don't know,
then that effectively reduces
that part to a
single core. So it's difficult to
kind of put a number there,
but at least, you know, on the CPU you can expect
a linear increase, more or less,
if you have a
fairly straightforward data-parallel
solution. On the GPU,
things are, well, you still get fairly significant increases,
but it's very difficult to predict what they are
because, remember, the performance of parallelization
is dependent upon a rather large set of parameters
related to the way you invoke the GPU's actual infrastructure.
Because when you fire up a GPU kernel,
you give it some parameters for how you want to partition your data.
And interestingly enough, this partitioning has all sorts of weird effects.
In fact, NVIDIA, they actually ship you an Excel spreadsheet
where you can put in the size of these partitions,
the size of your data,
and they give you a
graph where you can spot the sort of optimum point where you get maximum performance.
And it's not a straightforward graph.
It's not like a line where if you split it in this way, you get a linear increase.
It's a very jagged graph, which I guess unless you're an engineer, a GPU engineer, you're
going to have a really hard time figuring out what the dependencies are.
So GPU, I would say, is fairly hard to predict. Then again, if we're talking about things like the Xeon Phi, you know, just coprocessors where you've got these 60 cores, and they're
slow cores, they're like Pentium 4 class cores, but there are 60 of them, and they're sitting
there just doing nothing kind of like a computer within your computer, then once again, because this device is effectively x86, despite the fact that you
have to recompile, this device should also give you a straightforward increase because, I mean,
it supports things like OpenMP and pthreads and whatever you throw at it. It even supports MPI,
though, unfortunately, I have to admit, I didn't get it to work in a kind of uniform MPI setting.
The problem with this is that this device, even though it's a PCI card, it runs its own brand of Linux on one of the cores.
It's effectively a computer that just feeds off your PCI line.
So you can use it as a separate machine, which doesn't have much to do with your main machine.
You can just peek at the data from time to time with SSH or something. Or you can do the offload mode, where you have your
program running on your ordinary machine, but some of your loops, for example, have these little
pragmas. So we have pragmas again. And these pragmas say, if possible, can you please
offload this computation to the Xeon Phi. So that's an interesting model.
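A sketch of that offload markup, based on Intel's compiler-specific offload pragma (the coprocessor target was named mic; treat the exact clause syntax as an assumption):

    // If a Xeon Phi is present, the marked block runs there; otherwise it falls
    // back to the host. inout() describes what data to ship each way.
    void scale_on_phi(float* data, int n, float k) {
        #pragma offload target(mic) inout(data : length(n))
        {
            #pragma omp parallel for
            for (int i = 0; i < n; ++i)
                data[i] *= k;
        }
    }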
But once again, it's very difficult to say what the performance increase will be, because, I mean, it's a bit of
a black box; you have to use the Intel compiler, and they do some magic behind the scenes. But
certainly, it's of benefit in terms of computation. And then again, keep in mind that you can plug in
more than one device into your machine. You can plug in two GPUs
or two of these Phis. After two devices it gets a bit non-linear in terms of performance, especially
if you're saturating PCI bandwidth, and it doesn't go so well. But two devices seems to be the norm
around here, and it seems to be adding the performance benefit. And of course, if
you are kind of, you know, if you have tons of money to burn,
and certainly financial institutions do have tons of money to burn,
then you can go the route of FPGAs and custom chips and all the rest of it,
where you can get insane performance increases
simply by virtue of the fact that instead of leveraging the instruction set of a processor,
an FPGA is essentially you designing your own processor,
you doing parallelization intrinsically. So it's not parallelization where, you know, you have a
multi-core device and you're spreading tasks or spreading kind of threads across or your operating
system doing that. It's actually a piece of silicon effectively, which is inherently parallel.
You feed it data and four things happen physically at the same time. It's not just,
you know, an emulation. So in this case,
it's a different ballgame. But unfortunately, development for these classes of devices is
extremely expensive. The tools are fairly immature, I have to say. And it's something that is available
for certain classes of tasks. And certainly, the financial industry is using it for feed handlers
and for, essentially, once something comes over the wire from the stock exchange, for example, processing this data faster than your competitors. Instead
of it going through an ordinary, you know, CPU and RAM, it goes through this specialized piece of
hardware. So that provides you, you know, it might be like a microsecond-level advantage, but
it's worth it if there is big money on the line. So people do that, but
the development costs are huge, unfortunately.
That's amazing.
Yeah. So is there anything else
you wanted to go over before we let you go, Dmitri?
Gosh, I don't know.
I wanted to talk more about
the C++ stuff
because we just had a release and
I thought there would be questions like, what have
you done that's new since the last version?
I really haven't prepared the answer for that.
You see, it's actually funny, because I mean,
I could go off listing the features here.
But in actual fact, like number one feature,
we now open Unreal Engine in under 40 seconds.
That's, I mean, a big deal for a lot of people.
We took a lot of flak for taking a huge amount of time.
I saw some of those comments, yeah.
Yeah, you have to be honest, though.
It is a massive solution.
And like I said, because we have to do it correctly,
we have to get each of the files and process it
and make sure that we're treating everything fairly.
But now we're actually doing it in reasonable time,
and hopefully this would sort of tone down
the rhetoric
on the internet regarding
this particular thing.
So for comparison,
how long did it used to take, if it takes
40 seconds now?
Hard to say, but
the first run could have taken maybe
half an hour, something like that.
Oh, okay, that's quite the improvement.
It was an unreasonable amount of time. Now it's kind of tolerable. And of course, you only pay this price on first startup now, so whenever
you open the solution again, it's going to be fairly quick, because we cache everything.
But apart from that, we actually did other things. We are now correctly supporting the C language.
And I know people are kind of like,
oh God, not C, you're taking us even further back in time.
But honestly, it's something that we felt we had to do and do it well.
And in addition, as always, there are analysis features and code generation features.
Actually, code generation is the thing that quite often gets ignored
in the discussions of tools and whatnot.
I'm a huge fan, actually, of code generation.
I think that if I want a constructor which initializes my fields,
I should get it in a fraction of a second,
not, you know, writing it with all the correct types being passed.
Because, well, it's the whole point of IDEs,
the whole point of you paying
with your CPU time and
your RAM is that you get
it back in terms of productivity.
So CodeGen is kind of
a personal favorite
of mine, I would say.
It also solves that reverse iteration problem,
because why
have compiler
support for it when you can just generate a chunk of code which, yes, does go backwards through an iteration variable of some kind?
But, I mean, so what?
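The generated chunk in question might look something like this sketch, which walks a container backwards with plain reverse iterators and needs no special compiler or language support:

```cpp
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3, 4};
    // Generated reverse loop: explicit reverse iterators instead of a
    // built-in reverse range-based for.
    for (auto it = v.rbegin(); it != v.rend(); ++it) {
        std::cout << *it << '\n';
    }
}
```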
I mean, instead of having a built-in language feature, you just generate it. And you should be kind of familiar with this whole generated code thing because of macros: when you use something like Google Test, you're essentially manufacturing huge tons of C++ anyway.
You're just not seeing it.
So you feel kind of insulated and you think, you know, all this cozy Google Test, nothing magical is happening.
Whereas behind the scenes, you have like classes full of functions in them and you're totally...
Ignorance is bliss
in this particular regard.
I wouldn't say overall, but in this regard it's
good to just get the final
result and start using it.
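To make the point about macros concrete: a one-line Google Test case really does manufacture a class behind the scenes. Roughly, and heavily simplified (the actual expansion includes registration machinery and more):

```cpp
#include <gtest/gtest.h>

// What you write:
TEST(MathTest, AddsTwoNumbers) {
    EXPECT_EQ(2 + 2, 4);
}

// Roughly what the macro manufactures behind the scenes (simplified):
//
//   class MathTest_AddsTwoNumbers_Test : public ::testing::Test {
//       void TestBody() override;  // your braces become this body
//       // ...plus static registration so the test runner can find it
//   };
//   void MathTest_AddsTwoNumbers_Test::TestBody() {
//       EXPECT_EQ(2 + 2, 4);
//   }
```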
I'm looking at the CLion
1.2 blog post and it does mention
Google Test support. Is there anything else you wanted to mention there?
Google Test, yeah.
Google Test actually appeared in ReSharper C++ before it appeared in CLion, and obviously the
implementations are somewhat different because the UIs are different. Visual Studio and ReSharper
C++ have their own user story in terms of unit testing, because from the early days of
ReSharper we were providing our own test runner. And well,
I would say it's a much nicer test runner and it can do a lot of things. For example, in the
latest version, we provide continuous testing. And I mean, not many people out there provide
continuous testing. I'll just explain for the listeners that the whole business of continuous
testing is that you rerun tests whenever either somebody saves a file or
somebody builds a project. But the key thing about continuous testing is you only rerun those tests
which are actually affected by the code that you changed. And this is a rocket science problem.
This is an extremely difficult problem. It's not something that you can just, you know, do within a month or two.
So it took us, I would say, maybe two years to get this.
And we're talking about .NET now.
I don't know what the situation is with C++ because it's even harder to get it in this regard.
But it's a rocket science problem that I think we cracked to some degree.
And even though it looks very simple, it's a fairly simple interface,
and we actually like to keep it simple and straightforward.
What's happening behind the scenes
is that we're essentially leveraging code coverage analysis,
but instead of just telling you what tests affected
what parts of your code,
what we're doing is we're looking at how your changes,
your recent changes when you press Ctrl-S to save a file,
how those changes propagated across all the
other files, how those files affected
the test that you wrote, and then
we're not rerunning your whole test suite.
And test suites, some people have huge
test suites. We are rerunning only
the parts which have actually been affected.
So that's one of the
really cool ReSharper Ultimate
10 features.
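As a rough conceptual sketch of the idea, not JetBrains' actual implementation (which tracks changes at a much finer granularity than whole files), the selection step amounts to intersecting a coverage map with the set of changed files:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using File = std::string;
using Test = std::string;

// Hypothetical sketch of coverage-based test selection: given a map from
// each source file to the tests that exercise it, return only the tests
// affected by the files changed since the last run.
std::set<Test> testsToRerun(const std::map<File, std::set<Test>>& coverage,
                            const std::vector<File>& changedFiles) {
    std::set<Test> affected;
    for (const File& file : changedFiles) {
        auto it = coverage.find(file);
        if (it != coverage.end()) {
            affected.insert(it->second.begin(), it->second.end());
        }
    }
    return affected;  // rerun these instead of the whole suite
}
```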
Well, where can people find you online, Dimitri, if they want to find more of your info?
All right.
So I have a Twitter.
It's dnesteruk.
I also, well, I have a couple of courses on Pluralsight,
like you mentioned,
so you can just Google Pluralsight and Nesteruk,
and you'll find me.
In addition, JetBrains' YouTube channel has lots of my videos: obviously product overview
videos, as well as webinars, because sometimes we do webinars where we just talk about, you know,
whatever the hell we like, basically. So I think some of my recent webinars were
talks on things like generative art, for example. So, you know, all sorts of experiments with C++ and what you can do with it.
Once again, the generative art webinar, by the way, is interesting because it actually used all those HPC practices that were mentioned here on the podcast today.
And also, it's actually part of the demo in the Pluralsight course, which I thought was really
neat, because, you know, with generative art you show people actual pictures of stuff, and then you say,
by the way, our generation of those pictures is really slow, so let's look at ten ways
of optimizing that. And mentioning webinars, you also did one on, like, kind of modern C++ in general,
right?
Yes, indeed. And this was one of the more popular webinars, although there are some
things which didn't fit into the one-hour time slot. But yeah, it was very popular, although I
get the feeling that I would have to be redoing this one regularly, because obviously as new
stuff comes out, you have to kind of update your samples and update the code and whatever.
So yeah. And it was actually a lot of... I got a lot of unpleasant surprises
during the preparation of this webinar
because it turned out that the compiler I was using,
the Intel compiler,
which, well, is a commercial compiler.
You have to pay money for it.
You expect top-notch performance.
What you don't expect is to be lagging quite a bit
behind other compilers in terms of standards.
And unfortunately, it does lag. The Intel compiler is kind of... it's not for everyone,
in the sense that, one, it doesn't support C++ to the same language level; it's not as fast in
terms of getting you, you know, automatic return type deduction, for example. But on the
other hand, it also has like really insane errors, where sometimes you write a piece of valid C++ code
and, instead of decent output,
you just get something like error four.
And you have to take that error four with the example,
send it to Intel, and they tell you,
okay, this will be fixed in the next update.
So it's actually quite, I mean, I wouldn't really recommend it to people
unless they're doing scientific computing
because in the scientific computing domain, Intel, they have their own MPI implementation. In addition,
they ship lots of algorithms as libraries. And those algorithms, it's interesting,
they're not just optimized for multicore, because I mean, you'd expect that from Intel to optimize
for their own CPUs. But in addition, some of the algorithms actually leverage MPI, meaning that if
you want to do like a fast
Fourier transform, and you want to do it quickly, then they would use their own MPI infrastructure.
So if you've got like 100 machines prepped for this sort of thing, then you don't really have
to write your own MPI code, you just do an invocation. And that invocation gets spread
across the entire network, which is fantastic. I think that's the whole point
of having algorithms
that behave in this kind of transparent manner.
Of course, if I tried to do it by hand,
it would take a really long time
to implement all of this.
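For the single-node case, the invocation described here looks roughly like this sketch using MKL's DFTI interface (the cluster variant that spreads the transform over MPI uses a separate descriptor API, omitted here):

```cpp
#include <complex>
#include <vector>
#include "mkl_dfti.h"  // Intel MKL's DFT interface

int main() {
    const MKL_LONG n = 1024;
    std::vector<std::complex<float>> signal(n);  // fill with your data...

    // One-dimensional, single-precision, complex-to-complex transform.
    DFTI_DESCRIPTOR_HANDLE handle = nullptr;
    DftiCreateDescriptor(&handle, DFTI_SINGLE, DFTI_COMPLEX, 1, n);
    DftiCommitDescriptor(handle);
    DftiComputeForward(handle, signal.data());  // in-place forward FFT
    DftiFreeDescriptor(&handle);
}
```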
Right.
Okay.
Well, thank you so much for your time, Dimitri.
All right.
Thanks a lot for having me over.
Thanks.
Thanks so much for listening
as we chat about C++.
I'd love to hear what you think of the podcast. Please let me know if we're discussing the stuff you're interested in,
or if you have a suggestion for a topic, I'd love to hear that also. You can email all your
thoughts to feedback at cppcast.com. I'd also appreciate if you can follow CppCast on Twitter
and like CppCast on Facebook. And of course, you can find all that info and the show notes on the